Blog

Company Updates & Technology Articles

December 15, 2025

Company

Beyond the Algorithm: The Economic Impact of the Data Annotation Industry

A first-of-its-kind study from Oxford Economics shows how data annotation drives AI innovation and creates flexible earning opportunities for people across the US. Meet the contributors behind the industry and learn about their impact on the economy.

Read more

December 12, 2025

Government

Setting the Global AI Standard: Our Response to the American AI Exports Program

Scale AI outlines how exporting the full U.S. AI tech stack can secure global leadership, standards, and economic competitiveness.

Read more

December 9, 2025

Engineering

The Shoggoth of AI Risk

We often discuss "AI Risk" as if it were a single, shapeless shoggoth. But the truth is that risk comes from specific sources, each requiring a different defense. This article dismantles the monolith, categorizing the six distinct vectors of danger: Adversaries, Unforced Errors, Misaligned Goals, Dependencies, Societal Impact, and Emergent Behavior. Learn to distinguish between these threats so you can move from panic to precise preparation.

Read more

November 25, 2025

Research

Crumbling Under Pressure: PropensityBench Reveals AI’s Weaknesses

To measure the propensity of agents to make unsafe choices, Scale, the University of Maryland, and other collaborators developed PropensityBench. This benchmark simulates real-world pressure by allowing agents to choose between a safe approach that consistently fails and a functional but harmful shortcut, revealing their true inclinations. The results show that agent safety degrades significantly under pressure.

Read more

November 24, 2025

Engineering

Foundations of Agency for the Agentic Era

The next generation of AI agents is shifting from passive workers that receive user commands and generate outputs to active agents that plan, act, observe, and improve on their own. Agents now choose how to complete a task, which tools to use, and whom (or which agent) to collaborate with. LLMs didn’t invent agency, but they democratized it by turning frontier-level reasoning into a simple API call, letting teams compose complex systems from simple building blocks.

Read more

November 20, 2025

Company

Building the Human Frontier Collective: Where Experts Shape the Future of AI

The Human Frontier Collective is a premier community of PhDs, academics, and industry leaders advancing AI through research, collaboration, and shared expertise.

Read more

November 20, 2025

Research

SEAL Showdown: Insights from GPT-5

Today, we add several new models to Showdown. A surprising finding is that users consistently rank GPT-5 significantly lower than other models. In this blog post, we share our preliminary analysis of GPT-5's ranking on Showdown, where we examine the effect of thinking effort, task type, and evaluation setting.

Read more

November 20, 2025

Engineering

Agentex Tutorial: How to Build and Scale Long-Running Enterprise Agents

Earlier this week, we open-sourced Agentex to enable long-running enterprise agents. Today, we’re releasing a tutorial we created with Temporal that shows how to build a long-running procurement agent. It’s a concrete example of an agent that manages extended workflows, responds to external signals, and escalates to humans only when needed.

Read more

November 19, 2025

Research

The Limits of Data Filtering in Bio-Foundation Models

In collaboration with Princeton University, UMD, SecureBio, and the Center for AI Safety, we introduce BioRiskEval, the first comprehensive framework for assessing dual-use risks in open-weight bio-foundation models. Our stress tests on the Evo 2 model reveal a critical vulnerability: dangerous knowledge removed via data filtering often persists in hidden layers or can be rapidly restored with minimal compute. These findings challenge the reliance on simple data curation and underscore the urgent need for "defense-in-depth" strategies to secure the future of biological AI.

Read more

November 18, 2025

General

What Enterprises Can Learn from Public GenAI Failures | Human in the Loop Episode 15

Today on the podcast, the team is talking about what happens when enterprise GenAI goes wrong. The team digs into recent public AI failures, reviewing the impact of each, whether they could have been prevented, and if so, how.

Read more