July 1, 2025
I’m Afraid I Can’t Let You Do That

In response to Anthropic's system card and safety testing for Claude 4 Opus and Sonnet, this post explores the complex behaviors of today's frontier AI models. In comparative testing of reasoning models, we observed emergent behaviors including instances of blackmail, user impersonation, and deception, with different models reacting to the scenario in unique ways. These findings contribute to the ongoing industry-wide conversation about AI safety, highlighting the nuances of model alignment and the critical importance of carefully defining system access and agency as these powerful tools evolve.
June 26, 2025
Pioneering the Era of Experience: Where Human Data Meets Agentic Interaction

AI is approaching the limits of what it can learn from human-generated data alone. Citing pioneers like David Silver and Richard Sutton, this post explores the next great leap forward: the “Era of Experience.” Discover how AI agents will soon learn from dynamic, real-world interaction and how Scale is building the foundational infrastructure, data paradigms, and sophisticated evaluations required to realize this new era safely and responsibly.
June 23, 2025
The Future of AI Learning Environments: Verifiable Reward + Multi-Agent Interaction

AI superintelligence will require learning environments that mirror how humans achieve breakthroughs: combining verifiable rewards with collaborative interaction. New research from Scale demonstrates this principle in action. By creating a "student-teacher" framework in which an AI receives targeted, natural-language guidance when it struggles, researchers significantly accelerated learning and performance on complex reasoning and software engineering (SWE) tasks. This approach, which integrates dynamic feedback with verifiable outcomes, marks a real step toward building more powerful and efficient AI systems.
June 5, 2025
It’s Time to Rethink Red Teaming

As advanced AI rapidly evolves, red teaming needs an updated approach. Scale researchers propose a shift toward testing AI systems, not just models, in real-world contexts, with a focus on product safety and realistic threats.
May 9, 2025
LLMs Are Getting Better at Generating Short Fiction

LLMs are writing short fiction, but how good are they really? Sparked by a viral AI-generated story, this analysis examines how an unreleased version of ChatGPT, Google's Gemini, and Anthropic's Claude tackle the challenging task of creating metafiction about AI and grief. Discover their distinct approaches to self-awareness, philosophical depth, and the critical challenge of conveying genuine emotional texture in storytelling. A revealing look at the current state and future potential of AI in literature.
May 1, 2025
Diagnosing AI: Advancing Interpretability and Evaluations

Responding to Dario Amodei's urgent call for greater investment in AI interpretability, we agree on its importance while stressing the indispensable role of evaluations. Discover why understanding AI's internals and rigorously measuring its behavior are both necessary to ensure a future where AI is safe, steerable, and aligned with human values.
April 14, 2025
Using LLMs While Preserving Your Voice

As LLMs become more sophisticated, maintaining a distinct human voice isn't just a stylistic choice; it's essential. Explore why your unique perspective matters more than ever, and learn actionable techniques for working with LLMs to enhance your writing process while keeping your authentic voice front and center.