Before we ship a Claude model, these teams try to break it.

They don't just test the latest Claude models; they put them through the wringer. Working at the Frontier goes inside that process: what they build, what they push back on, and how their feedback shapes what ships.

May 28, 2026 · 3m

EVERY SPOKEN WORD

3 min read · 580 words

0:00 – 0:01
Intro
1. Speaker
  [instrumental music]
0:01 – 0:20
Early access: Customers try to break Claude before it ships
1. Speaker
  Before a new Claude model ships, a small group of customers is already testing it, breaking it, and shaping what ships with it. We sat down to see what they are learning.
0:20 – 0:36
The adrenaline of a new model drop: “A storm’s ahead”
1. Speaker
  When you get something new from Anthropic, what is that energy like?
2. Speaker
  We know a storm's ahead, but there's something exciting about a storm because it's all hands on deck.
3. Speaker
  Yeah, it feels like we're moving at the speed of light.
4. Speaker
  That's like getting the call and jumping from whatever you're working on. We have something new, let's, let's figure out what it's like.
0:36 – 0:44
Recalibrating assumptions: “The grounding has changed”
1. Speaker
  The moment we get a new model from Anthropic, we realize the grounding has changed.
0:44 – 1:11
Building at the frontier: fun, learning, and responsibility
1. Speaker
  What's it like to work at a company that's helping to shape the frontier?
2. Speaker
  It's insanely fun.
3. Speaker
  All of us are just in learning mode.
4. Speaker
  This moment just feels like a generational opportunity for anyone in this industry.
5. Speaker
  I feel very lucky and also very responsible. We need to continue to push the envelope, continue innovating, be more secure, and making things easier to build with. In a way, I love that I can unlock a new class of developers and builders.
1:11 – 1:19
First tests: automated evals as the baseline gate
1. Speaker
  What's the first thing you throw at a new model?
2. Speaker
  The very first thing is we will start automated evals, uh, just so that they start running in the background.
1:19 – 1:41
Ambitious real-world target: drafting an S-1 with agentic workflows
1. Speaker
  One use case that is kind of pipe dream that's easy to point to as a particularly complex legal task is drafting an S-1. And now with agentic capabilities where these models can go out and find information that they need, synthesize it, edit documents, we're getting to larger and larger chunks of the S-1 that you can kind of just send the model on its way to do.
1:41 – 1:54
Breakthrough feel: swapping the model makes the agent “just work”
1. Speaker
  Just by swapping in that one model, every question I ever wanna ask it started getting answered. You know, it went from this agent can sometimes answer questions, sometimes gets stuck, to, "Oh my God, it is answering every question quickly and accurately."
1:54 – 2:01
Measured gains: success-rate dashboards and step-function improvements
1. Speaker
  The dashboard of the testing agent success rate has just increased-
2. Speaker
  [laughs]
3. Speaker
  ... like by, I think it's 20%.
4. Speaker
  Cool. [laughs]
2:01 – 2:15
Reading the tea leaves: when previously-failing evals start passing
1. Speaker
  Things that don't work today are, like, the best sign for here's what the next models are gonna be way better at. Seeing evals that have never worked start working, and then start working consistently, this model is gonna be, like, something special.
2:15 – 2:35
Customer–Anthropic collaboration: high-touch iteration and trust
1. Speaker
  What's it like working with Anthropic?
2. Speaker
  It feels like I have a conversation with you almost every other day. The engineers on the team, I feel like are, are almost on the same team.
3. Speaker
  It's less like we're just buying something [laughs] from you.
4. Speaker
  Yeah, yeah.
5. Speaker
  More like we can kind of build, build with you.
6. Speaker
  We have a very high trust bar that anything you publish is not gonna be slop.
2:35 – 3:09
What the frontier feels like: dazzling, compounding, and wave-riding
1. Speaker
  What is one word or phrase that characterizes what it feels like to actually be building at the frontier?
2. Speaker
  Dazzling. [laughs]
3. Speaker
  Yeah.
4. Speaker
  If that makes sense. It can be kind of blinding at times.
5. Speaker
  Yeah.
6. Speaker
  [laughs] Just the, uh, brightness and opportunity and excitement.
7. Speaker
  Compounding. We get the latest tools, which leads to our customers getting a better product, which leads to us getting better products.
8. Speaker
  You have a big wave under you-
9. Speaker
  Yeah.
10. Speaker
  ... that is changing the way your user is working and changing the way you are working.
11. Speaker
  Yeah.
12. Speaker
  And you have to keep your balance, and you know there are bigger waves coming. [instrumental music]