a16z: How OpenAI Builds for 800 Million Weekly Users: Model Specialization and Fine-Tuning
At a glance
WHAT IT’S REALLY ABOUT
OpenAI scales platform and apps through specialized models, tuning, agents
- OpenAI intentionally operates as both a vertical app company (ChatGPT) and a horizontal platform (API), accepting inherent ecosystem tension in service of broad distribution.
- The industry has moved away from the idea of interchangeable “one model to rule them all” toward portfolios of specialized models optimized for distinct tasks and interfaces (e.g., coding vs general chat).
- Model “anti-disintermediation” shows up in practice: users and developers form preferences and technical dependence on specific models, making swapping providers harder than classic infrastructure APIs.
- Fine-tuning is evolving from limited supervised tuning to reinforcement fine-tuning, which can meaningfully raise task performance, with optional data-sharing arrangements that can reduce training/inference costs.
- “Agents” are framed as long-horizon systems that take actions, and OpenAI’s agent builder emphasizes determinism and SOP-following for procedural/regulated work rather than purely free-form autonomy.
IDEAS WORTH REMEMBERING
5 ideas

OpenAI runs “app + platform” on purpose, not by accident.
Leadership is described as principled about simultaneously building ChatGPT for reach and the API for broad distribution, using the mission (broad benefit) as the justification for the dual strategy despite inevitable tension with developers.
Models are proving hard to abstract away, reducing classic API disintermediation.
Both consumer and developer products increasingly expose the model directly because behavior is “unruly” and model differences are user-noticeable; as a result, retention/stickiness is higher than expected even with tools that promise easy swapping.
User preference and developer integration create two layers of model stickiness.
People build familiarity/relationship with a model’s “personality” and behavior, while developers co-evolve harnesses, evals, tool-use patterns, and workflows around one model, making switching costs real and technical.
OpenAI expects many specialized models, which reshapes the path to AGI.
The conversation rejects the earlier “single AGI model” assumption and instead anticipates a proliferation of specialized models (e.g., Codex variants), with interfaces/products becoming different manifestations of core intelligence.
Reinforcement fine-tuning is positioned as the big unlock for customer data value.
SFT is characterized as mostly tone/instruction-following improvements, whereas RFT enables larger performance gains on narrow tasks (e.g., domain coding/agent planning), making proprietary datasets materially more valuable.
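To make the SFT-vs-RFT distinction concrete, here is a hedged sketch of what configuring each kind of job might look like with the OpenAI Python SDK's fine-tuning `method` parameter. The model names, file IDs, grader configuration, and hyperparameters below are illustrative assumptions, not details from the conversation; only the request payloads are built here, and an actual run would pass them to `client.fine_tuning.jobs.create(...)`.

```python
# Supervised fine-tuning (SFT): mostly shapes tone and
# instruction-following, per the discussion above.
sft_job = {
    "model": "gpt-4.1-mini",            # assumed base model
    "training_file": "file-abc123",     # placeholder file ID
    "method": {"type": "supervised"},
}

# Reinforcement fine-tuning (RFT): a grader scores each sampled answer
# and RL optimizes against that reward. Harder and more finicky to set
# up, but it can lift performance on narrow tasks well beyond SFT.
rft_job = {
    "model": "o4-mini",                 # assumed RFT-capable model
    "training_file": "file-def456",     # placeholder file ID
    "method": {
        "type": "reinforcement",
        "reinforcement": {
            # Illustrative grader: exact match against a reference
            # answer stored alongside each training item.
            "grader": {
                "type": "string_check",
                "name": "match_reference",
                "input": "{{sample.output_text}}",
                "reference": "{{item.reference_answer}}",
                "operation": "eq",
            },
            "hyperparameters": {"n_epochs": 1},
        },
    },
}
```

The practical upshot of the grader-driven setup is the point made above: the value of a proprietary dataset rises when it can supply reference answers (or a scoring rule) that RL can optimize against, rather than just demonstrations to imitate.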
WORDS WORTH SAVING
5 quotes

10% of the globe uses it weekly.
— Sherwin Wu
I remember, even within OpenAI, the thinking was that there would be one model that rules them all.
— Sherwin Wu
But it's becoming increasingly clear, I think, that there will be room for a bunch of specialized models.
— Sherwin Wu
The big unlock that has happened recently is with the reinforcement fine-tuning model, because with that setup we're now letting you actually run RL, which is more finicky and harder, and you need to invest more in it, but it allows you to leverage your data way more.
— Sherwin Wu
My general take on agents is: it's an AI that will take actions on your behalf and can work over long time horizons.
— Sherwin Wu
AI-generated summary created from a speaker-labeled transcript.