a16z: How OpenAI Builds for 800 Million Weekly Users — Model Specialization and Fine-Tuning
CHAPTERS
Sherwin Wu’s role at OpenAI: developer platform, API, and government deployments
Sherwin explains he leads engineering for OpenAI’s developer platform, with the API as the core product but also including special deployments such as government work. He highlights a local deployment at Los Alamos National Laboratory as an example of how OpenAI runs models in highly constrained environments.
Career background: Opendoor pricing ML and Quora’s newsfeed/ML culture
Sherwin traces his background from Opendoor’s housing pricing models to Quora’s newsfeed ranking, plus MIT computer science. He frames these experiences as foundational for thinking about applied ML, operations, and product/infrastructure tradeoffs.
Running both a horizontal platform and vertical apps: the OpenAI “two-business” tension
The conversation turns to how OpenAI simultaneously serves developers via an API (horizontal) and end users via first-party apps like ChatGPT (vertical). Sherwin acknowledges inherent platform/app tension but says mission alignment and rapid growth reduce internal conflict.
Why models resist abstraction: “anti-disintermediation” dynamics
Martin proposes that foundation models are hard to hide behind layers of software, making classic API disintermediation difficult. Sherwin agrees: users and developers notice model differences, and the model itself becomes sticky in a way traditional infrastructure isn’t.
People form relationships with models—and developers hard-code around them
They dig into why switching models is difficult: it’s both emotional/user-facing and technical. Sherwin notes end users get accustomed to a model, while builders create evaluation harnesses, tools, and workflows tuned to a specific model’s behavior.
From “one AGI model” to a portfolio of specialized models
Both reflect on the industry’s shift away from a single, all-purpose AGI model toward many specialized models. Sherwin argues this proliferation is not necessarily bad; it reflects unexpected paths toward capability and broadens the ecosystem.
Fine-tuning evolution: from SFT to Reinforcement Fine-Tuning (RFT) as the unlock
Sherwin explains why fine-tuning matters: companies have large proprietary datasets and want deeper customization than prompting or basic RAG. He contrasts earlier supervised fine-tuning (mostly tone/instruction tweaks) with reinforcement fine-tuning, which can drive major capability improvements on specific tasks.
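The SFT-versus-RFT distinction can be made concrete with a minimal sketch. The structures below are illustrative, not OpenAI's actual API formats: SFT learns from labeled (prompt, completion) pairs, while RFT optimizes against a programmatic grader that scores whatever the model produces, letting reward shape task-specific capability.

```python
# Illustrative contrast between SFT and RFT data (hypothetical formats).

# SFT-style example: a fixed target completion the model imitates.
sft_example = {
    "messages": [
        {"role": "user", "content": "Classify the sentiment: 'Great service!'"},
        {"role": "assistant", "content": "positive"},
    ]
}

# RFT-style setup: a prompt plus a grader that returns a reward signal.
def grade(model_output: str, reference_label: str) -> float:
    """Return a reward in [0, 1]; here, exact match on the label."""
    return 1.0 if model_output.strip().lower() == reference_label else 0.0
```

The key difference is that the grader can encode domain-specific correctness (a legal citation checker, a unit-test runner) rather than a single canonical answer, which is why RFT can move capability on narrow tasks where SFT mostly adjusts tone.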
Customer data sharing tradeoffs: incentives, discounts, and control
They discuss whether customers will share training data back with OpenAI. Sherwin describes pilots where customers can receive discounted inference or free training in exchange for sharing data, while emphasizing it remains a customer choice.
Prompt engineering isn’t dead—context engineering becomes the real lever
Sherwin argues the earlier belief that prompting would disappear was wrong, but the focus has shifted. Instead of clever phrasing, the differentiator is increasingly “context engineering”: selecting tools, retrieving the right information, and structuring inputs for reasoning models.
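A minimal sketch of what "context engineering" means in practice, with all names hypothetical: rather than tuning clever phrasing, the work is selecting which retrieved documents and tools reach the model and structuring them into the input.

```python
# Hypothetical sketch: context engineering as input assembly,
# not prompt wordsmithing. Function and tool names are illustrative.

def build_context(question: str, retrieved_docs: list[str],
                  tools: list[str]) -> list[dict]:
    """Combine retrieved snippets and available tools into structured messages."""
    doc_block = "\n\n".join(
        f"[doc {i + 1}] {doc}" for i, doc in enumerate(retrieved_docs)
    )
    system = (
        "Answer using only the documents below. "
        f"Available tools: {', '.join(tools)}.\n\n{doc_block}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

messages = build_context(
    "What is our refund window?",
    retrieved_docs=["Refunds are accepted within 30 days of purchase."],
    tools=["search_orders", "issue_refund"],
)
```

The lever here is the retrieval and tool-selection logic feeding `build_context`, which matters more for reasoning models than the exact wording of the instructions.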
What an agent is: long-horizon action-taking intelligence across interfaces
Sherwin defines agents as AIs that take actions on a user’s behalf over long time horizons. He emphasizes OpenAI treats agents not as a separate “modality” but as a capability that manifests across products (ChatGPT, Codex/CLI, API) as interfaces to the same underlying intelligence.
How OpenAI thinks about pricing: usage-based, cost-plus discipline, and outcome pricing skepticism
Sherwin describes why the API remains usage-based: it matches real consumption and aligns with high, variable compute costs. They discuss outcome-based pricing as a concept, but Sherwin notes it often correlates with usage when test-time compute increases—making usage pricing a workable proxy.
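The usage-as-proxy argument reduces to simple arithmetic. In this sketch the per-token rates are made-up placeholders, not OpenAI's prices: when a harder task consumes more test-time compute, that shows up as more output (reasoning) tokens, so the usage-based bill already scales with effort spent toward the outcome.

```python
# Illustrative usage-based pricing arithmetic; rates are placeholders.

INPUT_PRICE = 2.00 / 1_000_000   # dollars per input token (made up)
OUTPUT_PRICE = 8.00 / 1_000_000  # dollars per output token (made up)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request under simple per-token metering."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# A request that "thinks harder" emits more output tokens and costs more,
# which is why usage can approximate outcome-based pricing.
easy = request_cost(1_000, 500)
hard = request_cost(1_000, 5_000)
```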
Open-weights strategy: why it doesn’t kill the API business
They explore OpenAI’s move into open-weights (“open source” in practice) and why cannibalization is limited. Sherwin argues customers and use cases differ, and that serving large frontier models requires elite inference infrastructure and tight training–inference feedback loops that most users can’t replicate.
Different stacks for text, images, and video—and how the API spans them
Sherwin explains that while the API provides access to text, image, and video generation, the underlying inference stacks differ substantially. He notes OpenAI’s image/video org operates with significant independence, enabling parallel roadmaps and optimizations (e.g., Sora), while some platform infrastructure is shared at the API layer.
How the Agent Builder works: determinism, SOP-driven workflows, and constrained autonomy
Sherwin addresses why OpenAI’s Agent Builder uses node-based, more deterministic workflows despite “AGI should just do it” intuition. He argues many enterprise tasks are procedural (SOP-based) and require controllability for policy, compliance, and reliability; structured agent graphs help enforce constraints while still leveraging model reasoning.
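The SOP-style argument can be sketched as a fixed graph of nodes, where deterministic steps enforce policy and a model-backed step handles open-ended work. Everything below is illustrative (including the stand-in for the model call); it is not OpenAI's Agent Builder API.

```python
# Hypothetical SOP-style agent workflow: a fixed, auditable node order
# where hard policy gates constrain what the model-backed step can do.

def check_policy(state: dict) -> dict:
    """Deterministic node: hard-coded compliance gate, no model involved."""
    state["approved"] = state["amount"] <= 100
    return state

def draft_reply(state: dict) -> dict:
    """Stand-in for a model call; a real workflow would invoke an LLM here."""
    state["reply"] = (
        "Refund approved." if state["approved"]
        else "Refund requires manager review."
    )
    return state

WORKFLOW = [check_policy, draft_reply]  # fixed order = controllability

def run(state: dict) -> dict:
    for node in WORKFLOW:
        state = node(state)
    return state
```

Because the graph is explicit, the policy node always runs before the generative node, which is the controllability property enterprises want for compliance even as the model handles the unstructured parts.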