a16z: How OpenAI Builds for 800 Million Weekly Users — Model Specialization and Fine-Tuning
CHAPTERS
Sherwin Wu’s role at OpenAI: developer platform, API, and government deployments
Sherwin explains he leads engineering for OpenAI’s developer platform, with the API as the core product but also including special deployments such as government work. He highlights a local deployment at Los Alamos National Laboratory as an example of how OpenAI runs models in highly constrained environments.
Career background: Opendoor pricing ML and Quora’s newsfeed/ML culture
Sherwin traces his background from Opendoor’s housing pricing models to Quora’s newsfeed ranking, plus MIT computer science. He frames these experiences as foundational for thinking about applied ML, operations, and product/infrastructure tradeoffs.
Running both a horizontal platform and vertical apps: the OpenAI “two-business” tension
The conversation turns to how OpenAI simultaneously serves developers via an API (horizontal) and end users via first-party apps like ChatGPT (vertical). Sherwin acknowledges inherent platform/app tension but says mission alignment and rapid growth reduce internal conflict.
Why models resist abstraction: “anti-disintermediation” dynamics
Martin proposes that foundation models are hard to hide behind layers of software, making classic API disintermediation difficult. Sherwin agrees: users and developers notice model differences, and the model itself becomes sticky in a way traditional infrastructure isn’t.
People form relationships with models—and developers hard-code around them
They dig into why switching models is difficult: it’s both emotional/user-facing and technical. Sherwin notes end users get accustomed to a model, while builders create evaluation harnesses, tools, and workflows tuned to a specific model’s behavior.
From “one AGI model” to a portfolio of specialized models
Both reflect on the industry’s shift away from a single, all-purpose AGI model toward many specialized models. Sherwin argues this proliferation is not necessarily bad; it reflects unexpected paths toward capability and broadens the ecosystem.
Fine-tuning evolution: from SFT to Reinforcement Fine-Tuning (RFT) as the unlock
Sherwin explains why fine-tuning matters: companies have large proprietary datasets and want deeper customization than prompting or basic RAG. He contrasts earlier supervised fine-tuning (mostly tone/instruction tweaks) with reinforcement fine-tuning, which can drive major capability improvements on specific tasks.
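The SFT-versus-RFT distinction can be made concrete with a minimal sketch. The structures below are illustrative, not OpenAI's actual API formats: SFT learns from labeled (prompt, completion) pairs, while RFT optimizes against a programmatic grader that scores whatever the model produces, letting reward shape task-specific capability.

```python
# Illustrative contrast between SFT and RFT data (hypothetical formats).

# SFT-style example: a fixed target completion the model imitates.
sft_example = {
    "messages": [
        {"role": "user", "content": "Classify the sentiment: 'Great service!'"},
        {"role": "assistant", "content": "positive"},
    ]
}

# RFT-style setup: a prompt plus a grader that returns a reward signal.
def grade(model_output: str, reference_label: str) -> float:
    """Return a reward in [0, 1]; here, exact match on the label."""
    return 1.0 if model_output.strip().lower() == reference_label else 0.0
```

The key difference is that the grader can encode domain-specific correctness (a legal citation checker, a unit-test runner) rather than a single canonical answer, which is why RFT can move capability on narrow tasks where SFT mostly adjusts tone.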
Customer data sharing tradeoffs: incentives, discounts, and control
They discuss whether customers will share training data back with OpenAI. Sherwin describes pilots where customers can receive discounted inference or free training in exchange for sharing data, while emphasizing it remains a customer choice.
Prompt engineering isn’t dead—context engineering becomes the real lever
Sherwin argues the earlier belief that prompting would disappear was wrong, but the focus has shifted. Instead of clever phrasing, the differentiator is increasingly “context engineering”: selecting tools, retrieving the right information, and structuring inputs for reasoning models.
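A minimal sketch of what "context engineering" means in practice, with all names hypothetical: rather than tuning clever phrasing, the work is selecting which retrieved documents and tools reach the model and structuring them into the input.

```python
# Hypothetical sketch: context engineering as input assembly,
# not prompt wordsmithing. Function and tool names are illustrative.

def build_context(question: str, retrieved_docs: list[str],
                  tools: list[str]) -> list[dict]:
    """Combine retrieved snippets and available tools into structured messages."""
    doc_block = "\n\n".join(
        f"[doc {i + 1}] {doc}" for i, doc in enumerate(retrieved_docs)
    )
    system = (
        "Answer using only the documents below. "
        f"Available tools: {', '.join(tools)}.\n\n{doc_block}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

messages = build_context(
    "What is our refund window?",
    retrieved_docs=["Refunds are accepted within 30 days of purchase."],
    tools=["search_orders", "issue_refund"],
)
```

The lever here is the retrieval and tool-selection logic feeding `build_context`, which matters more for reasoning models than the exact wording of the instructions.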
What an agent is: long-horizon action-taking intelligence across interfaces
Sherwin defines agents as AIs that take actions on a user’s behalf over long time horizons. He emphasizes OpenAI treats agents not as a separate “modality” but as a capability that manifests across products (ChatGPT, Codex/CLI, API) as interfaces to the same underlying intelligence.
How OpenAI thinks about pricing: usage-based, cost-plus discipline, and outcome pricing skepticism
Sherwin describes why the API remains usage-based: it matches real consumption and aligns with high, variable compute costs. They discuss outcome-based pricing as a concept, but Sherwin notes it often correlates with usage when test-time compute increases—making usage pricing a workable proxy.
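The usage-as-proxy argument reduces to simple arithmetic. In this sketch the per-token rates are made-up placeholders, not OpenAI's prices: when a harder task consumes more test-time compute, that shows up as more output (reasoning) tokens, so the usage-based bill already scales with effort spent toward the outcome.

```python
# Illustrative usage-based pricing arithmetic; rates are placeholders.

INPUT_PRICE = 2.00 / 1_000_000   # dollars per input token (made up)
OUTPUT_PRICE = 8.00 / 1_000_000  # dollars per output token (made up)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request under simple per-token metering."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# A request that "thinks harder" emits more output tokens and costs more,
# which is why usage can approximate outcome-based pricing.
easy = request_cost(1_000, 500)
hard = request_cost(1_000, 5_000)
```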
Open-weights strategy: why it doesn’t kill the API business
They explore OpenAI’s move into open-weights (“open source” in practice) and why cannibalization is limited. Sherwin argues customers and use cases differ, and that serving large frontier models requires elite inference infrastructure and tight training–inference feedback loops that most users can’t replicate.
Different stacks for text, images, and video—and how the API spans them
Sherwin explains that while the API provides access to text, image, and video generation, the underlying inference stacks differ substantially. He notes OpenAI’s image/video org operates with significant independence, enabling parallel roadmaps and optimizations (e.g., Sora), while some platform infrastructure is shared at the API layer.
How the Agent Builder works: determinism, SOP-driven workflows, and constrained autonomy
Sherwin addresses why OpenAI’s Agent Builder uses node-based, more deterministic workflows despite “AGI should just do it” intuition. He argues many enterprise tasks are procedural (SOP-based) and require controllability for policy, compliance, and reliability; structured agent graphs help enforce constraints while still leveraging model reasoning.
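The SOP-style argument can be sketched as a fixed graph of nodes, where deterministic steps enforce policy and a model-backed step handles open-ended work. Everything below is illustrative (including the stand-in for the model call); it is not OpenAI's Agent Builder API.

```python
# Hypothetical SOP-style agent workflow: a fixed, auditable node order
# where hard policy gates constrain what the model-backed step can do.

def check_policy(state: dict) -> dict:
    """Deterministic node: hard-coded compliance gate, no model involved."""
    state["approved"] = state["amount"] <= 100
    return state

def draft_reply(state: dict) -> dict:
    """Stand-in for a model call; a real workflow would invoke an LLM here."""
    state["reply"] = (
        "Refund approved." if state["approved"]
        else "Refund requires manager review."
    )
    return state

WORKFLOW = [check_policy, draft_reply]  # fixed order = controllability

def run(state: dict) -> dict:
    for node in WORKFLOW:
        state = node(state)
    return state
```

Because the graph is explicit, the policy node always runs before the generative node, which is the controllability property enterprises want for compliance even as the model handles the unstructured parts.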