Lenny's Podcast
Jason Droege: Why AI still needs someone to dig up the road
Scale AI's new CEO on Meta's $14B stake, expert networks of doctors, engineers, and PhDs, why production AI takes 6 to 12 months, and the Uber Eats lessons that still apply.
CHAPTERS
AI hype vs enterprise reality: why “on-the-ground” work takes time
A cold open on why AI can feel overpromised in enterprises despite rapid headline progress. Jason argues robust automation requires unglamorous operational work and longer timelines than most expect.
- Enterprise AI automation typically needs 6–12 months to become reliable
- Headlines vs reality: major tech shifts require massive execution work
- Analogy: broadband rollout required digging up roads and laying cables
- Expectation-setting for what “delivery” should look like in practice
Meet Jason Droege: new CEO of Scale AI amid Meta’s $14B investment
Lenny introduces Jason’s background (Scour cofounder, Uber Eats creator) and frames the episode around how frontier models get smarter through expert input. He previews the Scale–Meta context and the evolution from generic labeling to expert-driven improvement.
- Jason’s first major interview since becoming Scale AI CEO
- Meta invested $14B+ for a 49% non-voting stake; Alex Wang moved to Meta
- Episode theme: how expert feedback and “what good looks like” improve models
- Scale’s role in training data, labeling, and evals for frontier labs
Dorm-room startup Scour: ‘everything is negotiable’ and surviving legal reality
Jason recounts founding Scour with Travis Kalanick and learning early that business rules are malleable and incentives drive outcomes. He describes chaotic fundraising dynamics and the brutal wake-up call of being sued by the entertainment industry.
- Lesson: in startups, ‘everything is negotiable’; there’s no single playbook
- Early fundraising: shifting terms, pressure tactics, and incentive alignment
- Scour’s product and how it was used for finding free content
- RIAA/MPAA lawsuit (a quarter-trillion-dollar claim) and eventual settlement
Scale AI today: independence after the Meta deal, structure, and growth
Jason clarifies what the Meta transaction did and didn’t change. He emphasizes Scale remains independent, governance is largely unchanged, and the company has two large revenue lines that continued growing post-deal.
- Meta holds 49% non-voting stock; no new board seat; governance unchanged
- No preferential Meta access; privacy and data security controls remain
- Only ~15 people moved to Meta; Scale has ~1,100 employees
- Two major businesses, each at hundreds of millions in revenue; growth continued
From ‘basic’ labeling to expert work: why competitor narratives are misleading
Jason traces Scale’s evolution from autonomous vehicle labeling to computer vision and now GenAI. He argues the market increasingly needs sophisticated, expert-led tasks—and Scale has been operating at that frontier as model needs changed.
- Scale’s throughline since 2016: adapt data to what models need next
- GenAI increased task complexity from preference ranking to multi-hour expert work
- Examples: top web developers building/annotating sites; medical experts explaining nuanced topics
- Contributor network stats: ~80% hold a bachelor’s degree or higher; ~15% hold PhDs
Recruiting experts at scale: referrals, campuses, and ‘great contributor experience’
Lenny probes the hardest operational bottleneck: finding and retaining scarce experts. Jason explains Scale’s multi-pronged sourcing strategy and why referrals and community dynamics outperform purely scaled recruiting channels.
- Experts are difficult to source; requires many parallel tactics
- Primary channel: expert referrals driven by good experience and meaningful work
- Campus programs: engaging professors/students to recruit specialized talent
- Traditional sourcing (e.g., LinkedIn) helps, but grassroots channels yield best talent
Reinforcement learning environments: training agents to operate in real systems
The conversation shifts to RL and the rise of ‘environments’—sandboxes where agents learn to achieve goals. Jason explains why generalizability is the key challenge, given endless permutations across enterprise software and workflows.
- RL environments help agents learn goal completion inside realistic sandboxes
- Example: agents navigating a configurable Salesforce instance reliably
- Key question: which tasks/data generalize, and which explode into trillions of combinations
- Scale’s role: produce high-value, generalizable data for agent usefulness
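
To make the idea of an RL "environment" concrete, here is a minimal Python sketch of a sandbox in which an agent is rewarded for completing a goal. Everything in it is a hypothetical illustration and not Scale's tooling or anything described in the episode: the `ToyCrmEnv` class, the required fields, the action format, and the reward values are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field

# Hypothetical goal: fill in every required field of a CRM-style record, then save it.
REQUIRED_FIELDS = ("name", "email", "stage")


@dataclass
class ToyCrmEnv:
    """A toy sandbox (assumed for illustration) with a reset/step interface."""
    max_steps: int = 10
    record: dict = field(default_factory=dict)
    steps: int = 0

    def reset(self) -> dict:
        # Start each episode from an empty record.
        self.record = {}
        self.steps = 0
        return dict(self.record)

    def step(self, action: tuple) -> tuple:
        """action is ("set", field_name, value) or ("save",).

        Returns (observation, reward, done).
        """
        self.steps += 1
        reward, done = 0.0, False
        if action[0] == "set":
            _, name, value = action
            self.record[name] = value
        elif action[0] == "save":
            # Reward only a save that completes the goal.
            complete = all(self.record.get(f) for f in REQUIRED_FIELDS)
            reward = 1.0 if complete else -1.0
            done = True
        if self.steps >= self.max_steps:
            done = True  # Cut off episodes that never save.
        return dict(self.record), reward, done


def run_episode(env: ToyCrmEnv, policy) -> float:
    """Run one episode and return the total reward the policy collected."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


def scripted_policy(obs: dict) -> tuple:
    """A hand-written stand-in for a trained agent: fill missing fields, then save."""
    for name in REQUIRED_FIELDS:
        if name not in obs:
            return ("set", name, f"example-{name}")
    return ("save",)


if __name__ == "__main__":
    print(run_episode(ToyCrmEnv(), scripted_policy))
```

A real environment would wrap actual software with far more states and actions, which is where the generalization question in the bullets above comes in: three required fields are easy to enumerate, while a configurable enterprise system is not.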
What the data actually looks like: annotations, decisions, and enterprise judgment
Lenny asks for concrete examples of training data beyond simplistic labels. Jason explains how data can include artifacts plus rationale, and gives an enterprise healthcare example that highlights ‘digitizing judgment’ as the emerging bottleneck.
- Training artifacts can include outputs plus ‘why’ behind decisions (not just final code)
- Data can cover correct solutions, debugging cases, and decision rationales
- Healthcare example: summarizing 200–300 pages into key diagnostic considerations
- Enterprises increasingly must label their own domain-specific judgment to go beyond off-the-shelf models
Humans in the loop: timelines, adaptability, and why mission-critical AI is hard
Lenny challenges the long-term need for human experts; Jason argues labeling continually reinvents itself as model needs evolve. He also pushes back on near-term ‘white-collar apocalypse’ timelines, emphasizing how difficult high-accuracy, mission-critical automation is.
- Data labeling shifts as domains mature; new bottlenecks continuously appear
- Claim: if no new human knowledge is needed, the world is in an ‘unfathomable’ state
- Skepticism about immediate labor disruption; technology and adoption take time
- Enterprise meaning varies by organization; achieving accuracy/reliability is the hard part
Evals: defining ‘what good looks like’ (especially for enterprise and government)
Jason explains that a large share of Scale’s work is evals, particularly outside frontier labs. Evals create benchmarks for reliability and decision quality in probabilistic systems, where ‘good’ can matter more than a single ‘correct’ output.
- Evals establish benchmarks for performance and reliability
- Enterprise/government implementations are often eval-heavy
- In probabilistic systems, ‘good’ guidance can be more practical than binary correctness
- AI excels when it meaningfully improves low-to-mid accuracy workflows; squeezing out the last 2% is harder
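
One way to picture an eval that measures "good" and not a single correct answer is a rubric that awards partial credit. The Python sketch below is a minimal illustration under stated assumptions: the rubric criteria, the sample outputs, and the pass threshold are all hypothetical and do not come from the episode or from Scale's actual eval methodology.

```python
from statistics import mean

# Hypothetical rubric: each criterion is a name plus a check on the output text.
RUBRIC = {
    "mentions_diagnosis": lambda text: "diagnosis" in text.lower(),
    "cites_source_page": lambda text: "page" in text.lower(),
    "under_100_words": lambda text: len(text.split()) <= 100,
}


def score_output(text: str) -> float:
    """Return the fraction of rubric criteria the output satisfies (0.0 to 1.0)."""
    results = [check(text) for check in RUBRIC.values()]
    return sum(results) / len(results)


def run_eval(outputs: list, threshold: float = 0.5) -> dict:
    """Score every output and report the mean score and the share above threshold."""
    if not outputs:
        raise ValueError("run_eval needs at least one output")
    scores = [score_output(o) for o in outputs]
    return {
        "mean_score": mean(scores),
        "pass_rate": sum(s >= threshold for s in scores) / len(scores),
    }


if __name__ == "__main__":
    sample_outputs = [
        "Likely diagnosis: anemia (see page 12).",
        "No findings.",
    ]
    print(run_eval(sample_outputs))
```

Reporting a graded score alongside a pass rate lets a team track whether a workflow is getting better overall, even when no single output can be marked simply right or wrong.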
Where models go next: from knowing to doing, and why change management will dominate
Jason shares his view of the next 2–3 years: models move from knowledge to action via agents operating in environments. As capabilities rise, organizational change management and policy will increasingly become the limiting factors.
- Major trend: models shift from ‘knowing things’ to ‘doing things’
- Agents must navigate apps, workflows, and real-world constraints
- Wide variance in forecasts because agent training is early-stage
- Within 2–3 years, tech may pressure policy and change management to catch up
Building Uber Eats: customer incentives, urgency, and unit economics as product strategy
Switching to product-building lessons, Jason describes how Uber chose food delivery after exploring many ideas. He shares how deep economic understanding (and independent triangulation) shaped pricing, marketplace design, and ultimately rapid scaling.
- Being close to customers means understanding incentives, urgency, and constraints (not literal requests)
- Early research: reverse-engineered restaurant unit economics when owners wouldn’t share details
- Tried multiple bets (convenience trucks, grocery, point-to-point delivery) before Eats emerged
- Uber Eats scaled from $0 to $20B in ~4.5 years; later pushed toward a ~$80B run rate
McDonald’s deal story: stubbornness, global rollout chaos, and marketplace pragmatism
Jason recounts initially rejecting McDonald’s to focus on local restaurants, then reversing course as the opportunity became undeniable. The partnership became a major growth lever, though onboarding a massive global chain created operational mayhem.
- Jason initially said ‘no’ to McDonald’s due to strategic positioning and “vibe”
- Team convinced him; eventual exclusive relationship brought huge customer volume
- Solved low basket-size economics pragmatically (radius, pricing levers, operational tweaks)
- Global rollout in ~6 months created intense operational strain for a young org
Independent thinking, high bars, and gross margin as a fast filter for business quality
Jason explains why independent insight matters in a crowded founder market and how he evaluates new ventures. He describes gross margin as a coarse but powerful lens to reveal differentiation, competitive dynamics, and inevitable margin compression.
- Founding requires ‘alpha’: an insight others don’t have and the willingness to question assumptions
- High bar: focus on business models/markets that can reach massive scale (network effects, lock-in)
- Gross margin as a litmus test to expose alternatives, defensibility, and compression risk
- Costco/Walmart example: low margins can work when scale and habit formation create moats
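
For readers who want the arithmetic behind the gross margin filter, the standard definition is revenue minus cost of goods sold, divided by revenue. The short Python sketch below applies it to two made-up businesses. The figures are hypothetical round numbers chosen for illustration; they are not Costco's, Walmart's, or any real company's results.

```python
def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin as a fraction of revenue: (revenue - COGS) / revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cogs) / revenue


if __name__ == "__main__":
    # Hypothetical figures only, in arbitrary currency units.
    businesses = {
        "low-margin retailer": (100.0, 88.0),
        "high-margin software": (100.0, 25.0),
    }
    for name, (revenue, cogs) in businesses.items():
        print(f"{name}: {gross_margin(revenue, cogs):.0%}")
```

The number alone does not say whether a business is good. As the Costco/Walmart bullet notes, a low margin can still work at scale, so the figure is best read as a prompt for questions about alternatives and defensibility.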
Risk, survival, and teams: ‘not losing’ before winning; hiring for adaptable operators
Jason argues that enduring success depends on surviving volatility and making asymmetric decisions, not just “going for it.” He closes with hiring philosophy: optimize for curiosity, collaboration, and leadership—while recognizing a small set of roles require exact experience.
- ‘Not losing is a precursor to winning’: survival enables eventual market timing and pivots
- Over-indexing on risk during hype cycles can jeopardize the enterprise
- Hiring: most roles should emphasize curiosity/problem-solving, humility/collaboration, and leadership
- Compose teams as complementary ‘organisms’; continuity can outperform constant “top talent” swaps
AI Corner + lightning round: AI as tutor, summarizing docs, favorite tools and mottos
In the closing segment, Jason shares how he personally uses AI to learn quickly and extract signal from internal documents. The lightning round covers influential books, media, a favorite AI product, his core motto, and his surprisingly frequent McDonald’s orders.
- AI as a tutor via voice mode to keep up with fast-moving concepts
- Using AI to identify the most important points in internal documents
- Tool highlight: Veo 3 generating video from a photographed script page
- Motto: ‘The end is never the end’; plus books and personal favorites