No Priors Ep. 26 | With Weights & Biases CEO Lukas Biewald

How are ML developer tools helping to advance our capabilities? Lukas Biewald, CEO of Weights & Biases, joins Sarah Guo and Elad Gil this week on No Priors. Lukas explores the impact of ML in industries such as gaming, AgTech, and fintech. He discusses the impact of LLMs, puts them in the context of the evolution of ML engineering over the past decade and a half, and tells the backstory of Weights & Biases' success. He gives advice for aspiring AI company founders, placing emphasis on customer feedback and using insecurity as a vehicle for better customer discovery. Prior to founding Weights & Biases, Lukas attacked the problem of data collection for model training as the Founder of Figure Eight, which he sold in 2019. He holds an MS in Computer Science and a BS in Mathematics from Stanford University.

00:00 - Lukas Biewald's Journey in AI
08:16 - Startup Evolution and Machine Learning
18:54 - Open Source Models Implications and Adoption
29:54 - ML Impact in Various Industries
40:27 - Advice for AI Company Founders

Elad Gil (host), Lukas Biewald (guest), Sarah Guo (host)
Aug 3, 2023 | 43m

CHAPTERS

  1. 0:00 – 0:39

    Why Weights & Biases matters: tooling as the leverage point in ML

    The hosts set the stage for a conversation about Weights & Biases (W&B) as a dominant developer tool in modern ML workflows. They position Lukas Biewald as a repeat founder focused on solving practitioner pain—first in training data, now in experimentation and MLOps.

    • W&B framed as a core platform for ML developers
    • Lukas’s arc: Figure Eight (data) → W&B (experimentation/tooling)
    • Tooling as a force multiplier for ML teams
    • Context: used by major AI orgs and enterprises
  2. 0:39 – 2:51

    Stanford roots: learning (and thinking rigorously) with Daphne Koller

    Lukas recounts his early fascination with AI through games and his initial attempts to connect with Daphne Koller. He reflects on Daphne’s exceptional teaching and her intolerance for sloppy thinking, which shaped his approach beyond ML itself.

    • Early motivation: games (notably Go) as an AI challenge
    • Daphne initially rejects his games focus; later he takes her course
    • Bayes nets era and the “nothing really worked” feeling pre-deep learning
    • Enduring lesson: clarity of thought and rigorous reasoning
  3. 2:51 – 7:36

    From research frustration to applied ML at Yahoo: the training data revelation

    He describes feeling disillusioned by incremental research progress and pivots toward applied work at Yahoo. In production search ranking, he discovers that model performance often hinges less on algorithms and more on training data quality and collection processes.

    • Research felt like marginal gains and questionable significance
    • Chooses Yahoo over Google for applied, well-defined work
    • Early production ML: translating models into deployable C code
    • Key insight: data collection quality drives outcomes more than model reuse
  4. 7:36 – 12:38

    Founding Figure Eight/CrowdFlower: making data labeling iterative and visible

    Lukas and Sarah revisit the early data-labeling market, when few external solutions existed beyond Mechanical Turk. Lukas explains how operational, non-iterative labeling workflows created failure modes—and why ML teams needed more direct control and feedback loops.

    • Dolores Labs → CrowdFlower → Figure Eight evolution
    • Early market lacked scalable, high-quality labeling services
    • Waterfall-style specs vs iterative labeling guidance and QA
    • Goal: give ML practitioners visibility/control over labeling
  5. 12:38 – 13:08

    The long slog and market discontinuities: chasms, AV booms, and competitive lessons

    Lukas details how Figure Eight hit a growth ceiling because the ML buyer set was small, producing years of stagnation. Autonomous vehicles reignited demand, but competition intensified—culminating in Scale’s rise and Lukas’s decision to sell, freeing him to start W&B.

    • Early marquee customers, then “nowhere else to go” in ML adoption
    • No real ‘chasm’ to cross when the mainstream market didn’t exist
    • AV wave triggers renewed growth and labeling demand
    • Scale’s execution in self-driving data reshapes the competitive landscape
  6. 13:08 – 17:32

    What W&B is: from experiment tracking to reliable end-to-end ML workflows

    Lukas explains W&B’s expanding product surface, starting with experiment tracking and growing into data versioning/lineage, monitoring, and model registry. He ties the product philosophy to helping researchers and ML engineers ship reliable systems without excessive ops complexity.

    • Core: experiment tracking and performance over training time
    • Broader platform: data versioning/lineage, monitoring, model registry
    • Motivation: ML practitioners need simpler, purpose-built tooling
    • Bridging research and production as a persistent organizational reality
  7. 17:32 – 20:26

    LLMs change the workflow: why W&B built Prompts and leaned into LLMOps

    The conversation shifts to how LLMs disrupt traditional model-building workflows—many tasks can be solved by prompting rather than training. Lukas describes this as a potential existential threat to classic MLOps and why W&B rallied quickly to ship its Prompts suite.

    • LLMs replace many ‘train-a-model’ tasks with prompting workflows
    • Internal tension: stable business vs disruptive shift in user behavior
    • Company-wide pivot to support GPT-centric production realities
    • Infrastructure flexibility (from early design) enables rapid product moves
  8. 20:26 – 25:51

    Open source vs proprietary models: adoption patterns and the reality of production

    Sarah and Lukas explore how teams prototype with GPT and later consider open-source/fine-tuned models for cost and control. Lukas emphasizes hidden costs of self-hosting and notes how few organizations had even GPT-based LLM features truly in production at the time.

    • Common path: prototype with GPT → optimize cost/performance later
    • Constraints: limited fine-tuning options (especially GPT-4)
    • Self-hosting has underestimated operational and total-cost burdens
    • Tooling market saturation vs a relatively small number of production deployments
  9. 25:51 – 31:33

    Where ML is landing: pharma, games, agtech, and ‘every Fortune 500’

    Lukas highlights surprising and exciting areas of W&B adoption, especially pharma’s large-scale investment in ML for drug discovery and testing. He also describes breadth across industries—games, agriculture, fintech—and why W&B tends to appear after teams reach real production pain.

    • Pharma as a standout: major hiring indicates serious operationalization
    • Clinical trial timelines make success look “invisible” in backward-looking metrics
    • Examples: game personalization, cleaner farming, precision spraying
    • W&B adoption pattern: teams come after they’ve shipped and hit reliability issues
  10. 31:33 – 34:45

    Developer-first vs executive-first: how W&B avoided the ‘CIO-loved, engineer-hated’ trap

    Elad asks how W&B achieved broad grassroots adoption. Lukas contrasts developer-led product growth with earlier MLOps companies selling big enterprise deals, and explains how ML researchers and developers converged as deep learning forced more software complexity into research.

    • Avoiding multi-million-dollar, executive-driven sales dynamics
    • Developer-first ergonomics and workflow focus as differentiation
    • Deep learning + GPUs ‘broke the stack,’ forcing researchers into dev mode
    • DevOps-to-MLOps rebranding produces tools that fit everyday ML builders poorly
  11. 34:45 – 39:02

    Open vs closed source in ML tooling: telemetry, ergonomics, and trust

    Sarah probes how W&B overcame developers’ inclination to build tools themselves and prefer open source. Lukas argues W&B’s partially closed approach enabled stronger product ergonomics via real usage telemetry, while keeping key client-side components open to satisfy enterprise requirements.

    • Closed-source origin driven by building a sustainable business model
    • Telemetry advantage: observing clicks/options to improve UX rapidly
    • Pragmatic stance: open-source client + open components running on customer servers
    • Different personas: DevOps often demands open source more than ML researchers do
  12. 39:02 – 40:49

    Second-time founder lessons: pick the customer, think long-term, resist metric traps

    Elad asks what Lukas did differently the second time. Lukas stresses starting with a clear customer profile (and liking those customers), building confidence to prioritize long-term product quality, and resisting short-term ARR pressures that can harm compounding growth.

    • Start with a customer profile; customer love matters for founder stamina
    • Long-term product quality is harder to measure but more important
    • Short-term revenue pressure can lead to counterproductive tradeoffs
    • Founder maturity: more confidence applying lessons early
  13. 40:49 – 43:44

    Founder advice for AI companies: obsessive customer discovery and hard questions

    Lukas closes with advice he believes is even more true than founders realize: build something people want and spend real time with customers. He notes how difficult it is to get meaningful customer meetings early and encourages founders to show up prepared and probe for painful truths.

    • Make something people want—care more than you think you already do
    • Customer time is scarce; early meetings require ‘scrape and claw’ effort
    • Ask tough questions; don’t optimize for being liked
    • Leaning into insecurity can improve honest product discovery
