
Building AI for better healthcare — the OpenAI Podcast Ep. 14

Healthcare systems around the world are under strain, and both patients and clinicians are feeling the impact. OpenAI's Head of Health, Dr. Nate Gross, and Karan Singhal, who leads Health AI Research, discuss how AI can help address the field's biggest challenges. They cover how OpenAI trains models, in collaboration with physicians, to handle sensitive health questions, and how that foundation is unlocking a new generation of tools for patients, clinicians, and healthcare systems.

Chapters:
00:00:38 – Origins of Nate and Karan’s interest in AI and healthcare
00:05:01 – Strategy for building AI tools for clinicians
00:06:57 – How AI models are trained for health use cases
00:10:15 – How OpenAI is able to score well on health evals
00:14:21 – Key challenges deploying AI in healthcare
00:21:05 – Collaboration with hospitals and healthcare systems
00:23:05 – Practical everyday uses of AI health assistants
00:26:43 – Biggest “wow” moment during development
00:28:46 – Feedback from clinicians and early users

Andrew Mayne (host) · Karan Singhal (guest) · Dr. Nate Gross (guest)
Mar 16, 2026 · 30 min

CHAPTERS

  1. Why OpenAI is investing in healthcare AI: access, time, and better outcomes

    Andrew Mayne sets the stage with OpenAI’s healthcare focus: training models that can handle sensitive, high-stakes questions and support clinicians, patients, and systems. Nate and Karan frame healthcare as a domain where safety, privacy, and practical usefulness must be built in from the start.

    • Healthcare is a high-stakes domain requiring extra rigor in safety and privacy
    • The goal is to help patients, clinicians, and healthcare systems, not just to answer consumer questions
    • Healthcare’s fragmentation and gaps in care make it a prime area for AI-driven impact
  2. Nate Gross’s path: from health policy to frustration with clinical IT

    Nate describes being drawn to healthcare through health policy and value-based care, then training in a public-hospital environment. He contrasts modern consumer tech with the outdated tools clinicians relied on, motivating his drive to modernize access and workflows.

    • Early interest in healthcare came through health policy and access
    • Medical training highlighted real-world constraints and inefficiencies
    • The gap between consumer tech (smartphones, apps) and clinical tools (fax, paper, early EHRs) shaped his motivation
  3. Karan Singhal’s motivation: AGI, safety, and the healthcare opportunity gap

    Karan explains his early interest in intelligence and the philosophy of mind, leading to AI research and a conviction that advanced AI would arrive within our lifetimes. He connects his safety/privacy background to healthcare, arguing the clinical world underestimated how impactful large language models could be.

    • Longstanding interest in machine intelligence evolved into applied AI work
    • Safety and privacy research became a foundation for healthcare applications
    • Healthcare presented a large, underappreciated opportunity for LLMs with major upside—and real risks
  4. Product strategy: ChatGPT Health as a secure, context-aware health companion

    Nate explains that consumer demand for health help already exists at scale: a large share of ChatGPT queries are health-related. The strategy emphasizes both security (strong privacy protections) and empowerment (bringing the user's own context into conversations so each one is not an “amnesiac” one-off search).

    • Health queries are a major and growing share of overall ChatGPT usage
    • ChatGPT Health aims to make conversations secure (extra protections; no training on user health data)
    • Context matters in health—tools should adapt to the individual and persist consented context over time
    • Move beyond generic search toward personalized, grounded guidance
  5. How health models are trained: start with evaluation, not hype

    Karan explains that healthcare work began with safety/alignment motivations and an evaluation-first approach. OpenAI built HealthBench around realistic multi-turn conversations, co-designed with hundreds of physicians, to measure both usefulness and safety in situations that resemble real use.

    • Healthcare training effort is grounded in safety and alignment from the beginning
    • Evaluation is treated as the foundation for training and improvement
    • HealthBench uses realistic, multi-turn conversations (not just exam-style questions)
    • ~250 physicians helped design and generate evaluation data over ~1 year
  6. Inside HealthBench: context-seeking, audience adaptation, and multifaceted scoring

    HealthBench measures whether models ask for missing context, tailor responses to different users, and behave safely under uncertainty. Karan gives the example of ambiguous symptoms (“it burns”) where the safest move is to ask clarifying questions rather than overconfidently guess.

    • Key behavior: ask for context before answering when information is insufficient
    • Adaptive communication: respond differently to laypeople vs clinicians
    • Safety and helpfulness are evaluated across many dimensions, not a single score
    • Ambiguity handling is central to reducing risk in real-world health chats
  7. Why OpenAI models score strongly on health evals: health integrated across the training stack

    Karan attributes strong benchmark performance to a cross-functional effort that spans pre-training through post-training, plus pre-deployment evaluation and production monitoring. Nate adds that clinically meaningful training focuses on knowing when to escalate, adapting to the user's health literacy, and handling uncertainty, rather than on multiple-choice test performance.

    • Health is incorporated across the major phases of model development (pre-, mid-, and post-training)
    • Cross-functional approach: evals, physician collaboration, and production monitoring
    • Clinical realism matters more than medical exam performance
    • Emphasis on escalation pathways, uncertainty calibration, and literacy-appropriate guidance
  8. Deployment challenges in real healthcare: trust, grounding, and siloed systems

    The discussion shifts to blockers for real-world adoption: clinicians need trustworthy, up-to-date, locally appropriate answers. Nate highlights the need to ground outputs in guidelines, literature, and institutional practices, while also connecting fragmented systems and data formats across organizations.

    • Clinician trust depends on grounding in current literature, guidelines, and institutional/regional policies
    • Different care settings vary in resources and practice patterns—answers must be situational
    • Healthcare data and tools are highly siloed (analog/digital, structured/unstructured, decentralized)
    • AI must integrate into secure environments (e.g., HIPAA-aligned use) while remaining useful
  9. Collaboration with hospitals, government, and device ecosystems: making patient context portable

    Nate explains that partnerships and standards are essential so patients can consent to bring records into ChatGPT in just a few taps. He points to EHR interoperability efforts (including government standards) and integrations with consumer devices, wearables, and biosensors to create richer, more actionable context.

    • Interoperability and standards enable easy, consented record syncing
    • Working across EHRs, government bodies, and health systems to make context portable
    • Wearables and biosensors add continuous data that can improve personalization
    • Combining data sources can unlock insights that individual apps alone may not provide
  10. Everyday AI health assistant use: from wearables to dinner plans and daily pacing

    Andrew and Nate discuss practical examples where health context improves daily decisions—menu planning, activity goals, scheduling, and stress/sleep-informed planning. Nate emphasizes ChatGPT as a unifying layer that complements partner technologies, extending health insights into many daily workflows.

    • Use cases: activity-based suggestions, menu planning, ordering guidance, and day planning
    • Using sleep and stress signals to prioritize tasks and reduce overload
    • ChatGPT complements specialized partner apps rather than replacing them
    • Goal: reduce friction in following care plans and managing health outside clinic visits
  11. Clinician workflow support: “raise the floor, sweep the floor, raise the ceiling”

    Nate introduces a three-part framework: broaden access to AI benefits, reduce administrative burden so clinicians regain time with patients, and enable new frontiers of medical capability. The emphasis is on augmenting care teams and improving continuity rather than replacing clinicians.

    • Raise the floor: expand access for patients and professionals
    • Sweep the floor: reduce administrative and bureaucratic load on clinicians
    • Raise the ceiling: enable new capabilities that accelerate medicine’s progress
    • AI as a time-multiplier and safety net for care delivery
  12. Wow moments: explosive adoption and new frontiers like repurposed medications

    Karan’s “wow” is the rapid growth of health and wellness usage even before a dedicated product launch. Nate highlights the shift from interesting demos to useful, potentially transformative work—such as scaling experiments and identifying new value for shelved medications.

    • Health is among the fastest-growing categories of ChatGPT usage
    • Adoption creates feedback loops that improve models and product direction
    • Longer context and stronger capabilities may enable new discoveries and predictions
    • Potential to repurpose existing medications and accelerate experimentation
  13. Clinician feedback from the field: Nairobi co-pilot study and the safety-net effect

    Karan describes deploying an AI clinical co-pilot with Penda Health clinics in Nairobi, designed to monitor EHR entries and interrupt only when something looks concerning. The study found a statistically significant reduction in diagnostic and treatment errors, and clinicians became reluctant to run future trials that would deny AI to a control group.

    • Workflow design: monitor clinician documentation and intervene selectively to reduce disruption
    • Measured outcome: statistically significant reduction in diagnostic/treatment errors
    • Active change management: close collaboration with clinicians to fit real workflows
    • Perception shift: AI seen as protective enough that withholding it felt unsafe
  14. Early-user stories: caregiver support, clinician relief, and rare “miracle” cases

    Nate shares the most meaningful feedback: caregivers under strain, overwhelmed clinicians, and occasional cases where AI helps accelerate an elusive diagnosis or critical decision by surfacing missing context. The episode closes on AI as an amplifier that helps clinicians do more for patients.

    • Caregivers use AI to navigate complex health responsibilities
    • Clinicians benefit from lighter workloads and from expertise extended beyond what they could cover alone
    • Rare but growing examples of accelerated diagnosis or emergency support
    • AI positioned as an amplifier and collaborator, not a replacement
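The multifaceted scoring summarized in chapter 6 can be illustrated with a small rubric-scoring sketch. This is not OpenAI's HealthBench grader. It assumes, for illustration only, that each conversation is graded against physician-written criteria that carry positive points (desired behavior, such as asking for missing context) or negative points (harmful behavior, such as overconfident guessing), and that a response's score is the points it earns divided by the maximum achievable positive points, clipped to [0, 1]. The `Criterion` type, `rubric_score` function, example criteria, and `met` flags (which a real system would need a separate grader to decide) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """Hypothetical rubric item; negative points penalize unsafe behavior."""
    description: str
    points: int
    met: bool  # assumed to be decided by a separate grader, not computed here


def rubric_score(criteria: list[Criterion]) -> float:
    """Earned points divided by maximum positive points, clipped to [0, 1]."""
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positively weighted criterion")
    earned = sum(c.points for c in criteria if c.met)
    return min(1.0, max(0.0, earned / max_points))


# Hypothetical rubric for an ambiguous "it burns" message.
rubric = [
    Criterion("Asks where and when the burning occurs", 5, met=True),
    Criterion("Asks about red-flag symptoms such as fever", 3, met=True),
    Criterion("Explains when to seek in-person care", 4, met=False),
    Criterion("States a single diagnosis with unwarranted certainty", -6, met=False),
]

print(rubric_score(rubric))  # 8 of 12 possible points earned
```

Because penalties subtract from the same total, a response that asks good questions but also guesses overconfidently scores lower than one that only asks, which is one way a single number can still reflect both helpfulness and safety.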
