No Priors

No Priors Ep. 17 | With Karan Singhal

Sarah Guo and Karan Singhal on how Google's Med-PaLM 2 aims to safely transform healthcare with AI.

Sarah Guo (host) · Karan Singhal (guest) · Elad Gil (host)
May 18, 2023 · 42m
Karan Singhal's path into socially responsible AI and medical AI
From PaLM to PaLM 2 and how Med-PaLM is specialized
Techniques for domain alignment: instruction tuning, prompt tuning, and fine-tuning
Evaluation of medical LLMs: benchmarks, human evaluation, and safety
Regulation, HIPAA, privacy, and federated learning in healthcare AI
AI as physician assistant, documentation tool, and scientific research aid
Medical AI as a laboratory for alignment, hallucination reduction, and scalable oversight

In this episode of No Priors, hosts Sarah Guo and Elad Gil speak with Karan Singhal, a lead researcher on Google's Med-PaLM 2, about how large language models are being adapted for high-stakes medical use, with a focus on safety, accuracy, and responsible deployment. He traces the evolution from PaLM to PaLM 2 and Med-PaLM, detailing methods like instruction prompt tuning and improved pretraining objectives. The discussion digs into evaluation challenges, regulatory and privacy barriers, and why medical workflows can be a powerful testbed for AI alignment and safety research. Looking ahead, Singhal expects LLMs to raise the global standard of care, augment clinicians and scientists, and help solve key scientific and medical problems, provided society can set realistic, benefit-aware safety bars.

At a glance

WHAT IT’S REALLY ABOUT

Google’s Med-PaLM 2 Aims to Safely Transform Healthcare With AI

  1. Google's Med-PaLM 2 adapts large language models to high-stakes medical question answering, pairing methods like instruction prompt tuning with rigorous automated and human evaluation, with the goal of raising the global standard of care safely and responsibly.

IDEAS WORTH REMEMBERING

7 ideas

Start from strong general models, then cheaply align to the medical domain.

Med-PaLM builds on PaLM/PaLM 2’s general reasoning and knowledge, then uses data-efficient techniques like instruction prompt tuning with doctor-written examples to adapt behavior for long-form medical QA without costly full retraining.
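To make the idea concrete, here is a minimal PyTorch sketch of prompt tuning against a frozen base model: only a short block of learned "soft prompt" embeddings is trained on the small set of expert-written examples. The architecture, sizes, and names are illustrative, not Med-PaLM's actual setup.

```python
# Minimal prompt-tuning sketch: the pretrained model is frozen and only
# a short sequence of soft prompt vectors is learned. Illustrative only.
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    def __init__(self, base_model: nn.Module, embed_dim: int, prompt_len: int = 20):
        super().__init__()
        self.base_model = base_model
        for p in self.base_model.parameters():
            p.requires_grad = False  # freeze the pretrained model entirely
        # The only trainable parameters: prompt_len soft prompt vectors.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the learned soft prompt to every example in the batch.
        batch = input_embeds.shape[0]
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.base_model(torch.cat([prompt, input_embeds], dim=1))

# Toy usage: a small frozen encoder stands in for the pretrained LLM.
base = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
model = SoftPromptWrapper(base, embed_dim=64)
out = model(torch.randn(2, 10, 64))  # (batch=2, seq=10, dim=64)
print(out.shape)                     # torch.Size([2, 30, 64]): 20 prompt + 10 input
```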

Choose tuning method based on how much labeled domain data you actually have.

If you have only a handful of examples, use prompting; with tens of examples, consider prompt tuning; and once you have 100+ high-quality labeled examples, full fine-tuning tends to deliver the best performance relative to cost.
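As a rough illustration, the episode's rule of thumb might be encoded like this; the thresholds are approximate guidance, not hard cutoffs:

```python
def pick_adaptation_strategy(num_labeled_examples: int) -> str:
    """Rule of thumb from the episode; thresholds are approximate."""
    if num_labeled_examples < 10:
        return "few-shot prompting"   # a handful of examples
    if num_labeled_examples < 100:
        return "prompt tuning"        # tens of examples
    return "full fine-tuning"         # 100+ high-quality examples
```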

Robust evaluation in medicine must go beyond benchmarks to human review in real workflows.

Existing medical AI work often relied on limited multiple‑choice benchmarks; Med-PaLM emphasized systematic automated evaluation plus detailed human evaluation by clinicians and laypeople, but Singhal stresses the need for workflow‑embedded, outcomes‑aware studies.
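A toy sketch of the human-evaluation side, assuming clinicians score each long-form answer along a few axes; the axis names are illustrative, loosely inspired by the evaluation dimensions described in the Med-PaLM work:

```python
# Aggregate per-axis clinician ratings (1-5) across a set of rated answers.
from collections import defaultdict
from statistics import mean

AXES = ["factuality", "evidence_of_harm", "bias", "completeness"]  # illustrative

def aggregate_ratings(ratings: list[dict[str, int]]) -> dict[str, float]:
    """Average clinician scores per axis across all rated answers."""
    by_axis: dict[str, list[int]] = defaultdict(list)
    for rating in ratings:
        for axis in AXES:
            by_axis[axis].append(rating[axis])
    return {axis: mean(scores) for axis, scores in by_axis.items()}
```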

The bar for safety should be “better than the realistic status quo,” not perfection.

While stakes are high in medicine, patients and physicians already rely on imperfect web searches and time‑pressed clinicians; indecision or inaction is also a decision, so policy must weigh both risks and missed benefits of withholding capable systems.

Near-term impact will come from assistant and back-office roles, then higher-stakes support.

LLMs are already being piloted for documentation and billing; Singhal expects next waves in radiology assistance, report QA, telemedicine augmentation, and clinical decision support—initially as physician co-pilots, not replacements.

Medical AI exposes core alignment problems early, like scalable oversight and hallucinations.

As models like Med-PaLM 2 approach physician-level performance, even experts struggle to judge answers, making simple RLHF less reliable; this forces exploration of techniques like self-critique, debate, and AI‑assisted oversight that matter for broader AI safety.
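For a sense of what self-critique looks like in practice, here is a minimal sketch of a draft-critique-revise loop. The `generate` callable stands in for any text-in, text-out LLM call supplied by the caller; it is a hypothetical placeholder, not a real Med-PaLM API:

```python
from typing import Callable

def answer_with_self_critique(
    question: str,
    generate: Callable[[str], str],  # any LLM client: prompt in, text out
    rounds: int = 2,
) -> str:
    """Draft, critique, revise: a toy scalable-oversight pattern."""
    draft = generate(f"Answer this medical question carefully:\n{question}")
    for _ in range(rounds):
        # Ask the model to find problems in its own draft.
        critique = generate(
            "List factual errors, unsupported claims, or unsafe advice in:\n"
            f"Q: {question}\nA: {draft}"
        )
        # Ask the model to revise the draft against that critique.
        draft = generate(
            "Revise the answer to address the critique.\n"
            f"Q: {question}\nA: {draft}\nCritique: {critique}"
        )
    return draft
```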

Grounding in authoritative sources and privacy-preserving infrastructure will be essential for trust.

To overcome institutional risk aversion, Singhal highlights tools that can query and cite trusted medical sources (e.g., Mayo Clinic) and architectures like automatic redaction or trusted execution environments that reduce exposure of sensitive patient data.
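Here is a minimal sketch of the automatic-redaction idea, using a few illustrative regex patterns to strip obvious identifiers before text ever reaches a model. Real de-identification pipelines (e.g., HIPAA Safe Harbor) cover far more identifier types than these:

```python
# Toy PHI redaction: replace a few obvious identifier patterns with labels.
import re

PHI_PATTERNS = {
    "mrn":   re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),  # medical record number
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "date":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

def redact(note: str) -> str:
    """Replace each matched identifier with a bracketed label."""
    for label, pattern in PHI_PATTERNS.items():
        note = pattern.sub(f"[{label.upper()}]", note)
    return note

print(redact("Pt MRN: 483920, seen 03/14/2023, cb 555-867-5309."))
# -> "Pt [MRN], seen [DATE], cb [PHONE]."
```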

WORDS WORTH SAVING

5 quotes

Indecision is a decision as well.

Karan Singhal

The ideal thing would be to have access to all of the data, but in a privacy-preserving way.

Karan Singhal

It’s hard to tell the difference between models and physicians.

Karan Singhal

There is a chance to go for the jugular here in terms of health information.

Karan Singhal

In the long term when things go really well with AI, it’s going to be because we’ve solved a lot of the most pressing scientific problems of today.

Karan Singhal

QUESTIONS ANSWERED IN THIS EPISODE

5 questions

What specific real-world clinical trials or deployments would best demonstrate that Med-PaLM 2 meaningfully improves patient outcomes and safety over the current standard of care?

How should regulators and health systems define an acceptable risk–benefit threshold for medical LLMs, given that human clinicians and web search are far from perfect?

What concrete mechanisms could give patients more control over how their health data trains or interacts with large models, while still enabling powerful AI capabilities?

Once AI systems surpass human clinicians in some diagnostic tasks, who should be legally and ethically accountable when the model’s recommendation conflicts with a doctor’s judgment?

How can the medical community ensure that improvements from AI—like better diagnostics or decision support—translate into reduced health inequities rather than amplifying them?
