Episode 16: Building AI for Life Sciences

OpenAI · Apr 16, 2026 · 44m

Joy Jiao (guest), Yunyun Wang (guest)

- Life Sciences model series (biochemistry-focused)
- Genomics and protein mechanistic understanding
- Autonomous wet lab collaboration (Ginkgo Bioworks)
- Model orchestration, agents, and reproducible workflows
- Life sciences research plugin (50+ skills)
- Biorisk, information hazards, and differentiated access
- Compute scaling and test-time compute for scientific discovery

In this episode of the OpenAI podcast, Joy Jiao and Yunyun Wang explore OpenAI’s Life Sciences models: accelerating lab work and discovery while building safety safeguards.

OpenAI’s Life Sciences models: accelerating labs, discovery, and safety safeguards

OpenAI is launching a Life Sciences model series focused on biochemistry workflows, starting with genomics and protein understanding for early discovery use cases.

The team describes “model-in-the-loop” autonomous lab work (e.g., with Ginkgo Bioworks) showing models can design experiments that yield measurable results and speed iteration cycles.

Life-sciences deployment requires orchestration, reproducibility, and workflow templates (e.g., a plugin with 50+ repeatable skills) across products like ChatGPT and Codex.

Because biology is highly dual-use and intent is hard to infer from prompts, OpenAI emphasizes differentiated access, enterprise controls, and incremental deployment to manage biorisk.

They argue that scaling compute—both larger models and test-time “thinking time”—is central to enabling harder scientific reasoning, with a long-term vision of autonomous, robot-run labs guided by humans.

Key Takeaways

Life-sciences AI must be built around real research workflows, not just chat.

They frame the Life Sciences model series as anchored on complex “long trajectory” research tasks—literature synthesis, pathway analysis, target selection—and delivered through product surfaces (ChatGPT, Codex) plus workflow templates for repeatability.

Tool use already makes models behave like competent computational biologists.

Joy highlights that models can call established tools (e.g., …)

Autonomous labs shift the bottleneck from human hands to compute.

The Ginkgo collaboration is used to illustrate a future where parallelized agent workflows and robotic execution reduce “human throughput” constraints (pipetting, protocol translation), making compute and orchestration the limiting factors.

Biosecurity is hard because benign and harmful steps look similar.

They emphasize that prompts like “help me clone a gene” are ambiguous (GFP vs. …)

Differentiated access is positioned as the key to unlocking stronger capabilities safely.

For verified researchers and regulated institutions (tracked reagents, controlled datasets, enterprise security), they argue more capable assistance can be provided than for anonymous general users, where misuse risk is harder to bound.

Evals in biology need creativity and grounding in messy reality.

They describe evaluation via experimental ground truth (predict outcomes of known experiments), synthetic datasets with planted biases/QC traps, and ultimately wet-lab validation as the “final” arbiter.

Test-time compute is a strategic lever for scientific breakthroughs.

Beyond scaling model size, they stress scalable inference-time “thinking” (potentially very long runs) as enabling new difficulty levels—captured in their internal motto: scaling test-time compute toward curing disease.

Notable Quotes

We’re really excited to build and deploy the Life Sciences model series.

Yunyun Wang

One of the taglines was to scale test time compute to cure all disease.

Joy Jiao

The precursor steps… look very benign, and it’s really hard to distinguish between.

Yunyun Wang

The safest model here would be a model that just had no capability… and it’s not very good, but it’s very safe.

Joy Jiao

Nothing in biology is really real until you can prove it in the real world.

Joy Jiao

Questions Answered in This Episode

What specifically makes the Life Sciences model series “biochemistry-focused” compared to general GPT models—data, architecture, toolchain, or evals?

OpenAI is launching a Life Sciences model series focused on biochemistry workflows, starting with genomics and protein understanding for early discovery use cases.

Can you describe a concrete example of one of the 50+ plugin “skills” end-to-end (inputs, tools called, outputs, and how reproducibility is ensured)?

The team describes “model-in-the-loop” autonomous lab work (e.g., …)

In the Ginkgo collaboration, what was the target outcome (protein, enzyme activity, yield), and how did model-suggested iterations compare to human-designed baselines?

Life-sciences deployment requires orchestration, reproducibility, and workflow templates (e.g., …)

How do you operationalize “differentiated access” in practice—identity verification, audit logs, scope limits, and escalation when a request becomes ambiguous?

Because biology is highly dual-use and intent is hard to infer from prompts, OpenAI emphasizes differentiated access, enterprise controls, and incremental deployment to manage biorisk.

Where do refusals most frustrate legitimate scientists today (e.g., cloning protocols, pathogen-adjacent assays), and what changes would reduce false positives without increasing risk?

They argue that scaling compute—both larger models and test-time “thinking time”—is central to enabling harder scientific reasoning, with a long-term vision of autonomous, robot-run labs guided by humans.

Transcript Preview

Speaker

Hello, I'm Andrew Main, and this is the OpenAI Podcast. On today's episode, we're talking with Research Lead Joy Jiao and Product Lead Yunyun Wang about OpenAI for Life Sciences. We'll explore what new models are making possible in biology and medicine, and what it takes to deploy the most advanced capabilities responsibly.

Joy Jiao

This allows it to kind of reach new levels of difficulty and discovery that we didn't think was even possible before.

Yunyun Wang

Putting like really capable expert level knowledge in the hands of a greater amount of people.

Joy Jiao

One of the taglines was to scale test time compute to cure all disease.

Speaker

Yes.

Joy Jiao

So that is like [chuckles] our team tagline.

Speaker

We started off with just a basic API, and then we had ChatGPT, which is more conversational, was really good for text, as code became a capability, went through basically code models and then Codex. Now that you're getting more scientists in the life sciences working on these systems, does that mean things have to evolve to help with the way researchers might work with these tools?

Yunyun Wang

Yeah. We're really excited to, uh, build and deploy the Life Sciences model series. So this is a new biochemistry focused model series that's really anchored on these very complex life science research workflows, and we're focused on adding, um, new like mechanistic understanding, uh, starting with genomics understanding and protein understanding, and really focused on early discovery use cases because we feel like that's like one of the core bottlenecks that, uh, greater thinking time, greater compute, and really like leveraging like more capable AI models can help, um, meaningfully scale some of these like research barriers. And I think there's also like a, a model orchestration piece of actually how to embed this into workflows, and it's been really great, uh, first off, w- having all these different product surfaces to deploy to. We're seeing a lot of really great like literature synthesis workflows happening on, uh, ChatGPT, and, uh, these models really push the frontier of like long trajectory agentic workflows, and we're really able to empower that on, uh, Codex. And more on the model orchestration piece is that I think for enterprise use cases, there's like this reproducibility and repeatability element, and we are trying to overcome this by working on like some of the life sciences research plugins that we're shipping for very specific translational, uh, bio users. 
So the life sciences research plugin has over 50 skills, which are essentially templatized repeatable workflows that, um, if you need to whether do some sort of cross evidence match and search across various different papers or do, uh, pathway analysis, something that's like, uh, repeatable that you often do, you can have like almost like a one click deploy option by using our life sciences plugins on top, and that's also how we're kind of seeing, uh, the balance between, uh, scaling for very specialized, uh, purposes, uh, something we're hoping to get into is maybe clinical purposes, but also make it still very general use for all foundational biology.
