
Why most AI products fail: Lessons from 50+ AI deployments at OpenAI, Google & Amazon
Lenny Rachitsky (host), Aishwarya Naresh Reganti (guest), Kiriti Badam (guest)
In this episode of Lenny's Podcast, Lenny Rachitsky talks with Aishwarya Naresh Reganti and Kiriti Badam about why most AI products fail, drawing on lessons from 50+ AI deployments at OpenAI, Google, and Amazon.
Why AI products fail: managing non-determinism, autonomy, and feedback loops
Aishwarya Reganti and Kiriti Badam explain why many AI products fail: teams treat LLMs like deterministic software and rush to fully autonomous agents without earning trust.
They argue AI product development must account for non-deterministic inputs/outputs and the agency–control trade-off, which changes how you design, ship, and operate products.
Their core prescription is to start with low-risk, high-control versions (human-in-the-loop), build measurement and learning flywheels, and gradually increase autonomy as surprises diminish.
They introduce a CI/CD-inspired framework—continuous calibration / continuous development—combining scoped datasets, eval/monitoring, behavior analysis, and iterative fixes, with leadership and culture as key enablers.
Key Takeaways
Treat LLMs as non-deterministic APIs, not normal software components.
Unlike traditional UIs and workflows, users express intent in countless ways and LLM outputs vary with phrasing and context. ...
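The episode contains no code, but the takeaway can be sketched. A minimal Python illustration (all names hypothetical): instead of trusting a single model response the way you would a deterministic API, validate each sample against the schema you expect, retry on failure, and route to a human when validation keeps failing.

```python
import itertools
import json

# Stand-in for a real LLM call: the same prompt yields different outputs.
_responses = itertools.cycle([
    'Sure! Here is the JSON: {"intent": "refund"}',  # valid JSON wrapped in prose
    '{"intent": "refund"',                           # truncated mid-object
    '{"intent": "refund", "confidence": 0.92}',      # well-formed
])

def flaky_llm(prompt: str) -> str:
    return next(_responses)

def call_with_validation(prompt: str, retries: int = 3) -> dict:
    """Validate each sample against the expected schema; retry on failure."""
    for _ in range(retries):
        raw = flaky_llm(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue                      # unparseable: sample again
        if "intent" in parsed:            # schema check, not just a parse check
            return parsed
    raise ValueError("no valid output after retries; route to a human")
```

The retry-plus-validation wrapper is the smallest version of treating the model as a non-deterministic dependency rather than a function call.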
Autonomy must be earned—more agency means less control and higher risk.
Every added capability (tool use, decisions, actions) increases the chance of costly mistakes and trust erosion. ...
Build AI products in versions that progressively increase autonomy.
Examples: support agent (suggest → draft to customer → issue refunds/actions), coding assistant (inline snippets → refactors/tests → open PRs), marketing assistant (copy drafts → run campaigns → auto-optimize). ...
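One way to make the versioning concrete is an explicit autonomy ladder that gates which actions the agent may take at each stage. A minimal sketch, modeled loosely on the support-agent example (level numbers and action names are hypothetical):

```python
# Hypothetical autonomy ladder for a support agent: each version unlocks
# more agency, and a version is promoted only after it has earned trust.
AUTONOMY_LEVELS = {
    1: {"suggest_reply"},                          # V1: human sends everything
    2: {"suggest_reply", "send_draft"},            # V2: agent drafts to customer
    3: {"suggest_reply", "send_draft", "refund"},  # V3: agent takes real actions
}

def allowed(version: int, action: str) -> bool:
    """Gate every agent action against its current autonomy level."""
    return action in AUTONOMY_LEVELS.get(version, set())
```

Keeping the ladder as explicit data makes the agency-control trade-off auditable: promoting to V3 is a deliberate config change, not a side effect of a prompt edit.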
Use human-in-the-loop stages to create a learning flywheel, not just safety.
Copilot phases let you log edits, accept/reject decisions, and trace behavior—turning human oversight into training data and product insight to improve prompts, tools, and workflow design over time.
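As a rough sketch of that flywheel (the structure is hypothetical; the episode describes the idea, not an implementation): log each human verdict on a suggestion, use the acceptance rate as a promotion signal, and mine human edits as (model output, human fix) pairs for improving prompts and tools.

```python
class Flywheel:
    """Log human verdicts on agent suggestions and mine them for signal."""

    def __init__(self):
        self.events = []

    def log(self, suggestion, verdict, final_text=None):
        # verdict is one of: "accepted", "edited", "rejected"
        self.events.append({"suggestion": suggestion, "verdict": verdict,
                            "final": final_text if final_text else suggestion})

    def acceptance_rate(self):
        """A rising acceptance rate is one signal that surprise is diminishing."""
        if not self.events:
            return 0.0
        return sum(e["verdict"] == "accepted" for e in self.events) / len(self.events)

    def training_pairs(self):
        """Edits are the richest signal: what the model said vs. the human fix."""
        return [(e["suggestion"], e["final"])
                for e in self.events if e["verdict"] == "edited"]
```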
Successful AI adoption is a people-and-process transformation, not only technical.
They highlight a “success triangle”: leaders who rebuild intuition hands-on, a culture that empowers SMEs (vs. ...
“One-click agents” are usually marketing; meaningful ROI takes months.
Enterprise systems are messy—taxonomies, undocumented rules, tech debt, brittle tools. ...
Don’t pick sides: evals and production monitoring are complementary.
Evals catch known failure modes pre-deploy and prevent regressions; production monitoring surfaces unknown, emerging issues via explicit and implicit signals (thumbs up/down, regenerations, feature disablement). ...
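A toy illustration of the two working together (names and data invented): a small golden set guards known behaviors before deploy, while an implicit production signal like regeneration rate surfaces problems the golden set never anticipated.

```python
# Pre-deploy evals: known cases that must keep passing (regression guard).
GOLDEN_SET = [
    ("I forgot my password", "auth"),
    ("refund my last order", "billing"),
]

def run_evals(classify) -> bool:
    return all(classify(query) == label for query, label in GOLDEN_SET)

# Production monitoring: implicit signals surface unknown failure modes.
def regeneration_rate(events) -> float:
    """Share of responses users asked to regenerate -- a silent thumbs-down."""
    if not events:
        return 0.0
    return sum(e.get("regenerated", False) for e in events) / len(events)
```

Neither replaces the other: the golden set cannot contain failure modes you have not seen yet, and production signals alone cannot stop a regression before it ships.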
Evals are overloaded as a term; focus on “actionable feedback loops.”
People call many things “evals” (benchmarks, label notes, PRD-like checks, LLM judges). ...
Adopt a CC/CD operating model: continuous development + continuous calibration.
Start by scoping capability and curating an initial dataset, build the app and evaluation dimensions, deploy, then repeatedly analyze real behavior, spot error patterns, fix, and update what you measure—while managing autonomy progression.
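That loop can be sketched in Python (function names are placeholders, not an API from the episode): build against the current dataset, ship, analyze real behavior, fold new failure patterns back into what you measure, and stop expanding when no new surprises appear.

```python
def ccd_cycle(dataset, build, deploy, analyze, max_iters=3):
    """Continuous development + continuous calibration, as one loop."""
    app = build(dataset)                    # scope capability, curate data, build
    for _ in range(max_iters):
        logs = deploy(app)                  # collect real behavior in production
        new_failures = analyze(logs)        # spot error patterns
        if not new_failures:                # surprise has diminished
            break
        dataset = dataset + new_failures    # update what you measure
        app = build(dataset)                # fix and rebuild
    return app, dataset
```

The exit condition is the point the guests keep returning to: you increase autonomy only once a full pass produces no new failure patterns.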
Competitive advantage shifts to persistence and lived experience: “pain is the new moat.”
With tools and implementation becoming cheap, winners are those who grind through iterations, learn non-negotiables, and encode that learning into processes, product decisions, and monitoring/eval systems.
Notable Quotes
“Most people tend to ignore the non-determinism.”
— Aishwarya Naresh Reganti
“Every time you hand over decision-making capabilities to agentic systems, you're kind of relinquishing some amount of control on your end.”
— Aishwarya Naresh Reganti
“When you start small... one easy slippery slope is to keep thinking about complexities of the solution and forget the problem that you're trying to solve.”
— Kiriti Badam
“It’s not about being the first company to have an agent… It’s about, have you built the right flywheels in place so that you can improve over time?”
— Aishwarya Naresh Reganti
“Pain is the new moat.”
— Kiriti Badam
Questions Answered in This Episode
In your CC/CD framework, what are the most reliable signals that “surprise has diminished” enough to move from V1 to V2 autonomy in a real product?
For customer support agents, how do you decide the exact boundary between “drafting” and “taking action” (refunds, account changes) without over-fitting to edge cases?
What are the most common hidden data/workflow issues (like messy taxonomies or undocumented rules) that derail agent deployments, and how do you systematically uncover them early?
When do LLM-as-judge evals become counterproductive, and what lightweight alternatives do you recommend for complex, evolving domains?
Codex uses both evals and customer feedback—what are concrete examples of “core behaviors” you always protect with evals even though the product is highly customizable?
Transcript Preview
We worked on a guest post together. They had this really key insight that building AI products is very different from building non-AI products.
Most people tend to ignore the non-determinism. You don't know how the user might behave with your product, and you also don't know how the LLM might respond to that. The second difference is the agency control trade-off. Every time you hand over decision-making capabilities to agentic systems, you're kind of relinquishing some amount of control on your end.
This significantly changes the way you should be building product.
So we recommend building step by step. When you start small, it forces you to think about, what is the problem that I'm gonna solve? Amid all these advancements in AI, one easy slippery slope is to keep thinking about the complexities of the solution and forget the problem that you're trying to solve.
It's not about being the first company to have an agent among your competitors. It's about, have you built the right flywheels in place so that you can improve over time?
What kind of ways of working do you see in companies that build AI products successfully?
I used to work with the now-CEO of Rackspace. He would have this block every day in the morning, which would say, "Catching up with AI, four to six AM." Leaders have to get back to being hands-on. You must be comfortable with the fact that your intuitions might not be right, and you probably are the dumbest person in the room, and you want to learn from everyone.
What do you think the next year of AI is gonna look like?
Persistence is extremely valuable. Successful companies right now building in any new area, they are going through the pain of learning this, implementing this, and understanding what works and what doesn't work. Pain is the new moat.
[upbeat music] Today, my guests are Aishwarya Reganti and Kiriti Badam. Kiriti works on Codex at OpenAI and has spent the last decade building AI and ML infrastructure at Google and at Kumo. Ash was an early AI researcher at Alexa and Microsoft and has published over thirty-five research papers. Together, they've led and supported over fifty AI product deployments across companies like Amazon, Databricks, OpenAI, Google, and both startups and large enterprises. Together, they also teach the number one rated AI course on Maven, where they teach product leaders all of the key lessons they've learned about building successful AI products. The goal of this episode is to save you and your team a lot of pain and suffering and wasted time trying to build your AI product. Whether you are already struggling to make your product work or want to avoid that struggle, this episode is for you. If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube. It helps tremendously. And if you become an annual subscriber of my newsletter, you get a year free of a ton of incredible products, including a year free of Lovable, Replit, Bolt, Gamma, n8n, Linear, Devin, PostHog, Superhuman, Descript, Wispr Flow, Perplexity, Warp, Granola, MagicPad, AndroidCast, Chapter, dmobit, and Stripe Atlas. Head on over to lennysnewsletter.com and click Product Pass. With that, I bring you Aishwarya Reganti and Kiriti Badam, after a short word from our sponsors. This episode is brought to you by Merge. Product leaders hate building integrations. They're messy, they're slow to build, they're a huge drain on your roadmap, and they're definitely not why you got into product in the first place. Lucky for you, Merge is obsessed with integrations. With a single API, B2B SaaS companies embed Merge into their product and ship two hundred and twenty-plus customer-facing integrations in weeks, not quarters. Think of Merge like Plaid, but for everything B2B SaaS.
Companies like Mistral AI, Ramp, and Drata use Merge to connect their customers' accounting, HR, ticketing, CRM, and file storage systems to power everything from automatic onboarding to AI-ready data pipelines. Even better, Merge now supports the secure deployment of connectors to AI agents with a new product, so that you can safely power AI workflows with real customer data. If your product needs customer data from dozens of systems, Merge is the fastest, safest way to get it. Book and attend a meeting at merge.dev/lenny, and they'll send you a fifty-dollar Amazon gift card. That's merge.dev/lenny. This episode is brought to you by Strella, the customer research platform built for the AI era. Here's the truth about user research: It's never been more important or more painful. Teams want to understand why customers do what they do, but recruiting users, running interviews, and analyzing insights takes weeks. By the time the results are in, the moment to act has passed. Strella changes that. It's the first platform that uses AI to run and analyze in-depth interviews automatically, bringing fast and continuous user research to every team. Strella's AI moderator asks real follow-up questions, probing deeper when answers are vague, and surfaces patterns across hundreds of conversations, all in a few hours, not weeks. Product, design, and research teams at companies like Amazon and Duolingo are already using Strella for Figma prototype testing, concept validation, and customer journey research, getting insights overnight instead of waiting for the next sprint. If your team wants to understand customers at the speed you ship products, try Strella. Run your next study at strella.io/lenny. That's S-T-R-E-L-L-A.io/lenny. [upbeat music] Ash and Kiriti, thank you so much for being here, and welcome to the podcast.