
Stop Applying to AI PM Jobs Until You Watch This Safety & Ethics Mock

Apply to Land a PM Job Cohort 3 (starts May 4): https://www.landpmjob.com/

Most AI PM candidates underrate the safety and ethics round. In this episode, Ankit Virmani (AI PM at Uber, formerly GPM at Meta), Prasad Reddy (former CPO at L-Nutra, ex-VP at Danaher), and Dr. Bart Jaworski (coach to 12,000+ at Amazon, Microsoft, Zalando) join Aakash for four live mock rounds with real-time scoring, plus a framework you can use the next time medical chatbots, hiring bias, or autonomous agents come up in your loop.

Full Writeup: https://www.news.aakashg.com/p/safety-ethics-interview

---

Timestamps:
00:00 The safety round most AI PM candidates underrate
01:36 Why senior candidates freeze on safety questions
03:39 The SHIR framework: severity, harm scope, immediacy, reversibility
06:34 Mock 1: Medical chatbot contradicting clinical guidelines
11:06 Mock 2: Hiring tool with a 15% demographic gap
16:50 Mock 3: AI agent booking flights and sending emails
21:17 Mock 4: Right for users, wrong for short-term metrics
27:09 Bart's full scoring reveal
32:50 The 40-minute rule for proactive safety mentions
33:38 Anthropic vs OpenAI vs Google: hardest safety round
34:48 The one question every AI PM candidate should be ready for

---

🏆 Two things to consider:
1. AI Tools Bundle: A full year of Mobbin, Arize, Relay, Dovetail, Linear, Magic Patterns, Deepseek, Reforge, Build, Descript, and Speechify with an annual paid newsletter sub - https://bundle.aakashg.com
2. Land a PM Job: 12-week cohort with Aakash, Ankit, Prasad, and Bart - https://www.landpmjob.com/

---

Key Takeaways:
1. SHIR is the framework that buys you 30 seconds of structured thinking - Severity, Harm scope, Immediacy, Reversibility. Run any safety question through these four words before you say a word about your solution. Most candidates jump to "pull the feature" or "ship it anyway." SHIR gets you to a guardrail-plus-audit answer that actually matches how senior PMs think.
2. At the CPO and VP level, sizing business impact is table stakes - The pull costs $50M. The guardrails cost $200K and two weeks. The full retrain costs $2M and three months. If you cannot put numbers next to each path, you are not interviewing at the right altitude.
3. Safety is evaluated across the entire loop, not in one round - Meta embedded safety thinking inside the product sense rubric itself. If you make it 40 minutes into a 60-minute interview without mentioning safety, you have probably already lost points you cannot recover.
4. Reframe revenue arguments as headline arguments - When the VP says "we cannot pull this before earnings," your move is to ask whether the company can afford the headline that you knew the AI was giving dangerous medical advice and let it ship anyway. That converts a $50M quarter risk into a $5B brand risk in one sentence.
5. Agent safety has three pillars - Scope (spending caps and category limits), confirmation (forked by stakes, with push notifications and undo windows for medium actions), and reversibility (pending states, send delays, anomaly detection on top). Memorize this stack for any agent question.
6. Liability for AI agents almost always lands on the platform - Because you designed the guardrails. Frame your answer around how you reduce risk through scope limits and confirmation flows, then acknowledge the legal gray area and the jurisdiction-by-jurisdiction nuance.
7. No-questions-asked refunds create moral hazard - Prasad's pushback on Aakash here is the lesson. Refunds are the safety net. Scope limits are the railing. Build both. If you only build the refund, users will test the limit.
8. Anthropic has the hardest safety round in the industry - Expect 45 to 60 minutes on safety alone. Read up on constitutional AI and the founding story before you walk in. Practice both situational and historical behavioral answers out loud, and watch the recording back.

---

👨‍💻 Where to find Ankit Virmani:
LinkedIn: https://www.linkedin.com/in/ankitvirmani/

👨‍💻 Where to find Prasad Reddy:
LinkedIn: https://www.linkedin.com/in/prasad-09/

👨‍💻 Where to find Dr. Bart Jaworski:
LinkedIn: https://www.linkedin.com/in/bart-jaworski/

👨‍💻 Where to find Aakash:
Twitter: https://www.x.com/aakashg0
LinkedIn: https://www.linkedin.com/in/aakashgupta/
Newsletter: https://www.news.aakashg.com

#aipm #pminterview

---

🧠 About Product Growth: The world's largest podcast focused solely on product + growth, with over 200K listeners.

🔔 Subscribe and turn on notifications to get more videos like this.

Aakash Gupta (host), Ankit Virmani (guest), Prasad Reddy (guest), Dr. Bart Jaworski (guest)
May 3, 2026 · 38m

CHAPTERS

  1. Why the AI PM safety & ethics round is a stealth evaluation across interviews

    Aakash and Ankit argue that many candidates treat safety/ethics as a single checkbox round, but it’s embedded throughout product sense and decision-making. They emphasize that failure to proactively address harms can sink otherwise strong PM performance, especially in high-stakes domains.

    • Safety thinking is evaluated throughout the loop, not only in a dedicated round
    • At Meta, lack of harm/mitigation thinking hurts product sense scores
    • Real-world products (e.g., Uber) raise the stakes from digital to physical harm
    • Candidates often underestimate how central safety is to hiring decisions
  2. Why even senior candidates freeze: unformalized safety reasoning under pressure

    Prasad explains that experienced leaders often struggle because they’ve rarely had to explicitly structure their safety reasoning in interview form. At VP/CPO levels, inability to handle liability, board implications, and ethical tradeoffs can end the candidacy quickly.

    • Safety/ethics is “table stakes” in regulated or high-liability areas (e.g., healthcare)
    • Exec candidates are expected to reason about board-level implications and liability
    • Freezing happens because many haven’t practiced formal safety articulation
    • Interviews demand explicit frameworks and clear escalation/decision logic
  3. The SHIR framework (Severity, Harm scope, Immediacy, Reversibility) + sizing business impact

    Aakash introduces SHIR as a quick way to structure responses, including asking for a brief pause to organize thoughts. Prasad adds a crucial executive lens: quantify the cost of options (pull, guardrails, retrain) alongside risk to make tradeoffs concrete.

    • SHIR: severity of worst case, number affected, how soon harm occurs, and undo-ability
    • Use a short pause to structure an answer before speaking
    • Don’t just assess risk—quantify business costs for each mitigation path
    • High scorers size the problem before prescribing solutions
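
    As a rough illustration of this chapter, the sketch below records one rating per SHIR dimension and puts a cost and timeline next to each mitigation path. The dollar figures and durations come from the episode's key takeaways; the 1 to 5 rating scale, the unweighted sum, the example ratings, the 0-week timeline for a pull, and every class and function name are assumptions made for illustration, not something the speakers prescribe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShirAssessment:
    """One 1-5 rating per SHIR dimension (the 1-5 scale is an assumption)."""
    severity: int       # how bad is the worst case?
    harm_scope: int     # how many people are affected?
    immediacy: int      # how soon does the harm occur?
    reversibility: int  # how hard is it to undo? (5 = irreversible)

    def total(self) -> int:
        # Unweighted sum; the episode does not prescribe any weighting.
        return self.severity + self.harm_scope + self.immediacy + self.reversibility


@dataclass(frozen=True)
class MitigationPath:
    name: str
    cost_usd: int
    weeks_to_ship: int


# Costs quoted in the episode's key takeaways. The pull has no stated
# timeline, so 0 weeks is an assumption; "three months" is taken as 13 weeks.
PATHS = [
    MitigationPath("pull the feature", 50_000_000, 0),
    MitigationPath("add guardrails", 200_000, 2),
    MitigationPath("full retrain", 2_000_000, 13),
]


def cheapest_within(paths, max_weeks):
    """Return the lowest-cost path that ships within max_weeks, or None."""
    eligible = [p for p in paths if p.weeks_to_ship <= max_weeks]
    return min(eligible, key=lambda p: p.cost_usd) if eligible else None


# Hypothetical ratings for the medical chatbot scenario.
medical_chatbot = ShirAssessment(severity=5, harm_scope=3, immediacy=4, reversibility=5)
print(medical_chatbot.total())         # 17
print(cheapest_within(PATHS, 4).name)  # add guardrails
```

    The point of the second half is the one Prasad makes: each option carries a number, so the tradeoff can be stated rather than gestured at.
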
  4. Mock 1: Medical chatbot contradicts clinical guidelines—guardrails, audit, and escalation path

    Aakash responds to a scenario where a consumer chatbot occasionally gives medical advice contradicting guidelines. He prioritizes harm severity, proposes immediate guardrails, audits prior queries to measure incidence, and involves legal due to liability exposure.

    • Medical misinformation is high-severity physical harm and liability risk
    • Collect data: % of medical queries and % that contradict guidelines
    • Implement fast guardrails (classification + disclaimers/verified links)
    • Run an audit; escalate to filtering/disable medical topics if above threshold
    • Engage legal early; treat as safety decision, not feature optimization
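
    The audit step above boils down to two ratios and a threshold check, which could be sketched as follows. The query counts are invented for the example, and the 1% escalation threshold is a placeholder: the episode mentions a threshold but does not give a value. Function names are hypothetical.

```python
def audit_incidence(total_queries, medical_queries, contradicting):
    """Return (share of queries that are medical,
    share of medical queries that contradict guidelines)."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    medical_share = medical_queries / total_queries
    contradiction_rate = contradicting / medical_queries if medical_queries else 0.0
    return medical_share, contradiction_rate


def next_step(contradiction_rate, threshold=0.01):
    """Pick the escalation step. The 1% default is a placeholder assumption."""
    if contradiction_rate > threshold:
        return "filter or disable medical topics"
    return "keep guardrails and continue auditing"


# Hypothetical audit sample: 200,000 logged queries, 8,000 of them medical,
# 240 of those contradicting clinical guidelines.
medical_share, rate = audit_incidence(200_000, 8_000, 240)
print(f"{medical_share:.1%} medical, {rate:.1%} contradicting")
print(next_step(rate))  # filter or disable medical topics
```
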
  5. Earnings pressure follow-up: reframing to headline/brand risk and documenting dissent

    When a VP resists action due to upcoming earnings, Aakash reframes the decision as avoiding catastrophic headline and brand damage. He pushes for a minimally disruptive guardrail, re-sizes the actual revenue impact, and emphasizes documenting risk recommendations if overruled.

    • Shift framing from quarterly revenue to reputational/headline risk magnitude
    • Offer a compromise: guardrails instead of a full pull
    • Re-estimate the business impact based on actual affected query volume
    • Document concerns and route to safety teams to create an internal record
    • Balance escalation with organizational realities (“agree to disagree” if necessary)
  6. Mock 2: Hiring tool shows 15% demographic gap—pause auto-rejects and prepare board transparency

    Prasad addresses an AI hiring tool with a demographic recommendation gap, rejecting the “data vs model” debate in favor of outcome responsibility. He pauses auto-rejects for the affected segment, introduces human review, and plans transparent communication to the board to avoid later surprises.

    • Outcome matters more than whether root cause is data or model
    • Bias in hiring can trigger EEOC issues and class-action liability
    • Immediate mitigation: stop automated rejections; keep humans in the loop
    • Board strategy: disclose early with a plan and timeline; boards punish surprises
    • Draws on prior regulated-domain experience to justify decisive action
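
    A minimal sketch of the immediate mitigation described here: measure the gap in recommendation rates between groups, and if it exceeds a tolerance, route would-be auto-rejects for the affected group to a human reviewer. It assumes the 15% gap is a difference in percentage points (the episode summary does not say whether it is absolute or relative); the applicant counts, the 5-point tolerance, and all names are hypothetical.

```python
def recommendation_rate(recommended, applicants):
    """Share of applicants in a group that the tool recommends."""
    return recommended / applicants


def rate_gap(rates):
    """Gap in percentage points between the highest and lowest group rates."""
    return (max(rates.values()) - min(rates.values())) * 100


def route_negative_decision(group, paused_groups):
    """Send would-be auto-rejects for paused groups to a human reviewer."""
    return "human_review" if group in paused_groups else "auto_reject"


# Hypothetical counts chosen to reproduce a 15-point gap.
rates = {
    "group_a": recommendation_rate(450, 1000),  # 45%
    "group_b": recommendation_rate(300, 1000),  # 30%
}
gap = rate_gap(rates)
MAX_GAP_POINTS = 5  # placeholder tolerance, an assumption
paused = {min(rates, key=rates.get)} if gap > MAX_GAP_POINTS else set()

print(round(gap, 1))                               # 15.0
print(route_negative_decision("group_b", paused))  # human_review
print(route_negative_decision("group_a", paused))  # auto_reject
```

    Note that the routing acts on the outcome alone, which mirrors Prasad's point that the response should not wait on settling whether the data or the model is at fault.
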
  7. Competitor pressure: speed vs safety as long-term advantage (audits, enterprise requirements, legal trend)

    Pressed that competitors test less and move faster, Prasad reframes safety work as strategic risk management and market advantage. He cites increasing enforcement and buyer expectations, arguing a short audit delay is trivial compared to multi-year legal exposure and reputational damage.

    • Reframe: safety isn’t “slow,” it prevents existential downside
    • Regulatory pressure is rising (e.g., discrimination cases)
    • Enterprise procurement increasingly demands bias audits
    • Compare timelines: audit days vs lawsuits lasting years and costing hundreds of millions
    • Position “moral high ground” as competitive advantage
  8. Program promotion interlude: coaching/cohort pitch (brief)

    Aakash briefly shifts to promoting his coaching cohort, describing structure, outcomes, and guarantees. This segment is largely logistical and marketing-focused before returning to the mock interview content.

    • 12-week cohort structure and weekly sessions
    • Resume/LinkedIn/portfolio plus mock interview coverage
    • Claims of strong placement outcomes and interview guarantees
    • Returns to episode after the pitch
  9. Mock 3: Agent safety for bookings/purchases/emails—caps, confirmations, undo windows, anomaly detection

    Aakash proposes a product safety framework for autonomous agents that can take financial actions. He focuses on spending scope limits, tiered confirmations, reversibility via pending states/undo windows, and anomaly detection when behavior deviates from user norms.

    • Set spending caps during onboarding (per transaction and per trip)
    • Use confirmation levels based on stakes (none/soft/hard confirmation)
    • Build reversibility: delays, pending states, and undo windows (like order cancel)
    • Prevent irreversible actions without explicit confirmation or buffer
    • Add anomaly detection for out-of-pattern high-risk actions
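
    The scope and confirmation pillars, plus a crude anomaly check, can be sketched as below. This is an illustration under stated assumptions rather than a design from the episode: the dollar thresholds, the "3x the user's typical spend" anomaly rule, and every class, field, and function name are hypothetical. Undo windows and pending states are represented only by the `reversible` flag.

```python
from dataclasses import dataclass
from enum import Enum


class Confirmation(Enum):
    NONE = "none"
    SOFT = "soft"  # e.g. push notification plus an undo window
    HARD = "hard"  # explicit approval before the action runs


@dataclass(frozen=True)
class Limits:
    per_transaction_usd: float
    per_trip_usd: float
    soft_threshold_usd: float = 50.0   # placeholder value
    hard_threshold_usd: float = 500.0  # placeholder value


@dataclass(frozen=True)
class Action:
    amount_usd: float
    reversible: bool
    typical_amount_usd: float  # the user's usual spend for this kind of action


def check_scope(action, limits, trip_spend_so_far):
    """Scope pillar: reject anything over the per-transaction or per-trip cap."""
    if action.amount_usd > limits.per_transaction_usd:
        return False
    return trip_spend_so_far + action.amount_usd <= limits.per_trip_usd


def is_anomalous(action, multiple=3.0):
    """Crude anomaly check: spend far above the user's norm (3x is an assumption)."""
    return action.amount_usd > multiple * action.typical_amount_usd


def confirmation_level(action, limits):
    """Confirmation pillar: escalate with stakes, irreversibility, or anomaly."""
    if (not action.reversible or is_anomalous(action)
            or action.amount_usd >= limits.hard_threshold_usd):
        return Confirmation.HARD
    if action.amount_usd >= limits.soft_threshold_usd:
        return Confirmation.SOFT
    return Confirmation.NONE


limits = Limits(per_transaction_usd=1000, per_trip_usd=2500)
coffee = Action(amount_usd=8, reversible=True, typical_amount_usd=6)
hotel = Action(amount_usd=220, reversible=True, typical_amount_usd=200)
flight = Action(amount_usd=640, reversible=False, typical_amount_usd=500)

print(confirmation_level(coffee, limits).value)  # none
print(confirmation_level(hotel, limits).value)   # soft
print(confirmation_level(flight, limits).value)  # hard
print(check_scope(hotel, limits, trip_spend_so_far=2400))  # False
```

    Treating any irreversible action as a hard confirmation follows the bullet above about not allowing irreversible actions without explicit confirmation or a buffer.
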
  10. Liability and refunds: balancing user trust vs moral hazard in agent errors

    In a follow-up about an agent mistakenly booking $5,000 in flights, Aakash distinguishes legal ambiguity from product strategy and argues for designing toward refunds and partner policies. Prasad pushes back that unconditional refunds can create moral hazard, emphasizing prevention via guardrails first and refunds as a safety net.

    • Liability can be jurisdiction-dependent and legally unsettled
    • Product strategy: protect user trust with refund/cancellation mechanisms
    • Negotiate partner policies recognizing agent-originated bookings
    • Counterpoint: overly generous refunds invite abuse (moral hazard)
    • Guardrails prevent incidents; refunds should be a backstop, not the default
  11. Mock 4: User-first decision that hurts short-term metrics—rebuilding the metric model to escape a local maximum

    Ankit shares a Facebook Reels ranking story where optimizing for clicks created clickbait and poor satisfaction. He reframed success around engagement quality and long-term retention, sequenced evidence to earn trust, and redesigned the value model to require success across multiple stages.

    • Click-optimized systems can degrade satisfaction and diversity
    • Reframe debate: short-term clicks vs long-term retention and health
    • Change value model from additive to multiplicative (must win across stages)
    • Sequence rollout: start with lower-risk signal to build credibility
    • Reported outcomes: DAU lifts and eventual revenue stabilization via better sessions
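
    The shift from an additive to a multiplicative value model can be illustrated with a toy example. The three stages (click, watch, satisfaction) and all probabilities are assumptions chosen for illustration; the summary only says the redesigned model requires success across multiple stages, and this is not the actual ranking formula.

```python
def additive_score(p_click, p_watch, p_satisfied):
    """Additive model: a strong click signal alone can carry the item."""
    return p_click + p_watch + p_satisfied


def multiplicative_score(p_click, p_watch, p_satisfied):
    """Multiplicative model: one weak stage drags the whole score down,
    so an item has to do well at every stage to rank highly."""
    return p_click * p_watch * p_satisfied


# Hypothetical per-stage probabilities for two videos.
clickbait = (0.95, 0.30, 0.10)  # very clickable, rarely watched or liked
solid = (0.35, 0.45, 0.50)      # fewer clicks, better downstream outcomes

additive_winner = "clickbait" if additive_score(*clickbait) > additive_score(*solid) else "solid"
multiplicative_winner = (
    "clickbait"
    if multiplicative_score(*clickbait) > multiplicative_score(*solid)
    else "solid"
)

print(additive_winner)        # clickbait
print(multiplicative_winner)  # solid
```

    With these numbers the clickbait item wins under the additive model (1.35 vs 1.30) and loses under the multiplicative one (about 0.029 vs 0.079), which is the ranking flip the chapter describes.
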
  12. Ethics escalation scenario: leadership ships with known safety issue—context, written escalation, and ethics channels

    Asked what to do if leadership knowingly ignores a safety issue, Ankit starts by gathering context and verifying the risk. If unresolved, he advocates formal written documentation to management chains, escalation to relevant teams, and using ethics channels; if active harm persists, he’d reconsider staying.

    • Start with context gathering—don’t assume motives or full information
    • Verify the issue is real and assess current mitigation plans
    • Escalate via formal written memo up the chain (not ephemeral Slack)
    • Use ethics channels if leadership still won’t act
    • If users face active harm and paths fail, consider leaving
  13. Scoring reveal + meta-lessons: what separated top answers and how to avoid sounding scripted

    Bart reveals scoring and declares Prasad the winner by a slim margin, noting all answers were hire-worthy. The group reflects on what made answers strong (clear structure, real examples, linear storytelling) and flags a modern pitfall: sounding overly polished or like you’re reading an AI-generated script.

    • Bart’s evaluation: all pros; differences are marginal and context-dependent
    • Prasad’s use of real-world precedent boosted credibility
    • Ankit’s clarity and linear story flow made answers easy to follow
    • Warning: overly structured/polished delivery can backfire in interviews
    • Tip: track practice transcripts (e.g., in an AI project) to measure improvement
  14. Rapid-fire: proactive safety (40-minute rule), hardest safety round (Anthropic), prep method, and the must-ask question

    The episode ends with rapid-fire guidance: mention safety proactively across interviews, not only in a “safety round.” Aakash calls Anthropic the hardest due to its safety-first culture, recommends SHIR plus out-loud recorded practice, and shares his single favorite interview question about unintended harm.

    • “40-minute rule”: if you haven’t mentioned safety by minute 40, bring it in
    • Safety should appear in most interviews during an on-site loop
    • Anthropic is portrayed as the toughest safety round (45–60 minutes)
    • Prep: learn SHIR, practice out loud on video—not just text prompting an LLM
    • One question to be ready for: “Tell me about a time your product caused unintended harm.”
