YC Root Access

Tavus: The AI Human Platform

Tavus is building real-time AI humans: systems that can see you, hear you, and respond with natural expression, emotion, and context. What began as personalized video has grown into a full platform used by companies from startups to the Fortune 10. The team recently raised a $40M Series B to advance this vision, introducing PALs: agentic AI humans that can perceive, reason, and act on their own. In this conversation with YC’s Diana Hu, founders Hassaan Raza and Quinn Favret share how they made the leap from generative video to real-time AI humans, the foundational models behind rendering and perception, and why they believe AI humans will become the next major interface for work and communication.

Learn more about Tavus at https://www.tavus.io.

Chapters:
00:24 – From Personalized Video to AI Humans
01:18 – Why Real-Time Matters
02:36 – How AI Humans See, Hear, and Respond
04:05 – Introducing PALs: Agentic AI Humans
05:42 – The Foundational Models Behind Tavus
07:28 – Building Emotion, Expression, and Context
09:10 – Use Cases From Startups to the Fortune 10
11:00 – Raising the $40M Series B
12:52 – The Future: AI Humans as the Next Interface

Diana Hu (host), Hassaan Raza (guest), Quinn Favret (guest)
Nov 14, 2025 · 15m

CHAPTERS

  1. What Tavus builds: real-time “AI humans” you can call, text, or video chat

    Diana Hu opens by asking what Tavus does, and Hassaan frames the company as an AI research lab building “AI humans.” They emphasize teaching machines to see, hear, respond, act, and present like humans, across video, audio, and text.

    • Tavus positions itself as an AI research lab focused on human-like interaction
    • Core goal: machines that can see, hear, respond, and appear human
    • Interfaces include video calls, audio calls, and texting
    • Emphasis on always-on availability and natural interaction
  2. Demo walkthrough: an AI coworker that manages schedule and drafts emails

    They show a short “PAL” demo where a human-like agent responds in real time, retrieves calendar context, and proactively drafts an email. The interaction is framed as feeling like a coworker rather than a bot.

    • PAL answers a scheduling question with specific agenda items
    • Agent handles a follow-up task (drafting a “running late” email)
    • Proactive completion: email is prepared without extra prompting
    • Focus on realism and conversational flow
  3. Why real-time latency is non-negotiable for human conversation

    Diana and Hassaan discuss that the experience only works if response times are fast enough to match human conversational rhythm. They cite sub-200ms back-and-forth as a benchmark for a “great” exchange.

    • Low latency is required to preserve conversational “feel”
    • Human-like turn-taking depends on rapid response timing
    • Real-time performance is positioned as a key differentiator
    • Latency ties directly to perceived naturalness and trust
  4. Who uses Tavus today: from startups to Fortune 10 teams building AI employees

    Tavus describes a broad customer base and how companies use Tavus models/interfaces to create “AI employees.” Named examples include Amazon, Better.com, and Alibaba, spanning experimentation to production use.

    • Customer spectrum: small startups through Fortune 10 enterprises
    • Examples mentioned: Amazon, Better.com, Alibaba
    • Customers use Tavus to build AI employee experiences
    • Tavus provides underlying models/interfaces rather than a single app
  5. Three main application buckets: training, healthcare, and go-to-market roles

    Quinn outlines the primary categories where AI humans are being deployed: learning & development, healthcare workflows, and go-to-market functions. The roles range from training instructors to patient intake assistants to AI SDRs and support managers.

    • Learning & development: training and education delivery
    • Healthcare: patient intake, nutrition coaching, elderly companionship
    • Go-to-market: AI SDR, solutions engineer, customer support manager
    • Breadth of use cases is part of the product’s appeal
  6. Origin story pre-ChatGPT: personalized video via lip-sync “infill”

    They rewind to 2020–2021 when model capabilities were limited, and Tavus’s best wedge was scalable personalized video. The approach: record once, then generate thousands of lip-synced variants with personalized names/details.

    • Early constraint: generative models couldn’t yet do full interactive humans
    • Initial product: AI personalized video for outreach/personalization
    • Technique described: lip-sync infill to swap targeted segments at scale
    • Model and research advances later enabled broader capabilities
  7. The key pivot: choosing an SDK/API and research lab path over “AI sales company”

    After Series A, Tavus faced a strategic identity decision: remain a sales-focused application or invest in foundational human-computing models. They chose to churn existing customers and refocus on offering the technology as an API/SDK for others to build on.

    • Crucible moment: define company DNA and long-term ambition
    • Decision to avoid being narrowly an “AI sales company”
    • Shift toward building core models and offering them via SDK/API
    • Tradeoff acknowledged: letting go of traction to pursue the bigger vision
  8. Two sides of the platform: rendering human realism + perception/context understanding

    Diana frames Tavus’s advancement as moving beyond rendering into perception. Hassaan explains that facial realism alone isn’t sufficient—AI humans must perceive expressions, gestures, and context to interpret meaning the way people do.

    • Rendering isn’t enough without perception and contextual understanding
    • Goal: capture nuanced nonverbal signals (expressions, gestures)
    • Model the relationship between what’s said and how it’s said
    • Aim: machines that “meet us where we are” across natural modalities
  9. Introducing PALs: agentic, emotionally intelligent AI humans for consumers and prosumers

    They preview an upcoming product launch: Tavus PALs, intended to bring AI humans to non-technical users. The vision is a new interface layer—like moving from command line to GUI—where AI humans are proactive, multimodal, and capable of taking actions.

    • PALs positioned as a step-function UI shift for AI adoption
    • Designed for non-technical users (not just SDK customers)
    • Agentic behavior: proactive, can do tasks on your behalf
    • Multimodal interaction: video, text, calling; emotionally intelligent responses
  10. If AI humans become the next interface: a world of ubiquitous assistants

    They discuss the long-term future: solving “human computing” so interacting with computers feels as natural as talking to a friend. Examples include AI doctors, therapists, and assistants: Jarvis- or Cortana-style companions accessible to everyone.

    • Vision: “human computing” makes computer use feel second nature
    • AI assistants integrated into everyday life (health, support, productivity)
    • Scaling expectation: massive volume of Tavus interactions over time
    • Platform expansion: initial PALs as first-class citizens, then broader creation
  11. Concerns and alignment: job displacement vs expanding access and improving experiences

    Diana raises worries about alignment and job replacement. Hassaan argues Tavus aims to replace “bad machines” (degraded automated experiences) and to fill gaps where the alternative is no service at all, especially in areas like therapy access.

    • Acknowledges some job displacement risk rather than denying it
    • Frames near-term target as replacing poor automation, not people
    • Healthcare example: empathetic AI intake can improve telehealth experiences
    • Access argument: AI therapy/help where human services are unavailable/unaffordable
  12. How Tavus builds empathy: better signals, nuanced perception, and human simulation

    Hassaan describes conversation as a nuanced “dance,” requiring the right data signals and models that connect verbal content to facial/behavioral context. They characterize their work as building human simulation models that generate realistic reactions and expressions.

    • Conversation requires modeling nonverbal nuance and timing
    • Perception pipeline collects subtle cues (e.g., eyebrow twitch, micro-smiles)
    • Core modeling: link expression + context + prior utterances to meaning
    • Described as “human simulation models” for reactions and expression
  13. Founder lessons: conviction, momentum, and moving fast as the primary moat

    They close with advice for early founders: maintain conviction in your vision, avoid being overly swayed by external opinions, and prioritize momentum. Quinn emphasizes daily progress; Diana connects it to speed as an early startup moat, echoed by Tavus’s internal motto.

    • Have conviction in the vision; don’t let external opinions steer the company
    • Pivots can require courage, including giving up traction/customers
    • Momentum is portrayed as the critical operating principle
    • Speed as moat: “faster, faster” culture and staying months ahead
