CHAPTERS
What Tavus builds: real-time “AI humans” you can call, text, or video chat
Diana Hu opens by asking what Tavus does, and Hassaan frames the company as an AI research lab building “AI humans.” They emphasize teaching machines to see, hear, respond, act, and present like humans, across video, audio, and text.
- Tavus positions itself as an AI research lab focused on human-like interaction
- Core goal: machines that can see, hear, respond, and appear human
- Interfaces include video calls, audio calls, and texting
- Emphasis on always-on availability and natural interaction
Demo walkthrough: an AI coworker that manages schedule and drafts emails
They show a short “PAL” demo where a human-like agent responds in real time, retrieves calendar context, and proactively drafts an email. The interaction is framed as feeling like a coworker rather than a bot.
- PAL answers a scheduling question with specific agenda items
- Agent handles a follow-up task (drafting a “running late” email)
- Proactive completion: email is prepared without extra prompting
- Focus on realism and conversational flow
Why real-time latency is non-negotiable for human conversation
Diana and Hassaan note that the experience only works if response times are fast enough to match the rhythm of human conversation. They cite sub-200 ms back-and-forth as a benchmark for a “great” exchange.
- Low latency is required to preserve conversational “feel”
- Human-like turn-taking depends on rapid response timing
- Real-time performance is positioned as a key differentiator
- Latency ties directly to perceived naturalness and trust
Who uses Tavus today: from startups to Fortune 10 teams building AI employees
Tavus describes a broad customer base and how companies use Tavus models/interfaces to create “AI employees.” Named examples include Amazon, Better.com, and Alibaba, spanning experimentation to production use.
- Customer spectrum: small startups through Fortune 10 enterprises
- Examples mentioned: Amazon, Better.com, Alibaba
- Customers use Tavus to build AI employee experiences
- Tavus provides underlying models/interfaces rather than a single app
Three main application buckets: training, healthcare, and go-to-market roles
Quinn outlines the primary categories where AI humans are being deployed: learning & development, healthcare workflows, and go-to-market functions. The roles range from training instructors to patient intake assistants to AI SDRs and support managers.
- Learning & development: training and education delivery
- Healthcare: patient intake, nutrition coaching, elderly companionship
- Go-to-market: AI SDR, solutions engineer, customer support manager
- Breadth of use cases is part of the product’s appeal
Origin story pre-ChatGPT: personalized video via lip-sync “infill”
They rewind to 2020–2021 when model capabilities were limited, and Tavus’s best wedge was scalable personalized video. The approach: record once, then generate thousands of lip-synced variants with personalized names/details.
- Early constraint: generative models couldn’t yet do full interactive humans
- Initial product: AI personalized video for outreach/personalization
- Technique described: lip-sync infill to swap targeted segments at scale
- Model and research advances later enabled broader capabilities
The pivotal pivot: choosing an SDK/API and research lab path over “AI sales company”
After its Series A, Tavus faced a strategic identity decision: remain a sales-focused application or invest in foundational human-computing models. They chose to churn customers and refocus on offering the technology as an API/SDK for others to build on.
- Crucible moment: define company DNA and long-term ambition
- Decision to avoid being narrowly an “AI sales company”
- Shift toward building core models and offering them via SDK/API
- Tradeoff acknowledged: letting go of traction to pursue the bigger vision
Two sides of the platform: rendering human realism + perception/context understanding
Diana frames Tavus’s advancement as moving beyond rendering into perception. Hassaan explains that facial realism alone isn’t sufficient—AI humans must perceive expressions, gestures, and context to interpret meaning the way people do.
- Rendering isn’t enough without perception and contextual understanding
- Goal: capture nuanced nonverbal signals (expressions, gestures)
- Model the relationship between what’s said and how it’s said
- Aim: machines that “meet us where we are” across natural modalities
Introducing PALs: agentic, emotionally intelligent AI humans for consumers and prosumers
They preview an upcoming product launch: Tavus PALs, intended to bring AI humans to non-technical users. The vision is a new interface layer—like moving from command line to GUI—where AI humans are proactive, multimodal, and capable of taking actions.
- PALs positioned as a step-function UI shift for AI adoption
- Designed for non-technical users (not just SDK customers)
- Agentic behavior: proactive, can do tasks on your behalf
- Multimodal interaction: video, text, calling; emotionally intelligent responses
If AI humans become the next interface: a world of ubiquitous assistants
They discuss the long-term future: solving “human computing” so that interacting with computers feels as natural as talking to a friend. Examples include AI doctors, therapists, and assistants: “Jarvis/Cortana”-style companions accessible to everyone.
- Vision: “human computing” makes computer use feel second nature
- AI assistants integrated into everyday life (health, support, productivity)
- Scaling expectation: massive volume of Tavus interactions over time
- Platform expansion: initial PALs as first-class citizens, then broader creation
Concerns and alignment: job displacement vs expanding access and improving experiences
Diana raises worries about alignment and job replacement. Hassaan argues Tavus aims to replace “bad machines” (degraded automated experiences) and to fill gaps where the alternative is no service at all, especially in areas like therapy access.
- Acknowledges some job displacement risk rather than denying it
- Frames near-term target as replacing poor automation, not people
- Healthcare example: empathetic AI intake can improve telehealth experiences
- Access argument: AI therapy/help where human services are unavailable/unaffordable
How Tavus builds empathy: better signals, nuanced perception, and human simulation
Hassaan describes conversation as a nuanced “dance,” requiring the right data signals and models that connect verbal content to facial/behavioral context. They characterize their work as building human simulation models that generate realistic reactions and expressions.
- Conversation requires modeling nonverbal nuance and timing
- Perception pipeline collects subtle cues (e.g., eyebrow twitch, micro-smiles)
- Core modeling: link expression + context + prior utterances to meaning
- Described as “human simulation models” for reactions and expression
Founder lessons: conviction, momentum, and moving fast as the primary moat
They close with advice for early founders: maintain conviction in your vision, avoid being overly swayed by external opinions, and prioritize momentum. Quinn emphasizes daily progress; Diana connects it to speed as an early startup moat, echoed by Tavus’s internal motto.
- Have conviction in the vision; don’t let external opinions steer the company
- Pivots can require courage, including giving up traction/customers
- Momentum is portrayed as the critical operating principle
- Speed as moat: “faster, faster” culture and staying months ahead
