CHAPTERS
- 0:00 – 1:07
Kawal Gandhi’s path from Search/Ads to leading GenAI for Google Cloud
Sarah opens by introducing Kawal Gandhi and his background at Google. Kawal explains how working with advertisers’ needs around compute, storage, and ML pipelines naturally pulled him toward Cloud and ultimately generative AI.
- From ads/search (shopping, travel) to infra-heavy customer needs
- Cloud solutions: analytics, ML pipelines, conversational/document AI
- GenAI focus: improving customer experiences using enterprise information
- 1:07 – 2:23
Why AI became a core differentiator for Google Cloud (data, latency, security)
Kawal describes the early motivations for building AI into GCP: customers wanted high performance and better ways to use data. He frames Google Cloud’s origins around bringing Google-grade data/AI infrastructure—plus privacy and security—to regulated and unregulated customers.
- Customer pull: latency, response times, and data-driven experiences
- AI/ML positioned as a differentiator from the beginning
- Adapting internal Google tooling for external, regulated environments
- Helping customers leverage existing investments on GCP
- 2:23 – 3:34
Internal “dogfooding” → Duet AI in Workspace: first real GenAI rollouts
Elad asks about Google’s internal use cases that shaped external offerings. Kawal points to Workspace as an early proving ground for summarization, personalization, and suggestions that later shipped as Duet AI.
- Workspace as an early internal testbed for GenAI
- Summarize, personalize, suggest content across docs/email
- Internal experiments graduate into customer-facing products
- Productivity and efficiency gains as the key validation
- 3:34 – 5:24
Most-used Workspace capabilities: docs, slides, email—and the security layer
Sarah probes which Duet features saw the strongest uptake. Kawal highlights broad adoption across document generation/summarization, slide image creation, and email drafting—while emphasizing that enterprise-grade security is a core requirement.
- High adoption across Docs: generation and summarization
- Slides: image suggestions and rapid creative iteration
- Email generation as a major productivity lever
- Security considerations in message/link handling
- 5:24 – 7:04
Vertex AI, Model Garden, and domain-specific models (Med-PaLM, Sec-PaLM)
Elad asks how Google decides what models to expose and when. Kawal lays out the stack: AI infrastructure, Vertex AI capabilities, and Model Garden—spanning first-party, customer-trained, and open-source models—while stressing operational tooling and safety.
- Three layers: infra → Vertex capabilities → model access in Model Garden
- Domain models (e.g., Med-PaLM) for structured clinical workflows
- Support for customer-trained models and open-source options (e.g., LLaMA)
- Differentiation via ops: drift management, tooling, safety, secure data
- 7:04 – 9:00
Enterprise adoption sequence: prototypes first, then deployment/monitoring reality
The discussion turns to how enterprises actually adopt LLMs. Kawal agrees that teams often start with quick prototypes, then confront production needs like monitoring, model chaining, and reliability—converting early excitement into sustained programs.
- Common sequence: API prototype → iteration → productionization
- Production requirements: monitoring, managing, chaining models
- “Real excitement,” not hype: engineers exploring what’s now possible
- Platform evolves alongside observed customer usage
- 9:00 – 10:03
Build vs fine-tune vs use existing models: maturity, cost, guardrails, and org change
Sarah asks when customers should train their own models. Kawal frames it as a responsibility and maturity question—covering guardrails, continuous learning costs, whether teams can operate models, and the need for board-level and cultural alignment.
- Model building requires guardrails, clear use cases, and budget
- Org maturity matters: skills to build, tune, deploy, maintain
- Strategic/board-level planning plus cultural transformation
- Where you are in the adoption cycle should drive the approach
- 10:03 – 12:02
Use-case flywheel: efficiency → productivity → trust → creativity (and agents)
Elad asks about common customer use cases and how they break down. Kawal presents a progression: start with efficiency (e.g., support), build productivity use cases, increase trust through KPI validation, and ultimately enable more creative work and agent-like systems.
- Efficiency first: improve workflows like customer support without harming CSAT
- Then productivity: recommendations, promotions, internal tooling
- Trust as the gating factor (hallucinations, reliability)
- Longer-term: systems of intelligence/agents as trust increases
- 12:02 – 13:21
Where adoption shows up first: sales/marketing, customer care, and employee experience
Sarah asks which verticals are adopting fastest. Kawal argues early uptake is often horizontal by department—sales/marketing content workflows, customer care, and internal employee experiences like HR/benefits discovery—spanning regulated and unregulated industries.
- Early wins in sales/marketing: content creation and distribution
- Customer care as a high-impact, cross-industry use case
- Internal “employee experience” chat/search for HR and processes
- Horizontal + vertical intersection rather than a single industry leader
- 13:21 – 15:07
Multimodal AI: roadmap, trust/safety risks, and early industries (e.g., gaming)
The conversation shifts to multimodality beyond text. Kawal describes an evolution from text to images to audio and combined media, noting safety challenges like deepfakes and voice misuse, plus infra needs for storage/retrieval and low-latency delivery.
- Multimodal progression: text → images → audio + combined media
- Trust and safety concerns: deepfakes, identity/voice misuse
- Infra challenges: storing, retrieving, and scaling rich media
- Gaming as an early multimodal adopter; broader rollout requires latency improvements
- 15:07 – 17:26
What makes GenAI programs succeed: commitment, positioning, and cost curves improving
Sarah asks about patterns that make customers successful and what’s expensive. Kawal emphasizes organizational commitment (visionary vs fast-follower) and notes that model/platform costs are falling quickly, accelerating adoption via training/certification and new productivity use cases.
- Success correlates with strategic commitment and long-term investment mindset
- Governments and enterprises show unusually high interest, though it is still early days
- Compute/model access costs trending down rapidly
- Rapid workforce enablement: large-scale certifications and training
- New use cases: coding productivity, system migration, workflow acceleration
- 17:26 – 21:20
Avoiding anti-patterns: production guardrails, security drills, and preventing costly rollbacks
Sarah asks about big mistakes customers make. Kawal focuses on preventing failures through rigorous security, tenant isolation, and checkpoints—because production rollbacks are expensive and erode trust in the industry and within organizations.
- Primary risk area: data handling and IP protection (weights, adapters)
- Tenant isolation and strict entitlement boundaries
- Security drills and checkpoints to avoid moving “too fast”
- Rollbacks are costly; better to prevent incidents than recover from them
- 21:20 – 23:02
Developer workflow integration: where the “bot in the room” should live (docs, IDEs)
Sarah asks where AI should integrate into real product and engineering workflows. Kawal says it starts in shared docs and can extend into notebooks/IDEs, but scaling requires governance: entitlements, logging, and guardrails—especially for generated operational tooling.
- Docs as the current hub for AI-assisted product/spec work
- Expansion into notebooks and IDEs for code and API scaffolding
- Scaling concerns: governance, access control, and audit logging
- Promise: auto-generating internal tools (e.g., drift monitoring) from specs
- 23:02 – 29:52
TPUs vs GPUs, inference scaling, and how the NVIDIA shortage changes the conversation
Elad explores Google’s silicon advantage and TPU/GPU trade-offs. Kawal highlights optionality and the need to abstract infrastructure, then pivots to inference as the bigger scaling challenge, especially as apps grow and multimodal workloads increase load. He also notes that the GPU shortage often gives way to deeper discussions about deployment, security, and regionalization.
- Goal: abstract infra so teams focus on projects, not hardware complexity
- TPU vs GPU: optionality and expertise/training as practical constraints
- Industry shift toward inference scaling and cost control (post-training)
- Multimodal will amplify inference and infrastructure demands
- GPU shortage comes up, but deployment/security/regionalization often dominate decisions
- 29:52 – 32:33
What’s next: multimodal UX, data for training, synthetic data, and partner data marketplaces
Sarah closes by asking what Kawal is most excited about. Kawal points to next-gen multimodal user experiences and the growing importance of datasets, including synthetic data for regulated industries. He then explains how partners provide marketplace and RLHF/labeling services integrated into GCP with strong security boundaries.
- Near-term focus: real multimodal use cases and customer partnerships
- Data needs becoming central to model training strategies
- Synthetic data as a tool for regulated industries and missing-data scenarios
- Partner ecosystem provides data/labeling/RLHF services via marketplace integrations
- Security and isolation requirements extend to partner pipelines
