
Stanford CS230 | Autumn 2025 | Lecture 3: Full Cycle of a DL project

October 7, 2025

This lecture covers the full cycle of a DL project.

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: https://stanford.io/ai
To learn more about enrolling in this course, visit: https://online.stanford.edu/courses/cs230-deep-learning
To follow along with the course schedule and syllabus, visit: https://cs230.stanford.edu/syllabus/
More lectures will be published regularly. View the playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNRRGdS0rBbXOUGA0wjdh1X

Andrew Ng
Founder of DeepLearning.AI
Adjunct Professor, Stanford University’s Computer Science Department

Kian Katanforoosh
CEO and Founder of Workera
Adjunct Lecturer, Stanford University’s Computer Science Department

Kian Katanforoosh, host

Oct 15, 2025 · 1h 7m

CHAPTERS

0:05 – 4:08
Why AI projects are inherently iterative (code + unpredictable data)
Katanforoosh contrasts traditional software engineering (deterministic code) with AI development, where performance is heavily shaped by data you don’t fully understand. He explains why this makes ML/LLM systems empirical and iterative—build, test, discover surprises, then improve.
- Traditional software: you control behavior via code; AI: outcomes depend on both code and data
- Data contains “strange and wonderful” edge cases you can’t anticipate upfront
- Real-world future data is even less controllable than stored historical data
- LLMs feel hard to control largely because training data is vast and opaque
4:08 – 5:40
The “full cycle” of a deep learning project beyond just modeling
The lecture broadens the view from training models to the full lifecycle: problem specification, data, modeling, iteration, deployment, and long-term maintenance. Katanforoosh emphasizes that many courses over-focus on modeling, while real systems require much more end-to-end work.
- End-to-end steps: specify problem → get data → design/train/iterate → deploy → monitor/maintain
- Rapid iteration loop is central: design → train → analyze → adjust
- Model training is only one part of a practical DL system
- Goal is to build something that works in the real world, not just a benchmark
5:40 – 9:16
Case study setup: face recognition for door access and key-card verification
He introduces a practical face recognition application: deciding whether to unlock a door (or validating that a swiped key card matches the person). This anchors later discussions about data collection, system design choices, and deployment constraints.
- Door unlock decision based on camera image(s)
- Enterprise variant: key-card swipe + photo verification to prevent stolen-card access
- Constraints vary by deployment setting (home vs office complex)
- Use case highlights security, latency, and robustness requirements
9:16 – 11:18
Face recognition architecture primer: Siamese networks and registration images
Katanforoosh explains the common approach to face recognition using Siamese networks that compare two images and predict “same person vs different.” This enables per-household or per-company setup without retraining for every new user base.
- Siamese network takes two images and outputs similarity/same-different decision
- System uses stored “registration” photos for authorized users
- Avoids retraining the model for each deployment (e.g., every home)
- Corporate flow: compare live capture to the ID-card’s registered face
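
The comparison step described above can be sketched roughly as follows. This is a minimal illustration, not the lecture's implementation: it assumes the Siamese network has already mapped each image to an embedding vector (that network is not shown), and the cosine-similarity measure, the function names, and the 0.8 threshold are all hypothetical choices made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_a * norm_b)

def is_authorized(live_embedding, registered_embeddings, threshold=0.8):
    """Return True if the live capture is similar enough to any registered face.

    `threshold` is an assumed value for illustration; in practice it would be
    tuned on held-out same-person / different-person pairs.
    """
    return any(
        cosine_similarity(live_embedding, reg) >= threshold
        for reg in registered_embeddings
    )

# Toy 3-dimensional embeddings standing in for real network outputs.
registered = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(is_authorized([0.9, 0.1, 0.0], registered))  # close to the first registered face
print(is_authorized([0.0, 0.0, 1.0], registered))  # unlike either registered face
```

Adding a new authorized user only appends an embedding to `registered`; the network weights stay untouched, which is the property the chapter highlights.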
11:18 – 16:02
Interactive scenario: how to collect training data without scraping the internet
Students propose multiple strategies to gather face data under legal constraints (no internet downloads). The discussion surfaces practical tradeoffs between creativity, feasibility, and time-to-first-model.
- Ideas: build/partner with video services (e.g., Zoom-like), opt-in workplace capture, recruit friends/students
- Key consideration: how to get users and consent ethically
- Timeline question: how long to collect data before training starts?
- Different approaches fit different org sizes (startup vs large company)
16:02 – 26:09
A speed-first principle for early data collection and learning
He advocates prioritizing speed of execution—get a small imperfect dataset fast (often in 1–2 days) to start the learning loop. He warns against large upfront data investments when you don’t yet know what data you actually need.
- Speed of iteration is a major predictor of project/startup success
- Aim for quick dataset creation (days), even if smaller/lower quality
- Training can be fast; don’t let data collection become the bottleneck
- Cautionary tale: expensive data acquisitions can be hard to monetize if value is unclear
26:09 – 30:50
Using error analysis to decide what data to improve (data-centric AI)
Katanforoosh answers how model failures guide next steps: examine errors, identify weak subcases (e.g., hats), then collect/engineer targeted data. He argues that “just get more data” is inefficient without knowing which slices matter.
- Error analysis reveals which conditions/users cause failures (hats, glasses, scarves, angles, blur)
- Data-centric AI: systematically improve datasets to drive performance
- Blindly collecting more of everything is slow and expensive
- Frontier LLM teams improve specific capabilities (e.g., coding) via targeted high-quality data
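
One way to make the error-analysis step concrete is to tag each evaluated example with the conditions present (hat, glasses, blur, and so on) and tally errors per tag. The sketch below assumes a hand-labeled list of examples; the record layout, tag names, and toy data are hypothetical and only illustrate the bookkeeping.

```python
from collections import Counter

def error_rate_by_tag(examples):
    """Summarize errors per condition tag.

    `examples` is a list of dicts with keys 'tags' (list of str) and
    'correct' (bool). Returns {tag: (errors, total, error_rate)}, ordered
    by error count, largest first.
    """
    totals = Counter()
    errors = Counter()
    for ex in examples:
        for tag in ex["tags"]:
            totals[tag] += 1
            if not ex["correct"]:
                errors[tag] += 1
    report = {
        tag: (errors[tag], totals[tag], errors[tag] / totals[tag])
        for tag in totals
    }
    return dict(sorted(report.items(), key=lambda kv: kv[1][0], reverse=True))

# Hypothetical hand-tagged evaluation results.
examples = [
    {"tags": ["hat"], "correct": False},
    {"tags": ["hat", "glasses"], "correct": False},
    {"tags": ["glasses"], "correct": True},
    {"tags": ["blur"], "correct": False},
    {"tags": ["frontal"], "correct": True},
]
for tag, (n_err, n_total, rate) in error_rate_by_tag(examples).items():
    print(f"{tag}: {n_err}/{n_total} errors ({rate:.0%})")
```

The tags at the top of the report are the candidates for targeted data collection, rather than gathering more of everything.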
30:50 – 36:37
Data quality and distribution mismatch: how similar must training data be?
The lecture discusses data quality (e.g., sharp vs blurry images; edited text for LLMs) and whether training data must match the deployment distribution. He argues modern large models can absorb somewhat off-distribution but non-wrong data, reducing the old obsession with perfect matching.
- High-quality data can outperform large amounts of low-quality data (books vs random chat; sharp vs blurry)
- Exact train/test distribution matching is less critical with large-capacity models
- Irrelevant-but-not-wrong data often doesn’t hurt and may help
- Risk is more about incorrect/mislabeled data than mildly different data
36:37 – 39:39
Deployment reality: streaming constraints and the need for edge filtering
He shifts to deployment challenges: continuously streaming video to the cloud is too expensive/slow for many real systems. A practical architecture adds an inexpensive edge “gate” that decides when to invoke the heavier face recognition model.
- Deploying models requires significant engineering beyond training
- Full-frame, 24/7 cloud inference is often cost/latency prohibitive
- Introduce a lightweight on-device front-end filter (Visual Activity Detection, VAD)
- Only send/select frames when a person is likely present
39:39 – 50:21
Design choice: simple pixel-change VAD vs a small neural network (and why speed wins)
Two VAD options are compared: a non-ML pixel-difference threshold versus training a lightweight neural network. Katanforoosh emphasizes implementing the fastest workable approach first to learn from real data, then upgrading as needed.
- Option 1: threshold on fraction of pixels changed (very fast to implement)
- Option 2: small NN to detect a person (more effort, potentially more robust)
- Many design questions are empirical (trees swaying, cars passing, pets triggering)
- Start with quick-and-dirty, learn failures, then iterate toward the better approach
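
Option 1 is simple enough to sketch in a few lines. The version below assumes grayscale frames stored as equal-sized 2D lists of integers in 0..255; the function names and both thresholds (`pixel_delta`, `min_fraction`) are illustrative assumptions, not values from the lecture.

```python
def changed_fraction(prev_frame, frame, pixel_delta=25):
    """Fraction of pixels whose intensity changed by more than `pixel_delta`."""
    changed = 0
    total = 0
    for prev_row, row in zip(prev_frame, frame):
        for p, q in zip(prev_row, row):
            total += 1
            if abs(p - q) > pixel_delta:
                changed += 1
    return changed / total if total else 0.0

def activity_detected(prev_frame, frame, pixel_delta=25, min_fraction=0.05):
    """Fire the gate when enough of the frame changed (assumed thresholds)."""
    return changed_fraction(prev_frame, frame, pixel_delta) >= min_fraction

# Toy 4x4 frames: an empty scene, then the same scene with two bright pixels.
empty = [[0] * 4 for _ in range(4)]
moved = [row[:] for row in empty]
moved[1][1] = 200
moved[1][2] = 200

print(changed_fraction(empty, moved))   # 2 of 16 pixels changed
print(activity_detected(empty, moved))  # above the assumed 5% threshold
print(activity_detected(empty, empty))  # nothing changed
```

A gate like this cannot tell a person from a swaying tree or a passing car, which is exactly the kind of failure the chapter suggests discovering empirically before investing in the small neural network.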
50:21 – 52:44
Practical insight discovered in deployment: frame selection and blur matter
He gives a concrete example of iterative learning: many frames are blurry as someone walks, but selecting a few in-focus frames can substantially improve recognition accuracy. This illustrates how deploying even a basic system reveals opportunities that aren’t obvious upfront.
- Real video contains variable-quality frames; motion introduces blur
- Selecting multiple high-quality frames can boost downstream face recognition
- VAD can evolve into a richer pre-processing stage (detect face + assess focus)
- Key theme: implementation reveals hidden bottlenecks and new features to build
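
The summary does not say how focus is assessed. One common heuristic, swapped in here purely as an assumption, is the variance of a Laplacian filter response: sharp images have strong edges and therefore a high variance, while blurred or flat images score low. The sketch below uses grayscale frames as 2D lists; the function names and `k` are hypothetical.

```python
def laplacian_variance(img):
    """Focus score: variance of the 4-neighbour Laplacian over interior pixels.

    This is an assumed heuristic, not necessarily what the lecture's system uses.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def select_sharpest(frames, k=3):
    """Indices of the `k` frames with the highest focus score, best first."""
    ranked = sorted(
        range(len(frames)),
        key=lambda i: laplacian_variance(frames[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy frames: a featureless gray frame and a high-contrast checkerboard.
flat = [[128] * 5 for _ in range(5)]
checker = [[255 if (x + y) % 2 else 0 for x in range(5)] for y in range(5)]
print(select_sharpest([flat, checker], k=1))  # the checkerboard ranks first
```

Only the selected frames would then be passed to the heavier face recognition model.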
52:44 – 54:26
What is “good” accuracy? Using human-level baselines and beyond
Katanforoosh addresses how to judge model performance and why a reference level matters for diagnosing issues like bias/variance. For face recognition, he notes that systems can outperform humans in controlled settings, which changes how benchmarking feels.
- Reference/aspirational performance (often human-level) helps guide iteration
- Face recognition can exceed human performance under controlled conditions
- Once systems surpass humans, evaluation and improvement become trickier
- Some tasks lack strong human baselines (e.g., recommendations)
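
A reference level turns raw error numbers into a diagnosis. One common reading, sketched below as an assumption rather than the lecture's exact procedure, compares the gap between training error and the reference (avoidable bias) with the gap between dev error and training error (variance). All error values here are made up for illustration.

```python
def diagnose(reference_error, train_error, dev_error):
    """Return (avoidable_bias, variance, focus).

    avoidable_bias = train_error - reference_error
    variance       = dev_error - train_error
    `focus` names the larger gap; ties go to "variance" (an arbitrary choice).
    """
    avoidable_bias = train_error - reference_error
    variance = dev_error - train_error
    focus = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, focus

# Hypothetical numbers: 1% reference error, 8% training error, 10% dev error.
bias, var, focus = diagnose(0.01, 0.08, 0.10)
print(f"avoidable bias={bias:.2f}, variance={var:.2f}, focus on {focus}")

# Same reference, but the model fits training data well and generalizes poorly.
bias, var, focus = diagnose(0.01, 0.02, 0.10)
print(f"avoidable bias={bias:.2f}, variance={var:.2f}, focus on {focus}")
```

When no trustworthy reference exists, as with recommendations, the first gap cannot be computed and this diagnosis loses its anchor.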
54:26 – 1:01:34
Monitoring & maintenance: data drift, concept drift, and owning real-world performance
The final segment stresses that deployment isn’t the end: the world changes, distributions shift, and models degrade. He argues engineers should own production performance, detect drift, and refresh data/models as needed.
- Data drift: input distribution changes (seasons, geography, attire like sunglasses/scarves)
- Concept drift: the input-output mapping changes over time
- Examples: new search trends, factory process changes, traffic lights that differ by state
- Engineering mindset: “works on test set” ≠ “works in production”; fix the real system
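
A very simple data-drift check compares a summary statistic of recent inputs against a reference window. The sketch below assumes a single monitored statistic per image (mean brightness is used as a hypothetical example) and flags drift when the recent mean moves more than an assumed number of reference standard deviations. It says nothing about concept drift, which requires labels to detect.

```python
import statistics

def drift_score(reference, recent):
    """Shift of the recent mean from the reference mean, in reference std units."""
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.pstdev(reference)
    recent_mean = statistics.fmean(recent)
    if ref_std == 0.0:
        # Degenerate reference: any change at all counts as an unbounded shift.
        return 0.0 if recent_mean == ref_mean else float("inf")
    return abs(recent_mean - ref_mean) / ref_std

def has_drifted(reference, recent, max_shift=2.0):
    """`max_shift` is an assumed alert threshold, to be tuned per deployment."""
    return drift_score(reference, recent) > max_shift

# Hypothetical per-image mean brightness values.
summer = [100, 110, 90, 105, 95]
winter_evening = [60, 65, 55, 70]
similar_day = [98, 102, 101]

print(has_drifted(summer, winter_evening))  # much darker inputs
print(has_drifted(summer, similar_day))     # essentially the same distribution
```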
1:01:34 – 1:07:04
Dashboards, metrics, and alerting: practical operations for ML systems
He recommends brainstorming failure modes and instrumenting many metrics early, then pruning down to the few that matter. By plotting trends and setting alert thresholds, teams can catch issues early and trigger investigation before users are impacted.
- Brainstorm what can go wrong; instrument metrics to detect it
- Collect limited, permissioned telemetry while respecting privacy
- Track operational metrics: re-authentication rate, accept/reject rates, latency, etc.
- Start with many dashboards; later prune and add alarms/thresholds for anomalies
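
An alert threshold on one of these metrics can be sketched as a rolling rate with an alarm. The class below tracks the reject rate over the most recent events; the class name, window size, rate limit, and minimum event count are all hypothetical values chosen for the example.

```python
from collections import deque

class RateAlarm:
    """Rolling-window alarm on the fraction of rejected access attempts."""

    def __init__(self, window=100, max_rate=0.2, min_events=20):
        self.events = deque(maxlen=window)  # oldest events fall off automatically
        self.max_rate = max_rate            # assumed alert threshold
        self.min_events = min_events        # avoid alarms on tiny samples

    def record(self, rejected):
        """Record one attempt; return True if the alarm should fire."""
        self.events.append(1 if rejected else 0)
        if len(self.events) < self.min_events:
            return False
        return sum(self.events) / len(self.events) > self.max_rate

# Toy stream: five accepted attempts followed by five rejections.
alarm = RateAlarm(window=10, max_rate=0.3, min_events=5)
stream = [False] * 5 + [True] * 5
fired = [alarm.record(rejected) for rejected in stream]
print(fired.index(True))  # position in the stream where the alarm first fires
```

The same pattern applies to re-authentication rate or latency; the point is that the threshold triggers an investigation before users notice.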
