Stanford Online | Stanford CS230 | Autumn 2025 | Lecture 3: Full Cycle of a DL project
CHAPTERS
Why AI projects differ from traditional software: code + unpredictable data
Kian frames deep learning projects as fundamentally different from traditional software because performance depends heavily on data, which is rich and hard to fully understand upfront. This uncertainty makes ML development inherently iterative and discovery-driven.
The end-to-end lifecycle of a DL project (using face recognition as the running example)
The lecture outlines the full cycle: specify the problem, get data, design and train models through rapid iteration, deploy, then monitor and maintain. The aim is to widen the focus beyond “just model training” to the real work of shipping reliable systems.
Problem specification: face recognition for door access and key-card verification
Kian describes practical face-recognition deployments: unlocking doors and verifying that a swiped key card matches the person holding it. The use case clarifies the operational goal and the constraints of a security setting.
Modeling approach: Siamese networks and “same person vs different person” matching
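The lecture briefly explains the standard face-recognition architecture: a Siamese network, which runs two images through the same (weight-shared) encoder and compares the outputs to decide whether they show the same person. Because the model learns to compare faces rather than to classify a fixed set of identities, new users can be onboarded from registration photos without retraining for each household or office.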
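To make the matching step concrete, here is a minimal sketch of the verification logic that sits on top of a Siamese-style encoder. It assumes each face has already been mapped to an embedding vector by a trained network (not shown); the function names, the use of Euclidean distance on L2-normalized embeddings, and the threshold of 0.7 are illustrative assumptions, not details from the lecture.

```python
import math
from typing import Dict, List, Optional, Sequence

Embedding = Sequence[float]


def l2_normalize(v: Embedding) -> List[float]:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def distance(a: Embedding, b: Embedding) -> float:
    """Euclidean distance between L2-normalized embeddings (range 0 to 2)."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a_n, b_n)))


def is_same_person(a: Embedding, b: Embedding, threshold: float = 0.7) -> bool:
    """Verification: two faces match if their embeddings are close enough."""
    return distance(a, b) < threshold


def identify(query: Embedding,
             gallery: Dict[str, List[Embedding]],
             threshold: float = 0.7) -> Optional[str]:
    """Return the closest registered person, or None if nobody is close enough.

    Onboarding a new user only means adding their registration embeddings
    to `gallery`; the encoder is not retrained.
    """
    best_name, best_dist = None, float("inf")
    for name, refs in gallery.items():
        for ref in refs:
            d = distance(query, ref)
            if d < best_dist:
                best_name, best_dist = name, d
    return best_name if best_dist < threshold else None


if __name__ == "__main__":
    # Toy 3-D "embeddings" standing in for real encoder outputs (hypothetical).
    gallery = {"alice": [[1.0, 0.0, 0.0]], "bob": [[0.0, 1.0, 0.0]]}
    print(identify([0.9, 0.1, 0.0], gallery))  # prints "alice"
    print(identify([0.0, 0.0, 1.0], gallery))  # prints "None" (unknown face)
```

In practice the threshold would be tuned on a validation set to trade false accepts against false rejects, a trade-off that matters more than usual in a security setting.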
Data acquisition under constraints: “no downloading from the internet” brainstorming
Students propose ways to gather training data legally and quickly, from opt-in camera capture to leveraging communities like Stanford. The discussion highlights practicality, recruiting users, and realistic timelines for a small startup.
Guiding principle: optimize for speed of execution and short iteration loops
Kian argues that speed is often the best predictor of success—especially for startups—because you can’t predict what data issues matter until you train and inspect failures. He recommends time-boxing data collection to days, not months, when model training is fast.
From model errors to data-centric improvements: targeted data beats “more of everything”
Kian explains how error analysis guides what data to collect next—e.g., if the model fails on hats, gather more “hats” examples. He criticizes indiscriminate data collection and connects targeted data acquisition to how frontier LLM teams improve capabilities like coding.
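Error analysis of this kind is usually a manual tagging pass followed by a simple tally. The sketch below assumes a reviewer has labeled each misclassified example with free-form tags such as `hat` or `low_light` (hypothetical labels); it ranks tags by the share of errors they appear in, which suggests where targeted data collection would pay off most.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def rank_error_buckets(error_tags: Sequence[Sequence[str]]) -> List[Tuple[str, float]]:
    """Rank failure categories by the fraction of misclassified examples they touch.

    Each entry in `error_tags` holds the tags a reviewer assigned to one
    misclassified example; an example may carry several tags or none.
    """
    n = len(error_tags)
    if n == 0:
        return []
    # set() so a tag repeated on one example is only counted once for it.
    counts = Counter(tag for tags in error_tags for tag in set(tags))
    # Sort by descending share, then alphabetically for a stable tie-break.
    return sorted(((tag, c / n) for tag, c in counts.items()),
                  key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    reviewed = [["hat"], ["hat", "low_light"], ["blur"], ["hat"], []]
    for tag, share in rank_error_buckets(reviewed):
        print(f"{tag}: {share:.0%} of errors")
```

A tag that appears in most errors is a stronger candidate for targeted collection than a rare one, though the decision should also weigh how often that condition actually occurs in deployment.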
How similar must training data be to deployment data? Bigger models tolerate mismatch
The class discusses how closely the training distribution must match the target task. Kian notes that modern large neural networks can absorb some irrelevant-but-not-wrong data without hurting performance, which makes perfect distribution matching less of a priority than it used to be.
Deployment realities: building a system, not just a model
Kian shifts from training to deployment, emphasizing the software engineering required to run inference reliably. He highlights cost constraints: streaming video continuously to the cloud is often infeasible, requiring edge-side filtering.
Edge filtering via Visual Activity Detection (VAD): two implementation options
To reduce cloud compute and bandwidth, Kian introduces VAD as a cheap front end that decides whether a person is present before invoking full face recognition. He presents two approaches: a simple pixel-change heuristic or a small trained neural network.
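The pixel-change option can be sketched as frame differencing: count how many pixels changed noticeably between consecutive grayscale frames and fire when that fraction is large enough. Frames here are plain nested lists to keep the example dependency-free, and both thresholds (`pixel_delta=25`, `min_fraction=0.02`) are hypothetical values that would need tuning on real footage.

```python
from typing import Sequence

# A grayscale frame as rows of 0..255 intensities (simplified representation).
Frame = Sequence[Sequence[int]]


def changed_fraction(prev: Frame, curr: Frame, pixel_delta: int = 25) -> float:
    """Fraction of pixels whose intensity moved by more than `pixel_delta`."""
    total = changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += 1
            if abs(c - p) > pixel_delta:
                changed += 1
    return changed / total if total else 0.0


def activity_detected(prev: Frame, curr: Frame,
                      pixel_delta: int = 25, min_fraction: float = 0.02) -> bool:
    """Heuristic VAD: report activity if enough of the image changed."""
    return changed_fraction(prev, curr, pixel_delta) >= min_fraction
```

The weakness is that anything that moves triggers it, including swaying branches or lighting changes, which is exactly the kind of failure that could later justify training a small neural detector.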
Choosing what to build first: implement fast, learn from real data, then upgrade
Kian recommends starting with the simplest thing that can be deployed quickly (pixel-change VAD), then using observed failures to justify a more complex model. He stresses that many open questions, such as how wind, background motion, or subject distance affect detection, are best answered empirically through deployment.
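One way to keep the upgrade path cheap is to hide the detector behind a single function signature, so the pipeline does not care whether a heuristic or a small neural network makes the call. In this sketch, `mean_abs_diff_detector`, `run_edge_gate`, and the recognizer callback are hypothetical names; the recognizer stands in for the expensive cloud face-recognition request.

```python
from typing import Callable, List, Sequence

Frame = Sequence[Sequence[int]]
# Any detector with this signature can be plugged in: v1 heuristic, later a small net.
Detector = Callable[[Frame, Frame], bool]
# Stand-in for the expensive (e.g. cloud) face-recognition call.
Recognizer = Callable[[Frame], str]


def mean_abs_diff_detector(prev: Frame, curr: Frame, threshold: float = 8.0) -> bool:
    """First-version detector: mean absolute pixel difference between frames."""
    diffs = [abs(c - p) for rp, rc in zip(prev, curr) for p, c in zip(rp, rc)]
    return bool(diffs) and sum(diffs) / len(diffs) > threshold


def run_edge_gate(frames: Sequence[Frame], detector: Detector,
                  recognize: Recognizer) -> List[str]:
    """Forward only the frames flagged by `detector` to the costly recognizer."""
    results: List[str] = []
    for prev, curr in zip(frames, frames[1:]):
        if detector(prev, curr):
            results.append(recognize(curr))
    return results
```

Swapping in a learned detector later means passing a different callable with the same `(prev, curr) -> bool` signature; the gating loop and everything downstream stay unchanged.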
Frame quality selection: discovering hidden bottlenecks (blurry vs in-focus frames)
A concrete deployment insight: many frames of an approaching face are blurry, and choosing a handful of sharp frames can significantly boost recognition accuracy. This illustrates how system-level improvements emerge only after building and examining end-to-end behavior.
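A common way to score focus is the variance of the Laplacian: sharp frames have strong edges and therefore a widely varying second-derivative response, while blurry frames do not. The lecture does not say how frames were scored, so this focus measure, the 4-neighbour kernel, and the default `k=3` are assumptions; the sketch ranks frames by sharpness and keeps the top k in their original time order.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # grayscale intensities, rows of equal length


def laplacian_variance(frame: Frame) -> float:
    """Focus score: variance of a 4-neighbour Laplacian over interior pixels."""
    h = len(frame)
    w = len(frame[0]) if h else 0
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (frame[y - 1][x] + frame[y + 1][x]
                   + frame[y][x - 1] + frame[y][x + 1]
                   - 4 * frame[y][x])
            responses.append(lap)
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def select_sharpest(frames: Sequence[Frame], k: int = 3) -> List[int]:
    """Indices of the k highest-scoring frames, returned in time order."""
    ranked = sorted(range(len(frames)),
                    key=lambda i: laplacian_variance(frames[i]),
                    reverse=True)
    return sorted(ranked[:k])
```

Only the selected frames would then be sent to the recognizer, which can also cut the number of expensive recognition calls per approach.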
Monitoring & maintenance: data drift, concept drift, and production accountability
After deployment the world changes (seasons, locations, behavior), and the resulting drift can break a model even if it performed well on its test set. Kian argues that engineers must own real-world performance, not just benchmark metrics, and build monitoring systems to detect and fix issues.
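One simple way to watch for data drift is to compare the distribution of a production signal, for example the recognizer's match distances or confidence scores, against a reference window using the Population Stability Index (PSI). PSI is a technique chosen here for illustration rather than one named in the lecture; the bin edges, the `eps` smoothing constant, and any alert threshold are assumptions to tune.

```python
import bisect
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values per bin; `edges` is sorted and defines len(edges)-1 bins."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        i = min(max(i, 0), n_bins - 1)  # clip out-of-range values into the end bins
        counts[i] += 1
    n = len(values)
    return [c / n for c in counts] if n else [0.0] * n_bins


def psi(reference: Sequence[float], live: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between a reference window and a live window."""
    total = 0.0
    for e, a in zip(bin_fractions(reference, edges), bin_fractions(live, edges)):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total
```

A rising PSI flags that inputs or scores have shifted (data drift). Detecting concept drift, where the mapping from inputs to correct outputs changes, generally also needs fresh labels, such as user reports of wrongly rejected entries.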
Operational dashboards and alarms: metrics, privacy, and iterating on what to monitor
Kian recommends collecting limited, permissioned data and building many dashboards early because it’s hard to know in advance which metrics will matter. Teams should brainstorm failure modes, track a wide set of signals, then prune to the most useful and add alerting thresholds.
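Once a team has pruned its dashboards to the most useful metrics, the alerting layer can be as simple as min/max thresholds per metric. The sketch below is a hypothetical minimal version; the metric names and threshold values in the usage example are invented for illustration, and treating a missing metric as an alert is a design choice, since a silent data pipeline is itself a failure mode.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AlertRule:
    metric: str
    min_value: Optional[float] = None  # alert if the value drops below this
    max_value: Optional[float] = None  # alert if the value rises above this


def check_alerts(snapshot: Dict[str, float], rules: List[AlertRule]) -> List[str]:
    """Compare one snapshot of dashboard metrics against threshold rules."""
    alerts: List[str] = []
    for rule in rules:
        value = snapshot.get(rule.metric)
        if value is None:
            # A metric that stopped reporting is treated as an alert in itself.
            alerts.append(f"{rule.metric}: missing")
            continue
        if rule.min_value is not None and value < rule.min_value:
            alerts.append(f"{rule.metric}={value} below {rule.min_value}")
        if rule.max_value is not None and value > rule.max_value:
            alerts.append(f"{rule.metric}={value} above {rule.max_value}")
    return alerts


if __name__ == "__main__":
    rules = [
        AlertRule("unlock_success_rate", min_value=0.95),  # hypothetical metric
        AlertRule("cloud_calls_per_hour", max_value=500),  # hypothetical metric
        AlertRule("camera_uptime", min_value=0.99),        # hypothetical metric
    ]
    print(check_alerts({"unlock_success_rate": 0.90, "cloud_calls_per_hour": 120}, rules))
```

Starting with loose thresholds on many metrics and tightening them as the team learns which signals actually precede incidents fits the brainstorm, track, then prune workflow described above.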