Stanford CS230 | Autumn 2025 | Lecture 1: Introduction to Deep Learning

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: https://stanford.io/ai

September 23, 2025

This lecture covers:
1. Class introduction
2. Examples of deep learning projects
3. Course details

To learn more about enrolling in this course, visit: https://online.stanford.edu/courses/cs230-deep-learning
To follow along with the course schedule and syllabus, visit: https://cs230.stanford.edu/syllabus/
More lectures will be published regularly. View the playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNRRGdS0rBbXOUGA0wjdh1X

Andrew Ng
Founder of DeepLearning.AI
Adjunct Professor, Stanford University’s Computer Science Department

Kian Katanforoosh
CEO and Founder of Workera
Adjunct Lecturer, Stanford University’s Computer Science Department

Andrew Ng (host)

Oct 1, 2025 | 1h 0m

CHAPTERS

0:05 – 2:07
Course format: flipped classroom, discussion-first sessions
Andrew Ng opens by welcoming new Stanford students and explaining CS230’s flipped classroom structure. Most lecture content is delivered via high-quality online videos, while in-person time is reserved for deeper discussion and Q&A.
- Flipped format: watch lectures online, use class for discussion
- Encouragement to ask questions frequently in class
- Class sessions typically run ~80 minutes despite the longer scheduled slot
- Goal: maximize the value of in-person time rather than repeating lectures
2:07 – 4:40
Why deep learning took over: scaling data, model size, and compute
Ng describes the core reason deep learning surged: it keeps improving as you scale data and neural network size, unlike many traditional ML methods that plateau. He frames deep learning as a practical recipe for absorbing massive datasets to achieve exceptional performance.
- Traditional ML often plateaus with more data
- Neural networks keep improving with larger models and more data
- Deep learning’s success tied to scalable training recipes
- Performance gains correlate with model size and compute
4:40 – 6:41
Early GPU/Stanford foundations and the rise of scaling laws
He shares historical context on Stanford’s early GPU/CUDA work and how it enabled training larger networks. He then connects this to the idea of predictable performance improvements (scaling laws) that fueled major industry investment in large models.
- CUDA/GPU training was controversial early on but transformative
- Stanford’s early work helped accelerate deep learning scaling
- Baidu paper: predictability of gains when scaling networks
- OpenAI scaling laws popularized forecasting performance vs compute/data
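To illustrate what "predictable gains" can mean in practice, here is a minimal sketch, assuming loss follows a power law in compute, loss = a * compute^(-b). The summary does not state the functional form used in the Baidu or OpenAI work, so the power-law form, the helper name `fit_power_law`, and the synthetic data points are all assumptions made for illustration.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**(-b), done as a straight line in log-log space."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    slope = (
        sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
        / sum((u - mean_x) ** 2 for u in log_x)
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

# Synthetic, made-up measurements generated from a = 5.0, b = 0.1 (not real results).
compute = [1e3, 1e4, 1e5, 1e6]
loss = [5.0 * c ** -0.1 for c in compute]

a, b = fit_power_law(compute, loss)

# Extrapolate one order of magnitude beyond the largest measured run.
predicted_loss = a * 1e7 ** (-b)
```

With real measurements the points would be noisy, so the fit gives a forecast with some error rather than an exact value.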
6:41 – 12:03
Where CS230 sits in the AI stack: CS fundamentals → ML → DL → GenAI
Ng maps deep learning’s place within computer science and AI: CS fundamentals support ML, deep learning is a powerful subset of ML, and generative AI largely builds on deep learning (especially transformers). He emphasizes that understanding fundamentals improves results even in the era of AI-assisted coding.
- CS fundamentals remain critical even with AI coding tools
- Deep learning and neural networks used interchangeably in this course
- GenAI (often transformers) builds on deep learning foundations
- Prompting alone often isn’t enough; deeper DL work is sometimes required
12:03 – 15:13
Prereqs and course choices: CS129 vs CS229 vs CS230
In response to student questions, Ng clarifies that machine learning isn’t a strict prerequisite, though the first weeks can move fast for beginners. He compares common Stanford entry points—CS129 (gentler, applied), CS229 (fast, mathematical), and CS230 (applied deep learning focus)—and notes CS229+CS230 can be taken together with limited overlap.
- ML not required, but helpful for early weeks
- CS129: gentler introduction, more time on basics
- CS229: highly mathematical/theoretical, broader ML survey
- CS230: applied deep learning focus; low overlap with CS229
15:13 – 19:04
What the course will and won’t cover: practical DL over cutting-edge LLM training
Ng explains that CS230 will cover transformer foundations and practical workflows (e.g., fine-tuning), but not the very latest frontier model variants or training massive LLMs from scratch. He argues most industry roles build applications and adapt models rather than train the largest models end-to-end.
- Transformers covered; newest LLM variants not a core focus
- Most jobs are application-building, not training frontier LLMs
- Fine-tuning pretrained models and data engineering are common in practice
- Course is “math-light” and oriented toward building working systems
19:04 – 23:06
When to use GenAI prompting vs deep learning: modality and cost constraints
Ng contrasts text-centric LLM prompting with situations where teams drop down to deep learning—especially for images/video/audio, structured data, performance limits, or cost control. He highlights a common trajectory: prompting works early, but fine-tuning or smaller models become important as usage and bills grow.
- LLMs excel for text; multimodal is improving but not universal
- Deep learning often needed for audio, vision, and structured tables
- Prompt tuning hits limits; model adaptation may be required
- Cost at scale can force smaller models/fine-tuning to “bend the cost curve”
23:06 – 24:07
CS230 curriculum overview: five modules from basics to transformers
He lays out the course’s five-module structure, starting with building neural networks from scratch in Python and progressing to tuning, project strategy, CNNs for vision, and sequence models including transformers. The emphasis is on practical skill-building and an actionable “tool chest.”
- Module 1: neural network/deep learning basics; implement from scratch
- Module 2: improving/tuning networks; key hyperparameters
- Project strategy: disciplined ML development and diagnostics
- CNNs for vision; sequence models leading into transformers
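As a rough idea of what "implement from scratch" can look like, here is a minimal sketch of a single sigmoid neuron trained with batch gradient descent in plain Python. It is not taken from the course materials: the function names, the OR-gate toy data, the learning rate, and the epoch count are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(examples, learning_rate=0.5, epochs=2000):
    """Train one sigmoid neuron with two inputs using batch gradient descent."""
    w1, w2, b = 0.0, 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in examples:
            a = sigmoid(w1 * x1 + w2 * x2 + b)  # forward pass
            error = a - y  # gradient of cross-entropy loss w.r.t. the pre-activation
            g1 += error * x1
            g2 += error * x2
            gb += error
        # Average the gradients over the batch and take one descent step.
        w1 -= learning_rate * g1 / n
        w2 -= learning_rate * g2 / n
        b -= learning_rate * gb / n
    return w1, w2, b

def predict(params, x1, x2):
    w1, w2, b = params
    return sigmoid(w1 * x1 + w2 * x2 + b) > 0.5

# Toy dataset (hypothetical): the logical OR of two binary inputs.
or_gate = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
params = train_neuron(or_gate)
predictions = [predict(params, x1, x2) for (x1, x2), _ in or_gate]
```

A full network stacks many such units into layers and backpropagates the error through them, but the forward pass, gradient, and update steps follow the same pattern.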
24:07 – 30:14
Hyperparameter tuning and disciplined ML project development
Ng stresses that practical progress in deep learning often hinges on effective hyperparameter tuning and systematic project workflow. He warns against hype-driven decisions (e.g., collecting data or buying GPUs without diagnosis) and previews in-class exercises on making better engineering choices.
- Hyperparameter tuning is a real differentiator in training success
- Complex systems require disciplined debugging and prioritization
- Avoid random/hype-driven tactics like “just collect more data”
- Diagnostics guide whether to get data, compute, model changes, etc.
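One common way to turn "diagnose before you act" into code is a bias/variance check that compares training error, dev-set error, and a target error level. The sketch below is an illustrative heuristic, not necessarily the procedure taught in the lecture; the function name `suggest_next_step`, the `gap_tolerance` threshold, and the example error rates are all hypothetical.

```python
def suggest_next_step(train_error, dev_error, target_error, gap_tolerance=0.02):
    """Suggest where to invest effort based on simple error gaps (heuristic only)."""
    avoidable_bias = train_error - target_error  # how far training is from the target
    variance = dev_error - train_error           # how much worse the dev set is
    if avoidable_bias > gap_tolerance and avoidable_bias >= variance:
        return "high bias: try a bigger model or longer training"
    if variance > gap_tolerance:
        return "high variance: try more data or regularization"
    return "close to target: look at error analysis or the data distribution"

# Hypothetical error rates for three different projects.
underfit = suggest_next_step(train_error=0.15, dev_error=0.16, target_error=0.02)
overfit = suggest_next_step(train_error=0.03, dev_error=0.12, target_error=0.02)
healthy = suggest_next_step(train_error=0.03, dev_error=0.04, target_error=0.02)
```

The point of the sketch is that "collect more data" is only suggested in the high-variance case; in the high-bias case more data would likely be wasted effort.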
30:14 – 32:52
Breadth of applications and cross-campus collaboration opportunities
He motivates students by describing how deep learning skills transfer across many domains—from ads and search to fraud detection, shipping efficiency, and climate science. AI practitioners can collaborate widely because many fields have data and problems that benefit from modern ML techniques.
- Deep learning unlocks work across diverse industries and domains
- Examples: speech, search, e-commerce, safety, fraud, energy efficiency
- Fraud detection as a high-stakes, rapid-response ML environment
- Opportunities to collaborate with non-CS departments at Stanford
32:52 – 35:14
Data requirements: when you have ‘enough’ data is often unknowable upfront
Ng answers a question about how much data neural networks need, explaining it depends heavily on the task and what’s in the dataset. For new “greenfield” problems, he advises starting with a small dataset and training quickly to learn what’s required.
- Data needs vary widely by domain and problem complexity
- Prior experience/literature can provide rough heuristics
- Greenfield projects require experimentation to estimate needs
- Sometimes small data works; other times massive data still isn’t enough
35:14 – 36:29
Defining GenAI vs deep learning: GenAI as a subset centered on generation
He clarifies the terminology: generative AI is a subset of deep learning focused on generating text (and increasingly images/audio), largely powered by transformers trained on large-scale data. Deep learning is broader and includes many non-generative applications and model families.
- Terms can be fuzzy; GenAI sits within deep learning
- GenAI center of gravity: text generation, expanding to other modalities
- Transformers dominate modern LLM-based GenAI
- Examples: ChatGPT, Claude, Gemini, LLaMA
36:29 – 44:14
AI-assisted coding: prototypes vs production, and why building faster matters
Ng discusses how AI coding tools dramatically speed up quick prototypes while production systems require more caution (e.g., migrations, reliability). He argues that rapid prototyping enables faster feedback about data and user needs, and reframes “move fast and break things” into “move fast and be responsible.”
- AI-assisted coding boosts individual productivity, especially for prototyping
- Production use needs careful oversight (reliability, migrations, security)
- Fast iteration reveals surprises in real-world data and user behavior
- Responsible speed helps find and fix issues sooner
44:14 – 56:23
Careers and skills: why learning to code (and AI) is more important, not less
Ng argues that advice to stop learning coding because AI will automate it is historically wrong—easier coding leads to more builders. He also explains job market shifts: demand is rising for AI-enabled developers, while “pre-GenAI” skillsets are less competitive; fundamentals plus AI fluency create the strongest engineers.
- History: each step that makes coding easier increases demand for coders
- Companies want AI-enabled engineers; ‘code-by-hand only’ is fading
- Strongest candidates combine experience, AI fluency, and CS fundamentals
- Employers may still struggle to interview/hire effectively for GenAI roles
56:23 – 1:00:16
What it means to ‘know GenAI’: tooling stack beyond prompting + course comparison wrap-up
Ng describes GenAI competence as more than typing prompts: it includes AI coding workflows and a growing toolbox (RAG, vector DBs, evals, guardrails, fine-tuning, agentic workflows). He closes with guidance on balancing CS229 vs CS230: 229 is broader and more mathematical, while 230 goes deeper into practical deep learning.
- GenAI skill beyond prompting: RAG, vector DBs, evals, guardrails, agents
- Fine-tuning and multimodal workflows as applied GenAI skills
- CS107/111-style fundamentals remain valuable foundations
- CS229: broad/theoretical ML; CS230: practical deep learning depth
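For readers unfamiliar with the retrieval step behind RAG and vector databases, here is a minimal sketch of similarity search in plain Python. It assumes a toy "embedding" made of word counts as a stand-in for a learned embedding model, and an in-memory list as a stand-in for a vector database; the function names and the sample documents are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a learned embedding model)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(u[word] * v[word] for word in u if word in v)
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vector, embed(d)), reverse=True)
    return ranked[:k]

# Hypothetical knowledge base.
docs = [
    "fine-tuning adapts a pretrained model to a new task",
    "guardrails filter unsafe model outputs",
    "evals measure how well a system performs",
]

top = retrieve("how does fine-tuning adapt a pretrained model", docs)
```

In a RAG pipeline, the retrieved text would then be placed into the prompt so the model can answer using it; that generation step is omitted here.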