STANFORD · AUTUMN 2025 · SELF-PACED

Master Transformers & Large Language Models

A self-paced companion to Stanford's CME 295. Nine lectures take you from word vectors and attention all the way to reasoning agents — with the original videos, distilled notes, and the math that makes it click.

9 Lectures
16h+ Video
40+ Topics
2017→25 · Attention → Agents
THE COURSE

One architecture, the whole modern AI stack

CME 295 traces the evolution of NLP, the core components of the Transformer, and how they scale into the large language models behind today's AI — blending theory with practical engineering. This hub reorganizes the public material into a clean, self-paced path.

OUTCOMES

What you'll be able to do

INTERACTIVE

See self-attention in action

Hover or tap any word to see how strongly it attends to every other word — the single operation at the heart of every Transformer.

Query: darker = more attention · line thickness = weight
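
For readers without the interactive view, here is a minimal sketch of the weights it visualizes, assuming the standard scaled dot-product form softmax(QK^T / sqrt(d_k)). The three tokens and their 2-dimensional embeddings are made up for illustration. As a simplification, the embeddings are used directly as both queries and keys, whereas a real Transformer first applies learned linear projections.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """Row i holds how strongly token i attends to every token j."""
    d_k = len(keys[0])
    scale = math.sqrt(d_k)
    weights = []
    for q in queries:
        # Scaled dot product between this query and every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights.append(softmax(scores))
    return weights

# Hypothetical tokens and toy embeddings (illustrative values only).
tokens = ["the", "cat", "sat"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Simplification: no learned projections, so queries == keys == embeddings.
weights = attention_weights(embeddings, embeddings)
for tok, row in zip(tokens, weights):
    print(tok, [round(w, 3) for w in row])
```

Each printed row sums to 1 and corresponds to hovering one word in the widget: larger entries are the words that query attends to most.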
CURRICULUM

Nine lectures, one continuous arc

Each lecture pairs the original Stanford recording with distilled notes, key formulas and takeaways. Click any card to dive in.

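COMPANION HANDBOOK · Chinese

A Probabilistic Perspective on Machine Learning

A rigorous Chinese-language derivation handbook covering the mathematical foundation beneath this course: why y is random, how each loss function falls out of a probabilistic assumption, the bias-variance tradeoff, optimizers, and on through to scaling laws.

10 · 200+ formulas · MLE → MAP → Scaling Law
Open the handbook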
INSTRUCTORS

Taught at Stanford

The original course is taught by the Amidi brothers, well known for the CS 229 cheatsheets that have helped millions of learners.

Afshine Amidi · Adjunct Lecturer
Shervine Amidi · Adjunct Lecturer

Fridays 3:30–5:20pm · Thornton 110 · Autumn 2025. This is an independent study hub, not affiliated with or endorsed by Stanford.

OFFICIAL RESOURCES

Go to the source

Pair this hub with the official material from the course authors.

Ready to begin?

Start with Lecture 1 and build your way up to reasoning agents. Your progress is saved automatically in your browser.

Enter the learning hub