Pretraining, Fine-Tuning, and RLHF
A High School & College Primer on How Modern LLMs Are Built in Three Stages
You've heard that ChatGPT was "trained on the internet" — but what does that actually mean? And what's the difference between a raw model and the polished chatbot you talk to every day? If you've tried to find answers and hit a wall of jargon, this guide is for you.
**TLDR: Pretraining, Fine-Tuning, and RLHF** walks you through the three stages that turn a blank neural network into a working AI assistant. You'll learn how large language model training actually works — starting with next-token prediction on trillions of words, moving through supervised fine-tuning on human-written examples, and finishing with reinforcement learning from human feedback, which teaches the model to be helpful rather than just fluent. Each stage is explained with concrete numbers, plain language, and honest coverage of what can go wrong: hallucination, sycophancy, and reward hacking all get named and explained.
This guide is written for high school students, college freshmen, and anyone who wants a real mental model of modern AI — not a marketing summary and not a graduate-school textbook. It covers the same concepts taught in AI and machine learning courses, compressed into a focused read you can finish in an afternoon.
No calculus required. No prior AI background assumed. Just clear explanations of ideas that actually matter.
Pick it up, read it once, and walk into your next AI class or conversation ready to participate.
By the end, you'll be able to:
- Explain what a large language model is and what "next-token prediction" actually means
- Describe what happens during pretraining, including data, compute, and loss (a toy sketch of the loss follows this list)
- Distinguish supervised fine-tuning from pretraining and explain why instruction data matters
- Understand how RLHF uses a reward model and PPO to align model outputs with human preferences
- Recognize the limitations and failure modes of each stage (hallucination, reward hacking, sycophancy)
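
To give a feel for the level the book works at, here is a toy Python sketch of the pretraining objective: the model scores every word in its vocabulary, the scores become probabilities, and the loss is the penalty for how little probability went to the token that actually came next. The five-word vocabulary and every number here are invented for illustration, not taken from the book; a real model computes its scores with billions of learned weights.

```python
# Toy illustration of the pretraining objective. After reading some context,
# the model scores every word in its vocabulary; the loss is the negative log
# of the probability it gave to the token that actually came next.
import math

vocab = ["the", "cat", "sat", "on", "mat"]

# Hypothetical raw scores (logits) after reading "the cat sat on the".
# These five numbers are hand-picked for the example.
logits = [1.0, 0.2, 0.1, 0.3, 2.5]

# Softmax turns raw scores into a probability distribution over the vocabulary.
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# The text actually continues with "mat", so the cross-entropy loss is
# -log P("mat" | context). Lower loss means a better prediction.
target = vocab.index("mat")
loss = -math.log(probs[target])

print(f"P('mat' | context) = {probs[target]:.3f}")  # about 0.656
print(f"cross-entropy loss = {loss:.3f}")           # about 0.422
```

Pretraining is essentially this calculation repeated across trillions of tokens, with the model's weights nudged after each batch to make the loss a little smaller.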
Inside, five chapters:
- 1. **What an LLM Actually Is.** Sets up the core object: a neural network that predicts the next token, and why that simple task scales into something that looks like reasoning.
- 2. **Stage 1: Pretraining on the Internet.** Covers the first and most expensive stage — training on trillions of tokens of text to learn language, facts, and patterns through cross-entropy loss.
- 3. **Stage 2: Supervised Fine-Tuning.** Explains how a base model becomes an instruction-follower by training on curated prompt-response pairs written by humans.
- 4. **Stage 3: RLHF and the Reward Model.** Walks through reinforcement learning from human feedback — collecting preference rankings, training a reward model (sketched in toy form below), and optimizing the LLM with PPO.
- 5. **What Goes Wrong and What Comes Next.** Surveys the known failure modes of each stage (hallucination, sycophancy, reward hacking) and previews newer methods like DPO, RLAIF, and Constitutional AI.
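
And a matching taste of chapter 4: a toy version of the reward-model step in RLHF. Human labelers rank two answers to the same prompt, and the reward model is trained so the preferred answer gets the higher score. The scores below are invented for illustration, and the loss shown is the standard pairwise (Bradley-Terry) formulation — one common choice, not necessarily the book's exact recipe.

```python
# Toy version of the reward-model step in RLHF. Labelers rank two answers to
# the same prompt; the reward model is trained so the preferred ("chosen")
# answer scores higher than the other ("rejected") one.
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Pairwise (Bradley-Terry) loss: -log sigmoid(chosen - rejected).
    Small when the preferred answer already scores higher, large otherwise."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Reward model agrees with the human ranking -> low loss.
print(preference_loss(score_chosen=2.0, score_rejected=-1.0))  # about 0.049

# Reward model disagrees -> high loss, so training pushes its scores
# toward the human ordering.
print(preference_loss(score_chosen=-1.0, score_rejected=2.0))  # about 3.049
```

Once a reward model like this exists, PPO uses its scores as the training signal for the LLM itself, which is exactly where the reward hacking covered in chapter 5 can creep in.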