Pretraining, Fine-Tuning, and RLHF
A High School & College Primer on How Modern LLMs Are Built in Three Stages
You've heard that ChatGPT was "trained on the internet" — but what does that actually mean? And what's the difference between a raw model and the polished chatbot you talk to every day? If you've tried to find answers and hit a wall of jargon, this guide is for you.
**TLDR: Pretraining, Fine-Tuning, and RLHF** walks you through the three stages that turn a blank neural network into a working AI assistant. You'll learn how large language model training actually works — starting with next-token prediction on trillions of words, moving through supervised fine-tuning on human-written examples, and finishing with reinforcement learning from human feedback, which teaches the model to be helpful rather than just fluent. Each stage is explained with concrete numbers, plain language, and honest coverage of what can go wrong: hallucination, sycophancy, and reward hacking all get named and explained.
This guide is written for high school students, college freshmen, and anyone who wants a real mental model of modern AI — not a marketing summary and not a graduate-school textbook. It covers the same concepts taught in AI and machine learning courses, compressed into a focused read you can finish in an afternoon.
No calculus required. No prior AI background assumed. Just clear explanations of ideas that actually matter.
Pick it up, read it once, and walk into your next AI class or conversation ready to participate.
By the end, you'll be able to:
- Explain what a large language model is and what "next-token prediction" actually means
- Describe what happens during pretraining, including data, compute, and loss (a toy sketch of the loss follows this list)
- Distinguish supervised fine-tuning from pretraining and explain why instruction data matters
- Understand how RLHF uses a reward model and PPO to align model outputs with human preferences
- Recognize the limitations and failure modes of each stage (hallucination, reward hacking, sycophancy)
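
To give a feel for the level the book works at, here is a toy Python sketch of the pretraining objective: the model scores every word in its vocabulary, the scores become probabilities, and the loss is the penalty for how little probability went to the token that actually came next. The five-word vocabulary and every number here are invented for illustration, not taken from the book; a real model computes its scores with billions of learned weights.

```python
# Toy illustration of the pretraining objective. After reading some context,
# the model scores every word in its vocabulary; the loss is the negative log
# of the probability it gave to the token that actually came next.
import math

vocab = ["the", "cat", "sat", "on", "mat"]

# Hypothetical raw scores (logits) after reading "the cat sat on the".
# These five numbers are hand-picked for the example.
logits = [1.0, 0.2, 0.1, 0.3, 2.5]

# Softmax turns raw scores into a probability distribution over the vocabulary.
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# The text actually continues with "mat", so the cross-entropy loss is
# -log P("mat" | context). Lower loss means a better prediction.
target = vocab.index("mat")
loss = -math.log(probs[target])

print(f"P('mat' | context) = {probs[target]:.3f}")  # about 0.656
print(f"cross-entropy loss = {loss:.3f}")           # about 0.422
```

Pretraining is essentially this calculation repeated across trillions of tokens, with the model's weights nudged after each batch to make the loss a little smaller.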
Inside, five chapters:
- 1. **What an LLM Actually Is.** Sets up the core object: a neural network that predicts the next token, and why that simple task scales into something that looks like reasoning.
- 2. **Stage 1: Pretraining on the Internet.** Covers the first and most expensive stage — training on trillions of tokens of text to learn language, facts, and patterns through cross-entropy loss.
- 3. **Stage 2: Supervised Fine-Tuning.** Explains how a base model becomes an instruction-follower by training on curated prompt-response pairs written by humans.
- 4. **Stage 3: RLHF and the Reward Model.** Walks through reinforcement learning from human feedback — collecting preference rankings, training a reward model (sketched in toy form below), and optimizing the LLM with PPO.
- 5. **What Goes Wrong and What Comes Next.** Surveys the known failure modes of each stage (hallucination, sycophancy, reward hacking) and previews newer methods like DPO, RLAIF, and Constitutional AI.
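
And a matching taste of chapter 4: a toy version of the reward-model step in RLHF. Human labelers rank two answers to the same prompt, and the reward model is trained so the preferred answer gets the higher score. The scores below are invented for illustration, and the loss shown is the standard pairwise (Bradley-Terry) formulation — one common choice, not necessarily the book's exact recipe.

```python
# Toy version of the reward-model step in RLHF. Labelers rank two answers to
# the same prompt; the reward model is trained so the preferred ("chosen")
# answer scores higher than the other ("rejected") one.
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Pairwise (Bradley-Terry) loss: -log sigmoid(chosen - rejected).
    Small when the preferred answer already scores higher, large otherwise."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Reward model agrees with the human ranking -> low loss.
print(preference_loss(score_chosen=2.0, score_rejected=-1.0))  # about 0.049

# Reward model disagrees -> high loss, so training pushes its scores
# toward the human ordering.
print(preference_loss(score_chosen=-1.0, score_rejected=2.0))  # about 3.049
```

Once a reward model like this exists, PPO uses its scores as the training signal for the LLM itself, which is exactly where the reward hacking covered in chapter 5 can creep in.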