SOLID STATE PRESS
Coming soon to Amazon
This title is in our publishing queue.
Artificial Intelligence

Classical ML: Decision Trees, Random Forests, and k-Means

A High School & College Primer on the Algorithms That Still Matter

You just hit a unit on machine learning and the textbook reads like a research paper. The lecture moved fast. The exam is coming. What you need is someone to sit down with you and explain how these algorithms actually work — no fluff, no PhD prerequisites.

This TLDR guide covers the three classical ML algorithms that show up everywhere: **decision trees**, **random forests**, and **k-means clustering**. You will learn how a decision tree picks its splits using Gini impurity, why combining hundreds of weak trees into a random forest beats any single perfect tree, and how k-means finds hidden structure in data without any labels at all. Each algorithm is walked through with concrete numbers and worked examples, not hand-waving.

This machine learning guide for beginners is written for US high school students (grades 9–12) and college freshmen or sophomores taking their first AI, data science, or computer science course. It also covers the evaluation layer — train/test splits, confusion matrices, accuracy's blind spots — so you know not just how each algorithm works but how to judge whether it's working well and when to reach for which tool.

At roughly 15 pages, it is short by design. Every sentence earns its place. Parents helping a student, tutors prepping a session, and self-studiers who want a focused machine learning study guide without wading through a 600-page textbook will all find it useful.

Pick it up, read it once, and walk into class ready.

What you'll learn
  • Distinguish supervised from unsupervised learning and identify which algorithms fit each setting
  • Build a decision tree by hand using Gini impurity or information gain and explain how splits are chosen
  • Explain how bagging and feature randomness turn weak trees into a strong random forest
  • Run the k-means algorithm by hand on a small dataset and recognize when it fails
  • Evaluate models using train/test splits, accuracy, and the elbow method, and recognize overfitting
What's inside
  1. What Classical ML Is (and What It Isn't)
    Orients the reader to supervised vs. unsupervised learning, situates the three algorithms, and contrasts classical ML with deep learning.
  2. Decision Trees: Splitting Your Way to an Answer
    Walks through how a decision tree is built one split at a time, using Gini impurity and information gain on a worked example.
  3. Random Forests: Why Many Weak Trees Beat One Strong One
    Explains bagging, feature subsampling, and majority voting, and why ensembling reduces variance without much added bias.
  4. k-Means: Finding Groups Without Labels
    Steps through the k-means algorithm on a small 2D dataset, covers initialization issues, and shows how to pick k with the elbow method.
  5. Evaluating, Comparing, and Choosing
    Covers how to actually judge these models in practice: train/test splits, accuracy and its limits, confusion matrices, and when to reach for which algorithm.
Published by Solid State Press
TLDR STUDY GUIDES

Classical ML: Decision Trees, Random Forests, and k-Means

A High School & College Primer on the Algorithms That Still Matter
Solid State Press

Who This Book Is For

If you're taking an intro to machine learning course in high school or college, preparing for a data science unit in AP Computer Science Principles, or just trying to make sense of AI concepts as a college freshman encountering them for the first time, this book is for you. It's also useful for self-studiers who keep hearing terms like "random forest" or "clustering" and want a straight answer about what they actually mean.

This guide covers the three classical machine learning algorithms that show up most often in coursework and interviews: decision trees and random forests, explained simply enough to build real intuition, plus k-means clustering for students who have never encountered unsupervised learning before. Along the way, you'll get a clear primer on supervised vs. unsupervised learning and learn how to evaluate and compare models. About 15 pages, no filler.

Read straight through once, then work every numbered example by hand. Finish with the practice problems at the end — that's where the ideas stick.

Contents

  1. What Classical ML Is (and What It Isn't)
  2. Decision Trees: Splitting Your Way to an Answer
  3. Random Forests: Why Many Weak Trees Beat One Strong One
  4. k-Means: Finding Groups Without Labels
  5. Evaluating, Comparing, and Choosing
Chapter 1

What Classical ML Is (and What It Isn't)

Every useful machine learning algorithm starts with the same question: what do you already know, and what are you trying to find out? The answer to that question determines almost everything else — which algorithm to reach for, how to set up your data, and how to know if your model worked.

Machine learning is the practice of training a computer program to make predictions or find patterns by exposing it to data, rather than by writing explicit rules. Instead of coding "if the email contains the word 'prize' and comes from an unknown sender, mark it spam," you show the program thousands of labeled emails and let it figure out the rules itself.
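To make the contrast concrete, here is a minimal sketch in Python (scikit-learn assumed; the tiny dataset and the "prize"/sender features are invented for illustration, not taken from the book):

```python
from sklearn.tree import DecisionTreeClassifier

# Hand-written rule: a human encodes the logic explicitly.
def spam_by_rules(email):
    return "prize" in email["text"].lower() and not email["sender_known"]

print(spam_by_rules({"text": "Claim your PRIZE now", "sender_known": False}))  # True

# Learned rule: show the algorithm labeled examples and let it infer the logic.
# Features per email: [contains_prize, sender_known]; labels: 1 = spam, 0 = not.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 0, 0, 0]
model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[1, 0]]))  # -> [1]; the tree recovered the rule from data
```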

Supervised vs. Unsupervised Learning

The single most important split in all of machine learning is between supervised learning and unsupervised learning. Everything else follows from it.

In supervised learning, every training example comes with a correct answer attached. You give the algorithm features — the measurable properties of each example — alongside a label — the correct output you want the model to predict. A feature might be a patient's age, blood pressure, and cholesterol level. The label might be whether that patient developed heart disease within five years. The algorithm's job is to learn a mapping from features to labels so it can predict the label for new patients it has never seen.
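Here is what that setup looks like in code, a minimal sketch assuming scikit-learn, with invented numbers standing in for the patient example above:

```python
from sklearn.tree import DecisionTreeClassifier

# Features: [age, blood pressure, cholesterol] for each patient (made-up values).
X = [
    [63, 145, 233],
    [41, 120, 180],
    [57, 150, 260],
    [29, 110, 170],
]
# Labels: 1 = developed heart disease within five years, 0 = did not.
y = [1, 0, 1, 0]

model = DecisionTreeClassifier().fit(X, y)  # learn the features -> label mapping
print(model.predict([[50, 140, 250]]))      # predict for a patient never seen before
```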

Supervised learning splits further into two tasks. Classification means predicting a category — spam or not spam, malignant or benign, which digit (0–9) appears in an image. Regression means predicting a continuous number — tomorrow's temperature, a house's sale price, how many hours a battery will last. Both tasks use labeled training data; they just differ in what kind of output they're learning to produce.
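The same tree machinery handles both tasks; only the output type changes. A minimal sketch, again assuming scikit-learn and invented data:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[600], [900], [1500], [2200]]  # one feature: house size in square feet

# Classification: the label is a category.
clf = DecisionTreeClassifier().fit(X, ["small", "small", "large", "large"])
print(clf.predict([[1200]]))  # -> a category, e.g. "small" or "large"

# Regression: the label is a continuous number (sale price in $1000s).
reg = DecisionTreeRegressor().fit(X, [150, 210, 330, 450])
print(reg.predict([[1200]]))  # -> a number
```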


You've read the first half of Chapter 1. The complete book covers five chapters in roughly fifteen pages — readable in one sitting.
