Data Representation and Compression
Huffman Coding, UTF-8, and the Logic Behind JPEG and MP3 — A TLDR Primer
Your computer science class just hit the unit on data representation, and suddenly you're staring at binary, hexadecimal, Huffman trees, and perceptual coding — all at once. Or maybe you're a parent trying to help your student make sense of why a JPEG looks different from a PNG, or why an MP3 is smaller than a WAV file. This guide was written for exactly that moment.
**TLDR: Data Representation and Compression** is a focused, no-filler guide that walks you through how computers encode text, images, and sound into bits — and how compression makes those bits smaller without destroying what matters. You'll start with binary and hexadecimal (the actual foundation everything else rests on), move through ASCII and Unicode character encoding, and then into how RGB pixels and audio sampling work. From there, the guide covers lossless compression schemes like run-length encoding and Huffman coding, then explains how JPEG and MP3 use perceptual coding to discard what human eyes and ears won't miss anyway.
This book is for high school students in AP Computer Science or a digital literacy course, college freshmen in an intro CS or IT program, and anyone who wants to understand how computers encode text, images, and sound without wading through a 500-page textbook. Every section leads with the key idea, uses concrete worked numbers, and calls out the misconceptions students most often bring into an exam.
If you need to get oriented fast, this is the guide to read first.
- Convert numbers between binary, decimal, and hexadecimal and explain why computers use binary
- Describe how characters, images, and audio are encoded as bits using ASCII, Unicode, RGB pixels, and PCM samples
- Distinguish lossless from lossy compression and identify when each is appropriate
- Trace through a simple Huffman or run-length encoding example and compute a compression ratio
- Explain at a high level why JPEG, MP3, and ZIP work and what tradeoffs they make
- 1. **Bits, Bytes, and Why Binary.** Introduces binary as the foundation of all digital data, with conversions between binary, decimal, and hexadecimal.
- 2. **Encoding Text: ASCII and Unicode.** How characters become numbers, from 7-bit ASCII to UTF-8 and the handling of emoji and non-English scripts.
- 3. **Encoding Images and Sound.** How pixels with RGB values represent images and how audio is sampled into PCM, including resolution and bit depth tradeoffs.
- 4. **Lossless Compression: Huffman and Run-Length Encoding.** Walks through two classic lossless schemes that exploit redundancy without throwing information away.
- 5. **Lossy Compression: JPEG, MP3, and Perceptual Coding.** Explains how JPEG and MP3 discard information humans cannot easily perceive, and why this lets files shrink dramatically.
- 6. **Why It Matters: Storage, Networks, and Tradeoffs.** Connects representation and compression to real-world concerns like streaming, archival, and choosing the right file format.
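
For a feel of what tracing a run-length encoding example and computing a compression ratio can look like, here is a minimal Python sketch. It is an illustration only, not an excerpt from the guide: the function names, the sample string, and the storage model (1 byte per original character, 2 bytes per encoded run) are all assumptions made for this example.

```python
def rle_encode(text):
    """Run-length encode a string into a list of (character, count) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            # Same character as the current run: extend it.
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            # Different character: start a new run.
            runs.append((ch, 1))
    return runs


def rle_decode(runs):
    """Rebuild the original string, showing that no information was lost."""
    return "".join(ch * count for ch, count in runs)


def compression_ratio(text, runs):
    """Original size divided by encoded size.

    Assumed storage model: 1 byte per original character, and
    2 bytes per run (1 for the character, 1 for a count up to 255).
    """
    original_bytes = len(text)
    encoded_bytes = 2 * len(runs)
    return original_bytes / encoded_bytes


sample = "WWWWWWWWBBBWWWW"  # 8 W, 3 B, 4 W = 15 characters
runs = rle_encode(sample)

print(runs)                             # [('W', 8), ('B', 3), ('W', 4)]
print(rle_decode(runs) == sample)       # True
print(compression_ratio(sample, runs))  # 2.5
```

Under this storage model, 15 bytes shrink to 6, a ratio of 2.5. Data with few repeats, such as "ABCABC", would come out larger than the original, which is why run-length encoding only pays off when the input has long runs.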