How do we make files smaller, and what is the trade-off?
Understand why data is compressed, and the difference between lossy and lossless compression including run-length encoding and Huffman coding.
A focused answer to AQA GCSE Computer Science 3.3.8, covering why data is compressed and the difference between lossy and lossless compression, including run-length encoding and Huffman coding.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
AQA wants you to explain why data is compressed and to describe the difference between lossy and lossless compression, including the two named lossless methods, run-length encoding and Huffman coding.
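Why compress data
Compression reduces the number of bits needed to store a file. Smaller files take up less storage space, transfer more quickly over a network and use less bandwidth. This matters whenever files are streamed, downloaded or shared, for example music, video, web images and email attachments.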
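To make the transfer-time benefit concrete, here is a minimal arithmetic sketch. The figures (a 10 MB file compressed to 4 MB, sent over an 8 Mbit/s connection) are hypothetical, and it assumes decimal units (1 MB = 1,000,000 bytes, 1 byte = 8 bits).

```python
# Hypothetical figures, chosen only to illustrate the arithmetic.
BITS_PER_BYTE = 8
BYTES_PER_MB = 1_000_000  # assumption: decimal units, 1 MB = 1,000,000 bytes


def transfer_time_seconds(size_mb: float, speed_mbit_per_s: float) -> float:
    """Seconds needed to send size_mb megabytes at speed_mbit_per_s megabits per second."""
    bits = size_mb * BYTES_PER_MB * BITS_PER_BYTE
    return bits / (speed_mbit_per_s * 1_000_000)


original_mb = 10.0   # hypothetical uncompressed size
compressed_mb = 4.0  # hypothetical compressed size
speed = 8.0          # hypothetical connection speed in Mbit/s

print(transfer_time_seconds(original_mb, speed))    # 10.0
print(transfer_time_seconds(compressed_mb, speed))  # 4.0
```

The compressed file takes 4 seconds instead of 10 on the same connection, because the time is proportional to the number of bits sent.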
Lossy compression
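Lossy compression permanently removes some data, so the original file can never be rebuilt exactly, but it usually achieves much larger size reductions than lossless methods. It works by exploiting the limits of human perception: MP3 discards frequencies the ear can barely hear and quieter sounds masked by louder ones; JPEG discards fine colour variation the eye does not register. Each time a file is edited and re-saved in a lossy format, more detail is thrown away, so quality degrades cumulatively (generation loss).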
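The defining property of lossy compression, that the original cannot be restored, can be seen in a toy example. This sketch is an assumption-laden illustration only, not how MP3 or JPEG work (they use far more sophisticated perceptual models): it stores 8-bit values (0 to 255) using only their top 4 bits, halving the storage, and the functions `lossy_compress` and `decompress` are hypothetical names.

```python
def lossy_compress(samples: list[int]) -> list[int]:
    """Toy lossy step: keep only the top 4 bits of each 8-bit value (0-255)."""
    return [s >> 4 for s in samples]  # each value now fits in 4 bits (0-15)


def decompress(codes: list[int]) -> list[int]:
    """Scale the 4-bit codes back up to 0-255; the discarded low bits are gone for good."""
    return [c << 4 for c in codes]


original = [200, 201, 203, 90, 91, 17]
restored = decompress(lossy_compress(original))
print(restored)              # [192, 192, 192, 80, 80, 16]
print(restored == original)  # False: the fine detail cannot be recovered
```

Values that were close together (200, 201, 203) collapse to the same output, which is exactly the kind of small difference a lossy method hopes nobody will notice.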
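Lossless compression
Lossless compression reduces file size without discarding any data, so the original is rebuilt exactly when the file is decompressed. It works by finding redundancy, such as repeated values or frequently used characters, and storing it more efficiently. The size reduction is usually smaller than with lossy compression. The specification names two lossless methods: run-length encoding and Huffman coding.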
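Python's standard `zlib` module implements DEFLATE, the lossless method used in ZIP files, so it gives a quick way to check the "rebuilt exactly" claim. The sample text below is hypothetical and deliberately repetitive, since repetition is what lossless methods exploit.

```python
import zlib

# Hypothetical, deliberately repetitive text.
text = ("the cat sat on the mat. " * 40).encode("utf-8")

compressed = zlib.compress(text)       # lossless DEFLATE compression
restored = zlib.decompress(compressed)

print(len(text), len(compressed))      # the compressed form is far smaller than 960 bytes
print(restored == text)                # True: every byte is recovered
```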
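Run-length encoding
Run-length encoding (RLE) replaces each run of consecutive identical values with a single copy of the value and a count of how many times it repeats, so AAAABBB becomes 4A3B. It works well on data with long runs, such as simple graphics with large blocks of one colour, but it can make a file larger when values rarely repeat.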
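A minimal Python sketch of RLE follows. The function names `rle_encode` and `rle_decode` are hypothetical, and storing the result as (count, value) pairs rather than one flat string is a design choice assumed here so that multi-digit counts such as 12 stay unambiguous.

```python
def rle_encode(data: str) -> list[tuple[int, str]]:
    """Return one (count, value) pair per run of identical characters."""
    runs: list[tuple[int, str]] = []
    for ch in data:
        if runs and runs[-1][1] == ch:
            runs[-1] = (runs[-1][0] + 1, ch)  # same value again: extend the current run
        else:
            runs.append((1, ch))              # new value: start a new run
    return runs


def rle_decode(runs: list[tuple[int, str]]) -> str:
    """Expand each (count, value) pair back into count copies of value."""
    return "".join(value * count for count, value in runs)


data = "AAAABBBCCD"
runs = rle_encode(data)
print("".join(f"{count}{value}" for count, value in runs))  # 4A3B2C1D
print(rle_decode(runs) == data)                             # True: nothing was lost
```

Decoding gives back the input exactly, which is what makes RLE a lossless method.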
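Huffman coding
Huffman coding gives each character a variable-length binary code based on how often it appears: the most frequent characters get the shortest codes and the rarest get the longest. The codes come from a Huffman tree, built by repeatedly joining the two least frequent nodes into a new node until a single tree remains. Reading 0 for each left branch and 1 for each right branch on the path from the root to a character gives that character's code.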
Because no Huffman code is the prefix of another (the prefix property), a decoder can read a continuous stream of bits and know exactly where each character ends, without any separators. Huffman coding therefore shrinks text losslessly for two reasons: replacing fixed-length character codes (for example, 8 bits per character) with shorter codes for the common letters reduces the total bit count, and the prefix property still allows the original to be reconstructed exactly.
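The tree-building and decoding steps can be sketched in Python using the standard `heapq` module as a priority queue. This is a minimal sketch, not a definitive implementation: the names `huffman_codes` and `decode` are hypothetical, and because ties between equal frequencies can be broken in different ways, other valid code tables exist with the same total length.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman tree for text and return each character's binary code."""
    freq = Counter(text)
    if len(freq) == 1:                    # edge case: only one distinct character
        return {next(iter(freq)): "0"}
    # Heap entries are (frequency, tie_breaker, tree). A tree is either a character
    # (leaf) or a (left, right) tuple; the unique tie_breaker stops trees being compared.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # the two least frequent nodes...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))  # ...are joined
        next_id += 1

    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, str):
            codes[node] = prefix          # leaf: the path so far is the code
        else:
            walk(node[0], prefix + "0")   # left branch = 0
            walk(node[1], prefix + "1")   # right branch = 1

    walk(heap[0][2], "")
    return codes


def decode(bits: str, codes: dict[str, str]) -> str:
    """Read bits left to right; by the prefix property, the first match is a whole character."""
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)


text = "ABRACADABRA"
codes = huffman_codes(text)
bits = "".join(codes[ch] for ch in text)
print(codes)
print(len(bits), "bits vs", 8 * len(text), "bits at 8 bits per character")  # 23 bits vs 88 bits ...
print(decode(bits, codes) == text)                                          # True
```

A, the most frequent letter, ends up with a 1-bit code, so the whole word needs 23 bits instead of 88.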
Choosing lossy or lossless
The choice between lossy and lossless depends entirely on whether losing data matters. For a text document, a spreadsheet or a program, every byte is significant, so only lossless will do; losing data would corrupt the file. For a photograph streamed on a website or music played from a phone, a small, imperceptible loss of detail is an acceptable price for a much smaller file, so lossy is chosen. The general principle is to use lossless when the data must be exact, and lossy when the file must be small and a little quality loss is tolerable.
Try this
Q1. State one reason files are compressed. [1 mark]
- Cue. To use less storage space or to transfer faster over a network.
Q2. State one difference between lossy and lossless compression. [2 marks]
- Cue. Lossy permanently removes some data so the original cannot be restored; lossless keeps all data so the original can be rebuilt exactly.
Exam-style practice questions
Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AQA 2020 (4 marks): A black-and-white image row is stored as the pixels WWWWWWBBBWWWW (W = white, B = black). Apply run-length encoding to this row, state the encoded form, and explain when RLE saves space and when it does not.
Group each run of identical pixels and record the value with its count. The row is six W, then three B, then four W, which encodes as 6W3B4W.
The original is 13 symbols; the encoded form 6W3B4W uses 6 symbols (three count-value pairs), so it is shorter. RLE saves space when data has long runs of repeated values (large blocks of one colour). It does not save space, and can make a file larger, when values rarely repeat (a noisy photograph), because every single pixel becomes a count of 1 plus its value.
Markers reward the correct encoding 6W3B4W, and a clear statement that RLE helps only with repetition.
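The worked answer can be checked with a short Python sketch built on the standard `itertools.groupby`. It assumes each count-value pair costs two symbols, and it compares the exam row with a hypothetical alternating row that has no repetition at all; the function names are hypothetical.

```python
from itertools import groupby


def rle_pairs(row: str) -> list[tuple[int, str]]:
    """Group consecutive identical pixels into (count, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(row)]


def encoded_length(pairs: list[tuple[int, str]]) -> int:
    # Assumption: each pair is stored as two symbols, one count and one value.
    return 2 * len(pairs)


for row in ["WWWWWWBBBWWWW", "WBWBWBWBWBWBW"]:
    pairs = rle_pairs(row)
    encoded = "".join(f"{count}{value}" for count, value in pairs)
    print(f"{row} -> {encoded}: {len(row)} symbols -> {encoded_length(pairs)}")
```

The exam row shrinks from 13 symbols to 6 (6W3B4W), while the alternating row grows from 13 symbols to 26, because every pixel becomes a run of length 1.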
AQA 2023 (5 marks): Explain the difference between lossy and lossless compression. For each, give one suitable use and justify why that type of compression is appropriate.
Lossy compression permanently removes some data, usually detail the eye or ear is unlikely to notice, achieving large size reductions but preventing perfect reconstruction. A suitable use is streaming music (MP3) or photographs (JPEG): the small loss in quality is acceptable because the files must be small enough to stream or share quickly.
Lossless compression reduces size with no loss of data, so the original is rebuilt exactly. A suitable use is a text document, spreadsheet or program file (ZIP), where losing even one character or byte would corrupt the file or change its meaning.
Markers reward a correct definition of each, a sensible matched use, and a justification that links the use to the trade-off (acceptable quality loss versus exact recovery).
Related dot points
- Understand how a bitmap image is represented using pixels and colour depth, the effect of resolution and colour depth on quality and file size, and the role of metadata.
A focused answer to AQA GCSE Computer Science 3.3.6, covering how bitmap images are represented using pixels and colour depth, the effect of resolution and colour depth on quality and file size, and metadata.
- Understand how analogue sound is sampled to be stored digitally, the effect of sample rate and bit depth on quality and file size, and calculate sound file sizes.
A focused answer to AQA GCSE Computer Science 3.3.7, covering how analogue sound is sampled for digital storage, the effect of sample rate and bit depth on quality and file size, and calculating sound file sizes.
- Know that data is stored in bits and bytes, the units from bit to terabyte, and calculate file sizes and storage requirements.
A focused answer to AQA GCSE Computer Science 3.3.4, covering bits and bytes, the units from bit to terabyte, and calculating file sizes and storage requirements.
- Understand the purpose of network protocols, the common protocols and ports, and the four-layer TCP/IP model and why layering is used.
A focused answer to AQA GCSE Computer Science 3.5.3, covering the purpose of network protocols, the common protocols and ports, and the four-layer TCP/IP model and why layering is used.
Sources & how we know this
- AQA GCSE Computer Science (8525) specification — AQA (2020)