How are letters and symbols stored as numbers?

Understand how characters are represented using ASCII and Unicode character sets, and the effect of the character set on storage and the range of characters.

A focused answer to AQA GCSE Computer Science 3.3.5, covering how characters are represented using ASCII and Unicode, and the effect of the character set on storage and the range of characters.

Generated by Claude Opus 4.8 · 7 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this dot point is asking
  2. Character sets
  3. ASCII
  4. Unicode
  5. The effect of the character set
  6. Calculating storage for text
  7. Why ordered codes are useful
  8. Try this

What this dot point is asking

AQA wants you to explain how characters are stored using ASCII and Unicode, and to describe the effect of the character set on the number of characters that can be represented and on the storage each character needs.

Character sets

A character set is a defined list of the characters a computer can recognise, with each character given a unique binary code. A standard character set is essential because there is nothing about the bit pattern 01000001 that "is" the letter A. It only means A because everyone agrees on the ASCII table, which gives 'A' the code 65. Before standard character sets, files moved between systems often became unreadable because each manufacturer used its own mapping.
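To make the point concrete, here is a short Python sketch. Python is simply our choice for the examples on this page; the specification does not require it. The sketch reads an 8-bit pattern as a number and then looks that number up. Python's `chr()` and `ord()` work with Unicode code points, which equal the ASCII codes for the first 128 characters. The last part uses `cp037`, Python's codec name for one of IBM's EBCDIC character sets, purely as an example of a different mapping that gives the same letter a different byte.

```python
# The bits only "mean" A because the character set says code 65 is 'A'.
bit_pattern = "01000001"          # one byte, written out as binary digits
code = int(bit_pattern, 2)        # read the bits as a binary number: 65
character = chr(code)             # look the number up in the character set: 'A'
print(bit_pattern, "->", code, "->", character)

# Going the other way: character -> code -> 8-bit pattern.
print(format(ord("A"), "08b"))    # 01000001

# A different character set maps the same letter to a different byte.
ascii_byte = "A".encode("ascii")   # ASCII: byte 0x41 = 65
ebcdic_byte = "A".encode("cp037")  # IBM EBCDIC: byte 0xC1 = 193
print(ascii_byte[0], ebcdic_byte[0])
```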

ASCII

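ASCII (American Standard Code for Information Interchange) uses 7 bits per character, giving $2^7 = 128$ codes. That is enough for the English upper- and lower-case letters, the digits 0 to 9, common punctuation and a set of control codes.

Two ordering facts are worth memorising because AQA tests them: the capital letters run from 65 ('A') to 90 ('Z'), and the lower-case letters run from 97 ('a') to 122 ('z'). Because the codes are contiguous, you can calculate any letter's code from a known one by adding its offset, and convert between cases by adding or subtracting 32.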
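A minimal Python sketch of the offset calculation, assuming only that the capital letters are contiguous. `code_from_known` is a hypothetical helper written for this page. It uses the alphabet just to count places; the codes themselves come from the one value you are told.

```python
import string

def code_from_known(known_letter: str, known_code: int, target_letter: str) -> int:
    """Work out a capital letter's code from one code you already know.

    Relies on the capitals being contiguous: each letter's code is one more
    than the code of the letter before it.
    """
    alphabet = string.ascii_uppercase        # "ABC...Z", used only to count places
    offset = alphabet.index(target_letter) - alphabet.index(known_letter)
    return known_code + offset

f_code = code_from_known("A", 65, "F")   # 65 + 5
z_code = code_from_known("A", 65, "Z")   # 65 + 25
lower_f = chr(f_code + 32)                # add 32 to move to lower case
print(f_code, z_code, lower_f)            # 70 90 f
```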

Unicode

Unicode solved the problem that 128 characters cannot cover Chinese, Arabic, Greek, mathematical symbols and emoji all at once. By making the first 128 code points identical to ASCII, Unicode stays backward-compatible: an old ASCII file opens correctly in a Unicode-aware program.
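A quick Python check of the backward-compatibility claim, and of how far beyond ASCII other scripts sit. This sketch assumes UTF-8, the Unicode encoding that stores the first 128 code points as the same single bytes ASCII uses. Other Unicode encodings, such as UTF-16, use more bytes even for these characters. Non-ASCII characters are written as escapes and printed with `ascii()` so the output stays readable on any console.

```python
text = "Hello"
# Characters in the first 128 code points get the same bytes in UTF-8 as in ASCII.
print(text.encode("ascii") == text.encode("utf-8"))   # True

# Characters from other scripts have code points well beyond ASCII's 0..127.
samples = ["A", "\u00e9", "\u03a9", "\u4e2d", "\U0001F600"]   # A, é, Ω, 中, 😀
codes = {ch: ord(ch) for ch in samples}
for ch, code in codes.items():
    print(ascii(ch), code)

# ASCII simply has no code for them.
try:
    "\u00e9".encode("ascii")
except UnicodeEncodeError:
    print("e-acute cannot be stored in ASCII")
```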

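The effect of the character set

The number of bits a character set uses per character decides two things. The first is range: $n$ bits give $2^n$ different codes, so 7-bit ASCII can represent only 128 characters, while Unicode, with more bits per character, has room for over a million. The second is storage: those extra bits are needed for every character stored, so the same text takes up more space. More bits per character means more characters but larger files; fewer bits means smaller files but fewer characters.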
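Each extra bit doubles the number of codes available. The Python sketch below tabulates $2^n$ for a few widths; `max_characters` is a hypothetical helper. The final line counts the Unicode code space (code points 0 to 0x10FFFF), which is where the "over a million" figure comes from. Not every code point has a character assigned yet, so this is room for characters rather than a count of them.

```python
def max_characters(bits_per_char: int) -> int:
    """Number of different codes available with a fixed number of bits."""
    return 2 ** bits_per_char

for bits in (7, 8, 16, 32):
    print(f"{bits:>2} bits per character -> {max_characters(bits):,} possible codes")

# Unicode's code points run from 0 to 0x10FFFF inclusive.
unicode_code_points = 0x10FFFF + 1
print(f"Unicode code points: {unicode_code_points:,}")   # 1,114,112
```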

Calculating storage for text

Because a character set fixes the number of bits per character, you can calculate how much storage a piece of text needs. With 7-bit ASCII (often stored in an 8-bit byte), the eight-character word "Computer" needs eight bytes. With a wider Unicode encoding using, say, two bytes per character, the same word needs sixteen bytes, twice as much. This is the storage cost of Unicode's larger range: every character takes more bits, so a document covering many languages is larger than an English document of the same length in plain ASCII.

Exam questions may ask for the size of a string in bits or bytes, given the bits per character. Multiply the number of characters by the bits per character to get bits, then divide by 8 to get bytes.
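The calculation above as a small Python function. `text_storage` is a hypothetical helper that assumes a fixed number of bits per character, as the paragraph does. The cross-check uses UTF-16 (little-endian, without a byte-order mark) only as one real example of an encoding that happens to use two bytes for each of these letters.

```python
from __future__ import annotations

def text_storage(text: str, bits_per_char: int) -> tuple[int, float]:
    """Return (bits, bytes) for text stored at a fixed number of bits per character."""
    bits = len(text) * bits_per_char   # characters x bits per character
    return bits, bits / 8              # divide by 8 to convert bits to bytes

word = "Computer"
print(text_storage(word, 8))    # (64, 8.0)   one byte per character
print(text_storage(word, 16))   # (128, 16.0) two bytes per character

# Cross-check with real encodings of the same word.
print(len(word.encode("ascii")), len(word.encode("utf-16-le")))   # 8 16
```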

Why ordered codes are useful

The fact that character codes run in order is not just a curiosity; programs rely on it. Sorting words alphabetically works by comparing the codes of their characters. Checking whether a character is a digit, an upper-case letter or a lower-case letter is done by testing whether its code falls in the relevant range. Converting a letter between cases is done by adding or subtracting 32, because upper and lower case are exactly 32 apart in ASCII. These tricks all depend on the codes being contiguous and ordered, which is why AQA expects you to know that 'A' is 65 and 'a' is 97.
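The range tests and the case conversion described above can be sketched in Python like this. The helper names are hypothetical. Python already provides `str.isdigit()`, `str.isupper()` and `str.lower()`, but writing them by hand shows that each one is just a comparison or simple arithmetic on character codes. The final line shows a side effect of sorting by code: every capital letter comes before every lower-case letter.

```python
def is_digit(ch: str) -> bool:
    return ord("0") <= ord(ch) <= ord("9")   # codes 48 to 57

def is_upper(ch: str) -> bool:
    return ord("A") <= ord(ch) <= ord("Z")   # codes 65 to 90

def is_lower(ch: str) -> bool:
    return ord("a") <= ord(ch) <= ord("z")   # codes 97 to 122

def to_lower(ch: str) -> str:
    # Upper and lower case are exactly 32 apart, so add 32.
    return chr(ord(ch) + 32) if is_upper(ch) else ch

def to_upper(ch: str) -> str:
    # Subtract 32 to go from lower case back to upper case.
    return chr(ord(ch) - 32) if is_lower(ch) else ch

print(is_digit("7"), is_upper("G"), is_lower("G"))   # True True False
print(to_lower("G"), to_upper("q"))                  # g Q

# Python compares strings code by code, so every capital (65-90)
# sorts before every lower-case letter (97-122).
print(sorted(["banana", "Zebra", "apple"]))          # ['Zebra', 'apple', 'banana']
```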

Try this

Q1. State how many characters can be represented using 7-bit ASCII. [1 mark]

  • Cue. 128, because $2^7 = 128$.

Q2. State one advantage of Unicode over ASCII. [1 mark]

  • Cue. It can represent many more characters, including other languages and emoji.

Exam-style practice questions

Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AQA 2018 (3 marks). The ASCII code for the character 'A' is 65. Using this fact, calculate the ASCII code for the character 'F' and explain how you worked it out.

ASCII codes for the capital letters are consecutive, so each letter is one more than the letter before it. 'F' is the sixth letter, five places after 'A'.

So the code for 'F' is $65 + 5 = 70$.

Markers reward stating that the codes are contiguous and ordered, the correct offset of 5 (not 6: 'F' is the sixth letter, but it is only five places after 'A'), and the value 70. The same trick converts between upper and lower case: lower-case 'a' is 97, exactly 32 more than 'A'.

AQA 2021 (4 marks). Explain the difference between the ASCII and Unicode character sets, and describe the effect each has on the range of characters available and on storage.

ASCII uses 7 bits per character, so it can represent only $2^7 = 128$ characters, enough for English letters, digits and basic punctuation. Unicode uses more bits per character (up to 32 bits in its widest encoding), so it has room for over a million characters, covering the alphabets and symbols of the world's languages plus emoji.

The trade-off is storage: Unicode needs more bits per character than ASCII, so the same text takes more space, but it can represent far more characters. The first 128 Unicode codes match ASCII, so plain English text uses the same character codes in both.

Markers reward the bit counts (7 for ASCII), the resulting character counts, and a clear storage-versus-range trade-off.