
How are characters represented as binary using ASCII and Unicode?

Understand character encoding using ASCII and Unicode, the limitations of ASCII, why Unicode was introduced, and the relationship between a character set and a character code.

A focused answer to AQA A-Level Computer Science 4.5.9, covering character encoding using ASCII and Unicode, the limitations of ASCII, why Unicode was introduced, and the relationship between a character set and its codes.

Generated by Claude Opus 4.8
7 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed


Jump to a section
  1. What this dot point is asking
  2. Character sets and codes
  3. ASCII and its limitations
  4. Unicode

What this dot point is asking

AQA wants you to explain how characters are encoded as binary using ASCII and Unicode, state the limitations of ASCII, explain why Unicode was introduced, and describe the relationship between a character set and a character code.

Character sets and codes

A useful property of ASCII is that the codes for letters and digits run in sequence: A to Z are consecutive (65 to 90), as are a to z (97 to 122) and 0 to 9 (48 to 57). This lets you do arithmetic to convert case (add or subtract 32) or to turn a digit character into its numeric value (subtract 48). These tricks appear regularly in exam questions, so it is worth memorising the three anchor codes 48, 65 and 97.
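The arithmetic above can be sketched in Python, whose built-in `ord` and `chr` functions convert between a character and its code. The helper names `to_upper`, `to_lower` and `digit_value` are illustrative only, and the sketch assumes each input is a single ASCII character.

```python
# Anchor codes worth memorising: '0' = 48, 'A' = 65, 'a' = 97.
CASE_OFFSET = ord("a") - ord("A")  # 97 - 65 = 32


def to_upper(ch):
    """Convert one lowercase ASCII letter to uppercase by subtracting 32."""
    if "a" <= ch <= "z":
        return chr(ord(ch) - CASE_OFFSET)
    return ch  # anything that is not a lowercase letter is left unchanged


def to_lower(ch):
    """Convert one uppercase ASCII letter to lowercase by adding 32."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) + CASE_OFFSET)
    return ch


def digit_value(ch):
    """Turn a digit character '0'..'9' into its numeric value by subtracting 48."""
    if len(ch) != 1 or not ("0" <= ch <= "9"):
        raise ValueError("expected a single ASCII digit")
    return ord(ch) - ord("0")


print(ord("0"), ord("A"), ord("a"))  # 48 65 97
print(to_upper("g"))                 # G
print(to_lower("G"))                 # g
print(digit_value("7"))              # 7
```

The range checks matter: subtracting 32 from a character that is not a lowercase letter would produce an unrelated symbol rather than an uppercase letter.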

ASCII and its limitations

Extended ASCII variants used the spare 8th bit to add another 128 codes, but different vendors filled that range with different characters, so a file written in one extended set displayed wrong characters when opened with another. This incompatibility (the same byte meaning different glyphs on different systems) was a major motivation for a single universal standard.
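As a rough illustration of that incompatibility, the Python sketch below decodes the same byte with two 8-bit character sets that ship with Python's standard codecs. The choice of Latin-1 (ISO 8859-1) and IBM code page 437 is an assumption made for the example; the text above does not name specific extended sets.

```python
raw = bytes([0xE9])  # one byte with the 8th bit set, so outside 7-bit ASCII

as_latin1 = raw.decode("latin-1")  # ISO 8859-1, a Western European set
as_cp437 = raw.decode("cp437")     # the original IBM PC code page

print(as_latin1)  # é
print(as_cp437)   # Θ
print(as_latin1 == as_cp437)  # False: same byte, different character

# Bytes in the 7-bit ASCII range decode identically in both sets.
plain = bytes([0x41])
print(plain.decode("latin-1") == plain.decode("cp437"))  # True
```

Only the upper 128 codes disagree, which is why plain English text survived the move between systems while accented letters and symbols did not.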

It is also worth being precise about what a character set does and does not do. The set fixes which characters exist and what numeric code each one has; it does not fix how those codes are stored in bytes. Storage is the job of an encoding, which is why one character set (Unicode) can be stored in several encodings (UTF-8, UTF-16, UTF-32) that all describe the same code points but pack them into bytes differently. Keeping the three ideas separate (character, code point, encoding) is what examiners are testing when they ask about the relationship between a character set and a character code.
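The separation between character, code point and encoding can be seen in a short Python sketch. The euro sign is just a convenient example character, and the big-endian codec names (`utf-16-be`, `utf-32-be`) are chosen here so that no byte order mark is added to the output.

```python
ch = "\u20ac"          # the character: the euro sign
code_point = ord(ch)   # its Unicode code point as an integer

print(code_point, hex(code_point))  # 8364 0x20ac

# One code point, three different byte sequences.
encoded = {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}
for name, data in encoded.items():
    print(name, data.hex(), len(data), "bytes")
# utf-8 e282ac 3 bytes
# utf-16-be 20ac 2 bytes
# utf-32-be 000020ac 4 bytes

# Decoding each byte sequence with its own encoding recovers the same character.
print(all(data.decode(name) == ch for name, data in encoded.items()))  # True
```

The code point never changes; only the bytes used to store it do.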

Unicode

Exam-style practice questions

Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AQA 2018, 4 marks
The ASCII code for the character 'A' is 65 in denary. Describe how the codes for the uppercase letters are arranged in ASCII and explain how a program could convert the character 'g' to its uppercase equivalent using this arrangement. The ASCII code for 'a' is 97.

In ASCII the uppercase letters 'A' to 'Z' occupy consecutive codes from 65 to 90, and the lowercase letters 'a' to 'z' occupy consecutive codes from 97 to 122. Because each block is contiguous, the lowercase code is always exactly 32 greater than the matching uppercase code (97 - 65 = 32).

To convert 'g' (code 103) to uppercase, subtract 32: 103 - 32 = 71, which is the code for 'G'. A program reads the character code, subtracts 32, and outputs the character with the new code.

Markers reward stating that letters are consecutive, identifying the constant difference of 32, and showing the subtraction that gives 'G'.

AQA 2022, 3 marks
Explain why the Unicode character set was introduced and state one disadvantage of using Unicode rather than ASCII.

ASCII, using 7 bits, can represent only 128 characters, which is enough for English letters, digits and basic punctuation but not for accented characters, other alphabets such as Greek or Arabic, the thousands of characters in languages such as Chinese, or symbols and emoji. Unicode was introduced to give every character in every writing system a unique code point so text in any language can be stored and exchanged consistently.

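A disadvantage is that Unicode characters can require more than one byte each (for example, UTF-8 uses 1 to 4 bytes per character and UTF-32 always uses 4), so text generally takes more storage and bandwidth than it would in ASCII. The exception is UTF-8 text made up only of ASCII characters, which is the same size as the ASCII version.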

Markers reward the limitation of ASCII, the purpose of Unicode (universal coverage), and a valid disadvantage (larger storage).
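The storage cost in that answer can be checked with a small Python sketch. The function name `encoded_sizes` is hypothetical, and the little-endian codec names are used so that no byte order mark inflates the counts.

```python
def encoded_sizes(text):
    """Return the number of bytes needed to store text in several encodings."""
    sizes = {}
    for name in ("ascii", "utf-8", "utf-16-le", "utf-32-le"):
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None  # the text has a character this encoding lacks
    return sizes


print(encoded_sizes("Hello"))
# {'ascii': 5, 'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}

print(encoded_sizes("h\u00e9llo"))  # 'héllo', with one accented letter
# {'ascii': None, 'utf-8': 6, 'utf-16-le': 10, 'utf-32-le': 20}
```

The second result shows both sides of the trade-off: ASCII cannot store the accented letter at all, while each Unicode encoding can, at the cost of extra bytes.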
