How are characters represented as binary using ASCII and Unicode?
Understand character encoding using ASCII and Unicode, the limitations of ASCII, why Unicode was introduced, and the relationship between a character set and a character code.
A focused answer to AQA A-Level Computer Science 4.5.9, covering character encoding using ASCII and Unicode, the limitations of ASCII, why Unicode was introduced, and the relationship between a character set and its codes.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
AQA wants you to explain how characters are encoded as binary using ASCII and Unicode, state the limitations of ASCII, explain why Unicode was introduced, and describe the relationship between a character set and a character code.
Character sets and codes
A useful property of ASCII is that the codes for letters and digits run in sequence: A to Z are consecutive (65 to 90), as are a to z (97 to 122) and 0 to 9 (48 to 57). This lets you do arithmetic to convert case (add or subtract 32) or to turn a digit character into its numeric value (subtract 48). These tricks appear regularly in exam questions, so it is worth memorising the three anchor codes 48, 65 and 97.
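The arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed exam method: the function names `to_upper`, `digit_value` and `to_binary` are made up for this example, and the code assumes its input is a single ASCII character.

```python
def to_upper(ch: str) -> str:
    """Convert a lowercase ASCII letter to uppercase by subtracting 32."""
    code = ord(ch)                 # character -> numeric code
    if 97 <= code <= 122:          # 'a' (97) to 'z' (122)
        return chr(code - 32)      # numeric code -> character
    return ch                      # anything else is left unchanged


def digit_value(ch: str) -> int:
    """Turn a digit character '0'..'9' into its numeric value by subtracting 48."""
    code = ord(ch)
    if not 48 <= code <= 57:       # '0' (48) to '9' (57)
        raise ValueError(f"{ch!r} is not an ASCII digit")
    return code - 48


def to_binary(ch: str) -> str:
    """Show the code of an ASCII character as a 7-bit binary pattern."""
    return format(ord(ch), "07b")


print(to_upper("g"))      # G
print(digit_value("7"))   # 7
print(to_binary("A"))     # 1000001
```

Python's built-in `ord` and `chr` do the conversion between a character and its code; the rest is the add or subtract step that exam questions ask you to describe.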
ASCII and its limitations
Extended ASCII variants used the spare 8th bit to add another 128 codes, but different vendors filled that range with different characters, so a file written in one extended set displayed the wrong characters when opened with another. This incompatibility (the same byte meaning different glyphs on different systems) was a major motivation for a single universal standard.
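The incompatibility can be seen by decoding one byte with two different 8-bit character sets. The two sets chosen here, ISO 8859-1 (`latin-1`) and the original IBM PC code page (`cp437`), are only illustrative examples that Python happens to ship with; they are not named in the AQA specification.

```python
# One byte with the 8th bit set: 0xE9 is 233 in denary, outside 7-bit ASCII.
raw = bytes([0xE9])

as_latin1 = raw.decode("latin-1")   # ISO 8859-1 (Western European)
as_cp437 = raw.decode("cp437")      # original IBM PC code page

print(as_latin1)   # é
print(as_cp437)    # Θ
```

The byte is identical in both cases; only the table used to interpret it changes, which is exactly why text moved between systems could display incorrectly.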
It is also worth being precise about what a character set does and does not do. The set fixes which characters exist and what numeric code each one has; it does not fix how those codes are stored in bytes. Storage is the job of an encoding, which is why one character set (Unicode) can be stored in several encodings (UTF-8, UTF-16, UTF-32) that all describe the same code points but pack them into bytes differently. Keeping the three ideas separate (character, code point, encoding) is what examiners are testing when they ask about the relationship between a character set and a character code.
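The separation between character, code point and encoding can be made concrete with a short Python sketch. The euro sign is an arbitrary choice of character, and the big-endian codec variants (`utf-16-be`, `utf-32-be`) are used only so that no byte order mark is added to the output.

```python
ch = "\u20ac"            # the euro sign, written as an escape for clarity
code_point = ord(ch)     # the number the character set assigns to it

# The same code point stored in three different encodings.
encoded = {enc: ch.encode(enc) for enc in ("utf-8", "utf-16-be", "utf-32-be")}

print(hex(code_point))   # 0x20ac
for enc, data in encoded.items():
    print(enc, len(data), data.hex())
# utf-8 3 e282ac
# utf-16-be 2 20ac
# utf-32-be 4 000020ac
```

The code point never changes (U+20AC); only the bytes used to store it do.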
Unicode
Exam-style practice questions
Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AQA 2018, 4 marks
The ASCII code for the character 'A' is 65 in denary. Describe how the codes for the uppercase letters are arranged in ASCII and explain how a program could convert the character 'g' to its uppercase equivalent using this arrangement. The ASCII code for 'a' is 97.
In ASCII the uppercase letters 'A' to 'Z' occupy consecutive codes from 65 to 90, and the lowercase letters 'a' to 'z' occupy consecutive codes from 97 to 122. Because each block is contiguous, the lowercase code is always exactly 32 greater than the matching uppercase code (97 - 65 = 32).
To convert 'g' (code 103) to uppercase, subtract 32: 103 - 32 = 71, which is the code for 'G'. A program reads the character code, subtracts 32, and outputs the character with the new code.
Markers reward stating that letters are consecutive, identifying the constant difference of 32, and showing the subtraction that gives 'G'.
AQA 2022, 3 marks
Explain why the Unicode character set was introduced and state one disadvantage of using Unicode rather than ASCII.
ASCII, using 7 bits, can represent only 128 characters, which is enough for English letters, digits and basic punctuation but not for accented characters, other alphabets such as Greek or Arabic, the thousands of characters in languages such as Chinese, or symbols and emoji. Unicode was introduced to give every character in every writing system a unique code point so text in any language can be stored and exchanged consistently.
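A disadvantage is that a Unicode character can require more than one byte (for example, UTF-8 uses 1 to 4 bytes per character), so text can take more storage and bandwidth than ASCII, which always fits one character in a single byte. The cost is greatest for fixed-width encodings such as UTF-32 and for text outside the ASCII range.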
Markers reward the limitation of ASCII, the purpose of Unicode (universal coverage), and a valid disadvantage (larger storage).
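The two points in this answer (ASCII's limited range and Unicode's storage cost) can be checked with a rough Python sketch. The helper name `size_in_bytes` and the sample strings are invented for illustration, and the little-endian codecs are used only to avoid counting a byte order mark.

```python
def size_in_bytes(text: str) -> dict:
    """Return the number of bytes needed to store text in each Unicode encoding."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


english = "Hello"
greek = "\u03b1\u03b2\u03b3\u03b4"   # four Greek letters: alpha, beta, gamma, delta

print(len(english.encode("ascii")))  # 5
print(size_in_bytes(english))        # {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}
print(size_in_bytes(greek))          # {'utf-8': 8, 'utf-16-le': 8, 'utf-32-le': 16}

# ASCII has no codes for Greek letters, so the encode step fails.
try:
    greek.encode("ascii")
    ascii_can_store_greek = True
except UnicodeEncodeError:
    ascii_can_store_greek = False

print(ascii_can_store_greek)         # False
```

Note that plain English text is no larger in UTF-8 than in ASCII; the extra storage appears with characters outside the ASCII range or with the wider encodings.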
Related dot points
- Understand the decimal, binary and hexadecimal number systems, why computers use binary and hexadecimal, and how to convert between the three bases.
A focused answer to AQA A-Level Computer Science 4.5.1, covering the decimal, binary and hexadecimal number systems, why computers use binary and hexadecimal, and how to convert between the three bases.
- Understand the bit and byte, the units of information capacity, binary and decimal prefixes (kibi versus kilo), and how the number of bits limits the range of values that can be represented.
A focused answer to AQA A-Level Computer Science 4.5.8, covering the bit and the byte, the units of information capacity, binary prefixes (kibi, mebi) versus decimal prefixes (kilo, mega), and how the number of bits limits the range of values.
- Understand unsigned and signed binary using two's complement, binary addition and subtraction, fixed point and floating point representation of real numbers, and the effects of overflow and rounding.
A focused answer to AQA A-Level Computer Science 4.5.2 to 4.5.7, covering unsigned and signed binary using two's complement, binary addition and subtraction, fixed and floating point representation of real numbers, and overflow and rounding errors.
- Understand how bitmap images are represented using pixels, colour depth and resolution, how analogue sound is sampled, and the effect of sample rate, resolution and metadata on quality and file size.
A focused answer to AQA A-Level Computer Science 4.5.10 and 4.5.11, covering bitmap images with pixels, colour depth and resolution, the sampling of analogue sound, and the effect of sample rate, resolution and metadata on quality and file size.
- Understand the built-in data types: integer, real or float, Boolean, character and string, and understand records, arrays and user-defined data types built from them.
A focused answer to AQA A-Level Computer Science 4.1.1, covering the built-in data types (integer, real, Boolean, character, string), how each is stored, and how records, arrays and user-defined types are built from them.
Sources & how we know this
- AQA A-level Computer Science (7517) specification — AQA (2015)