How are text characters stored in a computer using character sets such as ASCII and Unicode?
Representing characters in binary using character sets, the ASCII and Unicode character sets, and the relationship between a character, its character code and the number of bits needed.
A focused answer to the WJEC GCSE Computer Science Unit 1 content on representing text, covering character sets, the ASCII character set and its size, the Unicode character set and why it was introduced, character codes, and how the number of bits limits the number of characters.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this topic is asking
WJEC wants you to know how text characters are stored in binary using a character set, the ASCII and Unicode character sets and their sizes, the idea of a character code, and how the number of bits limits how many characters can be represented. This is part of the Data representation and data types content in Unit 1 of WJEC GCSE Computer Science (3500).
Character sets and character codes
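A character set is the full list of characters a computer can use. Each character in it is given a unique number, its character code, and the computer stores that code in binary. The short sketch below is one way to see this, assuming Python 3, where the built-in ord() returns a character's code and chr() does the reverse. The helper name character_to_binary is hypothetical, chosen for illustration.

```python
def character_to_binary(character):
    """Return the character code of one character as a binary string.

    Codes for standard ASCII characters fit in 8 bits, so the result is
    padded with leading zeros to at least 8 digits.
    """
    code = ord(character)           # character -> character code (a number)
    return format(code, "08b")      # character code -> binary pattern


for ch in "Hi!":
    print(ch, ord(ch), character_to_binary(ch))

print(chr(72))  # chr() goes the other way: code 72 -> 'H'
```

This prints 'H' as 72 (01001000), 'i' as 105 (01101001) and '!' as 33 (00100001), then turns code 72 back into 'H'.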
ASCII
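Standard ASCII uses 7 bits per character, giving 128 codes for English letters, digits and punctuation; extended ASCII uses 8 bits, giving 256. Because the letters are given codes in sequence, you can work out a code from a known one: 'A' is 65 and 'D' is three after 'A', so 'D' is 65 + 3 = 68. Lower case letters have their own block of codes, and the constant difference of 32 between an upper case letter and its lower case version ('A' is 65, 'a' is 97) is a common exam fact.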
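The "count on from a known code" method can be sketched in Python as follows. It assumes only the two facts above ('A' is 65 and the case gap is 32) and uses the position of a letter in the alphabet, not ord(), to do the working; ord() is used only at the end to check the answers. The function names are hypothetical.

```python
import string

CODE_A = 65      # known starting point: ASCII code for 'A'
CASE_GAP = 32    # lower case code = upper case code + 32


def upper_code(letter):
    """Work out an upper case letter's ASCII code from how far it is after 'A'."""
    places_after_a = string.ascii_uppercase.index(letter)  # 'D' -> 3
    return CODE_A + places_after_a


def lower_code(letter):
    """Work out the code of the lower case version of an upper case letter."""
    return upper_code(letter) + CASE_GAP


print(upper_code("D"))  # 68
print(lower_code("A"))  # 97
print(upper_code("D") == ord("D"), lower_code("A") == ord("a"))  # True True
```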
Unicode
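Unicode gives every character in every language, plus symbols and emoji, its own unique code, so many of its codes are far larger than ASCII's range of 0 to 127. A rough illustration, assuming Python 3 (whose strings are Unicode, so ord() returns the Unicode code point): the sketch reports each code and the minimum number of bits that number needs on its own. Real Unicode encodings store codes in fixed units, so this bit count is only a guide. The helper describe is hypothetical; 'ŵ' is a Welsh letter that standard ASCII cannot represent.

```python
def describe(character):
    """Return (code, minimum bits for that code, whether it is in standard ASCII)."""
    code = ord(character)                         # Unicode code point in Python 3
    return code, code.bit_length(), code < 128    # standard ASCII is codes 0-127


# "\U0001F600" is the grinning face emoji.
for ch in ["A", "ŵ", "€", "\U0001F600"]:
    code, bits, in_ascii = describe(ch)
    print(ch, code, f"{bits} bits", "in ASCII" if in_ascii else "not in ASCII")
```

For these four characters the codes are 65, 373, 8364 and 128512, needing 7, 9, 14 and 17 bits; only 'A' fits within standard ASCII.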
Why this matters
Knowing that each character is just a number lets you reason about file sizes (number of characters multiplied by bits per character), understand why a Unicode document is usually bigger than the same text in ASCII, and explain why sorting text in code order puts all upper case letters before lower case ones.
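Both ideas can be tried in a few lines of Python. This is a sketch under stated assumptions: ASCII is taken as 8 bits (one byte) per character, and Unicode is taken as a fixed 16 bits per character, a common exam simplification; real encodings such as UTF-8 use a variable number of bytes, with ASCII characters taking one byte. The function text_size_bits is hypothetical. Python's sorted() compares strings character by character using their codes.

```python
def text_size_bits(text, bits_per_character):
    """File size of some text: number of characters x bits per character."""
    return len(text) * bits_per_character


message = "Hello"
ascii_bits = text_size_bits(message, 8)       # 8-bit ASCII: one byte per character
unicode_bits = text_size_bits(message, 16)    # assumed 16 bits per Unicode character
print(f"{ascii_bits} bits = {ascii_bits // 8} bytes")      # 40 bits = 5 bytes
print(f"{unicode_bits} bits = {unicode_bits // 8} bytes")  # 80 bits = 10 bytes

# Sorting compares character codes, so every upper case letter (65-90)
# comes before every lower case letter (97-122).
print(sorted(["banana", "Apple", "cherry", "Zebra"]))
# ['Apple', 'Zebra', 'banana', 'cherry']
```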
Try this
Q1. A document of characters is stored in standard ASCII. How many bytes does the text need? [2 marks]
- Cue. One byte per character, so the text needs as many bytes as it has characters.
Q2. State one reason why Unicode is needed in addition to ASCII. [1 mark]
- Cue. ASCII cannot represent the many characters used by the world's different languages, plus symbols and emoji.
Exam-style practice questions
Practice questions written in the style of WJEC exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
WJEC-style Unit 1 (3 marks). The character 'A' has the ASCII code 65. State the ASCII code for 'C', and explain how the code for 'a' relates to the code for 'A'.
A Unit 1 character code question. ASCII codes for letters run in order, so 'C' is two after 'A': 65 + 2 = 67 (1 mark). The lower case letters have a separate, higher block of codes; 'a' is 97, which is 32 more than 'A' at 65 (1 mark for stating 'a' is higher, 1 mark for the difference of 32). Markers reward the sequential ordering and the constant offset between upper and lower case. A common error is to assume 'a' and 'A' share a code, when upper and lower case letters have different codes.
WJEC-style Unit 1 (3 marks). Explain why Unicode was introduced when ASCII already existed, and state one consequence of Unicode for file size.
A Unit 1 explain question. ASCII uses only 7 or 8 bits, so it can represent at most 128 or 256 characters, which is enough for English letters, digits and punctuation but not for the many alphabets and symbols used by the world's languages (1 mark). Unicode was introduced to give every character in every language, plus symbols and emoji, its own unique code (1 mark). A consequence is that because Unicode uses more bits per character, text stored in Unicode generally takes up more memory or storage than the same text in ASCII (1 mark). Markers reward the limited range of ASCII, the wider coverage of Unicode and the larger file size. A common error is to say Unicode replaced binary, when it is just a larger character set whose codes are still stored in binary.
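The limit in this answer comes from the fact that n bits give 2 to the power n different patterns, so n bits can hold 2^n different character codes. A minimal Python sketch of that relationship in both directions; the function names are hypothetical, and 16 bits is included only as an illustrative larger size, not as the definition of Unicode.

```python
def max_characters(bits):
    """How many different character codes n bits can represent: 2 to the power n."""
    return 2 ** bits


def bits_needed(count):
    """Fewest bits that give at least `count` different codes (at least 1 bit)."""
    return max(1, (count - 1).bit_length())


for bits in (7, 8, 16):
    print(f"{bits} bits -> {max_characters(bits)} characters")
# 7 bits -> 128 characters
# 8 bits -> 256 characters
# 16 bits -> 65536 characters

print(bits_needed(26))   # 5: 26 letters need 5 bits, since 4 bits give only 16 codes
```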
Related dot points
- The binary and denary number systems, why computers store data in binary, the units of data capacity, and converting whole numbers between binary and denary.
A focused answer to the WJEC GCSE Computer Science Unit 1 content on binary and denary numbers, covering why computers use binary, bits bytes and the units of data capacity, place value in base 2, and converting whole numbers between binary and denary in both directions.
- The hexadecimal number system, why hexadecimal is used as a shorthand for binary, and converting between hexadecimal, binary and denary.
A focused answer to the WJEC GCSE Computer Science Unit 1 content on hexadecimal, covering the base 16 number system, why hexadecimal is a convenient shorthand for binary, and converting between hexadecimal, binary and denary in both directions with worked examples.
- Representing bitmap images as pixels, the meaning of resolution and colour depth, calculating the file size of an image, and the role of metadata.
A focused answer to the WJEC GCSE Computer Science Unit 1 content on representing images, covering bitmap images and pixels, resolution, colour depth and the number of colours, calculating image file size from dimensions and colour depth, and the role of metadata.
- Representing sound in binary by sampling an analogue wave, the meaning of sample rate and sample resolution (bit depth), and how they affect sound quality and file size.
A focused answer to the WJEC GCSE Computer Science Unit 1 content on representing sound, covering analogue to digital conversion by sampling, sample rate and sample resolution (bit depth), how each affects sound quality and file size, and calculating the file size of a sound recording.
- The common data types, the use of data structures such as arrays and records, and the difference between validation and verification.
A focused answer to the WJEC GCSE Computer Science Unit 1 content on data organisation, covering the common data types (integer, real, Boolean, character and string), data structures such as arrays and records, and the difference between validation and verification with examples of each technique.