
How are text characters stored in a computer using character sets such as ASCII and Unicode?

Representing characters in binary using character sets, the ASCII and Unicode character sets, and the relationship between a character, its character code and the number of bits needed.

A focused answer to the WJEC GCSE Computer Science Unit 1 content on representing text, covering character sets, the ASCII character set and its size, the Unicode character set and why it was introduced, character codes, and how the number of bits limits the number of characters.

Generated by Claude Opus 4.88 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed


Jump to a section
  1. What this topic is asking
  2. Character sets and character codes
  3. ASCII
  4. Unicode
  5. Why this matters
  6. Try this

What this topic is asking

WJEC wants you to know how text characters are stored in binary using a character set, the ASCII and Unicode character sets and their sizes, the idea of a character code, and how the number of bits limits how many characters can be represented. This is part of the Data representation and data types content in Unit 1 of WJEC GCSE Computer Science (3500).

Character sets and character codes

ASCII

Because the letters are in sequence, you can work out a code from a known one: 'D' is three after 'A', so 65 + 3 = 68. The constant difference of 32 between an upper case letter and its lower case version is a common exam fact.
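The arithmetic above can be checked with a short sketch. It assumes Python, whose built-in ord() and chr() convert between a character and its code (Python reports Unicode code points, which match ASCII for these letters). The variable names are illustrative only.

```python
# ord() gives a character's code; chr() turns a code back into a character.
code_a_upper = ord("A")            # 65

# 'D' is three letters after 'A', so its code is three higher.
code_d = code_a_upper + 3
letter = chr(code_d)

# The gap between a lower case letter and its upper case version.
case_offset = ord("a") - ord("A")

print(code_a_upper, code_d, letter, case_offset)  # 65 68 D 32
```

The same offset holds for every letter pair, so ord("z") - ord("Z") is also 32.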

Unicode

Why this matters

Knowing that each character is just a number lets you reason about file sizes (number of characters multiplied by bits per character), understand why a Unicode document is generally bigger than the same text in ASCII, and explain why sorting text works in code order, which puts all upper case letters before lower case ones.
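As a small illustration of the sorting point, the sketch below assumes Python, where sorted() compares strings by character code by default. The word list is made up for the example.

```python
# Sorting compares character codes, so 'B' (66) and 'Z' (90)
# come before 'a' (97) and 'c' (99).
words = ["apple", "Banana", "cherry", "Zebra"]
in_code_order = sorted(words)
print(in_code_order)  # ['Banana', 'Zebra', 'apple', 'cherry']

# The code of each word's first character, in the sorted order.
first_codes = [ord(word[0]) for word in in_code_order]
print(first_codes)    # [66, 90, 97, 99]
```

This is why a plain sort can look "wrong" to a human reader: every upper case word is placed ahead of every lower case one.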

Try this

Q1. A document of 200 characters is stored in standard ASCII. How many bytes does the text need? [2 marks]

  • Cue. One byte per character, so 200 bytes.

Q2. State one reason why Unicode is needed in addition to ASCII. [1 mark]

  • Cue. ASCII cannot represent the many characters used by the world's different languages, plus symbols and emoji.
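Both cues can be checked with a rough sketch, assuming Python and its built-in str.encode(). The helper fits_in_ascii is a hypothetical name invented for this example, and the sample strings are made up.

```python
# Q1: 200 characters stored as ASCII, one byte per character.
text = "A" * 200
ascii_bytes = text.encode("ascii")
print(len(ascii_bytes))  # 200

# Q2: ASCII has no code for characters outside its small set.
def fits_in_ascii(s: str) -> bool:
    """Return True if every character in s has an ASCII code."""
    try:
        s.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(fits_in_ascii("Wales"))  # True
print(fits_in_ascii("ŵ"))      # False: the Welsh letter is not in ASCII
```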

Exam-style practice questions

Practice questions written in the style of WJEC exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

WJEC-style, Unit 1, 3 marks
The character 'A' has the ASCII code 65. State the ASCII code for 'C', and explain how the code for 'a' relates to the code for 'A'.

A Unit 1 character code question. ASCII codes for letters run in order, so 'C' is two after 'A': 65 + 2 = 67 (1 mark). The lower case letters have a separate, higher block of codes; 'a' is 97, which is 32 more than 'A' at 65 (1 mark for stating 'a' is higher, 1 mark for the difference of 32). Markers reward the sequential ordering and the constant offset between upper and lower case. A common error is to assume 'a' and 'A' share a code, when upper and lower case have different codes.

WJEC-style, Unit 1, 3 marks
Explain why Unicode was introduced when ASCII already existed, and state one consequence of Unicode for file size.

A Unit 1 explain question. ASCII uses only 7 or 8 bits, so it can represent at most 128 or 256 characters, which is enough for English letters, digits and punctuation but not for the many alphabets and symbols used by the world's languages (1 mark). Unicode was introduced to give every character in every language, plus symbols and emoji, its own unique code (1 mark). A consequence is that because Unicode uses more bits per character, text stored in Unicode generally takes up more memory or storage than the same text in ASCII (1 mark). Markers reward the limited range of ASCII, the wider coverage of Unicode and the larger file size. A common error is to say Unicode replaced binary, when it is just a larger character set still stored in binary.
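The link between bits and the number of characters, and the effect on size, can be sketched as follows. This assumes Python; max_characters is a hypothetical helper written for this example. It also assumes UTF-32, one fixed-width way of storing Unicode that uses 4 bytes per character. Other Unicode encodings, such as UTF-8, use fewer bytes for the ASCII range.

```python
def max_characters(bits: int) -> int:
    """Number of different codes available with the given number of bits."""
    return 2 ** bits

print(max_characters(7))  # 128
print(max_characters(8))  # 256

# The same five characters stored two ways.
text = "Hello"
ascii_size = len(text.encode("ascii"))      # 1 byte per character
utf32_size = len(text.encode("utf-32-be"))  # 4 bytes per character (assumed encoding)
print(ascii_size, utf32_size)  # 5 20
```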


Sources & how we know this