
What are the primitive data types, and how are characters represented in binary?

Primitive data types (integer, real/float, Boolean, character, string) and how text is represented using character sets such as ASCII and Unicode, including the size and range implications of each type.

An OCR H446 answer on primitive data types and text representation: the integer, real, Boolean, character and string types and their storage, and how characters are represented in binary using character sets such as ASCII and Unicode, with their size and range implications.

Generated by Claude Opus 4.813 min answerUpdated 2026-06-02

Reviewed by: AI editorial process; not yet individually human-reviewed


Quick answer

A primitive data type is a basic built-in type from which more complex structures are built. An integer holds a whole number; a real (float) holds a number with a fractional part, stored in floating point; a Boolean holds one of two values (true or false), representable in a single bit; a character holds one symbol; and a string holds a sequence of characters (text). The choice of type affects storage and the range or precision available: for example, an age is an integer, a temperature is a real, and an on/off state is a Boolean. Text is represented in binary using a character set that maps each character to a numeric code. ASCII uses 7 bits for 128 characters (extended ASCII uses 8 bits for 256), enough for English but not for most other languages. Unicode uses more bits to give a unique code point to characters from virtually every writing system, including accented letters and emoji, at the cost of more storage per character.
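As a rough illustration of the five types, here is a minimal Python sketch. The variable names and values are hypothetical, and Python is only one possible language: it has no separate character type, so a one-character string stands in for a character here.

```python
# Illustrative values only; the names are hypothetical, not from the specification.
age = 17                # integer: a whole number
temperature = 21.5      # real/float: has a fractional part
fees_paid = True        # Boolean: one of two values
initial = "A"           # character: Python has no char type, so a 1-character string stands in
name = "Ada Lovelace"   # string: a sequence of characters


def type_names():
    """Return the Python type name of each example value, in the order above."""
    values = (age, temperature, fees_paid, initial, name)
    return [type(v).__name__ for v in values]


print(type_names())  # ['int', 'float', 'bool', 'str', 'str']
```

In a language with a distinct character type, the fourth entry would be reported differently; the point is only that each value has a type that matches its meaning.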


What this dot point is asking

OCR wants the primitive data types (integer, real/float, Boolean, character, string), when each is appropriate, and how text is represented in binary using character sets such as ASCII and Unicode, including their size and range implications. Expect a "choose the right type" question and an "ASCII versus Unicode" question.

The answer

The primitive data types

Representing characters: character sets

ASCII and Unicode

Key fact

ASCII (American Standard Code for Information Interchange) uses 7 bits to represent 128 characters: the uppercase and lowercase English letters, digits, punctuation and control codes (extended ASCII uses 8 bits for 256). It is compact but cannot represent the characters of most of the world's languages. Unicode was introduced to solve this: it provides a unique code point for characters from virtually all writing systems, including non-Latin alphabets, accented letters, mathematical symbols and emoji, using encodings that allow more bits per character (such as UTF-16, which uses 16 or 32 bits, or UTF-8, which uses a variable 8 to 32 bits). The trade-off is that Unicode text can use more storage per character than ASCII, but it enables consistent global text interchange. The first 128 Unicode code points match ASCII for compatibility.
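The mapping from character to code, and the storage cost of different encodings, can be checked with a short Python sketch. The helper names `to_bits` and `encoded_size` are hypothetical; `ord` and `str.encode` are standard Python. The `utf-16-le` codec is used so that no byte-order mark is added to the count.

```python
def to_bits(ch: str, width: int = 7) -> str:
    """Return the code of a single character as a binary string of the given width."""
    return format(ord(ch), f"0{width}b")


def encoded_size(text: str, encoding: str) -> int:
    """Return how many bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


print(to_bits("A"))  # 1000001 (code 65 in both ASCII and Unicode)

# Code point, UTF-8 size in bytes and UTF-16 size in bytes for each character.
for ch in ("A", "é", "€", "😀"):
    print(ch, ord(ch), encoded_size(ch, "utf-8"), encoded_size(ch, "utf-16-le"))
# A 65 1 2
# é 233 2 2
# € 8364 3 2
# 😀 128512 4 4
```

The letter A costs one byte in UTF-8, the same as in 8-bit ASCII storage, while the emoji needs four bytes in either encoding, which is the storage trade-off in concrete terms.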

Choose data types for a record

A student record stores: name, age, average mark (to one decimal place), and whether fees are paid. Choose the most appropriate primitive type for each and justify the trickiest one.

Step 1: Name

A name is text (a sequence of characters), so use a string.

Step 2: Age

Age is a whole number with no fractional part, so use an integer, which is compact and exact.

Step 3: Average mark

An average to one decimal place (for example 67.4) has a fractional part, so use a real / float, stored in floating point. Using an integer here would lose the decimal.

Step 4: Fees paid

This has exactly two states, paid or not, so use a Boolean (true/false), which a single bit can represent and which expresses the meaning clearly. Choosing Boolean rather than a string "yes"/"no" saves storage and prevents invalid values.
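The four choices above could be written down as a record type. This is a sketch in Python, assuming a hypothetical `StudentRecord` class and made-up field values; exam answers would normally name the types rather than write code.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    """Hypothetical record using the four types chosen in the worked example."""
    name: str            # string: a sequence of characters
    age: int             # integer: whole number
    average_mark: float  # real/float: keeps the decimal place
    fees_paid: bool      # Boolean: only True or False


# Illustrative values only.
record = StudentRecord(name="Sam Patel", age=17, average_mark=67.4, fees_paid=True)

# Forcing the average into an integer discards the fractional part.
truncated_mark = int(record.average_mark)
print(truncated_mark)  # 67
```

The last two lines show why an integer is the wrong choice for the average mark: 67.4 becomes 67.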

Examples in context

A database column's type (INT, REAL, CHAR, VARCHAR, BOOLEAN) is exactly this choice of primitive type, and it constrains what can be stored. Web pages declare a Unicode encoding (UTF-8) so they can display any language and emoji. A Boolean flag needs only a single bit in principle to record a yes/no state, although many languages and systems store it in a whole byte for convenience. OCR links this to number and floating-point representation (how integers and reals are encoded) and to data structures, which are built from these primitives.
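The size and range implications for numeric types can be sketched as follows. This assumes signed integers are stored in two's complement, which is the usual convention but is an assumption here rather than something stated above; the function names are hypothetical.

```python
def unsigned_range(bits: int) -> tuple:
    """Smallest and largest values of an unsigned integer with this many bits."""
    return (0, 2 ** bits - 1)


def twos_complement_range(bits: int) -> tuple:
    """Smallest and largest values of a signed integer, assuming two's complement."""
    return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)


print(unsigned_range(8))          # (0, 255)
print(twos_complement_range(8))   # (-128, 127)

# Reals are stored in floating point, so some decimal values are only approximated.
print(0.1 + 0.2 == 0.3)           # False
```

Integers are exact but limited in range by their bit count, whereas reals cover fractional values at the cost of small rounding errors.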

Try this

Q1. State the most appropriate primitive data type for a value that can only be true or false. [1 mark]

Cue. Boolean.

Q2. State how many characters standard 7-bit ASCII can represent. [1 mark]

Cue. $2^7 = 128$ characters.

Q3. Explain why Unicode was introduced in place of ASCII. [2 marks]

Cue. ASCII's 128 (or 256) characters cannot represent the many thousands of characters used across the world's languages; Unicode gives a unique code point to characters from virtually all writing systems.
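The arithmetic behind these cues can be checked with a small Python sketch. The function names are hypothetical and are only for illustration.

```python
def characters_representable(bits: int) -> int:
    """Number of distinct codes available with the given number of bits."""
    return 2 ** bits


def bits_needed(characters: int) -> int:
    """Fewest bits that give at least this many distinct codes."""
    return max(1, (characters - 1).bit_length())


def fits_in_ascii(text: str) -> bool:
    """True if every character has a code below 128, the 7-bit ASCII limit."""
    return all(ord(ch) < 128 for ch in text)


print(characters_representable(7))  # 128
print(characters_representable(8))  # 256
print(bits_needed(128))             # 7
print(fits_in_ascii("Hello"))       # True
print(fits_in_ascii("café"))        # False, because é is outside 7-bit ASCII
```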

Exam-style practice questions

Practice questions written in the style of OCR exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

OCR 2019, 5 marks. Explain the difference between the ASCII and Unicode character sets, and explain why Unicode was introduced.


ASCII (up to 2): the standard ASCII character set uses 7 bits to represent 128 characters (extended ASCII uses 8 bits for 256), covering the English alphabet, digits, punctuation and control codes. It is compact but cannot represent the characters of most of the world's languages.

Unicode and why introduced (up to 3): Unicode allows more bits per character (up to 32 in its encodings: UTF-8 uses 8 to 32 and UTF-16 uses 16 or 32) to provide a unique code point for characters from virtually all writing systems, including non-Latin alphabets, accented letters and emoji. It was introduced because ASCII's 128 (or 256) characters could not represent the many thousands of characters used worldwide, so a universal standard was needed for global text interchange. The trade-off is that Unicode text can take more storage per character. Markers reward the bit/character-count contrast and the global-language motivation.

OCR 2021, 4 marks. State the most appropriate primitive data type for each of the following and justify one choice: a person's age, whether a light is on, a temperature reading of 21.5 degrees, and a single initial letter.


Award marks for correct types (up to 3) plus one justification (up to 1).

Age: integer (a whole number, no fractional part). Light on/off: Boolean (one of two states, true or false). Temperature 21.5: real/float (has a fractional part). Single initial: character (one symbol).

Justification example: a Boolean is used for the light because it has exactly two possible states (on or off), which a single bit can represent, so it is the most storage-efficient and meaningful type. Markers reward the four correct types and a sensible justification for one. A common error is using a real for age or an integer for a value that needs decimals.


Sources & how we know this

OCR AS and A Level Computer Science (H046, H446) specification — OCR (2015)