What are regular expressions, and how do regular and context-free languages differ?
Understand regular expressions and regular languages, the link between regular expressions and finite state machines, the limits of regular languages, and context-free languages described by a BNF grammar.
A focused answer to AQA A-Level Computer Science 4.4.3 and 4.4.4, covering regular expressions and regular languages, their link to finite state machines, the limits of regular languages, and context-free languages described using BNF.
Reviewed by: AI editorial process; not yet individually human-reviewed
Have a quick question? Jump to the Q&A page
Jump to a section
What this dot point is asking
AQA wants you to write and read regular expressions, link them to finite state machines, recognise what regular languages cannot describe, and read a context-free grammar written in Backus-Naur Form (BNF).
Regular expressions and regular languages
For example, (0|1)+ describes any non-empty binary string, and ab*c describes an a, then any number of bs, then a c. Building these up from the three operators is a common exam skill: alternation gives a choice, concatenation puts pieces in sequence, and the repetition operators control how many times a piece may appear. Reading an unfamiliar expression is the same process in reverse, identifying which operator binds where.
Link to finite state machines
This equivalence ties the topic to program translation. In the lexical analysis stage of compilation, the tokens of a language (identifiers, numbers, operators) are each described by a regular expression, and the lexer that recognises them is essentially a finite state machine. Because the two formalisms are interchangeable, a language designer can specify tokens with readable regular expressions and a tool can automatically build the equivalent finite state machine that does the recognising.
Limits, and context-free languages
BNF defines a grammar with rules of the form non-terminal then ::= then a definition, where <...> denotes a non-terminal and | separates alternatives. A rule can refer to itself (recursion), which is what gives BNF the power to describe nesting.
<digit> ::= 0|1|2|3|4|5|6|7|8|9
<number> ::= <digit> | <digit><number>
The recursion in <number> (defined partly in terms of itself) is exactly what lets it generate numbers of any length, and the same recursion lets a grammar describe arbitrarily deep nesting such as brackets within brackets. This is why the syntax analysis stage of a compiler uses a context-free grammar rather than regular expressions: program structure is full of nesting (blocks inside blocks, expressions inside expressions) that regular languages cannot capture, which mirrors the FSM-versus-pushdown step in the computation hierarchy.
Exam-style practice questions
Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AQA 20194 marksWrite a regular expression that matches strings consisting of one or more letter a followed by exactly one letter b, then explain the link between regular expressions and finite state machines.Show worked answer →
A suitable regular expression is a+b (one or more a, then a single b). Equivalently aa*b expresses the same set.
Regular expressions and finite state machines have equal power: any language that can be described by a regular expression can be recognised by a finite state machine, and any finite state machine can be written as an equivalent regular expression. The set of languages they both describe is exactly the regular languages. This equivalence is why regular expressions are used to define the tokens that a lexical analyser, which is effectively a finite state machine, recognises.
Markers reward a correct regular expression using + (or a a* ) for one or more a then a single b, and the statement that regular expressions and FSMs describe exactly the same class of languages (the regular languages).
AQA 20214 marksExplain why a regular language cannot describe matched, nested structures such as balanced brackets, and state how a context-free grammar in BNF can describe such structures.Show worked answer →
A regular language is recognised by a finite state machine, which has only a finite number of states and therefore a bounded amount of memory. To check balanced brackets it would need to count an unbounded number of open brackets so it can match the same number of close brackets, but a finite state machine cannot count without limit because it has finitely many states. So balanced brackets are not a regular language.
A context-free grammar, written in Backus-Naur Form (BNF), can describe such structures because its rules allow recursion: a non-terminal can be defined in terms of itself, for example a bracketed expression defined as an open bracket, an expression, then a close bracket. This recursion lets the grammar generate arbitrarily deep nesting, which a regular expression cannot.
Markers reward the finite-memory or cannot-count argument for the limit of regular languages and the use of recursion in a BNF rule to handle unbounded nesting.
Related dot points
- Understand finite state machines with and without output, state transition diagrams and tables, and the use of an FSM to recognise inputs or model behaviour.
A focused answer to AQA A-Level Computer Science 4.4.2, covering finite state machines with and without output, state transition diagrams and tables, and using an FSM to recognise inputs or model behaviour.
- Understand the Turing machine model, its components, the idea of a universal Turing machine, and the link to the limits of computation and the halting problem.
A focused answer to AQA A-Level Computer Science 4.4.5, covering the Turing machine model and its components, the transition rules, the universal Turing machine, and its link to computability and the halting problem.
- Understand abstraction, the different forms of abstraction, decomposition, automation, and the components of computational thinking used to solve problems.
A focused answer to AQA A-Level Computer Science 4.4.1, covering abstraction and its forms, decomposition, automation, and the components of computational thinking used to solve problems with computers.
- Understand tractable and intractable problems, the classes P and NP, the idea of computable and non-computable problems, and the use of heuristics for intractable problems.
A focused answer to AQA A-Level Computer Science 4.4.6 and 4.4.7, covering tractable and intractable problems, the classes P and NP, computable and non-computable problems, and the use of heuristics to tackle intractable problems.
- Understand assemblers, compilers and interpreters, the differences between them, the stages of compilation (lexical analysis, syntax analysis, code generation and optimisation), and intermediate code.
A focused answer to AQA A-Level Computer Science 4.6.5, covering assemblers, compilers and interpreters and their differences, the stages of compilation (lexical analysis, syntax analysis, code generation and optimisation), and intermediate code such as bytecode.
Sources & how we know this
- AQA A-level Computer Science (7517) specification — AQA (2015)