How is large-scale data organised in normalised relational databases, and how do big data and data warehousing extend this?
Describe the organisation and structure of data: relational databases, normalisation to third normal form, SQL, and big data and data warehousing.
A focused answer to WJEC A-Level Computer Science Unit 4 organisation of data, covering relational databases and keys, normalisation to first, second and third normal form, SQL, and big data and data warehousing.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
WJEC wants you to describe how large-scale data is organised: relational databases and keys, normalisation to third normal form (1NF, 2NF, 3NF), SQL for querying and editing, and the modern extensions of big data and data warehousing. This deepens the AS database topic to the formal normal forms and the contemporary data landscape. Expect a normalisation-to-3NF question and a big-data question, both rewarding precise definitions.
The answer
Relational databases and keys
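A relational database stores data in tables (relations) made up of records (rows) and fields (columns). A primary key is a field, or combination of fields (a composite key), whose value uniquely identifies each record. A foreign key is a field in one table that holds the primary key of a record in another table, linking the two. The relational model's strength is that one fact is stored in one place and relationships are made explicit through keys, which is what normalisation formalises.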
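Keys can be seen in action with Python's built-in sqlite3 module standing in for a database system. This is a minimal sketch: the table and field names (Product, OrderLine, ProductID and so on) are hypothetical, and note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

def build_schema() -> sqlite3.Connection:
    """Create two linked tables in an in-memory SQLite database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key checks off unless this pragma is set per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("""
        CREATE TABLE Product (
            ProductID INTEGER PRIMARY KEY,   -- primary key: unique per product
            Name      TEXT NOT NULL,
            Category  TEXT NOT NULL
        )""")
    conn.execute("""
        CREATE TABLE OrderLine (
            OrderID   INTEGER,
            ProductID INTEGER REFERENCES Product(ProductID),  -- foreign key
            Quantity  INTEGER NOT NULL,
            PRIMARY KEY (OrderID, ProductID)                 -- composite key
        )""")
    return conn

conn = build_schema()
conn.execute("INSERT INTO Product VALUES (1, 'Kettle', 'Kitchen')")
conn.execute("INSERT INTO OrderLine VALUES (100, 1, 2)")  # valid: product 1 exists

try:
    # Product 99 does not exist, so the foreign key rejects this order line.
    conn.execute("INSERT INTO OrderLine VALUES (100, 99, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```

The foreign key is what stops an order line pointing at a product that does not exist, which keeps the linked tables consistent.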
Normalisation to third normal form
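Reaching 3NF means every non-key field depends on the key, the whole key and nothing but the key. This removes the insert, update and delete anomalies that redundant data causes: for example, if a customer's address is repeated on every order, changing it means editing every copy, and missing one leaves the data inconsistent.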
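Whether one field depends on another can be checked against sample data. A minimal sketch, with hypothetical field names and records held as Python dicts: `depends` tests whether each value of the determinant fields is always paired with a single value of the dependent field, and `project` keeps chosen fields and drops duplicate rows. Both are illustrative helpers, not a standard API. A sample can only disprove a dependency, never prove one, so real designs take dependencies from what the data means.

```python
from collections.abc import Iterable

Row = dict[str, object]

def depends(rows: Iterable[Row], lhs: tuple[str, ...], rhs: str) -> bool:
    """True if, in these sample rows, each lhs value maps to exactly one rhs value."""
    seen: dict[tuple, object] = {}
    for row in rows:
        key = tuple(row[f] for f in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False  # same determinant, different dependent value
        seen[key] = row[rhs]
    return True

def project(rows: Iterable[Row], fields: tuple[str, ...]) -> list[Row]:
    """Keep only the given fields and drop duplicate rows (a relational projection)."""
    out: list[Row] = []
    seen: set[tuple] = set()
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key not in seen:
            seen.add(key)
            out.append({f: row[f] for f in fields})
    return out

# Hypothetical table keyed by OrderID, with a transitive dependency:
# OrderID -> CustomerID -> CustomerName
orders = [
    {"OrderID": 1, "CustomerID": "C1", "CustomerName": "Ann"},
    {"OrderID": 2, "CustomerID": "C2", "CustomerName": "Bob"},
    {"OrderID": 3, "CustomerID": "C1", "CustomerName": "Ann"},
]

# CustomerName depends on a non-key field, so this table is not in 3NF.
print(depends(orders, ("CustomerID",), "CustomerName"))  # True

# Decompose: move the transitively dependent field into its own table.
orders_3nf = project(orders, ("OrderID", "CustomerID"))
customers = project(orders, ("CustomerID", "CustomerName"))
print(len(orders_3nf), len(customers))  # 3 2
```

After the split, "Ann" is stored once in the customer table, so changing her name is a single-row update.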
SQL
SQL queries data with SELECT ... FROM ... WHERE, combines tables with JOIN, and edits data with INSERT, UPDATE and DELETE. Aggregate functions such as COUNT, SUM and AVG, used with GROUP BY, summarise results, and ORDER BY sorts them. SQL is declarative: you state the result wanted, not how to fetch it.
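Each of these statements can be tried with Python's built-in sqlite3 module. This is a sketch against a small hypothetical Product/OrderLine schema; SQLite's dialect is close to, but not identical to, other database systems, and exam notation may differ in small details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT, Category TEXT);
    CREATE TABLE OrderLine (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER,
                            PRIMARY KEY (OrderID, ProductID));
""")

# INSERT adds new records; ? placeholders keep values separate from the SQL text.
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [(1, "Kettle", "Kitchen"), (2, "Toaster", "Kitchen"), (3, "Lamp", "Lighting")])
conn.executemany("INSERT INTO OrderLine VALUES (?, ?, ?)",
                 [(100, 1, 2), (100, 3, 1), (101, 2, 4), (102, 1, 1)])

# SELECT ... FROM ... WHERE picks fields from the records that match a condition.
kitchen = conn.execute(
    "SELECT Name FROM Product WHERE Category = ? ORDER BY Name", ("Kitchen",)).fetchall()
print(kitchen)  # [('Kettle',), ('Toaster',)]

# JOIN links the tables on the key; SUM with GROUP BY summarises; ORDER BY sorts.
totals = conn.execute("""
    SELECT p.Name, SUM(o.Quantity) AS Total
    FROM OrderLine AS o
    JOIN Product AS p ON o.ProductID = p.ProductID
    GROUP BY p.Name
    ORDER BY Total DESC
""").fetchall()
print(totals)  # [('Toaster', 4), ('Kettle', 3), ('Lamp', 1)]

# UPDATE changes existing records; DELETE removes them.
conn.execute("UPDATE Product SET Name = 'Electric kettle' WHERE ProductID = 1")
conn.execute("DELETE FROM OrderLine WHERE OrderID = 102")
print(conn.execute("SELECT COUNT(*) FROM OrderLine").fetchone()[0])  # 3
```

Nothing in the queries says how to find the rows (no loops or indexes are mentioned); the database works that out, which is what "declarative" means in practice.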
Big data and data warehousing
Big data is data too large, fast-changing or varied (volume, velocity, variety) for traditional tools, often requiring distribution across many machines. Data warehousing brings data together from many sources into one store optimised for analysis and reporting.
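Distributing work across machines can be sketched in miniature. The following simulates, in one Python process, the split, process and combine pattern used by MapReduce-style big-data tools: readings are hash-partitioned across a hypothetical number of "nodes", each node summarises only its own share, and the partial results are merged. In a real system each partition would run on a separate machine; the node count and sensor readings here are made up.

```python
import zlib
from collections import Counter, defaultdict

NUM_NODES = 3  # hypothetical cluster size

def partition(readings, num_nodes=NUM_NODES):
    """Send each (sensor_id, value) reading to a node chosen by a stable hash of its ID."""
    nodes = defaultdict(list)
    for sensor_id, value in readings:
        node = zlib.crc32(sensor_id.encode()) % num_nodes
        nodes[node].append((sensor_id, value))
    return nodes

def local_summary(chunk):
    """Work done on one node: total and count of readings per sensor in its chunk."""
    totals, counts = Counter(), Counter()
    for sensor_id, value in chunk:
        totals[sensor_id] += value
        counts[sensor_id] += 1
    return totals, counts

def merge(summaries):
    """Combine every node's partial totals into a mean reading per sensor."""
    totals, counts = Counter(), Counter()
    for node_totals, node_counts in summaries:
        totals.update(node_totals)
        counts.update(node_counts)
    return {sensor: totals[sensor] / counts[sensor] for sensor in counts}

readings = [("s1", 20.0), ("s2", 18.0), ("s1", 22.0), ("s3", 30.0), ("s2", 16.0)]
nodes = partition(readings)
means = merge(local_summary(chunk) for chunk in nodes.values())
print(sorted(means.items()))  # [('s1', 21.0), ('s2', 17.0), ('s3', 30.0)]
```

Each node sends back only small totals rather than its raw readings, which is why this pattern scales to volumes no single machine could hold.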
Examples in context
- Example 1. A retailer reaching 3NF
- A retailer's order table repeats the product name and category against every order line. Normalising to 3NF splits products into their own table keyed by ProductID, so each product's details are stored once and an order line just references the ProductID. Updating a product name then touches one row, not thousands: the practical payoff of 3NF.
- Example 2. Sensor data as big data
- A network of thousands of sensors streams readings every second, producing high-volume, high-velocity, semi-structured data. A single relational database server can struggle to keep up, so the data is distributed across many machines and processed with big-data tools. This shows why big data is a distinct problem rather than just a bigger version of a normal database.
- Example 3. A data warehouse for analysis
- A company copies sales, stock and customer data nightly from its operational databases into a data warehouse structured for fast analytical queries. Reports run against the warehouse without slowing the live systems. This illustrates the purpose of data warehousing: consolidating data for analysis separately from day-to-day transaction processing.
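The nightly copy in Example 3 can be sketched with two separate in-memory SQLite databases, one standing in for the live system and one for the warehouse. This is a minimal sketch under stated assumptions: the Sale and DailyStoreSales tables and the `nightly_load` function are hypothetical, and only one source (sales) is loaded, whereas a real warehouse would combine several.

```python
import sqlite3

# Hypothetical operational (live) database holding individual sales.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE Sale (SaleID INTEGER PRIMARY KEY, SaleDate TEXT, Store TEXT, Amount REAL)")
live.executemany("INSERT INTO Sale (SaleDate, Store, Amount) VALUES (?, ?, ?)", [
    ("2024-03-01", "Cardiff", 12.50),
    ("2024-03-01", "Cardiff", 7.50),
    ("2024-03-01", "Swansea", 20.00),
    ("2024-03-02", "Cardiff", 5.00),
])

# Separate warehouse database, structured for analysis rather than transactions.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("""CREATE TABLE DailyStoreSales (
    SaleDate TEXT, Store TEXT, Revenue REAL, NumSales INTEGER,
    PRIMARY KEY (SaleDate, Store))""")

def nightly_load(source, target):
    """Extract rows from the live system, summarise them, and load the summary."""
    rows = source.execute("""
        SELECT SaleDate, Store, SUM(Amount), COUNT(*)
        FROM Sale GROUP BY SaleDate, Store""").fetchall()  # extract + transform
    # INSERT OR REPLACE means re-running the load does not duplicate rows.
    target.executemany("INSERT OR REPLACE INTO DailyStoreSales VALUES (?, ?, ?, ?)", rows)
    target.commit()

nightly_load(live, warehouse)

# The analytical query runs against the warehouse, not the live database.
report = warehouse.execute("""
    SELECT Store, SUM(Revenue) FROM DailyStoreSales
    GROUP BY Store ORDER BY Store""").fetchall()
print(report)  # [('Cardiff', 25.0), ('Swansea', 20.0)]
```

Because reports read only from `warehouse`, heavy analysis does not compete with the transactions running on `live`.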
Try this
Q1. State the condition that a table in third normal form must satisfy beyond being in second normal form. [1 mark]
- Cue. No non-key field depends on another non-key field (no transitive dependency); every non-key field depends only on the primary key.
Q2. State two of the characteristics commonly used to define big data. [2 marks]
- Cue. Any two of volume (huge size), velocity (high rate of arrival) and variety (many formats, including unstructured data).
Exam-style practice questions
Practice questions written in the style of WJEC exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
WJEC 2020, 6 marks: State the conditions a relational database table must meet to be in first, second and third normal form.
State each normal form's condition in order, each building on the previous.
First normal form (1NF): the table contains no repeating groups and each field holds a single (atomic) value, with a primary key identifying each record.
Second normal form (2NF): the table is in 1NF and every non-key field depends on the whole primary key, not just part of it (this removes partial dependencies, relevant where the key is composite).
Third normal form (3NF): the table is in 2NF and there are no non-key fields that depend on other non-key fields (no transitive dependencies); every non-key field depends only on the primary key.
Markers reward 1NF (atomic values, no repeating groups, a primary key), 2NF (1NF plus no partial dependency on part of a composite key), and 3NF (2NF plus no transitive dependency between non-key fields).
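The first two steps can be walked through on data. A minimal sketch with hypothetical student and course fields: `to_1nf` splits a repeating group into one row per value, giving the composite key (StudentID, CourseCode), and `to_2nf` moves Name, which depends on only part of that key, into its own table. Both function names are illustrative, not a standard API.

```python
# Hypothetical unnormalised records: "Courses" is a repeating group.
unnormalised = [
    {"StudentID": "S1", "Name": "Cerys", "Courses": ["CS", "Maths"]},
    {"StudentID": "S2", "Name": "Dafydd", "Courses": ["CS"]},
]

def to_1nf(records):
    """One row per course, so every field holds a single atomic value.
    The primary key becomes the composite (StudentID, CourseCode)."""
    return [
        {"StudentID": r["StudentID"], "Name": r["Name"], "CourseCode": c}
        for r in records
        for c in r["Courses"]
    ]

def to_2nf(rows_1nf):
    """Name depends on only part of the composite key (StudentID), a partial
    dependency, so it moves to its own Student table keyed by StudentID."""
    students = {r["StudentID"]: r["Name"] for r in rows_1nf}
    enrolments = [(r["StudentID"], r["CourseCode"]) for r in rows_1nf]
    return students, enrolments

rows = to_1nf(unnormalised)
students, enrolments = to_2nf(rows)
print(len(rows))   # 3
print(students)    # {'S1': 'Cerys', 'S2': 'Dafydd'}
print(enrolments)  # [('S1', 'CS'), ('S1', 'Maths'), ('S2', 'CS')]
```

In the 1NF table "Cerys" appears twice; after the 2NF split each name is stored once.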
WJEC 2022, 4 marks: Explain what is meant by big data, and state two reasons traditional relational databases can struggle to handle it.
Define big data, then give two genuine challenges for relational databases.
Big data refers to data sets so large, fast-changing or varied that traditional tools struggle to store, process and analyse them. It is often characterised by volume (huge size), velocity (high rate of arrival) and variety (many formats, including unstructured data).
Reasons relational databases struggle: first, the sheer volume can exceed what a single relational system can store and query efficiently, so the data must be distributed across many machines. Second, much big data is unstructured or semi-structured (text, images, sensor streams) and does not fit neatly into fixed relational tables.
Markers reward a definition referencing volume, velocity or variety, and two valid challenges such as scale beyond a single system or unstructured data not fitting relational tables.