How do you collect reliable data and choose a fair sample?

Types of data, populations and samples, random and stratified sampling, sources of bias, and designing good data collection.

A focused answer to the AQA GCSE Mathematics statistics content on sampling and data, covering types of data, populations and samples, random and stratified sampling, sources of bias, and designing good data collection.

Generated by Claude Opus 4.88 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this dot point is asking
  2. Types of data
  3. Populations and samples
  4. Random and stratified sampling
  5. Sources of bias and good design
  6. Designing a good questionnaire
  7. Why sample size and method both matter

What this dot point is asking

AQA wants you to classify data types, distinguish a population from a sample, carry out random and stratified sampling (including the stratified-sample calculation), identify sources of bias, and design fair data collection. The calculation skill is the stratified sample; the rest is reasoning about why a sample might mislead and how to fix it, which is reliably tested with "give a reason" questions.

Types of data

Qualitative (categorical) data describes a quality, such as colour or favourite subject. Quantitative data is numerical and splits into discrete data, which is counted in whole steps (number of siblings, goals scored), and continuous data, which is measured on a scale and can take any value (height, time, mass). Continuous data is usually grouped into class intervals for analysis. Primary data is collected first-hand; secondary data comes from an existing source.

Populations and samples

The population is the entire set you want to know about, which is often too large to survey completely, so a sample is taken instead. A good sample is large enough to be reliable and representative, meaning its make-up mirrors the population. Conclusions from a sample are estimates about the population, and a bigger, fairer sample gives a more trustworthy estimate.

Random and stratified sampling

In a simple random sample every member of the population has an equal chance of selection. One way to achieve this is to number everyone and pick numbers at random, for example with a random number generator. A stratified sample first divides the population into groups (strata) such as year groups or age bands, then samples each group in proportion to its size, so the sample mirrors the structure of the population.
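The two methods can be sketched in a few lines of Python. This is a minimal illustration, not exam working: the function names, the year-group sizes (120, 180 and 100) and the sample size of 40 are all made up for the example.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Pick k distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def stratified_allocation(strata_sizes, sample_size):
    """Share sample_size across the strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    return {name: sample_size * size / total
            for name, size in strata_sizes.items()}

def stratified_sample(strata, sample_size, seed=None):
    """Randomly sample within each stratum, using the proportional allocation.

    Assumption: each share is rounded to the nearest whole person. With
    awkward proportions the rounded counts may not add up to sample_size
    and would need adjusting by hand.
    """
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = stratified_allocation(sizes, sample_size)
    return {name: rng.sample(members, round(allocation[name]))
            for name, members in strata.items()}

# Hypothetical school: three year groups of different sizes.
school = {
    "Year 9": [f"Y9-{i}" for i in range(120)],
    "Year 10": [f"Y10-{i}" for i in range(180)],
    "Year 11": [f"Y11-{i}" for i in range(100)],
}
picked = stratified_sample(school, 40, seed=1)
print({name: len(members) for name, members in picked.items()})
```

With these made-up sizes the shares are 12, 18 and 10, which add to 40, so each year group appears in the sample in the same proportion as in the school.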

Sources of bias and good design

Bias occurs when the sampling method systematically over- or under-represents part of the population. Common causes are surveying at a single time or place, asking only volunteers (self-selection), or using leading questions. A questionnaire should have clear, non-leading questions with non-overlapping, exhaustive response options. For numerical answers such as time or quantity, the boundaries between options must not count any value twice. To reduce bias, sample across varied times, places and subgroups, and use random or stratified selection rather than convenience sampling.

Designing a good questionnaire

A well-designed questionnaire question has three features that examiners look for. First, it asks one clear thing, avoiding vague words like "often" or "regularly" that mean different things to different people. Second, it is not leading: "Do you agree that the new park is a good idea?" pushes a yes, whereas "What is your opinion of the new park?" is neutral. Third, its response boxes are exhaustive (cover every possibility) and non-overlapping. A common error is offering response intervals such as "0 to 10" and "10 to 20", which both include 10; the fix is "0 to 9" and "10 to 19", or using a clear boundary convention. Being able to criticise a flawed question and rewrite it is a standard exam task.
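The overlap check on response boxes is mechanical enough to sketch in code. The helper below is hypothetical and assumes whole-number answers with inclusive boxes listed in increasing order. It only compares neighbouring boxes, so it does not check whether the final box is open-ended (for example "20 or more").

```python
def check_boxes(boxes):
    """Report overlaps and gaps between neighbouring response boxes.

    boxes: list of (low, high) pairs, inclusive whole numbers, in order.
    Returns a list of problem descriptions (empty if none are found).
    """
    problems = []
    for (lo1, hi1), (lo2, hi2) in zip(boxes, boxes[1:]):
        if lo2 <= hi1:
            # The next box starts at or before the previous box ends.
            problems.append(
                f"overlap: {lo1} to {hi1} and {lo2} to {hi2} both include {lo2}"
            )
        elif lo2 > hi1 + 1:
            # At least one whole number falls between the two boxes.
            problems.append(f"gap: nothing covers {hi1 + 1} to {lo2 - 1}")
    return problems

print(check_boxes([(0, 10), (10, 20)]))  # flawed boxes from the text
print(check_boxes([(0, 9), (10, 19)]))   # corrected boxes
```

The first call reports the overlap at 10; the second returns an empty list because every whole number from 0 to 19 falls in exactly one box.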

Why sample size and method both matter

A reliable estimate needs both a fair method and an adequate size. A large sample chosen by a biased method (for example, only people who volunteered online) is still biased, just confidently wrong. A perfectly fair method with too small a sample is too sensitive to chance to trust. Stratified sampling improves reliability by guaranteeing the structure of the population is reflected, while a larger sample reduces random variation. When asked to evaluate a sampling plan, comment on both the method (is it fair and random) and the size (is it large enough), because the marks usually reward addressing each.
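A small simulation can make the two failure modes concrete. Everything in this sketch is invented for illustration: a town of 10,000 people whose weekly hours online are drawn from two made-up groups, a large sample taken only from the heavy users (standing in for online volunteers), and repeated fair samples of size 5 and 500.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical town: 3000 heavy internet users and 7000 light users
# (values are hours online per week).
heavy = [rng.gauss(30, 5) for _ in range(3000)]
light = [rng.gauss(10, 5) for _ in range(7000)]
population = heavy + light
true_mean = statistics.mean(population)

# Large but biased: 2000 people, all drawn from the heavy users.
biased_mean = statistics.mean(rng.sample(heavy, 2000))

def fair_sample_means(n, repeats=200):
    """Means of repeated simple random samples of size n from the whole town."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

small_means = fair_sample_means(5)    # fair method, tiny sample
large_means = fair_sample_means(500)  # fair method, adequate sample

print(f"true mean             {true_mean:.1f}")
print(f"large biased sample   {biased_mean:.1f}")
print(f"spread of fair n=5    {statistics.pstdev(small_means):.2f}")
print(f"spread of fair n=500  {statistics.pstdev(large_means):.2f}")
```

The biased sample lands far from the true mean however large it is, while the fair samples centre on the true mean and vary much less from one sample to the next when n is 500 than when n is 5.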

Exam-style practice questions

Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AQA 2019 (3 marks). A school has 600 Year 10 students and 400 Year 11 students. A stratified sample of 50 students is taken across the two year groups. Work out how many Year 10 students should be in the sample. (Higher tier, Paper 2, calculator.)

The total population is $600 + 400 = 1000$ students. The sampling fraction is $\dfrac{50}{1000} = 0.05$.

Year 10 contribution: $600 \times 0.05 = 30$ students.

Markers reward the sampling fraction and the multiplication. A useful check is that Year 11 gives $400 \times 0.05 = 20$, and $30 + 20 = 50$. Dividing the sample equally (25 each) ignores the different group sizes and is the standard error.
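
The same working can be written as a single proportion. The symbols here are labels introduced for this sketch, not AQA notation: $N_i$ is the size of one stratum, $N$ the size of the whole population, $n$ the total sample size and $n_i$ the number to take from that stratum.

```latex
n_i = \frac{N_i}{N} \times n
\qquad\text{so}\qquad
n_{\text{Year 10}} = \frac{600}{1000} \times 50 = 30
```

Multiplying by the sampling fraction $\dfrac{n}{N}$ and multiplying by the stratum's share $\dfrac{N_i}{N}$ give the same answer; either route should earn the method marks if the working is shown.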

AQA 2021 (3 marks). A researcher surveys shoppers outside one supermarket on a weekday morning to estimate the average weekly spend of all households in a town. Explain two reasons why this sample may be biased and how the design could be improved. (Higher tier, Paper 1, non-calculator.)

Reason one: a weekday morning excludes people who work full-time, so the sample is unrepresentative of the whole town.

Reason two: surveying outside one supermarket misses households who shop elsewhere or online.

Improvement: survey at varied times and locations, or take a random or stratified sample of all households in the town.

Markers reward each valid bias reason and a sensible improvement that broadens the sample.
