How do you collect reliable data and choose a fair sample?
Types of data, populations and samples, random and stratified sampling, sources of bias, and designing good data collection.
A focused answer to the AQA GCSE Mathematics statistics content on sampling and data, covering types of data, populations and samples, random and stratified sampling, sources of bias, and designing good data collection.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
AQA wants you to classify data types, distinguish a population from a sample, carry out random and stratified sampling (including the stratified-sample calculation), identify sources of bias, and design fair data collection. The calculation skill is the stratified sample; the rest is reasoning about why a sample might mislead and how to fix it, which is reliably tested with "give a reason" questions.
Types of data
Qualitative (categorical) data describes a quality, such as colour or favourite subject. Quantitative data is numerical and splits into discrete data, which takes separate values and is usually counted (number of siblings, goals scored), and continuous data, which is measured on a scale and can take any value within a range (height, time, mass). Continuous data is usually grouped into class intervals for analysis. Primary data is collected first-hand; secondary data comes from an existing source.
Populations and samples
The population is the entire set you want to know about, which is often too large to survey completely, so a sample is taken instead. A good sample is large enough to be reliable and representative, meaning its make-up mirrors the population. Conclusions from a sample are estimates about the population, and a bigger, fairer sample gives a more trustworthy estimate.
Random and stratified sampling
In a simple random sample every member of the population has an equal chance of selection, achieved by numbering everyone and drawing numbers at random. A stratified sample first divides the population into groups (strata) such as year groups or age bands, then samples each group in proportion to its size, so the sample mirrors the structure of the population.
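The two methods above can be sketched in a few lines of Python. This is a minimal illustration, not an AQA-prescribed procedure: the function names, the school sizes and the sample size of 50 are all invented for the example.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Pick k distinct members so that each is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def stratified_allocation(strata_sizes, sample_size):
    """Share sample_size across the strata in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    return {name: round(size * sample_size / total)
            for name, size in strata_sizes.items()}

# Hypothetical school: these figures are made up for illustration.
strata = {"Year 10": 120, "Year 11": 180}
allocation = stratified_allocation(strata, 50)
print(allocation)  # {'Year 10': 20, 'Year 11': 30}

# Simple random sample: number every student 1 to 300, then draw 50 numbers.
students = list(range(1, 301))
chosen = simple_random_sample(students, 50, seed=1)
```

When the proportional shares are not whole numbers, rounding can leave the total one above or below the intended sample size, so it is worth checking that the allocations add up.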
Sources of bias and good design
Bias occurs when the sampling method systematically over- or under-represents part of the population. Common causes are surveying at a single time or place, asking only volunteers (self-selection), or using leading questions. A questionnaire should have clear, non-leading questions with non-overlapping, exhaustive response options, so that any time or quantity intervals are defined with no value falling into two boxes. To reduce bias, sample across varied times, places and subgroups, and use random or stratified selection rather than convenience sampling.
Designing a good questionnaire
A well-designed questionnaire question has three features that examiners look for. First, it asks one clear thing, avoiding vague words like "often" or "regularly" that mean different things to different people. Second, it is not leading: "Do you agree that the new park is a good idea?" pushes a yes, whereas "What is your opinion of the new park?" is neutral. Third, its response boxes are exhaustive (cover every possibility) and non-overlapping. A common error is offering two response intervals that share an endpoint, so the shared value fits both boxes; the fix is to start each interval just after the previous one ends, or to use a clear boundary convention. Being able to criticise a flawed question and rewrite it is a standard exam task.
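One way to test a set of response boxes is to try every possible answer and count how many boxes it fits. The sketch below does this for whole-number answers; the function name and the intervals are hypothetical and chosen only to show an overlap at the boundaries.

```python
def check_response_options(options, lowest, highest):
    """Check inclusive whole-number ranges, e.g. hours per week.

    Returns a list of problems for answers from lowest to highest inclusive.
    """
    problems = []
    for value in range(lowest, highest + 1):
        matches = [opt for opt in options if opt[0] <= value <= opt[1]]
        if len(matches) == 0:
            problems.append(f"{value} fits no box")      # not exhaustive
        elif len(matches) > 1:
            problems.append(f"{value} fits {len(matches)} boxes")  # overlapping
    return problems

# Hypothetical response boxes, invented for illustration.
flawed = [(0, 5), (5, 10), (10, 15)]
fixed = [(0, 5), (6, 10), (11, 15)]

print(check_response_options(flawed, 0, 15))  # ['5 fits 2 boxes', '10 fits 2 boxes']
print(check_response_options(fixed, 0, 15))   # []
```

A complete set of boxes would usually also need an open-ended final option (such as "more than 15") so that every answer has somewhere to go.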
Why sample size and method both matter
A reliable estimate needs both a fair method and an adequate size. A large sample chosen by a biased method (for example, only people who volunteered online) is still biased, just confidently wrong. A perfectly fair method with too small a sample is too sensitive to chance to trust. Stratified sampling improves reliability by guaranteeing the structure of the population is reflected, while a larger sample reduces random variation. When asked to evaluate a sampling plan, comment on both the method (is it fair and random) and the size (is it large enough), because the marks usually reward addressing each.
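A short simulation can make both points concrete. The town below is entirely invented: it assumes 3000 households who would volunteer online and spend 80 a week, and 7000 others who spend 50 a week. The sketch compares a large volunteer-only sample with fair random samples of two different sizes.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical town: all figures are invented for illustration.
volunteers = [80] * 3000   # households who would answer an online survey
others = [50] * 7000       # everyone else
population = volunteers + others
true_mean = statistics.mean(population)  # 59

# Large but biased: 2000 households, all drawn from the volunteers.
biased_mean = statistics.mean(rng.sample(volunteers, 2000))  # 80

def spread_of_estimates(sample_size, repeats=200):
    """Standard deviation of the sample mean over many fair random samples."""
    means = [statistics.mean(rng.sample(population, sample_size))
             for _ in range(repeats)]
    return statistics.pstdev(means)

small_spread = spread_of_estimates(5)    # fair method, tiny sample
large_spread = spread_of_estimates(500)  # fair method, larger sample

print(true_mean, biased_mean)
print(small_spread > large_spread)
```

In this made-up town the volunteer sample misses the true mean by the same amount however large it is, while the fair samples centre on the true mean and vary less from one sample to the next as the size grows.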
Exam-style practice questions
Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AQA 2019, 3 marks
A school has Year students and Year students. A stratified sample of students is taken across the two year groups. Work out how many Year students should be in the sample. (Higher tier, Paper 2, calculator.)
The total population is students. The sampling fraction is .
Year contribution: students.
Markers reward the sampling fraction and the multiplication. A useful check is that Year gives , and . Dividing the sample equally ( each) ignores the different group sizes and is the standard error.
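In general terms, the stratified-sample calculation can be written as follows. The symbols are labels introduced here rather than AQA notation: $N_1$ and $N_2$ are the sizes of the two year groups, $N$ is the total population, $n$ is the total sample size, and $n_1$, $n_2$ are the numbers sampled from each group.

```latex
\[
N = N_1 + N_2, \qquad
\text{sampling fraction} = \frac{n}{N}, \qquad
n_1 = \frac{n}{N} \times N_1, \qquad
n_2 = \frac{n}{N} \times N_2 .
\]
\[
\text{Check: } n_1 + n_2 = \frac{n}{N}\,(N_1 + N_2) = n .
\]
```

The check line shows why the two group allocations should add up to the full sample size, provided no rounding was needed.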
AQA 2021, 3 marks
A researcher surveys shoppers outside one supermarket on a weekday morning to estimate the average weekly spend of all households in a town. Explain two reasons why this sample may be biased and how the design could be improved. (Higher tier, Paper 1, non-calculator.)
Reason one: a weekday morning excludes people who work full-time, so the sample is unrepresentative of the whole town.
Reason two: surveying outside one supermarket misses households who shop elsewhere or online.
Improvement: survey at varied times and locations, or take a random or stratified sample of all households in the town.
Markers reward each valid bias reason and a sensible improvement that broadens the sample.
Related dot points
- Finding the mean, median, mode and range, averages from frequency tables, and the median and interquartile range from grouped data at Higher tier.
A focused answer to the AQA GCSE Mathematics statistics content on averages and spread, covering the mean, median, mode and range, averages from frequency tables, and the median and interquartile range from grouped data at Higher tier.
- Drawing and interpreting bar charts, pie charts, frequency tables, and cumulative frequency graphs, box plots and histograms at Higher tier.
A focused answer to the AQA GCSE Mathematics statistics content on charts and graphs, covering bar charts, pie charts and frequency tables, and cumulative frequency graphs, box plots and histograms at Higher tier.
- Plotting scatter graphs, describing correlation, drawing a line of best fit, using it to estimate values, and recognising the limits of extrapolation.
A focused answer to the AQA GCSE Mathematics statistics content on scatter graphs and correlation, covering plotting scatter graphs, describing correlation, drawing a line of best fit, using it to estimate values, and the limits of extrapolation.
- Estimating probability using relative frequency, the effect of more trials, comparing experimental and theoretical probability, and finding expected outcomes.
A focused answer to the AQA GCSE Mathematics probability content on relative frequency, covering estimating probability from experiments, the effect of more trials, comparing experimental and theoretical probability, and finding expected outcomes.
- The probability scale, equally likely outcomes, the fact that probabilities sum to one, and combining mutually exclusive and independent events.
A focused answer to the AQA GCSE Mathematics probability content on the basics, covering the probability scale, equally likely outcomes, the fact that probabilities sum to one, and combining mutually exclusive and independent events.
Sources & how we know this
- AQA GCSE Mathematics (8300) specification — AQA (2015)