What does correlation tell you, and why does it not prove causation?
Vocabulary of correlation (positive, negative, zero, causation, association, interpolation, extrapolation); describing correlation by inspection as strong or weak; correlation does not imply causation; spurious correlation.
A focused answer to Edexcel GCSE Statistics on correlation, covering the vocabulary of correlation, describing correlation by inspection as strong or weak and positive or negative, why correlation does not imply causation, spurious correlation, and interpolation versus extrapolation.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
Edexcel codes 2e.01 to 2e.03 require you to know and apply the vocabulary of correlation (positive, negative, zero, causation, association, interpolation, extrapolation), to describe correlation by inspection as strong or weak, and to understand that correlation does not imply causation, including being aware of spurious correlation. These ideas are the foundation for the line of best fit and the rank and product moment coefficients on the other pages of this module.
The vocabulary of correlation
On a scatter diagram the explanatory (independent) variable is plotted on the x-axis and the response (dependent) variable on the y-axis. The direction of the cloud of points (uphill, downhill or shapeless) tells you the type of correlation: positive, negative or zero.
Strength of correlation by inspection
Edexcel asks you to judge strength by inspection (no calculation needed at this stage). The closer the points lie to a straight line, the stronger the correlation:
- Strong correlation: points lie close to a clear straight line.
- Weak correlation: points show a trend but are widely scattered.
- Zero / no correlation: points show no straight-line pattern.
You combine strength and direction, for example "strong negative correlation" or "weak positive correlation", and always describe it in the context of the variables.
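At this stage Edexcel only asks for a judgement by eye, but the same "strength plus direction" wording can be sketched numerically. The example below is a minimal illustration, not an Edexcel method: it computes the product moment correlation coefficient and turns it into a phrase. The cut-off values 0.7 and 0.3, the function names and the car data are all assumptions made up for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe_correlation(xs, ys, strong=0.7, weak=0.3):
    """Combine strength and direction into one phrase.

    The thresholds are illustrative assumptions, not official boundaries.
    """
    r = pearson_r(xs, ys)
    if abs(r) < weak:
        return "no correlation"
    strength = "strong" if abs(r) >= strong else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

# Hypothetical data: car age (years) and value (thousands of pounds)
age = [1, 2, 3, 4, 5, 6]
value = [18, 16, 13, 11, 8, 6]

print(describe_correlation(age, value))  # strong negative correlation
```

In an exam answer the phrase would still need context, for example "strong negative correlation: as the age of the car increases, its value decreases".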
Correlation does not imply causation
This is one of the most heavily tested ideas in the whole qualification. Even strong correlation does not prove that one variable causes the other. There are three possibilities whenever two variables correlate:
- One genuinely causes the other (age of a car causing its value to fall).
- A third factor causes both (hot weather causing both ice cream sales and swimming).
- The link is coincidental.
Edexcel expects you to state explicitly that correlation does not imply causation and, where relevant, to suggest a plausible third factor.
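The third-factor case can be illustrated with a small made-up data set. In the sketch below, temperature drives both ice cream sales and the number of swimmers; the multipliers 10 and 4 and the small day-to-day variations are invented for illustration. Because the data are constructed, the temperature effect can be subtracted exactly, which would not be possible with real data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: the third factor (temperature in degrees C) on six days
temperature = [15, 18, 21, 24, 27, 30]

# Invented day-to-day variation that has nothing to do with temperature
other_sales_effects = [1, -1, 1, -1, 0, 0]
other_swim_effects = [1, 1, -1, -1, 0, 0]

# Both variables depend on temperature; neither depends on the other
ice_cream_sales = [10 * t + e for t, e in zip(temperature, other_sales_effects)]
swimmers = [4 * t + e for t, e in zip(temperature, other_swim_effects)]

# Correlation between the two variables as they would be observed
r_observed = pearson_r(ice_cream_sales, swimmers)

# Remove the temperature effect (known exactly only because we built the data)
sales_left_over = [s - 10 * t for s, t in zip(ice_cream_sales, temperature)]
swim_left_over = [s - 4 * t for s, t in zip(swimmers, temperature)]
r_left_over = pearson_r(sales_left_over, swim_left_over)

print(round(r_observed, 3), round(r_left_over, 3))
```

The observed correlation is strongly positive, yet once the shared cause is removed nothing links the two variables, which is the sense in which the relationship is an association rather than causation.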
Spurious correlation
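A spurious correlation is one where two variables correlate even though neither causes the other, usually because a hidden third variable affects both or because the link is coincidental. A classic example is that the number of storks and the number of babies across towns may correlate, simply because larger towns have more of both. Recognising spurious correlation, and naming the hidden third variable, is exactly what extended-response questions reward.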
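One way to see the hidden variable in the stork example is to compare raw counts with rates per head of population. The sketch below uses entirely made-up figures for six towns; the populations, counts and variable names are assumptions chosen so that town size is the only thing linking the two counts.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data for six towns (population in thousands)
population = [10, 20, 30, 40, 50, 60]
storks = [20, 20, 60, 40, 100, 60]
babies = [120, 240, 300, 400, 550, 660]

# Raw counts: bigger towns tend to have more of both
r_counts = pearson_r(storks, babies)

# Rates per thousand people take town size out of the comparison
stork_rate = [s / p for s, p in zip(storks, population)]
baby_rate = [b / p for b, p in zip(babies, population)]
r_rates = pearson_r(stork_rate, baby_rate)

print(round(r_counts, 2), round(r_rates, 2))
```

With these invented numbers the raw counts correlate positively, while the rates show no correlation, which points to town size as the hidden third variable.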
Interpolation and extrapolation
Interpolation is estimating a value within the range of the data, which is generally reliable. Extrapolation is estimating beyond the range, which is risky because the pattern may not continue. Edexcel expects you to be cautious about extrapolation when using a line of best fit to make predictions.
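The difference can be sketched with a line of the form y = a + bx. The example below fits the line by least squares (a calculated alternative to drawing the line by eye) and labels each prediction by whether the x value lies inside the data range. The car data and the function names are hypothetical.

```python
def line_of_best_fit(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # forces the line through the double mean point
    return a, b

def prediction_type(x_new, xs):
    """Label a prediction by whether x_new is inside the range of the data."""
    if min(xs) <= x_new <= max(xs):
        return "interpolation"
    return "extrapolation"

# Hypothetical data: car age (years) and value (thousands of pounds)
age = [1, 2, 3, 4, 5, 6]
value = [18, 16, 13, 11, 8, 6]

a, b = line_of_best_fit(age, value)

for x_new in (3.5, 12):
    estimate = a + b * x_new
    print(x_new, prediction_type(x_new, age), round(estimate, 1))
```

For this made-up data the estimate at 12 years comes out negative, which is impossible for a car's value and shows why a pattern cannot be assumed to continue beyond the data.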
Exam-style practice questions
Practice questions written in the style of Pearson Edexcel exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
Edexcel 1ST0 2019 (3 marks)
A scatter diagram shows the ice cream sales and the number of people swimming at a beach on a number of days. The points show strong positive correlation.
(a) Describe the correlation.
(b) A student says 'buying ice cream makes people swim'. Explain why this conclusion is not justified.
(a) Strong positive correlation: as ice cream sales increase, the number of people swimming also increases, and the points lie close to a straight line.
(b) Correlation does not imply causation. Both ice cream sales and swimming are likely caused by a third factor (hot weather), so the relationship is an association, not evidence that one causes the other.
Markers reward describing the correlation (strong, positive), stating that correlation does not imply causation, and identifying a likely third factor.
Edexcel 1ST0 2021 (4 marks)
For each pair of variables, state the type of correlation you would expect (positive, negative or none) and say whether any correlation is likely to be causal.
(a) A car's age and its value.
(b) The number of storks and the number of babies born in a set of towns.
(a) Negative correlation: as a car gets older its value falls. This is likely causal, because age and wear directly reduce a car's worth.
(b) Positive correlation may appear, but it is spurious: more storks and more babies are both linked to town size (bigger towns have more of both), so there is no causal link.
Markers reward the correct correlation type for each, a valid comment on causation for (a), and recognising spurious correlation with a third factor for (b).
Sources & how we know this
- Pearson Edexcel GCSE (9-1) Statistics (1ST0) specification — Pearson Edexcel (2017)