Does correlation prove that one thing causes another?
The difference between correlation and causation, spurious correlation, and confounding variables.
A focused answer to AQA GCSE Statistics on correlation and causation, covering why correlation does not prove cause, spurious correlation, and confounding variables that explain an apparent link.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
AQA wants you to explain why correlation does not prove causation, recognise spurious correlation, and identify confounding variables that could explain an apparent relationship. This is as much a reasoning skill as a calculation: examiners want you to challenge an unjustified "X causes Y" claim drawn from a scatter diagram.
Why correlation is not causation
When two variables are correlated, there are several possible explanations, and "the first causes the second" is only one of them. The second could cause the first; a third variable could cause both; or the correlation could be a coincidence. A classic example is that ice cream sales and drowning incidents are correlated, but neither causes the other; both rise in hot weather. Because you cannot tell which explanation is true from the scatter diagram alone, you cannot claim cause from correlation.
Spurious correlation
Large data sets can throw up striking but meaningless correlations purely by chance: with enough variables, some pairs will appear strongly correlated even though they are wholly unconnected. This is why a strong correlation alone, however striking, is not evidence of a real relationship, and why statisticians look for a plausible mechanism before taking a correlation seriously.
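A quick simulation shows how chance alone produces strong correlations. The sketch below is a hypothetical illustration, not something the AQA specification asks you to do. It generates 40 variables of 10 observations each, every one drawn independently at random, then checks all 780 pairs for the strongest correlation. The function names, the sample sizes, and the use of Pearson's correlation coefficient r (rather than Spearman's rank) are assumptions made for the example.

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def strongest_chance_correlation(num_variables=40, num_observations=10, seed=1):
    """Return the pair of unrelated random variables with the largest |r|.

    Each variable is drawn independently from a normal distribution, so no
    pair has any real connection: whatever correlation is found is chance.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    data = [
        [rng.gauss(0, 1) for _ in range(num_observations)]
        for _ in range(num_variables)
    ]
    best_pair, best_r = None, 0.0
    # itertools.combinations visits every unordered pair of variables exactly once
    for i, j in itertools.combinations(range(num_variables), 2):
        r = pearson_r(data[i], data[j])
        if abs(r) > abs(best_r):
            best_pair, best_r = (i, j), r
    return best_pair, best_r


if __name__ == "__main__":
    pair, r = strongest_chance_correlation()
    print(f"Variables {pair[0]} and {pair[1]} give r = {r:.2f}, "
          "although they were generated independently.")
```

Because every variable was generated independently, any large r this reports is spurious by construction. The more pairs you search, the more extreme the best chance correlation tends to be, which is exactly why a striking r from a large data set needs a plausible mechanism before it is taken seriously.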
Confounding variables
In the ice cream and drowning example, temperature is the confounding variable: hot days raise both ice cream sales and the number of people swimming (and so drownings). The two original variables appear linked only because a hidden third factor drives both. To establish genuine causation you need a controlled experiment that holds confounding variables constant and changes only the explanatory variable, which is why observational correlations can suggest, but never prove, cause.
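The ice cream example can be imitated with invented data. In this sketch the figures are made up: temperature alone drives both daily ice cream sales and a hypothetical drowning measure, and neither outcome depends on the other. Looking only at days with the same temperature imitates holding the confounder constant. This is still observational data, so it illustrates the idea rather than replacing a controlled experiment. The function names, coefficients and noise levels are all assumptions chosen for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_days(num_days=5000, seed=7):
    """Invented daily data in which temperature drives both outcomes.

    Ice cream sales and the drowning measure each depend only on temperature
    plus independent random noise; neither depends on the other.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    days = []
    for _ in range(num_days):
        temp = rng.randint(10, 34)  # whole-degree temperature in °C, 10 to 34 inclusive
        ice_cream = 20 * temp + rng.gauss(0, 40)  # made-up sales figure
        drownings = 0.2 * temp + rng.gauss(0, 0.5)  # made-up incident measure
        days.append((temp, ice_cream, drownings))
    return days


def overall_and_fixed_temp_r(days, fixed_temp=25):
    """Return r across all days, and r across only the days at fixed_temp."""
    overall = pearson_r([d[1] for d in days], [d[2] for d in days])
    same_temp = [d for d in days if d[0] == fixed_temp]
    if len(same_temp) < 3:
        raise ValueError("too few days at that temperature to compute r")
    fixed = pearson_r([d[1] for d in same_temp], [d[2] for d in same_temp])
    return overall, fixed


if __name__ == "__main__":
    overall, fixed = overall_and_fixed_temp_r(simulate_days())
    print(f"All days: r = {overall:.2f}")
    print(f"Only 25 °C days: r = {fixed:.2f}")
```

Across all days the two outcomes should show a strong positive correlation. Among days at a single temperature the correlation should be close to zero, because the hidden third factor no longer varies.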
This topic links the work on scatter diagrams and lines of best fit with experimental design. When you fit a line of best fit and use it to make a prediction, you are relying on the correlation, but that does not entitle you to claim that the explanatory variable causes the response. It also connects to the work on controlling variables and bias: a controlled experiment earns the right to talk about cause precisely because it holds the confounders constant, which an observational scatter diagram cannot do. In the exam, watch for any sentence of the form "this shows that X causes Y" drawn from a correlation alone. Always challenge it by naming a plausible confounder and stating that an experiment would be needed to establish cause.
Exam-style practice questions
Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AQA 2019, 3 marks
A newspaper reports a strong positive correlation between the number of fire engines sent to a fire and the amount of damage caused. It concludes that fire engines cause damage. Explain why this conclusion is wrong.
Correlation does not prove causation. The fire engines do not cause the damage; both the number of engines sent and the damage are caused by a third (confounding) variable, the size of the fire.
Bigger fires cause more damage and also lead to more engines being sent, which produces the correlation without any direct cause.
Markers reward stating that correlation is not causation and identifying the size of the fire as the confounding variable that explains both.
AQA 2021, 2 marks
Explain what is meant by a spurious correlation, and give one example.
A spurious correlation is a correlation between two variables that has no real or meaningful connection and arises by coincidence (or through a hidden third variable).
Example: a correlation between the number of films a particular actor appears in each year and the annual number of drownings, which is pure coincidence.
Markers reward a correct definition (no genuine link, arising by chance) and a sensible example of two unconnected variables.
Related dot points
- Plotting scatter diagrams, bivariate data, identifying types and strength of correlation, and spotting outliers.
A focused answer to AQA GCSE Statistics on scatter diagrams, covering plotting bivariate data, describing the type and strength of correlation, and identifying outliers on a scatter diagram.
- Lines of best fit through the mean point, the equation of the line, interpolation, extrapolation, and Spearman's rank correlation coefficient.
A focused answer to AQA GCSE Statistics on lines of best fit, covering drawing the line through the mean point, finding its equation, interpolation and extrapolation, and Spearman's rank correlation coefficient.
- Explanatory and response variables, controlled and extraneous variables, control groups, and sources of bias in sampling and data collection.
A focused answer to AQA GCSE Statistics on controlling variables and bias, covering explanatory and response variables, controlled and extraneous variables, control groups and matched pairs, and the main sources of bias in sampling and data collection.
- The statistical enquiry cycle, hypotheses, the stages of an investigation, and types of statistical problem.
A focused answer to AQA GCSE Statistics on the statistical enquiry cycle, covering the stages of an investigation, writing a hypothesis, the role of pilot studies, and how the cycle structures a real statistical problem.
Sources & how we know this
- AQA GCSE Statistics (8382) specification — AQA (2017)