How do you use scatter graphs to describe correlation and make predictions?
Plotting scatter graphs, describing correlation, drawing a line of best fit, using it to estimate values, and recognising the limits of extrapolation.
A focused answer to the AQA GCSE Mathematics statistics content on scatter graphs and correlation, covering plotting scatter graphs, describing correlation, drawing a line of best fit, using it to estimate values, and the limits of extrapolation.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
AQA wants you to plot a scatter graph, describe the correlation, draw a line of best fit by eye, use it to estimate values, and understand the limits of prediction (extrapolation) and the difference between correlation and causation. This topic links statistics to straight-line graphs, and the interpretation marks reward precise language about strength, direction and the dangers of predicting beyond the data.
Plotting and describing correlation
A scatter graph plots two variables, one on each axis, with a point for each paired observation. The pattern of points reveals the relationship. Read three features:
- Direction: positive (points rise from lower left to upper right), negative (points fall from upper left to lower right), or no correlation (no clear trend).
- Strength: strong if the points lie close to a straight line, weak if they are scattered loosely around it.
- Outliers: single points well away from the main cluster, which may be data errors.
So "strong positive correlation" means the points rise together and sit close to a line, while "weak negative correlation" means a loose downward trend.
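In the exam you describe direction and strength by eye. For checking your own reading of a graph, the sketch below puts a number on both using the product-moment correlation coefficient r, which goes beyond what this dot point requires. The data, the function names and the cut-off values 0.3 and 0.7 are all assumptions made for illustration; there is no official boundary between "weak" and "strong".

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Correlation coefficient r, between -1 and 1, for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe(r, weak=0.3, strong=0.7):
    """Turn r into exam-style wording. The cut-offs are illustrative only."""
    if abs(r) < weak:
        return "no clear correlation"
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if abs(r) >= strong else "weak"
    return f"{strength} {direction} correlation"

# Hypothetical data: hours of revision (x) and test score (y).
hours = [1, 2, 3, 4, 5, 6]
scores = [42, 50, 53, 61, 66, 70]

r = pearson_r(hours, scores)
print(f"r = {r:.3f}: {describe(r)}")
```

The sign of r gives the direction and its size gives the strength, which matches the two features you would state in words.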
The line of best fit
A line of best fit is a single straight line drawn through the data to summarise the trend. Draw it so that roughly equal numbers of points sit above and below it, balancing the scatter; it should pass through the mean point (the mean of the x-values, the mean of the y-values). It is not a "join the dots" line and need not pass through any particular data point.
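The mean-point idea can be checked numerically. The sketch below does not draw a line by eye; it swaps in a least-squares fit, a different technique used here only because it is easy to compute, and runs it on made-up data. The data values and the function name are assumptions for illustration.

```python
from statistics import mean

# Hypothetical paired data: hours of revision (x) and test score (y).
hours = [1, 2, 3, 4, 5, 6]
scores = [42, 50, 53, 61, 66, 70]

def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line through the data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = y_bar - gradient * x_bar
    return gradient, intercept

gradient, intercept = least_squares_line(hours, scores)
x_bar, y_bar = mean(hours), mean(scores)

# Evaluate the fitted line at the mean x-value; it should return the mean y-value.
value_at_mean = gradient * x_bar + intercept
print(f"mean point: ({x_bar}, {y_bar})")
print(f"line: y = {gradient:.2f}x + {intercept:.2f}")
print(f"line at mean x: {value_at_mean:.2f}")
```

A least-squares line always passes through the mean point, which is why plotting that point first is a useful guide when you draw the line by eye.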
Interpolation, extrapolation and causation
Estimating a value inside the range of the data is interpolation, which is usually reliable because the trend is supported by data there. Estimating outside the range is extrapolation, which is unreliable because the linear pattern may not continue (it could level off, reverse, or hit a natural limit). Always flag an extrapolated prediction as uncertain.
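The distinction comes down to one check: is the x-value inside the range of the data? A minimal sketch, assuming a hypothetical line y = 5.6x + 37.4 fitted to data whose x-values run from 1 to 6 (none of these numbers come from an exam question):

```python
def estimate(x, gradient, intercept, x_min, x_max):
    """Estimate y from a line of best fit and label the kind of prediction."""
    y = gradient * x + intercept
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind

# Hypothetical line and data range (assumed values, for illustration only).
for x in (4.5, 12):
    y, kind = estimate(x, gradient=5.6, intercept=37.4, x_min=1, x_max=6)
    note = "reasonably reliable" if kind == "interpolation" else "treat as uncertain"
    print(f"x = {x}: estimate {y:.1f} ({kind}, {note})")
```

With these assumed numbers the extrapolated estimate at x = 12 comes out above 100, which would be impossible for a test marked out of 100. That is the kind of natural limit a straight line ignores.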
A strong correlation shows that two variables move together, but it does not prove that one causes the other. There may be a third (lurking) variable, the causation could run the other way, or the link could be coincidence. "Correlation does not imply causation" is a standard exam response.
Outliers and the line of best fit
A single outlier can distort the line of best fit, pulling it toward itself and changing both its gradient and position. Before drawing the line, scan for any point well away from the main cluster; it may be a recording error worth ignoring, or a genuine but unusual case. When an exam shows an outlier, you are often asked to identify it and to comment on its effect: removing a clear error before fitting the line gives a more representative trend. The line should follow the bulk of the points, not be dragged off course by one stray.
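To see how much one stray point can matter, the sketch below fits a line to the same made-up data with and without an outlier. It uses a least-squares fit as a stand-in for drawing by eye; the data, the outlier and the function name are assumptions for illustration.

```python
from statistics import mean

def fit_line(points):
    """Least-squares gradient and intercept for a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    gradient = sxy / sxx
    return gradient, y_bar - gradient * x_bar

# Hypothetical main cluster, plus one point well below the trend.
main_cluster = [(1, 42), (2, 50), (3, 53), (4, 61), (5, 66), (6, 70)]
outlier = (6, 20)  # e.g. a mis-recorded score

g_without, c_without = fit_line(main_cluster)
g_with, c_with = fit_line(main_cluster + [outlier])

print(f"without outlier: y = {g_without:.2f}x + {c_without:.2f}")
print(f"with outlier:    y = {g_with:.2f}x + {c_with:.2f}")
```

Here the single outlier pulls the gradient from 5.6 down to under 1, so the fitted line no longer describes the main cluster. A line drawn by eye is less sensitive than this, but the direction of the effect is the same.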
Linking correlation to the gradient
The gradient of the line of best fit has a real meaning: it is the predicted change in the y-variable for each unit increase in the x-variable. If a graph of fitness score against hours of training has a line of best fit with gradient m, then each extra hour of training is associated with about m more score points, on average and within the data range. This connects scatter graphs to straight-line graphs and to rates of change, and it lets you interpret the relationship quantitatively rather than only describing it as "positive" or "strong".
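As a worked illustration, the sketch below finds the gradient from two points read off a line of best fit, builds the equation of the line, and uses it for an estimate. The two points and the training-and-fitness context are hypothetical values chosen for easy arithmetic.

```python
def gradient_between(p, q):
    """Gradient of the straight line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y2 - y1) / (x2 - x1)

# Two hypothetical points read from the LINE of best fit (not data points):
# (hours of training, fitness score).
p, q = (2, 35), (8, 59)

m = gradient_between(p, q)   # change in score per extra hour
c = p[1] - m * p[0]          # intercept, from y = mx + c at point p

x = 5                        # inside the range 2 to 8, so this is interpolation
estimate = m * x + c

print(f"gradient: {m} score points per hour")
print(f"equation: y = {m}x + {c}")
print(f"estimate at x = {x}: {estimate}")
```

The gradient carries units: here it is score points per hour, which is what turns "positive correlation" into a statement about how much the score changes.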
Exam-style practice questions
Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AQA 2019, 3 marks
A scatter graph plots hours of revision against test score for ten students, showing a positive linear trend. A line of best fit passes through and . Use the line to estimate the score of a student who revised for hours. (Foundation tier, Paper 2, calculator.)
Worked answer
Find the gradient of the line of best fit: marks per hour.
The line through has equation . At : .
So the estimated score is .
Markers reward using the line (not a single point), the gradient, and the estimate. Reading directly off the data instead of the line of best fit loses marks.
AQA 2021, 3 marks
A scatter graph shows a strong positive correlation between ice cream sales and air temperature. A student concludes that buying ice cream causes the temperature to rise. Comment on this conclusion and explain why predicting sales at from the graph would be unreliable. (Higher tier, Paper 1, non-calculator.)
Worked answer
Correlation does not imply causation: the two variables rise together, but ice cream sales do not cause higher temperatures (the temperature drives sales, not the reverse).
Predicting at is extrapolation, using a value far outside the data range, where the linear trend may not hold, so the estimate is unreliable.
Markers reward the causation point and a clear explanation of why extrapolation is unsafe.
Sources & how we know this
- AQA GCSE Mathematics (8300) specification — AQA (2015)