
How do you use scatter graphs to describe correlation and make predictions?

Plotting scatter graphs, describing correlation, drawing a line of best fit, using it to estimate values, and recognising the limits of extrapolation.


Generated by Claude Opus 4.8 · 8 min answer · Updated 2026-06-02

Reviewed by: AI editorial process; not yet individually human-reviewed



What this dot point is asking

AQA wants you to plot a scatter graph, describe the correlation, draw a line of best fit by eye, use it to estimate values, and understand the limits of prediction (extrapolation) and the difference between correlation and causation. This topic links statistics to straight-line graphs, and the interpretation marks reward precise language about strength, direction and the dangers of predicting beyond the data.

Plotting and describing correlation

A scatter graph plots two variables, one on each axis, with a point for each paired observation. The pattern of points reveals the relationship. Read three features:

Direction: positive (points rise from lower left to upper right), negative (points fall from upper left to lower right), or no correlation (no clear trend).
Strength: strong if the points lie close to a straight line, weak if they are scattered loosely around it.
Outliers: single points well away from the main cluster, which may be recording errors or genuine but unusual values.

So "strong positive correlation" means the points rise together and sit close to a line, while "weak negative correlation" means a loose downward trend.

The line of best fit

A line of best fit is a single straight line drawn through the data to summarise the trend. Draw it so that roughly equal numbers of points sit above and below, balancing the scatter; it should pass through the mean point (the average of the $x$ values and the average of the $y$ values). It is not a "join the dots" line and need not pass through any particular data point.

Use a line of best fit to predict

A scatter graph relates weekly hours of training to a fitness score, with a positive trend. A line of best fit passes through $(1, 20)$ and $(6, 60)$. Estimate the fitness score for $4$ hours of training.

Find the gradient of the line

Gradient $= \dfrac{60 - 20}{6 - 1} = \dfrac{40}{5} = 8$ score points per hour.

Write the line equation

Using the point $(1, 20)$: $y = 20 + 8(x - 1)$.

Substitute the prediction value

At $x = 4$: $y = 20 + 8 \times 3 = 20 + 24 = 44$.

State and judge the estimate

The estimated fitness score is $44$. Since $4$ hours lies within the data range $1$ to $6$, this is interpolation and is reasonably reliable.
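The four steps above can be written as two small Python functions. This is a minimal sketch, and the names `gradient` and `predict` are illustrative rather than taken from any library. `predict` uses the same point-gradient form $y = y_1 + m(x - x_1)$ as the worked example.

```python
def gradient(p1, p2):
    """Gradient of the line through two points: change in y over change in x."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)

def predict(p1, p2, x):
    """Estimate y at x using the line through p1 and p2: y = y1 + m(x - x1)."""
    m = gradient(p1, p2)
    x1, y1 = p1
    return y1 + m * (x - x1)

print(gradient((1, 20), (6, 60)))    # 8.0
print(predict((1, 20), (6, 60), 4))  # 44.0
```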

Interpolation, extrapolation and causation

Estimating a value inside the range of the data is interpolation, which is usually reliable because the trend is supported by data there. Estimating outside the range is extrapolation, which is unreliable because the linear pattern may not continue (it could level off, reverse, or hit a natural limit). Always flag an extrapolated prediction as uncertain.
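The interpolation/extrapolation check only needs the smallest and largest $x$ values in the data. In this sketch, `classify_estimate` and the list of training hours are hypothetical. The function only reports whether a value lies outside the range, not how far outside it is.

```python
def classify_estimate(x, data_xs):
    """Label an estimate at x as interpolation or extrapolation."""
    lowest, highest = min(data_xs), max(data_xs)
    if lowest <= x <= highest:
        return "interpolation"  # inside the data: the trend is supported
    return "extrapolation"      # outside the data: flag as uncertain

# Hypothetical training hours recorded in a data set.
hours = [1, 2, 3, 5, 6]

print(classify_estimate(4, hours))   # interpolation
print(classify_estimate(10, hours))  # extrapolation
```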

A strong correlation shows that two variables move together, but it does not prove that one causes the other. There may be a third (lurking) variable, the causation could run the other way, or the link could be coincidence. "Correlation does not imply causation" is a standard exam response.

Outliers and the line of best fit

A single outlier can distort the line of best fit, pulling it toward itself and changing both its gradient and position. Before drawing the line, scan for any point well away from the main cluster; it may be a recording error worth ignoring, or a genuine but unusual case. When an exam shows an outlier, you are often asked to identify it and to comment on its effect: removing a clear error before fitting the line gives a more representative trend. The line should follow the bulk of the points, not be dragged off course by one stray.

Linking correlation to the gradient

The gradient of the line of best fit has a real meaning: it is the predicted change in the $y$ variable for each unit increase in the $x$ variable. If hours of training against fitness score has a line of gradient $8$, then each extra hour of training is associated with about $8$ more score points, on average and within the data range. This connects scatter graphs to straight-line graphs and to rates of change, and it lets you interpret the relationship quantitatively rather than only describing it as "positive" or "strong".
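Take the worked example's line $y = 20 + 8(x - 1)$. The predicted change for one extra hour is exactly the gradient, whatever value of $x$ you start from:

$$y(x + 1) - y(x) = \bigl[20 + 8(x + 1 - 1)\bigr] - \bigl[20 + 8(x - 1)\bigr] = 8x - (8x - 8) = 8$$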

Exam-style practice questions

Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AQA 2019 · 3 marks: A scatter graph plots hours of revision against test score for ten students, showing a positive linear trend. A line of best fit passes through $(2, 30)$ and $(8, 78)$. Use the line to estimate the score of a student who revised for $5$ hours. (Foundation tier, Paper 2, calculator.)


Find the gradient of the line of best fit: $\dfrac{78 - 30}{8 - 2} = \dfrac{48}{6} = 8$ marks per hour.

The line through $(2, 30)$ has equation $y = 30 + 8(x - 2)$. At $x = 5$: $y = 30 + 8 \times 3 = 30 + 24 = 54$.

So the estimated score is $54$.

Markers reward using the line of best fit rather than an individual data point, finding the gradient, and giving the estimate. Reading values directly off the data instead of the line of best fit loses marks.
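A useful self-check in the exam is to substitute the second point back into your equation. This Python sketch does the same check using exact fractions from the standard-library `fractions` module, so no rounding creeps in. The helper name `line_through` is hypothetical.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return the exact gradient and a function for the line through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    m = Fraction(y2 - y1, x2 - x1)  # exact gradient, no rounding

    def y(x):
        # Point-gradient form through p1: y = y1 + m(x - x1)
        return y1 + m * (x - x1)

    return m, y

m, y = line_through((2, 30), (8, 78))
assert y(8) == 78   # self-check: the line also passes through (8, 78)
print(m, y(5))      # 8 54
```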

AQA 2021 · 3 marks: A scatter graph shows a strong positive correlation between ice cream sales and air temperature. A student concludes that buying ice cream causes the temperature to rise. Comment on this conclusion and explain why predicting sales at $45^\circ\text{C}$ from the graph would be unreliable. (Higher tier, Paper 1, non-calculator.)


Correlation does not imply causation: the two variables rise together, but ice cream sales do not cause higher temperatures (the temperature drives sales, not the reverse).

Predicting at $45^\circ\text{C}$ is extrapolation, using a value far outside the data range, where the linear trend may not hold, so the estimate is unreliable.

Markers reward the causation point and a clear explanation of why extrapolation is unsafe.


Sources & how we know this

AQA GCSE Mathematics (8300) specification, AQA (2015)