How do you fit a line to data and use it to predict?

Lines of best fit through the mean point, the equation of the line, interpolation, extrapolation, and Spearman's rank correlation coefficient.

A focused answer to AQA GCSE Statistics on lines of best fit, covering drawing the line through the mean point, finding its equation, interpolation and extrapolation, and Spearman's rank correlation coefficient.

Generated by Claude Opus 4.8 | 10 min answer | Updated 2026-06-02

Reviewed by: AI editorial process; not yet individually human-reviewed

What this dot point is asking

AQA wants you to draw a line of best fit through the mean point, find its equation, use it to predict by interpolation, recognise the danger of extrapolation, and calculate and interpret Spearman's rank correlation coefficient. The line of best fit turns a visible trend into a usable prediction tool, and Spearman's coefficient puts a number on the strength of a ranked relationship.

Drawing the line of best fit

Plotting the mean point first gives the line an anchor and is what examiners expect: calculate $\bar{x}$ and $\bar{y}$, plot $(\bar{x}, \bar{y})$, then draw a straight line through it that leaves the points balanced on either side. The mean point always lies on the least-squares regression line, which is why using it improves the accuracy of a by-eye line and the predictions you make from it.
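As a rough illustration of the first step, the sketch below computes a mean point. The data values and the names `ages`, `values` and `mean_point` are invented for this example (they are not from an AQA paper), chosen so that the mean point works out to whole numbers.

```python
# Hypothetical data: car age in years (x) and value in hundreds of pounds (y).
ages = [1, 2, 4, 6, 7]
values = [82, 75, 57, 43, 33]


def mean_point(xs, ys):
    """Return the mean point (x-bar, y-bar) of paired data."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return x_bar, y_bar


x_bar, y_bar = mean_point(ages, values)
print(f"Mean point: ({x_bar}, {y_bar})")  # Mean point: (4.0, 58.0)
```

On a scatter diagram you would plot $(4, 58)$ and pivot your ruler about that point until the remaining points look evenly balanced above and below the line.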

The equation of the line

To find the equation in the form $y = mx + c$, read the gradient $m$ as $\frac{\text{rise}}{\text{run}}$ between two clear, well-separated points on the line, and read $c$ where the line crosses the $y$-axis (this only works if the $x$-axis starts at $0$; otherwise substitute the mean point to solve for $c$). Once you have the equation, substitute an $x$-value to predict $y$, or rearrange and substitute a $y$-value to predict $x$. The gradient also carries meaning in context: in $y = 90 - 8x$, where $x$ is a car's age in years and $y$ is its value in hundreds of pounds, the $-8$ says the value falls by about $\pounds 800$ for each extra year of age.
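The method above can be sketched in a few lines of Python. The two points read off the line, $(2, 74)$ and $(7, 34)$, and the mean point $(4, 58)$ are assumptions made up for illustration; they happen to reproduce the line $y = 90 - 8x$ used in this answer. The function names are likewise hypothetical.

```python
def gradient(p1, p2):
    """Gradient m = rise / run between two points read off the line."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


def intercept(m, point):
    """Solve y = m*x + c for c by substituting a point on the line."""
    x, y = point
    return y - m * x


def predict_y(m, c, x):
    """Predict y from x using y = m*x + c."""
    return m * x + c


def predict_x(m, c, y):
    """Predict x from y by rearranging to x = (y - c) / m."""
    return (y - c) / m


m = gradient((2, 74), (7, 34))   # two assumed points on the drawn line
c = intercept(m, (4, 58))        # assumed mean point
print(m, c)                      # -8.0 90.0
print(predict_y(m, c, 5))        # 50.0, i.e. about £5000
print(predict_x(m, c, 26))       # 8.0, i.e. about 8 years old
```

Choosing points far apart on the line keeps the reading error in the gradient small, which is the reason for picking well-separated points by hand too.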

Interpolation and extrapolation

This distinction is heavily examined. A prediction for a car aged $5$ years from data covering ages $1$ to $10$ is interpolation and is reasonable; a prediction for a car aged $20$ years is extrapolation and is unsafe, because the linear trend may break down (a value cannot fall below zero). Always check whether a prediction falls inside or outside the data range and comment on its reliability. The strength of the correlation matters too: a prediction from a strong correlation is more trustworthy than one from a weak correlation, even when both are interpolations.
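A minimal sketch of the range check, assuming the line $y = 90 - 8x$ from this answer and data covering ages $1$ to $10$. The helper name `classify_prediction` is invented for the example.

```python
def classify_prediction(x, xs):
    """Label a prediction at x by whether x lies inside the observed data range."""
    if min(xs) <= x <= max(xs):
        return "interpolation"
    return "extrapolation"


ages = list(range(1, 11))  # observed ages 1 to 10 years

for age in (5, 20):
    value = 90 - 8 * age   # predicted value in hundreds of pounds
    print(age, value, classify_prediction(age, ages))
# 5 50 interpolation
# 20 -70 extrapolation
```

The extrapolated prediction of $-70$ (a value of minus $\pounds 7000$) is impossible, which shows concretely why the linear trend cannot be trusted outside the data range.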

Spearman's rank correlation coefficient

Calculating Spearman's rank correlation coefficient

Rank each variable separately

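Rank the values of the first variable from $1$ upwards, and do the same for the second variable, ranking both in the same direction (for example, $1$ for the largest value in each). If two or more values tie, give each the average of the ranks they would have occupied: two values tied for ranks $2$ and $3$ each get $2.5$.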

Find and square the rank differences

For each pair, find $d$ (rank in variable one minus rank in variable two), then square it. Suppose for $n = 5$ pairs the squared differences sum to $\sum d^2 = 8$.

Substitute into the formula

Using $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$ with $n = 5$ and $\sum d^2 = 8$:

$r_s = 1 - \frac{6 \times 8}{5(5^2 - 1)} = 1 - \frac{48}{120} = 1 - 0.4 = 0.6$.

Interpret the value

A value near $+1$ means strong agreement in the rankings, near $-1$ means strong reverse agreement, and near $0$ means little agreement. Here $r_s = 0.6$ shows a moderate positive agreement between the two rankings.
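The four steps above can be sketched as follows. The scores are hypothetical, chosen so that $n = 5$ and $\sum d^2 = 8$ as in the worked numbers; the function names are invented for this example. Ranks here run from $1$ for the smallest value, which gives the same $r_s$ as ranking from the largest provided both variables are ranked the same way.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upwards; tied values share the average rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # lowest rank position this value occupies
        count = ordered.count(v)       # how many values are tied at v
        ranks.append(first + (count - 1) / 2)  # average of the tied positions
    return ranks


def spearman(xs, ys):
    """Spearman's rank correlation using r_s = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical scores for five students in two tests.
test_one = [10, 20, 30, 40, 50]   # ranks 1, 2, 3, 4, 5
test_two = [15, 12, 40, 22, 31]   # ranks 2, 1, 5, 3, 4

print(round(spearman(test_one, test_two), 2))  # 0.6
```

The rank differences are $-1, 1, -2, 1, 1$, so $\sum d^2 = 8$ and the function returns $0.6$, matching the hand calculation. When there are tied ranks, this formula gives only an approximation to the exact coefficient.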

Exam-style practice questions

Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AQA 2020, 5 marks. Seven students' Maths and Physics test ranks gave $\sum d^2 = 14$. (a) Calculate Spearman's rank correlation coefficient. (b) Interpret the result.

Worked answer

(a) With $n = 7$: $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 14}{7(49 - 1)} = 1 - \frac{84}{336} = 1 - 0.25 = 0.75$.

(b) $r_s = 0.75$ is a fairly strong positive agreement in the rankings: students who ranked highly in Maths tended to rank highly in Physics.

Markers reward correct substitution into the formula (including $n(n^2-1) = 336$), the value $0.75$, and a contextual interpretation of a fairly strong positive rank correlation.

AQA 2018, 4 marks. A line of best fit through the mean point of a data set on car age ($x$, years) and value ($y$, $\pounds$ hundreds) is $y = 90 - 8x$. (a) Predict the value of a car aged $5$ years. (b) Explain why using the line to predict the value of a car aged $20$ years would be unreliable.

Worked answer

(a) $y = 90 - 8(5) = 90 - 40 = 50$, so about $\pounds 5000$ (since $y$ is in hundreds of pounds).

(b) An age of $20$ years is well outside the range of the data, so this is extrapolation; the linear pattern may not continue (the value cannot go negative), making the prediction unreliable.

Markers reward substituting $x = 5$ for part (a) and, for (b), identifying extrapolation and explaining that the trend may not hold outside the data range.

Sources & how we know this

AQA GCSE Statistics (8382) specification — AQA (2017)