How do you fit a line to data and use it to predict?

Lines of best fit through the mean point, the equation of the line, interpolation, extrapolation, and Spearman's rank correlation coefficient.

A focused answer to AQA GCSE Statistics on lines of best fit, covering drawing the line through the mean point, finding its equation, interpolation and extrapolation, and Spearman's rank correlation coefficient.

Generated by Claude Opus 4.8
10 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this dot point is asking
  2. Drawing the line of best fit
  3. The equation of the line
  4. Interpolation and extrapolation
  5. Spearman's rank correlation coefficient

What this dot point is asking

AQA wants you to draw a line of best fit through the mean point, find its equation, use it to predict by interpolation, recognise the danger of extrapolation, and calculate and interpret Spearman's rank correlation coefficient. The line of best fit turns a visible trend into a usable prediction tool, and Spearman's coefficient puts a number on the strength of a ranked relationship.

Drawing the line of best fit

Plotting the mean point first gives the line an anchor and is what examiners expect: calculate $\bar{x}$ and $\bar{y}$, plot $(\bar{x}, \bar{y})$, then draw a straight line through it that balances the points on either side. The mean point always lies on the least-squares regression line, which is why using it improves the accuracy of a by-eye line and the predictions you make from it.
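As a rough illustration of the first step, the mean point can be computed in a few lines of Python. The data below are invented for this sketch (car age in years, value in hundreds of pounds), and `mean_point` is a hypothetical helper name, not part of any syllabus material.

```python
from statistics import mean

# Invented data for illustration only: car age (years) and value (hundreds of pounds).
ages = [1, 3, 5, 7, 9]
values = [80, 68, 50, 33, 19]

def mean_point(xs, ys):
    """Return (x-bar, y-bar), the point the line of best fit should pass through."""
    return mean(xs), mean(ys)

x_bar, y_bar = mean_point(ages, values)
print(x_bar, y_bar)
```

For these invented values the mean point is $(5, 50)$, which is the point you would plot before drawing the line.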

The equation of the line

To find the equation in the form $y = mx + c$, read the gradient $m$ as $\frac{\text{rise}}{\text{run}}$ between two clear points on the line, and read $c$ where the line crosses the $y$-axis (or substitute the mean point to solve for $c$). Once you have the equation, substitute an $x$-value to predict $y$, or rearrange and substitute a $y$-value to predict $x$. The gradient also carries meaning in context: in $y = 90 - 8x$ for car value (with $y$ in hundreds of pounds), the $-8$ says the value falls by about £800 for each extra year of age.
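The gradient-and-intercept method can be sketched in Python as follows. The two points are assumed readings from a drawn line, chosen here so that the result matches the car-value line $y = 90 - 8x$; the function names are hypothetical.

```python
def line_through(p, q):
    """Gradient m and intercept c of the straight line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    m = (y2 - y1) / (x2 - x1)  # rise over run
    c = y1 - m * x1            # substitute one point into y = mx + c
    return m, c

# Assumed readings from the line: the y-intercept (0, 90) and the point (5, 50).
m, c = line_through((0, 90), (5, 50))

def predict_y(x):
    """Substitute an x-value into y = mx + c."""
    return m * x + c

def predict_x(y):
    """Rearrange y = mx + c to x = (y - c) / m."""
    return (y - c) / m

print(m, c, predict_y(5), predict_x(50))
```

Substituting the mean point instead of reading the intercept works the same way: pass the mean point as one of the two points.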

Interpolation and extrapolation

This distinction is heavily examined. A prediction for a car aged $5$ years from data covering ages $1$ to $10$ is interpolation and is reasonable; a prediction for a car aged $20$ years is extrapolation and is unsafe, because the linear trend may break down (a value cannot fall below zero). Always check whether a prediction falls inside or outside the data range and comment on its reliability. The strength of the correlation matters too: a prediction from a strong correlation is more trustworthy than one from a weak correlation, even when both are interpolations.
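The range check described here can be written as a small Python sketch. It assumes the data cover ages 1 to 10 and reuses the line $y = 90 - 8x$ from the car-value example; `classify_prediction` is a hypothetical helper name.

```python
def classify_prediction(x, data_xs):
    """Label a prediction by whether x lies inside the observed range of x-values."""
    lowest, highest = min(data_xs), max(data_xs)
    return "interpolation" if lowest <= x <= highest else "extrapolation"

ages = range(1, 11)  # data covering ages 1 to 10

for age in (5, 20):
    value = 90 - 8 * age  # predicted value in hundreds of pounds
    print(age, value, classify_prediction(age, ages))
```

For an age of 20 the line gives a negative value, which shows concretely why the linear trend cannot be trusted that far outside the data.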

Spearman's rank correlation coefficient

Exam-style practice questions

Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AQA 2020, 5 marks
Seven students' Maths and Physics test ranks gave $\sum d^2 = 14$. (a) Calculate Spearman's rank correlation coefficient. (b) Interpret the result.

(a) With $n = 7$: $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 14}{7(49 - 1)} = 1 - \frac{84}{336} = 1 - 0.25 = 0.75$.

(b) $r_s = 0.75$ indicates fairly strong positive agreement in the rankings: students who ranked highly in Maths tended to rank highly in Physics.

Markers reward correct substitution into the formula (including $n(n^2 - 1) = 336$), the value $0.75$, and a contextual interpretation of a fairly strong positive rank correlation.
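The substitution in part (a) can be checked with a minimal Python sketch of the formula $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$, which assumes there are no tied ranks. The question gives only $\sum d^2 = 14$, so the two rank lists below are invented: they are one possible pair of rankings that produces that total, not the actual exam data.

```python
def spearman_from_d2(sum_d2, n):
    """Spearman's rank correlation from the sum of squared rank differences."""
    return 1 - (6 * sum_d2) / (n * (n**2 - 1))

def spearman(ranks_a, ranks_b):
    """Spearman's rank correlation from two lists of ranks (no ties assumed)."""
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return spearman_from_d2(sum_d2, len(ranks_a))

# Values from the question: sum of d^2 is 14 with n = 7.
print(spearman_from_d2(14, 7))

# Invented ranks for illustration; the squared differences are 4, 0, 4, 0, 4, 1, 1.
maths_ranks = [1, 2, 3, 4, 5, 6, 7]
physics_ranks = [3, 2, 1, 4, 7, 5, 6]
print(spearman(maths_ranks, physics_ranks))
```

Both calls give $0.75$, matching the worked answer.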

AQA 2018, 4 marks
A line of best fit through the mean point of a data set on car age ($x$, years) and value ($y$, £ hundreds) is $y = 90 - 8x$. (a) Predict the value of a car aged $5$ years. (b) Explain why using the line to predict the value of a car aged $20$ years would be unreliable.

(a) $y = 90 - 8(5) = 90 - 40 = 50$, so about £5000 (since $y$ is in hundreds).

(b) An age of $20$ years is well outside the range of the data, so this is extrapolation; the linear pattern may not continue (the value cannot go negative), making the prediction unreliable.

Markers reward substituting $x = 5$ for part (a) and, for (b), identifying extrapolation and explaining that the trend may not hold outside the data range.
