How do you fit a line to data and use it to predict?
Lines of best fit through the mean point, the equation of the line, interpolation, extrapolation, and Spearman's rank correlation coefficient.
A focused answer to AQA GCSE Statistics on lines of best fit, covering drawing the line through the mean point, finding its equation, interpolation and extrapolation, and Spearman's rank correlation coefficient.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
AQA wants you to draw a line of best fit through the mean point, find its equation, use it to predict by interpolation, recognise the danger of extrapolation, and calculate and interpret Spearman's rank correlation coefficient. The line of best fit turns a visible trend into a usable prediction tool, and Spearman's coefficient puts a number on the strength of a ranked relationship.
Drawing the line of best fit
Plotting the mean point first gives the line an anchor and is what examiners expect: calculate $\bar{x}$ and $\bar{y}$, plot $(\bar{x}, \bar{y})$, then draw a straight line through it that balances the points on either side. The mean point always lies on the true regression line, which is why using it improves the accuracy of a by-eye line and the predictions you make from it.
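As a rough illustration of the first step, the mean point can be calculated as below. This is a minimal sketch: the data values and the helper name `mean_point` are made up for illustration and do not come from AQA or from this page.

```python
from statistics import mean

# Hypothetical data for illustration only: car age (years) and value (£ hundreds).
ages = [1, 2, 3, 4, 5, 6]
values = [95, 88, 80, 70, 66, 57]

def mean_point(xs, ys):
    """Return the mean point (x-bar, y-bar) of paired data."""
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and the same length")
    return mean(xs), mean(ys)

x_bar, y_bar = mean_point(ages, values)
print(f"Mean point: ({x_bar}, {y_bar})")
```

On a scatter diagram you would plot this point and pivot a ruler about it until the points are balanced on either side.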
The equation of the line
To find the equation, read the gradient as $\frac{\text{change in } y}{\text{change in } x}$ between two clear points on the line, and read the intercept where the line crosses the $y$-axis (or substitute the mean point to solve for it). Once you have the equation, substitute an $x$-value to predict $y$, or rearrange and substitute a $y$-value to predict $x$. The gradient also carries meaning in context: for car value against age, a negative gradient gives the amount by which the value falls for each extra year of age.
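The method above can be sketched in a few lines. Everything numeric here is an assumption for illustration: the two points "read off" the line, the mean point, and the function names are hypothetical, not taken from an AQA paper.

```python
def gradient(p1, p2):
    """Gradient = change in y / change in x between two points on the line."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("points must have different x-values")
    return (y2 - y1) / (x2 - x1)

def intercept_from_mean_point(m, mean_pt):
    """Solve y-bar = m * x-bar + c for the intercept c."""
    x_bar, y_bar = mean_pt
    return y_bar - m * x_bar

def predict_y(m, c, x):
    """Substitute an x-value into y = m*x + c."""
    return m * x + c

def predict_x(m, c, y):
    """Rearrange y = m*x + c to find x for a given y."""
    return (y - c) / m

# Hypothetical readings: two clear points on a drawn line, and the mean point.
m = gradient((1, 96), (6, 56))                # -8.0: value falls by 8 (£ hundreds) per year
c = intercept_from_mean_point(m, (3.5, 76))   # 104.0
print(m, c, predict_y(m, c, 4), predict_x(m, c, 80))
```

With these made-up readings, the gradient of -8 would be interpreted as the value falling by about £800 for each extra year of age, since value is measured in £ hundreds.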
Interpolation and extrapolation
This distinction is heavily examined. A prediction for a car whose age lies inside the range of ages in the data is interpolation and is reasonable; a prediction for a car much older than any in the data is extrapolation and is unsafe, because the linear trend may break down (a value cannot fall below zero). Always check whether a prediction falls inside or outside the data range and comment on its reliability. The strength of the correlation matters too: a prediction from a strong correlation is more trustworthy than one from a weak correlation, even when both are interpolations.
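The range check can be made explicit with a short sketch. The ages, the line's gradient and intercept, and both function names are hypothetical and chosen only to show the idea.

```python
# Hypothetical ages (years) of the cars in a data set; not from the source.
ages = [1, 2, 3, 4, 5, 6]

def classify_prediction(x, xs):
    """Label a prediction at x as interpolation (inside the data range) or extrapolation."""
    return "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"

def predict_value(age, gradient=-8.0, intercept=104.0):
    """Predicted value (£ hundreds) from an assumed line: value = intercept + gradient * age."""
    return intercept + gradient * age

for age in (4, 15):
    print(age, classify_prediction(age, ages), predict_value(age))
```

With this assumed line, the prediction at age 15 comes out negative, which is exactly the kind of impossible value that signals the linear trend has been pushed beyond the data.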
Spearman's rank correlation coefficient
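Spearman's rank correlation coefficient is usually calculated as $r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$, where $d$ is the difference between the two ranks in each pair and $n$ is the number of pairs. The sketch below applies that formula to made-up ranks; it assumes there are no tied ranks, and the function name and data are hypothetical.

```python
def spearman_rank(ranks_a, ranks_b):
    """Spearman's r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), assuming no tied ranks."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rank lists must be the same length")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two pairs of ranks")
    # d is the difference between the paired ranks.
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

# Hypothetical ranks for five students (illustration only).
maths_ranks = [1, 2, 3, 4, 5]
physics_ranks = [2, 1, 3, 5, 4]
r_s = spearman_rank(maths_ranks, physics_ranks)
print(round(r_s, 2))
```

A value close to $+1$ indicates strong agreement between the rankings, a value close to $-1$ indicates strong disagreement (one ranking is roughly the reverse of the other), and a value near $0$ indicates little or no agreement.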
Exam-style practice questions
Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AQA 2020 (5 marks)
Seven students' Maths and Physics test ranks gave . (a) Calculate Spearman's rank correlation coefficient. (b) Interpret the result.
(a) With : .
(b) is a fairly strong positive agreement in the rankings: students who ranked highly in Maths tended to rank highly in Physics.
Markers reward correct substitution into the formula (including ), the value , and a contextual interpretation of a fairly strong positive rank correlation.
AQA 2018 (4 marks)
A line of best fit through the mean point of a data set on car age ($x$, years) and value ($y$, £ hundreds) is . (a) Predict the value of a car aged years. (b) Explain why using the line to predict the value of a car aged years would be unreliable.
(a) , so about (since is in hundreds).
(b) An age of years is well outside the range of the data, so this is extrapolation; the linear pattern may not continue (the value cannot go negative), making the prediction unreliable.
Markers reward substituting for part (a) and, for (b), identifying extrapolation and explaining that the trend may not hold outside the data range.
Related dot points
- Plotting scatter diagrams, bivariate data, identifying types and strength of correlation, and spotting outliers.
A focused answer to AQA GCSE Statistics on scatter diagrams, covering plotting bivariate data, describing the type and strength of correlation, and identifying outliers on a scatter diagram.
- The difference between correlation and causation, spurious correlation, and confounding variables.
A focused answer to AQA GCSE Statistics on correlation and causation, covering why correlation does not prove cause, spurious correlation, and confounding variables that explain an apparent link.
- Trend lines through moving averages, the mean seasonal effect, and forecasting future values from a time series.
A focused answer to AQA GCSE Statistics on trend lines and forecasting, covering drawing a trend line through moving averages, calculating the mean seasonal effect, and forecasting future values from a time series.
- Comparing distributions using an average and a measure of spread, skewness, and writing comparisons in context.
A focused answer to AQA GCSE Statistics on comparing distributions, covering how to compare two data sets using an average and a measure of spread, describe skewness from the mean, median and mode, and write comparisons in context.
Sources & how we know this
- AQA GCSE Statistics (8382) specification — AQA (2017)