How do you fit a line to data and use it to make and interpret predictions?
Line of best fit by eye through the double mean point; the regression line y = a + bx; interpreting gradient and intercept; using the line for prediction with awareness of interpolation and extrapolation.
A focused answer to Edexcel GCSE Statistics on lines of best fit and regression, covering drawing a line of best fit through the double mean point, the regression line y = a + bx, interpreting the gradient and intercept, and using the line to make predictions with awareness of extrapolation.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
Edexcel code 2e.04 requires you to determine a line of best fit by eye, drawn through the calculated double mean point (x̄, ȳ), and (Higher tier) to use the regression line of the form y = a + bx. You must interpret the gradient and intercept in context and use the line to make predictions, while being aware of the difference between interpolation and extrapolation. Non-linear models are not tested.
Drawing the line of best fit
A line of best fit is a single straight line that passes as close as possible to all the points on a scatter diagram, showing the overall trend. The key technique Edexcel requires is to use the double mean point.
Plotting (x̄, ȳ) first gives you a fixed, accurate anchor, so your line is no longer drawn purely by eye. You then rotate the line about this point until it balances the points above and below.
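As a minimal sketch of the first step, the double mean point is just the mean of the x-values paired with the mean of the y-values. The data and the function name `double_mean_point` below are invented for illustration and do not come from any Edexcel paper.

```python
from statistics import mean


def double_mean_point(xs, ys):
    """Return (mean of x, mean of y) for paired data."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    return mean(xs), mean(ys)


# Hypothetical data: car age in years and value in thousand pounds.
ages = [1, 2, 3, 4, 5]
values = [14, 12, 11, 8, 5]

x_bar, y_bar = double_mean_point(ages, values)
print(x_bar, y_bar)
```

The point printed here is the one you would plot on the scatter diagram before drawing the line through it.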
The regression line y = a + bx
At Higher tier the line of best fit is treated as a regression line written in the form y = a + bx, where a is the intercept and b is the gradient.
This is the same straight-line idea as y = mx + c from mathematics, just with the constant written first. You can find the equation from two points on the line, or from the gradient and the double mean point, by substituting into y = a + bx and solving for a.
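The two routes to the equation can be sketched as below, assuming made-up numbers: two points read off a drawn line give the gradient b, and substituting a point known to lie on the line (here a hypothetical double mean point) gives a = y - bx. The function names are illustrative only.

```python
def gradient_from_two_points(p1, p2):
    """Gradient b of the straight line through (x1, y1) and (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must have different x-values")
    return (y2 - y1) / (x2 - x1)


def intercept_from_point(b, point):
    """Rearrange y = a + bx to a = y - bx using a point on the line."""
    x, y = point
    return y - b * x


# Hypothetical values: two points on the drawn line, and a double mean point.
b = gradient_from_two_points((1, 14), (5, 6))
a = intercept_from_point(b, (3, 10))
print(f"a = {a}, b = {b}")
```

With these invented values the line comes out as y = 16 - 2x, which passes through all three of the points used.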
Interpreting gradient and intercept
Marks are won by interpreting the line in context, not just stating numbers:
- The gradient b is the rate of change: "for each extra year of age, the value falls by GBP " for a gradient of (thousand pounds per year).
- The intercept a is the predicted y when x = 0: "a brand new car (age 0) is predicted to be worth GBP ". Be careful, because the intercept is only meaningful if x = 0 is sensible for the context.
Using the line for prediction
To predict a value, substitute the known x into the equation (or read off the line). This is reliable for interpolation (an x inside the data range) but unreliable for extrapolation (an x beyond the data), because the linear trend may not continue. Always check whether the prediction point lies within the range of the original data before trusting it.
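The range check described above can be sketched as follows. The line y = 16 - 2x, the data range of 1 to 5 years, and the function name `predict` are all assumptions made for illustration.

```python
def predict(a, b, x, x_min, x_max):
    """Predict y = a + bx and label the prediction by where x falls."""
    y = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


# Hypothetical line (value in thousand pounds against age in years),
# fitted to cars aged between 1 and 5 years.
a, b = 16.0, -2.0

print(predict(a, b, 4, x_min=1, x_max=5))   # age inside the data range
print(predict(a, b, 10, x_min=1, x_max=5))  # age beyond the data range
```

In this invented example the prediction for a 10-year-old car is negative, which is impossible for a value and shows why a linear trend cannot be assumed to continue outside the data.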
The strength of the correlation also affects how much you should trust a prediction. If the points lie close to the line (strong correlation), predictions made by interpolation are reasonably reliable; if the points are widely scattered (weak correlation), even an interpolated prediction carries a large uncertainty. So when judging a prediction, consider both whether it is an interpolation or extrapolation and how strong the underlying correlation is.
When a line of best fit is appropriate
Edexcel only tests linear models, so a line of best fit should be used when the scatter diagram shows a roughly straight-line trend. If the points clearly curve, a straight line is a poor model and any prediction from it is unreliable. You should also ignore a single outlier when positioning the line by eye, since one stray point can distort it; mention the outlier rather than letting it drag the line. Recognising that a straight line does not suit curved data is part of choosing an appropriate model.
Exam-style practice questions
Practice questions written in the style of Pearson Edexcel exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
Edexcel 1ST0 2020, 4 marks
A scatter diagram plots the age, x years, and value, y thousand pounds, of cars. The mean age is years and the mean value is GBP .
(a) Explain how the double mean point helps you draw the line of best fit.
(b) The line passes through and has gradient . Write its equation and interpret the gradient.
(a) The double mean point always lies on the line of best fit, so plotting it gives a fixed point to draw the line through, making the line more accurate than drawing by eye alone.
(b) Using y = a + bx with gradient through : , so . Equation: .
The gradient means the value falls by about GBP for each extra year of age.
Markers reward explaining the double mean point lies on the line, forming the equation, and interpreting the gradient in context (GBP per year).
Edexcel 1ST0 2022, 4 marks
The regression line for the marks in a mock test (x) and a final test (y) is .
(a) Predict the final mark of a student who scored in the mock.
(b) The mock marks ranged from to . Explain why using the line to predict the final mark for a student who scored in the mock is unreliable.
(a) Substitute : . The predicted final mark is .
(b) A mock mark of is well outside the range of the data ( to ), so using the line there is extrapolation. The linear relationship is only known to hold within the data range and may not continue, so the prediction is unreliable.
Markers reward the substitution and prediction , and identifying extrapolation beyond the data range as the reason for unreliability.
Sources & how we know this
- Pearson Edexcel GCSE (9-1) Statistics (1ST0) specification — Pearson Edexcel (2017)