
How do you fit a line to data and use it to make and interpret predictions?

Line of best fit by eye through the double mean point; the regression line y = a + bx; interpreting gradient and intercept; using the line for prediction with awareness of interpolation and extrapolation.

A focused answer to the Edexcel GCSE Statistics dot point on lines of best fit and regression. It covers drawing a line of best fit through the double mean point, the regression line y = a + bx, interpreting the gradient and intercept, and using the line to make predictions with awareness of interpolation and extrapolation.

Generated by Claude Opus 4.8 · 9 min answer · Updated 2026-06-02

Reviewed by: AI editorial process; not yet individually human-reviewed


Jump to a section

What this dot point is asking
Drawing the line of best fit
The regression line y = a + bx
Interpreting gradient and intercept
Using the line for prediction
When a line of best fit is appropriate

What this dot point is asking

Edexcel code 2e.04 requires you to determine a line of best fit by eye, drawn through the calculated double mean point $(\bar{x}, \bar{y})$, and (Higher tier) to use the regression line of the form $y = a + bx$. You must interpret the gradient and intercept in context and use the line to make predictions, knowing the difference between interpolation and the dangers of extrapolation. Non-linear models are not tested.

Drawing the line of best fit

A line of best fit is a single straight line that passes as close as possible to all the points on a scatter diagram, showing the overall trend. The key technique Edexcel requires is to use the double mean point.

Plotting $(\bar{x}, \bar{y})$ first gives you a fixed, accurate anchor, so your line is no longer drawn purely by eye. You then rotate the line about this point until it balances the points above and below.
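Because the double mean point is simply the mean of the $x$-values paired with the mean of the $y$-values, it is easy to check by computer. The Python sketch below is illustrative only: `double_mean_point` is a hypothetical helper name, and the ten ages and values are invented data, chosen so that the means come out at $(5, 8)$ to match the car question later on this page.

```python
from statistics import fmean


def double_mean_point(xs, ys):
    """Return (x-bar, y-bar), the point the line of best fit should pass through."""
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and the same length")
    return fmean(xs), fmean(ys)


# Hypothetical data for 10 cars: age in years, value in thousand pounds.
ages = [1, 2, 3, 4, 5, 5, 6, 7, 8, 9]
values = [11.5, 10.0, 9.9, 8.5, 8.3, 7.7, 7.0, 6.8, 5.4, 4.9]

x_bar, y_bar = double_mean_point(ages, values)
print(f"Plot the double mean point at ({x_bar:.1f}, {y_bar:.1f})")
# prints: Plot the double mean point at (5.0, 8.0)
```

On paper you would mark this point on the scatter diagram and then rotate a ruler about it until the points are balanced above and below the line.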

The regression line y = a + bx

At Higher tier the line of best fit is treated as a regression line written in the form

$$y = a + bx$$

where $a$ is the intercept and $b$ is the gradient. This is the same straight-line idea as $y = mx + c$ from mathematics, with the constant written first, so $a$ plays the role of $c$ and $b$ the role of $m$. You can find the equation from two points on the line (working out the gradient $b$ first), or from the gradient and the double mean point, by substituting into $y = a + bx$ and solving for $a$.
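Both routes to the equation can be sketched in a few lines of Python. This is a minimal illustration, not an Edexcel method: the helper names are hypothetical, and the two points $(2, 10.4)$ and $(8, 5.6)$ are invented readings from a drawn line for the car data. The gradient comes from $b = \frac{y_2 - y_1}{x_2 - x_1}$, and the intercept from rearranging $y = a + bx$ to $a = y - bx$.

```python
def gradient_from_two_points(p1, p2):
    """Gradient b = (y2 - y1) / (x2 - x1) from two points read off the drawn line."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points need different x-values")
    return (y2 - y1) / (x2 - x1)


def intercept_from_point(b, point):
    """Rearrange y = a + bx into a = y - bx, using any point on the line."""
    x, y = point
    return y - b * x


def format_line(a, b):
    """Write the line as y = a + bx, or y = a - |b|x when the gradient is negative."""
    sign = "-" if b < 0 else "+"
    return f"y = {a:g} {sign} {abs(b):g}x"


# Method 1: two hypothetical points read off a drawn line for the car data.
b1 = gradient_from_two_points((2, 10.4), (8, 5.6))
a1 = intercept_from_point(b1, (2, 10.4))
print(format_line(a1, b1))

# Method 2: a known gradient plus the double mean point (5, 8).
a2 = intercept_from_point(-0.8, (5, 8))
print(format_line(a2, -0.8))
```

Both calls print `y = 12 - 0.8x`, which matches the worked answer to the car question. The `:g` format is used only to hide tiny floating-point rounding errors in the printed equation.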

Interpreting gradient and intercept

Marks are won by interpreting the line in context, not just stating numbers:

The gradient $b$ is the rate of change of $y$ for each one-unit increase in $x$. For a car-value line with gradient $-0.8$ (thousand pounds per year), you would write "for each extra year of age, the value falls by GBP $800$".
The intercept $a$ is the predicted $y$ when $x = 0$. For the same line with intercept $12$, you would write "a brand new car (age $0$) is predicted to be worth GBP $12{,}000$". Be careful: the intercept is only meaningful if $x = 0$ is sensible in the context.

Using the line for prediction

To predict a value, substitute the known $x$ into the equation (or read off the line). This is reliable for interpolation (an $x$ inside the data range) but unreliable for extrapolation (an $x$ beyond the data), because the linear trend may not continue. Always check whether the prediction point lies within the range of the original data before trusting it.
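The interpolation check amounts to comparing the new $x$ with the smallest and largest $x$-values in the data. Here is a minimal Python sketch, assuming the line $y = a + bx$ is already known. It uses the mock-test line $y = 15 + 0.7x$, with mock marks from $20$ to $80$, from the practice question below. `predict` is a hypothetical helper, not an Edexcel procedure.

```python
def predict(a, b, x, x_min, x_max):
    """Return the prediction y = a + bx and whether it is interpolation or extrapolation.

    x_min and x_max are the smallest and largest x-values in the original data.
    """
    y = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


# Mock-test line y = 15 + 0.7x; the mock marks in the data ran from 20 to 80.
for mock in (40, 5):
    final, kind = predict(15, 0.7, mock, x_min=20, x_max=80)
    if kind == "interpolation":
        verdict = "within the data range, so reasonably reliable"
    else:
        verdict = "outside the data range, so unreliable"
    print(f"Mock {mock}: predicted final mark {final:g} ({kind}; {verdict})")
```

The label only flags extrapolation. It does not measure how reliable an interpolated prediction is, which still depends on how strong the correlation is.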

The strength of the correlation also affects how much you should trust a prediction. If the points lie close to the line (strong correlation), predictions made by interpolation are reasonably reliable; if the points are widely scattered (weak correlation), even an interpolated prediction carries a large uncertainty. So when judging a prediction, consider both whether it is an interpolation or extrapolation and how strong the underlying correlation is.

When a line of best fit is appropriate

Edexcel only tests linear models, so a line of best fit should be used when the scatter diagram shows a roughly straight-line trend. If the points clearly curve, a straight line is a poor model and any prediction from it is unreliable. You should also ignore a single outlier when positioning the line by eye, since one stray point can distort it; mention the outlier rather than letting it drag the line. Recognising that a straight line does not suit curved data is part of choosing an appropriate model.

Finding and using a regression line

Anchor on the double mean point

For $12$ houses, the mean floor area is $\bar{x} = 90$ m² and the mean price is $\bar{y} = 250$ thousand pounds, so the line passes through the double mean point $(90, 250)$.

Use the gradient to find the equation

The line has gradient $b = 2$ (thousand pounds per m²). Substituting the double mean point into $y = a + bx$ gives $250 = a + 2 \times 90$, so $a = 250 - 180 = 70$. The line is $y = 70 + 2x$.

Interpret the parts

The gradient $2$ means the price rises by GBP $2000$ for each extra square metre. The intercept $70$ predicts a price of GBP $70{,}000$ for a zero-area house. This is not a sensible context, so the intercept has no practical meaning here.

Predict within the data

For a $100$ m² house, $y = 70 + 2 \times 100 = 270$, so the predicted price is GBP $270{,}000$. This is reliable because $100$ m² is within the data range (interpolation).

Exam-style practice questions

Practice questions written in the style of Pearson Edexcel exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

Edexcel 1ST0 2020 · 4 marks

A scatter diagram plots the age, $x$ years, and value, $y$ thousand pounds, of $10$ cars. The mean age is $5$ years and the mean value is GBP $8000$.
(a) Explain how the double mean point helps you draw the line of best fit.
(b) The line passes through $(5, 8)$ and has gradient $-0.8$. Write its equation and interpret the gradient.


(a) The double mean point $(\bar{x}, \bar{y}) = (5, 8)$ always lies on the line of best fit, so plotting it gives a fixed point to draw the line through, making the line more accurate than drawing by eye alone.

(b) Using $y = a + bx$ with gradient $b = -0.8$ through $(5, 8)$: $8 = a + (-0.8)(5)$, so $a = 8 + 4 = 12$. Equation: $y = 12 - 0.8x$.

The gradient $-0.8$ means the value falls by about GBP $800$ for each extra year of age.

Markers reward explaining that the double mean point lies on the line, forming the equation, and interpreting the gradient in context (GBP $800$ per year).

Edexcel 1ST0 2022 · 4 marks

The regression line for the marks in a mock test ($x$) and a final test ($y$) is $y = 15 + 0.7x$.
(a) Predict the final mark of a student who scored $40$ in the mock.
(b) The mock marks ranged from $20$ to $80$. Explain why using the line to predict the final mark for a student who scored $5$ in the mock is unreliable.


(a) Substitute $x = 40$: $y = 15 + 0.7 \times 40 = 15 + 28 = 43$. The predicted final mark is $43$.

(b) A mock mark of $5$ is well outside the range of the data ($20$ to $80$), so using the line there is extrapolation. The linear relationship is only known to hold within the data range and may not continue, so the prediction is unreliable.

Markers reward the substitution and prediction $43$, and identifying extrapolation beyond the data range as the reason for unreliability.


Sources & how we know this

Pearson Edexcel GCSE (9-1) Statistics (1ST0) specification. Pearson Edexcel, 2017.