How do you draw and interpret scatter graphs, describe correlation, and use a line of best fit?

Scatter graphs and bivariate data, describing correlation (positive, negative or none), drawing and using a line of best fit to estimate values, and recognising the dangers of extrapolation and correlation versus causation.

A focused answer to the Edexcel GCSE Mathematics statistics content on scatter graphs and correlation, covering bivariate data, describing correlation, drawing and using a line of best fit, and the limits of extrapolation and correlation versus causation.

Generated by Claude Opus 4.8. 9 min answer.

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this dot point is asking
  2. Scatter graphs and bivariate data
  3. Describing correlation
  4. The line of best fit
  5. The limits: extrapolation and causation
  6. Identifying outliers
  7. Try this

What this dot point is asking

Edexcel expects you to plot and interpret scatter graphs of bivariate (two-variable) data, to describe the correlation, and to draw and use a line of best fit to estimate values. You also need to understand the limits of those estimates: extrapolating beyond the data is unreliable, and correlation is not the same as causation. Scatter graphs are about the relationship between two quantities, and the reasoning about reliability is heavily examined.

Scatter graphs and bivariate data

A scatter graph plots one variable against another. Each point represents one pair of values, for example one person's height and weight. Its purpose is to reveal whether the two quantities are linked.

Describing correlation

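Correlation describes the direction and strength of the relationship shown by the points. It is positive when both variables tend to increase together, negative when one tends to decrease as the other increases, and there is no correlation when the points show no clear pattern.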

Always describe correlation in context, not just by name: "positive correlation, so taller people tend to be heavier", rather than just "positive".
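In the exam you judge correlation by eye from the graph. As a rough numerical check, the sketch below classifies the direction from the data themselves. It assumes Pearson's correlation coefficient as the measure and an arbitrary cut-off of 0.3 for "no correlation"; neither is part of the GCSE method, and the heights and weights are made up for illustration.

```python
def correlation_direction(xs, ys, threshold=0.3):
    """Classify paired data as 'positive', 'negative' or 'none'.

    Uses Pearson's r; the threshold is an illustrative assumption,
    not an exam rule.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return "none"  # one variable never changes, so no trend to describe
    r = sxy / (sxx * syy) ** 0.5
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "none"


# Hypothetical data: heights in cm and weights in kg
heights = [150, 155, 160, 165, 170, 175]
weights = [48, 52, 55, 61, 64, 70]
print(correlation_direction(heights, weights))
```

The label alone is still only half an answer: you would go on to say what it means in context, as described above.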

The line of best fit

A line of best fit is a single straight line that passes as close as possible to all the points, with roughly equal numbers above and below.
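At GCSE the line of best fit is drawn by eye with a ruler. The sketch below swaps in a different technique, a least-squares calculation, purely to show what "as close as possible to all the points" can mean numerically and how the line is then used to estimate a value. The study hours and test scores are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    # This line always passes through the mean point (mean_x, mean_y)
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


def estimate(x, gradient, intercept):
    """Read a value off the line: the 'go up, then across' step done with arithmetic."""
    return gradient * x + intercept


# Hypothetical data: hours studied and test score
hours = [1, 2, 3, 4, 5]
scores = [35, 45, 50, 60, 65]

gradient, intercept = least_squares_line(hours, scores)
print(gradient, intercept)                 # 7.5 28.5
print(estimate(3.5, gradient, intercept))  # 54.75
```

A line drawn by eye will not match these numbers exactly, which is why mark schemes accept a range of estimates.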

The limits: extrapolation and causation

Two cautions are heavily examined. Extrapolation is using the line of best fit beyond the range of the data; it is unreliable because there is no evidence the pattern continues outside the observed values. Correlation does not imply causation: two variables moving together does not prove one causes the other, since a third factor may be involved (ice cream sales and sunburn both rise in summer, but neither causes the other). Estimating within the data (interpolation) is the reliable use of a line of best fit.
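The interpolation test is simply whether the value you are estimating from lies between the smallest and largest plotted values. A minimal sketch, using hypothetical study-hours data and a hypothetical helper name:

```python
def estimate_type(x, xs):
    """Return 'interpolation' if x is inside the range of the data, else 'extrapolation'."""
    if min(xs) <= x <= max(xs):
        return "interpolation"
    return "extrapolation"


# Hypothetical data: hours studied by six students
hours_studied = [2, 3, 5, 6, 8, 9]

print(estimate_type(4, hours_studied))   # interpolation: 4 is between 2 and 9
print(estimate_type(15, hours_studied))  # extrapolation: 15 is beyond the largest value, 9
```

This check says nothing about causation; it only tells you whether an estimate from the line is supported by the observed data.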

Identifying outliers

A scatter graph can also reveal an outlier: a point that does not fit the general pattern, lying well away from the rest. An outlier might come from a measurement error or a genuinely unusual case, and a question may ask you to identify one and suggest why it occurred. When drawing a line of best fit, you generally ignore a clear outlier so that it does not pull the line away from the trend the bulk of the data shows. Being able to spot and comment on an outlier, rather than blindly fitting a line, is part of the statistical reasoning Edexcel rewards.
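One way to picture "lying well away from the rest" is as the point with the largest vertical distance from the trend line. The sketch below assumes a line has already been chosen to follow the bulk of the data (here y = 2x + 1, picked for illustration) and that the points are hypothetical; in an exam you would spot the outlier by eye.

```python
def furthest_point(points, gradient, intercept):
    """Return the point with the largest vertical distance from the line y = gradient*x + intercept."""
    def vertical_distance(point):
        x, y = point
        return abs(y - (gradient * x + intercept))
    return max(points, key=vertical_distance)


# Hypothetical data: four points follow y = 2x + 1, one does not
points = [(1, 3), (2, 5), (3, 7), (4, 2), (5, 11)]

print(furthest_point(points, gradient=2, intercept=1))  # (4, 2)
```

The function always returns some point, even when nothing is unusual, so it only suggests a candidate. Whether that point really is an outlier, and why it might have occurred, is still a judgement you have to make and explain.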

Try this

Q1. A scatter graph shows the age of a car against its value. What type of correlation would you expect, and what does it mean? [2 marks]

  • Cue. Negative correlation: as the car gets older, its value tends to decrease.

Q2. Why should you not use a line of best fit to predict a value far outside the plotted data? [1 mark]

  • Cue. That is extrapolation, and the relationship may not continue beyond the observed range, so the estimate is unreliable.

Exam-style practice questions

Practice questions written in the style of Pearson Edexcel exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

Edexcel 2019, 2 marks. A scatter graph shows the temperature and the number of ice creams sold each day. Describe the correlation and what it means in this context. (Paper 2, calculator.)
Worked answer

As the temperature increases, the number of ice creams sold also increases, so the points rise to the right.

Correlation: positive correlation.

Meaning: on hotter days, more ice creams tend to be sold.

Markers award a mark for "positive correlation" and a mark for interpreting it in context. Just writing "positive" without saying what it means about temperature and sales can lose the context mark.

Edexcel 2021, 3 marks. A scatter graph plots study hours against test score, with a line of best fit drawn. A student studied for 7 hours. Explain how to use the line to estimate the score, and why an estimate for 20 hours would be unreliable. (Paper 2, calculator.)
Worked answer

To estimate, go up from 7 hours on the horizontal axis to the line of best fit, then across to read the score on the vertical axis.

An estimate for 20 hours is unreliable because it is outside the range of the data (extrapolation), where the relationship may no longer hold.

Markers award a mark for the interpolation method, a mark for naming extrapolation, and a mark for explaining why it is unreliable. Saying "20 hours is impossible" misses the statistical point about extrapolation.
