How do you draw and interpret scatter graphs, describe correlation, and use a line of best fit?
Scatter graphs and bivariate data, describing correlation (positive, negative or none), drawing and using a line of best fit to estimate values, and recognising the dangers of extrapolation and correlation versus causation.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
Edexcel expects you to plot and interpret scatter graphs of bivariate (two-variable) data, describe the correlation, draw and use a line of best fit to estimate values, and understand the limits of those estimates: the unreliability of extrapolating beyond the data, and the difference between correlation and causation. Scatter graphs are about relationships between two quantities, and the reasoning about reliability is heavily examined.
Scatter graphs and bivariate data
A scatter graph plots one variable against another. Each point represents one pair of values, one read from the horizontal axis and one from the vertical axis. Its purpose is to reveal whether the two quantities are linked.
Describing correlation
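Correlation describes the direction and strength of the relationship shown by the points. In a positive correlation the points rise from left to right, so as one variable increases the other tends to increase. In a negative correlation the points fall from left to right, so as one variable increases the other tends to decrease. If the points are scattered with no clear pattern, there is no correlation. The closer the points lie to a straight line, the stronger the correlation.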
Always describe correlation in context, not just by name: "positive correlation, so taller people tend to be heavier", rather than just "positive".
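In the exam you judge correlation by eye, but the idea of direction and strength can be illustrated numerically. The sketch below computes Pearson's correlation coefficient, which goes beyond what the GCSE question asks for. The data, the function names and the 0.3 cut-off are all assumptions made for illustration.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Pearson's r for paired data: a value between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe(r, threshold=0.3):
    """Name the direction of the correlation.
    The threshold is an illustrative cut-off, not an exam rule."""
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "none"

# Hypothetical data: car age (years) and value (thousands of pounds)
ages = [1, 2, 3, 4, 5, 6]
values = [14, 12, 11, 9, 8, 5]

r = correlation_coefficient(ages, values)
print(describe(r), "correlation: older cars tend to be worth less")
```

A value of r near 1 or -1 corresponds to points lying close to a straight line, while a value near 0 corresponds to no clear pattern. The printed sentence still gives the description in context, as an exam answer should.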
The line of best fit
A line of best fit is a single straight line that passes as close as possible to all the points, with roughly equal numbers above and below.
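At GCSE the line of best fit is drawn by eye with a ruler. As a rough numerical stand-in, the sketch below uses least squares regression, a different technique that produces a line with the same purpose, and then reads an estimate from it. The temperature and sales figures and the function names are hypothetical.

```python
def line_of_best_fit(xs, ys):
    """Least squares line y = gradient * x + intercept.
    This replaces the by-eye method used in the exam."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

def estimate(x, gradient, intercept):
    """Read a y-value off the line for a given x-value."""
    return gradient * x + intercept

# Hypothetical data: temperature (degrees C) and ice creams sold
temps = [14, 16, 18, 20, 22, 24]
sales = [20, 26, 30, 36, 40, 46]

m, c = line_of_best_fit(temps, sales)
# 21 lies inside the observed range 14 to 24, so this is interpolation
print(round(estimate(21, m, c), 1))
```

The least squares line always passes through the point (mean of x, mean of y), which is also a useful guide when placing a ruler by eye.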
The limits: extrapolation and causation
Two cautions are heavily examined. Extrapolation is using the line of best fit beyond the range of the data; it is unreliable because there is no evidence the pattern continues outside the observed values. Correlation does not imply causation: two variables moving together does not prove one causes the other, since a third factor may be involved (ice cream sales and sunburn both rise in summer, but neither causes the other). Estimating within the data (interpolation) is the reliable use of a line of best fit.
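The distinction between interpolation and extrapolation comes down to whether the value you are estimating from lies inside the range of the plotted data. A minimal sketch of that check follows, using hypothetical study-hours data and an illustrative function name.

```python
def classify_estimate(x, xs):
    """Say whether estimating at x stays inside the observed x-values."""
    if min(xs) <= x <= max(xs):
        return "interpolation"
    return "extrapolation"

# Hypothetical data: hours of study recorded for six students
hours = [1, 2, 3, 4, 5, 6]

print(classify_estimate(3.5, hours))  # interpolation
print(classify_estimate(12, hours))   # extrapolation
```

The check only looks at the range of the data. It says nothing about causation, which has to be argued from the context.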
Identifying outliers
A scatter graph can also reveal an outlier: a point that does not fit the general pattern, lying well away from the rest. An outlier might come from a measurement error or a genuinely unusual case, and a question may ask you to identify one and suggest why it occurred. When drawing a line of best fit, you generally ignore a clear outlier so that it does not pull the line away from the trend the bulk of the data shows. Being able to spot and comment on an outlier, rather than blindly fitting a line, is part of the statistical reasoning Edexcel rewards.
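To see how a single outlier can pull the line away from the trend, the sketch below fits a least squares line (a stand-in for the by-eye line), picks out the point lying furthest from it vertically, and refits without that point. Treating "furthest from the line" as the outlier is an assumption for illustration, and the data are made up; in the exam you identify an outlier by eye.

```python
def line_of_best_fit(xs, ys):
    """Least squares line y = gradient * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x

def furthest_point(points):
    """Return the point with the largest vertical distance from the fitted line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    m, c = line_of_best_fit(xs, ys)
    return max(points, key=lambda p: abs(p[1] - (m * p[0] + c)))

# Hypothetical data: five points follow y = 2x, and (4, 20) does not
points = [(1, 2), (2, 4), (3, 6), (4, 20), (5, 10), (6, 12)]

outlier = furthest_point(points)
kept = [p for p in points if p != outlier]

m_all, c_all = line_of_best_fit([x for x, _ in points], [y for _, y in points])
m_kept, c_kept = line_of_best_fit([x for x, _ in kept], [y for _, y in kept])

print("suspected outlier:", outlier)
print("gradient with outlier:", round(m_all, 2))
print("gradient without outlier:", round(m_kept, 2))
```

With the outlier included the gradient is noticeably steeper than the trend of the other five points, which is why a clear outlier is usually left out when the line is drawn.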
Try this
Q1. A scatter graph shows the age of a car against its value. What type of correlation would you expect, and what does it mean? [2 marks]
- Cue. Negative correlation: as the car gets older, its value tends to decrease.
Q2. Why should you not use a line of best fit to predict a value far outside the plotted data? [1 mark]
- Cue. That is extrapolation, and the relationship may not continue beyond the observed range, so the estimate is unreliable.
Exam-style practice questions
Practice questions written in the style of Pearson Edexcel exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
Edexcel 2019, 2 marks
A scatter graph shows the temperature and the number of ice creams sold each day. Describe the correlation and what it means in this context. (Paper 2, calculator.)
Worked answer
As the temperature increases, the number of ice creams sold also increases, so the points rise to the right.
Correlation: positive correlation.
Meaning: on hotter days, more ice creams tend to be sold.
Markers award a mark for "positive correlation" and a mark for interpreting it in context. Just writing "positive" without saying what it means about temperature and sales can lose the context mark.
Edexcel 2021, 3 marks
A scatter graph plots study hours against test score, with a line of best fit drawn. A student studied for hours. Explain how to use the line to estimate the score, and why an estimate for hours would be unreliable. (Paper 2, calculator.)
Worked answer
To estimate, go up from hours on the horizontal axis to the line of best fit, then across to read the score on the vertical axis.
An estimate for hours is unreliable because it is outside the range of the data (extrapolation), where the relationship may no longer hold.
Markers award a mark for the interpolation method, a mark for naming extrapolation, and a mark for explaining why it is unreliable. Saying " hours is impossible" misses the statistical point about extrapolation.
Related dot points
- Populations and samples, representative and biased sampling, random sampling, types of data (qualitative and quantitative, discrete and continuous), and designing questionnaires and data collection.
A focused answer to the Edexcel GCSE Mathematics statistics content on sampling and data, covering populations and samples, representative and biased sampling, random sampling, types of data, and designing fair data collection.
- The mean, median, mode and range; finding averages from frequency tables and from grouped data using the midpoint and an estimated mean; and comparing distributions using an average and the range.
A focused answer to the Edexcel GCSE Mathematics statistics content on averages and spread, covering the mean, median, mode and range, finding averages from frequency tables and grouped data, and comparing distributions.
- Drawing and interpreting statistical diagrams: bar charts, pictograms, pie charts, frequency polygons, cumulative frequency graphs and box plots, and finding the median, quartiles and interquartile range (Higher tier).
A focused answer to the Edexcel GCSE Mathematics statistics content on charts and graphs, covering bar charts, pie charts, frequency polygons, cumulative frequency graphs and box plots, and finding the median, quartiles and interquartile range.
- Straight line graphs: plotting lines, finding the gradient and y-intercept, using the equation y = mx + c, finding the equation of a line through two points, and parallel and perpendicular lines (Higher tier).
A focused answer to the Edexcel GCSE Mathematics algebra content on straight line graphs, covering gradient and intercept, the equation y = mx + c, finding the equation through two points, and parallel and perpendicular lines.
- Estimating probability from experimental data using relative frequency, comparing experimental and theoretical probability, and calculating the expected number of outcomes from a probability.
A focused answer to the Edexcel GCSE Mathematics probability content on relative frequency and expected outcomes, covering estimating probability from experiments, comparing experimental and theoretical probability, and predicting the expected number of outcomes.
Sources & how we know this
- Pearson Edexcel GCSE (9-1) Mathematics (1MA1) specification — Pearson Edexcel (2015)