How do you plot a scatter graph, describe correlation, and use a line of best fit?
Plot scatter graphs, describe the type and strength of correlation, draw a line of best fit, use it to estimate values, and understand interpolation, extrapolation and the difference between correlation and causation.
A CCEA GCSE Mathematics answer on scatter graphs and correlation, covering plotting paired data, describing the type and strength of correlation, drawing and using a line of best fit, interpolation and extrapolation, and the difference between correlation and causation.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
Scatter graphs show the relationship between two variables. They are a CCEA Handling Data topic that links statistics to straight-line graphs. You must plot paired data, describe the type and strength of correlation, draw a line of best fit and use it to estimate values. You must also understand the difference between interpolation and extrapolation, and between correlation and causation. The interpretation in context, and the caution about extrapolation and causation, carry the reasoning marks.
Plotting and describing correlation
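A scatter graph plots two variables against each other, with one point for each pair of data values. The overall pattern of the points shows the correlation. Positive correlation means that as one variable increases the other tends to increase, so the points rise from bottom left to top right. Negative correlation means that as one variable increases the other tends to decrease, so the points fall from top left to bottom right. No correlation means the points show no clear pattern. The closer the points lie to a straight line, the stronger the correlation; widely scattered points show weak correlation.
Describing correlation fully means stating its type (positive, negative or none) and, where asked, its strength, and then interpreting it in the context of the two variables.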
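In the exam you judge type and strength by eye. As a hedged illustration of the same idea, a computer can put a number on it using Pearson's correlation coefficient r, which goes beyond the GCSE course. The sketch below assumes hypothetical revision data, and its cut-offs of 0.7 for "strong" and 0.3 for "no correlation" are arbitrary choices, not CCEA definitions.

```python
def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired data (between -1 and 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def describe_correlation(xs, ys, strong=0.7, weak=0.3):
    """Return a phrase such as 'strong positive correlation'.

    The thresholds `strong` and `weak` are assumptions for illustration only;
    the sign of r gives the type, its size gives the strength.
    """
    r = pearson_r(xs, ys)
    if abs(r) < weak:
        return "no correlation"
    strength = "strong" if abs(r) >= strong else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


# Hypothetical data: hours of revision and test score for eight pupils.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [35, 42, 48, 50, 61, 64, 70, 78]
print(describe_correlation(hours, scores))  # strong positive correlation
```

The number only replaces the "by eye" judgement; a full answer still interprets it in context, for example "more revision is associated with a higher score".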
The line of best fit
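A line of best fit is a straight line drawn through the data to summarise the trend, following the direction of the cloud of points with roughly as many points above it as below. It does not have to pass through the origin or through any particular data point, but a good line should pass through, or very close to, the mean point (x̄, ȳ), found by taking the mean of each variable. Plotting the mean point first is therefore a useful check.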
Once drawn, the line lets you estimate one variable from a known value of the other: read from the known value on one axis to the line, then across (or down) to the other axis.
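On paper the line is drawn by eye. A computer would usually use the least-squares line instead, which is a swapped-in technique beyond GCSE but has the useful property of passing exactly through the mean point. The sketch below assumes the same hypothetical revision data and shows reading an estimate from the line.

```python
def mean_point(xs, ys):
    """Return the mean point (x-bar, y-bar) of paired data."""
    return sum(xs) / len(xs), sum(ys) / len(ys)


def least_squares_line(xs, ys):
    """Return gradient m and intercept c of the least-squares line y = m*x + c.

    This line always passes through the mean point, which is why the mean
    point is a good check for a line drawn by eye.
    """
    mx, my = mean_point(xs, ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    m = sxy / sxx
    c = my - m * mx  # choosing c this way forces the line through (mx, my)
    return m, c


# Hypothetical data: hours of revision and test score for eight pupils.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [35, 42, 48, 50, 61, 64, 70, 78]

m, c = least_squares_line(hours, scores)
# "Read up to the line and across": estimate the score for 5.5 hours.
estimate = m * 5.5 + c
print(round(m, 2), round(c, 2), round(estimate, 1))  # 5.95 29.21 62.0
```

A line drawn by eye through the mean point (4.5, 56) would give a similar, though not identical, estimate.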
Interpolation, extrapolation and causation
Estimating within the range of the data is interpolation, which is reasonably reliable because the line is supported by data there. Estimating outside the range is extrapolation, which is unreliable because the relationship may not continue beyond the data.
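The difference between interpolation and extrapolation comes down to whether the known value lies inside the range of the data. The sketch below assumes a hypothetical line read from a graph, score = 6 × hours + 29, fitted to revision times from 1 to 8 hours.

```python
def estimate(x, xs, gradient, intercept):
    """Estimate y from the line and say whether this is interpolation or extrapolation."""
    y = gradient * x + intercept
    if min(xs) <= x <= max(xs):
        return y, "interpolation (reasonably reliable)"
    return y, "extrapolation (unreliable)"


hours = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical data range: 1 to 8 hours

print(estimate(5, hours, 6, 29))   # (59, 'interpolation (reasonably reliable)')
print(estimate(20, hours, 6, 29))  # (149, 'extrapolation (unreliable)')
```

If the test were marked out of 100 (an assumption), the extrapolated score of 149 would be impossible, which shows how the trend can break down beyond the data.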
Crucially, correlation does not imply causation. Two variables can be correlated because a third factor affects both, or by coincidence, so a strong correlation alone does not prove that one variable causes the other. CCEA tests this idea directly, so a careful answer avoids claiming cause from correlation. A classic example is that ice-cream sales and the number of people swimming are positively correlated, but neither causes the other; the hidden factor is hot weather, which drives both up. Recognising the likely hidden variable is a good way to explain why a correlation need not be causal in a worded answer.
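A small simulation can make the hidden-variable idea concrete. It is a sketch under invented assumptions: temperature is the only thing driving both ice-cream sales and the number of swimmers, and the multipliers and random noise are made up. Neither list is calculated from the other, yet the two are strongly correlated.

```python
import random


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical model: hot weather (the hidden variable) drives BOTH quantities.
temps = [random.uniform(10, 30) for _ in range(50)]
ice_creams = [20 * t + random.gauss(0, 20) for t in temps]
swimmers = [3 * t + random.gauss(0, 5) for t in temps]

# Ice-cream sales never appear in the swimmers formula, or vice versa,
# but the two still show strong positive correlation.
print(round(pearson_r(ice_creams, swimmers), 2))
```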
The scatter graph also reveals outliers, single points that lie far from the trend. An outlier might be a genuine unusual case or a recording error, and it is worth noting because a line of best fit should follow the main body of points rather than be dragged towards one stray value. When you draw the line by eye, balance the points above and below it and ignore an obvious outlier for the purpose of the trend, mentioning it separately if the question asks you to comment.
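To see why an outlier should not drag the line, the sketch below fits a line with and without a stray point. It assumes hypothetical data in which one pupil's score of 90 after 3 hours might be a recording error, and it uses the least-squares line and a "furthest point from the line" rule as a crude mechanical stand-in for the judgement you would make by eye.

```python
def fit_line(xs, ys):
    """Least-squares gradient and intercept (a stand-in for a line drawn by eye)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    m = sxy / sxx
    return m, my - m * mx


def furthest_point(xs, ys, m, c):
    """Return the (x, y) pair lying furthest vertically from y = m*x + c."""
    return max(zip(xs, ys), key=lambda p: abs(p[1] - (m * p[0] + c)))


# Hypothetical data with one stray value at (3, 90).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [35, 42, 90, 50, 61, 64, 70, 78]

m_all, c_all = fit_line(hours, scores)
outlier = furthest_point(hours, scores, m_all, c_all)

# Refit without the outlier so the line follows the main body of points.
kept = [(x, y) for x, y in zip(hours, scores) if (x, y) != outlier]
m_clean, c_clean = fit_line([x for x, _ in kept], [y for _, y in kept])

print("outlier:", outlier)
print("gradient with outlier:", round(m_all, 2))
print("gradient without outlier:", round(m_clean, 2))
```

The stray point flattens the line; once it is set aside, the gradient better reflects the trend of the other seven points.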
Why this matters
Scatter graphs are how data analysts spot and quantify relationships, used across science, economics and social research, and they connect statistics to the gradient and trend ideas of straight-line graphs. The cautions about extrapolation and causation are exactly the critical reasoning CCEA rewards in AO2 and AO3, and they matter well beyond the exam for reading statistics sensibly.
Exam-style practice questions
Practice questions written in the style of CCEA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
CCEA 2019 (2 marks)
A scatter graph plots hours of revision against test score, and the points rise from bottom left to top right. Describe the correlation and what it suggests. (Non-calculator.)
Points rising from bottom left to top right show positive correlation.
So there is positive correlation between revision time and test score.
This suggests that, in general, more revision is associated with a higher test score.
One mark is for "positive correlation" and one for the interpretation in context. Saying only "the line goes up" without naming positive correlation does not gain the first mark.
CCEA 2021 (3 marks)
Explain why using a line of best fit to predict a value far outside the data range is unreliable. (Non-calculator.)
Predicting beyond the range of the data is called extrapolation.
The line of best fit is only known to model the relationship within the range of the data collected.
Outside that range the pattern may change, so the prediction is unreliable and could be far from the true value.
Marks are for naming extrapolation, for explaining the line only fits the data range, and for the conclusion about reliability. Predictions inside the range (interpolation) are far safer.
Related dot points
- Understand the data-handling cycle, distinguish types of data and sampling, use frequency and two-way tables, and draw and interpret bar charts, pie charts, pictograms, frequency polygons and histograms.
A CCEA GCSE Mathematics answer on collecting and representing data, covering the data-handling cycle, types of data and sampling, frequency and two-way tables, and bar charts pie charts pictograms frequency polygons and histograms.
- Find the mean, median, mode and range of a data set, estimate the mean from grouped data, find the modal class, and use averages and range to compare two distributions.
A CCEA GCSE Mathematics answer on averages and spread, covering the mean, median, mode and range, estimating the mean from grouped data, finding the modal class, and comparing two distributions using an average and a measure of spread.
- Construct and read cumulative frequency curves, find the median and quartiles, calculate the interquartile range, and draw and compare box plots (Higher tier).
A CCEA GCSE Mathematics Higher answer on cumulative frequency and box plots, covering constructing and reading cumulative frequency curves, finding the median and quartiles, calculating the interquartile range, and drawing and comparing box plots.
- Use the probability scale from 0 to 1, find probabilities of single events, use that probabilities sum to 1, apply the mutually exclusive addition rule, and list outcomes with sample space diagrams.
A CCEA GCSE Mathematics answer on probability basics, covering the probability scale, single-event probability, the fact that probabilities sum to 1, mutually exclusive events and the addition rule, and listing outcomes with sample space diagrams.
- Use relative frequency to estimate probability from experimental data, understand how estimates improve with more trials, identify bias, and calculate expected frequencies.
A CCEA GCSE Mathematics answer on relative frequency, covering estimating probability from experimental data, how estimates improve with more trials, recognising a biased experiment, and calculating expected frequencies.
Sources & how we know this
- CCEA GCSE Mathematics specification (2210) — CCEA (2017)