
How do you plot a scatter graph, describe correlation, and use a line of best fit?

Plot scatter graphs, describe the type and strength of correlation, draw a line of best fit, use it to estimate values, and understand interpolation, extrapolation and the difference between correlation and causation.

A CCEA GCSE Mathematics answer on scatter graphs and correlation, covering plotting paired data, describing the type and strength of correlation, drawing and using a line of best fit, interpolation and extrapolation, and the difference between correlation and causation.

Generated by Claude Opus 4.8. 10 min answer.

Reviewed by: AI editorial process; not yet individually human-reviewed


What this dot point is asking

Scatter graphs show the relationship between two variables. They are a CCEA Handling Data topic that links statistics to straight-line graphs. You must plot paired data, describe the type and strength of correlation, draw a line of best fit and use it to estimate values. You must also understand the difference between interpolation and extrapolation, and between correlation and causation. The reasoning marks come from interpreting results in context and from recognising the limits of extrapolation and of causal claims.

Plotting and describing correlation

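A scatter graph plots two variables against each other, one point per pair of data. The overall pattern of the points shows the correlation: points rising from bottom left to top right show positive correlation, points falling from top left to bottom right show negative correlation, and points with no clear pattern show no correlation. The closer the points lie to a straight line, the stronger the correlation.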

Describing correlation fully means stating both its type (positive, negative or none) and, where asked, its strength, and then interpreting it in the context of the two variables.
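Beyond GCSE, the strength of a correlation is often summarised by a correlation coefficient between -1 and 1. The sketch below is illustrative only: the data are made up, the function names are hypothetical, and the cut-offs for "strong" and "weak" are assumptions chosen for the example, not values that CCEA specifies.

```python
from math import sqrt

# Made-up paired data: hours of revision and test score for six pupils.
hours = [1, 2, 3, 4, 5, 6]
scores = [36, 41, 50, 55, 64, 69]


def correlation_coefficient(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe_correlation(r, weak_cutoff=0.2, strong_cutoff=0.7):
    """Turn r into a type-and-strength phrase (the cut-offs are assumptions)."""
    if abs(r) < weak_cutoff:
        return "no correlation"
    kind = "positive" if r > 0 else "negative"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    return f"{strength} {kind} correlation"


r = correlation_coefficient(hours, scores)
print(round(r, 3), describe_correlation(r))
```

In an exam the same judgement is made by eye, from the direction of the points and how closely they cluster around a straight line, and it still needs a sentence of interpretation in context.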

The line of best fit

A line of best fit is a straight line drawn through the data to summarise the trend. It should follow the direction of the cloud of points, with roughly as many points above it as below. It does not have to pass through any particular point, though it often passes near the mean point of the data.
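The two checks described above, passing near the mean point and balancing the points on either side, can be sketched as follows. The data and the line "score = 7 × hours + 28" are made up for illustration, standing in for a line a student might draw by eye, and the function names are hypothetical.

```python
# Made-up paired data: hours of revision and test score for six pupils.
hours = [1, 2, 3, 4, 5, 6]
scores = [36, 41, 50, 55, 64, 69]

# Hypothetical line of best fit drawn by eye: score = 7 * hours + 28.
gradient, intercept = 7, 28


def mean_point(xs, ys):
    """Return the point (mean of x values, mean of y values)."""
    return sum(xs) / len(xs), sum(ys) / len(ys)


def count_above_below(xs, ys, gradient, intercept):
    """Count the data points lying strictly above and strictly below the line."""
    above = sum(1 for x, y in zip(xs, ys) if y > gradient * x + intercept)
    below = sum(1 for x, y in zip(xs, ys) if y < gradient * x + intercept)
    return above, below


mean_x, mean_y = mean_point(hours, scores)
line_at_mean_x = gradient * mean_x + intercept

print("Mean point:", (mean_x, mean_y))
print("Height of the line at the mean x value:", line_at_mean_x)
print("Points above, below:", count_above_below(hours, scores, gradient, intercept))
```

For this made-up data the line passes through the mean point and leaves three points on each side, which is the kind of balance to aim for when drawing by eye.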

Once drawn, the line lets you estimate one variable from a known value of the other: find the known value on its axis, read up (or across) to the line, then read across (or down) to the other axis.
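Reading from the line is equivalent to substituting into its equation. As a sketch, the code below fits a line by the least-squares method, a calculated stand-in for a line drawn by eye that a GCSE answer does not require, and then uses it to estimate in both directions. The data and function names are made up for illustration.

```python
# Made-up paired data: hours of revision and test score for six pupils.
hours = [1, 2, 3, 4, 5, 6]
scores = [36, 41, 50, 55, 64, 69]


def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = gradient*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x  # forces the line through the mean point
    return gradient, intercept


def estimate_y(x, gradient, intercept):
    """Read up from x to the line, then across to the y axis."""
    return gradient * x + intercept


def estimate_x(y, gradient, intercept):
    """Read across from y to the line, then down to the x axis."""
    return (y - intercept) / gradient


gradient, intercept = least_squares_line(hours, scores)
print("Estimated score after 4.5 hours:", round(estimate_y(4.5, gradient, intercept), 1))
print("Estimated hours for a score of 60:", round(estimate_x(60, gradient, intercept), 1))
```

Both estimates here lie inside the range of the data, so they are readings the line can reasonably support.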

Interpolation, extrapolation and causation

Estimating within the range of the data is interpolation, which is reasonably reliable because the line is supported by data there. Estimating outside the range is extrapolation, which is unreliable because the relationship may not continue beyond the data.
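The distinction can be sketched as a simple range check. The data, the by-eye line "score = 7 × hours + 28" and the function names are all made up for illustration, and the example assumes, hypothetically, that the test is marked out of 100.

```python
# Made-up data: hours of revision recorded for six pupils.
hours = [1, 2, 3, 4, 5, 6]

# Hypothetical line of best fit drawn by eye: score = 7 * hours + 28.
GRADIENT, INTERCEPT = 7, 28


def classify_estimate(x, xs):
    """Label an estimate by whether x lies inside the range of the data."""
    if min(xs) <= x <= max(xs):
        return "interpolation"
    return "extrapolation"


def estimate_score(h):
    """Estimate the score from the line for h hours of revision."""
    return GRADIENT * h + INTERCEPT


for h in (4.5, 12):
    print(h, "hours:", classify_estimate(h, hours), "gives", estimate_score(h))
```

For 12 hours the line gives 112, which is impossible on a test out of 100. The relationship cannot continue in a straight line that far beyond the data, which is why the extrapolated value should not be trusted.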

Crucially, correlation does not imply causation. Two variables can be correlated because a third factor affects both, or by coincidence, so a strong correlation alone does not prove that one variable causes the other. CCEA tests this idea directly, so a careful answer avoids claiming cause from correlation. A classic example is that ice-cream sales and the number of people swimming are positively correlated, but neither causes the other; the hidden factor is hot weather, which drives both up. Recognising the likely hidden variable is a good way to explain why a correlation need not be causal in a worded answer.
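The ice-cream example can be illustrated numerically. The figures below are invented for the sketch, and the correlation coefficient is a measure from beyond GCSE, used here only to show that two variables driven by the same hidden factor end up strongly correlated with each other.

```python
from math import sqrt

# Invented daily figures: temperature (°C), ice creams sold, people swimming.
temperature = [14, 16, 18, 20, 22, 24, 26, 28]
ice_creams = [30, 38, 41, 52, 58, 61, 72, 80]
swimmers = [12, 15, 21, 22, 28, 33, 35, 41]


def correlation_coefficient(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r_ice_swim = correlation_coefficient(ice_creams, swimmers)
r_temp_ice = correlation_coefficient(temperature, ice_creams)
r_temp_swim = correlation_coefficient(temperature, swimmers)

print("ice creams vs swimmers:", round(r_ice_swim, 2))
print("temperature vs ice creams:", round(r_temp_ice, 2))
print("temperature vs swimmers:", round(r_temp_swim, 2))
```

All three values are close to 1, but the calculation cannot say which variable, if any, is the cause. That judgement comes from thinking about the context, here that hot weather plausibly drives both.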

The scatter graph also reveals outliers, single points that lie far from the trend. An outlier might be a genuine unusual case or a recording error, and it is worth noting because a line of best fit should follow the main body of points rather than be dragged towards one stray value. When you draw the line by eye, balance the points above and below it and ignore an obvious outlier for the purpose of the trend, mentioning it separately if the question asks you to comment.
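One way to make "far from the trend" concrete is to measure each point's vertical distance from the line. In the sketch below the data, the by-eye line "score = 7 × hours + 28" and the tolerance of 10 marks are all assumptions chosen for illustration; in an exam, an outlier is identified by eye.

```python
# Made-up (hours, score) pairs; the last pair is deliberately unusual.
points = [(1, 36), (2, 41), (3, 50), (4, 55), (5, 64), (6, 69), (2, 80)]

# Hypothetical line of best fit drawn through the main body of points.
GRADIENT, INTERCEPT = 7, 28


def flag_outliers(points, gradient, intercept, tolerance):
    """Return the points whose vertical distance from the line exceeds tolerance."""
    flagged = []
    for x, y in points:
        distance = abs(y - (gradient * x + intercept))
        if distance > tolerance:
            flagged.append((x, y))
    return flagged


outliers = flag_outliers(points, GRADIENT, INTERCEPT, tolerance=10)
print("Possible outliers:", outliers)
```

The point (2, 80) lies 38 marks above the line while every other point is within 1 mark of it, so it would be commented on separately instead of being allowed to pull the line upwards.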

Why this matters

Data analysts across science, economics and social research use scatter graphs to spot and quantify relationships, and the topic connects statistics to the gradient and trend ideas of straight-line graphs. The cautions about extrapolation and causation are exactly the critical reasoning CCEA rewards in AO2 and AO3, and they matter well beyond the exam for reading statistics sensibly.

Exam-style practice questions

Practice questions written in the style of CCEA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

CCEA 2019 (2 marks)
A scatter graph plots hours of revision against test score, and the points rise from bottom left to top right. Describe the correlation and what it suggests. (Non-calculator.)

Worked answer

Points rising from bottom left to top right show positive correlation.

So there is positive correlation between revision time and test score.

This suggests that, in general, more revision is associated with a higher test score.

One mark is for "positive correlation" and one for the interpretation in context. Saying only "the line goes up" without naming positive correlation does not gain the first mark.

CCEA 2021 (3 marks)
Explain why using a line of best fit to predict a value far outside the data range is unreliable. (Non-calculator.)

Worked answer

Predicting beyond the range of the data is called extrapolation.

The line of best fit is only known to model the relationship within the range of the data collected.

Outside that range the pattern may change, so the prediction is unreliable and could be far from the true value.

Marks are for naming extrapolation, for explaining that the line is only supported within the range of the data, and for the conclusion about reliability. Predictions inside the range (interpolation) are far safer.
