
How do you summarise, display and interpret data, and how do you identify relationships and outliers?

Measures of location and spread, histograms, box plots and cumulative frequency, identifying outliers, scatter diagrams, correlation and the use of regression lines.

A focused answer to the AQA A-Level Mathematics data presentation content, covering measures of location and spread, histograms, box plots, cumulative frequency, outliers, scatter diagrams, correlation and regression.

Generated by Claude Opus 4.8 · 11 min answer · Updated 2026-06-02

Reviewed by: AI editorial process; not yet individually human-reviewed



What this dot point is asking

AQA wants you to calculate and interpret measures of location and spread, draw and read histograms, box plots and cumulative frequency graphs, identify outliers using standard rules, describe correlation in scatter diagrams, and use a regression line to make and judge predictions. These skills are applied to the large data set on Paper 3, where interpretation in context earns as many marks as the calculation.

Location and spread

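The mean $\bar{x} = \frac{\sum x}{n}$ uses every data point but is sensitive to outliers. The median is the middle value of ordered data, and the mode is the most common value. Spread is measured by the range (largest minus smallest value), the interquartile range $\mathrm{IQR} = Q_3 - Q_1$, which covers the middle half of the data, and the standard deviation $\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$, the square root of the variance. The median and IQR are more robust to outliers than the mean and standard deviation. Comparing the mean with the median is a quick way to detect skew: a mean above the median suggests positive skew, and a mean below it suggests negative skew.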
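
As a quick check of these definitions, here is a minimal Python sketch using the standard-library `statistics` module on the eight values from the worked outlier example below. It assumes the divisor-$n$ standard deviation (`statistics.pstdev`); a calculator's sample setting, which divides by $n - 1$, gives a slightly larger value.

```python
import math
import statistics

# Ordered data from the worked outlier example (n = 8)
data = [5, 7, 8, 9, 11, 12, 14, 30]
n = len(data)

mean = statistics.mean(data)        # sum of values divided by n
median = statistics.median(data)    # mean of the two middle values when n is even
sd = statistics.pstdev(data)        # divisor n
modes = statistics.multimode(data)  # every value occurs once, so all are returned

# The same standard deviation from the formula sqrt(sum(x^2)/n - mean^2)
sd_formula = math.sqrt(sum(x * x for x in data) / n - mean ** 2)

print(f"mean = {mean}, median = {median}, sd = {sd:.3f}")
print(math.isclose(sd, sd_formula))
if mean > median:
    print("mean > median, which suggests positive skew")
```

Here the mean ($12$) sits above the median ($10$) because the single value $30$ pulls it upwards, while the median is unaffected.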

Displaying data

Histograms plot frequency density (frequency divided by class width) so that the area of each bar represents frequency. This is what makes them valid for unequal class widths, where bar height alone would mislead. Box plots display the minimum, lower quartile, median, upper quartile and maximum, with any outliers usually marked separately as crosses, and make it easy to compare two distributions side by side. Cumulative frequency graphs plot the running total against the upper class boundary, letting you estimate the median and quartiles of grouped data by reading across from $\frac{n}{2}$, $\frac{n}{4}$ and $\frac{3n}{4}$ on the cumulative frequency axis.
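
The histogram and cumulative frequency calculations can be sketched in Python as follows. The grouped data are made up for illustration, and the helper names (`frequency_densities`, `cumulative_points`, `estimate_from_cf`) are hypothetical. Joining the cumulative frequency points with straight lines is the same as linear interpolation within a class; a smooth hand-drawn curve would give slightly different estimates.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 5), (10, 20, 15), (20, 30, 20), (30, 50, 10)]

def frequency_densities(classes):
    """Frequency density = frequency / class width, so bar area = frequency."""
    return [f / (upper - lower) for lower, upper, f in classes]

def cumulative_points(classes):
    """(upper class boundary, running total) pairs, starting at (first lower boundary, 0)."""
    points = [(classes[0][0], 0)]
    total = 0
    for _, upper, f in classes:
        total += f
        points.append((upper, total))
    return points

def estimate_from_cf(classes, position):
    """Value at a cumulative frequency position, by linear interpolation
    between the plotted points (a cumulative frequency polygon)."""
    points = cumulative_points(classes)
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= position <= c1 and c1 > c0:
            return x0 + (position - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("position outside the cumulative frequency range")

n = sum(f for _, _, f in classes)
print(frequency_densities(classes))
print("Q1:", estimate_from_cf(classes, n / 4))
print("median:", estimate_from_cf(classes, n / 2))
print("Q3:", estimate_from_cf(classes, 3 * n / 4))
```

With these made-up classes ($n = 50$) the densities are $0.5, 1.5, 2.0, 0.5$, and the estimates are $Q_1 = 15$, median $= 22.5$ and $Q_3 = 28.75$. Note that the widest class ($30$ to $50$) has a frequency of $10$ but the same density as the first class, which holds only $5$ values.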

Outliers

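A common rule flags as an outlier any value more than $1.5 \times \mathrm{IQR}$ below $Q_1$ or above $Q_3$; another flags values more than two (or three) standard deviations from the mean. Exam questions state which rule to use.

Find the quartiles and test for outliers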

For the data $5, 7, 8, 9, 11, 12, 14, 30$ (already ordered, $n = 8$), find the quartiles and identify any outliers using the $1.5 \times \mathrm{IQR}$ rule.

Step 1: Find the median

With $n = 8$, the median is the mean of the $4$th and $5$th values: $\frac{9 + 11}{2} = 10$.

Step 2: Find the quartiles

The lower half is $5, 7, 8, 9$, so $Q_1 = \frac{7 + 8}{2} = 7.5$. The upper half is $11, 12, 14, 30$, so $Q_3 = \frac{12 + 14}{2} = 13$.

Step 3: Compute the outlier boundaries

$\mathrm{IQR} = 13 - 7.5 = 5.5$. The boundaries are $7.5 - 1.5\times 5.5 = -0.75$ and $13 + 1.5\times 5.5 = 21.25$.

Step 4: Identify outliers

$30 > 21.25$, so $30$ is an outlier; no value is below $-0.75$.
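
The four steps can be checked with a short Python sketch. The function names are hypothetical, and it assumes the "median of each half" convention used above, leaving the middle value out of both halves when $n$ is odd; other textbook conventions can give slightly different quartiles for small data sets.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median of each half.
    For odd n, the middle value is left out of both halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]
    return median(lower), median(xs), median(upper)

def outliers(data, k=1.5):
    """Return the fences (Q1 - k*IQR, Q3 + k*IQR) and the values outside them."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return (low, high), [x for x in data if x < low or x > high]

example = [5, 7, 8, 9, 11, 12, 14, 30]
print(quartiles(example))   # (7.5, 10.0, 13.0)
print(outliers(example))    # ((-0.75, 21.25), [30])

times = [24, 26, 27, 28, 29, 30, 31, 33, 34, 41, 52]
print(quartiles(times))     # (27, 30, 34)
print(outliers(times))      # ((16.5, 44.5), [52])
```

The second pair of calls uses the $11$ runners' times from the first practice question, and reproduces its quartiles and the single outlier $52$.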

Correlation and regression

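Correlation measures how closely points follow a straight line, and the product moment correlation coefficient $r$, which lies between $-1$ and $1$, quantifies it: values near $\pm 1$ indicate strong linear correlation and values near $0$ indicate little or none. Positive correlation means the variables rise together; negative means one falls as the other rises. Correlation alone does not show that one variable causes the other. The regression line of $y$ on $x$ is used to predict $y$ from a given $x$ (not $x$ from $y$): its gradient gives the predicted change in $y$ per unit change in $x$. Predicting within the data range (interpolation) is usually reasonably reliable, especially when the correlation is strong; predicting far outside it (extrapolation) is not, because the linear pattern may not continue.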
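
To show what a calculator is doing when it reports $r$ and the regression line, here is a Python sketch based on the summary statistics $S_{xx}$, $S_{yy}$ and $S_{xy}$. The data are hypothetical and `regression_y_on_x` is an illustrative name, not a library function.

```python
import math

def regression_y_on_x(xs, ys):
    """Least-squares line y = a + b x and the PMCC r, via Sxx, Syy and Sxy."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                    # gradient: predicted change in y per unit of x
    a = y_bar - b * x_bar            # the line passes through (x_bar, y_bar)
    r = sxy / math.sqrt(sxx * syy)   # always between -1 and 1
    return a, b, r

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b, r = regression_y_on_x(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x, r = {r:.3f}")   # y = 2.20 + 0.60x, r = 0.775
```

The sign of $r$ always matches the sign of the gradient $b$, because both share the numerator $S_{xy}$.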

Choosing and comparing summaries

A recurring exam skill is justifying which average or measure of spread to use. For skewed data or data with outliers, the median and interquartile range are preferred because they are not distorted by extreme values; for roughly symmetric data the mean and standard deviation use all the information and are usually reported. When comparing two data sets you should compare both a measure of location and a measure of spread, and you must do so in context: "the second class had a higher median mark and a smaller interquartile range, so on average they scored more and were more consistent."

Coding (linear transformation) sometimes simplifies calculation: if $y = \frac{x - a}{b}$ with $b > 0$, then $\bar{x} = a + b\bar{y}$ and $\sigma_x = b\,\sigma_y$. Subtracting $a$ shifts every value without changing the spread, while dividing by $b$ scales the spread. This lets you work with smaller numbers and scale back at the end, and AQA occasionally tests the effect of coding on the mean and standard deviation directly.
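
A short Python sketch confirms the coding rules numerically. The raw values and the constants $a = 1000$, $b = 10$ are hypothetical, and it uses the divisor-$n$ standard deviation (`statistics.pstdev`).

```python
import math
import statistics

a, b = 1000, 10                          # hypothetical coding constants (b > 0)
xs = [1020, 1030, 1050, 1060, 1090]      # hypothetical raw data
ys = [(x - a) / b for x in xs]           # coded values: [2.0, 3.0, 5.0, 6.0, 9.0]

y_bar = statistics.mean(ys)
sd_y = statistics.pstdev(ys)

# Decode: the shift a affects only the mean; the scale b affects both
x_bar = a + b * y_bar
sd_x = b * sd_y

print(x_bar, statistics.mean(xs))                 # both equal 1050
print(math.isclose(sd_x, statistics.pstdev(xs)))  # True
```

Working with the coded values $2, 3, 5, 6, 9$ is much easier by hand than with the four-digit raw data, which is the point of coding.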

Exam-style practice questions

Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AQA 2019 · 6 marks · Paper 3, Section A. The times, in minutes, for $11$ runners to complete a course are:

$24, 26, 27, 28, 29, 30, 31, 33, 34, 41, 52$

(a) Find the median and the interquartile range. (b) Using the rule that an outlier is more than $1.5$ times the interquartile range beyond a quartile, identify any outliers. (c) Comment on the skewness of the data.


(a) With $11$ ordered values, the median is the $6$th: $30$. The lower quartile is the $3$rd value, $27$, and the upper quartile the $9$th value, $34$, so $\mathrm{IQR} = 34 - 27 = 7$. (b) The outlier boundaries are $27 - 1.5\times 7 = 16.5$ and $34 + 1.5\times 7 = 44.5$. No value is below $16.5$, but $52 > 44.5$, so $52$ is an outlier. (c) The mean is $\frac{355}{11} \approx 32.3$, which exceeds the median of $30$; the upper quartile is further from the median ($34 - 30 = 4$) than the lower quartile is ($30 - 27 = 3$), and there is a high outlier, so the data are positively skewed. Markers reward correct quartile positions, the IQR rule applied to both ends, and a justified comment on skew.

AQA 2021 · 7 marks · Paper 3, Section B. A study records the daily maximum temperature $x$ (degrees Celsius) and ice-cream sales $y$ (hundreds) for several days, giving the regression line $y = -1.5 + 0.8x$.

(a) Interpret the gradient in context. (b) Use the line to estimate sales when the temperature is $22$ degrees Celsius, and comment on the reliability of this estimate if the data ranged from $10$ to $25$ degrees Celsius. (c) Estimate sales at $35$ degrees Celsius and explain why this estimate is unreliable. (d) State what a correlation between these variables does, and does not, tell you.


For (a), the gradient $0.8$ means each $1$ degree rise in temperature is associated with an extra $0.8$ hundred (that is, $80$) sales. For (b), at $x = 22$, $y = -1.5 + 0.8\times 22 = 16.1$, so about $1610$ sales; $22$ lies within the data range $10$ to $25$, so this is interpolation and the estimate is reasonably reliable, provided the correlation is strong. For (c), at $x = 35$, $y = -1.5 + 0.8\times 35 = 26.5$, about $2650$ sales, but $35$ is well outside the data range, so this extrapolation is unreliable because the linear relationship may not continue. For (d), correlation shows the strength of a linear association but does not establish that temperature causes sales. Markers reward contextual interpretation, the interpolation versus extrapolation distinction, and the correlation-not-causation point.
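
The arithmetic in (b) and (c) can be reproduced with a few lines of Python. `predict_sales` is a hypothetical helper; the line $y = -1.5 + 0.8x$ and the data range $10$ to $25$ come from the question, and the interpolation flag simply checks whether $x$ lies inside that range.

```python
def predict_sales(temp, data_min=10, data_max=25):
    """Evaluate y = -1.5 + 0.8x (y in hundreds) and label the prediction
    as interpolation or extrapolation relative to the observed x range."""
    y = -1.5 + 0.8 * temp
    inside = data_min <= temp <= data_max
    return y, ("interpolation" if inside else "extrapolation")

for t in (22, 35):
    y, kind = predict_sales(t)
    print(f"x = {t}: y = {y:.1f} hundred (about {round(y * 100)} sales), {kind}")
```

It gives $16.1$ hundred (about $1610$ sales, interpolation) at $22$ degrees and $26.5$ hundred (about $2650$ sales, extrapolation) at $35$ degrees, matching the worked answer. The range check only flags extrapolation; judging how far the linear pattern can be trusted still needs the contextual comment.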


Sources & how we know this

AQA A-level Mathematics (7357) specification — AQA (2017)