How do you summarise, display and interpret data, and how do you identify relationships and outliers?

Measures of location and spread, histograms, box plots and cumulative frequency, identifying outliers, scatter diagrams, correlation and the use of regression lines.

A focused answer to the AQA A-Level Mathematics data presentation content, covering measures of location and spread, histograms, box plots, cumulative frequency, outliers, scatter diagrams, correlation and regression.

Generated by Claude Opus 4.811 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this dot point is asking
  2. Location and spread
  3. Displaying data
  4. Outliers
  5. Correlation and regression
  6. Choosing and comparing summaries

What this dot point is asking

AQA wants you to calculate and interpret measures of location and spread, draw and read histograms, box plots and cumulative frequency graphs, identify outliers using standard rules, describe correlation in scatter diagrams, and use a regression line to make and judge predictions. These skills are applied to the large data set on Paper 3, where interpretation in context earns as many marks as the calculation.

Location and spread

The median is the middle value of ordered data, and the mode is the most common value. The mean uses every data point but is sensitive to outliers, whereas the median and the interquartile range (IQR) are more robust. Comparing the mean with the median is a quick way to detect skew: a mean above the median suggests positive skew, and a mean below the median suggests negative skew.
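
As a rough illustration of the comparison above, the following Python sketch computes the three averages and applies the mean-versus-median check. The data set and the helper names `location_summary` and `skew_hint` are made up for this example; they are not taken from AQA material.

```python
from statistics import mean, median, multimode

# Illustrative data only (not from an exam paper or the large data set).
data = [2, 3, 3, 4, 5, 6, 19]


def location_summary(values):
    """Return the mean, median and mode(s) of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "modes": multimode(values),  # a list, because data can have several modes
    }


def skew_hint(values):
    """Rough skew indicator from comparing the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "positive skew suggested"
    if m < med:
        return "negative skew suggested"
    return "no skew suggested"


summary = location_summary(data)
print(summary)
print(skew_hint(data))
```

Here the single large value 19 pulls the mean above the median, which is the pattern the skew check looks for. The check is only a hint and is no substitute for looking at a diagram of the data.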

Displaying data

Histograms plot frequency density (frequency divided by class width) so that the area of each bar represents frequency. This is what makes them valid for unequal class widths, where bar height alone would mislead. Box plots display the minimum, lower quartile, median, upper quartile and maximum, and make comparisons of two distributions easy. Cumulative frequency graphs plot the running total against the upper class boundary, letting you read off the median and quartiles for grouped data.
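
The two grouped-data calculations described above can be sketched in Python as follows. The class boundaries and frequencies are hypothetical, chosen only to show unequal class widths, and the function names are illustrative.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 15, 10), (15, 20, 12), (20, 40, 8)]


def frequency_densities(groups):
    """Frequency density = frequency / class width, one value per class."""
    return [freq / (upper - lower) for lower, upper, freq in groups]


def cumulative_frequency_points(groups):
    """Points (upper class boundary, running total) for a cumulative frequency graph."""
    running_totals = accumulate(freq for _, _, freq in groups)
    return [(upper, total) for (_, upper, _), total in zip(groups, running_totals)]


densities = frequency_densities(classes)
points = cumulative_frequency_points(classes)

for (lower, upper, freq), density in zip(classes, densities):
    # Bar area (density x width) recovers the class frequency.
    print(f"{lower}-{upper}: density {density}, area {density * (upper - lower)}")
print(points)
```

In this made-up data the widest class (20 to 40) has the lowest bar even though its frequency is not the smallest, which is the situation where plotting raw frequency would mislead.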

Outliers
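
The practice question further down states the rule used here: a value is an outlier if it lies more than 1.5 times the interquartile range beyond a quartile. A minimal Python sketch of that rule follows, using the runner times from that question. It assumes the quartile convention that places the quartiles at positions (n + 1)/4 and 3(n + 1)/4 in the ordered data; other conventions can give slightly different fences. The function name `iqr_outliers` is illustrative.

```python
from statistics import quantiles

# Runner times (minutes) from the practice question in this guide.
times = [24, 26, 27, 28, 29, 30, 31, 33, 34, 41, 52]


def iqr_outliers(values, k=1.5):
    """Return (lower fence, upper fence, outliers) under the k x IQR rule."""
    # method="exclusive" uses positions (n + 1)/4 and 3(n + 1)/4 for the
    # quartiles; this is an assumed convention, not the only valid one.
    q1, _, q3 = quantiles(sorted(values), n=4, method="exclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return lower, upper, [v for v in values if v < lower or v > upper]


lower_fence, upper_fence, outliers = iqr_outliers(times)
print(lower_fence, upper_fence, outliers)
```

Note that 41 falls inside the upper fence, so only 52 is flagged, which agrees with the worked answer.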

Correlation and regression

Correlation measures how closely points follow a straight line. Positive correlation means the variables rise together; negative means one falls as the other rises. The regression line of $y$ on $x$ is used to predict $y$ from $x$: its gradient gives the predicted change in $y$ per unit change in $x$. Predicting within the data range (interpolation) is reliable; predicting far outside it (extrapolation) is not, because the linear pattern may not continue.
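
In the exam the regression line is normally given or found on a calculator, but a short Python sketch can show where its numbers come from. It assumes the usual least-squares line of $y$ on $x$ and the product moment correlation coefficient as the measure of correlation; the data points and the function name `fit_line` are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares regression line of y on x.

    Returns (intercept, gradient, r), where r is the product moment
    correlation coefficient.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    intercept = y_bar - gradient * x_bar
    r = sxy / sqrt(sxx * syy)
    return intercept, gradient, r


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, gradient, r = fit_line(xs, ys)
print(f"y = {intercept:.2f} + {gradient:.2f}x, r = {r:.3f}")
```

The gradient is the predicted change in $y$ for each unit increase in $x$, and a value of `r` close to 1 or -1 indicates that the points lie close to a straight line.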

Choosing and comparing summaries

A recurring exam skill is justifying which average or measure of spread to use. For skewed data or data with outliers, the median and interquartile range are preferred because they are not distorted by extreme values; for roughly symmetric data the mean and standard deviation use all the information and are usually reported. When comparing two data sets you should compare both a measure of location and a measure of spread, and you must do so in context: "the second class had a higher median mark and a smaller interquartile range, so on average they scored more and were more consistent."
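
The kind of comparison quoted above can be sketched in Python. The two sets of marks are invented for the example, the quartile convention (positions (n + 1)/4 and 3(n + 1)/4) is an assumption, and the helper names are illustrative.

```python
from statistics import median, quantiles

# Hypothetical marks for two classes (illustration only).
class_a = [35, 42, 48, 50, 55, 61, 70]
class_b = [52, 55, 57, 58, 60, 62, 64]


def median_and_iqr(values):
    """Return (median, interquartile range) for a list of numbers."""
    q1, _, q3 = quantiles(sorted(values), n=4, method="exclusive")
    return median(values), q3 - q1


def compare(name_1, first, name_2, second):
    """Compare one measure of location and one of spread (ties are not handled)."""
    med_1, iqr_1 = median_and_iqr(first)
    med_2, iqr_2 = median_and_iqr(second)
    higher = name_1 if med_1 > med_2 else name_2
    steadier = name_1 if iqr_1 < iqr_2 else name_2
    return f"{higher} has the higher median; {steadier} has the smaller IQR."


print(median_and_iqr(class_a), median_and_iqr(class_b))
print(compare("Class A", class_a, "Class B", class_b))
```

The printed sentence is only the numerical half of an exam answer; the marks come from interpreting it in context, for example by saying which class scored more on average and which was more consistent.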

Coding (linear transformation) sometimes simplifies calculation: if $y = \frac{x - a}{b}$, then $\bar{x} = a + b\bar{y}$ and the standard deviation of $x$ is $b$ times that of $y$ (taking $b > 0$). This lets you work with smaller numbers and scale back at the end, and AQA occasionally tests the effect of coding on the mean and standard deviation directly.
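
A quick numerical check of the coding relationships, as a Python sketch. The data values and the constants `a` and `b` are hypothetical, and the population standard deviation (`pstdev`) is used here as an assumption; the same scaling holds for the sample version.

```python
from math import isclose
from statistics import mean, pstdev

# Hypothetical data and coding constants, for illustration only.
xs = [1010, 1020, 1030, 1050, 1090]
a, b = 1000, 10

# Code the data: y = (x - a) / b gives the smaller values 1, 2, 3, 5, 9.
ys = [(x - a) / b for x in xs]

# Scale back from the coded summaries.
mean_x_from_code = a + b * mean(ys)
sd_x_from_code = b * pstdev(ys)

print(mean_x_from_code, mean(xs))
print(sd_x_from_code, pstdev(xs))
print(isclose(mean_x_from_code, mean(xs)), isclose(sd_x_from_code, pstdev(xs)))
```

Subtracting `a` shifts the mean but leaves the spread unchanged, which is why only `b` appears in the standard deviation relationship.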

Exam-style practice questions

Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AQA 2019, 6 marks, Paper 3, Section A. The times, in minutes, for 11 runners to complete a course are: 24, 26, 27, 28, 29, 30, 31, 33, 34, 41, 52. (a) Find the median and the interquartile range. (b) Using the rule that an outlier is more than 1.5 times the interquartile range beyond a quartile, identify any outliers. (c) Comment on the skewness of the data.
Worked answer

With 11 ordered values, the median is the 6th value, 30. The lower quartile is the 3rd value, 27, and the upper quartile is the 9th value, 34, so $\mathrm{IQR} = 34 - 27 = 7$. The outlier boundaries are $27 - 1.5 \times 7 = 16.5$ and $34 + 1.5 \times 7 = 44.5$, so 52 is an outlier (above 44.5). For (c), the mean (about 32.3) exceeds the median (30) and there is a high outlier, so the data are positively skewed. Markers reward correct quartile positions, the IQR rule applied to both ends, and a justified comment on skew.

AQA 2021, 7 marks, Paper 3, Section B. A study records the daily maximum temperature $x$ (degrees Celsius) and ice-cream sales $y$ (hundreds) for several days, giving the regression line $y = -1.5 + 0.8x$. (a) Interpret the gradient in context. (b) Use the line to estimate sales when the temperature is 22 degrees Celsius, and comment on the reliability of this estimate if the data ranged from 10 to 25 degrees Celsius. (c) Estimate sales at 35 degrees Celsius and explain why this estimate is unreliable. (d) State what a correlation between these variables does, and does not, tell you.
Worked answer

For (a), the gradient 0.8 means each 1 degree rise in temperature is associated with an extra 0.8 hundred (that is, 80) sales. For (b), at $x = 22$, $y = -1.5 + 0.8 \times 22 = 16.1$, so about 1610 sales; 22 lies within the data range 10 to 25, so this interpolation is reliable. For (c), at $x = 35$, $y = -1.5 + 0.8 \times 35 = 26.5$, but 35 is well outside the data range, so this extrapolation is unreliable as the linear relationship may not continue. For (d), correlation shows the strength of a linear association but does not establish that temperature causes sales. Markers reward contextual interpretation, the interpolation versus extrapolation distinction, and the correlation-not-causation point.
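
The two estimates in this answer can be reproduced with a short Python sketch that also flags whether a prediction is an interpolation. The regression line and the data range of 10 to 25 degrees come from the question; the function name `predict_sales` and its parameters are illustrative.

```python
from math import isclose


def predict_sales(temp_c, data_min=10, data_max=25):
    """Estimate sales (in hundreds) from the line y = -1.5 + 0.8x.

    Returns (estimate, within_range); within_range is False when the
    prediction is an extrapolation beyond the observed temperatures.
    """
    estimate = -1.5 + 0.8 * temp_c
    within_range = data_min <= temp_c <= data_max
    return estimate, within_range


for temp in (22, 35):
    estimate, within_range = predict_sales(temp)
    kind = "interpolation" if within_range else "extrapolation"
    print(f"{temp} degrees: about {estimate * 100:.0f} sales ({kind})")

# Guard against floating-point noise when checking the worked values.
print(isclose(predict_sales(22)[0], 16.1), isclose(predict_sales(35)[0], 26.5))
```

The range check captures only the interpolation versus extrapolation distinction; it says nothing about how closely the data follow the line within the range.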
