How do you summarise, display and interpret data, and how do you identify relationships and outliers?
Measures of location and spread, histograms, box plots and cumulative frequency, identifying outliers, scatter diagrams, correlation and the use of regression lines.
A focused answer to the AQA A-Level Mathematics data presentation content, covering measures of location and spread, histograms, box plots, cumulative frequency, outliers, scatter diagrams, correlation and regression.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
AQA wants you to calculate and interpret measures of location and spread, draw and read histograms, box plots and cumulative frequency graphs, identify outliers using standard rules, describe correlation in scatter diagrams, and use a regression line to make and judge predictions. These skills are applied to the large data set on Paper 3, where interpretation in context earns as many marks as the calculation.
Location and spread
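The mean is the sum of the values divided by the number of values, the median is the middle value of the ordered data, and the mode is the most common value. The mean uses every data point but is sensitive to outliers, as is the standard deviation, whereas the median and interquartile range (IQR) are more robust. Comparing the mean with the median is a quick way to detect skew: a mean above the median suggests positive skew, and a mean below it suggests negative skew.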
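As a rough illustration of how one large value pulls the mean but not the median, the sketch below uses Python's standard `statistics` module on a made-up data set (the numbers are hypothetical, chosen only to show the effect).

```python
import statistics

# Hypothetical data set with one unusually large value (19).
data = [2, 3, 3, 4, 5, 6, 19]

mean = statistics.mean(data)      # uses every value, so 19 pulls it upwards
median = statistics.median(data)  # middle value of the ordered data
mode = statistics.mode(data)      # most common value

print(mean, median, mode)  # 6 4 3

# A mean above the median hints at positive skew.
suggests_positive_skew = mean > median
print(suggests_positive_skew)  # True
```

Removing the value 19 would drop the mean noticeably while leaving the median close to where it was, which is the sense in which the median is more robust.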
Displaying data
Histograms plot frequency density (frequency divided by class width) so that the area of each bar represents frequency. This is what makes them valid for unequal class widths, where bar height alone would mislead. Box plots display the minimum, lower quartile, median, upper quartile and maximum, and make comparisons of two distributions easy. Cumulative frequency graphs plot the running total against the upper class boundary, letting you read off the median and quartiles for grouped data.
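The frequency density and cumulative frequency calculations can be sketched as follows. The class boundaries and frequencies are hypothetical, and the variable names are illustrative only.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 15, 10), (15, 20, 12), (20, 40, 8)]

# Frequency density = frequency / class width, so bar area equals frequency.
densities = [freq / (upper - lower) for lower, upper, freq in classes]
print(densities)  # [0.5, 2.0, 2.4, 0.4]

# Cumulative frequency is the running total, plotted at each upper boundary.
running_totals = list(accumulate(freq for _, _, freq in classes))
cumulative_points = [
    (upper, total) for (_, upper, _), total in zip(classes, running_totals)
]
print(cumulative_points)  # [(10, 5), (15, 15), (20, 27), (40, 35)]
```

Note that the widest class (20 to 40) has a frequency of 8 but the lowest density, which is why plotting raw frequency as bar height would overstate it.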
Outliers
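One standard rule flags a value as an outlier when it lies more than a fixed multiple of the interquartile range beyond a quartile. The sketch below assumes the widely used multiplier of 1.5, which is an assumption here rather than a value taken from this page, and it takes the quartiles as inputs because conventions for locating them vary. The function name and data are hypothetical.

```python
def iqr_outliers(values, q1, q3, k=1.5):
    """Return values more than k * IQR below q1 or above q3.

    k = 1.5 is an assumed, commonly used multiplier; use whatever
    value the question specifies.
    """
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical ordered data with quartiles already found by hand.
times = [21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 48]
print(iqr_outliers(times, q1=24, q3=30))  # [48]
```

Here the IQR is 6, so the fences sit at 15 and 39, and only 48 falls outside them. Both ends are checked even when only one end produces an outlier.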
Correlation and regression
Correlation measures how closely points follow a straight line. Positive correlation means the variables rise together; negative means one falls as the other rises. The regression line of $y$ on $x$ is used to predict $y$ from $x$: its gradient gives the predicted change in $y$ per unit change in $x$. Predicting within the data range (interpolation) is reliable; predicting far outside it (extrapolation) is not, because the linear pattern may not continue.
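A minimal sketch of these ideas, assuming the regression line is fitted by least squares and that correlation is measured by the product moment correlation coefficient. Neither method is spelled out above, and in the exam the line is normally given or read from a calculator. The data and function names are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares line of y on x: returns (intercept, gradient, r)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    gradient = s_xy / s_xx
    intercept = y_bar - gradient * x_bar
    r = s_xy / sqrt(s_xx * s_yy)  # product moment correlation coefficient
    return intercept, gradient, r


def predict(x, intercept, gradient, xs):
    """Return the prediction and whether x lies inside the observed range."""
    is_interpolation = min(xs) <= x <= max(xs)
    return intercept + gradient * x, is_interpolation


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, gradient, r = fit_line(xs, ys)
print(predict(3, intercept, gradient, xs))   # inside the data range
print(predict(10, intercept, gradient, xs))  # outside it: treat with caution
```

The second value returned by `predict` is only a flag for the interpolation versus extrapolation distinction; it says nothing about how strong the correlation is, which is what `r` summarises.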
Choosing and comparing summaries
A recurring exam skill is justifying which average or measure of spread to use. For skewed data or data with outliers, the median and interquartile range are preferred because they are not distorted by extreme values; for roughly symmetric data the mean and standard deviation use all the information and are usually reported. When comparing two data sets you should compare both a measure of location and a measure of spread, and you must do so in context: "the second class had a higher median mark and a smaller interquartile range, so on average they scored more and were more consistent."
Coding (linear transformation) sometimes simplifies calculation: if $y = ax + b$, then $\bar{y} = a\bar{x} + b$ and the standard deviation of $y$ is $|a|$ times that of $x$. This lets you work with smaller numbers and scale back at the end, and AQA occasionally tests the effect of coding on the mean and standard deviation directly.
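The effect of coding can be checked numerically. The sketch below assumes a hypothetical coding of the form y = (x - 1000) / 10 applied to made-up data, and uses the population standard deviation (divisor n); the same scaling holds for the sample version.

```python
import statistics
from math import isclose

# Hypothetical raw data and an assumed coding y = (x - 1000) / 10.
xs = [1010, 1020, 1030, 1040, 1050]
ys = [(x - 1000) / 10 for x in xs]  # [1.0, 2.0, 3.0, 4.0, 5.0]

mean_y = statistics.mean(ys)
sd_y = statistics.pstdev(ys)  # population standard deviation

# Undo the coding: the mean is shifted and scaled, the SD is only scaled.
mean_x = 10 * mean_y + 1000
sd_x = 10 * sd_y

print(mean_x)  # 1030.0
print(isclose(mean_x, statistics.mean(xs)))  # True
print(isclose(sd_x, statistics.pstdev(xs)))  # True
```

Subtracting 1000 moves every value by the same amount, so it changes the mean but not the spread; only the division by 10 affects the standard deviation.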
Exam-style practice questions
Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AQA 2019, 6 marks, Paper 3, Section A. The times, in minutes, for runners to complete a course are: . (a) Find the median and the interquartile range. (b) Using the rule that an outlier is more than times the interquartile range beyond a quartile, identify any outliers. (c) Comment on the skewness of the data.
With ordered values, the median is the th: . The lower quartile is the rd value and the upper quartile the th value , so . The outlier boundaries are and , so is an outlier (above ). For (c), the mean exceeds the median and there is a high outlier, so the data are positively skewed. Markers reward correct quartile positions, the IQR rule applied to both ends, and a justified comment on skew.
AQA 2021, 7 marks, Paper 3, Section B. A study records the daily maximum temperature (degrees Celsius) and ice-cream sales (hundreds) for several days, giving the regression line . (a) Interpret the gradient in context. (b) Use the line to estimate sales when the temperature is degrees Celsius, and comment on the reliability of this estimate if the data ranged from to degrees Celsius. (c) Estimate sales at degrees Celsius and explain why this estimate is unreliable. (d) State what a correlation between these variables does, and does not, tell you.
For (a), the gradient means each degree rise in temperature is associated with an extra hundred (that is ) sales. For (b), at , , so about sales; lies within the data range to , so this interpolation is reliable. For (c), at , , but is well outside the data range, so this extrapolation is unreliable as the linear relationship may not continue. For (d), correlation shows the strength of a linear association but does not establish that temperature causes sales. Markers reward contextual interpretation, the interpolation versus extrapolation distinction, and the correlation-not-causation point.
Related dot points
- Populations and samples, the advantages and limitations of sampling, simple random sampling, systematic, stratified, quota and opportunity sampling, and the importance of the large data set.
A focused answer to the AQA A-Level Mathematics sampling content, covering populations and samples, the trade-offs of sampling, simple random, systematic, stratified, quota and opportunity sampling, and the role of the large data set.
- Probability of events, mutually exclusive and independent events, the addition and multiplication rules, Venn diagrams and tree diagrams, and conditional probability.
A focused answer to the AQA A-Level Mathematics probability content, covering single and combined events, mutually exclusive and independent events, the addition and multiplication rules, Venn and tree diagrams, and conditional probability.
- Discrete random variables and their probability distributions, the requirement that probabilities sum to one, the use of statistical distributions to model real situations, and an introduction to the binomial and normal models.
A focused answer to the AQA A-Level Mathematics statistical distributions content, covering discrete random variables, probability distributions, the condition that probabilities sum to one, and choosing a suitable model such as the binomial or normal distribution.
- The normal distribution as a model for continuous data, its mean and standard deviation, calculating probabilities, the standard normal distribution and standardising, finding values from probabilities, and using the normal approximation to the binomial.
A focused answer to the AQA A-Level Mathematics normal distribution content, covering the bell curve, mean and standard deviation, calculating probabilities, standardising with z values, inverse problems, and the normal approximation to the binomial.
- Setting up null and alternative hypotheses, the significance level, one-tailed and two-tailed tests, hypothesis tests for a binomial proportion and for a normal mean, critical regions, and interpreting the conclusion in context.
A focused answer to the AQA A-Level Mathematics hypothesis testing content, covering null and alternative hypotheses, significance levels, one and two-tailed tests for a binomial proportion and a normal mean, critical regions, and stating conclusions in context.
Sources & how we know this
- AQA A-level Mathematics (7357) specification, AQA (2017)