How do you summarise, display and compare data, and identify outliers and correlation?

Measures of central tendency and spread, histograms, box plots and cumulative frequency, identifying outliers, comparing distributions, and correlation and the regression line.

A focused answer to the OCR A-Level Mathematics A data presentation content, covering the mean, median and mode, range, interquartile range, variance and standard deviation, histograms, box plots and cumulative frequency, identifying outliers, comparing distributions, and interpreting correlation and the regression line.

Generated by Claude Opus 4.8. 12 min answer.

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this dot point is asking
  2. The answer
  3. Examples in context
  4. Try this

What this dot point is asking

OCR wants you to calculate and interpret measures of central tendency (mean, median, mode) and spread (range, interquartile range, variance, standard deviation), draw and read histograms (with frequency density), box plots and cumulative frequency graphs, identify outliers by a stated rule, compare two distributions, and interpret correlation and the least-squares regression line, including the dangers of extrapolation.

The answer

Averages and spread

The mean uses every value but is sensitive to outliers; the median is the middle value and is resistant to outliers; the mode is the most common value. The standard deviation measures typical distance from the mean, while the interquartile range measures the spread of the middle half and ignores extremes.
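
As a minimal sketch of these measures, the Python below uses the standard-library `statistics` module on a small made-up data set (the marks are illustrative assumptions, not taken from any OCR paper). It shows that one extreme value moves the mean and standard deviation far more than the median and interquartile range. The quartile convention used here (median of each half) is only one of several in use, so check which one your course or calculator expects.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + (n % 2):]    # values above the median position
    return (statistics.median(lower),
            statistics.median(data),
            statistics.median(upper))


def summary(values):
    """Collect the measures of location and spread named in the text."""
    q1, med, q3 = quartiles(values)
    return {
        "mean": statistics.mean(values),
        "median": med,
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sd": statistics.pstdev(values),  # population standard deviation
    }


# Hypothetical marks; the second list swaps the top mark for an extreme value.
marks = [4, 5, 5, 6, 7, 8, 9]
marks_with_outlier = [4, 5, 5, 6, 7, 8, 30]

print(summary(marks))
print(summary(marks_with_outlier))
```

In this example the median and interquartile range are the same for both lists, while the mean, range and standard deviation all increase when the extreme value is present.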

Histograms and frequency density

A histogram shows grouped continuous data with area proportional to frequency, so the vertical axis is frequency density, not frequency.
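
The rule behind this is frequency density = frequency ÷ class width. A short Python sketch, using hypothetical class boundaries and frequencies chosen only for illustration, shows that each bar's area (width × height) recovers the class frequency even when the classes have unequal widths.

```python
def frequency_density(frequency, class_width):
    """Frequency density = frequency / class width."""
    if class_width <= 0:
        raise ValueError("class width must be positive")
    return frequency / class_width


# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 20), (10, 15, 30), (15, 30, 45)]

for lower, upper, freq in classes:
    width = upper - lower
    density = frequency_density(freq, width)
    # The bar's area (width * height) equals the class frequency.
    print(f"{lower}-{upper}: density {density}, area {width * density}")
```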

Outliers

An outlier is a value far from the rest. Two common rules are: more than $1.5 \times \text{IQR}$ beyond a quartile, or more than two standard deviations from the mean. The question states which rule to use.
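
Both rules can be sketched as a pair of boundaries ("fences") with a strict comparison. The function names below are hypothetical helpers written for this illustration, and the multipliers default to the values quoted above; an exam question may state different ones.

```python
def iqr_fences(q1, q3, k=1.5):
    """Boundaries at k * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def sd_fences(mean, sd, k=2):
    """Boundaries at k standard deviations either side of the mean."""
    return mean - k * sd, mean + k * sd


def is_outlier(value, fences):
    """A value is an outlier only if it lies strictly beyond a boundary."""
    low, high = fences
    return value < low or value > high


# Illustrative numbers: quartiles 6 and 9, then mean 50 with standard deviation 4.
print(iqr_fences(6, 9))
print(sd_fences(50, 4))
print(is_outlier(59, sd_fences(50, 4)))
```

Because both rules say "more than", a value exactly on a boundary is not counted as an outlier in this sketch.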

Box plots and skewness

A box plot shows the minimum, the three quartiles and the maximum. Comparing the median's position within the box describes skewness: a median nearer $Q_1$ indicates positive skew, nearer $Q_3$ indicates negative skew.
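
The comparison amounts to checking which gap is larger, $Q_2 - Q_1$ or $Q_3 - Q_2$. A minimal Python sketch of that check follows; `describe_skew` is a hypothetical helper, and it uses exact equality for "symmetrical", which is an assumption that suits tidy exam numbers rather than real measurements.

```python
def describe_skew(q1, median, q3):
    """Describe skew from the median's position between the quartiles."""
    lower_gap = median - q1   # distance from Q1 up to the median
    upper_gap = q3 - median   # distance from the median up to Q3
    if lower_gap < upper_gap:
        return "positive skew"        # median nearer Q1
    if lower_gap > upper_gap:
        return "negative skew"        # median nearer Q3
    return "roughly symmetrical"


print(describe_skew(10, 12, 20))
print(describe_skew(10, 18, 20))
print(describe_skew(6, 7.5, 9))
```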

Examples in context

Comparing two distributions

To compare data sets, always compare a measure of location and a measure of spread, in context. For example: "The median mark of class A (62) is higher than that of class B (55), and class A's smaller interquartile range (10 versus 18) shows its marks were more consistent."

Correlation and regression

Correlation measures how closely two variables follow a linear relationship; the product moment correlation coefficient $r$ runs from $-1$ to $1$. The regression line of $y$ on $x$ is the best-fit line for predicting $y$ from $x$. Use it only within the range of the data: predicting outside that range (extrapolation) is unreliable, and correlation alone does not prove that one variable causes the other.
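
As a sketch of how these quantities are computed, the Python below finds $r$ and the least-squares line $y = a + bx$ from the sums $S_{xx}$, $S_{yy}$ and $S_{xy}$, then flags any prediction made outside the observed range of $x$. The data and the function names `fit_line` and `predict` are assumptions made for this illustration.

```python
import math


def fit_line(xs, ys):
    """Return (r, a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = s_xy / math.sqrt(s_xx * s_yy)   # product moment correlation coefficient
    b = s_xy / s_xx                     # gradient of the regression line
    a = y_bar - b * x_bar               # line passes through (x_bar, y_bar)
    return r, a, b


def predict(x, a, b, xs):
    """Return (predicted y, True if x lies outside the observed x-range)."""
    extrapolating = x < min(xs) or x > max(xs)
    return a + b * x, extrapolating


# Hypothetical paired data.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r, a, b = fit_line(hours, scores)
print(round(r, 3), round(a, 2), round(b, 2))
print(predict(3.5, a, b, hours))   # interpolation: within the data range
print(predict(10, a, b, hours))    # extrapolation: flagged as unreliable
```

The second value returned by `predict` does not change the arithmetic; it only records whether the prediction relies on extrapolation.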

Try this

Q1. A class width is $5$ and the frequency is $30$. Find the frequency density. [1 mark]

  • Cue. $\dfrac{30}{5} = 6$.

Q2. Data has mean $50$ and standard deviation $4$. Using the two-standard-deviation rule, find the upper outlier boundary. [2 marks]

  • Cue. $50 + 2(4) = 58$.

Exam-style practice questions

Practice questions written in the style of OCR exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

OCR 2019 [6 marks]

A data set of $50$ values has $\sum x = 1500$ and $\sum x^2 = 47\,000$. Find the mean and the standard deviation. An outlier is defined as more than two standard deviations from the mean; determine whether a value of $58$ is an outlier.

Mean $\bar{x} = \dfrac{\sum x}{n} = \dfrac{1500}{50} = 30$ (M1, A1).

Variance $= \dfrac{\sum x^2}{n} - \bar{x}^2 = \dfrac{47\,000}{50} - 30^2 = 940 - 900 = 40$ (M1), so standard deviation $\sigma = \sqrt{40} \approx 6.32$ (A1).

Two standard deviations is $2\sigma = 2\sqrt{40} \approx 12.65$, so the outlier boundary above the mean is $30 + 12.65 = 42.65$ (M1).

Since $58 > 42.65$, the value $58$ is an outlier (A1).

Markers reward the mean, the variance formula, the standard deviation, the boundary, and the comparison.
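
The arithmetic in this worked answer can be checked with a few lines of Python. This is only a verification sketch using the summary statistics given in the question; `summary_stats` is a hypothetical helper name.

```python
import math


def summary_stats(n, sum_x, sum_x2):
    """Return (mean, variance, standard deviation) from summary totals."""
    mean = sum_x / n
    variance = sum_x2 / n - mean ** 2   # mean of squares minus square of mean
    return mean, variance, math.sqrt(variance)


mean, variance, sd = summary_stats(50, 1500, 47000)
upper_boundary = mean + 2 * sd          # two standard deviations above the mean

print(mean, variance, round(sd, 2), round(upper_boundary, 2))
print(58 > upper_boundary)              # True means 58 is an outlier
```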

OCR 2021 [5 marks]

The lengths of $80$ leaves are recorded. The cumulative frequency reaches $20$ at $6$ cm, $40$ at $7.5$ cm and $60$ at $9$ cm. Estimate the median and the interquartile range, and comment on the skewness.

With $n = 80$, the median is at the $40$th value: from the data, the median $\approx 7.5$ cm (M1, A1).

The lower quartile is at the $20$th value, $\approx 6$ cm, and the upper quartile at the $60$th value, $\approx 9$ cm (M1).

Interquartile range $= 9 - 6 = 3$ cm (A1).

The median ($7.5$) is exactly midway between the quartiles ($6$ and $9$), so the distribution is roughly symmetrical (A1).

Markers reward reading the quartiles from the cumulative frequency, the interquartile range, and a justified comment on skewness.
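
Reading a value off a cumulative frequency graph can be sketched in Python as linear interpolation between the recorded points. Treating the graph as straight segments between points is an assumption (a hand-drawn curve may give slightly different readings), and `value_at_cumulative_frequency` is a hypothetical helper. With the three points from this question the quartiles fall exactly on recorded values, so no interpolation is needed for them.

```python
def value_at_cumulative_frequency(points, target):
    """Interpolate linearly; points are (value, cumulative frequency) pairs
    sorted in ascending order."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1:
            if c1 == c0:
                return x0
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the recorded cumulative frequencies")


# The three points given in the question (length in cm, cumulative frequency).
points = [(6, 20), (7.5, 40), (9, 60)]
n = 80

q1 = value_at_cumulative_frequency(points, n / 4)
median = value_at_cumulative_frequency(points, n / 2)
q3 = value_at_cumulative_frequency(points, 3 * n / 4)

print(q1, median, q3, q3 - q1)
```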
