How do you summarise, display and compare data, and identify outliers and correlation?
Measures of central tendency and spread, histograms, box plots and cumulative frequency, identifying outliers, comparing distributions, and correlation and the regression line.
A focused answer to the OCR A-Level Mathematics A data presentation content, covering the mean, median and mode, range, interquartile range, variance and standard deviation, histograms, box plots and cumulative frequency, identifying outliers, comparing distributions, and interpreting correlation and the regression line.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
OCR wants you to calculate and interpret measures of central tendency (mean, median, mode) and spread (range, interquartile range, variance, standard deviation), draw and read histograms (with frequency density), box plots and cumulative frequency graphs, identify outliers by a stated rule, compare two distributions, and interpret correlation and the least-squares regression line, including the dangers of extrapolation.
The answer
Averages and spread
The mean uses every value but is sensitive to outliers; the median is the middle value and is resistant to outliers; the mode is the most common value. The standard deviation measures typical distance from the mean, while the interquartile range measures the spread of the middle half and ignores extremes.
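The contrast between resistant and non-resistant measures can be checked numerically. The sketch below uses Python's standard `statistics` module on a small made-up data set (the marks are illustrative, not from any exam paper). It assumes the divisor-n standard deviation and the module's "inclusive" quartile method; quartile conventions vary, so a hand calculation may give slightly different quartiles.

```python
import statistics


def summarise(values):
    """Return the main measures of location and spread for a list of numbers."""
    # quantiles(..., n=4) returns the three quartiles Q1, Q2, Q3.
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": statistics.pstdev(values),  # divisor n (population form)
        "iqr": q3 - q1,
    }


marks = [4, 5, 5, 6, 7, 8, 9]           # illustrative data only
with_outlier = [4, 5, 5, 6, 7, 8, 40]   # same data with the largest value made extreme

before = summarise(marks)
after = summarise(with_outlier)

for name in ("mean", "median", "mode", "sd", "iqr"):
    print(f"{name:>6}: {before[name]:.2f} -> {after[name]:.2f}")
```

In this example the median, mode and interquartile range are unchanged by the extreme value, while the mean and standard deviation both increase.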
Histograms and frequency density
A histogram shows grouped continuous data with area proportional to frequency, so the vertical axis is frequency density, not frequency.
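Because area represents frequency, each bar's height is the frequency divided by the class width. A minimal sketch of that calculation follows; the class boundaries and frequencies are hypothetical values chosen for illustration.

```python
def frequency_density(frequency, class_width):
    """Height of a histogram bar: frequency per unit of class width."""
    return frequency / class_width


# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 15, 10), (15, 25, 8)]

densities = [frequency_density(f, upper - lower) for lower, upper, f in classes]

# Multiplying each height by its class width recovers the frequency (the bar's area).
areas = [d * (upper - lower) for d, (lower, upper, _f) in zip(densities, classes)]

print(densities)
print(areas)
```

Note that the narrow middle class has the tallest bar even though the classes differ in width, which is why plotting raw frequency would mislead.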
Outliers
An outlier is a value far from the rest. Two common rules are: more than 1.5 × IQR beyond a quartile (below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR), or more than two standard deviations from the mean. The question states which rule to use.
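The two rules can give different verdicts for the same value, which is why the question specifies one. The sketch below assumes the usual multipliers (1.5 interquartile ranges beyond a quartile, and 2 standard deviations from the mean); the function names and the summary values are hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Outlier boundaries k interquartile ranges beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def sd_fences(mean, sd, k=2):
    """Outlier boundaries k standard deviations either side of the mean."""
    return mean - k * sd, mean + k * sd


def is_outlier(x, fences):
    """True if x lies strictly outside the (low, high) boundaries."""
    low, high = fences
    return x < low or x > high


# Hypothetical summary statistics for two different data sets.
quartile_rule = iqr_fences(q1=20, q3=30)   # boundaries at 5 and 45
sd_rule = sd_fences(mean=50, sd=4)         # boundaries at 42 and 58

print(is_outlier(47, quartile_rule))
print(is_outlier(47, sd_rule))
```

A value exactly on a boundary is not flagged here, matching the wording "more than" in both rules.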
Box plots and skewness
A box plot shows the minimum, the three quartiles and the maximum. Comparing the median's position within the box describes skewness: a median nearer the lower quartile indicates positive skew, and a median nearer the upper quartile indicates negative skew.
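One way to make the skewness check concrete is to compare the two halves of the box: the distance from the lower quartile to the median against the distance from the median to the upper quartile. This is a minimal sketch with a hypothetical function name and made-up quartiles; it ignores the whiskers, which can also inform a comment on skew.

```python
def describe_skew(q1, median, q3):
    """Describe skew from where the median sits inside the box."""
    lower_half = median - q1   # width of the box below the median
    upper_half = q3 - median   # width of the box above the median
    if upper_half > lower_half:
        return "positive skew"
    if upper_half < lower_half:
        return "negative skew"
    return "roughly symmetrical"


print(describe_skew(10, 12, 20))   # median near the lower quartile
print(describe_skew(10, 18, 20))   # median near the upper quartile
print(describe_skew(10, 15, 20))   # median midway
```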
Examples in context
Comparing two distributions
To compare data sets, always compare a measure of location and a measure of spread, in context. For example: "The median mark of class A (62) is higher than that of class B (55), and class A's smaller interquartile range (10 versus 18) shows its marks were more consistent."
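The two-part structure of a good comparison (location first, then spread) can be sketched as a small helper. The function name is hypothetical, the figures are the ones quoted in the example sentence, and ties between the two data sets are not handled.

```python
def compare(label_a, median_a, iqr_a, label_b, median_b, iqr_b):
    """Build a comparison covering location (median) and spread (IQR)."""
    # Location: which data set has the higher median?
    if median_a > median_b:
        higher, lower = label_a, label_b
    else:
        higher, lower = label_b, label_a
    # Spread: a smaller interquartile range means more consistent values.
    tighter = label_a if iqr_a < iqr_b else label_b
    return (
        f"{higher} has a higher median than {lower}; "
        f"{tighter} has the smaller interquartile range, "
        "so its values are more consistent."
    )


sentence = compare("class A", 62, 10, "class B", 55, 18)
print(sentence)
```

In an exam answer the same two statements would still need to be phrased in the context of the question, for example by referring to marks rather than "values".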
Correlation and regression
Correlation measures how closely two variables follow a linear relationship; the product moment correlation coefficient runs from −1 to +1. The regression line of y on x is the best-fit line for predicting y from x. Use it only within the range of the data: predicting outside that range (extrapolation) is unreliable, and correlation alone does not prove that one variable causes the other.
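As a rough illustration, the correlation coefficient and the least-squares line can be computed from the sums of squares and products (written Sxx, Syy and Sxy below). The data, function names and the range check are assumptions made for this sketch; in the exam a calculator normally produces these values.

```python
import math


def pmcc_and_line(xs, ys):
    """Return (r, a, b) where r is the PMCC and y = a + b*x is the regression line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    b = sxy / sxx              # gradient
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return r, a, b


def predict(x, a, b, x_min, x_max):
    """Predict y from x, refusing to extrapolate outside the observed x range."""
    if not x_min <= x <= x_max:
        raise ValueError("x is outside the data range: extrapolation is unreliable")
    return a + b * x


xs = [1, 2, 3, 4, 5]   # illustrative data only
ys = [2, 4, 5, 4, 5]

r, a, b = pmcc_and_line(xs, ys)
print(f"r = {r:.3f}, line: y = {a:.1f} + {b:.1f}x")
print(predict(3, a, b, min(xs), max(xs)))
```

Raising an error for values outside the observed range is a deliberately strict design choice here; it mirrors the advice to use the line only for interpolation.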
Try this
Q1. A class width is and the frequency is . Find the frequency density. [1 mark]
- Cue. .
Q2. Data has mean and standard deviation . Using the two-standard-deviation rule, find the upper outlier boundary. [2 marks]
- Cue. .
Exam-style practice questions
Practice questions written in the style of OCR exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
OCR 2019, 6 marks. A data set of values has and . Find the mean and the standard deviation. An outlier is defined as more than two standard deviations from the mean; determine whether a value of is an outlier.
Mean (M1, A1).
Variance (M1), so standard deviation (A1).
Two standard deviations is , so the outlier boundary above the mean is (M1).
Since , the value is an outlier (A1).
Markers reward the mean, the variance formula, the standard deviation, the boundary, and the comparison.
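The steps in this worked answer follow the standard summary-statistic formulas, written out here as a sketch. The symbols are assumed notation: n for the number of values, Σx for their sum and Σx² for the sum of their squares, with divisor n used for the variance.

```latex
\bar{x} = \frac{\sum x}{n}, \qquad
\sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2, \qquad
\sigma = \sqrt{\sigma^2}, \qquad
\text{upper outlier boundary} = \bar{x} + 2\sigma
```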
OCR 2021, 5 marks. The lengths of leaves are recorded. The cumulative frequency reaches at cm, at cm and at cm. Estimate the median and the interquartile range, and comment on the skewness.
With , the median is at the th value: from the data, the median cm (M1, A1).
The lower quartile is at the th value, cm, and the upper quartile at the th value, cm (M1).
Interquartile range cm (A1).
The median () is exactly midway between the quartiles ( and ), so the distribution is roughly symmetrical (A1).
Markers reward reading the quartiles from the cumulative frequency, the interquartile range, and a justified comment on skewness.
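Reading quartiles from a cumulative frequency graph amounts to finding the value at cumulative frequencies n/4, n/2 and 3n/4. The sketch below does this by linear interpolation between plotted points, which assumes the points are joined by straight lines. The points and the function name are hypothetical and are not the figures from the question above.

```python
def value_at(points, target):
    """Estimate the data value at a given cumulative frequency.

    points: list of (upper class boundary, cumulative frequency), in increasing order.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= target <= c1:
            # Linear interpolation along the segment joining the two points.
            return x0 + (target - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("target is outside the range of the cumulative frequencies")


# Hypothetical cumulative frequency points for n = 80 items.
points = [(0, 0), (10, 20), (20, 60), (30, 80)]
n = points[-1][1]

q1 = value_at(points, n / 4)
median = value_at(points, n / 2)
q3 = value_at(points, 3 * n / 4)

print(q1, median, q3, q3 - q1)
```

With these made-up points the median sits midway between the quartiles, so the same "roughly symmetrical" comment would apply.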
Sources & how we know this
- OCR A Level Mathematics A (H240) specification — OCR (2017)