How do you summarise and display a data set so its centre, spread and shape are clear?
Calculate and interpret measures of location and dispersion, including the mean, median, quartiles, interquartile range, variance and standard deviation, and use stem-and-leaf plots, boxplots and measures of skewness to describe the shape of a distribution.
A focused answer to the SQA Advanced Higher Statistics exploratory data analysis content: the mean, median and quartiles, the interquartile range, variance and standard deviation, stem-and-leaf plots and boxplots, outlier rules, and how to describe the shape and skewness of a distribution.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
Before any modelling, the SQA expects you to explore a data set: to summarise its centre and spread numerically and to read its shape from a plot. This exploratory data analysis (EDA) is the diagnostic step that tells you which model or test is sensible later, so the calculations and the interpretations both carry marks.
Measures of location
Location answers "where is the centre?". The three measures behave differently under extreme values.
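As a minimal sketch of that sensitivity, the snippet below compares the mean and the median of a small sample before and after one extreme value is added. The data are hypothetical and chosen only for illustration; the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical sample, chosen only for illustration.
data = [12, 14, 15, 16, 18]
with_outlier = data + [95]  # the same sample plus one extreme value

mean_before, median_before = mean(data), median(data)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```

Here the mean moves from 15 to about 28.3, while the median only moves from 15 to 15.5, which is what "resistant" means in practice.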
Measures of dispersion
Dispersion answers "how spread out is the data?".
The divisor $n - 1$ (rather than $n$) is used for a sample because it gives an unbiased estimate of the population variance; dividing by $n$ would, on average, underestimate the true spread. The IQR is resistant to outliers, like the median, whereas the standard deviation, like the mean, is sensitive to them.
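The effect of the divisor can be seen on a small example. This is a sketch on a hypothetical sample (the numbers are not from the course materials): it computes the sum of squared deviations once, divides by $n - 1$ and by $n$, and cross-checks both against `statistics.variance` and `statistics.pvariance`.

```python
from statistics import mean, pvariance, variance

# Hypothetical sample, chosen only for illustration.
sample = [4, 7, 8, 10, 11]
n = len(sample)
xbar = mean(sample)

# Sum of squared deviations from the sample mean.
sum_sq_dev = sum((x - xbar) ** 2 for x in sample)

var_n_minus_1 = sum_sq_dev / (n - 1)  # sample variance, divisor n - 1
var_n = sum_sq_dev / n                # divisor n, always the smaller value

# Cross-check against the standard library.
assert abs(var_n_minus_1 - variance(sample)) < 1e-12
assert abs(var_n - pvariance(sample)) < 1e-12

print(var_n_minus_1, var_n)
```

For this sample the sum of squared deviations is 30, so the divisor $n - 1$ gives 7.5 and the divisor $n$ gives 6.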
Stem-and-leaf plots and boxplots
A stem-and-leaf plot keeps the original digits while showing the shape, so it doubles as a sorted list. A boxplot draws the five-number summary: a box from $Q_1$ to $Q_3$ with the median marked, and whiskers to the most extreme non-outlying values.
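Both displays can be sketched in a few lines. The code below rests on stated assumptions rather than on a rule quoted from the course: quartiles are taken as the medians of the lower and upper halves (conventions differ between textbooks and software), outliers are flagged with the common 1.5 × IQR fences, and the data are hypothetical two-digit integers. The function names are illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention.

    Needs at least two values.
    """
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]             # values below the median position
    upper = xs[len(xs) - half:]   # values above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def fences(q1, q3, k=1.5):
    """Outlier fences k * IQR beyond the quartiles (k = 1.5 is an assumption)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot (stem = tens, leaf = units)."""
    xs = sorted(values)
    rows = []
    for stem in range(xs[0] // 10, xs[-1] // 10 + 1):
        leaves = " ".join(str(x % 10) for x in xs if x // 10 == stem)
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

# Hypothetical data, chosen only for illustration.
data = [23, 25, 31, 34, 36, 38, 42, 45, 47, 79]

lo, q1, med, q3, hi = five_number_summary(data)
lower_fence, upper_fence = fences(q1, q3)
outliers = [x for x in data if x < lower_fence or x > upper_fence]
inliers = [x for x in data if lower_fence <= x <= upper_fence]
whiskers = (min(inliers), max(inliers))  # most extreme non-outlying values

print("\n".join(stem_and_leaf(data)))
print(lo, q1, med, q3, hi, outliers, whiskers)
```

With these numbers the quartiles are 31 and 45, the fences fall at 10 and 66, the value 79 is flagged as an outlier, and the whiskers stop at 23 and 47.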
Describing shape and skewness
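The shape of a distribution is described as symmetric, positively skewed or negatively skewed, and you read it from a plot or from the relationship between mean and median: a mean above the median suggests positive skew, and a mean below it suggests negative skew.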
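One way to turn the mean-median comparison into a number is Pearson's median skewness coefficient, 3 × (mean − median) / s. The sketch below uses it as an illustration only: whether this particular coefficient is the one expected in the course is an assumption, the tolerance `tol` for calling a sample "roughly symmetric" is arbitrary, and the data sets are hypothetical.

```python
from statistics import mean, median, stdev

def describe_shape(values, tol=0.1):
    """Classify skew using Pearson's median skewness, 3 * (mean - median) / s.

    `tol` is an arbitrary cut-off for "roughly symmetric" (an assumption).
    Needs at least two values that are not all equal.
    """
    s = stdev(values)
    coefficient = 3 * (mean(values) - median(values)) / s
    if coefficient > tol:
        label = "positively skewed"
    elif coefficient < -tol:
        label = "negatively skewed"
    else:
        label = "roughly symmetric"
    return coefficient, label

# Hypothetical data sets, chosen only for illustration.
right_tailed = [2, 3, 3, 4, 4, 5, 6, 9, 15]
balanced = [1, 2, 3, 4, 5]
left_tailed = [1, 7, 10, 11, 12, 12, 13, 13, 14]

for name, values in [("right", right_tailed), ("balanced", balanced), ("left", left_tailed)]:
    coefficient, label = describe_shape(values)
    print(f"{name}: {coefficient:.2f} ({label})")
```

A coefficient near zero does not prove symmetry, so the plot should still be checked before the shape is stated.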
Try this
Q1. A sample has , and . Find the sample variance. [2 marks]
- Cue. .
Q2. State which measure of spread you would quote for clearly skewed data, and why. [1 mark]
- Cue. The IQR, because it is resistant to the extreme values in the tail, whereas the standard deviation is inflated by them.
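The cue for Q2 can be checked numerically. In this sketch the largest value of a hypothetical sample is replaced by an extreme one, and the IQR and standard deviation are compared before and after. Quartiles come from `statistics.quantiles` with its default method, which may differ slightly from the hand method used in class.

```python
from statistics import quantiles, stdev

def iqr(values):
    """Interquartile range using statistics.quantiles (default method)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

# Hypothetical data, chosen only for illustration.
base = [10, 12, 13, 15, 16, 18, 20]
skewed = base[:-1] + [60]  # largest value replaced by an extreme one

print("IQR:", iqr(base), "->", iqr(skewed))
print("SD: ", round(stdev(base), 2), "->", round(stdev(skewed), 2))
```

The IQR is 6 in both cases because the quartiles do not depend on the size of the largest value, whereas the standard deviation grows several-fold.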
Exam-style practice questions
Practice questions written in the style of SQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AH style: dispersion (4 marks)
The sample is given. Calculate the sample mean and the sample standard deviation.
Mean: (1 mark).
Squared deviations from the mean: (1 mark).
Sample variance uses divisor $n - 1$: (1 mark).
Sample standard deviation: (1 mark). Markers reward the mean, the sum of squared deviations, the divisor and the final root.
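The four marked steps can be followed in code. This is a sketch on a hypothetical sample, since it does not reproduce the sample from the question; it also cross-checks the sum of squared deviations against the equivalent shortcut form, the sum of $x^2$ minus $(\sum x)^2 / n$.

```python
from math import sqrt
from statistics import stdev

# Hypothetical sample, chosen only for illustration.
sample = [3, 5, 6, 8, 13]
n = len(sample)

# Step 1: sample mean.
xbar = sum(sample) / n

# Step 2: sum of squared deviations from the mean.
sum_sq_dev = sum((x - xbar) ** 2 for x in sample)

# Equivalent shortcut form: sum of x^2 minus (sum of x)^2 / n.
shortcut = sum(x * x for x in sample) - sum(sample) ** 2 / n
assert abs(sum_sq_dev - shortcut) < 1e-9

# Step 3: sample variance with divisor n - 1.
s_squared = sum_sq_dev / (n - 1)

# Step 4: sample standard deviation.
s = sqrt(s_squared)
assert abs(s - stdev(sample)) < 1e-9

print(xbar, sum_sq_dev, s_squared, round(s, 2))
```

For this sample the mean is 7, the sum of squared deviations is 58, the variance is 14.5 and the standard deviation is about 3.81.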
AH style: shape (3 marks)
A data set has a mean that exceeds its median and a long upper tail. State and justify the type of skewness, and say which measure of location is more representative.
The distribution is positively (right) skewed, because the mean exceeds the median and there is a long tail of high values pulling the mean upward (1 mark).
The few large values inflate the mean, so the mean is dragged toward the tail while the median stays near the bulk of the data (1 mark).
The median is the more representative measure of location here, because it is resistant to the extreme high values whereas the mean is not (1 mark). Markers reward naming the skew, the mean-median reasoning, and the choice of the median with justification.
Related dot points
- Describe the principles of experimental design, distinguish observational studies from designed experiments, identify sources of bias, and explain control, randomisation, replication and blocking when planning data collection.
A focused answer to the SQA Advanced Higher Statistics experimental design content: the difference between observational studies and designed experiments, control, randomisation, replication and blocking, the types of variable, and the common sources of bias that invalidate conclusions.
- Apply the addition and multiplication laws of probability, calculate conditional probabilities and use tree diagrams, the total probability rule and Bayes' theorem, and test events for independence and mutual exclusivity.
A focused answer to the SQA Advanced Higher Statistics probability content: the addition and multiplication laws, conditional probability, independence and mutual exclusivity, tree diagrams, the total probability rule and Bayes' theorem for reversing a conditional probability.
- Analyse bivariate data using scatter plots, the sums of squares and products, the product-moment correlation coefficient, and the least-squares regression line, and assess the model with residual plots and the limitations of extrapolation.
A focused answer to the SQA Advanced Higher Statistics bivariate data content: scatter plots, the sums of squares Sxx, Syy and Sxy, the product-moment correlation coefficient, the least-squares regression line, prediction, residual plots and the dangers of extrapolation.
- Work with discrete probability distributions, calculate the expectation and variance of a discrete random variable and apply the laws of expectation and variance, and use the binomial, Poisson and geometric distributions as models.
A focused answer to the SQA Advanced Higher Statistics discrete random variables content: probability distributions, the expectation and variance of a discrete random variable, the laws of expectation and variance, and the binomial, Poisson and geometric distributions with their means and variances.
Sources & how we know this
- SQA Advanced Higher Statistics Course Specification (C803 77), SQA (2023)
- SQA Advanced Higher Statistics Data Booklet, SQA (2019)