How do you summarise and display a data set so its centre, spread and shape are clear?

Calculate and interpret measures of location and dispersion, including the mean, median, quartiles, interquartile range, variance and standard deviation, and use stem-and-leaf plots, boxplots and measures of skewness to describe the shape of a distribution.

A focused answer to the SQA Advanced Higher Statistics exploratory data analysis content: the mean, median and quartiles, the interquartile range, variance and standard deviation, stem-and-leaf plots and boxplots, outlier rules, and how to describe the shape and skewness of a distribution.

Generated by Claude Opus 4.8 | 12 min answer | Updated 2026-06-16

Reviewed by: AI editorial process; not yet individually human-reviewed

Quick answer

The standard measures of location are the mean $\bar{x}=\dfrac{\sum x}{n}$, the median (the middle value of the ordered data) and the mode (the most frequent value). The measures of dispersion are the range, the interquartile range $\text{IQR}=Q_3-Q_1$, the variance and the standard deviation. For a sample, the variance uses the $n-1$ divisor: $s^2=\dfrac{\sum (x-\bar{x})^2}{n-1}$ and $s=\sqrt{s^2}$. A boxplot displays the five-number summary (minimum, $Q_1$, median, $Q_3$, maximum) and flags as outliers any values below $Q_1-1.5\times\text{IQR}$ or above $Q_3+1.5\times\text{IQR}$. Skewness describes asymmetry: a mean above the median suggests positive (right) skew; a mean below the median suggests negative (left) skew.

What this dot point is asking

Before any modelling, the SQA expects you to explore a data set: to summarise its centre and spread numerically and to read its shape from a plot. This exploratory data analysis (EDA) is the diagnostic step that tells you which model or test is sensible later, so the calculations and the interpretations both carry marks.

Measures of location

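Location answers "where is the centre?". The three measures (mean, median and mode) behave differently under extreme values: the mean uses every value and is pulled toward an outlier, whereas the median and the mode are largely unaffected by a single extreme value.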
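As a minimal sketch of that behaviour, the Python below compares the three measures on a small data set with and without one extreme value. The data are illustrative and not taken from the SQA materials; only the standard-library `statistics` module is used.

```python
from statistics import mean, median, mode

# Illustrative data (an assumption, not from the source): 40 is an extreme value.
with_extreme = [3, 5, 5, 6, 7, 8, 40]
# The same data with the extreme value replaced by a typical one.
without_extreme = [3, 5, 5, 6, 7, 8, 9]

for label, values in (("with 40", with_extreme), ("with 9", without_extreme)):
    # mean = sum / n, median = middle ordered value, mode = most frequent value
    print(label, round(mean(values), 2), median(values), mode(values))
```

The median and mode are the same in both runs, while the mean moves noticeably toward the extreme value.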

Measures of dispersion

Dispersion answers "how spread out is the data?".

The $n-1$ divisor (rather than $n$) is used for a sample because it gives an unbiased estimate of the population variance; dividing by $n$ would, on average, underestimate the true spread. The IQR is resistant to outliers, like the median, whereas the standard deviation, like the mean, is sensitive to them.
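To make the two divisors concrete, here is a short sketch that computes both versions by hand and checks them against Python's `statistics` module, where `pvariance` divides by $n$ and `variance` divides by $n-1$. The sample is illustrative, not from the source.

```python
from math import isclose
from statistics import pvariance, variance

# Illustrative sample (an assumption, not from the source).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)
x_bar = sum(sample) / n
sum_sq_dev = sum((x - x_bar) ** 2 for x in sample)  # sum of (x - x_bar)^2

var_divisor_n = sum_sq_dev / n              # divisor n
var_divisor_n_minus_1 = sum_sq_dev / (n - 1)  # divisor n - 1: the sample variance s^2

# The standard library agrees with the hand calculation.
assert isclose(var_divisor_n, pvariance(sample))
assert isclose(var_divisor_n_minus_1, variance(sample))

print(var_divisor_n, round(var_divisor_n_minus_1, 3))
```

For any sample with more than one distinct value, the $n-1$ version is the larger of the two, which is the correction for the underestimate described above.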

Stem-and-leaf plots and boxplots

A stem-and-leaf plot keeps the original digits while showing the shape, so it doubles as a sorted list. A boxplot draws the five-number summary: a box from $Q_1$ to $Q_3$ with the median marked, and whiskers to the most extreme non-outlying values.
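Both displays can be sketched in a few lines of Python. The data, the function names `stem_and_leaf` and `boxplot_summary`, and the quartile convention (`method="inclusive"` in `statistics.quantiles`) are all assumptions made for illustration. Quartile conventions differ between textbooks and software, so hand-calculated quartiles may differ slightly from these.

```python
from statistics import quantiles

# Illustrative data (an assumption): two-digit non-negative integers.
data = [12, 15, 17, 21, 23, 24, 26, 28, 31, 58]

def stem_and_leaf(values):
    """Return stem-and-leaf rows with tens as stems and units as leaves."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows.setdefault(stem, []).append(leaf)
    # Include empty stems so gaps in the data stay visible.
    return [
        f"{stem} | {' '.join(map(str, rows.get(stem, [])))}".rstrip()
        for stem in range(min(rows), max(rows) + 1)
    ]

def boxplot_summary(values):
    """Five-number summary, 1.5 x IQR fences, whisker ends and outliers."""
    ordered = sorted(values)
    # Assumed quartile convention; other conventions give slightly different values.
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    outliers = [v for v in ordered if v < lower_fence or v > upper_fence]
    return {
        "min": ordered[0], "Q1": q1, "median": q2, "Q3": q3, "max": ordered[-1],
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),  # most extreme non-outlying values
        "outliers": outliers,
    }

for row in stem_and_leaf(data):
    print(row)
summary = boxplot_summary(data)
print(summary)
```

With these data the value 58 lies above the upper fence, so the upper whisker stops at 31 and 58 is plotted separately as an outlier.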

Describing shape and skewness

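The shape of a distribution is usually described as symmetric, positively skewed or negatively skewed, and you read it from the relationship between the mean and the median or from a plot. In a roughly symmetric distribution the mean and median are close together; a long upper tail pulls the mean above the median (positive skew), and a long lower tail pulls it below (negative skew).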
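The mean-median comparison can be turned into a number. The sketch below uses Pearson's second skewness coefficient, $3(\bar{x}-\text{median})/s$, as one possible measure; the source does not say which skewness measure the course uses, so treat this choice, the function name `describe_shape`, the `tolerance` cut-off and the data as assumptions.

```python
from statistics import mean, median, stdev

def describe_shape(values, tolerance=0.1):
    """Label the shape using the sign of 3 * (mean - median) / s.

    `tolerance` is a hypothetical cut-off for calling the data roughly
    symmetric. The data must contain at least two distinct values so s > 0.
    """
    coefficient = 3 * (mean(values) - median(values)) / stdev(values)
    if coefficient > tolerance:
        label = "positively skewed"
    elif coefficient < -tolerance:
        label = "negatively skewed"
    else:
        label = "roughly symmetric"
    return label, coefficient

# Illustrative data sets (assumptions, not from the source).
upper_tail = [1, 2, 2, 3, 3, 4, 15]
balanced = [1, 2, 3, 4, 5]
lower_tail = [1, 12, 13, 13, 14, 14, 15]

for values in (upper_tail, balanced, lower_tail):
    label, coefficient = describe_shape(values)
    print(label, round(coefficient, 2))
```

A coefficient like this supports, but does not replace, looking at the stem-and-leaf plot or boxplot.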

Try this

Q1. A sample has $\sum x = 90$, $\sum x^2 = 1044$ and $n = 9$. Find the sample variance. [2 marks]

Cue. $s^2=\dfrac{1044-\frac{90^2}{9}}{9-1}=\dfrac{1044-900}{8}=\dfrac{144}{8}=18$.
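The cue uses the summary-statistics form $s^2=\dfrac{\sum x^2-\frac{(\sum x)^2}{n}}{n-1}$, which is algebraically the same as the definition with squared deviations. A minimal Python check of the arithmetic, with a hypothetical helper name:

```python
def sample_variance_from_sums(sum_x, sum_x_squared, n):
    """Sample variance from summary totals, using the n - 1 divisor."""
    return (sum_x_squared - sum_x ** 2 / n) / (n - 1)

s_squared = sample_variance_from_sums(sum_x=90, sum_x_squared=1044, n=9)
print(s_squared)  # 18.0
```

This form is convenient when a question gives only the totals rather than the raw data.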

Q2. State which measure of spread you would quote for clearly skewed data, and why. [1 mark]

Cue. The IQR, because it is resistant to the extreme values in the tail, whereas the standard deviation is inflated by them.
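A quick numerical illustration of that resistance, using made-up data and the `method="inclusive"` quartile convention from `statistics.quantiles` (an assumption; other conventions change the quartiles slightly but not the conclusion):

```python
from statistics import quantiles, stdev

def iqr(values):
    """Interquartile range Q3 - Q1 under the assumed quartile convention."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Illustrative data (an assumption): the second list adds one value far in the upper tail.
base = [10, 12, 13, 14, 15, 16, 18]
with_tail = base + [60]

print("IQR:", iqr(base), "->", iqr(with_tail))
print("s:  ", round(stdev(base), 2), "->", round(stdev(with_tail), 2))
```

Adding the single extreme value changes the IQR only slightly, while the standard deviation becomes several times larger.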

Exam-style practice questions

Practice questions written in the style of SQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AH style: dispersion (4 marks)

The sample 3, 7, 7, 9, 14 is given. Calculate the sample mean and the sample standard deviation.

Worked answer

Mean: $\bar{x} = \dfrac{3 + 7 + 7 + 9 + 14}{5} = \dfrac{40}{5} = 8$ (1 mark).

Squared deviations from the mean: $(3-8)^2 + (7-8)^2 + (7-8)^2 + (9-8)^2 + (14-8)^2 = 25 + 1 + 1 + 1 + 36 = 64$ (1 mark).

Sample variance uses divisor $n - 1 = 4$: $s^2 = \dfrac{64}{4} = 16$ (1 mark).

Sample standard deviation: $s = \sqrt{16} = 4$ (1 mark). Markers reward the mean, the sum of squared deviations, the $n-1$ divisor and the final root.

AH style: shape (3 marks)

A data set has mean 52, median 47 and a long upper tail. State and justify the type of skewness, and say which measure of location is more representative.

Worked answer

The distribution is positively (right) skewed, because the mean exceeds the median and there is a long tail of high values pulling the mean upward (1 mark).

The few large values inflate the mean, so the mean is dragged toward the tail while the median stays near the bulk of the data (1 mark).

The median is the more representative measure of location here, because it is resistant to the extreme high values whereas the mean is not (1 mark). Markers reward naming the skew, the mean-median reasoning, and the choice of the median with justification.

Sources & how we know this

SQA Advanced Higher Statistics Course Specification (C803 77) — SQA (2023)
SQA Advanced Higher Statistics Data Booklet — SQA (2019)