How do you summarise and display a data set so its centre, spread and shape are clear?

Calculate and interpret measures of location and dispersion, including the mean, median, quartiles, interquartile range, variance and standard deviation, and use stem-and-leaf plots, boxplots and measures of skewness to describe the shape of a distribution.

A focused answer to the SQA Advanced Higher Statistics exploratory data analysis content: the mean, median and quartiles, the interquartile range, variance and standard deviation, stem-and-leaf plots and boxplots, outlier rules, and how to describe the shape and skewness of a distribution.

Generated by Claude Opus 4.812 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this dot point is asking
  2. Measures of location
  3. Measures of dispersion
  4. Stem-and-leaf plots and boxplots
  5. Describing shape and skewness
  6. Try this

What this dot point is asking

Before any modelling, the SQA expects you to explore a data set: to summarise its centre and spread numerically and to read its shape from a plot. This exploratory data analysis (EDA) is the diagnostic step that tells you which model or test is sensible later, so the calculations and the interpretations both carry marks.

Measures of location

Location answers "where is the centre?". The three measures behave differently under extreme values.
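As a small illustration of that sensitivity, the sketch below compares the mean and the median before and after one extreme value is added. The data values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from statistics import mean, median

# Hypothetical data: five typical values, then the same set plus one extreme value.
typical = [21, 23, 24, 26, 26]
with_extreme = typical + [90]

mean_before, median_before = mean(typical), median(typical)
mean_after, median_after = mean(with_extreme), median(with_extreme)

# The mean jumps from 24 to 35, while the median only moves from 24 to 25.
print("before:", mean_before, median_before)
print("after: ", mean_after, median_after)
```

The single value of 90 shifts the mean by 11 units but the median by only 1, which is why the median is described as resistant.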

Measures of dispersion

Dispersion answers "how spread out is the data?".

The $n-1$ divisor (rather than $n$) is used for a sample because it gives an unbiased estimate of the population variance; dividing by $n$ would, on average, underestimate the true spread. The IQR is resistant to outliers, like the median, whereas the standard deviation, like the mean, is sensitive to them.
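The unbiasedness claim can be checked exactly on a toy case. The sketch below takes a hypothetical population of three values, lists every ordered sample of size 2 drawn with replacement, and averages the variance computed with each divisor. The helper name `variance_with_divisor` is made up for this illustration.

```python
from fractions import Fraction
from itertools import product

def variance_with_divisor(values, offset):
    """Sum of squared deviations divided by (n - offset); offset=1 gives the sample variance."""
    n = len(values)
    centre = Fraction(sum(values), n)
    sum_sq_dev = sum((x - centre) ** 2 for x in values)
    return sum_sq_dev / (n - offset)

population = [1, 2, 3]                             # hypothetical tiny population
sigma2 = variance_with_divisor(population, 0)      # true population variance (divisor N)

samples = list(product(population, repeat=2))      # all 9 ordered samples of size 2
avg_n_minus_1 = sum(variance_with_divisor(s, 1) for s in samples) / len(samples)
avg_n = sum(variance_with_divisor(s, 0) for s in samples) / len(samples)

print(sigma2, avg_n_minus_1, avg_n)  # 2/3 2/3 1/3
```

Averaged over all samples, the $n-1$ version recovers the population variance of 2/3, while the $n$ version gives 1/3 and so underestimates it.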

Stem-and-leaf plots and boxplots

A stem-and-leaf plot keeps the original digits while showing the shape, so it doubles as a sorted list. A boxplot draws the five-number summary: a box from $Q_1$ to $Q_3$ with the median marked, and whiskers to the most extreme non-outlying values.
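Both displays can be sketched in a few lines. Two assumptions are made here and are not taken from the text above: outliers are flagged with the common 1.5 × IQR fence rule, and quartiles use the default convention of Python's `statistics.quantiles`, which may differ slightly from the convention used in your course notes. The data and the function names are hypothetical.

```python
from statistics import quantiles

def stem_and_leaf(values):
    """Text rows 'stem | leaves' for non-negative integers (stem = tens, leaf = units)."""
    rows = {}
    for v in sorted(values):
        rows.setdefault(v // 10, []).append(v % 10)
    lines = []
    for stem in range(min(rows), max(rows) + 1):   # keep empty stems so gaps stay visible
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

def boxplot_summary(values, k=1.5):
    """Quartiles, whisker ends and outliers, assuming the k * IQR fence rule."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4)              # default method="exclusive"
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {"q1": q1, "median": q2, "q3": q3,
            "whiskers": (inside[0], inside[-1]), "outliers": outliers}

data = [12, 15, 21, 23, 23, 28, 31, 34, 36, 47, 72]   # hypothetical data
plot_lines = stem_and_leaf(data)
summary = boxplot_summary(data)

print("\n".join(plot_lines))
print(summary)
```

For this data the box runs from 21 to 36 with the median at 28, the whiskers stop at 12 and 47, and 72 is plotted separately as an outlier because it lies above the upper fence of 58.5.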

Describing shape and skewness

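The shape of a distribution is described as symmetric, positively skewed or negatively skewed. You read it from a plot or from the relationship between the mean and the median: a mean above the median suggests positive (right) skew, a mean below the median suggests negative (left) skew, and roughly equal values suggest symmetry.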
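The mean-median comparison can be written as a simple rule of thumb: a mean above the median points to positive skew, and a mean below it points to negative skew. The function name, the `tolerance` parameter and the data sets below are hypothetical. The comparison is only a quick indicator, so it should be confirmed against a plot.

```python
from statistics import mean, median

def describe_skew(values, tolerance=0.0):
    """Classify shape from the sign of (mean - median); a heuristic, not a formal test."""
    gap = mean(values) - median(values)
    if gap > tolerance:
        return "positively skewed"
    if gap < -tolerance:
        return "negatively skewed"
    return "roughly symmetric"

print(describe_skew([1, 2, 3, 4, 20]))     # mean 6 > median 3
print(describe_skew([1, 17, 18, 19, 20]))  # mean 15 < median 18
print(describe_skew([1, 2, 3, 4, 5]))      # mean 3 = median 3
```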

Try this

Q1. A sample has $\sum x = 90$, $\sum x^2 = 1044$ and $n = 9$. Find the sample variance. [2 marks]

  • Cue. $s^2=\dfrac{1044-\frac{90^2}{9}}{9-1}=\dfrac{1044-900}{8}=\dfrac{144}{8}=18$.

Q2. State which measure of spread you would quote for clearly skewed data, and why. [1 mark]

  • Cue. The IQR, because it is resistant to the extreme values in the tail, whereas the standard deviation is inflated by them.
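The Q1 cue uses the summary-total form of the sample variance, where the corrected sum of squares is $\sum x^2 - (\sum x)^2/n$ and the divisor is $n-1$. A minimal sketch of that calculation follows; the function name is hypothetical.

```python
def sample_variance_from_sums(sum_x, sum_x2, n):
    """Sample variance from the totals sum(x) and sum(x^2), using the n - 1 divisor."""
    if n < 2:
        raise ValueError("need at least two observations")
    corrected_ss = sum_x2 - sum_x ** 2 / n   # sum of squared deviations from the mean
    return corrected_ss / (n - 1)

s2 = sample_variance_from_sums(90, 1044, 9)
print(s2)  # 18.0
```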

Exam-style practice questions

Practice questions written in the style of SQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AH style: dispersion [4 marks]

The sample $3, 7, 7, 9, 14$ is given. Calculate the sample mean and the sample standard deviation.

Worked answer:

Mean: $\bar{x} = \dfrac{3 + 7 + 7 + 9 + 14}{5} = \dfrac{40}{5} = 8$ (1 mark).

Squared deviations from the mean: $(3-8)^2 + (7-8)^2 + (7-8)^2 + (9-8)^2 + (14-8)^2 = 25 + 1 + 1 + 1 + 36 = 64$ (1 mark).

Sample variance uses divisor $n - 1 = 4$: $s^2 = \dfrac{64}{4} = 16$ (1 mark).

Sample standard deviation: $s = \sqrt{16} = 4$ (1 mark). Markers reward the mean, the sum of squared deviations, the $n-1$ divisor and the final root.
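The four marked steps can be reproduced one at a time as a check on the arithmetic. This sketch mirrors the worked answer and cross-checks it against Python's `statistics` module, whose `stdev` also uses the $n-1$ divisor.

```python
from math import sqrt
from statistics import mean, stdev

sample = [3, 7, 7, 9, 14]

x_bar = sum(sample) / len(sample)             # step 1: mean
ss = sum((x - x_bar) ** 2 for x in sample)    # step 2: sum of squared deviations
s2 = ss / (len(sample) - 1)                   # step 3: divide by n - 1
s = sqrt(s2)                                  # step 4: square root

# Cross-check the hand calculation against the standard library.
assert x_bar == mean(sample)
assert abs(s - stdev(sample)) < 1e-12

print(x_bar, ss, s2, s)  # 8.0 64.0 16.0 4.0
```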

AH style: shape [3 marks]

A data set has mean $52$, median $47$ and a long upper tail. State and justify the type of skewness, and say which measure of location is more representative.

Worked answer:

The distribution is positively (right) skewed, because the mean exceeds the median and there is a long tail of high values pulling the mean upward (1 mark).

The few large values inflate the mean, so the mean is dragged toward the tail while the median stays near the bulk of the data (1 mark).

The median is the more representative measure of location here, because it is resistant to the extreme high values whereas the mean is not (1 mark). Markers reward naming the skew, the mean-median reasoning, and the choice of the median with justification.
