How are statistical techniques used to analyse geographical data and test relationships?

The statistical techniques used in geography (measures of central tendency and dispersion, percentage change, Spearman's rank correlation and significance testing) and how to calculate, apply and critically interpret them in geographical contexts.

An OCR A-Level Geography answer to the statistical skills embedded across all components, covering measures of central tendency (mean, median, mode) and dispersion (range, interquartile range, standard deviation), percentage change, Spearman's rank correlation and significance testing, with worked calculations in KaTeX and guidance on critical interpretation for AO3.

Generated by Claude Opus 4.8, 12 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this dot point is asking
  2. The answer
  3. Examples in context
  4. Try this

What this dot point is asking

OCR embeds statistical skills across all components and the Independent Investigation. You need to calculate, apply and critically interpret measures of central tendency and dispersion, percentage change, Spearman's rank correlation and significance testing, in geographical contexts. These are tested as AO3 in Papers 01 and 02 and are central to analysing fieldwork data.

The answer

Central tendency and dispersion

The choice of measure matters geographically. The mean uses all values but is distorted by outliers (one huge value pulls it up), so for skewed data such as incomes or settlement sizes the median is often more representative. The interquartile range, $IQR = Q_3 - Q_1$, measures the spread of the middle $50\%$ and, like the median, resists outliers. The standard deviation, $\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$, measures the typical distance of values from the mean: a small $\sigma$ means values cluster tightly (a well-sorted beach, a uniform region), a large $\sigma$ means they are widely spread. Reporting both a central value and a measure of spread gives a fuller picture than either alone.
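The effect of an outlier on these measures can be sketched in Python. The pebble sizes below are hypothetical, invented for illustration, and the use of the standard library's `statistics` module (with its "exclusive" quartile method, which places the quartiles at ranks $\frac{n+1}{4}$ and $\frac{3(n+1)}{4}$) is an assumption rather than a method prescribed by OCR.

```python
import statistics

# Hypothetical pebble long-axis lengths in mm (illustrative, not survey data).
# The final value is a deliberate outlier.
pebbles_mm = [32, 35, 36, 38, 40, 41, 43, 45, 47, 143]

mean = statistics.mean(pebbles_mm)      # uses every value, so the outlier pulls it up
median = statistics.median(pebbles_mm)  # middle of the ranked values, resists the outlier

# pstdev divides by n, matching the standard deviation formula in the text.
sigma = statistics.pstdev(pebbles_mm)

# Quartiles by the "exclusive" method: ranks (n+1)/4 and 3(n+1)/4, interpolated.
q1, _, q3 = statistics.quantiles(pebbles_mm, n=4, method="exclusive")
iqr = q3 - q1

print(f"mean = {mean}, median = {median}")
print(f"IQR = {iqr}, standard deviation = {sigma:.1f}")
```

With this data the mean is larger than nine of the ten pebbles, while the median sits in the middle of the main cluster; the standard deviation is inflated by the single outlier in the same way, whereas the IQR is not.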

Percentage change

Percentage change standardises change so that places or times of different sizes can be compared. It is calculated as

$$\text{percentage change} = \frac{\text{new value} - \text{original value}}{\text{original value}} \times 100$$

A positive result is an increase, a negative result a decrease. The strength of the measure is comparability: a rise of $50\,000$ people means very different things in a town of $100\,000$ ($+50\%$) and a city of $5$ million ($+1\%$). The caution is that percentage change can mislead from a small base (a doubling from $2$ to $4$ is $+100\%$ but trivial in absolute terms), so it should be read alongside absolute figures.
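The three cases in the paragraph above can be checked with a short Python sketch. The function name `percentage_change` is a hypothetical helper chosen for illustration.

```python
def percentage_change(original, new):
    """Return the percentage change from the original value to the new value."""
    if original == 0:
        # Dividing by zero: percentage change has no meaning from a zero base.
        raise ValueError("percentage change is undefined when the original value is 0")
    return (new - original) / original * 100


# The same absolute rise of 50 000 people in a town and in a city.
town = percentage_change(100_000, 150_000)
city = percentage_change(5_000_000, 5_050_000)

# A doubling from a very small base.
small_base = percentage_change(2, 4)

print(f"town: {town:+.0f}%, city: {city:+.0f}%, small base: {small_base:+.0f}%")
```

Printing the absolute change next to each percentage is a simple way to guard against the small-base problem when presenting fieldwork results.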

Spearman's rank correlation

Spearman's rank correlation coefficient ($r_s$) tests the strength and direction of the relationship between two variables by ranking them. It is calculated as

$$r_s = 1 - \frac{6 \sum d^2}{n^3 - n}$$

where $d$ is the difference between the ranks of each paired observation and $n$ is the number of pairs. The result lies between $+1$ (perfect positive correlation, both rise together) and $-1$ (perfect negative correlation, one rises as the other falls), with $0$ indicating no correlation. It is widely used in geography (for example testing whether pebble size decreases downstream, or whether deprivation correlates with distance from a city centre) because it works on ranked data and does not assume a normal distribution.
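The calculation can be sketched in Python as follows. The site data are hypothetical, the helper names `ranks` and `spearman_rs` are invented for illustration, and the sketch assumes there are no tied values (ties need averaged ranks, which this simple version rejects rather than handles).

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need averaged ranks, which this sketch does not handle")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman's rank coefficient: 1 - 6 * sum(d^2) / (n^3 - n)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples with at least two pairs")
    n = len(x)
    # d is the difference between the two ranks of each paired observation.
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - (6 * sum_d_squared) / (n ** 3 - n)


# Hypothetical river sites: distance downstream (km) and mean pebble size (mm).
distance_km = [1, 2, 4, 6, 8, 10, 12, 14]
pebble_mm = [120, 95, 100, 70, 60, 65, 40, 30]

rs = spearman_rs(distance_km, pebble_mm)
print(f"r_s = {rs:.2f}")
```

For this invented data $\sum d^2 = 164$ and $n = 8$, so $r_s = 1 - \frac{984}{504} \approx -0.95$, a strong negative correlation: pebble size falls as distance downstream rises.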

Significance testing

A calculated $r_s$ must be checked for statistical significance to judge whether the relationship is real or could be due to chance. The calculated value is compared with a critical value from a significance table for the relevant sample size ($n$) and chosen significance level, commonly $0.05$ (the $95\%$ confidence level, meaning there is at most a $5\%$ probability of a coefficient this strong arising by chance when no relationship exists). If the magnitude of the calculated $r_s$ (ignoring its sign) equals or exceeds the critical value, the result is significant and the null hypothesis (that there is no relationship) is rejected; if it does not, the relationship cannot be distinguished from chance. Significance depends strongly on sample size: the same coefficient can be significant in a large sample but not in a small one, which is why fieldwork needs an adequate, justified sample.
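The comparison step, and the effect of sample size, can be sketched in Python. The critical values below are assumptions: they are approximate figures as commonly tabulated for a two-tailed test at the $0.05$ level, and published tables differ slightly in the third decimal place, so the table supplied with the exam or investigation should always be used instead. The helper name `is_significant` is hypothetical.

```python
def is_significant(rs, critical_value):
    """True if the magnitude of r_s equals or exceeds the critical value."""
    return abs(rs) >= critical_value


# Assumed, approximate two-tailed critical values at the 0.05 level.
# Check these against the significance table actually provided.
CRITICAL_N10 = 0.648  # assumed value for n = 10 pairs
CRITICAL_N20 = 0.447  # assumed value for n = 20 pairs

rs = 0.60
small_sample = is_significant(rs, CRITICAL_N10)  # same coefficient, 10 pairs
large_sample = is_significant(rs, CRITICAL_N20)  # same coefficient, 20 pairs

print(f"n = 10: significant = {small_sample}")
print(f"n = 20: significant = {large_sample}")
```

Under these assumed values a coefficient of $0.60$ fails the test with ten pairs but passes it with twenty, which is the sample-size effect described above. The `abs` call reflects that a negative coefficient is judged on its magnitude.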

Examples in context

Example 1. Testing the Bradshaw model with Spearman's rank. A river study might rank measured values of, say, velocity against distance downstream at several sites and compute $r_s$ to test the predicted positive relationship. A strong positive coefficient, checked against the critical value for the sample size, supports the model, while a non-significant result prompts reflection on sample size or confounding factors. This shows statistical testing applied to a physical-geography hypothesis and is a classic Independent Investigation technique.

Example 2. Comparing deprivation with standard deviation. Comparing two cities' neighbourhood deprivation scores, the mean might be similar, but a larger standard deviation in one city reveals greater inequality (deprivation more polarised between rich and poor areas), a difference the mean alone hides. This demonstrates why dispersion matters as much as central tendency in human geography, and links to the Changing Spaces; Making Places analysis of uneven places.
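Example 2 can be illustrated with a small Python sketch. The deprivation scores for the two cities are hypothetical, invented so that the means match, and the use of the population standard deviation (`statistics.pstdev`, dividing by $n$) is an assumption chosen to match the formula given earlier.

```python
import statistics

# Hypothetical neighbourhood deprivation scores (higher = more deprived).
city_a = [18, 20, 22, 19, 21, 20]   # neighbourhoods broadly alike
city_b = [5, 35, 8, 32, 10, 30]     # polarised between affluent and deprived areas

mean_a = statistics.mean(city_a)
mean_b = statistics.mean(city_b)

# Population standard deviation (divides by n).
sd_a = statistics.pstdev(city_a)
sd_b = statistics.pstdev(city_b)

print(f"City A: mean = {mean_a}, standard deviation = {sd_a:.1f}")
print(f"City B: mean = {mean_b}, standard deviation = {sd_b:.1f}")
```

Both invented cities share the same mean score, so only the much larger standard deviation for City B exposes its inequality.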

Try this

Q1. State the formula for percentage change. [2 marks]

  • Cue. $\frac{\text{new value} - \text{original value}}{\text{original value}} \times 100$.

Q2. A Spearman's rank coefficient is $-0.85$ and its magnitude exceeds the critical value at the $0.05$ level. Interpret this result. [3 marks]

  • Cue. A strong negative correlation (as one variable rises, the other falls) that is statistically significant at the $95\%$ confidence level, so the null hypothesis of no relationship is rejected; note that correlation does not prove causation.

Exam-style practice questions

Practice questions written in the style of OCR exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

OCR H481/01 (style), 4 marks
A data set of ten beach pebble sizes (mm) is provided. Calculate the median and the interquartile range, showing your working.

A low-tariff AO3 calculation. Reward the correct method. To find the median, rank the values and take the middle: for $n = 10$ the median is the mean of the 5th and 6th ranked values. The interquartile range (IQR) is the upper quartile minus the lower quartile, $IQR = Q_3 - Q_1$, where $Q_1$ is the value at rank $\frac{n+1}{4}$ and $Q_3$ at rank $\frac{3(n+1)}{4}$ (interpolating between ranks as needed).
The strongest answers show clear working and then interpret: the median resists distortion by extreme pebbles (unlike the mean), and the IQR measures the spread of the middle $50\%$, so a small IQR indicates a well-sorted beach. Reward correct ranking, the right quartile positions, and a sentence of geographical interpretation rather than a bare number.
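The rank-and-interpolate method in this worked answer can be sketched in Python. The ten pebble sizes are hypothetical (the question's data set is not reproduced here), and `value_at_rank` is a helper name invented for illustration; it assumes the requested rank lies between 1 and $n$.

```python
def value_at_rank(sorted_values, rank):
    """Value at a 1-based rank, interpolating linearly between neighbouring ranks."""
    lower = int(rank)        # whole-number part of the rank
    fraction = rank - lower  # how far to move towards the next ranked value
    if fraction == 0:
        return sorted_values[lower - 1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


# Hypothetical pebble sizes in mm, ranked from smallest to largest.
pebbles_mm = sorted([41, 23, 35, 52, 29, 47, 38, 31, 44, 26])
n = len(pebbles_mm)

median = value_at_rank(pebbles_mm, (n + 1) / 2)  # rank 5.5: mean of 5th and 6th values
q1 = value_at_rank(pebbles_mm, (n + 1) / 4)      # rank 2.75
q3 = value_at_rank(pebbles_mm, 3 * (n + 1) / 4)  # rank 8.25
iqr = q3 - q1

print(f"median = {median} mm, Q1 = {q1} mm, Q3 = {q3} mm, IQR = {iqr} mm")
```

For this invented data the ranked values are 23, 26, 29, 31, 35, 38, 41, 44, 47, 52, giving a median of 36.5 mm, $Q_1 = 28.25$ mm, $Q_3 = 44.75$ mm and an IQR of 16.5 mm.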

OCR H481/02 (style), 6 marks
A Spearman's rank correlation coefficient of $+0.78$ is calculated between two variables. Explain what this shows and how its significance would be tested.

A medium-tariff AO3 question on correlation and significance. Reward candidates who interpret the coefficient: Spearman's rank $r_s$ ranges from $+1$ (perfect positive) through $0$ (none) to $-1$ (perfect negative), so $+0.78$ indicates a strong positive relationship (as one variable rises, so does the other). For significance, the calculated $r_s$ is compared with a critical value from a table at a chosen significance level (commonly $0.05$, the $95\%$ confidence level) for the sample size ($n$); if $r_s$ exceeds the critical value, the relationship is statistically significant and unlikely to be due to chance, so the null hypothesis (no relationship) is rejected.
The strongest answers stress that correlation does not prove causation, that a third factor may drive both variables, and that significance depends on sample size. Reward correct interpretation, the comparison-with-critical-value method, and the causation caveat.
