
How do you identify types of data, design a sample, and recognise bias in data collection?

Identify types of data (qualitative and quantitative, discrete and continuous); understand populations and samples; use random and stratified sampling; and recognise sources of bias.

A focused answer to the OCR GCSE Mathematics statistics content on sampling and data, covering types of data, populations and samples, random and stratified sampling, and recognising bias in data collection.

Generated by Claude Opus 4.8 · 10 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed


Jump to a section
  1. What this dot point is asking
  2. Types of data
  3. Populations and samples
  4. Random and stratified sampling
  5. Recognising bias

What this dot point is asking

OCR reference S1 covers types of data, populations and samples, sampling methods (including random and stratified) and recognising bias. Good data collection underpins every statistical conclusion, so this content tests the reasoning behind a study as much as any calculation. It appears on every tier, with stratified sampling and bias being reliable Higher and AO2 question types, where a clear, justified explanation earns the marks.

Types of data

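Data comes in distinct types that determine how it is handled. Qualitative data describes a category or quality, such as eye colour; quantitative data is numerical. Quantitative data is either discrete, taking only certain separate values (usually counts), or continuous, able to take any value within a range (usually measurements).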

So shoe size is discrete (it jumps in steps), while foot length is continuous (it can be any value). Knowing the type matters because continuous data is grouped into class intervals and shown on a histogram, whereas discrete data may be shown on a bar chart. The distinction guides the choice of chart and average.

Populations and samples

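The population is the whole group you want to find out about; a sample is a smaller part of it that is actually surveyed. A sample stands in for a population that is too large to survey fully.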

So to study the heights of all Year 11 students in a country (the population), you might measure a sample of a few hundred. The sample must reflect the population's variety, or any conclusion drawn from it will be misleading. Larger samples generally give more reliable results.
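As a rough illustration of drawing a sample from a population, here is a minimal Python sketch. The list `student_ids`, the population size of 2000, the sample size of 200 and the helper name `simple_random_sample` are all assumptions made up for the example; they do not come from the OCR specification.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical sampling frame: ID numbers for 2000 students.
student_ids = list(range(1, 2001))

# Draw a sample of 200 without replacement.
sample = simple_random_sample(student_ids, 200, seed=1)
print(len(sample))  # 200
```

Because the draw is made without replacement from the full list, no student can appear twice and no student is excluded in advance.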

Random and stratified sampling

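The sampling method affects how representative the sample is. In a simple random sample, every member of the population has an equal chance of being chosen. In a stratified sample, the population is split into groups (strata), and the same fraction is sampled at random from each group.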

So for a stratified sample of 50 from a population of 500, the sampling fraction is $\tfrac{1}{10}$, and a stratum of 80 people contributes 8 to the sample. Stratified sampling ensures small groups are not missed and large groups are not over-represented, which a simple random sample might do by chance.
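The allocation arithmetic can be sketched in Python as follows. This is only an illustration: the function name `stratified_allocation` and the strata labels "A", "B" and "C" are hypothetical, and the sizes of strata "B" and "C" are invented so that the population totals 500 with one stratum of 80.

```python
from fractions import Fraction

def stratified_allocation(strata_sizes, sample_size):
    """Return how many to sample from each stratum, in proportion to its size."""
    population = sum(strata_sizes.values())
    fraction = Fraction(sample_size, population)  # the sampling fraction
    # Exact fractions avoid floating-point error; round() only changes a
    # share when it is not already a whole number.
    return {name: round(size * fraction) for name, size in strata_sizes.items()}

# Population of 500, sample of 50, so the sampling fraction is 1/10.
print(stratified_allocation({"A": 80, "B": 120, "C": 300}, 50))
# {'A': 8, 'B': 12, 'C': 30}
```

When the shares are not whole numbers, the rounded values may not add up exactly to the sample size, so in practice one stratum may need adjusting by one.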

Recognising bias

A biased sample gives misleading conclusions.

A sample is biased if some members of the population are more likely to be chosen than others, so it is not representative. Surveying only gym-goers about exercise over-represents active people; asking only at one time or place can miss whole groups. To reduce bias, sample randomly from the whole target population and make the sample large enough. Explaining precisely why a sample is unrepresentative, not just calling it "unfair", is the AO2 skill OCR rewards.
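To see how a biased sample can distort a result, here is a small Python sketch using made-up numbers. The ten records and their exercise hours are invented purely for illustration and are not real survey data.

```python
from statistics import mean

# Made-up weekly exercise hours, purely for illustration.
# Each record is (goes_to_gym, hours_per_week).
population = [
    (True, 6), (True, 5), (True, 7),
    (False, 1), (False, 2), (False, 0), (False, 3),
    (False, 1), (False, 2), (False, 3),
]

# Mean for everyone in the population.
population_mean = mean(hours for _, hours in population)

# Mean for a sample that only includes gym-goers.
gym_only_mean = mean(hours for gym, hours in population if gym)

print(population_mean, gym_only_mean)  # 3 6
```

In this invented data the gym-only sample doubles the apparent average, because the less active majority had no chance of being chosen.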

Exam-style practice questions

Practice questions written in the style of OCR exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

OCR 2019 · 3 marks
A school has 1200 students. A head teacher wants to survey a stratified sample of 60 students by year group. Year 11 has 300 students. How many Year 11 students should be in the sample? (Higher, Paper 4, calculator.)

A stratified sample takes the same fraction from each group.

The sampling fraction is $\dfrac{60}{1200} = \dfrac{1}{20}$.

Apply it to Year 11: $\dfrac{1}{20} \times 300 = 15$ students.

Markers award a mark for the sampling fraction, a mark for applying it to Year 11, and a mark for 15. Taking 60 divided by the number of year groups, instead of using the proportion in each group, is the standard error.

OCR 2021 · 3 marks
A researcher surveys people leaving a gym about how much exercise they do per week. Give one reason why this sample is likely to be biased, and suggest a better method. (Foundation, Paper 1, calculator.)

People leaving a gym are more likely to exercise a lot, so the sample is not representative of the whole population: it over-represents active people.

A better method would be to take a random sample of the whole target population, for example randomly selecting from a full list of residents rather than only gym-goers.

Markers give a mark for identifying the bias (gym-goers exercise more), a mark for explaining why it is unrepresentative, and a mark for a sensible improvement. A vague answer such as "it is unfair" without explaining the over-representation does not score fully.
