How do you design a study and collect data so that the conclusions you draw are valid and free from bias?
Describe the principles of experimental design, distinguish observational studies from designed experiments, identify sources of bias, and explain control, randomisation, replication and blocking when planning data collection.
A focused answer to the SQA Advanced Higher Statistics experimental design content: the difference between observational studies and designed experiments, control, randomisation, replication and blocking, the types of variable, and the common sources of bias that invalidate conclusions.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
Advanced Higher Statistics opens with experimental design because every later technique is only as trustworthy as the data feeding it. The SQA wants you to plan how data is collected, to tell a designed experiment from an observational study, and to recognise the sources of bias that can quietly wreck a conclusion. This is examined in words rather than calculation, so precise vocabulary earns the marks.
Observational studies versus designed experiments
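The single most important distinction is who controls the explanatory variable. In a designed experiment the researcher imposes the treatments and allocates units to them. In an observational study the researcher only records the variables as they naturally occur, and no treatment is imposed.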
Only a designed experiment can support a causal claim, because randomly allocating units to treatments balances out every other factor on average. An observational study can reveal a strong association, but a hidden confounding variable (one linked to both the explanatory and the response variable) may be the real driver, so association never proves causation.
Types of variable
You must classify variables before choosing a model.
The four principles of good design
A well-designed experiment is built from four ideas.
- Control. Hold every nuisance factor constant across treatments so that only the treatment differs. This removes systematic differences that have nothing to do with the question.
- Randomisation. Allocate units to treatments by chance. This spreads the effect of uncontrolled variation evenly, so no treatment is systematically favoured, and it is what justifies the later use of probability models.
- Replication. Apply each treatment to several units. With one unit per treatment you cannot tell a real effect from natural variation; replication lets you estimate that variation.
- Blocking. When units fall into known groups (a moisture gradient, age bands, different machines), form blocks of similar units and randomise treatments within each block. The block-to-block variation is then removed from the comparison, sharpening it.
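Randomisation, replication and blocking can be turned into an allocation plan mechanically. Control cannot, because it means holding conditions fixed in the field or lab. The Python sketch below is one way to produce such a plan. The unit labels, block names and treatment names are hypothetical. The first function is a completely randomised design with equal replication. The second randomises separately inside each block.

```python
import random
from collections import Counter

def completely_randomised(units, treatments, seed=None):
    """Randomly allocate treatments to units, each treatment used equally often."""
    if len(units) % len(treatments):
        raise ValueError("number of units must be a multiple of the number of treatments")
    labels = list(treatments) * (len(units) // len(treatments))  # replication
    random.Random(seed).shuffle(labels)                           # randomisation
    return dict(zip(units, labels))

def randomised_blocks(blocks, treatments, seed=None):
    """Randomise treatments separately inside each block of similar units."""
    rng = random.Random(seed)
    allocation = {}
    for name, units in blocks.items():
        if len(units) % len(treatments):
            raise ValueError(f"block {name!r} cannot be split evenly between treatments")
        labels = list(treatments) * (len(units) // len(treatments))
        rng.shuffle(labels)                                       # randomise within this block only
        allocation.update(zip(units, labels))
    return allocation

blocks = {"wet": ["P1", "P2", "P3", "P4"], "dry": ["P5", "P6", "P7", "P8"]}
plan = randomised_blocks(blocks, ["A", "B"], seed=42)
for name, units in blocks.items():
    print(name, Counter(plan[u] for u in units))  # each block gets two A units and two B units
```

Shuffling a list that already holds each treatment the same number of times keeps the replication fixed while leaving the allocation to chance.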
Sources of bias
Bias is a systematic error that shifts every estimate in the same direction; unlike random error it does not average out by taking a larger sample.
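The difference between bias and random error can be shown by repeating a study many times. The Python sketch below assumes a hypothetical population mean of 50. Every measurement is read 5 units too high, as from a mis-calibrated scale. A larger sample shrinks the spread of the estimates, but their average error stays at the bias.

```python
import random
import statistics

TRUE_MEAN = 50.0  # hypothetical population mean being estimated

def run(n, bias, repeats=200, seed=0):
    """Repeat a study `repeats` times; return (average error, spread) of the sample means.

    Each measurement is read `bias` units too high, e.g. by a mis-calibrated scale.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(repeats):
        sample = [rng.gauss(TRUE_MEAN, 10) + bias for _ in range(n)]
        estimates.append(statistics.fmean(sample))
    return statistics.fmean(estimates) - TRUE_MEAN, statistics.stdev(estimates)

for n in (10, 1000):
    error, spread = run(n, bias=5.0)
    print(f"n={n:5d}  average error={error:5.2f}  spread={spread:4.2f}")
```

When the sample grows a hundredfold, the spread (random error) falls roughly tenfold, but the average error remains close to 5 at both sample sizes.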
Try this
Q1. A taste test gives every volunteer brand A first and brand B second. Name the bias this risks. [1 mark]
- Cue. Order effects create a measurement bias; the fixed order means any preference may reflect tasting order, not the brands, so the order should be randomised.
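One way to act on this cue is to randomise the order for each volunteer while keeping the two orders balanced. This is known as counterbalancing, a refinement the cue itself does not require. The Python sketch below uses hypothetical volunteer labels.

```python
import random

def tasting_orders(volunteers, seed=None):
    """Give half the volunteers A then B and half B then A, decided at random."""
    rng = random.Random(seed)
    orders = [("A", "B"), ("B", "A")] * (len(volunteers) // 2)
    if len(volunteers) % 2:                      # odd group: choose the spare order by chance
        orders.append(rng.choice([("A", "B"), ("B", "A")]))
    rng.shuffle(orders)                          # random allocation of orders to volunteers
    return dict(zip(volunteers, orders))

print(tasting_orders(["V1", "V2", "V3", "V4"], seed=5))
```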
Q2. State which design principle lets you estimate natural variation between units. [1 mark]
- Cue. Replication: applying each treatment to several units exposes the unit-to-unit variation that a single unit would hide.
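In numerical terms, replication is what makes a spread measurable. The Python sketch below uses hypothetical replicate yields for one treatment. With several units the sample standard deviation estimates the natural variation. With a single unit there is nothing to compare against, and `statistics.stdev` would itself refuse to run.

```python
import statistics

def spread_between_units(yields):
    """Sample standard deviation of the replicate results for one treatment."""
    if len(yields) < 2:
        raise ValueError("a single unit gives no estimate of natural variation")
    return statistics.stdev(yields)

print(round(spread_between_units([5.1, 4.8, 5.4, 5.0]), 2))  # 0.25
```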
Exam-style practice questions
Practice questions written in the style of SQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AH style: study type [3 marks]
A researcher records the daily screen time and exam mark of 200 pupils, then reports a negative association. Explain why this is an observational study and why it cannot establish that screen time causes lower marks.
It is an observational study because the researcher only measures the variables as they naturally occur; no treatment is imposed and the pupils are not allocated to groups by the researcher (1 mark).
Because the groups are self-selected, a confounding variable could explain the association: for example, pupils who study less may both spend more time on screens and score lower, so screen time and marks move together without one causing the other (1 mark).
Causation can only be inferred from a designed experiment in which the explanatory variable is controlled and units are randomly allocated to treatments, which removes confounding on average (1 mark). Markers reward naming the study type, identifying a plausible confounder, and the point that only a randomised experiment supports a causal claim.
AH style: design [4 marks]
A trial compares two fertilisers on crop yield across a field that is wetter at one end. Describe how control, randomisation, replication and blocking would improve the design.
Control: keep all other factors the same for every plot (same seed, planting date, irrigation) so any yield difference is attributable to the fertiliser, not to nuisance factors (1 mark).
Randomisation: allocate fertilisers to plots at random so that uncontrolled variation is spread evenly between the two treatments and systematic bias is avoided (1 mark).
Replication: apply each fertiliser to several plots, not one, so that natural plot-to-plot variation can be estimated and the result is not a fluke of a single plot (1 mark).
Blocking: group plots into blocks of similar wetness (for example a wet block and a dry block) and randomise treatments within each block, so the known moisture gradient is removed from the comparison (1 mark). Markers reward a correct description of each principle in the context of the field.
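A simulation helps show why blocking sharpens the fertiliser comparison. The Python sketch below assumes hypothetical values: eight plots (four wet, four dry), a yield boost of 10 on wet plots, a true advantage of 2 for fertiliser A, and unit-level noise with standard deviation 1. It repeats the trial many times under a completely randomised design and under a randomised block design.

```python
import random
import statistics

WET = [True] * 4 + [False] * 4  # eight plots: four at the wet end, four at the dry end
MOISTURE_BOOST = 10.0           # hypothetical extra yield on a wet plot
TRUE_EFFECT = 2.0               # hypothetical advantage of fertiliser A over B

def yield_of(wet, fertiliser, rng):
    return (50.0 + (MOISTURE_BOOST if wet else 0.0)
            + (TRUE_EFFECT if fertiliser == "A" else 0.0) + rng.gauss(0, 1))

def difference(labels, rng):
    """Mean yield of the A plots minus mean yield of the B plots."""
    a = [yield_of(w, f, rng) for w, f in zip(WET, labels) if f == "A"]
    b = [yield_of(w, f, rng) for w, f in zip(WET, labels) if f == "B"]
    return statistics.fmean(a) - statistics.fmean(b)

def crd_estimate(rng):
    labels = ["A"] * 4 + ["B"] * 4
    rng.shuffle(labels)              # randomise across the whole field
    return difference(labels, rng)

def rbd_estimate(rng):
    wet, dry = ["A", "A", "B", "B"], ["A", "A", "B", "B"]
    rng.shuffle(wet)                 # randomise within the wet block
    rng.shuffle(dry)                 # and separately within the dry block
    return difference(wet + dry, rng)

def summarise(design, repeats=2000, seed=0):
    rng = random.Random(seed)
    estimates = [design(rng) for _ in range(repeats)]
    return statistics.fmean(estimates), statistics.stdev(estimates)

for design in (crd_estimate, rbd_estimate):
    mean, sd = summarise(design)
    print(f"{design.__name__:12s} mean estimate={mean:.2f}  spread={sd:.2f}")
```

Both designs are correct on average because both randomise. The blocked design's estimates vary far less because the moisture gradient cancels inside each block. Each block holds equal numbers of A and B plots, so here the simple difference in means is the same as the within-block comparison.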
Related dot points
- Calculate and interpret measures of location and dispersion, including the mean, median, quartiles, interquartile range, variance and standard deviation, and use stem-and-leaf plots, boxplots and measures of skewness to describe the shape of a distribution.
A focused answer to the SQA Advanced Higher Statistics exploratory data analysis content: the mean, median and quartiles, the interquartile range, variance and standard deviation, stem-and-leaf plots and boxplots, outlier rules, and how to describe the shape and skewness of a distribution.
- Describe and apply the main sampling methods, including simple random, systematic and stratified sampling, distinguish a sample from a population and a statistic from a parameter, and explain how a poor sampling method introduces bias.
A focused answer to the SQA Advanced Higher Statistics sampling content: the difference between a population and a sample and a parameter and a statistic, simple random, systematic and stratified sampling, how to carry each out, and how a poor sampling frame or method introduces bias.
- Conduct a statistical investigation that draws together the skills of the course: pose a question, plan and collect data, select and apply appropriate analysis, and communicate justified conclusions with their limitations.
An overview of the statistical investigation in SQA Advanced Higher Statistics: how the skills of design, analysis and inference are combined to pose a question, collect and analyse data, and communicate justified conclusions with their limitations, as examined in the question papers.
- Set up null and alternative hypotheses, choose a significance level, compute and use a test statistic and p-value, decide between one- and two-tailed tests, identify the critical region, and distinguish Type I and Type II errors.
A focused answer to the SQA Advanced Higher Statistics hypothesis testing framework: forming null and alternative hypotheses, the significance level, the test statistic, the p-value and critical region, one- and two-tailed tests, and Type I and Type II errors.