How do you set up a hypothesis test and decide whether to reject the null hypothesis?
Set up null and alternative hypotheses, choose a significance level, compute and use a test statistic and p-value, decide between one- and two-tailed tests, identify the critical region, and distinguish Type I and Type II errors.
A focused answer to the SQA Advanced Higher Statistics hypothesis testing framework: forming null and alternative hypotheses, the significance level, the test statistic, the p-value and critical region, one- and two-tailed tests, and Type I and Type II errors.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
Hypothesis testing is the formal procedure for deciding whether sample data give convincing evidence against a claim. The SQA wants you to state hypotheses correctly, fix a significance level, compute a test statistic and p-value, choose between a one- and two-tailed test, identify the critical region, and reach a conclusion while understanding the two kinds of error you might make. This framework is shared by every named test in the area.
Setting up the hypotheses
Every test begins with a pair of hypotheses about a population parameter.
You never "prove" $H_0$; you either reject it (the evidence is strong enough) or fail to reject it (the evidence is insufficient). This asymmetry is deliberate: the test protects $H_0$ unless the data are convincing.
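As a sketch of the usual notation, assuming the parameter under test is a population mean $\mu$ and writing $\mu_0$ for the value claimed under the null hypothesis (both symbols are illustrative choices here), the null hypothesis is paired with one of three alternatives:

```latex
H_0:\ \mu = \mu_0
\qquad\text{against one of}\qquad
\begin{cases}
H_1:\ \mu \neq \mu_0 & \text{(two-tailed)}\\
H_1:\ \mu > \mu_0 & \text{(one-tailed, upper)}\\
H_1:\ \mu < \mu_0 & \text{(one-tailed, lower)}
\end{cases}
```

The null hypothesis always carries the equality; the wording of the question ("different", "greater", "less") picks the alternative.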
The significance level, test statistic and p-value
The decision rests on comparing a p-value with a pre-chosen significance level.
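The comparison can be sketched in Python. This is a minimal illustration, assuming a one-sample z-test with a known population standard deviation; the function name `z_test_p_value` and all the numbers are made up for the example and do not come from any SQA paper.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sigma, n, tail="two"):
    """Return (z, p) for a one-sample z-test with known population sd."""
    # Test statistic: how many standard errors the sample mean lies from mu0.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "upper":      # H1: mu > mu0
        p = 1 - std_normal.cdf(z)
    elif tail == "lower":    # H1: mu < mu0
        p = std_normal.cdf(z)
    else:                    # H1: mu != mu0, so double the tail area
        p = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p


# Illustrative values only (assumptions, not taken from the source).
alpha = 0.05
z, p = z_test_p_value(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36, tail="upper")
decision = "reject H0" if p < alpha else "do not reject H0"
print(f"z = {z:.2f}, p = {p:.4f}: {decision}")
```

The significance level is fixed before the data are examined; the p-value is then compared with it, and the null hypothesis is rejected only when the p-value is smaller.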
One- and two-tailed tests and the critical region
Whether you look in one tail or both depends entirely on the alternative hypothesis.
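To make the link between the alternative hypothesis and the critical region concrete, here is a small sketch for a standard normal test statistic. The helper `critical_region` is a hypothetical name, and the 5% level is only an example value.

```python
from statistics import NormalDist


def critical_region(alpha, tail):
    """Describe the rejection region for a z statistic at level alpha."""
    std_normal = NormalDist()
    if tail == "upper":   # H1: parameter is greater, all of alpha in the upper tail
        return f"z > {std_normal.inv_cdf(1 - alpha):.3f}"
    if tail == "lower":   # H1: parameter is less, all of alpha in the lower tail
        return f"z < {std_normal.inv_cdf(alpha):.3f}"
    if tail == "two":     # H1: parameter is different, alpha/2 in each tail
        c = std_normal.inv_cdf(1 - alpha / 2)
        return f"z < {-c:.3f} or z > {c:.3f}"
    raise ValueError("tail must be 'upper', 'lower' or 'two'")


for tail in ("upper", "lower", "two"):
    print(tail, "->", critical_region(0.05, tail))
```

A two-tailed test splits the significance level between both tails, so its critical values lie further from zero than those of a one-tailed test at the same level.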
Type I and Type II errors
Because the decision is based on a sample, two mistakes are possible.
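One way to see what a Type I error rate means is to simulate many samples from a population in which the null hypothesis is true and count how often a test rejects it anyway. The sketch below assumes a two-tailed z-test with known standard deviation; the function name, parameter values and seed are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def estimate_type_i_rate(mu0, sigma, n, alpha, trials, seed=0):
    """Estimate how often a two-tailed z-test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 really holds (mean = mu0).
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > critical:
            rejections += 1   # rejecting here is a Type I error
    return rejections / trials


rate = estimate_type_i_rate(mu0=50.0, sigma=6.0, n=20, alpha=0.05, trials=5000)
print(f"estimated Type I error rate: {rate:.3f}")
```

The estimated proportion should sit close to the chosen significance level, which is what it means for the significance level to be the probability of a Type I error.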
Try this
Q1. A researcher tests $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$ at significance level $\alpha$ and obtains a p-value greater than $\alpha$. State the conclusion. [1 mark]
- Cue. Since the p-value exceeds the significance level, do not reject $H_0$: there is insufficient evidence at that level that the mean exceeds the hypothesised value.
Q2. State which error becomes more likely if the significance level $\alpha$ is lowered, keeping the sample size fixed. [1 mark]
- Cue. A Type II error becomes more likely, because a stricter rejection threshold makes a true effect harder to detect.
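The trade-off can be checked numerically. The sketch below assumes an upper-tailed z-test with known standard deviation and a specific true mean `mu1` under the alternative; the function name and every number, including the two significance levels, are illustrative assumptions rather than values from the question.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_probability(alpha, mu0, mu1, sigma, n):
    """P(fail to reject H0) for an upper-tailed z-test when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)       # reject H0 when z > z_crit
    shift = (mu1 - mu0) / (sigma / sqrt(n))      # true effect in standard errors
    # Under mu1 the statistic is distributed N(shift, 1), so
    # P(z <= z_crit) = Phi(z_crit - shift).
    return std_normal.cdf(z_crit - shift)


beta_at_5_percent = type_ii_probability(alpha=0.05, mu0=50, mu1=52, sigma=6, n=36)
beta_at_1_percent = type_ii_probability(alpha=0.01, mu0=50, mu1=52, sigma=6, n=36)
print(f"beta at 5%: {beta_at_5_percent:.3f}")
print(f"beta at 1%: {beta_at_1_percent:.3f}")
```

With the sample size and true mean held fixed, moving to the stricter level pushes the critical value further out, so more of the distribution under the alternative falls short of it and the Type II error probability rises.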
Exam-style practice questions
Practice questions written in the style of SQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AH style: set up test (3 marks)
A manufacturer claims the mean lifetime of a bulb is $\mu_0$ hours. A consumer group suspects it is less. State the null and alternative hypotheses, say whether the test is one- or two-tailed, and explain how the significance level is used.
Worked answer:
Hypotheses: $H_0: \mu = \mu_0$ against $H_1: \mu < \mu_0$, where $\mu$ is the true mean lifetime and $\mu_0$ is the claimed mean lifetime (1 mark).
The test is one-tailed (lower tail) because the suspicion is specifically that the mean is less than the claimed value, not merely different (1 mark).
The significance level $\alpha$ is the probability of rejecting $H_0$ when it is in fact true; if the p-value is less than $\alpha$, or the test statistic falls in the critical region, $H_0$ is rejected (1 mark). Markers reward the correctly directed hypotheses, the one-tailed identification and the role of $\alpha$.
AH style: errors (3 marks)
Define a Type I and a Type II error in hypothesis testing, and state what reducing the significance level does to each.
Worked answer:
A Type I error is rejecting $H_0$ when it is actually true (a false positive); its probability is the significance level $\alpha$ (1 mark).
A Type II error is failing to reject $H_0$ when it is actually false (a false negative); its probability is denoted $\beta$ (1 mark).
Reducing $\alpha$ lowers the chance of a Type I error but, for a fixed sample size, raises the chance of a Type II error, because a stricter rejection rule makes it harder to detect a real effect (1 mark). Markers reward both definitions and the trade-off between the two error types.
Related dot points
- Carry out the one-sample, two-sample (independent) and paired t-tests for population means, stating the hypotheses, computing the test statistic, using degrees of freedom, and interpreting the result, while checking the normality assumption.
A focused answer to the SQA Advanced Higher Statistics t-test content: the one-sample t-test, the two-sample (independent) t-test and the paired t-test, with the test statistics, the degrees of freedom, the normality assumption and how to interpret the outcome.
- Carry out hypothesis tests for a single population proportion and for the difference between two proportions, using the normal approximation, stating the hypotheses, computing the test statistic and interpreting the result.
A focused answer to the SQA Advanced Higher Statistics proportion test content: testing a single population proportion and the difference between two proportions using the normal approximation, with the test statistics, the pooled estimate for two samples, and how to interpret the outcome.
- Carry out the main non-parametric tests, including the Mann-Whitney U test for two independent samples and the Wilcoxon signed-rank test for paired or single samples, explaining when a non-parametric test is preferred over a t-test.
A focused answer to the SQA Advanced Higher Statistics non-parametric test content: the Mann-Whitney U test for two independent samples and the Wilcoxon signed-rank test for paired data, how each ranks the data, the assumptions they relax, and when to prefer them over a t-test.
- Carry out the chi-squared goodness-of-fit test and the chi-squared test for association in a contingency table, computing expected frequencies, the chi-squared statistic and degrees of freedom, and interpreting the result against the assumptions.
A focused answer to the SQA Advanced Higher Statistics chi-squared content: the goodness-of-fit test and the test for association in a contingency table, computing expected frequencies, the chi-squared statistic and degrees of freedom, the minimum expected frequency rule, and interpreting the outcome.