Introduction to statistical testing; the sign test. Probability and significance, the use of statistical tables and critical values, type I and type II errors, choosing a statistical test.

Covers AQA 4.7 inferential testing: probability and significance (p less than 0.05), critical values, type I and type II errors, the sign test, and choosing a statistical test.

Generated by Claude Opus 4.8 | 11 min answer | Updated 2026-06-02

Reviewed by: AI editorial process; not yet individually human-reviewed

What this dot point is asking

AQA wants you to explain probability and significance, the sign test, critical values, type I and type II errors, and how to choose a statistical test. The exam skill is to calculate and interpret the sign test, to apply the critical-value rule correctly, and to choose the right test from the design, the aim and the level of measurement.

Probability and significance

Inferential statistics let researchers decide whether a result is large enough to be unlikely to have arisen by chance. The standard significance level in psychology is $p < 0.05$, meaning that, if the null hypothesis were true, there would be a 5% or smaller probability of obtaining a result at least this extreme by chance alone. If a test shows the result is significant at this level, the researcher rejects the null hypothesis (the prediction of no effect or no relationship) and accepts the alternative hypothesis. A stricter level such as $p < 0.01$ is used where being wrong is costly (for example in drug trials), and a more lenient level such as $p < 0.10$ is rarely used because it raises the risk of a false positive. The decision is made by comparing a calculated value (worked out from the data using the chosen test) against a critical value (read from a statistical table using the number of participants or the degrees of freedom, depending on the test, whether the hypothesis is one- or two-tailed, and the significance level).

The sign test, errors and choosing a test

The sign test is the only test you must be able to calculate. It is used when the study is a test of difference, uses a related (repeated measures or matched pairs) design, and produces nominal data (data in categories, such as "improved" or "got worse"). To calculate it, you record the direction of change for each participant as a plus or a minus, discard any participants who showed no change, count the number of pluses and minuses, and take the calculated value $S$ as the total of the less frequent sign. The value of $N$ is the number of participants after discarding the no-change cases. You then compare $S$ with the critical value from the sign-test table: for the sign test the result is significant only if the calculated value is equal to or less than the critical value. The direction of this rule differs between tests, so check it every time.

When choosing a test, three questions decide it: is the study looking for a difference or a correlation, is the design related or unrelated, and what is the level of measurement (nominal, ordinal, or interval). The sign test answers difference, related, nominal.

Errors arise because the decision is probabilistic: a type I error wrongly rejects a true null (a false positive, more likely with a lenient level), while a type II error wrongly retains a false null (a false negative, more likely with a strict level), and $p < 0.05$ is the conventional balance between the two risks.
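The three questions for choosing a test can be sketched as a lookup table. This is a minimal illustration, not part of the AQA material: only the sign-test row comes from this page, the other eight entries follow the decision table commonly taught at A-level and are included here as an assumption, and the names `TESTS` and `choose_test` are hypothetical.

```python
# Lookup keyed by (purpose, design, level of measurement).
# Only the sign-test entry is stated on this page; the rest are assumed
# from the commonly taught A-level decision table.
TESTS = {
    ("difference", "unrelated", "nominal"): "chi-squared",
    ("difference", "related", "nominal"): "sign test",
    ("correlation", None, "nominal"): "chi-squared",
    ("difference", "unrelated", "ordinal"): "Mann-Whitney",
    ("difference", "related", "ordinal"): "Wilcoxon",
    ("correlation", None, "ordinal"): "Spearman's rho",
    ("difference", "unrelated", "interval"): "unrelated t-test",
    ("difference", "related", "interval"): "related t-test",
    ("correlation", None, "interval"): "Pearson's r",
}


def choose_test(purpose, level, design=None):
    """Return the test name for a purpose, level of measurement and design."""
    if purpose == "correlation":
        design = None  # the related/unrelated question does not apply
    key = (purpose, design, level)
    if key not in TESTS:
        raise ValueError(f"no test listed for {key}")
    return TESTS[key]


print(choose_test("difference", "nominal", design="related"))  # sign test
```

Keeping the table as data rather than nested `if` statements makes it easy to check each row against your own revision notes.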

Calculating and interpreting a sign test

Step 1: Record the sign of each change

For each participant, mark a plus if they improved and a minus if they got worse. Suppose 15 people are tested.

Step 2: Discard no-change participants and find N

Remove anyone who showed no difference. If 3 showed no change, then $N = 15 - 3 = 12$.

Step 3: Count the signs and find S

Suppose 10 are pluses and 2 are minuses. The calculated value $S$ is the less frequent sign, so $S = 2$.

Step 4: Compare with the critical value and conclude

Look up the critical value for $N = 12$ at $p < 0.05$. The result is significant only if $S$ is equal to or less than the critical value; if the critical value is $2$, then $S = 2$ meets the rule and the result is significant, so the null hypothesis is rejected. Applying the "equal to or less than" rule correctly is the crucial step.
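The four steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `sign_test` and the "+", "-", "0" coding are hypothetical choices, and the critical value is passed in by hand because it still has to be read from the sign-test table.

```python
def sign_test(signs, critical_value):
    """Return (N, S, significant) for a list of '+', '-' and '0' signs."""
    pluses = signs.count("+")
    minuses = signs.count("-")
    n = pluses + minuses        # no-change cases ('0') are discarded
    s = min(pluses, minuses)    # S is the count of the less frequent sign
    # Sign-test rule: significant only if S <= critical value.
    return n, s, s <= critical_value


# The worked example: 15 participants, 10 improved, 2 got worse, 3 no change.
signs = ["+"] * 10 + ["-"] * 2 + ["0"] * 3
n, s, significant = sign_test(signs, critical_value=2)
print(n, s, significant)  # 12 2 True
```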

Exam-style practice questions

Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AQA 2019, 4 marks. In a study using the sign test, 12 participants were tested. Two showed no change, 8 improved (plus) and 2 got worse (minus). The calculated value of S is 2. Using the critical value of 1 for $N = 10$ at $p < 0.05$ (two-tailed), state whether the result is significant and explain your decision.

A 4-mark calculation and reasoning item. Markers reward the correct N, the comparison rule, and the conclusion.

First, exclude the participants who showed no change, so $N = 12 - 2 = 10$. The calculated value $S$ is the less frequent sign, which is the $2$ minus signs, so $S = 2$.

For the sign test, the result is significant only if the calculated value is equal to or less than the critical value. Here the calculated value $S = 2$ is greater than the critical value of $1$, so the result is not significant at $p < 0.05$. We therefore retain the null hypothesis. A full-mark answer drops the no-change scores to get $N = 10$, identifies $S$ as the less frequent sign, applies the "calculated must be equal to or less than critical" rule, and states the correct conclusion.
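You will not be asked for this in the exam, but it can help to see why the table gives a critical value of 1 here. Sign-test tables come from the binomial distribution: if the null hypothesis is true, each participant is equally likely to show a plus or a minus. The sketch below computes the exact probability of getting $S$ or fewer of the rarer sign; the helper name `sign_test_p` is hypothetical.

```python
from math import comb


def sign_test_p(s, n, tails=2):
    """Exact probability of S or fewer of the rarer sign when P(plus) = 0.5."""
    one_tail = sum(comb(n, k) for k in range(s + 1)) / 2 ** n
    return min(1.0, tails * one_tail)  # double for a two-tailed hypothesis


print(round(sign_test_p(2, 10), 3))  # 0.109, above 0.05, so not significant
print(round(sign_test_p(1, 10), 3))  # 0.021, below 0.05, so significant
```

With $N = 10$, $S = 1$ is the largest value whose two-tailed probability stays below 0.05, which is why the critical value is 1 and $S = 2$ falls short.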

AQA 2021, 3 marks. Explain the difference between a type I and a type II error.

A 3-mark item. Markers want both errors defined in terms of the null hypothesis.

A type I error is rejecting the null hypothesis when it is actually true: the researcher concludes there is a significant effect when there is not (a false positive). It is more likely if the significance level is too lenient, such as $p < 0.10$. A type II error is retaining the null hypothesis when it is actually false: the researcher concludes there is no effect when there really is one (a false negative). It is more likely if the significance level is too strict, such as $p < 0.01$.

A full-mark answer defines both in relation to the null hypothesis and notes that they are a trade-off, which is why $p < 0.05$ is used as a balance.
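The trade-off can be made concrete with a small calculation for a one-tailed sign test with $N = 12$. This is a sketch under stated assumptions: the "real effect" is a hypothetical one in which each participant has only a 0.2 chance of showing a minus, and the helper names `tail` and `critical_value` are invented for illustration.

```python
from math import comb


def tail(n, s, p):
    """P(number of minuses <= s) when each participant shows a minus with probability p."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(s + 1))


def critical_value(n, alpha):
    """Largest S whose one-tailed probability under the null (p = 0.5) is <= alpha."""
    best = None  # stays None if no value of S can reach significance
    for s in range(n // 2 + 1):
        if tail(n, s, 0.5) <= alpha:
            best = s
    return best


n = 12
rows = []
for alpha in (0.10, 0.05, 0.01):
    cv = critical_value(n, alpha)
    type_1 = tail(n, cv, 0.5)      # chance of rejecting a true null
    type_2 = 1 - tail(n, cv, 0.2)  # chance of retaining a false null (assumed effect)
    rows.append((alpha, cv, type_1, type_2))
    print(f"alpha={alpha:.2f}  critical S={cv}  type I={type_1:.3f}  type II={type_2:.3f}")
```

As the significance level gets stricter, the critical value shrinks, so the type I risk falls while the type II risk rises, which is the trade-off the answer describes.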

Sources & how we know this

AQA A-level Psychology (7182) specification — AQA (2015)