
How do psychologists test whether their results are significant?

Introduction to statistical testing; the sign test. Probability and significance, the use of statistical tables and critical values, type I and type II errors, choosing a statistical test.

Covers AQA 4.7 inferential testing: probability and significance (p less than 0.05), critical values, type I and type II errors, the sign test, and choosing a statistical test.

Generated by Claude Opus 4.811 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed



What this dot point is asking

AQA wants you to explain probability and significance, the sign test, critical values, type I and type II errors, and how to choose a statistical test. The exam skill is to calculate and interpret the sign test, to apply the critical-value rule correctly, and to choose the right test from the design, the aim and the level of measurement.

Probability and significance

Inferential statistics let researchers decide whether a result is large enough to be unlikely to have arisen by chance. The standard significance level in psychology is p < 0.05: if the null hypothesis were true, there would be a 5% or smaller probability of obtaining a result at least this extreme by chance alone. If a test shows the result is significant at this level, the researcher rejects the null hypothesis (the prediction of no effect or no relationship) and accepts the alternative hypothesis. A stricter level such as p < 0.01 is used where being wrong is costly (for example in drug trials), while a more lenient level such as p < 0.10 is rarely used because it raises the risk of a false positive. The decision is made by comparing a calculated value (worked out from the data using the chosen test) against a critical value (read from a statistical table using the number of participants, whether the hypothesis is one- or two-tailed, and the significance level).
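
To make "p < 0.05" concrete, here is a minimal Python sketch that assumes the simple model the sign test uses: under the null hypothesis each participant is equally likely to change in either direction, like a fair coin toss. The function names and the 10-participant example are illustrative only. The sketch works out the probability of a split at least as lopsided as the one observed, in either direction, and compares it with 0.05.

```python
from math import comb

ALPHA = 0.05  # conventional significance level: 5% or smaller

def prob_at_most(k: int, n: int) -> float:
    """P(X <= k) for X ~ Binomial(n, 0.5): the chance, under the null
    hypothesis, of k or fewer changes in one particular direction."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

def two_tailed_p(k: int, n: int) -> float:
    """Probability of a split at least as lopsided as k vs n - k in either
    direction, where k is the smaller of the two counts."""
    return min(1.0, 2 * prob_at_most(k, n))

n = 10
for k in range(3):
    p = two_tailed_p(k, n)
    verdict = "significant" if p <= ALPHA else "not significant"
    print(f"{k} vs {n - k}: p = {p:.4f} -> {verdict}")
```

With 10 participants, a 1-versus-9 split comes out significant while a 2-versus-8 split does not, which is the same boundary a critical-value table encodes.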

The sign test, errors and choosing a test

The sign test is the only test you must be able to calculate. It is used when the study is a test of difference, uses a related (repeated measures or matched pairs) design, and produces nominal data (data in categories, such as "improved" or "got worse").

To calculate it, record the direction of change for each participant as a plus or a minus, discard any participants who showed no change, count the number of pluses and minuses, and take the calculated value S as the total of the less frequent sign. The value of N is the number of participants after discarding the no-change cases. Then compare S with the critical value from the sign-test table: for the sign test the result is significant only if the calculated value is equal to or less than the critical value. The direction of this rule is not the same for every test, so check it carefully each time.

When choosing a test, three questions decide it: is the study looking for a difference or a correlation, is the design related or unrelated, and what is the level of measurement (nominal, ordinal, or interval)? The sign test answers difference, related, nominal.

Errors arise because the decision is probabilistic. A type I error wrongly rejects a true null hypothesis (a false positive, more likely with a lenient level), while a type II error wrongly retains a false null hypothesis (a false negative, more likely with a strict level). The conventional p < 0.05 is a balance between the two risks.
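
The three questions for choosing a test can be sketched as a lookup table. This is a minimal Python sketch assuming the standard AQA selection table; the function name, the string labels and the use of None for correlations (which have no related or unrelated design) are illustrative choices. Repeated measures and matched pairs count as "related", and independent groups as "unrelated".

```python
from typing import Optional

# AQA selection table: (aim, design, level of measurement) -> test.
# Correlations have no related/unrelated design, so design is None there;
# for nominal data the "correlation" row is really a test of association.
TESTS = {
    ("difference", "unrelated", "nominal"): "chi-squared",
    ("difference", "related", "nominal"): "sign test",
    ("difference", "unrelated", "ordinal"): "Mann-Whitney",
    ("difference", "related", "ordinal"): "Wilcoxon",
    ("difference", "unrelated", "interval"): "unrelated t-test",
    ("difference", "related", "interval"): "related t-test",
    ("correlation", None, "nominal"): "chi-squared",
    ("correlation", None, "ordinal"): "Spearman's rho",
    ("correlation", None, "interval"): "Pearson's r",
}

def choose_test(aim: str, design: Optional[str], level: str) -> str:
    """Answer the three questions (aim, design, level) and return the test."""
    if aim == "correlation":
        design = None  # the design question does not apply to correlations
    key = (aim, design, level)
    if key not in TESTS:
        raise ValueError(f"no test listed for {key}")
    return TESTS[key]

print(choose_test("difference", "related", "nominal"))  # sign test
```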

Exam-style practice questions

Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AQA 2019 · 4 marks
In a study using the sign test, 12 participants were tested. Two showed no change, 8 improved (plus) and 2 got worse (minus). The calculated value of S is 2. Using the critical value of 1 for N = 10 at p < 0.05 (two-tailed), state whether the result is significant and explain your decision.

A 4-mark calculation and reasoning item. Markers reward the correct N, the comparison rule, and the conclusion.

First, exclude the participants who showed no change, so N = 12 - 2 = 10. The calculated value S is the count of the less frequent sign, which is the 2 minus signs, so S = 2.

For the sign test, the result is significant only if the calculated value is equal to or less than the critical value. Here the calculated value S = 2 is greater than the critical value of 1, so the result is not significant at p < 0.05. We therefore retain the null hypothesis.

A full-mark answer drops the no-change scores to get N = 10, identifies S as the less frequent sign, applies the "calculated must be equal to or less than critical" rule, and states the correct conclusion.
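
The steps in the worked answer can be sketched in Python. This is a minimal sketch under stated assumptions: each participant's change is coded as a positive number, a negative number or zero, and instead of reading a printed table the critical value is derived from the exact binomial distribution that sign-test tables are based on. The function names are hypothetical; in the exam, always use the table you are given.

```python
from math import comb
from typing import Optional

def sign_test_values(changes: list[int]) -> tuple[int, int]:
    """Return (S, N) from each participant's change score.
    Positive = plus, negative = minus, zero = no change."""
    plus = sum(1 for c in changes if c > 0)
    minus = sum(1 for c in changes if c < 0)
    n = plus + minus          # no-change cases are discarded
    s = min(plus, minus)      # S is the count of the less frequent sign
    return s, n

def critical_value(n: int, alpha: float = 0.05, two_tailed: bool = True) -> Optional[int]:
    """Largest S whose exact binomial probability under H0 is <= alpha.
    Returns None when even S = 0 is not significant (a dash in a table)."""
    crit = None
    for s in range(n // 2 + 1):
        tail = sum(comb(n, i) for i in range(s + 1)) / 2 ** n
        p = 2 * tail if two_tailed else tail
        if p > alpha:
            break             # larger S only gives larger probabilities
        crit = s
    return crit

# Worked example: 8 improved, 2 got worse, 2 showed no change
changes = [1] * 8 + [-1] * 2 + [0] * 2
s, n = sign_test_values(changes)
crit = critical_value(n, alpha=0.05, two_tailed=True)
# Sign-test rule: significant only if calculated S <= critical value
significant = crit is not None and s <= crit
print(f"S = {s}, N = {n}, critical value = {crit}, significant: {significant}")
```

For the worked data this gives S = 2, N = 10 and a critical value of 1, so the result is not significant, matching the answer above.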

AQA 2021 · 3 marks
Explain the difference between a type I and a type II error.

A 3-mark item. Markers want both errors defined in terms of the null hypothesis.

A type I error is rejecting the null hypothesis when it is actually true: the researcher concludes there is a significant effect when there is not (a false positive). It is more likely if the significance level is too lenient, such as p < 0.10. A type II error is retaining the null hypothesis when it is actually false: the researcher concludes there is no effect when there really is one (a false negative). It is more likely if the significance level is too strict, such as p < 0.01.

A full-mark answer defines both in relation to the null hypothesis and notes that they are a trade-off, which is why p < 0.05 is used as a balance.
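
The trade-off can be made concrete for the sign test. The Python sketch below is a minimal illustration under assumptions that are not in the original: N = 20 participants with no ties, two-tailed critical values of 5 (p < 0.05) and 3 (p < 0.01) worked out from the exact binomial distribution, and a hypothetical true effect in which 80% of people really improve. It computes the exact chance of each error at both levels.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def reject_prob(n: int, crit: int, p_plus: float) -> float:
    """Chance a two-tailed sign test rejects H0 (S <= crit) when each of
    n participants improves with probability p_plus (no ties assumed)."""
    # S <= crit if either the plus count or the minus count is <= crit;
    # the two events cannot both happen while 2 * crit < n.
    return binom_cdf(crit, n, p_plus) + binom_cdf(crit, n, 1 - p_plus)

N = 20
# Assumed two-tailed critical values for N = 20 (exact binomial).
LEVELS = {"p < 0.05": 5, "p < 0.01": 3}
TRUE_EFFECT = 0.8  # hypothetical: 80% of people really improve

for label, crit in LEVELS.items():
    type_1 = reject_prob(N, crit, 0.5)              # H0 true, but rejected
    type_2 = 1 - reject_prob(N, crit, TRUE_EFFECT)  # H0 false, but retained
    print(f"{label}: type I = {type_1:.4f}, type II = {type_2:.4f}")
```

Moving from p < 0.05 to the stricter p < 0.01 lowers the type I error rate but raises the type II error rate, which is the trade-off described above.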
