How do psychologists test whether their results are significant?
Introduction to statistical testing; the sign test. Probability and significance, the use of statistical tables and critical values, type I and type II errors, choosing a statistical test.
Covers AQA 4.7 inferential testing: probability and significance (p ≤ 0.05), critical values, type I and type II errors, the sign test, and choosing a statistical test.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
AQA wants you to explain probability and significance, the sign test, critical values, type I and type II errors, and how to choose a statistical test. The exam skill is to calculate and interpret the sign test, to apply the critical-value rule correctly, and to choose the right test from the design, the aim and the level of measurement.
Probability and significance
Inferential statistics let researchers decide whether a result is large enough to be unlikely to have arisen by chance. The standard significance level in psychology is p ≤ 0.05. This means there is a 5% or smaller probability that the result occurred by chance alone. If a test shows the result is significant at this level, the researcher rejects the null hypothesis (the prediction of no effect or no relationship) and accepts the alternative hypothesis. A stricter level such as p ≤ 0.01 is used where being wrong is costly (for example, in drug trials). A more lenient level such as p ≤ 0.10 is rarely used because it raises the risk of a false positive. The decision is made by comparing a calculated value with a critical value. The calculated value is worked out from the data using the chosen test. The critical value is read from a statistical table using the number of participants, whether the hypothesis is one- or two-tailed, and the significance level.
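The critical-value rule can be written as a very small function. This is a minimal sketch. The name `is_significant` and its `significant_if_lower` flag are hypothetical and are introduced here only for illustration. The flag exists because the direction of the comparison has to be checked for each test. The only direction this page states is the sign test's: the result is significant when the calculated value is equal to or less than the critical value.

```python
def is_significant(calculated: float, critical: float, significant_if_lower: bool) -> bool:
    """Apply the critical-value rule for one statistical test (illustrative helper).

    significant_if_lower=True  -> significant when calculated <= critical
    significant_if_lower=False -> significant when calculated >= critical
    The correct direction must be taken from each test's table instructions;
    for the sign test it is True.
    """
    if significant_if_lower:
        return calculated <= critical
    return calculated >= critical


# Sign test: significant only if the calculated value is equal to or less than the critical value.
print(is_significant(1, 1, significant_if_lower=True))   # True (equal counts as significant)
print(is_significant(2, 1, significant_if_lower=True))   # False
```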
The sign test, errors and choosing a test
The sign test is the only test you must be able to calculate. It is used when three conditions hold. The study is a test of difference. It uses a related design (repeated measures or matched pairs). It produces nominal data, meaning data in categories such as "improved" or "got worse".

To calculate it, record the direction of change for each participant as a plus or a minus. Discard any participants who showed no change, then count the pluses and minuses. The calculated value, S, is the total of the less frequent sign. N is the number of participants left after discarding the no-change cases. Then compare S with the critical value from the sign-test table. For the sign test, the result is significant only if S is equal to or less than the critical value. The direction of this rule must be checked carefully for every test.

When choosing a test, three questions decide it. Is the study looking for a difference or a correlation? Is the design related or unrelated? What is the level of measurement (nominal, ordinal, or interval)? The sign test is the choice when the answers are difference, related design, and nominal data.

Errors arise because the decision is probabilistic. A type I error wrongly rejects a true null hypothesis. It is a false positive and is more likely with a lenient level. A type II error wrongly retains a false null hypothesis. It is a false negative and is more likely with a strict level. The p ≤ 0.05 level is the conventional balance between the two risks.
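The counting steps for the sign test can be sketched in Python. The sketch assumes each participant's change is coded as '+', '-' or '0' (no change). The function name `sign_test` and this coding scheme are illustrative assumptions and are not part of the AQA specification. In the exam you do these steps by hand.

```python
from typing import Sequence, Tuple


def sign_test(changes: Sequence[str]) -> Tuple[int, int]:
    """Return (S, N) for a sign test (illustrative helper).

    Each entry in `changes` is '+', '-' or '0' (no change) for one participant.
    No-change cases are discarded, N is the number left, and S is the total
    of the less frequent sign.
    """
    kept = [c for c in changes if c != "0"]   # discard no-change participants
    pluses = kept.count("+")
    minuses = kept.count("-")
    if pluses + minuses != len(kept):
        raise ValueError("each change must be '+', '-' or '0'")
    n = len(kept)
    s = min(pluses, minuses)                  # total of the less frequent sign
    return s, n


# Counts from the first practice question: 8 improved, 2 got worse, 2 showed no change.
data = ["+"] * 8 + ["-"] * 2 + ["0"] * 2
s, n = sign_test(data)
print(f"S = {s}, N = {n}")  # S = 2, N = 10
```

If the pluses and minuses are tied, S is simply that shared count. The sketch follows the "less frequent sign" rule and does not treat ties specially.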
Exam-style practice questions
Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AQA 2019, 4 marks
In a study using the sign test, 12 participants were tested. Two showed no change, 8 improved (plus) and 2 got worse (minus). The calculated value of S is 2. Using the critical value of 1 for N = 10 at p ≤ 0.05 (two-tailed), state whether the result is significant and explain your decision.
A 4-mark calculation and reasoning item. Markers reward the correct N, the comparison rule, and the conclusion.
First, exclude the participants who showed no change, so N = 12 − 2 = 10. The calculated value is the total of the less frequent sign. Here that is the minus signs, so S = 2.
For the sign test, the result is significant only if the calculated value is equal to or less than the critical value. Here the calculated value (S = 2) is greater than the critical value of 1, so the result is not significant at p ≤ 0.05. We therefore retain the null hypothesis. A full-mark answer does four things. It drops the no-change scores to get N = 10. It identifies S = 2 as the less frequent sign. It applies the "calculated must be equal to or less than critical" rule. It states the correct conclusion.
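Where does the critical value of 1 come from? Under the null hypothesis, each participant who changed is equally likely to be a plus or a minus. The count of either sign therefore follows a binomial distribution with probability 0.5. The sketch below uses hypothetical helper names and covers the two-tailed case only. It computes the exact probability for S = 2 and N = 10. It then searches for the largest S whose probability is still at or below 0.05. It should agree with the printed table for this case, but in the exam always use the table you are given.

```python
from math import comb
from typing import Optional


def sign_test_p_two_tailed(s: int, n: int) -> float:
    """Two-tailed probability of an S this small or smaller if H0 is true.

    Under H0 each non-tied participant is '+' or '-' with probability 0.5,
    so P(count <= s) is a Binomial(n, 0.5) tail; doubling gives two tails.
    """
    one_tail = sum(comb(n, k) for k in range(s + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


def critical_value(n: int, alpha: float = 0.05) -> Optional[int]:
    """Largest S whose two-tailed probability is <= alpha (None if no S qualifies)."""
    best = None
    for s in range(n // 2 + 1):          # S can be at most half of N
        if sign_test_p_two_tailed(s, n) <= alpha:
            best = s
    return best


print(f"{sign_test_p_two_tailed(2, 10):.3f}")  # 0.109
print(f"{sign_test_p_two_tailed(1, 10):.3f}")  # 0.021
print(critical_value(10, 0.05))                # 1
```

S = 2 gives a two-tailed probability of about 0.109. That is above 0.05, which matches the worked answer's conclusion that the result is not significant.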
AQA 2021, 3 marks
Explain the difference between a type I and a type II error.
A 3-mark item. Markers want both errors defined in terms of the null hypothesis.
A type I error is rejecting the null hypothesis when it is actually true. The researcher concludes there is a significant effect when there is not (a false positive). It is more likely if the significance level is too lenient, such as p ≤ 0.10. A type II error is retaining the null hypothesis when it is actually false. The researcher concludes there is no effect when there really is one (a false negative). It is more likely if the significance level is too strict, such as p ≤ 0.01.
A full-mark answer defines both in relation to the null hypothesis. It also notes that they are a trade-off, which is why p ≤ 0.05 is used as a balance.
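The trade-off can be made concrete with an exact calculation for the sign test with N = 10. This is a sketch under stated assumptions. The function names are hypothetical. The "real effect" is an assumed 80% chance that each participant improves. The critical value used for p ≤ 0.01 (0) comes from the binomial calculation, not from a printed table. Type I risk is the chance of a significant result when the null hypothesis is true. Type II risk is the chance of a non-significant result when the assumed effect is real.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k pluses out of n when each is '+' with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def prob_significant(n: int, critical: int, p_plus: float) -> float:
    """Probability that a two-tailed sign test is significant (S <= critical).

    S is the less frequent sign, so an outcome with k pluses gives S = min(k, n - k).
    """
    return sum(
        binom_pmf(k, n, p_plus)
        for k in range(n + 1)
        if min(k, n - k) <= critical
    )


N = 10
# Critical values for N = 10, two-tailed (binomial calculation): 0 at p <= 0.01, 1 at p <= 0.05.
for label, crit in [("p <= 0.01", 0), ("p <= 0.05", 1)]:
    type_1 = prob_significant(N, crit, p_plus=0.5)      # H0 true: no real effect
    type_2 = 1 - prob_significant(N, crit, p_plus=0.8)  # assumed real effect is missed
    print(f"{label}: type I risk = {type_1:.3f}, type II risk = {type_2:.3f}")
# p <= 0.01: type I risk = 0.002, type II risk = 0.893
# p <= 0.05: type I risk = 0.021, type II risk = 0.624
```

With these assumptions, moving to the stricter level cuts the type I risk but raises the type II risk. The actual type I risk sits below the nominal level because the sign test's possible probabilities jump in steps.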
Related dot points
- Experimental method: laboratory, field, natural and quasi-experiments. Aims, hypotheses, independent and dependent variables, operationalisation, extraneous and confounding variables.
  Covers AQA 4.7 experimental methods: laboratory, field, natural and quasi-experiments, aims and hypotheses, IVs and DVs, operationalisation, and extraneous and confounding variables.
- Observational techniques: naturalistic and controlled, covert and overt, participant and non-participant. Observational design: behavioural categories, event and time sampling.
  Covers AQA 4.7 observational techniques: naturalistic and controlled, covert and overt, participant and non-participant observation, and observational design (behavioural categories, event and time sampling).
- Self-report techniques: questionnaires; interviews, structured and unstructured. The design of questionnaires, including the use of open and closed questions.
  Covers AQA 4.7 self-report techniques: questionnaires, structured and unstructured interviews, open and closed questions, and the design of effective questionnaires.
- Correlations: analysis of the relationship between co-variables. The difference between correlations and experiments. Positive, negative and zero correlations.
  Covers AQA 4.7 correlations: co-variables, positive, negative and zero correlations, scattergrams, the difference from experiments, and why correlation does not show causation.
- Experimental designs: independent groups, repeated measures and matched pairs. Design of investigations, including control of variables, randomisation and counterbalancing.
  Covers AQA 4.7 experimental design: independent groups, repeated measures and matched pairs, with their strengths and limitations, and the use of randomisation and counterbalancing.
- Sampling: the difference between population and sample; sampling techniques including random, systematic, stratified, opportunity and volunteer; implications of sampling techniques, including bias and generalisation.
  Covers AQA 4.7 sampling: population versus sample, random, systematic, stratified, opportunity and volunteer sampling, and the implications of bias and generalisation.
Sources & how we know this
- AQA A-level Psychology (7182) specification — AQA (2015)