What are the key classic studies, and how are studies evaluated and reviewed?
Key studies and classic research: the named classic studies across topics, how to evaluate studies methodologically and ethically, and reviewing and synthesising research evidence.
An Edexcel A-Level Psychology answer on key studies and classic research, covering the named classic studies (Milgram, Sherif, Baddeley, Watson and Rayner, Raine, Rosenhan), how to evaluate methodology and ethics with GRAVE, and how to review and synthesise evidence for Paper 3.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
Edexcel sets a named classic study for each foundation topic, and Paper 3 tests reviewing, analysing and evaluating studies. You must know each classic study and be able to assess its methodology and ethics and synthesise evidence across several studies.
The answer
The named classic studies
Evaluating methodology and ethics
Reviewing and synthesising evidence
Paper 3 asks you to review research: to compare studies, weigh evidence for and against an explanation, and judge how method affects the conclusions that can be drawn. Strong answers synthesise several studies rather than describing one, and they connect findings to issues and debates such as ethics, reductionism, determinism and generalisability. Reviewing is itself a method with its own reliability (do reviewers agree on quality ratings?) and bias (publication bias toward significant results).
Evaluation (GRAVE)
- Generalisability. Many classic studies used biased samples (Milgram's 40 American men, Sherif's 22 American boys), limiting how far findings apply across gender and culture.
- Reliability. Standardised classic studies (Milgram, Baddeley) replicate well, giving them strong reliability and making synthesis across replications possible.
- Application. Classic studies underpin real applications: Loftus informs the cognitive interview, Rosenhan informs cautious diagnosis, and Bandura informs media-effects policy.
- Validity. Lab studies (Milgram, Baddeley) can lack ecological validity; field studies (Sherif) gain it but lose control, so validity must be judged per study.
- Ethics. Several classic studies breach modern principles (Little Albert's distress, Milgram's deception and harm), which must be weighed against their scientific value.
Examples in context
Example 1. Synthesising obedience evidence across studies. Rather than describing Milgram alone, a strong review compares Milgram (65 per cent gave the maximum shock), his telephone variation (obedience fell to about 21 per cent) and cross-cultural replications (broadly similar high rates). Synthesising these shows the effect is reliable and that situational factors (the proximity and legitimacy of authority) systematically change obedience, supporting agency theory. This synthesis, with a judgement about the weight of evidence, is what Paper 3 rewards, and it also lets you fold in the ethics debate (deception and harm) and the generalisability debate (androcentric samples).
Example 2. Evaluating Little Albert methodologically and ethically. Watson and Rayner's study is a single-participant case study, which gives rich detail but very poor generalisability (one infant). The lack of a control condition and of standardised testing weakens internal validity. Ethically, the study caused distress to an infant who could not consent, and his fear was never deconditioned before he left the study, breaching protection from harm. Yet it provided early evidence that emotional responses can be classically conditioned, influencing later treatments. This balanced methodological and ethical evaluation, with a judgement, models the Paper 3 skill.
Try this
Q1. Evaluate the ethics of Milgram's obedience study. [4 marks]
- Cue. It used deception and caused psychological distress, breaching protection from harm, and the prods undermined the right to withdraw. In defence, Milgram debriefed participants, most said they were glad to have taken part, and the findings were valuable.
Q2. Explain why a biased sample limits a study's conclusions. [2 marks]
- Cue. A biased sample is not representative of the wider population, so the findings cannot be confidently generalised beyond the participants studied.
Q3. Assess how reviewing and synthesising studies improves the evaluation of psychological evidence. [8 marks]
- Cue. Argue that synthesis weighs converging and conflicting evidence, links method to conclusions and reveals reliability across replications, but note review reliability (inter-rater agreement) and bias (publication bias) as limits.
Exam-style practice questions
Practice questions written in the style of Pearson Edexcel exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
Edexcel 2019, 8 marks. Evaluate Milgram's obedience study in terms of its methodology and ethics. [8 marks]
An evaluate question: marks for methodological and ethical assessment, not retelling the procedure (AO3 dominant).
Methodology. Strengths: a highly standardised, controlled lab procedure (same prods, same set-up) gives high reliability and replicability, and Milgram's later variations isolated situational variables. Weaknesses: low ecological validity (an artificial task), possible demand characteristics, and an androcentric, culturally narrow sample of 40 American men, limiting generalisability.
Ethics. The study used deception (participants thought the shocks were real), risked psychological harm (visible distress), and the prods pressured participants in a way that undermined the right to withdraw. In defence, participants were debriefed, most said they were glad to have taken part, and the findings had great value for understanding obedience.
Markers reward balanced methodological points (reliability and validity and generalisability) and ethical points (deception, harm, withdrawal versus debriefing and value), with a judgement.
Edexcel 2021, 6 marks. Two researchers reviewing the same studies agreed on of quality ratings. Calculate the percentage agreement and explain what this tells you about the reliability of the review and one limit of using percentage agreement. [6 marks]
A quantitative item: show the calculation (AO2) then interpret (AO3).
Percentage agreement (a measure of inter-rater reliability): (number of ratings agreed on ÷ total number of ratings) × 100.
Interpretation: inter-rater reliability is the extent to which two independent reviewers reach the same judgement. Agreement at this level is high, suggesting the review applied its quality criteria consistently and is reliable, so the synthesis is trustworthy.
One limit: percentage agreement does not correct for agreement that happens by chance, so it can overstate reliability. A statistic such as Cohen's kappa, which adjusts for chance agreement, gives a more accurate figure and is preferred when judging the reliability of a review.
Markers reward the correct percentage, a definition of inter-rater reliability, and the point that percentage agreement ignores chance agreement (kappa is better).
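The contrast between percentage agreement and Cohen's kappa can be made concrete with a small calculation. The sketch below is illustrative only: the two lists of quality ratings are hypothetical (they are not the figures from the question), and the function names are made up for this example. It computes percentage agreement as agreements divided by total ratings, then kappa as observed agreement corrected for the agreement expected by chance.

```python
from collections import Counter


def percentage_agreement(rater_a, rater_b):
    """Percentage of items on which the two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of items")
    agreed = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * agreed / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    p_observed = percentage_agreement(rater_a, rater_b) / 100
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated at random according to their own category proportions.
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical quality ratings for 10 studies (assumed data, for illustration).
rater_a = ["high"] * 8 + ["low"] * 2
rater_b = ["high"] * 7 + ["low", "high", "low"]

print(percentage_agreement(rater_a, rater_b))    # 80.0
print(round(cohens_kappa(rater_a, rater_b), 3))  # 0.375
```

With these assumed ratings the reviewers agree on 8 of 10 studies (80 per cent), yet kappa is only 0.375, because both reviewers rate most studies "high" and would often agree by chance alone. This is the sense in which percentage agreement can overstate reliability.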
Related dot points
- Issues and debates: nature-nurture, free will and determinism, reductionism and holism, ethics and social control, gender and cultural bias, and the use of psychology in the real world.
An Edexcel A-Level Psychology answer to issues and debates, covering nature-nurture, free will and determinism, reductionism and holism, ethics and social control, gender and cultural bias, GRAVE evaluation and the practical and social implications of psychological research.
- Criminological or health psychology: explanations of the chosen application (offending or health behaviour), biological and social factors, treatments or interventions, and the named application studies.
An Edexcel A-Level Psychology answer to the application option, covering criminological psychology (biological and social explanations of offending, eyewitness testimony, the cognitive interview and treatments) and health psychology (theories of addiction and interventions), with GRAVE evaluation and named application studies.
- Social psychology: obedience (Milgram and agency theory), prejudice (social identity theory and realistic conflict theory), individual and situational explanations, and key social studies.
An Edexcel A-Level Psychology answer to social psychology, covering Milgram's obedience study and agency theory, Adorno's authoritarian personality, social identity theory and realistic conflict theory of prejudice, GRAVE evaluation and the named studies Sherif and Milgram.
- Research methods: experiments and other methods, sampling, experimental design, variables and hypotheses, descriptive and inferential statistics, and the chosen inferential tests.
An Edexcel A-Level Psychology answer to research methods, covering experimental and non-experimental methods, sampling, experimental design, variables and hypotheses, descriptive statistics, levels of measurement, GRAVE evaluation and the Edexcel inferential tests including Mann-Whitney, Wilcoxon, Spearman and chi-square.
- Cognitive psychology: the multi-store model, the working memory model, the reconstructive nature of memory, theories of forgetting, and key cognitive studies.
An Edexcel A-Level Psychology answer to cognitive psychology, covering the multi-store model, the working memory model, reconstructive memory and Bartlett, theories of forgetting including interference and retrieval failure, and the named cognitive studies with GRAVE evaluation.
Sources & how we know this
- Pearson Edexcel A-Level Psychology (9PS0) specification — Pearson Edexcel (2015)