How do you use the chi-squared test for goodness of fit and for independence in a contingency table?

The chi-squared statistic, goodness of fit tests for given distributions, contingency tables and tests for independence, degrees of freedom, and Yates' correction for a two by two table.

A focused answer to the AQA A-Level Further Mathematics chi-squared tests content, covering the chi-squared statistic, goodness of fit tests for given distributions, contingency tables and tests for independence, degrees of freedom, and Yates' correction for a two by two table.

Generated by Claude Opus 4.8 · 11 min answer · Updated 2026-06-02

Reviewed by: AI editorial process; not yet individually human-reviewed

What this dot point is asking

AQA wants you to calculate the chi-squared statistic from observed and expected frequencies, carry out goodness of fit tests for given distributions such as the uniform, binomial or Poisson, test for independence in a contingency table, find the correct degrees of freedom, and apply Yates' correction for a two by two table.

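The chi-squared statistic

For each cell, compare the observed frequency $O$ with the expected frequency $E$ predicted by the null hypothesis, and add up the scaled squared differences: $\chi^2 = \sum \frac{(O - E)^2}{E}$. Dividing by $E$ puts the cells on a comparable scale, so a given gap counts for more in a cell where little was expected. Small values mean the data sit close to the model; large values are evidence against $H_0$. Under $H_0$ the statistic approximately follows a chi-squared distribution whose degrees of freedom depend on the type of test, and $H_0$ is rejected when $\chi^2$ exceeds the critical value at the chosen significance level.
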
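A minimal Python sketch of the statistic, assuming the observed and expected frequencies are supplied as equal-length lists in the same cell order. The function name `chi_squared_statistic` is hypothetical. Applied to the die counts from the worked example below, it reproduces $\chi^2 = 1.0$.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all cells.

    observed and expected must list the cells in the same order.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Die example: 60 rolls, so a fair die gives E = 60 / 6 = 10 for every face.
observed = [8, 12, 9, 11, 10, 10]
expected = [sum(observed) / 6] * 6
print(round(chi_squared_statistic(observed, expected), 4))  # 1.0
```

Rounding the printed value only hides floating-point noise; the comparison with a critical value works on the unrounded statistic.
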
Goodness of fit

A goodness of fit test asks whether data are consistent with a stated distribution, such as a uniform, binomial or Poisson model. The expected frequencies come from that model: multiply each model probability by the sample size. Any cell whose expected frequency falls below $5$ must be pooled with a neighbour before testing, because the chi-squared approximation is unreliable for small expected counts. The degrees of freedom are the number of cells used (after pooling) minus one, minus a further one for each parameter you estimated from the data. So a fully specified model loses just one degree of freedom, a Poisson model with $\lambda$ estimated from the sample loses two, and a binomial model with $p$ estimated loses two.
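
The pooling and degrees-of-freedom rules can be sketched in Python. The greedy strategy below (keep merging cells into a running group until its expected total reaches $5$, and fold any small group left at the end into the group before it) is an assumption made for illustration; in an exam you choose sensible adjacent groupings by hand. The names `pool_small_cells` and `degrees_of_freedom` are hypothetical.

```python
def pool_small_cells(observed, expected, minimum=5.0):
    """Merge adjacent cells until every expected frequency is at least `minimum`.

    Greedy left-to-right pass: keep adding cells to a running group until its
    expected total reaches `minimum`; any short group left over at the end is
    folded into the previous group.
    """
    pooled_o, pooled_e = [], []
    run_o, run_e = 0, 0.0
    for o, e in zip(observed, expected):
        run_o += o
        run_e += e
        if run_e >= minimum:
            pooled_o.append(run_o)
            pooled_e.append(run_e)
            run_o, run_e = 0, 0.0
    if run_e > 0 or run_o > 0:
        if pooled_e:
            # Leftover tail is too small on its own: merge it into the last group.
            pooled_o[-1] += run_o
            pooled_e[-1] += run_e
        else:
            pooled_o.append(run_o)
            pooled_e.append(run_e)
    return pooled_o, pooled_e


def degrees_of_freedom(num_cells_after_pooling, num_estimated_parameters=0):
    """Cells used, minus 1 for the total constraint, minus 1 per estimated parameter."""
    return num_cells_after_pooling - 1 - num_estimated_parameters
```

For a Poisson model with $\lambda$ estimated from the sample, the degrees of freedom would be `degrees_of_freedom(len(pooled_expected), 1)`, where `pooled_expected` is the second list returned by `pool_small_cells`.
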

Goodness of fit for a fair die

A die is rolled $60$ times, giving counts $8, 12, 9, 11, 10, 10$. Test at the $5\%$ level whether the die is fair.

State the hypotheses and expected frequencies

$H_0$: the die is fair, so each face is equally likely. $H_1$: the die is not fair. Under fairness every expected frequency is $\frac{60}{6} = 10$, which exceeds $5$, so no pooling is needed.

Compute the chi-squared statistic

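Since every expected frequency is $10$, the terms share a common denominator: $\chi^2 = \sum \frac{(O-E)^2}{E} = \frac{(8-10)^2 + (12-10)^2 + (9-10)^2 + (11-10)^2 + (10-10)^2 + (10-10)^2}{10} = \frac{4 + 4 + 1 + 1 + 0 + 0}{10} = 1.0$.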

Find the degrees of freedom and critical value

There are $6$ cells and no estimated parameters, so the degrees of freedom are $6 - 1 = 5$. The critical value at the $5\%$ level is $11.07$.
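
Tables give $11.07$ directly, but as a check the critical value can be computed from the chi-squared cumulative distribution function. The sketch below assumes only the Python standard library: it evaluates the regularised lower incomplete gamma function by its power series and bisects for the upper $5\%$ point. The function names are hypothetical; in practice a statistics library (for example SciPy's `scipy.stats.chi2.ppf`) would normally be used instead.

```python
import math


def chi2_cdf(x, df, terms=500):
    """P(X <= x) for X ~ chi-squared(df), via the regularised lower incomplete gamma series.

    P(a, z) = z^a e^(-z) / Gamma(a) * sum_{n>=0} z^n / (a (a+1) ... (a+n)),
    with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the series
    total = term
    for n in range(1, terms):
        term *= z / (a + n)  # build z^n / (a (a+1) ... (a+n)) incrementally
        total += term
        if term < total * 1e-15:
            break
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_critical_value(df, significance=0.05):
    """Upper-tail critical value c with P(X > c) = significance, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1 - significance:
        hi *= 2  # widen the bracket until it contains the critical value
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - significance:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(chi2_critical_value(5), 2))  # 11.07
```

The same `chi2_cdf` also gives the p-value $1 - \text{chi2\_cdf}(1.0, 5)$, which is far above $0.05$, matching the conclusion below.
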

Conclude in context

Since $1.0 < 11.07$, do not reject $H_0$: the data are consistent with the die being fair.

Contingency tables and degrees of freedom

A chi-squared test on a contingency table asks whether two classifying factors (such as sex and favourite sport) are independent. Under independence the expected count in a cell is the product of its row and column totals divided by the grand total, which is the count you would expect in that cell if the two factors were unrelated. You then sum the usual $\frac{(O - E)^2}{E}$ terms over every cell and compare the total with the critical value at $(\text{rows} - 1)(\text{columns} - 1)$ degrees of freedom. A large statistic means the observed pattern differs too much from what independence predicts, so independence is rejected.
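
The expected-count rule and the degrees of freedom can be sketched in Python, assuming the table is given as a list of rows of observed counts. The function names are hypothetical. Applied to the favourite-sport table in the practice question below, it gives a statistic of about $7.467$ on $2$ degrees of freedom.

```python
def expected_counts(table):
    """Expected frequencies under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def contingency_chi_squared(table):
    """Return (chi-squared statistic, degrees of freedom) for a test of independence."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1)(columns - 1)
    return stat, dof


# Favourite sport by sex: rows are male, female; columns football, tennis, swimming.
stat, dof = contingency_chi_squared([[48, 22, 30], [32, 38, 30]])
print(round(stat, 3), dof)  # 7.467 2
```
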

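Yates' correction is needed only for the $2 \times 2$ case, where each factor has just two categories and the single degree of freedom makes the chi-squared approximation rougher. Subtracting $0.5$ from each absolute difference before squaring, so that the statistic becomes $\sum \frac{(|O - E| - 0.5)^2}{E}$, compensates for treating discrete counts with a continuous distribution. Because every cell of a $2 \times 2$ table has the same $|O - E|$, the correction lowers the statistic whenever that common difference exceeds $0.25$, which covers every case that matters in practice, so the test becomes slightly more conservative.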
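
A minimal sketch of the corrected statistic for a $2 \times 2$ table written as a list of two rows. The function name is hypothetical, and since this section gives no $2 \times 2$ data, no worked output is shown. For comparison, SciPy's `scipy.stats.chi2_contingency` offers a continuity correction of this kind through its `correction` argument.

```python
def yates_chi_squared(table):
    """Yates-corrected chi-squared for a 2 x 2 table: sum of (|O - E| - 0.5)^2 / E."""
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("Yates' correction applies to 2 x 2 tables only")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            total += (abs(table[i][j] - e) - 0.5) ** 2 / e
    return total
```

The result is compared with the critical value at $1$ degree of freedom, exactly as for the uncorrected statistic.
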

Exam-style practice questions

Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AQA 2019 · 8 marks

A researcher records the favourite sport (football, tennis or swimming) of $200$ students, split by sex. The observed counts are: male football $48$, tennis $22$, swimming $30$ (total $100$); female football $32$, tennis $38$, swimming $30$ (total $100$). Test at the $5\%$ significance level whether favourite sport is independent of sex. (The critical value of $\chi^2$ with $2$ degrees of freedom at $5\%$ is $5.991$.)

State the hypotheses. $H_0$: favourite sport is independent of sex. $H_1$: favourite sport is not independent of sex.

Expected frequency in a cell $= \frac{\text{row total} \times \text{column total}}{\text{grand total}}$. Column totals are football $80$, tennis $60$, swimming $60$; each row total is $100$, grand total $200$.

Expected counts: football $\frac{100 \times 80}{200} = 40$ for each sex; tennis $\frac{100 \times 60}{200} = 30$; swimming $30$.

Compute $\chi^2 = \sum \frac{(O - E)^2}{E}$. Male: $\frac{(48-40)^2}{40} + \frac{(22-30)^2}{30} + \frac{(30-30)^2}{30} = \frac{64}{40} + \frac{64}{30} + 0 = 1.6 + 2.133 = 3.733$. Female: $\frac{(32-40)^2}{40} + \frac{(38-30)^2}{30} + \frac{(30-30)^2}{30} = 1.6 + 2.133 + 0 = 3.733$.

Total $\chi^2 = 3.733 + 3.733 = 7.467$. Degrees of freedom $= (2-1)(3-1) = 2$, critical value $5.991$.

Since $7.467 > 5.991$, reject $H_0$: there is evidence that favourite sport depends on sex.
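
With $2$ degrees of freedom the chi-squared cumulative distribution function has the closed form $1 - e^{-x/2}$, so the quoted critical value can be checked exactly: $e^{-c/2} = 0.05$ gives $c = -2 \ln 0.05 \approx 5.991$. A small Python sketch of that check, with hypothetical function names, also gives the p-value of the statistic above.

```python
import math


# With 2 degrees of freedom the chi-squared CDF is 1 - exp(-x / 2), so the
# upper-tail probability P(X > x) is exp(-x / 2) and can be inverted exactly.
def critical_value_2df(significance):
    """Solve exp(-c / 2) = significance for c."""
    return -2 * math.log(significance)


def p_value_2df(statistic):
    """Upper-tail probability P(X > statistic) for X ~ chi-squared(2)."""
    return math.exp(-statistic / 2)


statistic = 64 / 40 + 64 / 30 + 64 / 40 + 64 / 30  # the four non-zero terms of the worked answer
print(round(critical_value_2df(0.05), 3))  # 5.991
print(round(p_value_2df(statistic), 3))    # 0.024
```

A p-value of about $0.024$, below $0.05$, leads to the same rejection of $H_0$.
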

Markers reward the hypotheses, the expected frequencies, the chi-squared total, the degrees of freedom, and the comparison with conclusion in context.

AQA 2021 · 5 marks

Explain why expected frequencies below $5$ must be pooled before a chi-squared goodness of fit test, and describe how the degrees of freedom are determined when the parameter of a Poisson model has been estimated from the data.

The chi-squared statistic only approximately follows the chi-squared distribution, and the approximation breaks down when expected frequencies are small. A cell with $E < 5$ inflates the term $\frac{(O - E)^2}{E}$ and distorts the test, so adjacent cells are combined until every pooled expected frequency is at least $5$.

Pooling reduces the number of cells, which lowers the degrees of freedom, since degrees of freedom start from the number of cells used in the final calculation.

Degrees of freedom $=$ (number of cells after pooling) $- 1 -$ (number of parameters estimated from the data). The subtraction of $1$ is for the constraint that expected and observed totals match; each estimated parameter (here the Poisson mean) removes one further degree of freedom.

Markers reward the small-expected-frequency reasoning, the pooling rule of $5$, and the full degrees-of-freedom formula including the parameter subtraction.
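
Fitting the Poisson model itself can be sketched as follows, assuming the data come as a frequency table of the values $0, 1, 2, \ldots$. The estimate of $\lambda$ is the sample mean, and the last cell is taken as "`max_value` or more" so that the expected frequencies sum to the sample size; both function names and that choice of final cell are illustrative assumptions. After computing the expected frequencies, pool any below $5$ and use (cells after pooling) $- 2$ degrees of freedom, since $\lambda$ was estimated.

```python
import math


def sample_mean(values, frequencies):
    """Estimate the Poisson mean from a frequency table: sum(x * f) / sum(f)."""
    return sum(x * f for x, f in zip(values, frequencies)) / sum(frequencies)


def poisson_expected_frequencies(n, lam, max_value):
    """Expected counts n * P(X = k) for k = 0..max_value - 1, then n * P(X >= max_value).

    The final "max_value or more" cell makes the expected counts sum to n.
    """
    probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(max_value)]
    probs.append(1 - sum(probs))  # tail cell: P(X >= max_value)
    return [n * p for p in probs]
```
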

Sources & how we know this

AQA A-level Further Mathematics (7367) specification — AQA (2017)