
How do you use the chi-squared test for goodness of fit and for independence in a contingency table?

The chi-squared statistic, goodness of fit tests for given distributions, contingency tables and tests for independence, degrees of freedom, and Yates' correction for a two by two table.

A focused answer to the AQA A-Level Further Mathematics chi-squared tests content, covering the chi-squared statistic, goodness of fit tests for given distributions, contingency tables and tests for independence, degrees of freedom, and Yates' correction for a two by two table.

Generated by Claude Opus 4.811 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed


Jump to a section
  1. What this dot point is asking
  2. The chi-squared statistic
  3. Goodness of fit
  4. Contingency tables and degrees of freedom

What this dot point is asking

AQA wants you to calculate the chi-squared statistic from observed and expected frequencies, carry out goodness of fit tests for given distributions such as the uniform, binomial or Poisson, test for independence in a contingency table, find the correct degrees of freedom, and apply Yates' correction for a two by two table.

The chi-squared statistic
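
For reference, the statistic used throughout this answer can be written as below, with $O$ for an observed frequency and $E$ for the matching expected frequency, as in the later sections. The cell index $i$ is a labelling assumption added here for illustration.

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
```

Each term is the squared gap between observed and expected, scaled by the expected count, so a larger total signals poorer agreement with the model being tested.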

Goodness of fit

A goodness of fit test asks whether data are consistent with a stated distribution, such as a uniform, binomial or Poisson model. The expected frequencies come from that model: multiply each model probability by the sample size. Any cell whose expected frequency falls below 5 must be pooled with a neighbour before testing, because the chi-squared approximation is unreliable for small expected counts. The degrees of freedom are the number of cells used (after pooling) minus one, minus a further one for each parameter you estimated from the data. So a fully specified model loses just one degree of freedom, a Poisson model with $\lambda$ estimated from the sample loses two, and a binomial model with $p$ estimated loses two.
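
The steps in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, not an AQA-prescribed method: the function names, the left-to-right pooling order and the sample counts are all assumptions made for the example, and in an exam you would choose which neighbours to pool by inspection.

```python
def pool_small_cells(observed, expected, minimum=5.0):
    """Merge adjacent cells, left to right, until each pooled expected count >= minimum."""
    pooled_obs, pooled_exp = [], []
    run_obs, run_exp = 0.0, 0.0
    for o, e in zip(observed, expected):
        run_obs += o
        run_exp += e
        if run_exp >= minimum:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
            run_obs, run_exp = 0.0, 0.0
    # A leftover tail that is still too small joins the last completed cell.
    if run_exp > 0:
        if pooled_exp:
            pooled_obs[-1] += run_obs
            pooled_exp[-1] += run_exp
        else:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
    return pooled_obs, pooled_exp


def goodness_of_fit(observed, probabilities, estimated_parameters=0):
    """Return (chi-squared statistic, degrees of freedom) after pooling."""
    n = sum(observed)
    expected = [p * n for p in probabilities]  # model probability x sample size
    obs, exp = pool_small_cells(observed, expected)
    statistic = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    dof = len(obs) - 1 - estimated_parameters
    return statistic, dof


# Hypothetical data: 60 observations against a fully specified five-cell model.
observed = [33, 15, 7, 4, 1]
probabilities = [0.5, 0.3, 0.1, 0.05, 0.05]
statistic, dof = goodness_of_fit(observed, probabilities)
print(round(statistic, 3), dof)
```

Here the last two cells each have an expected frequency of 3, so they are pooled into one cell with expected frequency 6, leaving four cells and 3 degrees of freedom. Passing `estimated_parameters=1` would model the case where one parameter was estimated from the sample, reducing the degrees of freedom to 2.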

Contingency tables and degrees of freedom

A contingency table tests whether two classifying factors (such as sex and favourite sport) are independent. Under independence the expected count in a cell is the product of its row and column totals divided by the grand total, which is just the size the cell would be if the two factors had no relationship. You then sum the usual $\frac{(O - E)^2}{E}$ terms over every cell and compare with the critical value at $(\text{rows} - 1)(\text{columns} - 1)$ degrees of freedom. A large statistic means the observed pattern differs too much from what independence predicts, so independence is rejected.
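
In symbols, the procedure above reads as follows. The labels $R_i$ (total of row $i$), $C_j$ (total of column $j$), $N$ (grand total) and $\nu$ (degrees of freedom) are names introduced here for illustration rather than notation taken from the specification.

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\nu = (\text{rows} - 1)(\text{columns} - 1)
```

The product form for $E_{ij}$ follows from independence: the estimated probability of falling in row $i$ and column $j$ is $\frac{R_i}{N} \times \frac{C_j}{N}$, and multiplying by $N$ gives the expected count.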

Yates' correction is needed only for the $2 \times 2$ case, where each factor has just two categories and the single degree of freedom makes the chi-squared approximation rougher. Subtracting $0.5$ from each absolute difference $|O - E|$ before squaring compensates for treating discrete counts with a continuous distribution. Provided the differences exceed $0.25$ in size, as they do in any table where the test matters, this reduces the statistic, making the test slightly more conservative.
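
A short Python sketch shows the effect of the correction on a single table. The function name and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def chi_squared_2x2(table, yates=True):
    """Chi-squared statistic for a 2x2 table, optionally with Yates' correction."""
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    grand_total = a + b + c + d
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            difference = abs(observed - expected)
            if yates:
                difference -= 0.5  # Yates: shrink |O - E| by 0.5 before squaring
            statistic += difference ** 2 / expected
    return statistic


table = [[12, 8], [6, 14]]  # hypothetical counts
print(round(chi_squared_2x2(table, yates=False), 3))  # 3.636
print(round(chi_squared_2x2(table, yates=True), 3))   # 2.525
```

In this table every cell has $|O - E| = 3$, so the correction replaces each squared difference of 9 with $2.5^2 = 6.25$ and the statistic falls from about 3.636 to about 2.525.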

Exam-style practice questions

Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AQA 2019, 8 marks

A researcher records the favourite sport (football, tennis or swimming) of 200 students, split by sex. The observed counts are: male football 48, tennis 22, swimming 30 (total 100); female football 32, tennis 38, swimming 30 (total 100). Test at the 5% significance level whether favourite sport is independent of sex. (The critical value of chi-squared with 2 degrees of freedom at 5% is 5.991.)

Worked answer

State the hypotheses. $H_0$: favourite sport is independent of sex. $H_1$: favourite sport is not independent of sex.

Expected frequency in a cell $= \frac{\text{row total} \times \text{column total}}{\text{grand total}}$. Column totals are football 80, tennis 60, swimming 60; each row total is 100, grand total 200.

Expected counts: football $\frac{100 \times 80}{200} = 40$ for each sex; tennis $\frac{100 \times 60}{200} = 30$; swimming 30.

Compute $\chi^2 = \sum \frac{(O - E)^2}{E}$. Male: $\frac{(48-40)^2}{40} + \frac{(22-30)^2}{30} + \frac{(30-30)^2}{30} = \frac{64}{40} + \frac{64}{30} + 0 = 1.6 + 2.133 = 3.733$. Female: $\frac{(32-40)^2}{40} + \frac{(38-30)^2}{30} + 0 = 1.6 + 2.133 = 3.733$.

Total $\chi^2 = 3.733 + 3.733 = 7.467$. Degrees of freedom $= (2-1)(3-1) = 2$, critical value 5.991.

Since $7.467 > 5.991$, reject $H_0$: there is evidence that favourite sport depends on sex.

Markers reward the hypotheses, the expected frequencies, the chi-squared total, the degrees of freedom, and the comparison with conclusion in context.
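
The arithmetic in this worked answer can be checked with a short Python sketch. The counts and the critical value are those given in the question; the variable names are illustrative only.

```python
# Observed counts from the question: rows are male, female;
# columns are football, tennis, swimming.
observed = [
    [48, 22, 30],
    [32, 38, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

statistic = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 5.991  # 5% level, 2 degrees of freedom, as quoted in the question

print(expected)
print(round(statistic, 3), dof, statistic > critical_value)
```

The sketch reproduces the expected counts of 40, 30 and 30 in each row, a statistic of about 7.467 on 2 degrees of freedom, and a statistic larger than the critical value, so $H_0$ is rejected.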

AQA 2021, 5 marks

Explain why expected frequencies below 5 must be pooled before a chi-squared goodness of fit test, and describe how the degrees of freedom are determined when the parameter of a Poisson model has been estimated from the data.

Worked answer

The chi-squared statistic only approximately follows the chi-squared distribution, and the approximation breaks down when expected frequencies are small. A cell with $E < 5$ inflates the term $\frac{(O - E)^2}{E}$ and distorts the test, so adjacent cells are combined until every pooled expected frequency is at least 5.

Pooling reduces the number of cells, which lowers the degrees of freedom, since degrees of freedom start from the number of cells used in the final calculation.

Degrees of freedom $=$ (number of cells after pooling) $- 1 -$ (number of parameters estimated from the data). The subtraction of 1 is for the constraint that expected and observed totals match; each estimated parameter (here the Poisson mean) removes one further degree of freedom.
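
As a purely illustrative case (the cell count is hypothetical, and $\nu$ is used here as a label for the degrees of freedom), suppose pooling leaves 7 cells and the Poisson mean was estimated from the sample:

```latex
\nu = 7 - 1 - 1 = 5
```

Had the mean been given in the question rather than estimated, the same 7 cells would give $\nu = 7 - 1 = 6$.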

Markers reward the small-expected-frequency reasoning, the pooling rule of 5, and the full degrees-of-freedom formula including the parameter subtraction.
