How do you use the chi-squared test for goodness of fit and for independence in a contingency table?
The chi-squared statistic, goodness of fit tests for given distributions, contingency tables and tests for independence, degrees of freedom, and Yates' correction for a two by two table.
A focused answer to the AQA A-Level Further Mathematics chi-squared tests content, covering the chi-squared statistic, goodness of fit tests for given distributions, contingency tables and tests for independence, degrees of freedom, and Yates' correction for a two by two table.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
AQA wants you to calculate the chi-squared statistic from observed and expected frequencies, carry out goodness of fit tests for given distributions such as the uniform, binomial or Poisson, test for independence in a contingency table, find the correct degrees of freedom, and apply Yates' correction for a two by two table.
The chi-squared statistic
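The statistic compares each observed frequency $O$ with its expected frequency $E$ by summing $(O - E)^2 / E$ over all cells. As a minimal sketch of that calculation in Python (the function name `chi_squared_statistic` and the die-roll counts are illustrative assumptions, not taken from the specification):

```python
def chi_squared_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, tested against a uniform model,
# so every face has expected frequency 60 / 6 = 10.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

statistic = chi_squared_statistic(observed, expected)
print(round(statistic, 2))  # 4.2
```

The larger the statistic, the worse the agreement between the data and the model.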
Goodness of fit
A goodness of fit test asks whether data are consistent with a stated distribution, such as a uniform, binomial or Poisson model. The expected frequencies come from that model: multiply each model probability by the sample size. Any cell whose expected frequency falls below $5$ must be pooled with a neighbour before testing, because the chi-squared approximation is unreliable for small expected counts. The degrees of freedom are the number of cells used (after pooling) minus one, minus a further one for each parameter you estimated from the data. So a fully specified model loses just one degree of freedom, a Poisson model with $\lambda$ estimated from the sample loses two, and a binomial model with $p$ estimated also loses two.
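The pooling rule and the degrees-of-freedom count can be sketched in Python as follows. This is one simple left-to-right pooling strategy under stated assumptions, not a prescribed method: the helper names `pool_small_cells` and `degrees_of_freedom` are hypothetical, and the frequencies are made-up values resembling a Poisson model whose mean was estimated from the sample.

```python
def pool_small_cells(observed, expected, minimum=5.0):
    """Merge adjacent cells until every expected frequency is >= minimum."""
    pooled_obs, pooled_exp = [], []
    run_obs, run_exp = 0, 0.0
    for o, e in zip(observed, expected):
        run_obs += o
        run_exp += e
        if run_exp >= minimum:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
            run_obs, run_exp = 0, 0.0
    # A leftover run at the end is too small to stand alone:
    # fold it into the last completed cell (or keep it if there is none).
    if run_exp > 0:
        if pooled_exp:
            pooled_obs[-1] += run_obs
            pooled_exp[-1] += run_exp
        else:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
    return pooled_obs, pooled_exp


def degrees_of_freedom(cells, estimated_parameters=0):
    """Cells after pooling, minus one, minus one per estimated parameter."""
    return cells - 1 - estimated_parameters


# Hypothetical frequencies for x = 0, 1, 2, 3, 4, 5 and "6 or more".
observed = [12, 30, 25, 17, 10, 4, 2]
expected = [13.5, 27.1, 27.1, 18.0, 9.0, 3.6, 1.7]

pooled_obs, pooled_exp = pool_small_cells(observed, expected)
df = degrees_of_freedom(len(pooled_exp), estimated_parameters=1)
print(len(pooled_exp), df)  # 6 4
```

Here the last two cells (expected 3.6 and 1.7) are combined, leaving six cells, and one estimated parameter brings the degrees of freedom down to four.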
Contingency tables and degrees of freedom
A contingency table tests whether two classifying factors (such as sex and favourite sport) are independent. Under independence the expected count in a cell is the product of its row and column totals divided by the grand total, which is just the size the cell would be if the two factors had no relationship. You then sum the usual $(O - E)^2 / E$ terms over every cell and compare with the critical value at $(r - 1)(c - 1)$ degrees of freedom, where $r$ and $c$ are the numbers of rows and columns. A large statistic means the observed pattern differs too much from what independence predicts, so independence is rejected.
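The expected counts, the statistic and the degrees of freedom for a contingency table can be sketched as below. The table is hypothetical data chosen for easy arithmetic, and the function names are illustrative assumptions.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def independence_test(table):
    """Return (chi-squared statistic, degrees of freedom) for the table."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical 2 x 3 table of observed counts (rows: groups, columns: categories).
table = [[30, 10, 20],
         [20, 20, 20]]

expected = expected_counts(table)
statistic, df = independence_test(table)
print(expected[0])                 # [25.0, 15.0, 20.0]
print(round(statistic, 3), df)     # 5.333 2
```

With $2$ degrees of freedom the $5\%$ critical value is $5.991$, so for this made-up table the statistic of about $5.333$ would not lead to rejecting independence.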
Yates' correction is needed only for the $2 \times 2$ case, where each factor has just two categories and the single degree of freedom makes the chi-squared approximation rougher. Subtracting $0.5$ from each absolute difference $|O - E|$ before squaring compensates for treating discrete counts with a continuous distribution. In practice this reduces the statistic, making the test slightly more conservative.
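A short sketch of the corrected statistic, $\sum (|O - E| - 0.5)^2 / E$, compared with the uncorrected one. The $2 \times 2$ counts are hypothetical and the function name `chi_squared_2x2` is an illustrative assumption.

```python
def chi_squared_2x2(table, yates=True):
    """Chi-squared statistic for a 2 x 2 table, optionally with Yates' correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    correction = 0.5 if yates else 0.0
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            # Shrink each absolute difference by 0.5 before squaring.
            statistic += (abs(observed - expected) - correction) ** 2 / expected
    return statistic


# Hypothetical 2 x 2 table of observed counts.
table = [[12, 8],
         [6, 14]]

uncorrected = chi_squared_2x2(table, yates=False)
corrected = chi_squared_2x2(table, yates=True)
print(round(uncorrected, 3), round(corrected, 3))  # 3.636 2.525
```

Both values are compared with the critical value at $1$ degree of freedom. The corrected statistic is the smaller of the two here.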
Exam-style practice questions
Practice questions written in the style of AQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AQA 2019, 8 marks
A researcher records the favourite sport (football, tennis or swimming) of students, split by sex. The observed counts are: male football , tennis , swimming (total ); female football , tennis , swimming (total ). Test at the significance level whether favourite sport is independent of sex. (The critical value of chi-squared with degrees of freedom at is .)
State the hypotheses. $H_0$: favourite sport is independent of sex. $H_1$: favourite sport is not independent of sex.
Expected frequency in a cell $= \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$. Column totals are football , tennis , swimming ; each row total is , grand total .
Expected counts: football for each sex; tennis ; swimming .
Compute $\sum \dfrac{(O - E)^2}{E}$. Male: . Female: .
Total . Degrees of freedom $(2 - 1)(3 - 1) = 2$, critical value .
Since the test statistic exceeds the critical value, reject $H_0$: there is evidence that favourite sport depends on sex.
Markers reward the hypotheses, the expected frequencies, the chi-squared total, the degrees of freedom, and the comparison with conclusion in context.
AQA 2021, 5 marks
Explain why expected frequencies below $5$ must be pooled before a chi-squared goodness of fit test, and describe how the degrees of freedom are determined when the parameter of a Poisson model has been estimated from the data.
The chi-squared statistic only approximately follows the chi-squared distribution, and the approximation breaks down when expected frequencies are small. A cell with $E < 5$ inflates the $(O - E)^2 / E$ term and distorts the test, so adjacent cells are combined until every pooled expected frequency is at least $5$.
Pooling reduces the number of cells, which lowers the degrees of freedom, since degrees of freedom start from the number of cells used in the final calculation.
Degrees of freedom $=$ (number of cells after pooling) $- 1 -$ (number of parameters estimated from the data). The subtraction of $1$ is for the constraint that expected and observed totals match; each estimated parameter (here the Poisson mean) removes one further degree of freedom.
Markers reward the small-expected-frequency reasoning, the pooling rule of $5$, and the full degrees-of-freedom formula including the parameter subtraction.
Sources & how we know this
- AQA A-level Further Mathematics (7367) specification, AQA (2017)