How do you test whether categorical data fit a model or whether two categorical variables are associated?

Carry out the chi-squared goodness-of-fit test and the chi-squared test for association in a contingency table, computing expected frequencies, the chi-squared statistic and degrees of freedom, and interpreting the result against the assumptions.

A focused answer to the SQA Advanced Higher Statistics chi-squared content: the goodness-of-fit test and the test for association in a contingency table, computing expected frequencies, the chi-squared statistic and degrees of freedom, the minimum expected frequency rule, and interpreting the outcome.

Generated by Claude Opus 4.8
13 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this dot point is asking
  2. The chi-squared statistic
  3. The goodness-of-fit test
  4. The contingency table test for association
  5. The expected-frequency assumption
  6. Try this

What this dot point is asking

When the data are counts in categories rather than measurements, the chi-squared family is the tool. The SQA wants you to run two tests: a goodness-of-fit test (do observed category counts match a proposed model?) and a test for association in a contingency table (are two categorical variables independent?). For each you compute expected frequencies, the chi-squared statistic and the degrees of freedom, then judge against the assumptions.

The chi-squared statistic

Both tests share the same statistic, which measures the total relative discrepancy between observed and expected counts.
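Written out, with $O$ for an observed count and $E$ for the corresponding expected count (the notation used in the worked answers further down this page), the statistic is:

```latex
\chi^2 = \sum \frac{(O - E)^2}{E}
```

The sum runs over every category in a goodness-of-fit test, or over every cell in a contingency table.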

Because every term is squared and divided by the expected count, a cell where the observed count is far from expected contributes a lot, while a close match contributes almost nothing.
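A minimal Python sketch of that behaviour follows. The counts and the function name `chi_squared_contributions` are made up for illustration and are not SQA data: the first and third cells sit far from their expected counts and dominate the total, while the second barely registers.

```python
def chi_squared_contributions(observed, expected):
    """Return the per-cell terms (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


# Hypothetical counts: three categories, each expected to hold 20 of 60 items.
observed = [30, 21, 9]
expected = [20, 20, 20]

contributions = chi_squared_contributions(observed, expected)
statistic = sum(contributions)  # the chi-squared statistic is the sum of the terms

for o, e, term in zip(observed, expected, contributions):
    print(f"O={o:>2}  E={e:>2}  contribution={term:.2f}")
print(f"chi-squared = {statistic:.2f}")
```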

The goodness-of-fit test

This tests whether observed counts across categories are consistent with a proposed distribution.
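The steps can be sketched in Python. This is an illustration under stated assumptions rather than a prescribed method: the function name `goodness_of_fit`, the model proportions and the counts are all hypothetical. Each expected frequency is the sample size times the model probability for that category, and the degrees of freedom are the number of categories minus 1, reduced by one more for each parameter estimated from the data (none in this example).

```python
def goodness_of_fit(observed, probabilities, estimated_parameters=0):
    """Return (expected frequencies, chi-squared statistic, degrees of freedom)."""
    if len(observed) != len(probabilities):
        raise ValueError("need one model probability per category")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("model probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probabilities]  # E = n * p for each category
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - estimated_parameters
    return expected, statistic, df


# Hypothetical fully specified model: four categories with
# probabilities 0.4, 0.3, 0.2 and 0.1, and 100 observations.
observed = [45, 28, 18, 9]
expected, statistic, df = goodness_of_fit(observed, [0.4, 0.3, 0.2, 0.1])
print([round(e, 1) for e in expected], round(statistic, 3), df)
```

The statistic is then compared with the tabulated critical value on `df` degrees of freedom; a value above the critical value is evidence against the proposed model.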

The contingency table test for association

This tests whether two categorical variables are independent.
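As a sketch of the calculation (the function name `association_test` and the 2 × 3 table of counts are made up for illustration), each expected frequency is row total × column total ÷ grand total, computed under the hypothesis that the variables are independent, and the degrees of freedom for an r × c table are (r − 1)(c − 1).

```python
def association_test(table):
    """Return (expected table, chi-squared statistic, degrees of freedom)
    for a contingency table given as a list of equal-length rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Under independence: E = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, statistic, df


# Hypothetical 2 x 3 table: row totals 60 and 120, column totals 30, 60 and 90.
table = [
    [14, 18, 28],
    [16, 42, 62],
]
expected, statistic, df = association_test(table)
print(expected)
print(round(statistic, 2), df)
```

The top-left cell here has row total 60, column total 30 and grand total 180, so its expected frequency is 60 × 30 ÷ 180 = 10.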

The expected-frequency assumption

The chi-squared distribution is only an approximation to the true distribution of the statistic, and it relies on the expected counts not being too small.
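One common way to handle small expected counts, shown here as a hedged sketch, is to combine neighbouring categories until every pooled expected frequency reaches a chosen minimum. The threshold of 5 used below is the widely quoted rule of thumb and is an assumption here; check the SQA course specification for the exact wording of the rule. The function name `pool_small_categories` and the counts are hypothetical.

```python
def pool_small_categories(observed, expected, minimum=5.0):
    """Merge adjacent categories, left to right, until each pooled
    expected frequency is at least `minimum`."""
    pooled_obs, pooled_exp = [], []
    run_obs, run_exp = 0, 0.0
    for o, e in zip(observed, expected):
        run_obs += o
        run_exp += e
        if run_exp >= minimum:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
            run_obs, run_exp = 0, 0.0
    if run_exp > 0:
        # A leftover tail that is still too small joins the previous group.
        if pooled_exp:
            pooled_obs[-1] += run_obs
            pooled_exp[-1] += run_exp
        else:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
    return pooled_obs, pooled_exp


# Hypothetical data: the last two expected counts (4.5 and 2.5) are below 5.
observed = [12, 20, 11, 5, 2]
expected = [11.0, 19.5, 12.5, 4.5, 2.5]
pooled_obs, pooled_exp = pool_small_categories(observed, expected)
print(pooled_obs, pooled_exp)
```

After pooling, the statistic and the degrees of freedom are both based on the combined categories, so this example would use four categories rather than five.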

Try this

Q1. A goodness-of-fit test compares observed counts across $5$ categories against a fully specified model. State the degrees of freedom. [1 mark]

  • Cue. No parameters are estimated, so degrees of freedom $= 5 - 1 = 4$.

Q2. In a contingency table, a cell has row total $60$, column total $30$ and grand total $180$. Find its expected frequency. [1 mark]

  • Cue. $E = \dfrac{60 \times 30}{180} = \dfrac{1800}{180} = 10$.

Exam-style practice questions

Practice questions written in the style of SQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AH style: goodness of fit (4 marks)
A die is rolled $60$ times with frequencies $8, 9, 11, 10, 13, 9$ for faces $1$ to $6$. Test at the $5\%$ level whether the die is fair. (Use $\chi^2_{0.05, 5} = 11.07$.)
Worked answer

Hypotheses: $H_0$ that the die is fair (each face equally likely) against $H_1$ that it is not (1 mark).

Each expected frequency is $\dfrac{60}{6} = 10$. Compute $\chi^2 = \sum \dfrac{(O - E)^2}{E}$: $\dfrac{4}{10} + \dfrac{1}{10} + \dfrac{1}{10} + \dfrac{0}{10} + \dfrac{9}{10} + \dfrac{1}{10} = \dfrac{16}{10} = 1.6$ (2 marks).

Degrees of freedom $= 6 - 1 = 5$; since $1.6 < 11.07$, do not reject $H_0$: there is insufficient evidence at the $5\%$ level that the die is unfair (1 mark). Markers reward the hypotheses, the expected frequencies, the chi-squared statistic and the conclusion with degrees of freedom.
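The arithmetic in this worked answer can be checked with a few lines of Python. The counts and the critical value 11.07 come from the question; the variable names are only illustrative.

```python
# Observed frequencies for faces 1 to 6, as given in the question.
observed = [8, 9, 11, 10, 13, 9]

n = sum(observed)                 # 60 rolls in total
expected = n / len(observed)      # a fair die gives the same E for every face
statistic = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1            # no parameters estimated

critical_value = 11.07            # supplied in the question for 5 degrees of freedom
reject_h0 = statistic > critical_value

print(f"E = {expected}, chi-squared = {statistic:.1f}, df = {df}, reject H0: {reject_h0}")
```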

AH style: contingency (3 marks)
A $3 \times 2$ contingency table is tested for association. State how to find an expected frequency and the degrees of freedom for the test.
Worked answer

The expected frequency for a cell is $E = \dfrac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$, computed under $H_0$ that the two variables are independent (1 mark).

Degrees of freedom for an $r \times c$ table are $(r - 1)(c - 1)$; here $(3 - 1)(2 - 1) = 2$ (1 mark).

The statistic $\chi^2 = \sum \dfrac{(O - E)^2}{E}$ is then compared with the critical value on $2$ degrees of freedom; a large value gives evidence of association (1 mark). Markers reward the expected-frequency formula, the degrees of freedom and the test structure.
