How do you test whether categorical data fit a model or whether two categorical variables are associated?
Carry out the chi-squared goodness-of-fit test and the chi-squared test for association in a contingency table, computing expected frequencies, the chi-squared statistic and degrees of freedom, and interpreting the result against the assumptions.
A focused answer to the SQA Advanced Higher Statistics chi-squared content: the goodness-of-fit test and the test for association in a contingency table, computing expected frequencies, the chi-squared statistic and degrees of freedom, the minimum expected frequency rule, and interpreting the outcome.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
When the data are counts in categories rather than measurements, the chi-squared family is the tool. The SQA wants you to run two tests: a goodness-of-fit test (do observed category counts match a proposed model?) and a test for association in a contingency table (are two categorical variables independent?). For each you compute expected frequencies, the chi-squared statistic and the degrees of freedom, then judge against the assumptions.
The chi-squared statistic
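Both tests share the same statistic, $\chi^2 = \sum \dfrac{(O - E)^2}{E}$, where $O$ is an observed count, $E$ is the matching expected count and the sum runs over every category or cell. It measures the total relative discrepancy between observed and expected counts.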
Because every term is squared and divided by the expected count, a cell where the observed count is far from expected contributes a lot, while a close match contributes almost nothing.
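The way individual cells contribute can be seen in a minimal sketch. The function name `chi_squared_contributions` and the counts below are invented for illustration and are not taken from any SQA paper.

```python
def chi_squared_contributions(observed, expected):
    """Return the per-category terms (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


# Hypothetical counts, invented for illustration only.
observed = [30, 21, 29]
expected = [20.0, 20.0, 40.0]

terms = chi_squared_contributions(observed, expected)
statistic = sum(terms)  # the chi-squared statistic is the sum of the terms

for o, e, t in zip(observed, expected, terms):
    print(f"O = {o:3d}  E = {e:5.1f}  contribution = {t:.3f}")
print(f"chi-squared = {statistic:.3f}")
```

In this made-up example the first category is 10 away from its expected count and contributes 5, while the second is only 1 away and contributes 0.05.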
The goodness-of-fit test
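This tests whether observed counts across categories are consistent with a proposed distribution. Each expected frequency is the total count multiplied by the probability the model gives to that category. With $k$ categories and a fully specified model, the degrees of freedom are $k - 1$; each parameter estimated from the data removes one more.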
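A goodness-of-fit calculation might be sketched as follows. The helper `goodness_of_fit`, its `estimated_parameters` argument and the data are all hypothetical. The critical value is hard-coded from standard chi-squared tables rather than computed.

```python
def goodness_of_fit(observed, proportions, estimated_parameters=0):
    """Return (expected, statistic, degrees_of_freedom) for a goodness-of-fit test."""
    total = sum(observed)
    # Expected frequency = total count x model probability for that category.
    expected = [total * p for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - estimated_parameters
    return expected, statistic, df


# Hypothetical data: 200 items, model says four equally likely categories.
observed = [55, 45, 60, 40]
proportions = [0.25, 0.25, 0.25, 0.25]

expected, statistic, df = goodness_of_fit(observed, proportions)

# Tabulated upper 5% point of chi-squared on 3 degrees of freedom.
CRITICAL_5_PERCENT_DF3 = 7.815
reject_h0 = statistic > CRITICAL_5_PERCENT_DF3

print(expected, statistic, df, reject_h0)
```

With these invented counts the statistic is 5.0 on 3 degrees of freedom, which is below 7.815, so the model would not be rejected at the 5% level.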
The contingency table test for association
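This tests whether two categorical variables are independent. Under $H_0$ (independence), the expected frequency of each cell is $\dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$, and a table with $r$ rows and $c$ columns has $(r - 1)(c - 1)$ degrees of freedom.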
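The contingency table calculation can be sketched in the same way. The function names and the 2 by 3 table are hypothetical, and the critical value is again taken from standard tables.

```python
def expected_frequencies(table):
    """Expected counts under independence: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def association_test(table):
    """Return (expected, statistic, degrees_of_freedom) for a test of association."""
    expected = expected_frequencies(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, statistic, df


# Hypothetical 2 x 3 table of observed counts, invented for illustration.
table = [
    [20, 30, 50],
    [30, 30, 40],
]

expected, statistic, df = association_test(table)

# Tabulated upper 5% point of chi-squared on 2 degrees of freedom.
CRITICAL_5_PERCENT_DF2 = 5.991
reject_h0 = statistic > CRITICAL_5_PERCENT_DF2

print(expected, round(statistic, 3), df, reject_h0)
```

For this made-up table the statistic is about 3.111 on 2 degrees of freedom, below 5.991, so there would be insufficient evidence of association at the 5% level.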
The expected-frequency assumption
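The chi-squared distribution is only an approximation to the true distribution of the statistic, and it relies on the expected counts not being too small. A commonly used rule of thumb is that every expected frequency should be at least $5$. When one is smaller, neighbouring categories (or rows or columns) are combined before the statistic is computed, and the degrees of freedom are reduced to match.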
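One way to check the assumption is sketched below, assuming the widely quoted threshold of 5 (an assumption here, not a figure stated in the text above). The helpers `small_expected_cells` and `pool_tail` and the data are hypothetical. Pooling from the tail is only one possible way to combine categories.

```python
MIN_EXPECTED = 5  # assumed rule-of-thumb threshold


def small_expected_cells(expected, minimum=MIN_EXPECTED):
    """Return the indices of categories whose expected count is below the minimum."""
    return [i for i, e in enumerate(expected) if e < minimum]


def pool_tail(observed, expected, minimum=MIN_EXPECTED):
    """Merge the last category into its neighbour until the last expected count is large enough."""
    observed, expected = list(observed), list(expected)
    while len(expected) > 1 and expected[-1] < minimum:
        last_observed = observed.pop()
        last_expected = expected.pop()
        observed[-1] += last_observed
        expected[-1] += last_expected
    return observed, expected


# Hypothetical counts, invented for illustration only.
observed = [28, 37, 22, 9, 3, 1]
expected = [30.0, 35.0, 20.0, 10.0, 4.0, 1.0]

too_small = small_expected_cells(expected)
pooled_observed, pooled_expected = pool_tail(observed, expected)

print(too_small)
print(pooled_observed, pooled_expected)
```

Here the last two categories fall below the threshold. Merging them leaves five categories, so a fully specified model would be tested on 4 degrees of freedom rather than 5.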
Try this
Q1. A goodness-of-fit test compares observed counts across $k$ categories against a fully specified model. State the degrees of freedom. [1 mark]
- Cue. No parameters are estimated, so degrees of freedom $= k - 1$.
Q2. In a contingency table, a cell has row total $R$, column total $C$ and grand total $N$. Find its expected frequency. [1 mark]
- Cue. $E = \dfrac{R \times C}{N}$.
Exam-style practice questions
Practice questions written in the style of SQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AH style: goodness of fit (4 marks)
A die is rolled times with frequencies for faces to . Test at the level whether the die is fair. (Use .)
Hypotheses: $H_0$: the die is fair (each face equally likely) against $H_1$: it is not (1 mark).
Each expected frequency is the total number of rolls divided by $6$. Compute $\chi^2 = \sum \dfrac{(O - E)^2}{E}$ over the six faces (2 marks).
Degrees of freedom $= 6 - 1 = 5$; since the calculated $\chi^2$ is below the critical value, do not reject $H_0$: there is insufficient evidence at the chosen significance level that the die is unfair (1 mark). Markers reward the hypotheses, the expected frequencies, the chi-squared statistic and the conclusion with degrees of freedom.
AH style: contingency (3 marks)
A contingency table is tested for association. State how to find an expected frequency and the degrees of freedom for the test.
The expected frequency for a cell is $\dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$, computed under $H_0$ that the two variables are independent (1 mark).
Degrees of freedom for an $r \times c$ table are $(r - 1)(c - 1)$ (1 mark).
The statistic $\chi^2$ is then compared with the critical value on $(r - 1)(c - 1)$ degrees of freedom; a value above the critical value gives evidence of association (1 mark). Markers reward the expected-frequency formula, the degrees of freedom and the test structure.
Sources & how we know this
- SQA Advanced Higher Statistics Course Specification (C803 77) — SQA (2023)
- SQA Advanced Higher Statistics Data Booklet — SQA (2019)