How do you measure how strongly two variables are related, and use a line of best fit to predict?
Measuring linear association with Pearson's correlation coefficient, fitting a simple linear regression line, interpreting its slope and intercept, and using it to predict while distinguishing interpolation from extrapolation.
A focused answer to the SQA Higher Applications of Mathematics content on correlation and regression, covering Pearson's r, the strength and direction of association, the least-squares regression line, interpreting slope and intercept, prediction, and correlation versus causation.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
The SQA wants you to measure the strength and direction of a linear relationship between two variables using Pearson's correlation coefficient, fit and interpret a regression line, use it to predict, and recognise that correlation does not prove causation. Software produces the numbers; your job is to interpret and apply them.
Pearson's correlation coefficient
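Pearson's correlation coefficient, written r, measures how closely the points on a scatter plot lie to a straight line. It always lies between −1 and 1: values close to 1 or −1 mean the points lie close to a line, and values close to 0 mean little or no linear association. You obtain it from software (for example the PEARSON function in a spreadsheet or cor in R); the skill examined is interpretation.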
State both the strength (strong, moderate or weak) and the direction (positive or negative) in context. Remember that r only detects a linear pattern: a strong curved relationship can still give an r near zero, so always look at a scatter plot alongside the number.
The regression line
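When the relationship is roughly linear, the least-squares regression line is the straight line that best fits the data by minimising the total squared vertical distance from the points to the line. Software gives it in the form y = a + bx. The slope b is the predicted change in y for each one-unit increase in x, and the intercept a is the predicted value of y when x = 0, which is only meaningful if x = 0 lies within or near the data.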
The regression is of y on x, predicting y from x, so identify which variable is the explanatory one (x, the input) and which is the response (y, the output you predict).
Prediction, interpolation and extrapolation
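To predict, substitute an x-value into the line. How reliable the prediction is depends on where that x-value sits. Predicting within the range of the observed x-values is interpolation and is usually reasonable; predicting outside that range is extrapolation and can be unreliable, because there is no evidence that the linear pattern continues beyond the data.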
For the heating line above, predicting cost at degrees (within typical data) is interpolation, while predicting at degrees is an extrapolation well outside any realistic Scottish data and should be treated with caution.
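A prediction routine can flag extrapolation automatically by comparing the new x-value with the range of x-values the line was fitted to. In this sketch the function name predict and the line mark = 33.87 + 5.8 × hours (fitted to made-up data with hours from 1 to 6) are hypothetical, chosen only to show the check.

```python
def predict(intercept, slope, x, observed_xs):
    """Return the prediction y = intercept + slope*x, labelled as interpolation
    (x inside the range of the observed x-values) or extrapolation (outside it)."""
    low, high = min(observed_xs), max(observed_xs)
    y = intercept + slope * x
    kind = "interpolation" if low <= x <= high else "extrapolation"
    return y, kind


# Hypothetical fitted line and the x-values it was fitted to (made-up data).
intercept, slope = 33.87, 5.8
observed_hours = [1, 2, 3, 4, 5, 6]

for hours in (3.5, 20):
    mark, kind = predict(intercept, slope, hours, observed_hours)
    print(f"{hours} hours -> predicted mark {mark:.1f} ({kind})")
```

Predicting for 20 hours gives a mark of about 150, which is impossible if the exam is marked out of 100: the straight-line pattern cannot be assumed to continue far outside the data.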
Correlation is not causation
A strong correlation shows that two variables move together, but not that one causes the other. The link may be coincidence, reverse causation (the second variable actually affecting the first), or the effect of a confounding variable that influences both. The classic example is ice cream sales and drownings, both driven by hot weather. The SQA expects you to state this caution and suggest a plausible confounder when relevant.
Try this
Q1. A correlation coefficient is . Describe the relationship. [1 mark]
- Cue. A strong negative linear relationship: as one variable increases, the other decreases.
Q2. A regression line is . Interpret the slope and predict when . [2 marks]
- Cue. Each unit of adds about to ; at , .
Q3. Sales of sunglasses and visits to outdoor pools are strongly correlated. Explain why one need not cause the other. [2 marks]
- Cue. A confounding variable, hot sunny weather, raises both, so the correlation reflects a common cause, not causation.
Exam-style practice questions
Practice questions written in the style of SQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
SQA Higher Apps style (5 marks). Software gives the regression line of exam mark on revision hours as , with correlation coefficient . Interpret the slope and the value of r, and predict the mark for hours of revision.
The slope means each extra hour of revision is associated with about more marks, and the intercept is the predicted mark for zero revision (2 marks).
A correlation of is a strong positive linear relationship, so more revision is associated with higher marks and the line fits the data well (1 mark).
Predicting for , marks (2 marks). Markers reward interpreting the slope as marks per hour, classifying as strong positive, and a correct substitution.
SQA Higher Apps style (4 marks). A study finds a strong positive correlation between ice cream sales and the number of drownings at beaches. Explain why this does not mean ice cream causes drownings, and name the likely underlying cause.
A correlation only measures that two variables move together; it does not establish that one causes the other (1 mark).
Here a third variable, hot sunny weather, increases both ice cream sales and the number of people swimming, which raises drownings; this is a confounding variable (2 marks).
So the association is explained by the common cause of warm weather, not by any direct link between ice cream and drowning (1 mark). Markers reward the statement that correlation is not causation and the identification of a sensible confounding variable.