
How do you measure how strongly two variables are related, and use a line of best fit to predict?

Measuring linear association with Pearson's correlation coefficient, fitting a simple linear regression line, interpreting its slope and intercept, and using it to predict while distinguishing interpolation from extrapolation.

A focused answer to the SQA Higher Applications of Mathematics content on correlation and regression, covering Pearson's r, the strength and direction of association, the least-squares regression line, interpreting slope and intercept, prediction, and correlation versus causation.

Generated by Claude Opus 4.8 · 9 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this dot point is asking
  2. Pearson's correlation coefficient
  3. The regression line
  4. Prediction, interpolation and extrapolation
  5. Correlation is not causation
  6. Try this

What this dot point is asking

The SQA wants you to measure the strength and direction of a linear relationship between two variables using Pearson's correlation coefficient, fit and interpret a regression line, use it to predict, and recognise that correlation does not prove causation. Software produces the numbers; your job is to interpret and apply them.

Pearson's correlation coefficient

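Pearson's correlation coefficient, written r, measures how closely the points on a scatter plot lie to a straight line. It always lies between -1 and 1: values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship. You obtain it from software (for example the PEARSON function in a spreadsheet or cor in R); the skill examined is interpretation.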

State both the strength (strong, moderate or weak) and the direction (positive or negative) in context. Remember that r only detects a linear pattern: a strong curved relationship can still give an r near zero, so a scatter plot should accompany the number.
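As a rough illustration of why r only picks up linear patterns, the sketch below computes r from its definition for two made-up data sets: one lying exactly on a rising line and one lying exactly on a parabola. The function name pearson_r and the data are assumptions for illustration; in the course you would read r from software rather than calculate it by hand.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y divided by the product of their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative data, not taken from any SQA paper.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear_ys = [2 * x + 1 for x in xs]  # points exactly on a rising straight line
curved_ys = [x ** 2 for x in xs]     # points exactly on a symmetric curve

r_linear = pearson_r(xs, linear_ys)
r_curved = pearson_r(xs, curved_ys)

print("line:    r =", round(r_linear, 3))
print("parabola: r =", round(r_curved, 3))
```

The straight-line data give r = 1, while the parabola gives r = 0 even though y is completely determined by x. That is the reason for checking the scatter plot alongside the number.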

The regression line

When the relationship is roughly linear, the least-squares regression line is the straight line that best fits the data by minimising the sum of the squared vertical distances from the points to the line. Software gives it in the form y = a + bx, where a is the intercept and b is the slope.

The regression is of y on x, meaning it predicts y from x, so identify which variable is the explanatory one (the input, x) and which is the response (the output you predict, y).
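To make the form y = a + bx concrete, here is a minimal sketch of a least-squares fit using the standard formulas b = Sxy / Sxx and a = (mean of y) - b × (mean of x). The function name fit_line and the revision-hours data are made up for illustration; software would normally report a and b for you.

```python
def fit_line(xs, ys):
    """Least-squares regression of y on x. Returns (a, b) for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope: change in predicted y per unit of x
    a = mean_y - b * mean_x    # intercept: predicted y when x = 0
    return a, b

# Made-up illustrative data: revision hours (explanatory) and exam mark (response).
hours = [1, 2, 3, 4, 5]
marks = [38, 40, 46, 49, 52]

a, b = fit_line(hours, marks)
print(f"y = {a:.1f} + {b:.1f}x")
```

For these figures the slope is 3.7 and the intercept is 33.9, so in context each extra hour of revision is associated with about 3.7 more marks, and 33.9 is the predicted mark for zero hours.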

Prediction, interpolation and extrapolation

To predict, substitute an x-value into the equation of the line. How reliable the prediction is depends on where that x-value sits relative to the data.

For example, with a regression line of heating cost on temperature, predicting the cost at x = 8 degrees (within the range of typical data) is interpolation, while predicting at x = 40 degrees is extrapolation well outside any realistic Scottish data and should be treated with caution.
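The distinction can be sketched as a simple range check. The coefficients and the observed temperature range below are hypothetical, chosen only for illustration; they are not the values of any heating line from the course material.

```python
def predict(a, b, x, x_min, x_max):
    """Predict y = a + b*x and flag whether x lies inside the observed x-range."""
    y = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind

# Hypothetical line: heating cost = 120 - 4 * temperature (assumed values).
A, B = 120.0, -4.0
# Hypothetical range of temperatures in the data the line was fitted to.
X_MIN, X_MAX = -2.0, 18.0

cost_8, kind_8 = predict(A, B, 8, X_MIN, X_MAX)
cost_40, kind_40 = predict(A, B, 40, X_MIN, X_MAX)

print(8, cost_8, kind_8)
print(40, cost_40, kind_40)
```

With these assumed numbers the prediction at 8 degrees is 88 and counts as interpolation, while the prediction at 40 degrees is -40, a negative cost. The line has no way of knowing the pattern stops holding outside the data, which is why extrapolated values need caution.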

Correlation is not causation

A strong correlation shows two variables move together, but not that one causes the other. The link may be coincidence, reverse causation, or the result of a confounding variable that affects both. The classic example is ice cream sales and drownings, both driven by hot weather. The SQA expects you to state this caution and suggest a plausible confounder when relevant.
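A small sketch of how a confounder produces correlation: below, both series are built from temperature alone plus a little made-up noise, with no link between ice cream and drownings anywhere in the code. All figures and names are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up daily temperatures: the confounding variable.
temps = [12, 15, 18, 21, 24, 27, 30]
# Small made-up fluctuations so the points do not sit on perfect lines.
ice_noise = [2, -1, 0, 1, -2, 1, -1]
swim_noise = [0, 1, -1, 0, 1, -1, 0]

# Each series depends on temperature only, never on the other series.
ice_cream_sales = [3 * t + e for t, e in zip(temps, ice_noise)]
drownings = [t // 3 + d for t, d in zip(temps, swim_noise)]

r_ice_drown = pearson_r(ice_cream_sales, drownings)
print(round(r_ice_drown, 2))
```

The correlation between the two series comes out at about 0.9, a strong positive value, even though neither was used to generate the other. The common cause, temperature, accounts for all of it.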

Try this

Q1. A correlation coefficient is r = -0.91. Describe the relationship. [1 mark]

  • Cue. A strong negative linear relationship: as one variable increases, the other decreases.

Q2. A regression line is y = 50 + 3x. Interpret the slope and predict y when x = 12. [2 marks]

  • Cue. Each unit increase in x adds about 3 to y; at x = 12, y = 50 + 3 × 12 = 50 + 36 = 86.

Q3. Sales of sunglasses and visits to outdoor pools are strongly correlated. Explain why one need not cause the other. [2 marks]

  • Cue. A confounding variable, hot sunny weather, raises both, so the correlation reflects a common cause, not causation.

Exam-style practice questions

Practice questions written in the style of SQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

SQA Higher Apps style (5 marks)
Software gives the regression line of exam mark y on revision hours x as y = 32 + 4.5x, with correlation coefficient r = 0.86. Interpret the slope and the value of r, and predict the mark for 10 hours of revision.
Worked answer

The slope 4.5 means each extra hour of revision is associated with about 4.5 more marks, and the intercept 32 is the predicted mark for zero hours of revision (2 marks).

A correlation of r = 0.86 indicates a strong positive linear relationship, so more revision is associated with higher marks and the line fits the data well (1 mark).

Predicting for x = 10: y = 32 + 4.5 × 10 = 32 + 45 = 77 marks (2 marks). Markers reward interpreting the slope as marks per hour, classifying r as strong positive, and a correct substitution.

SQA Higher Apps style (4 marks)
A study finds a strong positive correlation between ice cream sales and the number of drownings at beaches. Explain why this does not mean ice cream causes drownings, and name the likely underlying cause.
Worked answer

A correlation only measures that two variables move together; it does not establish that one causes the other (1 mark).

Here a third variable, hot sunny weather, increases both ice cream sales and the number of people swimming, which raises drownings; this is a confounding variable (2 marks).

So the association is explained by the common cause of warm weather, not by any direct link between ice cream and drowning (1 mark). Markers reward the statement that correlation is not causation and the identification of a sensible confounding variable.
