
How do you measure and model the linear relationship between two variables?

Analyse bivariate data using scatter plots, the sums of squares and products, the product-moment correlation coefficient, and the least-squares regression line, and assess the model with residual plots and the limitations of extrapolation.

A focused answer to the SQA Advanced Higher Statistics bivariate data content: scatter plots, the sums of squares Sxx, Syy and Sxy, the product-moment correlation coefficient, the least-squares regression line, prediction, residual plots and the dangers of extrapolation.

Generated by Claude Opus 4.8, 13 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this dot point is asking
  2. Scatter plots and the sums of squares
  3. The product-moment correlation coefficient
  4. The least-squares regression line
  5. Residual plots and limitations
  6. Try this

What this dot point is asking

Bivariate analysis studies how two variables move together. The SQA wants you to picture the relationship with a scatter plot, to quantify its strength with the product-moment correlation coefficient, to fit a least-squares regression line, to use it for prediction, and crucially to judge the fit using residual plots and to recognise where prediction is unsafe.

Scatter plots and the sums of squares

A scatter plot shows the form (linear or not), direction (positive or negative) and strength of a relationship at a glance, and it should always come first.

The three sums $S_{xx}$, $S_{yy}$ and $S_{xy}$ are the raw material for both correlation and regression, so computing them carefully is the first calculation in any bivariate question.
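As a minimal sketch of that first calculation, the Python below computes the three sums from raw paired data using the standard deviation-from-the-mean definitions $S_{xx}=\sum(x_i-\bar{x})^2$, $S_{yy}=\sum(y_i-\bar{y})^2$ and $S_{xy}=\sum(x_i-\bar{x})(y_i-\bar{y})$. The function name `sums_of_squares` and the five data pairs are invented for illustration and do not come from an SQA paper.

```python
def sums_of_squares(xs, ys):
    """Return (Sxx, Syy, Sxy) for paired data, using deviations from the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxx, syy, sxy


# Hypothetical data, chosen so the means are whole numbers (x_bar = 3, y_bar = 4).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sxx, syy, sxy = sums_of_squares(xs, ys)
print(sxx, syy, sxy)  # 10.0 6.0 6.0
```

By hand, the same data give $S_{xx}=4+1+0+1+4=10$, which is a useful check that the deviations have been squared before summing.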

The product-moment correlation coefficient

The correlation coefficient $r$ measures the strength and direction of a linear relationship.
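The calculation can be sketched in a few lines of Python, assuming the sums of squares and products are already known and using the formula $r=\dfrac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$ that appears in the worked answers on this page. The function name `correlation` is illustrative only.

```python
import math


def correlation(sxx, syy, sxy):
    """Product-moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    if sxx <= 0 or syy <= 0:
        raise ValueError("Sxx and Syy must be positive for r to be defined")
    return sxy / math.sqrt(sxx * syy)


# Values from Q1 in the "Try this" section of this page.
r = correlation(50, 72, 48)
print(round(r, 4))  # 0.8
```

The sign of $r$ comes entirely from $S_{xy}$, because the denominator is always positive.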

The least-squares regression line

The regression line of $y$ on $x$ is the straight line that minimises the sum of the squared vertical residuals.
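A short Python sketch of how the line is obtained from summary statistics, assuming the gradient and intercept formulas used in the worked answers on this page, $b=\dfrac{S_{xy}}{S_{xx}}$ and $a=\bar{y}-b\bar{x}$, with the line written as $\hat{y}=a+bx$. The function name `regression_line` is an illustrative choice.

```python
def regression_line(sxx, sxy, x_bar, y_bar):
    """Return (a, b) for the least-squares line y_hat = a + b * x of y on x."""
    if sxx <= 0:
        raise ValueError("Sxx must be positive (the x values cannot all be equal)")
    b = sxy / sxx            # gradient
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Summary statistics from the regression practice question on this page.
a, b = regression_line(sxx=40, sxy=54, x_bar=5, y_bar=12)
print(round(a, 2), round(b, 2))  # 5.25 1.35
```

Because $a$ is defined from the means, the fitted line always passes through the point $(\bar{x},\bar{y})$, which gives a quick check on any hand calculation.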

Residual plots and limitations

A residual is the vertical gap between an observed point and the line, $e_i=y_i-\hat{y}_i$. Plotting residuals against $x$ (or against $\hat{y}$) checks whether the straight-line model is appropriate.
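To illustrate what a residual check can reveal, the sketch below fits a least-squares line to deliberately curved data and lists the residuals $e_i=y_i-\hat{y}_i$. The data (points on $y=x^2$) and the function names `fit_line` and `residuals` are made up for this example; in practice the residuals would be plotted rather than printed.

```python
def fit_line(xs, ys):
    """Least-squares line of y on x; returns (a, b) for y_hat = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residuals(xs, ys, a, b):
    """Observed minus fitted value for each data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical curved data: y = x squared, so a straight line is the wrong model.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)
for x, e in zip(xs, res):
    print(x, round(e, 2))
# The residuals are positive at both ends and negative in the middle.
```

Here the fitted line is $\hat{y}=-7+6x$ and the residuals are $2,-1,-2,-1,2$: a U-shape, which signals that a curved model should be considered. Residuals scattered randomly about zero would instead support the straight-line model.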

Try this

Q1. Given $S_{xx}=50$, $S_{yy}=72$, $S_{xy}=48$, find $r$. [2 marks]

  • Cue. $r=\dfrac{48}{\sqrt{50\times 72}}=\dfrac{48}{60}=0.8$, a strong positive linear relationship.

Q2. A residual plot of a fitted line shows a clear U-shape. State what this tells you. [1 mark]

  • Cue. The relationship is not linear, so a straight-line model is inappropriate and a curved model should be considered.

Exam-style practice questions

Practice questions written in the style of SQA exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AH style: correlation (4 marks)
For a sample of $n=6$ pairs, $S_{xx}=40$, $S_{yy}=90$ and $S_{xy}=54$. Calculate the product-moment correlation coefficient and describe the relationship.

Product-moment correlation: $r=\dfrac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}=\dfrac{54}{\sqrt{40\times 90}}$ (1 mark).

$\sqrt{40\times 90}=\sqrt{3600}=60$ (1 mark), so $r=\dfrac{54}{60}=0.9$ (1 mark).

Since $r=0.9$ is close to $1$, there is a strong positive linear relationship: as $x$ increases, $y$ tends to increase (1 mark). Markers reward the formula, the computed denominator, the value of $r$ and a correct interpretation.

AH style: regression (4 marks)
Using $S_{xx}=40$, $S_{xy}=54$, $\bar{x}=5$ and $\bar{y}=12$, find the least-squares regression line of $y$ on $x$ and predict $y$ when $x=7$.

Gradient: $b=\dfrac{S_{xy}}{S_{xx}}=\dfrac{54}{40}=1.35$ (1 mark).

Intercept from $a=\bar{y}-b\bar{x}=12-1.35\times 5=12-6.75=5.25$ (1 mark).

Line: $\hat{y}=5.25+1.35x$ (1 mark).

Prediction at $x=7$: $\hat{y}=5.25+1.35\times 7=5.25+9.45=14.7$ (1 mark). Markers reward the gradient, the intercept, the equation and a correct prediction within the data range.
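The point about staying within the data range can be made concrete with a small Python sketch that returns a prediction together with a flag saying whether it is an interpolation. The question does not state the observed range of $x$, so the limits `x_min=2` and `x_max=8` below are assumptions made purely for illustration, as is the function name `predict`.

```python
def predict(a, b, x, x_min, x_max):
    """Return (y_hat, within_range) for the line y_hat = a + b * x.

    within_range is False when x lies outside the observed x values,
    i.e. when the prediction is an extrapolation and should not be trusted.
    """
    y_hat = a + b * x
    within_range = x_min <= x <= x_max
    return y_hat, within_range


# Line from the worked answer; the range 2 to 8 is a hypothetical data range.
y_hat, ok = predict(5.25, 1.35, 7, x_min=2, x_max=8)
print(round(y_hat, 2), ok)    # 14.7 True

y_far, ok_far = predict(5.25, 1.35, 20, x_min=2, x_max=8)
print(round(y_far, 2), ok_far)  # 32.25 False
```

The second call still produces a number, which is exactly the danger of extrapolation: the arithmetic works, but there is no evidence that the linear pattern continues that far beyond the data.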
