How do you use a sample to estimate features of the whole population?
Using summary statistics to estimate population characteristics; estimating the population mean from a sample; predicting population proportions; the effect of sample size on reliability and replication.
A focused answer to Edexcel GCSE Statistics on statistical inference, covering using summary statistics to estimate population characteristics, estimating the population mean from a sample, predicting population proportions, and how sample size affects reliability and replication.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this dot point is asking
Edexcel codes 2h.01 and 2h.03 require you to use summary statistics from a sample to estimate population characteristics, in particular to use a sample mean to estimate the population mean and to predict population proportions, and to know that sample size affects the reliability and replication of results. This is the heart of statistical inference: drawing conclusions about a whole population from a sample.
Using a sample to estimate the population
A well-chosen sample stands in for the population, so its summary statistics become estimates of the population's. Edexcel expects you to recognise, for example, that approximately half the population is expected to lie above the sample median, because the median splits the data in two. The reliability of any such estimate depends first of all on the sample being representative.
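The median claim can be checked with a small simulation. This is a minimal sketch, not an exam method: the population is simulated, and the seed, the population size of 10,000 and the sample size of 400 are all assumptions chosen for illustration.

```python
import random
from statistics import median

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical skewed population, e.g. waiting times in minutes (simulated, not real data).
population = [rng.expovariate(1 / 20) for _ in range(10_000)]

# Take a simple random sample and find its median.
sample = rng.sample(population, 400)
sample_median = median(sample)

# Count how much of the whole population lies above the sample median.
above = sum(1 for value in population if value > sample_median)
fraction_above = above / len(population)

print(f"Sample median: {sample_median:.1f} minutes")
print(f"Fraction of population above it: {fraction_above:.2f}")
```

With a random sample the printed fraction should come out close to one half, even though the population here is deliberately skewed.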
Estimating the population mean
The sample mean is the natural estimate of the population mean. If a sample of components has a mean mass of $\bar{x}$ g, you estimate the mean mass of all components as $\bar{x}$ g, and you can scale up to a total (for $N$ components, an estimated total of $N\bar{x}$ g). This works because, for a random sample, the sample mean is centred on the population mean; it will not be exactly right, but it is the best single estimate.
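The calculation can be sketched in a few lines of Python. The masses and the batch size below are hypothetical values invented for illustration, and the function names are not from any specification.

```python
from statistics import mean

# Hypothetical sample of component masses in grams (illustrative values only).
sample_masses_g = [49.8, 50.4, 50.1, 49.6, 50.3, 50.2, 49.9, 50.1]

# Hypothetical number of components in the whole batch.
population_size = 2000


def estimate_population_mean(sample):
    """Use the sample mean as the estimate of the population mean."""
    return mean(sample)


def estimate_population_total(sample, size):
    """Scale the estimated mean up to an estimated total for `size` items."""
    return estimate_population_mean(sample) * size


estimated_mean = estimate_population_mean(sample_masses_g)
estimated_total = estimate_population_total(sample_masses_g, population_size)

print(f"Estimated mean mass: {estimated_mean:.2f} g")
print(f"Estimated total mass: {estimated_total:.0f} g")
```

Both printed figures are estimates: a different random sample of eight components would give slightly different values.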
Predicting population proportions
A sample proportion estimates the population proportion. If $k$ out of $n$ sampled voters say Yes, the estimated proportion is $\frac{k}{n}$, and you predict that about the same proportion of all voters would say Yes. To estimate a count in the population, multiply the proportion by the population size $N$: estimated count $= \frac{k}{n} \times N$. This scaling-up is one of the most common inference tasks in the exam.
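A short Python sketch of the same scaling-up follows. The counts (120 Yes out of 200 sampled, an electorate of 50,000) are made-up figures for illustration only, and the function names are hypothetical.

```python
def estimate_proportion(successes, sample_size):
    """Sample proportion, used as the estimate of the population proportion."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return successes / sample_size


def estimate_population_count(successes, sample_size, population_size):
    """Scale the sample proportion up to an estimated count in the population."""
    return estimate_proportion(successes, sample_size) * population_size


# Hypothetical survey figures (not taken from any real poll or exam paper).
yes_in_sample = 120
sample_size = 200
electorate = 50_000

p_hat = estimate_proportion(yes_in_sample, sample_size)
estimated_yes = estimate_population_count(yes_in_sample, sample_size, electorate)

print(f"Estimated proportion voting Yes: {p_hat:.2f}")
print(f"Estimated number voting Yes: {round(estimated_yes)}")
```

The result should be quoted as "about" that many voters, since it inherits the uncertainty of the sample proportion.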
Sample size, reliability and replication
Code 2h.03 stresses that conclusions based on larger samples are generally more reliable, because random sampling variation has less effect on a big sample, so a repeat study is more likely to give a similar result. However, sample size does not cure bias: a large but unrepresentative sample (from a poor frame or a non-random method) still gives a biased estimate. So both a good method and an adequate size are needed.
This links directly to the quality-assurance idea that sample means vary less than individual values: averaging over more data smooths out the extremes, so a mean based on a larger sample is a tighter, more trustworthy estimate of the population mean. It is also why a single small sample should be treated with caution, and why repeating a study (replication) and getting a consistent answer strengthens confidence in the conclusion.
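Both points about sample size can be illustrated with a simulation. This is a sketch under stated assumptions: the population of heights is simulated, and the seed, the sample sizes and the "poor frame" that lists only the tallest 2,000 people are hypothetical choices made for the demonstration.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 heights in cm (simulated, not real data).
population = [rng.gauss(170, 10) for _ in range(10_000)]
population_mean = mean(population)


def spread_of_sample_means(sample_size, repeats=200):
    """Standard deviation of the means of many repeated random samples."""
    means = [mean(rng.sample(population, sample_size)) for _ in range(repeats)]
    return stdev(means)


# Part 1: replication. Means of larger random samples agree more closely.
spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(100)
print(f"Spread of sample means, n = 10:  {spread_small:.2f} cm")
print(f"Spread of sample means, n = 100: {spread_large:.2f} cm")

# Part 2: bias. A poor frame listing only the tallest 2,000 people.
biased_frame = sorted(population)[-2000:]
large_biased_sample = rng.sample(biased_frame, 1000)
small_random_sample = rng.sample(population, 30)

biased_error = abs(mean(large_biased_sample) - population_mean)
random_error = abs(mean(small_random_sample) - population_mean)
print(f"Error of large biased sample (n = 1000): {biased_error:.2f} cm")
print(f"Error of small random sample (n = 30):   {random_error:.2f} cm")
```

The first pair of figures shows repeat studies agreeing more closely as the sample grows; the second shows that a sample of 1,000 from an unrepresentative frame can still miss the population mean by more than a random sample of 30.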
Estimating other characteristics
The same logic extends beyond the mean and a single proportion. A sample can be used to estimate the population median (about half the population lies above the sample median), the population range or spread, and the frequency of any category. In each case you treat the sample statistic as the best estimate of the matching population value, and where appropriate scale it up using the population size. Edexcel may give you a sample summary (a mean, a median, a set of class frequencies) and ask you to make a statement about the whole population, so practise turning a sample figure into a population estimate and stating clearly that it is an estimate, not an exact value.
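The same pattern can be sketched for a median, a range and category frequencies. All of the data below (ten values, ten travel responses, a population of 800) is hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from collections import Counter
from statistics import median

# Hypothetical sample data (illustrative values only).
sample_values = [12, 15, 11, 18, 14, 16, 13, 17, 15, 14]
sample_categories = ["bus", "car", "walk", "bus", "car",
                     "bus", "walk", "bus", "car", "bus"]
population_size = 800  # hypothetical size of the whole population

# The sample median and range act as estimates of the population values.
# A sample range can never exceed the population range, so treat it as a rough guide.
estimated_median = median(sample_values)
estimated_range = max(sample_values) - min(sample_values)

# Scale each category's sample frequency up to an estimated population frequency.
sample_counts = Counter(sample_categories)
sample_size = len(sample_categories)
estimated_counts = {
    category: count * population_size / sample_size
    for category, count in sample_counts.items()
}

print(f"Estimated population median: {estimated_median}")
print(f"Estimated population range: {estimated_range}")
for category, estimate in estimated_counts.items():
    print(f"Estimated number choosing {category}: about {estimate:.0f}")
```

Each printed figure is an estimate of the matching population value, which is why the output says "about" rather than giving an exact count.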
Exam-style practice questions
Practice questions written in the style of Pearson Edexcel exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
Edexcel 1ST0 2020, 4 marks
A sample of light bulbs has a mean lifetime of hours. The factory produces bulbs. (a) Estimate the total number of bulb-hours the factory's output will last. (b) State one way the estimate could be made more reliable.
(a) Use the sample mean as an estimate of the population mean ( hours per bulb). Total bulb-hours hours.
(b) Take a larger sample (or several samples), because a larger sample gives a more reliable estimate of the population mean and reduces the effect of random variation.
Markers reward using the sample mean for the population, the total million hours, and a valid way to improve reliability (larger sample).
Edexcel 1ST0 2022, 3 marks
In a random sample of voters, said they would vote Yes in a referendum. (a) Estimate the proportion of all voters who would vote Yes. (b) If the electorate is , estimate the number who would vote Yes.
(a) Estimated proportion , so about would vote Yes.
(b) Apply the proportion to the population: voters.
Markers reward the sample proportion and scaling it up to estimate for the population.
Sources & how we know this
- Pearson Edexcel GCSE (9-1) Statistics (1ST0) specification — Pearson Edexcel (2017)