
How do you collect reliable, valid data and design questions that are not biased?

Sources of data, reliability and validity, designing questionnaires and data collection sheets, open and closed questions, leading questions, pilots, and cleaning data before processing.

A focused answer to Edexcel GCSE Statistics on collecting data and designing questionnaires, covering data sources, reliability and validity, open and closed questions, designing non-overlapping response boxes, spotting leading questions, pilots, and cleaning data before processing.

Generated by Claude Opus 4.89 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed


Jump to a section
  1. What this dot point is asking
  2. Sources of data and collection sheets
  3. Reliability and validity
  4. Designing questionnaire questions
  5. Leading questions and bias
  6. Pilots and pre-tests
  7. Cleaning data

What this dot point is asking

Edexcel codes 1d.01 to 1d.06 require you to know the sources data can come from, the meaning of reliability and validity, how to design questionnaires and data collection sheets, the features of good questions (open and closed, avoiding leading questions, time factors, interview technique), the role of pilots and pre-tests, and why and how data is cleaned before processing. Rewriting a faulty question or set of response boxes, and spotting an error to clean, are classic exam tasks.

Sources of data and collection sheets

Edexcel lists several sources of data (code 1d.01): experimental (laboratory, field or natural), simulation (often using random numbers), questionnaires, observation, reference sources, census and sampling. You should be able to design a suitable data collection sheet (often a tally chart with clear categories) to record the data efficiently. Sources of secondary data must be acknowledged.
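As a rough illustration of two of these ideas together, the sketch below simulates data with random numbers and records it on a tally-style collection sheet. The die-rolling scenario, the seed and the helper name `tally_marks` are assumptions made for this example, not part of the specification.

```python
import random
from collections import Counter

# Hypothetical scenario: simulate 30 rolls of a fair six-sided die
# using random numbers instead of rolling a real die.
random.seed(1)  # fixed seed so the simulated data can be reproduced
rolls = [random.randint(1, 6) for _ in range(30)]

# The categories are decided before collecting, as on a paper sheet.
categories = [1, 2, 3, 4, 5, 6]
counts = Counter(rolls)

def tally_marks(n):
    """Return n as tally marks in groups of five, e.g. 7 -> '||||| ||'.

    On paper the fifth mark would strike through the other four.
    """
    full_groups, remainder = divmod(n, 5)
    parts = ["|||||"] * full_groups
    if remainder:
        parts.append("|" * remainder)
    return " ".join(parts)

# Print the data collection sheet: outcome, tally, frequency.
print("Score  Tally                 Frequency")
for score in categories:
    print(f"{score:<6} {tally_marks(counts[score]):<21} {counts[score]}")
```

The frequency column should always add up to the number of observations (30 here), which is a quick check that nothing was missed when recording.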

Reliability and validity

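Data is reliable if repeating the collection under the same conditions gives consistent results, and valid if it actually measures what it is meant to measure. A bathroom scale that reads differently each time you step on it is unreliable; a survey of "fitness" that only asks about gym visits may be reliable but not valid, because it misses other forms of exercise. Edexcel expects you to comment on both when judging a data collection method.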

Designing questionnaire questions

Good questions share clear features:

  • Closed questions offer fixed response options, which are quick to analyse; open questions allow any answer, giving richer detail but harder analysis.
  • Response boxes must not overlap and must cover every possible value (exhaustive), and should include units and a time frame.
  • Questions must be neutral, never leading or emotive.

For example, "How many hours of TV do you watch per day?" needs non-overlapping boxes such as "0 to 1", "more than 1 up to 2", "more than 2", with a stated time frame ("per day"). Boxes such as "0 to 1" and "1 to 2" are faulty because 1 falls in both.
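One way to test a set of response boxes is to try some possible answers and count how many boxes each could be ticked in: every answer should fit exactly one box. The sketch below does this for the TV question, with boxes in hours per day. The tuple layout and the function name `boxes_matching` are assumptions made for this illustration.

```python
import math

# Each box: (label, low, low_included, high, high_included), in hours per day.
good_boxes = [
    ("0 to 1", 0, True, 1, True),
    ("more than 1 up to 2", 1, False, 2, True),
    ("more than 2", 2, False, math.inf, False),
]

# Faulty set: 1 is included in both boxes, and nothing covers answers above 2.
faulty_boxes = [
    ("0 to 1", 0, True, 1, True),
    ("1 to 2", 1, True, 2, True),
]

def boxes_matching(value, boxes):
    """Return the labels of every box in which this answer could be ticked."""
    matches = []
    for label, low, low_included, high, high_included in boxes:
        above_low = value > low or (low_included and value == low)
        below_high = value < high or (high_included and value == high)
        if above_low and below_high:
            matches.append(label)
    return matches

# Try boundary values and values in between.
for answer in [0, 1, 1.5, 2, 5]:
    print(answer, boxes_matching(answer, good_boxes), boxes_matching(answer, faulty_boxes))
```

With the good boxes every test answer lands in exactly one box. With the faulty boxes an answer of 1 matches two boxes (overlap) and an answer of 5 matches none (not exhaustive).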

Leading questions and bias

A leading question steers the respondent towards a particular answer, for example "Don't you agree our excellent service is the best?" Emotive words ("excellent", "delicious") and built-in assumptions bias the results. Edexcel also expects awareness of other planning factors: avoiding biased sources, time factors, and choosing an appropriate interview technique (face to face, phone, online), each with its own advantages and disadvantages.

Pilots and pre-tests

A pilot survey (for questionnaires) or pre-test (for experiments) is a small trial run before the main study. It checks that the questions are clear, the response boxes work, the sampling is practical and the data can be analysed, so problems are found and fixed cheaply before committing to the full investigation.

Cleaning data

Before processing, raw data often needs cleaning (codes 1d.05 to 1d.06). Typical issues are missing data, incomplete responses, values in the wrong format, extraneous symbols (especially in spreadsheets), and outliers or anomalies. You decide whether an outlier is a genuine unusual value or a recording error, and you correct or remove erroneous entries so later calculations are not distorted.
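The sketch below shows what a simple cleaning pass might look like for one column of spreadsheet entries. The raw entries, the 120-minute plausibility limit and the function name `clean_entry` are all hypothetical, chosen only to illustrate the issues listed above.

```python
# Hypothetical raw entries for "time spent, in minutes", as typed into a spreadsheet.
raw = ["12", " 8 ", "15 mins", "", "n/a", "£9", "200"]

def clean_entry(text):
    """Return the entry as a number, or None if it is missing or unusable.

    Assumes the values are non-negative, so any minus sign is discarded.
    """
    # Keep only digits and the decimal point, dropping spaces, symbols and unit labels.
    digits = "".join(ch for ch in text.strip() if ch.isdigit() or ch == ".")
    if digits == "":
        return None  # blank cell or a text entry such as "n/a"
    try:
        return float(digits)
    except ValueError:
        return None  # wrong format, e.g. more than one decimal point

cleaned = [clean_entry(entry) for entry in raw]
missing = cleaned.count(None)
values = [v for v in cleaned if v is not None]

# Assumed plausibility limit: anything above 120 minutes is flagged for checking,
# not deleted automatically, because it may be a genuine unusual value.
PLAUSIBLE_MAX = 120
to_check = [v for v in values if v > PLAUSIBLE_MAX]

print("Cleaned values:", values)
print("Missing or unusable entries:", missing)
print("Values to check:", to_check)
```

Flagging a value for checking, and only then deciding to correct or remove it, mirrors the decision described above: an outlier might be a recording error or a genuine unusual value.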

Exam-style practice questions

Practice questions written in the style of Pearson Edexcel exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

Edexcel 1ST0 2020, 4 marks
A cafe owner uses this question on a survey: 'Do you agree that our delicious cakes are reasonably priced?' with response boxes 'Yes' and 'No'. (a) Give two criticisms of this question. (b) Write an improved version of the question with suitable response options.

Worked answer

(a) Two criticisms, for example: it is a leading question (the words "delicious" and "reasonably priced" push the respondent towards "Yes"); it asks two things at once (taste and price), so a single answer is ambiguous; the Yes/No options give no middle or "no opinion" choice.

(b) An improved version: "How would you rate the price of our cakes?" with options such as "Very cheap, Cheap, About right, Expensive, Very expensive".

Markers reward two valid distinct criticisms and a neutral rewritten question with balanced, non-overlapping options.

Edexcel 1ST0 2022, 3 marks
A researcher records the time, in minutes, that 5 people spend on a website: 12, 8, 15, 200, 9. (a) Identify which value should be checked when cleaning the data, and explain why. (b) State what a pilot survey is and give one reason for carrying one out.

Worked answer

(a) The value 200 should be checked: it is far larger than the others and is likely to be an error (for example seconds recorded as minutes, or a typing mistake), so it is a potential outlier that should be verified or corrected before processing.

(b) A pilot survey is a small-scale trial of the questionnaire and methods carried out before the main survey. One reason: to check the questions are clear and unambiguous (or to check the response boxes and data collection work) so problems can be fixed cheaply first.

Markers reward identifying 200 with a reason, plus a correct definition of a pilot and one valid reason.
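To see why the suspect value in part (a) matters, the short sketch below compares the mean of the five recorded times with and without the 200. This calculation is not required by the question; it is only an illustration of how one unchecked value can distort later processing.

```python
from statistics import mean, median

# The five recorded times, in minutes, from the question.
times = [12, 8, 15, 200, 9]

# The same data with the suspect value set aside for checking.
without_suspect = [t for t in times if t != 200]

print("Mean of all five values:", mean(times))
print("Mean without the 200:", mean(without_suspect))
print("Median of all five values:", median(times))
```

The mean of all five values is 48.8 minutes, but without the 200 it is 11 minutes, while the median of the five values is 12 minutes. A single possible error moves the mean a long way, which is why it should be verified before processing.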
