Having a large quantity of data is not, by itself, enough to guarantee that a conclusion drawn from it is warranted; the data must also come from a variety of sources and be obtained under a variety of conditions.

A survey of voting intentions conducted outside the local Conservative Club is not going to provide an accurate guide to who will win the next general election. A disproportionate number of people in the vicinity are likely to be Conservative voters, so the results of the survey will be skewed in favour of the Tory party. The sample is not representative of the electorate as a whole.

A survey to find out what proportion of the population owns a mobile phone would be similarly (though less obviously) flawed if it were conducted near a Sixth-Form College. The sample would be skewed towards teenagers, who are more likely than average to own mobile phones, so the survey would overstate the proportion of the population as a whole that owns one.
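To make the arithmetic of the mobile phone example concrete, the short Python sketch below computes the ownership rate a survey would report, given the mix of age groups it happens to reach. The ownership rates and population shares are invented purely for illustration and are not real figures; the names `ownership_rate`, `population_share` and `estimated_rate` are likewise hypothetical.

```python
# Hypothetical figures, for illustration only: the share of each age group
# that owns a mobile phone, and each group's share of the whole population.
ownership_rate = {"teenagers": 0.90, "adults": 0.60}
population_share = {"teenagers": 0.10, "adults": 0.90}


def estimated_rate(sample_share):
    """Ownership rate a survey would report, given the mix of groups it reached."""
    return sum(sample_share[group] * ownership_rate[group] for group in ownership_rate)


# Representative sample: each group appears in proportion to the population.
true_rate = estimated_rate(population_share)

# Hypothetical survey outside a Sixth-Form College: mostly teenagers.
college_sample = {"teenagers": 0.80, "adults": 0.20}
biased_rate = estimated_rate(college_sample)

print(f"representative: {true_rate:.2f}")     # representative: 0.63
print(f"near the college: {biased_rate:.2f}")  # near the college: 0.84
```

Notice that the reported figure depends only on the mix of people reached, not on how many of them were asked: surveying ten times as many people outside the college would still give 0.84. Collecting more data does nothing to correct an unrepresentative sample.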

Collecting data from a variety of sources is one thing; collecting it under a variety of conditions is another. A survey of what types of vehicle use local roads, conducted at a variety of locations but always at the same time of day, would not yield representative data. Conducting it during rush hour would mean that commuter traffic would be over-represented in the results; conducting it in the evenings might mean that public transport would be under-represented. Differences in who drives at different times of day would need to be factored in when designing the survey.

The quality of a data-set is thus not just a matter of how much data it contains, but also of how representative that data is likely to be. To minimise the problem of unrepresentative data, evidence must be collected from as wide a range of sources as possible, and under as varied conditions as possible.