Saturday, 14 March 2015

An introduction to surveys

Richard Payne of ITV has asked, by way of our strategic code-sharing and data-mining partners at Twitter, how to explain bias and self-selection in surveys.

This is a topic dear to our hearts for a number of reasons:
  1. We consider ourselves to be Bristol's premier data-driven traffic analysis site.
  2. We recently conducted a survey on traffic issues for the city —a survey which has been completely ignored by the Evening Post, the BBC and ITV.
  3. We have just received an SERC grant for a new project to measure the weight of the city using a stopwatch and a trampoline —and plan to conduct our survey at the BRI next week.
Population: A collection ("set") of things you want to measure values from. Examples: the population of Bristol, or all the residents of an area within Bristol.

Subset: Some or all entities within a set. Examples: some of the population of Bristol, or some of the residents of an area within Bristol.

Proper subset: A subset of a set which is strictly smaller than the original set (the fancy mathematical word for size: cardinality). Examples: some but not all of the population of Bristol, or some but not all of the residents of an area within Bristol.

What is important here is that, by definition, a subset of a population must not contain any members outside that population. For example, a subset of the population of Bristol must exclude people from North Somerset. Similarly, a subset of the residents of an area within Bristol must not contain anyone who does not live within that area.
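For the programmers among our readership, here is a minimal sketch of that membership rule in Python. The respondent records and the list of postcode districts are invented for illustration; a real survey would use whatever location data it actually collected.

# A minimal sketch of filtering a sample down to its target population.
respondents = [
    {"name": "A", "postcode": "BS1 5TR"},   # central Bristol
    {"name": "B", "postcode": "BS48 1AA"},  # Nailsea, North Somerset
    {"name": "C", "postcode": "BS6 5HL"},   # Redland, Bristol
]

# Postcode districts we treat as "the population of Bristol" for this
# sketch; the real boundary would need rather more care.
BRISTOL_DISTRICTS = {"BS1", "BS2", "BS3", "BS4", "BS5", "BS6", "BS7", "BS8"}

def in_population(record):
    """True if the respondent's postcode district is inside the population."""
    district = record["postcode"].split()[0]
    return district in BRISTOL_DISTRICTS

sample = [r for r in respondents if in_population(r)]
print(len(sample), "of", len(respondents), "respondents are in the population")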

In our survey we actually measured the origin of our self-selected sample to assess this. We could have excluded the out-of-population respondents, but instead chose to include them in our answers, on the basis that it was easier just to leave them in.

Data: Numbers. May be analysed by somebody with a statistical background to reach some meaningful conclusions. Without those mathematical skills you'll end up with something as useful as having a rabbit do your tax return.

Measurement: Using some form of scientific mechanism to come up with data about the things you measure. Examples: determining the weight of someone with a weighing scale. Determining the parking and driving habits of people by recording where they park or tracking where they drive.

Invalid Measurement: Trying to measure something by using the wrong tools, badly calibrated tools, or reading the numbers off wrongly. Example: determining the weight of people by having them jump onto a trampoline and using a stopwatch to time how long it takes for them to stop bouncing.

Poll: Asking people for their opinions. This differs from measurement in that it assesses what people believe rather than recording what they do. Examples: asking someone how much they think they weigh rather than putting them on a weighing scale; asking people about parking and driving rather than actually recording or tracking them.

Leading questions: A sequence of questions which may, intentionally or not, change the answers to follow-on questions. As an example of leading questions, imagine the following sequence:
  1. Are you aware that being overweight can lead to an increase in coronary heart disease and diabetes?
  2. Do you believe that overweight people should be billed by the NHS for medical care for weight-related conditions?
  3. Are you a fat bastard?
After the first two questions, nobody will say yes to question 3.

Census: Measuring or polling an entire Population. Examples: all the people whose weight you want to measure, or all the residents of an area whose opinion on parking you want to know. A census of a population is the only way to come up with a value of the measurement or poll which can be considered 100% accurate in terms of sample set. Everything else is incomplete and therefore inaccurate to some degree.

Survey: Measuring or polling a proper subset of a population —with the goal being to extrapolate the results to the entire population. Examples: weighing only some of the people in Bristol to extrapolate the weight of everyone in the city, or polling some of the residents in part of the city to extrapolate to the opinions of all the residents of that area.

Sample: The proper subset of a population used in a survey. Examples: some but not all of the population of Bristol, or some but not all of the residents of an area within Bristol. Another term is Sample Set.

Defensible: Something which you can present to people who understand statistics without being laughed at.

Invalid Sample Set: A sample for a survey which cannot be used to extrapolate to the entire population. Examples:
  1. Including people from North Somerset in a survey to determine the average weight of the population of Bristol.
  2. Weighing only those Bristolians who have been referred to the BRI heart clinic in a survey to determine the average weight of the population of Bristol and using a trampoline and a stopwatch to do so.
  3. Using too small a sample set for the size of the total population. Example: weighing two people and attempting to reach a conclusion about the weight of the entire population of the city.
  4. Attempting to conduct an opinion poll of residents of part of the city without excluding non-residents of that region (see the sketch after this list).
  5. Attempting to conduct an opinion poll of residents of the city within, say, a residents parking zone, yet deliberately choosing to exclude parts of the area —such as, say, Kingsdown and the city centre.
  6. Excluding some of the population on the basis that they do not meet some criteria. Example: excluding anyone who doesn't own a car from any opinion poll on the topic of residents parking.
  7. Conducting an opinion poll of the residents of part of the city by only asking those people who hold opinions on one specific outcome of the survey. Example: asking only people opposed to residents parking for their opinion on the topic.
  8. Conducting a survey by requiring participants to perform some action, such as posting in their survey form. This tends to produce what is called a self-selecting sample.
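As promised in point 4, here is a sketch of the checks a defensible survey would run before extrapolating anything: no respondents from outside the population, and a sample big enough for the population size. The function and the numbers are our own invention; the size calculation is the standard formula for estimating a proportion at 95% confidence with a 5% margin of error, plus a finite-population correction.

import math

def validate_sample(sample, population, z=1.96, margin=0.05):
    """Return a list of reasons this sample is invalid; empty means
    it passed these (minimal) checks."""
    problems = []
    outsiders = [s for s in sample if s not in population]
    if outsiders:
        problems.append(f"{len(outsiders)} respondent(s) outside the population")
    # Worst-case sample size for a proportion (p = 0.5), then a
    # finite-population correction for small populations.
    n0 = (z ** 2 * 0.25) / margin ** 2
    needed = math.ceil(n0 / (1 + (n0 - 1) / len(population)))
    insiders = len(sample) - len(outsiders)
    if insiders < needed:
        problems.append(f"need at least {needed} in-population respondents, have {insiders}")
    return problems

population = set(range(10_000))   # stand-in for the residents of an area
sample = {1, 2, 3, 99_999}        # tiny, and contains an outsider
print(validate_sample(sample, population))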

In our survey, 32% of respondents declared they couldn't afford a car. These people don't have valid opinions on parking, nor on other parts of our own survey.

Statistical Outliers

These are a fun thing in experiments: values way outside what was expected. You can include these in your answer, though you can also try to work out how the outliers got in there and then discount them —this is especially useful if you are trying to make sure the survey reaches the conclusions you want it to.
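If you would rather be defensible about it, the usual starting point is Tukey's interquartile-range rule: flag anything far outside the middle 50% of the data, then investigate rather than silently delete. A minimal sketch (the wing-mirror counts are invented, not our survey data):

import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule).
    These are candidates for investigation, not automatic deletion."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

mirrors_lost = [0, 0, 1, 0, 2, 1, 0, 0, 3, 47]  # invented responses
print(iqr_outliers(mirrors_lost))  # -> [47]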

Look at our question on the number of wing-mirrors replaced since the 20 mph rollout.

70% of respondents claimed that they hadn't replaced any wing-mirrors since the 20 mph rollout. This was utterly unexpected and, if used when trying to determine the average number of wing-mirrors lost per resident per year, gives an arithmetic mean of 0.87 mirrors/year: less than one!

Yet this can be explained if we include two other facts from our survey:
  • the number of respondents who asserted that they lived outside the city: 54%
  • the number of respondents who asserted that they were too poor to own a car: 32%
As we are measuring the impact of 20 mph zones, we should be discounting those people from our analysis of this question:

Discounting non-car owners: 70 - 32 = 38. Therefore, of respondents who owned a car, only 38% of them got through the year without needing a new mirror.

Discounting non-residents: 38 - 54 = -16! Which seems impossible, unless you consider that many of those non-residents will have driven into a 20 mph zone, and so lost a mirror.

Once you discount the non-car owners and non-residents, we get the result we expected: since the 20 mph rollout, everyone in the 20 mph zone has lost one or more wing-mirrors, with the average number being 3. At £15-25 a shot, that wing-mirror tax is yet another tax on the hard-working motorist.
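For readers checking our working, here is that adjustment as Python. The first half reproduces our subtraction exactly as described; as the comments note, subtracting percentages of overlapping subgroups is precisely the kind of bad analysis this article catalogues. The second half sketches the defensible alternative: filter the raw responses to resident car-owners first, then compute the percentage within that subgroup. The response records are invented to show the shape of the calculation.

# Our "adjustment", verbatim. The three percentages describe different,
# overlapping subgroups, so subtracting them is meaningless -- which is
# how the running total manages to go negative.
no_mirrors_lost = 70
non_car_owners = 32
non_residents = 54
after_car_owners = no_mirrors_lost - non_car_owners  # 38
after_residents = after_car_owners - non_residents   # -16 (!)
print(after_car_owners, after_residents)

# The sound version: restrict to the relevant subgroup first, then
# compute the percentage within it. Records invented for illustration.
responses = [
    {"resident": True,  "owns_car": True,  "mirrors_lost": 0},
    {"resident": True,  "owns_car": True,  "mirrors_lost": 2},
    {"resident": False, "owns_car": True,  "mirrors_lost": 1},
    {"resident": True,  "owns_car": False, "mirrors_lost": 0},
]
subgroup = [r for r in responses if r["resident"] and r["owns_car"]]
pct = 100 * sum(r["mirrors_lost"] == 0 for r in subgroup) / len(subgroup)
print(f"{pct:.0f}% of resident car-owners lost no mirrors")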

Causality and correlation
Again, fascinating. Merely because two things appear correlated over time doesn't mean that one causes the other.


In answer to the question "Why has congestion got worse in Bristol over 25 years?", 17% said it was because BT added an extra digit to all the phone numbers in the early 1990s. Some people may say "so what?", or even "the growth in Bristol's population caused BT to add more numbers; that same population growth increased the number of cars, hence the resultant congestion". We say something else: it was the adding of that digit which made it possible, in a pre-mobile-phone era, to move to the city. That 17% were right. And from this survey nobody can prove us wrong!
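To see how easily two unrelated quantities correlate, consider this sketch: both series below simply grow over the same 25 years, and that alone produces a healthy correlation coefficient. The numbers are invented and are not from our survey (statistics.correlation needs Python 3.10 or later).

import statistics

years = list(range(1990, 2015))
# Two invented, rising series: local phone-number length, which steps up
# once in the mid-1990s, and registered cars, which grow steadily.
phone_digits = [6 if y < 1995 else 7 for y in years]
cars = [200_000 + 4_000 * (y - 1990) for y in years]

r = statistics.correlation(phone_digits, cars)
print(f"Pearson r = {r:.2f}")  # roughly 0.7, with no causal mechanism at all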

Invalid Survey
Any survey that can be considered invalid from a statistical perspective. Common causes are: invalid sample sets, leading questions, bad measurement, and bad analysis, including confusing correlation with causation.

For some examples:
  1. Asserting facts about the average weight of Bristolians through an opinion poll with leading questions, conducted on 4-5 people at the BRI heart clinic, without even excluding any attendees from North Somerset. That fails twice over: invalid sample set and leading questions.
  2. Asserting facts about the entire population of Bristol's opinions on residents parking through an opinion poll with leading questions conducted against a self-selected sample set of some people who care about the subject.
  3. Getting your maths wrong when you add things up, divide the answers, etc.
  4. Misinterpretation of results: reaching the wrong conclusions. If you want to reach a set of conclusions, you are less likely to question the sampling or analysis if the outcome agrees with your expectation. This is sometimes called confirmation bias.
One classic way to bias a survey is simply to discount the "don't know" respondents and other non-participants. If you explicitly exclude these people from your survey (for example, by asking different people the question until you get one whose answer is interesting enough to write an article about), you've got an invalid subset of the population, hence the results cannot be extrapolated.

This problem of the don't-know answer is particularly bad in any self-selected survey, because the members of the population who do not hold opinions tend not to participate in it. Instead you get that subset of the population who hold opinions one way or the other. It is also common in any survey which requires an action on the part of the respondent, be it jumping on a trampoline holding a stopwatch, or filling in a paper questionnaire and then posting it.
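A small simulation of that effect, with entirely invented numbers: suppose most of the population holds no opinion on residents parking, but only the motivated minority bothers to return the form.

import random

random.seed(42)  # reproducible

# Invented population: 60% indifferent, 25% opposed, 15% in favour.
population = (["no opinion"] * 60 + ["oppose"] * 25 + ["support"] * 15) * 100

# Probability of actually returning the form, by opinion. The
# indifferent rarely bother; the motivated almost always do.
return_rate = {"no opinion": 0.05, "oppose": 0.7, "support": 0.7}

respondents = [p for p in population if random.random() < return_rate[p]]

def share(group, label):
    return 100 * group.count(label) / len(group)

print(f"opposed in the population:  {share(population, 'oppose'):.0f}%")
print(f"opposed among respondents: {share(respondents, 'oppose'):.0f}%")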

Summary

As you can see, it is a lot easier to produce an invalid survey than a valid one. More subtly, it's very easy to mistake an invalid survey for a valid one without knowledge of the sampling and measuring process, and knowledge of statistics.

For that reason, while surveys can provide some data about a subject, you can't consider the conclusions to be valid without knowing about the sampling, measuring and analysis —and any bias of the surveyors.

When you are reviewing a survey, you should really query:
  1. The population the survey is meant to analyse.
  2. The sampling process conducted in order to get a valid sample set.
  3. How things were measured.
  4. If it is some form of poll, the sequence and content of the questions.
  5. Outliers: what were they? Were any discounted?
  6. What compensation was made for non-participants?
  7. How the surveyors defend the claim that the survey can be extrapolated to the population it was meant to cover.
Alongside invalid surveys, you have bad reporting of surveys. Often this is where an invalid survey has been conducted, yet the survey is reported as if it were "fact", or representative of the entire population. It is a shame that reporters accept surveys so unquestioningly; if they didn't, they'd realise how often politicians use bad maths to make decisions. Or, in the case of Bristol, to generate controversy for the local press where otherwise there'd be nothing in the papers to talk about.

Further Reading

We hope readers found this introduction to surveys and censuses informative and timely. Please practice what you have learned by using some of the terms introduced above in your everyday conversation —at least once per day. Example uses:

  • "Please can I sample some of your chips?"
  • "the causality relationship between eating chips and being overweight is not clear"
  • "your survey is utterly indefensible due to its painfully awful selection bias and leading questions —your attempt to extrapolate it to any larger population is hence so ridiculous that you'd fail a GCSE if you sat one this week"

For anyone interested in learning more, here are some great online books on the topic:
