Polling Data Tables 101: Sampling!
In which there are fewer pictures and more warnings about not overemphasising a single polling result!
The Smokeless Room is a newsletter by Rushaa Louise Hamid designed to help you clear the smoke from the air and better understand the tools of decision makers, with a special focus on all you need to know to sift the bunk from the gold of opinion surveys!
Sampling - i.e. picking the people you are going to put your questions to - is tied strongly to method. Calling people on the telephone or stopping people on the street will each produce particular biases, however hard you try to avoid them. And results can invariably be manipulated, intentionally or otherwise, by how you choose who to include.
UK-wide populations
For big polls of the UK population most polling companies use either telephone or online panels to find their samples as face-to-face polls are quite expensive to run.
To do a gold-standard poll you’d want to be able to randomly pick people from the entire UK population. Unfortunately that is quite hard to do, so it doesn’t happen. Instead a lot of polling companies use a quasi-random method called random digit dialling - basically dialling random phone numbers and seeing who picks up.
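If you want a feel for what random digit dialling means in practice, here’s a toy sketch in Python (the 07 prefix and 11-digit length match UK mobile numbers, but the “dialling plan” here is purely illustrative):

```python
import random

def random_uk_mobile():
    # UK mobile numbers are 11 digits starting 07; fill the rest at random
    return "07" + "".join(str(random.randint(0, 9)) for _ in range(9))

# Generate a batch of numbers to dial - most won't answer, some won't exist,
# but whoever does pick up wasn't chosen from any pre-existing list
batch = [random_uk_mobile() for _ in range(5)]
print(batch)
```

The point is that no directory or sign-up list is involved, which is what makes the sample quasi-random rather than self-selected.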
The idea behind this is that you’re reaching people who otherwise might not think to do surveys, and reaching them through their landlines (and sometimes mobiles). This is especially good for getting older people who might not be online as much (and is part of the reason that online polls tend to lean further left than telephone ones).
Online panels are now the most widely used sampling method, as they’re how most online polls are run. An online panel is basically a site anyone can sign up to, inputting their demographic information, and most provide rewards for doing surveys. YouGov has its own panel, as do Survation and Populus, though there are also other providers that polling companies can get their data from.
In principle you could then send out invitations to a demographically balanced slice of the panel, creating a perfect sample with no weighting needed. Here’s the thing - not everyone you invite will actually participate, so even online panel polling has to adjust its invites for those more and less likely to respond.
Online sampling is especially used for fast-turnaround polls. If the timeframe is a day or two you’ll tend to see far fewer older people than usual, while younger people are more connected and more likely to do the polls quickly. Bear this in mind when looking at snap polling figures, as the weights tend to do a lot more work in these polls.
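To see what “the weights doing a lot of work” looks like, here’s a toy post-stratification sketch - all the age-group shares below are made up for illustration:

```python
# Target population shares vs who actually answered a snap poll (made-up figures)
population = {"18-34": 0.28, "35-64": 0.48, "65+": 0.24}
sample = {"18-34": 0.40, "35-64": 0.45, "65+": 0.15}

# Each respondent's weight scales their answers up or down so the
# weighted sample matches the target population
weights = {group: population[group] / sample[group] for group in population}
for group, w in weights.items():
    print(f"{group}: {w:.2f}")
```

In this made-up snap poll the few 65+ respondents who did answer end up counting for well over their face value (a weight of 1.60 each), which is exactly why a big weight on a small group makes a poll jumpier.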
Smaller populations
When it comes to smaller populations you can sometimes still use the above methods, but it’s often a lot harder to reach people via phone directories or online panels. If that’s the case, sampling requires more thought.
Let’s say you wanted a population of gay men to answer your survey and for some reason couldn’t reach them via an online panel. You might choose one or more of the following methods:
Go to Soho and talk to people outside bars
Put adverts in the Gay Times
Ask prominent gay men to retweet your request on Twitter
Ask people to forward your survey to their friends
These can all introduce a type of sampling error known as selection bias. People in Soho are all either London-based or able to travel into London, so will likely be richer than average. Friend circles and the followers of particular figures on social media often share similar views. Readers of the Gay Times might have a particular outlook - which is why they subscribed in the first place.
How do you solve it then?
In short you often can’t avoid it entirely, but the data tables should outline exactly what was done to mitigate it, and the aim should be a sample big enough to get more variety too. Maybe different local newsletters were used to avoid a London bias. Perhaps how surveys were forwarded on was tracked, so you could see how connected social groups were. If a poll doesn’t have these, you need to question how reliable it actually is - or whether it’s just a fancy version of a Twitter poll.
One of the biggest things to bear in mind when it comes to sampling is that you’re never going to be perfect. There will always be differences between your target population and your sample, even after weighting. Normally the error will float in the region of plus or minus 3 percentage points, but the data tables should always state the anticipated margin of error.
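That quoted margin comes from a standard formula - for a share p from a simple random sample of n people, the 95% margin of error is roughly 1.96 × √(p(1−p)/n). A quick sketch:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p, sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A typical ~1,000-person poll with a 50/50 split gives about plus/minus 3 points
print(f"{100 * margin_of_error(0.5, 1000):.1f}")  # prints 3.1
```

Real polls aren’t simple random samples, so companies may quote a slightly different effective margin, but a ~1,000-person sample is where the familiar “plus or minus 3” figure comes from.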
To demonstrate how big an impact this can have on the polling landscape, let’s look at the following polls:
These all look very exciting. No doubt on the 14th October the newspaper headlines read “Green and Blue Party neck and neck in latest poll”. But once we take the margin of error into account, none of these polls is telling us anything different from the others. It could just be normal sampling variation.
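As a rough sanity check, using hypothetical figures of my own rather than the polls above: suppose Green is on 32% and Blue on 30% from 1,000 respondents. Treating the two shares as independent (a simplification) the margin on the gap works out bigger than the gap itself:

```python
import math

def moe(p, n, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical figures: Green on 32%, Blue on 30%, from 1,000 respondents
green, blue, n = 0.32, 0.30, 1000
gap = green - blue  # a 2-point "lead"

# Rough margin on the gap, treating the two shares as independent
gap_moe = math.sqrt(moe(green, n) ** 2 + moe(blue, n) ** 2)
print(gap < gap_moe)  # prints True: the lead is inside the noise
```

So a two-point “lead” on a sample of this size is exactly the kind of movement that could just be sampling noise.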
When it comes to sampling, and the error that comes with it, a shift only really means something once it is sustained and bigger than the margin of error. The biggest takeaway: don’t get excited over small incremental changes.
Next week we’ll be looking outwards to other aspects of the polling industry!
As always, if you'd like to drop me a note, you can contact me by replying to this email or over on Twitter at @thesecondrussia.