Polling Data Tables 101: Weights!
And a demonstration of what sample weighting is using my very poor MSPaint skills!
The Smokeless Room is a newsletter by Rushaa Louise Hamid designed to help you clear the smoke from the air and better understand the tools of decision makers, with a special focus on all you need to know to sift the bunk from the gold of opinion surveys!
One of the first parts you’ll see when you open polling data tables should be two rows near the top that look something like this:
These two numbers might be the same or different depending on which column they are in, but understanding the difference is key.
Unweighted means how many people were actually asked the question. There will be different numbers in the different columns to show how many of each group of people were asked. Normally the columns start with the total number of people, then go into sex, age, region (these are called crossbreaks and we’ll return to them next week).
Weighted means how many people the poll treats the question as having been asked to once weighting is factored in. All poll figures come from the weighted, not the unweighted, figures (unless for a particular reason there is no weighted figure).
Weighting? That sounds complicated.
Let’s imagine the below figures represent a population and their views on apples. So 3 blue and 2 red people, with 3 happy and 2 sad about apples.
Now with polling we are limited in who we can ask - after all we can’t realistically ask everyone in a country their thoughts. What we can do instead is get a reasonably sized chunk of people (for most polls this will be between 1,000 and 2,000 people) and try to make sure it is similar to the overall population.
In the above case we know 40% of our population is red and 60% is blue, so when we find that 1000 people we’ll try and make sure that 40% is red and 60% is blue. If we do it correctly we should find out that around 60% of people are happy with apples, give or take the margin of error.
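To put a rough number on "give or take the margin of error", here is a minimal sketch. It assumes a simple random sample and a 95% confidence level, and uses the standard textbook formula for a proportion; real polls with weighting will usually have a somewhat wider margin. The function name `margin_of_error` is just an illustrative choice.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from n respondents.

    Assumes a simple random sample; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# 60% happy with apples, 1000 people asked (the example figures above).
moe = margin_of_error(0.6, 1000)
print(f"Margin of error: about {100 * moe:.1f} percentage points")
```

So with 1000 people we would expect the happy-with-apples figure to land within a few points of 60%.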
Let’s say though we only asked three people (a terrible idea in polling, but far more useful for explaining how weighting works).
So here we have about 33% red, and about 66% blue give or take - not bad but not ideal either. Currently we have too many blue people’s opinions and too few red people’s opinions.
This is where weighting comes in.
Weighting is essentially the process of assigning a value to the opinion of each person we ask so that we match the overall opinion better to the target population.
We want our answer to “Do you like apples?” to reflect the fact that 40% of our population is red. To do this we will value our red person’s views a bit more (1.2x rather than just 1x) and each blue person’s views a bit less (0.9x). Our overall figure will still be 3 opinions, but we’ve tried to make it so it’s as if we’ve asked the right number of people from each group.
In our data tables it would look something like this:
This means when we produce our figures we’ll say about 70% of the population like apples, instead of saying 66% as would have happened before.
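The arithmetic behind those figures can be sketched in a few lines. This is only an illustration of the three-person example, not how any particular pollster computes weights. The sample assumes the one red person and one of the two blue people like apples, which is what the figures in this post imply.

```python
# Population shares we are weighting to (from the apples example).
target_share = {"red": 0.4, "blue": 0.6}

# The three people we actually asked: (colour, likes_apples).
# Assumed from the figures in the text: the red person and one blue person like apples.
sample = [("red", True), ("blue", True), ("blue", False)]

n = len(sample)
counts = {}
for colour, _ in sample:
    counts[colour] = counts.get(colour, 0) + 1

# Weight = share the group should have / share it actually has in the sample.
weights = {colour: target_share[colour] / (counts[colour] / n) for colour in counts}

unweighted_pct = 100 * sum(1 for _, likes in sample if likes) / n
weighted_total = sum(weights[colour] for colour, _ in sample)
weighted_pct = 100 * sum(weights[colour] for colour, likes in sample if likes) / weighted_total

print({colour: round(w, 2) for colour, w in weights.items()})
print(f"Unweighted: {unweighted_pct:.1f}%  Weighted: {weighted_pct:.1f}%")
```

Note that the weights still add up to 3, the number of people we actually asked.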
But wait, isn’t that even more wrong than before weighting?
Well yes, it is.
Here’s the thing - weighting can’t fix everything, it’s just something to tidy up at the end. And one of the biggest red flags that a poll is likely to have mistakes is really big weights.
Remember how I said asking 3 people was really bad in polling? If we look closer at our poll, with or without weights we would say it seems like all red people love apples. But we know from our population that actually only 50% do.
We’ve made this mistake because we didn’t ask enough red people in the first place. And the fewer red people we ask, the bigger the weights need to be to correct for this. The bigger the weights, the more likely it is that we amplify errors like this.
This is why it’s really important that your polling sample is as close as possible to the real-life population: you’re then far less likely to accidentally pick up only random outliers.
Asking too many people is okay, but ask too few and you start getting absurd situations where you’ve asked 100 people but they’re representing 400 people in your poll - in other words each person’s opinion is valued at 4x what it should be.
You’ll find some common patterns with this in polling - for instance when looking at polling of BAME people or religious minorities, the unweighted numbers tend to be really, really low. This is sometimes to the point where figures are published in the media supposedly representing a group, when fewer than 50 people were actually asked the question.
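A quick way to spot the 100-people-standing-in-for-400 problem is to divide the weighted number by the unweighted number in each column. The sketch below does that. The column names and figures are made up for illustration, and `average_weight` is a hypothetical helper rather than anything taken from a real set of data tables.

```python
def average_weight(unweighted_base, weighted_base):
    """How much each respondent in a column counts for, on average."""
    return weighted_base / unweighted_base

# Hypothetical columns: (unweighted base, weighted base).
columns = {
    "Total": (1000, 1000),
    "Group A": (100, 400),   # 100 people asked, treated as 400
    "Group B": (900, 600),   # 900 people asked, treated as 600
}

for name, (unweighted, weighted) in columns.items():
    ratio = average_weight(unweighted, weighted)
    print(f"{name}: each person counts as {ratio:.2f} people")
```

Here Group A comes out at 4.00, meaning each person in it is carrying four people’s worth of opinion. What counts as "too big" is a judgement call, but the further a column drifts from 1, the more cautious you should be about its figures.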
Beware though - even if it looks good the weighting could be wrong!
Let’s say though you find that your unweighted and weighted figures look really close. That’s a good sign, but you still need to check what the figures are being weighted to.
There should be somewhere in the polling that says what data they are using to make assumptions about what the overall population looks like.
For instance if we wanted to do a UK-wide opinion survey for voting we might initially use the census to weight to.
Here’s the thing though - the ONS census is really good data in the sense that everyone in the country is asked. However, it’s also really old (almost 10 years now), so a lot has likely changed in the intervening years. Ideally there will be some amendments to compensate for this - maybe taking figures from the quarterly Labour Force Surveys the ONS do to make a few adjustments, or using population estimates instead.
There’s no perfect solution, and sometimes sticking to the census figures might be right, but it’s important to use your judgement.
Next week we’ll be continuing to look at other parts of polling data tables, focusing on crossbreaks!
As always, if you'd like to drop me a note, you can contact me by replying to this email or over on Twitter at @thesecondrussia.