Polling Data Tables 101: Questions and Scripting!

How to manufacture a story from data, and how to spot when that's what is happening

The Smokeless Room is a newsletter by Rushaa Louise Hamid designed to help you clear the smoke from the air and better understand the tools of decision makers, with a special focus on all you need to know to sift the bunk from the gold of opinion surveys!


A lot of people like to say data is made into narrative. The truth is that data already is a narrative: the creation of a series of choices a human being - with all the biases that entails - has made. Who you pick, how you ask them, and what you ask them are all key in shaping the outcome of a poll, and no poll is free of these influences.

To get a bit academic: the social and political position of the researcher can shape responses in a variety of ways, consciously and unconsciously, as well as the interpretation of those responses. Differences in understanding of supposedly 'common' vocabulary may be missed, creating a skewed impression, and interpretations of both the data and the questions themselves may be coloured by the personal sentiments and/or experiences of the researcher. For instance, “sympathy” for people who go to Syria to join fighters there could mean either feeling sorry for them or supporting them - a very, very important difference ignored by the Sun's ‘1 in 5 Brit Muslims sympathy for jihadis’ headline, putting aside the fact that the question did not even mention ISIS or jihad.

To get a bit less academic:

How to assess scripts

A lot of assessing a script comes down to common sense and digging into the data tables. This is especially key as a lot of media outlets publish a summarised version of a question which might not accurately capture what was asked. Fortunately, polling companies in the UK publish a list of all the questions asked in a poll.

You will also notice that in some data tables only the answers to a few questions are published. This is because polling companies are only required to show the questions that lead up to a question with published results, but don’t have to release figures they don’t want to. Withheld results are normally a good flag that those questions produced something opposite to what was wanted.

Here are some more common issues with scripts:

Leading questions

To add to the issues shown by leading questions in the video above, and throwing back to our very first newsletter, we can revisit this terrible question:

It demonstrates a key thing that we’ve seen a lot with second referendum polls - if a question is leading and the answer goes the opposite way, that does not make the answer extra accurate. In the above example there’s a clear mango bias in the framing of the question, but perhaps people were influenced to vote pineapple to spite that. Maybe without the framing mango would actually have won. We don’t know, because it’s a bad question - it doesn't produce some extra-correct figure in support of the pineapple agenda.

Questions that encourage a positive answer.

People like to agree with statements - it is a confusing but true fact, and as a result there’s almost always going to be a bias towards the “Yes” answer. Good practice is therefore to always remind people that they can disagree - “Do you approve or disapprove?” is always better than “Do you approve?”.

It’s not just obvious questions though. Some questions are skewed towards a positive answer due to social desirability. This factor means that even when people are alone they like to think of themselves in a positive light, and so answer questions according to an ideal rather than reality. It’s where the concept of shy Tories comes from.

As a consequence, polls will often show higher rates of intended voter turnout than actual voter turnout - “voting is good!” - and that only dampens a bit when people are reminded in the poll itself that a sizeable portion of people do not vote. This is much harder to correct for, but it’s why asking “Are you a racist?” is not going to get you accurate results. Far better to ask about past behaviour, and about statements that aren’t presented as good or bad.
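To make that yes-bias a bit more concrete, here is a rough Python sketch of how even a modest tendency to agree can inflate a headline figure. Every number in it (the true approval rate, the size of the bias) is invented purely for illustration - none of it is drawn from a real survey:

```python
import random

random.seed(1)

# Illustrative sketch only: the figures below are assumptions, not real survey numbers.
TRUE_APPROVAL = 0.50      # assumed share of people who genuinely approve
ACQUIESCENCE_BIAS = 0.10  # assumed chance a non-approver says "yes" anyway to an unbalanced question
N = 10_000                # simulated respondents

def ask(balanced: bool) -> float:
    """Return the measured approval rate under an unbalanced or balanced wording."""
    yes = 0
    for _ in range(N):
        approves = random.random() < TRUE_APPROVAL
        if approves:
            yes += 1
        elif not balanced and random.random() < ACQUIESCENCE_BIAS:
            # "Do you approve?" nudges some non-approvers into agreeing
            yes += 1
    return yes / N

print(f"Unbalanced wording: {ask(balanced=False):.1%} approve")
print(f"Balanced wording:   {ask(balanced=True):.1%} approve")
```

Run with these made-up numbers, the unbalanced wording reports roughly five points more approval than actually exists - which is exactly the sort of gap a balanced "approve or disapprove" wording is there to close.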

Questions that are confusingly worded.

Sometimes questions are just very badly worded, to the extent that people can come away with radically different ideas of what the question means.

A key example is the Women Ask Questions poll that was conducted by Populus (full tables can be found here). One of the questions asked: “We would now like you to think about a person who was born male and has male genitalia but who identifies as a woman. In your own personal view would you consider this person to be a woman or a man?” Here the question is both leading (with the reiteration of “male”) and very confusingly worded, which creates understandable problems for people answering; some people might read it as being about trans women, others as being about men pretending to be trans women, and as a result we can’t really say anything about the figures produced.

As a result, for most polls a process called cognitive testing should be used - in other words, checking that people from varied backgrounds understand the intent of the question and the meaning of its terms. This is important even with questions that seem clear. “Do you think Boris Johnson is a leader?” could be read as either of the following:

  • Do you approve of Boris Johnson as a leader?

  • Do you know Boris Johnson is a leader of a country?

Not flagging a respondent's knowledge and bias properly.

Imagine I asked you a bunch of questions about astrophysics and you felt compelled to guess - chances are you’d pick randomly, and by and large those answers would be useless. Now imagine I asked for your views on Arsenal but you are a West Ham fan - you might have some strong opinions, but they’d definitely be coloured by your team allegiance.

In both those scenarios important aspects of your own views weren’t explored first, rendering all the subsequent answers pointless. Important context for analysing the answers has been lost.

How you get around this differs depending on the topic, but a good rule of thumb is that anything beyond common knowledge needs a simple explanation of the topic before the question, presenting multiple sides if it is a contentious issue. If a question doesn’t have this preamble then chances are a chunk of your survey takers will give answers that don't correspond to their true views.

It’s one of the reasons that opinion polls about politicians should include both the name and an image of the person - people can get confused easily, and you need to know that they know who they’re answering the question about.

Equally, polls that would be strongly affected by a particular characteristic (people who use the local swimming pool might be more in favour of it) need a question to account for that, which can then be provided as a cross-break. Without it you can’t tell how badly the sample is skewed.
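As a sketch of what that check looks like in practice, here is a hypothetical example using Python and pandas - the respondents, the pool question and the "share of the population who use the pool" are all made up for illustration:

```python
import pandas as pd

# Sketch with made-up numbers: one row per respondent.
df = pd.DataFrame({
    "uses_pool": [True, True, True, False, True, False, True, False, True, True],
    "supports_pool_funding": [1, 1, 1, 0, 1, 0, 1, 1, 1, 0],
})

# Cross-break: support split by whether the respondent uses the pool.
crossbreak = df.groupby("uses_pool")["supports_pool_funding"].mean()
print(crossbreak)

# How skewed is the sample? Compare against an assumed population share of pool users.
POPULATION_POOL_USER_SHARE = 0.20   # assumption for illustration only
sample_share = df["uses_pool"].mean()
print(f"Pool users: {sample_share:.0%} of sample vs {POPULATION_POOL_USER_SHARE:.0%} of population")
```

If pool users make up a much bigger slice of your sample than of the population, the cross-break lets you see that - and see how differently they answer from everyone else - before you take the headline figure at face value.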

So what else?

Scripting is hard to get perfect - most polls will have errors, and a lot of the time it’s simply about balancing out the negative skews as best you can. In general this is why tracker polls are often favoured - asking the same questions over time can help identify the general trends even if the individual results themselves aren't spot on. In other words, check around first to see whether your poll is a weird outlier or whether there is other context that can help make sense of the figures.
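One very simple way of doing that outlier check, sketched in Python with made-up tracker figures and an assumed two-standard-deviation rule of thumb:

```python
# Sketch: flag a new poll that sits well outside the recent run of a tracker.
# The figures and the two-standard-deviation threshold are assumptions for illustration.
from statistics import mean, stdev

tracker = [41, 43, 42, 44, 43, 42]   # previous waves, % support (made up)
new_poll = 49                        # the poll you're trying to assess

baseline, spread = mean(tracker), stdev(tracker)
if abs(new_poll - baseline) > 2 * spread:
    print(f"{new_poll}% looks like an outlier vs the recent average of {baseline:.1f}%")
else:
    print(f"{new_poll}% is broadly in line with the recent trend ({baseline:.1f}%)")
```

The threshold here is just a rough rule of thumb rather than a formal significance test - the point is simply to ask whether the new figure sits comfortably inside the recent run or well outside it.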


Next week we’ll be continuing to look at other parts of polling data tables, focusing on sampling!

As always, if you'd like to drop me a note, you can contact me by replying to this email or over on Twitter at @thesecondrussia.

Newsletter icon made by Freepik from Flaticon.

Buy Me a Coffee!
