Never trust a statistic that you haven’t visualized yourself.

It’s election time in Germany and, as usual, there are tons of opinion polls telling us who is going to win the election anyway. It is debatable whether or not pre-election polls are healthy for our democracy in general, but at least everybody agrees that the polls should be kind of neutral. And if they are not, the institutes publishing the polls should be blamed publicly.

But how do we know if an institute publishes ‘biased’ polls? You guessed it: with data. More precisely: with data and the unique power of data visualization.

## You don’t know it until you see it

So I got the polling data of the last 14 years from wahlrecht.de and used Python and Dataset to convert the data into an R-friendly structure, with one number per row, ending up with 27.1k rows in total. Then I put all my R skills together to create this nice colored scatterplot showing all the polls, on all the political parties in Germany.

Too many numbers for one chart, so let’s focus. But unlike most other charts I’ve seen from this data (exception: SZ), we’re not going to focus on just one poll institute, but on one party. Here are all the polls published for the Social Democratic Party.

This chart alone is pretty interesting, as it demonstrates how the poll trends change dramatically *after* each election. But we’ll get back to that later. First I added a global trend line for the polls, which is simply the median value of all polls within one quarter. The median values are centered in each quarter, e.g. the median of Q2 is displayed on May 15th instead of April 1st. The resulting line nicely follows the polls:
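For readers who prefer code to prose, here is a minimal Python sketch of such a quarterly median trend. It is not the R code from the repository: the sample values and the function names (`quarter_of`, `quarter_midpoint`, `quarterly_median`) are made up for illustration, and I assume "centered" means the 15th of the middle month of each quarter.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical sample: (publication date, SPD share in percent).
polls = [
    (date(2013, 4, 3), 26.0),
    (date(2013, 5, 14), 27.0),
    (date(2013, 6, 25), 25.0),
    (date(2013, 7, 9), 24.0),
    (date(2013, 8, 20), 26.0),
]

def quarter_of(day):
    """Return (year, quarter) with quarter in 1..4."""
    return day.year, (day.month - 1) // 3 + 1

def quarter_midpoint(year, quarter):
    """Assumed plotting position: the 15th of the quarter's middle month."""
    return date(year, 3 * quarter - 1, 15)

def quarterly_median(polls):
    """Map each quarter's midpoint date to the median of all its polls."""
    by_quarter = defaultdict(list)
    for day, value in polls:
        by_quarter[quarter_of(day)].append(value)
    return {
        quarter_midpoint(year, quarter): median(values)
        for (year, quarter), values in sorted(by_quarter.items())
    }

trend = quarterly_median(polls)
for midpoint, value in trend.items():
    print(midpoint, value)
```

With this toy data the Q2 median lands on May 15th and the Q3 median on August 15th, which is the centering described above.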

## Blaming the polling institutes

Starting from this nice overview, let’s locate the individual polling institutes. In the next views, the big red points represent the focused institute (mentioned in the title) while the small gray dots represent all the other institutes. The line is the same quarterly median trend as shown above. The first plot shows the polls published by TNS Emnid. As you can see, the values are pretty much centered around the median, which is what you would expect from an unbiased poll.

Unfortunately, not all institutes are following this example. The next plot shows Infratest dimap polls, and you can see that since around 2007 there seems to be a slight bias towards the SPD as the majority of values are on or above the median line, but rarely below it.

The third example shows an even worse picture. Since around the same time, the majority of polls published by Forsa are below the median of the polls by all institutes. So while Infratest polls seem to be a bit in favor of the Social Democrats, Forsa is clearly disadvantaging them.

## A closer look using the median-difference plot

So while the above plots look dramatic on their own, I thought it’d be a good idea to take a closer look at the bias using a median-difference plot. This kind of plot goes back to John W. Tukey (who, among many other things, invented the box plot), and his mean-difference plot. The idea is simple: instead of plotting both the median and the individual measures, we use the median as the baseline and just look at the differences from the median.

So in the following plot, a value of -5% means that the predicted result was five percentage points below the median of all the available polls of that quarter. Again we see the shift from almost no bias prior to the election in 2005 (until which the Social Democrats were leading the government) to a clear bias against the party afterwards. There are a few positive outliers, but the vast majority is negative.
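The difference computation itself is tiny. A minimal Python sketch, assuming all polls of one party within one quarter are already collected in a list (the numbers and the name `median_differences` are invented for illustration, and this is not the R code used for the plots):

```python
from statistics import median

# Hypothetical polls for one party within a single quarter:
# (institute, predicted share in percent).
quarter_polls = [
    ("Emnid", 26.0),
    ("Infratest", 27.0),
    ("Forsa", 23.0),
    ("FGW", 28.0),
    ("Forsa", 24.0),
]

def median_differences(polls):
    """Return (institute, value minus the quarter's median) for every poll."""
    baseline = median(value for _, value in polls)
    return [(institute, value - baseline) for institute, value in polls]

diffs = median_differences(quarter_polls)

# Keep only the institute we want to highlight in the plot.
forsa_diffs = [diff for institute, diff in diffs if institute == "Forsa"]
print(forsa_diffs)  # [-3.0, -2.0]
```

Plotting `forsa_diffs` against the publication dates gives the kind of median-difference view described here, with zero as the baseline.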

To get an even clearer view on this we can also take the median trend of the median-differences (now it gets weird, I know ;-) ). This shows us the **median bias** the Forsa polls had towards the SPD in each quarter. It’s remarkable to see the constant median bias around -2% over a period of six years.
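As a rough sketch of that "median of the median-differences" idea in Python: for each quarter, subtract the median of all polls from each poll of one institute, then take the median of those differences. The rows, quarter labels and the function name `median_bias` are hypothetical, and the actual R scripts may compute this slightly differently.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (quarter label, institute, share in percent).
rows = [
    ("2008-Q1", "Forsa", 24.0),
    ("2008-Q1", "Forsa", 25.0),
    ("2008-Q1", "Emnid", 27.0),
    ("2008-Q1", "FGW", 28.0),
    ("2008-Q1", "Infratest", 27.0),
    ("2008-Q2", "Forsa", 23.0),
    ("2008-Q2", "Emnid", 25.0),
    ("2008-Q2", "FGW", 26.0),
]

def median_bias(rows, institute):
    """Per quarter: median of (poll - median of all polls) for one institute."""
    by_quarter = defaultdict(list)
    for quarter, name, value in rows:
        by_quarter[quarter].append((name, value))

    bias = {}
    for quarter, polls in by_quarter.items():
        baseline = median(value for _, value in polls)
        own = [value - baseline for name, value in polls if name == institute]
        if own:  # skip quarters in which the institute published nothing
            bias[quarter] = median(own)
    return bias

print(median_bias(rows, "Forsa"))  # {'2008-Q1': -2.5, '2008-Q2': -2.0}
```

A value that stays on one side of zero quarter after quarter is what the text calls a constant median bias.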

## Random error vs. systematic errors

Let’s stop here for a moment and think about the implications of this. The research institute Forsa publishes polls in which the Social Democratic Party over a period of six years is predicted on average two percent below the median of all the polls published in that period. This is hard to believe given that the institutes are all claiming to interview a *representative* group of the population. Even if we fully acknowledge that the polls are producing some error, there can’t be this kind of bias. Unless there is a **systematic error** in the polls, an error that occurs again and again unless they fix the system.

And there are several ways this error can be produced, intentionally or by accident. For one, it might be that the group interviewed by Forsa is not representative of the full population because they have a different definition of what representative actually means (otherwise the error would be random). Another cause for the systematic error could be the actual wording of the questions in the interview, or the context in which the question has been asked (more often than not the institutes combine the poll with several other questions).

Unfortunately we cannot analyze how these errors are being produced because the institutes never publish any raw data. To keep costs down, the samples are too small to produce precise predictions: around 1,000 to 2,000 people are interviewed to make assumptions about ~60 million eligible voters. Therefore there is too much noise in the raw results to publish them without further ‘correction’.
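To get a feeling for how much noise such a sample size implies, here is a back-of-the-envelope sketch in Python. It uses the textbook margin of error for a simple random sample, z * sqrt(p(1-p)/n) with z = 1.96 for roughly 95% confidence. That is an assumption on my part: the institutes do not publish their sampling design, so real polls will not match this exactly. The 25% share and the function name `margin_of_error` are illustrative only.

```python
from math import sqrt

def margin_of_error(share, sample_size, z=1.96):
    """Half-width of an approximate 95% interval, in percentage points.

    Assumes a simple random sample, which is an idealization.
    """
    p = share / 100
    return 100 * z * sqrt(p * (1 - p) / sample_size)

# A party polling at 25% with the sample sizes mentioned in the text.
for n in (1000, 2000):
    print(n, round(margin_of_error(25.0, n), 1))
# 1000 2.7
# 2000 1.9
```

So even under ideal conditions a single raw poll for a party at 25% could be off by about two to three points in either direction, which is the random noise the ‘correction’ is meant to smooth.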

And this is the third possible cause for the systematic error: the correction algorithm each polling institute uses to ‘smooth’ the data and minimize the noise. And, of course, these algorithms are kept private as a business secret.

## More examples, please

The examples shown above are not the only ones. Here’s the systematic error of another polling institute, Forschungsgruppe Wahlen (FGW), which conducts polls on behalf of a public TV broadcaster. Between 2000 and today, the FGW polls were massively biased towards the big parties, the Christian Democratic Union (CDU) and the SPD. As the median-difference median plot (2nd image) shows, the average bias(!) towards the CDU stayed around **+5%** for almost four years.

Over the same period, the FGW had systematically ‘underrated’ smaller parties, and among them especially the left-wing opposition party DIE LINKE (LEFT).

Between 2005 and 2009, the average bias for the LEFT party had been around -4%. What’s interesting here is to see how FGW had suddenly ‘corrected’ their prediction after the 2009 elections, in which the LEFT party had gained about 12%.

And the election result of the LEFT party didn’t come as a big surprise to other institutes. Here is Emnid again as an example of a lower error. However, in this case the error is so low that I wouldn’t be surprised if Emnid were adjusting their predictions towards the average of the polls published by other institutes. At least this would explain why they are that close to the average.

## How to reproduce the plots / analyze your own data

As mentioned in the headline of this post, all the plots and analysis are done with the fabulous and free statistical analysis tool R. For your convenience I prepared a GitHub repository with all the scripts I used to generate the plots. The repository also includes the German polling data, so you can start analyzing right away. The API hopefully explains itself once you check out the main script.

(If you have never used R before but want to start right now, go download and install RStudio, a convenient GUI for R, get and extract the ZIP archive from GitHub and double-click polls.Rproj)

Hope you enjoyed the post. Feel free to share it like crazy and drop comments in the form below or on Twitter.

And: Never trust a statistic that you haven’t visualized yourself.

### Update (Sept 12, 16:00)

Improved the way the quarterly median is computed. In the first version I had simply picked the median of all polls for one party within a quarter. This, however, is only the correct median if all polling institutes publish the same number of polls per quarter (which is not the case). So to improve this I now compute the quarterly median for each institute first, and then take the mean of them afterwards. This ensures that each institute has the same ‘weight’ in the median computation, no matter how many polls they published.

The images in this post have been updated accordingly. Thanks to GitHub you can see a side-by-side comparison of the old and new images. The difference is only marginal, but I still think it’s more correct.
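A small Python sketch shows why the weighting matters. The numbers are invented so that one institute publishes far more polls than the others, and the names `pooled_median` and `balanced_median` are mine, not taken from the R scripts.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical polls for one party in one quarter: (institute, share in percent).
# Forsa publishes five polls, the other two institutes one each.
quarter_polls = [
    ("Forsa", 23.0), ("Forsa", 24.0), ("Forsa", 23.0),
    ("Forsa", 24.0), ("Forsa", 25.0),
    ("Emnid", 27.0),
    ("FGW", 28.0),
]

def pooled_median(polls):
    """First version: median over all polls, so busy institutes dominate."""
    return median(value for _, value in polls)

def balanced_median(polls):
    """Updated version: median per institute first, then the mean of those."""
    by_institute = defaultdict(list)
    for institute, value in polls:
        by_institute[institute].append(value)
    return mean(median(values) for values in by_institute.values())

print(pooled_median(quarter_polls))              # 24.0
print(round(balanced_median(quarter_polls), 2))  # 26.33
```

In this exaggerated example the pooled median simply follows the most prolific institute, while the balanced version gives each institute one vote. With the real data the effect is much smaller.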

Hi, I’m a French student working on the Ishihara test. I would like to know how you created the image with the different coloured circles and how long it took.

Thank you for your time.

Excellent work. Since May this year I have been doing something similar: trying to gather data and figure out bias of institutes polling elections in Poland.

@Hoffmann:

Thanks for leaving your thoughts. As I published the data and the R code, I would be more than interested to see your analysis of the data. Please share your findings with me :-)

So what you see are visual indicators. But as you already have the data in R you should run some significance tests so you don’t need to rely on your eyes.

My second point was in a way already stated. At some point your quarterly median becomes the truth and hence the differences become errors. That is imho a politically biased overinterpretation of the data, which you should at most mention in your conclusion.

Furthermore, I don’t know whether knowing that the institutes’ prediction algorithms are tendentious in one direction is not better than just knowing that they may be far off “the real value”.

I would even go as far as saying a “real value” of whatever they try to measure is as existent as an answer to the question of survival of that Schroedinger guy’s lolcat.

Btw: a moving window is another fine approach for a median/mean range (e.g. one month back and forth).

Sorry, my description about the median-median-diff plot was a little misleading: my fault. What I am actually doing is to compute the median of all institutes per quarter and the median of just one institute per quarter (both filtered on one party), and then I take the difference of both.

You can check the R code here

Okay, I’ll try to replicate this – maybe even in R :)

(As a first guess, when you “median” (quarterly) over the median-differences, this doesn’t solve the problem, because the deviations using the quarterly method (i.e. the squared sum of all residuals from the median line; first plot) are higher than the residuals using the weekly or monthly method –> Thus, even if you “median” over these deviations (quarterly median of median differences, line in the second plot), they are higher than with the weekly method)

One could of course easily compare both methods within a single median-median-difference plot (including your new method of computing the median) ^^(which will very likely prove that I’m wrong!)

@dvorak: Yes, that’s a valid point. But this is only true for the comparison of single values and the mean, as shown in this plot:

But the comparison of the quarterly median of all institutes with the quarterly median of just one institute removes the effect of different time frames.

Yes, I think it might be a pretty good estimator – and again, I really enjoyed your post on this (and providing the data, of course!). However, checking the results in Python and Pandas (I really like ggplot, but handling data in R is somewhat strange for me once you are used to Python..), computing the median within quarters may over-estimate the effect –> Choosing a smaller timeframe shows fewer deviations. I think it’s somehow inappropriate to compare daily data with a quarterly median. Here is a quick & dirty plot of the effect: https://dl.dropboxusercontent.com/u/20490817/rollineffects.png

@Susanna14:

Someone on Twitter just pointed me to an article that compared predictions with actual election results (in German):

http://democracy.blog.wzb.eu/2013/09/20/umfrageergebnisse-sind-keine-wahlergebnisse-aber-doch-gute-prognosen/

@Dvorak:

Yes, that’s a limitation of not having all the raw data. My assumption, however, was that given that all institutes try to measure the same opinions using samples of the same population, they must, looking over several years of data, somehow vary around the ‘real’ opinions, which are approximated by the quarterly median. An institute which is above the median over a long period must have some kind of systematic error incorporated in its polls.

@Lightkey:

Thanks for adding this contextual information. Very interesting, and this is also strong evidence for biased predictions, isn’t it?

Thanks for pointing to the raw FGW data, will check it out..

Some info missing is that Forsa had been accused many years ago of being *pro*-SPD and that the TV station that Forschungsgruppe Wahlen (FGW) does their polls for is the Zweites Deutsches Fernsehen (ZDF), itself tied heavily with the CDU. One thing I would like to see is the raw data of the “polls” (a misnomer, it should be called predictions) that FGW also publishes, you have to give them credit for that, because if you think their predictions are biased towards black, then their actual polls should look hilarious!

As mentioned by Susanna14, why do you expect the median to be a good or even unbiased estimator for the true results?

This has to be proven. To my understanding, the assumption that the median is an unbiased estimator is somewhat of a circular argument: You try to prove bias/show deviation based upon biased data (and I’m not sure whether taking the median of biased polls will reveal a true, unbiased value).

IMO, it’s misleading to interpret the mean-difference plots as a visualization of bias; instead it’s a visualization of the variance between the institutes. Correct me if I’m wrong!

Besides that: Amazing work and thx for providing the data!

There are many people in Germany who think exactly this way. And I hope we are smart enough to choose the right party without listening to all these lying voices. There is one really interesting fact about my beloved country, Germany. It’s really good that most of the disgusting things from the Nazi time have been brought to light. But it’s really a shame how ignorantly we went on with business as usual after the DDR. It would be really interesting if someone could display all the strange statistics which show that the Stasi wasn’t really stopped after 1989. It’s so crazy that nobody thinks it could be possible. Only a few people write about this ugly fact. Your visualization would be much more interesting if it showed the difference between before 1989 and after 1989. And last but not least: we need this in German so that many more people can read it! Thank you very much for your engagement.

Have you considered comparing the poll results not to the median, but to the real election results?