
The Problem with Statistics

"95% of statistics are made up on the spot." – popular statistics quote

Here in Canada, our upcoming federal election is sparking a veritable barrage of polls and statistics. Oftentimes the figures just don't seem to agree. What are we to make of all this? How do we wade through all of this information to glean any meaningful insights? Moreover, how does one evaluate which studies are most accurate, or more to the point, truthful? Now that the information age is truly upon us, it's perhaps time to re-evaluate the purpose of statistics in our daily lives and what they mean. All too often, it's not what you think.

Today, I'd like to run through a few of the more troubling aspects of statistics and how these may be used to advance an agenda or skew the facts to someone's favor.

Epidemic or Random Clustering?

Examine the location of the Aces in a recently shuffled deck of cards.  Are they spaced a similar number of cards apart or are they all within close proximity of each other? Say that the four Aces were all within ten cards. Would you simply accept that it’s a perfectly normal consequence of randomization or would you suspect that cheating might be involved? It might depend on how the cards were shuffled. You would probably want to observe the person or machine doing the shuffling.
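As a rough check on that intuition, the chance can be worked out directly. The sketch below assumes a uniformly random shuffle and reads "within ten cards" as all four Aces sitting somewhere inside a run of ten consecutive cards. Both that reading and the function names are illustrative assumptions, not part of the original example.

```python
import random
from math import comb

DECK_SIZE = 52
ACES = 4
WINDOW = 10  # assumed reading of "within ten cards": a run of 10 consecutive cards


def exact_probability(deck=DECK_SIZE, aces=ACES, window=WINDOW):
    """Exact chance that all aces fall inside some run of `window` cards."""
    favourable = 0
    for first in range(1, deck + 1):
        # Positions after `first` that still keep every ace inside the window.
        room = min(window - 1, deck - first)
        favourable += comb(room, aces - 1)
    return favourable / comb(deck, aces)


def simulated_probability(trials=100_000, seed=1):
    """Monte Carlo estimate of the same chance using random ace positions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        positions = rng.sample(range(DECK_SIZE), ACES)
        if max(positions) - min(positions) < WINDOW:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"exact:     {exact_probability():.4f}")
    print(f"simulated: {simulated_probability():.4f}")
```

Under these assumptions the exact figure works out to about 1.4%, or roughly one shuffle in 72. That is unusual, but it is bound to turn up regularly across many honest shuffles.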

The same concept becomes much more volatile when people’s lives are involved. Every year, we hear about small towns that have an unusually high rate of a disease such as cancer. Should a smoking gun, like a chemical plant, be found nearby, you have the makings of a sensational story.

In truth, determining the presence of a disease cluster is quite difficult.  In many reported cases, the clustering is due to the "bull's-eye effect", which is something akin to drawing a target on the wall after the darts have been thrown.  So next time the media reports a cluster scare, don't jump to conclusions until a thorough investigation has been performed by the relevant authorities.
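To illustrate how clusters can appear with no underlying cause, here is a small simulation. All of the numbers (100 towns, 1,000 residents each, a 1% chance of illness per resident) are invented for illustration, and every town is deliberately given exactly the same risk.

```python
import random

TOWNS = 100          # hypothetical number of towns
RESIDENTS = 1_000    # hypothetical residents per town
RISK = 0.01          # identical chance of illness for every resident


def simulate_case_counts(seed=0):
    """Return the number of cases in each town when risk is the same everywhere."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < RISK for _ in range(RESIDENTS))
        for _ in range(TOWNS)
    ]


if __name__ == "__main__":
    counts = simulate_case_counts()
    expected = RESIDENTS * RISK
    print(f"expected cases per town: {expected:.0f}")
    print(f"lowest town:  {min(counts)}")
    print(f"highest town: {max(counts)}")
```

Even though no town is riskier than another, the highest count will typically land well above the expected 10. Singling out that town after the fact and then looking for a cause nearby is the bull's-eye being drawn around the dart.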

Why Average Describes both Everyone and No One

I used to wonder why life expectancies were so much shorter historically than they are now, even though I have read so much about old people in books and seen them portrayed on the screen. Were these people exceptionally lucky to have lived to a ripe old age?

The answer lies in how averages are calculated: you add up a collection of values and divide by the number of values. Hence, if you had ten people and they lived to 69, 90, 89, 45, 78, 67, 79, 82, 76 and 64, their average lifespan would be equal to 73.9 years. That is:

(69 + 90 + 89 + 45 + 78 + 67 + 79 + 82 + 76 + 64) / 10 = 73.9

That works very well, so what’s the problem?

The flaw becomes apparent when we use an example that factors in the higher rates of infant mortality. For instance, during the Middle Ages, infant mortality was likely to be around 30%, and may even have been as high as 50%! [1]

Let's try that again with three of the lifespans replaced by deaths in early childhood: 69, 90, 1, 45, 78, 4, 79, 5, 76, 64.

That gives us an average of 51.1 years:

(69 + 90 + 1 + 45 + 78 + 4 + 79 + 5 + 76 + 64) / 10 = 51.1

So who in the above group expired at 51 years old? Did anyone even live close to that long? The closest match is entry four at 45 years. That is why averages often describe everyone and no one at the same time. Without knowing how many outliers may have been included in the calculation, it's impossible to know how typical the resulting figure may be of real-world outcomes.
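One way to see how much the three early deaths drag the figure down is to compare the mean with two other summaries. The median and the "survived early childhood" cutoff of age five are not part of the calculation above. They are added here only for contrast, and the cutoff is an arbitrary assumption.

```python
from statistics import mean, median

lifespans_a = [69, 90, 89, 45, 78, 67, 79, 82, 76, 64]
lifespans_b = [69, 90, 1, 45, 78, 4, 79, 5, 76, 64]  # three early childhood deaths

CHILDHOOD_CUTOFF = 5  # assumed cutoff, chosen only for illustration


def summarize(lifespans):
    """Return the mean, the median, and the mean among those who outlived the cutoff."""
    survivors = [age for age in lifespans if age > CHILDHOOD_CUTOFF]
    return {
        "mean": mean(lifespans),
        "median": median(lifespans),
        "mean_past_childhood": mean(survivors),
    }


if __name__ == "__main__":
    for name, data in (("first group", lifespans_a), ("second group", lifespans_b)):
        stats = summarize(data)
        print(name, {key: round(value, 1) for key, value in stats.items()})
```

For the second group the mean is 51.1 and the median is 66.5, while those who made it past age five averaged about 71.6 years. That is why old people were not rare even when life expectancy at birth was low.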

Sampling Gone Bad

Getting back to the polling mentioned in the introduction of this article: for economic and logistical reasons, it's simply not feasible to ask every adult in the country who he or she will be voting for. Instead, data is obtained by taking a sample that hopefully has the same characteristics as the larger group it is drawn from. For example, if pollsters were to ask 100 people who they are going to vote for in the next election, and 45 of them say they will vote for Johnson, we might extrapolate that about 45% of all the voters will vote for Johnson.

Sampling provides many benefits, but it's not without some important limitations.

The first issue is that the pollsters could have just happened to talk to an unusually large percentage of Johnson supporters by blind luck. This is the problem of sample size. The smaller the sample, the greater the influence of luck on the results we get. For a population of millions, a sample of one hundred participants is far too small.
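The influence of luck can be put into rough numbers with the standard margin-of-error approximation for a proportion, z * sqrt(p * (1 - p) / n). The sketch below assumes a simple random sample and a 95% confidence level (z of about 1.96). The sample sizes other than 100 are chosen only for comparison.

```python
from math import sqrt

Z_95 = 1.96  # z-score for an approximate 95% confidence level


def margin_of_error(p, n, z=Z_95):
    """Approximate margin of error for a sample proportion p from n respondents."""
    return z * sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        moe = margin_of_error(0.45, n)
        print(f"n = {n:>6}: 45% plus or minus {moe * 100:.1f} points")
```

With 100 respondents, the 45% for Johnson carries a margin of nearly ten points either way, so the true figure could plausibly sit anywhere from the mid-thirties to the mid-fifties. Note that this approximation depends on the size of the sample rather than the size of the population.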

Another issue is that the way the people in the sample were picked might skew the result. If Johnson supports spending a lot of money on the arts, and the pollsters approach people attending a free concert, we might find an atypically high percentage of Johnson supporters. On the other hand, if those same pollsters were to sample people at a bar frequented by mostly single people, they might find a much lower percentage of Johnson supporters. In either case, the results will be unreliable.
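A small simulation makes the concert example concrete. Every figure here is hypothetical: the share of voters who are arts fans, the support rates in each group, and the population size are all invented, and "concert-goers" are simply assumed to be the arts fans.

```python
import random

ARTS_FAN_SHARE = 0.10      # hypothetical share of voters who attend arts events
SUPPORT_AMONG_FANS = 0.80  # hypothetical Johnson support among arts fans
SUPPORT_AMONG_REST = 0.40  # hypothetical Johnson support among everyone else


def build_population(size, rng):
    """Create (is_arts_fan, supports_johnson) pairs for a made-up electorate."""
    population = []
    for _ in range(size):
        is_fan = rng.random() < ARTS_FAN_SHARE
        support_rate = SUPPORT_AMONG_FANS if is_fan else SUPPORT_AMONG_REST
        population.append((is_fan, rng.random() < support_rate))
    return population


def support_share(voters):
    """Fraction of the given voters who support Johnson."""
    return sum(supports for _, supports in voters) / len(voters)


def run_polls(seed=7, size=100_000, sample_size=1_000):
    """Compare a random sample with one drawn only from concert-goers."""
    rng = random.Random(seed)
    population = build_population(size, rng)
    concert_goers = [voter for voter in population if voter[0]]
    return {
        "true support": support_share(population),
        "random sample": support_share(rng.sample(population, sample_size)),
        "concert sample": support_share(rng.sample(concert_goers, sample_size)),
    }


if __name__ == "__main__":
    for label, share in run_polls().items():
        print(f"{label}: {share:.1%}")
```

The concert sample is just as large as the random one, yet it lands near the fans' 80% rather than the electorate's roughly 44%. A bigger sample does not fix a sample drawn from the wrong place.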

Finally, another type of bias results when the people being sampled are free to choose whether or not to respond. A radio talk show might ask people to call in and vote on some issue. If the issue is especially contentious, people on one side may be far more motivated to call in than those on the other. For example, if people were to vote about placing a new landfill in their neighborhood, chances are good that most people who called in would be opposed. This tendency is known as voluntary response bias, or self-selection bias.
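The size of this effect can be sketched with simple arithmetic. The figures below are hypothetical: they assume 60% of residents oppose the landfill and that opponents are ten times as likely to phone the show as everyone else.

```python
def share_among_callers(true_opposed, call_rate_opposed, call_rate_others):
    """Share of callers who are opposed, given who bothers to call in."""
    opposed_callers = true_opposed * call_rate_opposed
    other_callers = (1 - true_opposed) * call_rate_others
    return opposed_callers / (opposed_callers + other_callers)


if __name__ == "__main__":
    # Hypothetical figures: 60% of residents oppose the landfill, and opponents
    # are ten times as likely to phone the show as everyone else.
    observed = share_among_callers(
        true_opposed=0.60, call_rate_opposed=0.20, call_rate_others=0.02
    )
    print(f"opposed among all residents: {0.60:.0%}")
    print(f"opposed among callers:       {observed:.1%}")
```

Under these assumptions, a neighborhood that is 60% opposed would sound about 94% opposed on air, simply because of who picked up the phone.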

Conclusion

Recognizing that the statistics people present to us are frequently flawed doesn't imply that statistics are useless. On the contrary, statistics by and large offer excellent evidence, and are often the easiest and most concise way to express it. Just be aware that the burden to examine the figures for relevance, validity and authority falls squarely on your shoulders. Taking statistics at face value might lead you to draw erroneous conclusions.

[1] The Medieval Child, Part 3: Surviving Infancy, Page Two

Rob Gravelle
Rob Gravelle resides in Ottawa, Canada, and is the founder of GravelleConsulting.com. Rob has built systems for intelligence-related organizations such as Canada Border Services and CSIS, as well as for numerous commercial businesses. In his spare time, Rob has become an accomplished guitar player, and has released several CDs. His former band, Ivory Knight, was rated as one of Canada's top hard rock and metal groups by Brave Words magazine (issue #92).


davidRees said on January 25, 2016 20:06 PM PST

Will pay attention in future to what he writes. Many are afraid to disagree with stats. All the polls were wrong in the Canada elections after his article was written, proving him right! Someone once said something like this: there are lies, damn lies, & statistics! davidRees

Holly said on January 21, 2016 16:05 PM PST

EXCELLENT article! Here in the States, they are constantly throwing stats at you like they are facts... almost everywhere you go! Stereotyping/Racism is rampant from continued stirred up statistics... which are just a SAMPLING of the full data! (much like a CD is of a full analog song.) Music/TV/Movies, Commercials/Ads, Every form of media, Sportscasting, and especially Politics!! The media is constantly trying to scare us with the statistics they give us. The different political sides try to control us through stats. And I swear if I could get rid of sportscasting, I would be insanely happy! *LAUGH* How many times someone made that shot or kick before tonight has NO bearing on whether they will make it tonight! (unless they've never been able to make it at all due to lack of ability)

Lynda Bundy said on January 20, 2016 14:44 PM PST

This is a very interesting article! For those further interested, I highly recommend a book titled: "How to Lie with Statistics". This book also inspired another interesting book: "How to Lie with Maps" -- since each map is a "so-called lie" -- or rather, misrepresentation; since a true "globe" is always "misrepresented" or "distorted", in some capacity--when presented on a flat surface.

John said on November 04, 2015 10:54 AM PST

Nice article, wish more people would look at statistics this logically.

Jordan said on October 19, 2015 09:02 AM PDT

Interesting points here. I do believe people try to rely too much on statistics rather than looking at the whole situation. Thanks for sharing your insight on this!
