Why it is Difficult for People to Understand Statistics


Fundamentally, humans like to bring meaning to their lives. Without meaning, there is too much randomness to make sense of anything. We don’t like randomness, so we always try to associate the unknown with something known.

The origin of life and the mystery of death are the most common unknowns we encounter. Who created life? Why am I alive? What is the purpose of life? What happens after you die? When you really think about these questions, they become quite scary. We know nothing about our existence, yet we go to work each day without a care. Why?

Well, it is because we associate what is unknown (life) with what is known (ideas). We created stories from our thoughts and ideas to help explain who created life, why each individual is alive, the purpose of life, and the afterlife. I am referring to religion. That is all religion is: turning the unknown into the known through thoughts and ideas.

So now, when asked ‘what happens after you die?’, you can respond ‘my soul will live on with God in Heaven for eternity.’ That sounds quite comforting: an eternity in Heaven, in peace. In reality, though, the answer to the question is ‘I don’t know.’ That is a terrifying answer to a question about the inevitable, hence our need to bring meaning to our lives. We can’t go about our day terrified of everything, so we create our own explanations based on thoughts and ideas.

Random Events

This is what statistics is all about: it is the story of random events. We have simple statistics like the mean, median, and mode, as well as complex ones like hypothesis testing and regression models. These are all tools for trying to predict the outcome of a random future event.
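As a minimal sketch of the simple statistics named above, Python’s standard-library `statistics` module computes all three directly. The temperature readings are made-up numbers chosen only for illustration.

```python
import statistics

# Hypothetical sample: daily high temperatures (°F) over one week.
temps = [88, 91, 85, 91, 94, 87, 91]

print(statistics.mean(temps))    # arithmetic average of all seven readings
print(statistics.median(temps))  # middle value once the readings are sorted
print(statistics.mode(temps))    # most frequently occurring reading
```

The median and mode here are both 91, while the mean is pulled slightly lower by the cooler days.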

People use the Bible to explain the purpose of life, and economists use statistics to explain the outcome of random events. In both cases, we try to predict and explain the unknown. Economists do this through multiple regression models and different tests, just as numerous religions each try to predict the truth about life.

Understanding Statistics

So why don’t humans understand statistics? It is the same reason some people are atheists: poor assumptions and estimations.

There is a truism in statistics that ‘correlation does not imply causation.’ It is a great little reminder that just because two things look related doesn’t mean one causes the other. There are many examples of this, like the correlation between shark attacks and ice cream sales, or between US spending on science, space, and technology and suicides by hanging.

As humans, we see this ‘relationship’ and think there has to be a reason for it. These things are highly correlated, but does that mean shark attacks cause ice cream sales to increase? Or that ice cream sales cause more shark attacks? No. In reality, on hot days people are more likely both to go to the ocean and to buy ice cream. A third variable causes both events. That isn’t always the case, though; sometimes things simply happen randomly and just happen to correlate. Hence the truism.

But we can’t grasp the idea that sometimes things are random and don’t cause one another, because it doesn’t help us understand and simplify our world. In fact, it does the opposite: it makes our world more complex and confusing.

As a species, we try to make our lives as easy as possible. That’s why engineering is such a popular, demanding, and well-paying field. Money = value: engineers make life easier, and an easier life is valuable. We do this in our everyday lives too, in simpler ways.

Chunking

There is an idea called chunking, which breaks information and data into chunks so they are easier to remember. If I were to tell you to remember the number 7918631752, that would be a bit difficult. But with the chunking method, you just remember three numbers: 791, 863, and 1752. That becomes a bit more manageable because we aren’t trying to remember 10 digits, only 3 numbers. This is precisely why phone numbers are chunked like that: it makes them easier to remember. That is a bit unnecessary now that your phone remembers numbers for you, but it is still the reason.
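Chunking a number is mechanical enough to write down. This small sketch splits a digit string into groups of given sizes; the function name `chunk` and the 3-3-4 grouping are illustrative choices matching the example above.

```python
def chunk(digits: str, sizes: tuple[int, ...]) -> list[str]:
    """Split a digit string into consecutive groups of the given sizes."""
    if sum(sizes) != len(digits):
        raise ValueError("group sizes must add up to the number of digits")
    groups, start = [], 0
    for size in sizes:
        groups.append(digits[start:start + size])
        start += size
    return groups

print(chunk("7918631752", (3, 3, 4)))  # ['791', '863', '1752']
```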

So, going back to statistics: when we run tests, we run them to see whether the results are statistically significant. Quite often they are not. So our first question is ‘well, then what is it?’ The answer is ‘I don’t know; I just know it’s not what we tested for.’ The test gives us nothing to attribute the variation in the dependent variable to. That is key: there might be a reason that would make the variation seem less random, but we don’t know what it is yet. Just as there might be a God, but we don’t know ‘which religion he is from.’
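To make ‘not statistically significant’ concrete, here is a minimal permutation test for a difference in means, written with only the standard library. It is one of several possible tests (a t-test is the more common textbook choice), and the function name, the 10,000 permutations, and the sample data are all assumptions for illustration. The p-value is the share of random relabelings of the pooled data that produce a gap at least as large as the observed one.

```python
import random

def permutation_p_value(group_a, group_b, permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch)."""
    rng = random.Random(seed)  # seeded so results are reproducible

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(pooled)  # randomly reassign every value to a group
        gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if gap >= observed:
            extreme += 1
    # Adding 1 to both counts treats the observed labeling as one of the relabelings.
    return (extreme + 1) / (permutations + 1)

# Hypothetical data: both groups come from the same distribution,
# so any difference between them is chance alone.
data_rng = random.Random(42)
control = [data_rng.gauss(50, 10) for _ in range(30)]
treatment = [data_rng.gauss(50, 10) for _ in range(30)]
p = permutation_p_value(control, treatment)
print(f"p = {p:.3f}")
```

Because both hypothetical groups come from the same distribution, the p-value will usually land above 0.05. Even then, the test only says the difference is not distinguishable from chance; it says nothing about what does explain the variation.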

We Search for Significance

It is very difficult for people to wrap their heads around insignificance because our biology is designed to find significance. Humans have a significance bias: a tendency to treat events as meaningful. It is beneficial for us to find significance even when there is none, because it lets us categorize things in our brains. That might be good for humans, but it isn’t good for statistics.

In statistics, data can be discrete or continuous. Discrete data, similar to categorical data, can take only separate, countable values (the number of storms in a season, for example). Continuous data can take any value within a range, so it has an infinite number of possible measured values. For some reason, humans like to put data into the wrong group.

Hurricane Season

Hurricanes are common in the southeastern states. Each hurricane is given a name and a category. The name keeps track of how many we have had that season and keeps storms from being confused with hurricanes from previous years. The category is based on the Saffir-Simpson Hurricane Wind Scale, which uses maximum sustained wind speed and breaks down like so:

  • Category 5: > 156mph
  • Category 4: 130mph – 156mph
  • Category 3: 111mph – 129mph
  • Category 2: 96mph – 110mph
  • Category 1: 74mph – 95mph
  • Tropical Storm: 39mph – 73mph

We have taken continuous data and applied a discrete, categorical filter to it, which doesn’t make much sense. A hurricane can gain 26 mph and still be a Category 4, but one more mph makes it, magically, a Category 5. Apparently, the categories exist so first responders can quickly understand the severity of a storm. But if you hear a hurricane has 100 mph winds, you don’t need to be an expert to know it can do some damage.
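The binning described above is easy to sketch. This hypothetical `categorize` function maps a wind speed (assumed to be in whole mph, using the lower bound of each band in the list) to its label with the standard-library `bisect` module. Printing a few speeds shows the cliff edges: 130 and 156 mph share a label, while 156 and 157 mph do not, and likewise 73 versus 74 mph.

```python
from bisect import bisect_right

# Lower bound (mph) of each band in the list above; whole-mph speeds assumed.
THRESHOLDS = [39, 74, 96, 111, 130, 157]
LABELS = [
    "Below tropical storm strength",
    "Tropical Storm",
    "Category 1",
    "Category 2",
    "Category 3",
    "Category 4",
    "Category 5",
]

def categorize(wind_mph: int) -> str:
    """Map a continuous wind speed onto the discrete bands listed above."""
    # bisect_right counts how many lower bounds the speed has reached.
    return LABELS[bisect_right(THRESHOLDS, wind_mph)]

for speed in (73, 74, 95, 96, 130, 156, 157):
    print(speed, "->", categorize(speed))
```

A 26 mph spread inside one band is invisible to the label, while a 1 mph change across a boundary flips it.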

This may seem like a nit-picky thing to complain about, but the categorization of hurricanes can be quite serious. Some insurance companies base hurricane coverage on the label a storm is given, so whether it is a Category 1 or a Category 2 could determine whether there can be a claim at all and what type of coverage applies. Damage from a 73 mph tropical storm might not be covered, while damage from a 74 mph Category 1 would be.

We determine relief from natural disasters using continuous data that has been forced into discrete categories. These are big decisions made on a misunderstanding of statistics. You can see this in many places in our world, and it affects just about everyone.

Data in Gender

Another example of mischaracterized data is gender. Gender is binary: male and female. Yet there is a movement claiming that gender is fluid and continuous. But how could that be? What are you measuring that is continuous for it to be fluid? There is nothing; it is once again just a misunderstanding of statistics.

In my view, the people who are part of this movement typically have gender dysphoria, a condition listed in the DSM-5 that involves distress over a mismatch between a person’s experienced gender and their assigned gender. We are rational beings, and we don’t like living with that kind of conflict. If someone with this condition changes the data type of gender from discrete to continuous, the conflict goes away. This, I believe, is why we see people promoting being gender-fluid even though it makes no sense as a data type. I also believe this movement is growing because gender dysphoria spreads socially, like anorexia.

We conflate discrete and continuous data all too often, and it affects people’s daily lives. We need to get back to the basics of statistics. I don’t expect everyone to know how to run a regression model or do A/B testing, only that they not make statistical claims when they don’t fully understand the concepts behind them.

My two main takeaways are:

  1. Not everything has to have a cause; some things are just random.
  2. Don’t conflate data types.

At our core, humans try to make life simple. That is the main reason we find a cause when there isn’t one, and it is the same reason we mix up data types. We change the data type to help explain the unknown world around us when the correct data type would leave us in the unknown. Understanding statistics is about being able to live with darkness and the unknown. Biology tells us not to do that.
