
Crows can do stats (and so can you)

A lot of people have a visceral, fight-or-flight reaction to the word “statistics,” surpassed only by their horror of the term “Data Science”. (In hindsight, naming my organisation “The Australian Data Science Education Institute” might have been a tactical error. As a marketing strategy, I might have been better off calling it almost anything else, up to and including “The Australian Institute for Large and Terrifying Spiders.”)

It is a mammoth task to convince the world that statistics aren’t scary, and that we all do stats and data science instinctively as we go about our daily lives. As Greg Jericho said on my podcast, Make Me Data Literate, data science is just “counting things and telling stories.” Something we can all do. New evidence in this challenge is always welcome, so I was very excited to see that researchers have shown that crows actually do stats.

Much like data science, crows have an undeservedly bad reputation. Largely, I suspect, because they appear so intimidating. We have a pair of crows that nest every year in a gum tree outside our house, and watching the fluffy baby crows learning to fly has been so charmingly entertaining that any negative stereotypes we might have harboured about these large, glossy, black birds have been forcefully dispelled.

Two glossy black crows on green grass. The one on the left is standing, looking slightly startled by their friend, who appears to have fallen over.
Image Credit: Nick Falkner

They are smart, vocal, and endlessly entertaining. Devoted parents and loyal friends, they are adept at recognising people, and they can even share this information with other crows. Carl T. Bergstrom, co-author with Jevin West of Calling Bullshit: The Art of Skepticism in a Data-Driven World, describes how you can befriend them (though he cautions, wisely, against trying to assemble your own crow army). (Read the book, btw, it is awesome.)

Thinking about statistics has a nasty tendency to stir up school-based maths trauma, which can make us feel dumb and slow. It conjures the spectre of monstrously complicated equations, low marks on maths tests, and ideas we can’t even imagine being able to understand.

And yet, like crows, we do stats every day. We calculate the likelihood of the traffic being bad, of making it to the bus stop on time, of someone misinterpreting the wording of our email. We go to the supermarket that’s more likely to have our favourite biscuits in stock, and we choose the best day to do the weekly shopping, based on a whole host of factors, including how busy it’s likely to be, whether our favourite Smokehouse pizza bases were delivered recently, and whether we’ll be able to get a carpark. That’s some very impressive statistical inference that we do without even being aware we are doing maths, let alone stats.

In my Data Science courses, I often recommend using super simple techniques: finding the highest and lowest values, calculating the average, and getting Excel, Google Sheets, or your software of choice to create a simple graph to help you wrap your head around the data. It’s incredible how much information you can extract – how many stories you can tell – from a dataset using just these very simple techniques.
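If you would rather try those simple techniques in code than in a spreadsheet, here is a minimal sketch in Python. The numbers are made up purely for illustration (imagine minutes spent waiting for the bus each day for a week), and the function name summarise is just something I chose for this example, not part of any library.

```python
from statistics import mean


def summarise(values):
    """Return the lowest, highest, and average of a list of numbers."""
    return {
        "lowest": min(values),
        "highest": max(values),
        "average": mean(values),
    }


# Made-up example data: minutes spent waiting for the bus each day.
wait_times = [4, 7, 3, 12, 5, 9, 2]

summary = summarise(wait_times)
print(summary)
```

Three numbers already hint at a story: most waits are short, but one day was far worse than the rest, and that is the day worth asking questions about.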

In actual fact, many of the things you need to do to understand data aren’t maths at all. First, you need to figure out what’s wrong with your data. No real data is perfect. There are always flaws, including measurement errors, people telling you the wrong thing, numbers that have a whole heap of decimal places after them that they do not really deserve, etc. Sometimes the data isn’t really the data you want, but it’s the closest you can get. This is proxy data. For example, an exam result does not tell you how much a student has learned and can do. It tells you how well that student did on the exam. We hope it’s close, but it’s not actually the same thing.

Then you need to figure out what the data means. What does each row or column of a spreadsheet represent? How do the columns relate to each other? For example, if you have a seagrass dataset, the depth of water that particular species of seagrass are found in can tell you a lot about the water quality. If a deep water species is found in shallow water, it probably means the water is murky. For this part you need to understand where the data came from, how it was collected, and something about the topic the data covers (for example, you need to know which species of seagrass normally grows in deep water). None of this is maths.

Then you need to figure out what stories the data can tell. A low level of bacteria in your water supply does not tell you that the water is safe to drink (there might be chemicals, or parasites, or other nasties in there that are not bacteria). It’s important to be clear about the story your data tells.

None of these things are particularly difficult, though they can seem daunting at first. Like all skills, they’re not magic, or unattainable. They just require practice. You have the skills to do data science. If a crow can do stats, so can you. Have a little faith in yourself.
