It’s easy to get caught up in the highly technical aspects of Data Science: to focus on complex numeric analysis using programming languages like Python or R, and to think of outputs like fantabulous heatmaps and stunning geospatial visualisations.
But an article I saw in The Age today highlighted some of the deceptive data practices we see every day. Some of them are wholeheartedly deliberate, designed to mislead us and persuade us of untruths. Some, like this one I suspect, are purely accidental. But the journalist who wrote this article should never have let it stand, and the readers need to be able to think critically about what these numbers mean. Read the paragraph below for a moment.
Can you see why it got my hackles up?
“From a 2-million-ton butter stockpile… dwindled to less than 12 days’ supply.”
So hands up if you know how many tons of butter constitute a day’s supply?
Are you actually able to compare those two figures without further research? I certainly couldn’t. As it happens, I did further research and found a web page that puts global butter consumption at 8,000,000 tons annually. I have no idea how valid that web page is, but let’s roll with it for a moment. To compare the figures we divide 8 million by 365 to get a daily figure, and then multiply by 12 to get tons per 12 days. I get 263,013 tons (and some assorted decimal places which I am going to wickedly ignore for now, but that will be a whole other blog post).
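If you’d rather check that conversion in Python than on a calculator, here is a minimal sketch. The variable and function names are my own, the 8,000,000 figure is the unverified one from that web page, and the whole thing assumes consumption is spread evenly across the year.

```python
# Assumption: global butter consumption, taken from the unverified web page.
ANNUAL_CONSUMPTION_TONS = 8_000_000
DAYS_PER_YEAR = 365
SUPPLY_DAYS = 12  # the article's "less than 12 days' supply"


def days_of_supply_to_tons(days, annual_tons=ANNUAL_CONSUMPTION_TONS):
    """Convert a number of days' supply into tons, assuming steady daily consumption."""
    daily_tons = annual_tons / DAYS_PER_YEAR
    return daily_tons * days


twelve_days_tons = days_of_supply_to_tons(SUPPLY_DAYS)

# int() drops the decimal places, matching the figure quoted in the post.
print(f"{SUPPLY_DAYS} days' supply is about {int(twelve_days_tons):,} tons")
```

Swap in a different annual figure and the comparison changes with it, which is rather the point: the “12 days” only means something once you know what a day is worth.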
Dividing 263,000 by 2,000,000, we find that what’s left is roughly 13% of the stockpile we had before. That is, it’s true, a significant decline. But now we can actually see how big a decline it is, which trying to compare 2 million tons with 12 days’ supply made impossible. Even better, let’s represent it visually:
This is just the default graph from Google Sheets, but it conveys the size difference quite effectively. (There are, of course, plenty of ways we could improve the graph, but one snark at a time, ok?)
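The same comparison can be roughed out without a spreadsheet at all. This is only a sketch: the names are mine, the 263,000 figure is the rounded estimate from above, and the scale of one `#` per 50,000 tons is an arbitrary choice to keep the bars on one line.

```python
STOCKPILE_THEN_TONS = 2_000_000  # the article's original stockpile
STOCKPILE_NOW_TONS = 263_000     # 12 days' supply, rounded, using the assumed 8 million tons a year

# Express what remains as a share of the original stockpile.
share = STOCKPILE_NOW_TONS / STOCKPILE_THEN_TONS
print(f"Remaining stockpile: {share:.0%} of the original")


def text_bar(label, tons, tons_per_mark=50_000):
    """Draw one '#' per tons_per_mark tons (an arbitrary scale), rounded to the nearest mark."""
    marks = "#" * round(tons / tons_per_mark)
    return f"{label:<5}{marks} {tons:,} tons"


print(text_bar("Then", STOCKPILE_THEN_TONS))
print(text_bar("Now", STOCKPILE_NOW_TONS))
```

Two bars in the same unit, side by side, are all it takes for the size of the drop to be obvious at a glance.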
It wasn’t hard maths. Or a challenging graphing exercise. Or even a tricky research problem, although, as noted, I have no idea how valid the figure of 8 million tons per year is, or indeed what year the measurements were taken. (The graph on that page implies either 2004 or 2009, but from the information available there I can’t be sure about the aggregate figures.) But then I don’t know how valid the figures in the original article are either (and, to be honest, I’m just not that excited about butter consumption, unless it’s on my own toast, and if we’re measuring that in tons then I probably have a problem).
The point is that if you’re going to use figures to support your argument, and you’re going to compare them by saying they “dwindled” from one value to another, it’s not rocket science to make those figures easily comparable.
This is one of the reasons Data Science needs to be core in schools. To make sure that when we present our own data, we present it in a way that’s both valid and easy to interpret. And to ensure that when others show us data, we can analyse it critically, and call it out when it doesn’t make sense, whether that’s by mistake or by design.