This morning I read an article on the latest covid wave on the Australian ABC news site.
Although our tracking of covid waves is now all but non-existent, since we’re not collecting case data, there are some numbers, presumably gathered from hospitals. It’s hard to figure out exactly where the data used in the article comes from: it’s not on the page the article links to, nor on any of the pages that page links to. But assuming it’s real (or as real as it can be under the circumstances), let’s look at the story it’s telling.
Here is a graph from that article. It shows National Covid-19 cases as a 7-day rolling average, from August to October 2023. It looks scary, with a sharp upward jump at the end. What’s wrong with this graph?
If you said “The Y axis doesn’t start at 0”, you are correct. Why does that matter? Let’s see…
The ABC helpfully provided a “get the data” link, which gives you the numbers they used to construct this graph. Here’s what happens if you make the same graph with the Y axis starting from 0.
See how much less dramatic the uptick at the end is? That’s because starting from 0 gives a much better sense of scale. The first graph uses slightly more than the range of the data for the Y axis, so it starts at 550 and goes to 950. This, by the way, is the default in a lot of graphing software, including Excel. The second graph starts at 0 and goes to 1000.
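To put a rough number on that difference, here is a minimal Python sketch. It assumes both graphs are drawn at the same height, and it uses the two axis ranges described above (550 to 950, and 0 to 1000). The rise of 100 cases is a made-up figure purely for illustration, not a value from the ABC data, and the function name is my own.

```python
def fraction_of_axis(change, axis_min, axis_max):
    """Return the share of the plot's height taken up by a change of the given size."""
    return change / (axis_max - axis_min)


# Axis ranges of the two graphs, as described in the text.
truncated = (550, 950)
zero_based = (0, 1000)

# Hypothetical rise of 100 cases. This is NOT a value from the ABC data.
rise = 100

share_truncated = fraction_of_axis(rise, *truncated)
share_zero_based = fraction_of_axis(rise, *zero_based)

# How many times taller the same rise looks on the truncated axis.
stretch = share_truncated / share_zero_based

print(f"Truncated axis:  {share_truncated:.0%} of the plot height")
print(f"Zero-based axis: {share_zero_based:.0%} of the plot height")
print(f"Visual stretch:  {stretch:.1f}x")
```

The stretch doesn’t depend on the size of the rise: it is just the ratio of the two axis spans, 1000 / 400 = 2.5. So under these assumptions, every change on the first graph looks two and a half times bigger than it does on the second. If you’re drawing the graph yourself in matplotlib, calling set_ylim(bottom=0) on the axes is one way to override the default and start the Y axis at zero.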
Let’s put them close together to make it easier to see. (I’ve also made them smaller so you can see them both at the same time on a phone screen.)
I’m certainly not saying we don’t have a covid problem. Given the data that we’re NOT collecting (and check out this podcast episode with Margaret Hellard and Richard Denniss for more on that), we almost certainly have a much bigger problem than this graph shows. However, this is a nice clear illustration of how not including a zero on the scale gives a very distorted impression of the story.
It’s one of the first things many of the guests on Make Me Data Literate say when they look at graphs in the media. Where’s the zero? What story is this graph trying to tell, and is it the truth?
This is a key part of data literacy. How accurately does this graph represent the data story? In this case, not very accurately at all!