Axes of awful

Dr Linda McIver

2 years ago

I’ve ranted before about the importance of the y axis on graphs starting at 0. Most software autoscales graphs so that the range on the y axis is just slightly larger than the range of the data, which makes graphs difficult to compare, and often wildly over-emphasises the scale of change. You can see a classic example in the post linked above.

But. It turns out that, just sometimes, using the range of the data as the scale allows you to see change that would otherwise be very difficult to see. This is particularly important for climate and weather data, because small changes can be incredibly significant.

Consider that the world is heading for (or possibly has already exceeded) 1.5 degrees of warming above pre-industrial levels, which is quite catastrophic, and we desperately need to avoid hitting 2 degrees of warming. That extra half a degree is deeply significant. But a change that small can be quite difficult to see in a graph of local maximum temperatures. It also makes no sense, really, to apply climate change numbers to local weather, in part because climate and weather are two different things, and also because climate change can take temperatures down as well as up. But sometimes you want to see what’s going on in your own hood. It makes it more personal.

To that end, I grabbed the temperature data for Melbourne from the Bureau of Meteorology’s brilliant weather station data download page, and set to work. The page lets you download monthly or daily data, but I wanted to play with some code, so I got daily maximum temperatures. There were lots of stations to choose from.
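For anyone who wants to follow along in Python, here is a minimal sketch of loading a daily maximum temperature file with the standard `csv` module. It is a sketch, not the original script. The column names are an assumption about the layout of the Bureau’s CSV download, so check them against the header row of your own file. Days with no recorded temperature are skipped.

```python
import csv
from typing import Iterable, NamedTuple


class DailyMax(NamedTuple):
    """One day's maximum temperature reading."""
    year: int
    month: int
    day: int
    max_temp: float  # degrees Celsius


# Assumed column names: adjust these to match the header row of your download.
YEAR_COL = "Year"
MONTH_COL = "Month"
DAY_COL = "Day"
TEMP_COL = "Maximum temperature (Degree C)"


def load_daily_max(lines: Iterable[str]) -> list[DailyMax]:
    """Parse CSV lines (an open file or a list of strings) into DailyMax records."""
    records = []
    for row in csv.DictReader(lines):
        temp = (row.get(TEMP_COL) or "").strip()
        if not temp:
            # Blank cell: no maximum was recorded that day, so skip the row.
            continue
        records.append(
            DailyMax(
                year=int(row[YEAR_COL]),
                month=int(row[MONTH_COL]),
                day=int(row[DAY_COL]),
                max_temp=float(temp),
            )
        )
    return records
```

Usage, with a hypothetical filename: `with open("daily_max.csv", newline="") as f: records = load_daily_max(f)`.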

I wanted a station with a good long recording period, so I went with Essendon Airport. It turns out this dataset is missing some years between 1972 and 2003, for some reason, but there was still plenty to work with.
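Gaps like this are easy to miss, so it can be worth checking for them in code before averaging anything. A minimal sketch (the function name is hypothetical): given the years that appear in the data, list every year between the first and the last that has no readings at all. Years that are only partly recorded would need a separate check, such as counting readings per year.

```python
from typing import Iterable


def missing_years(years_present: Iterable[int]) -> list[int]:
    """Return, in order, every year in the data's span that has no readings at all."""
    present = set(years_present)  # duplicates (one per daily reading) collapse here
    if not present:
        return []
    return [year for year in range(min(present), max(present) + 1) if year not in present]
```

For example, `missing_years([1939, 1940, 1943, 1945])` returns `[1941, 1942, 1944]`.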

First I wrote a program in Python to give me just the temperatures for December. You can do this in a spreadsheet, but I know Python better than I know spreadsheets, so it was easier for me to do it this way. It’s important to remember that there is no “best way to do things” and no points for using one system over another. Whatever works for you and your students is the right way to do it!
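The December filter itself only takes a line or two of Python. This sketch assumes each reading has already been parsed into a `(year, month, day, max_temp)` tuple; that layout is chosen for illustration and is not necessarily how the original script stored the data.

```python
from typing import Iterable

# (year, month, day, maximum temperature in degrees C): an illustrative layout.
Reading = tuple[int, int, int, float]

DECEMBER = 12


def december_only(readings: Iterable[Reading]) -> list[Reading]:
    """Keep only the readings whose month is December."""
    return [reading for reading in readings if reading[1] == DECEMBER]
```

Swapping `DECEMBER` for another month number gives the same filter for any other month.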

I then wrote another script to produce a file containing the average December maximum for each year in the dataset. I opened that file in Excel and graphed it, which gave me a chart of the yearly averages. Note that I had to faff about quite a lot to take it from the default graph choice to a graph that made some kind of sense!
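The second step, one average per year in a file that a spreadsheet can open, could look something like this minimal sketch. It assumes the input has already been filtered to December and reduced to `(year, max_temp)` pairs; the function names and output column names are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Iterable


def yearly_means(readings: Iterable[tuple[int, float]]) -> dict[int, float]:
    """Average the daily maximums for each year, in chronological order."""
    by_year: dict[int, list[float]] = defaultdict(list)
    for year, temp in readings:
        by_year[year].append(temp)
    # Sort by year so the output file runs from earliest to latest.
    return {year: mean(temps) for year, temps in sorted(by_year.items())}


def means_to_csv(means: dict[int, float]) -> str:
    """Render one 'year,mean' row per year as CSV text that Excel can open."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["year", "mean_december_max"])  # hypothetical column names
    for year, value in means.items():
        writer.writerow([year, round(value, 2)])
    return buf.getvalue()
```

To save the result, write the returned text to a file, for example `Path("december_means.csv").write_text(means_to_csv(yearly_means(pairs)))` with `Path` from `pathlib` and a hypothetical filename. One caution: a year with only a handful of recorded December days still gets an average, which may be less representative than a full month.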

Once I added a trend line, you could see that the average temperature was rising overall, but with a y axis range from 0 to 35, it wasn’t easy to see how large the change was. This is a situation where not starting the y axis from zero makes sense – to zoom in on the change and get a better sense of how large it is.

So I changed the minimum value on the y axis to 20 and redrew the graph.
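One rough way to see why raising the minimum helps is to work out what fraction of the plot’s height a given change occupies. This sketch assumes the axis maximum stays at 35 once the minimum is raised to 20, which may not match the actual graph; the function name is hypothetical.

```python
def share_of_axis(change: float, y_min: float, y_max: float) -> float:
    """Fraction of the y axis's visible range that a change of this size spans."""
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")
    return change / (y_max - y_min)


# A one-degree rise, first on an axis from 0 to 35, then on one from 20 to 35.
full = share_of_axis(1.0, 0, 35)     # 1/35, under 3% of the height
zoomed = share_of_axis(1.0, 20, 35)  # 1/15, under 7% of the height
```

Under that assumption, the same one-degree rise takes up 35/15, or about 2.3 times, as much of the graph’s height once the minimum is 20.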

Now the trend line is much easier to read. It clearly starts just below 24 degrees and rises to a little over 25. That’s a change of just over one degree from 1939 to 2023. Now I’m curious to try it for other weather stations, to see if the trend holds right across Melbourne, and across Australia. I love the way finding something out with one dataset raises more questions, and more opportunities to explore!
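The numbers read off the trend line also give a rough rate of warming. Taking the change along the trend line as just over one degree across 1939 to 2023, the slope works out as:

```latex
m \approx \frac{\Delta T}{2023 - 1939} = \frac{\Delta T}{84\ \text{years}}
  \approx \frac{1\,^{\circ}\mathrm{C}}{84\ \text{years}}
  \approx 0.012\,^{\circ}\mathrm{C}\ \text{per year}
```

That is roughly 0.12 degrees per decade. Since the change is slightly more than one degree, the fitted slope is slightly above this figure.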

This is another example of how there are no absolute rules in data science (except for: there’s no such thing as a perfect dataset – that one holds inviolable!). Everything is context. A y axis that doesn’t start at 0 is sometimes OK. Pie charts are sometimes a great way to compare values. A line graph is sometimes useful for discrete data.

There are no hard and fast rules in data science. Anyone who says otherwise is selling something.