Graphs are fabulous tools for helping understand your own data, and for telling the story of your data to others. Unfortunately, like any means of communication, graphs can be used to tell stories that are… well… closer to fiction than we’d like.

Take this graph, which I spotted when Greg Jericho shared it on Twitter (thanks, Greg!). If you haven’t checked out Greg’s episode of Make Me Data Literate, get onto it, it’s amazing!

Greg described it as possibly “the worst graph crime I have seen in a few years”.

The lines look so similar that it’s hard to resist the assumption that they are related, or correlated, to use the technical term. The original poster used this graph to tell the story “renewables cause higher electricity prices.” The trouble is that graphs and data can tell you what. They can’t tell you why or how. That’s a much longer, more complicated story. I’ll write a post about correlation and causation soon, but when I looked at the graph more closely I noticed several things.

Firstly, there are two different y axes, which makes sense, because the two sets of values are measured differently. One is a percentage, the other is… actually we don’t know what it is. Presumably it represents money, but whether it’s the average price per kilowatt hour, the cost of a typical bill, or something else, we don’t know. Leaving that aside for a moment, the really sneaky bit is that neither axis starts at zero. Let’s try a couple of different versions of this graph. First, what happens if you put them on the same axis?

Doesn’t look remotely the same, does it? But maybe it’s not fair to graph them against the same axis, because it does rather flatten the blue, renewables line, and they’re not actually the same kind of values. So, ok, let’s try giving them their own scales back, but starting them both at 0. Any vertical axis that doesn’t start from 0 should be examined very closely indeed.

Notice how the lines start at the same point in the original published version (right), and finish at the same point, too. But in my graph, on the left, starting at 0 in both of the y axes makes them look quite separate.
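
If you’d like to see why that happens, here’s a rough sketch in Python. The numbers are made up purely for illustration (I haven’t read them off the original graph), and `axis_position` is just a helper name I’ve invented. It works out how far up the axis a value gets drawn, where 0.0 is the bottom of the axis and 1.0 is the top.

```python
def axis_position(value, axis_min, axis_max):
    """Return how far up the axis a value is drawn: 0.0 = bottom, 1.0 = top."""
    return (value - axis_min) / (axis_max - axis_min)

# Hypothetical numbers for illustration only, NOT taken from the original graph.
share = [10.0, 30.0]    # renewable share (%), first and last year
price = [200.0, 300.0]  # price (units unknown), first and last year

# Truncated axes: each axis runs from its own first value to its own last value.
truncated_share = [axis_position(v, 10.0, 30.0) for v in share]
truncated_price = [axis_position(v, 200.0, 300.0) for v in price]

# Zero-based axes: same top of the axis, but the bottom is 0.
zero_share = [axis_position(v, 0.0, 30.0) for v in share]
zero_price = [axis_position(v, 0.0, 300.0) for v in price]

print("truncated:", truncated_share, truncated_price)
print("zero-based:", zero_share, zero_price)
```

With the truncated axes, both lines start at the very bottom and finish at the very top, so they land on top of each other no matter what the values are. Start both axes at 0 and the starting points separate, because one made-up series triples while the other only grows by half.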

Now let’s look at the last point on both graphs. Notice it’s 2023/24, which we don’t have prices for yet, so it’s labelled “Projection” on the original. That means it’s a guess. Maybe an educated guess, but still a guess. So let’s take out the last point on both graphs.

Doesn’t look nearly so correlated now, does it? The graphs certainly both trend upwards, but they’re hardly closely intertwined the way they looked on the first graph.

As for trending upwards: yes, prices are trending upwards. So are renewable energy installations. Maybe rising prices actually encourage more people to install renewable energy sources like solar panels and wind turbines. Maybe rising electricity costs and renewable energy are both correlated with climate change. Maybe prices are rising because of inflation. All three are almost certainly true. But there may be other reasons, too. Remember that graphs can tell you what, but they can’t tell you why.
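
To see how little “they both go up” tells you, here’s a small Python sketch. Both series are invented for illustration and have nothing to do with electricity. The `pearson` function is my own plain implementation of the standard Pearson correlation coefficient, which runs from -1 to 1.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Two hypothetical series that both happen to rise over five years.
# They are made up and have no connection to each other.
series_a = [1, 2, 3, 4, 5]
series_b = [10, 12, 11, 15, 16]

r = pearson(series_a, series_b)
print(round(r, 2))
```

The correlation comes out above 0.9, even though I typed both lists in without any link between them. Any two things that rise over the same period will score highly like this, which is why a high correlation on its own says nothing about why.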

The other big issue, of course, is that these are not the same kinds of values. The red line represents electricity prices, the blue represents renewable share, which is presumably the percentage of renewable energy in the total electricity consumption in that year. Anyone who puts two wildly unrelated values on the same graph, whether the axes are the same or not, is, as the Dread Pirate Roberts might say, selling something. It’s a dubious move at the best of times. It doesn’t make sense to compare prices with percentages. It doesn’t give you a meaningful story. And graphs are supposed to tell you a meaningful and valid story. As in this case, they often don’t!

Now, let’s look at one more part of the crime: the graph title. “Impact of Renewable Energy on Australian Electricity Consumers.” Is that what the graph shows? Even if you buy the story the author presumably hopes this graph is telling, we don’t see impact. We see prices. I’ll admit I’m a bit of a pedant for correct labelling of graphs, but prices are not an impact, they are simply prices. You could try to tell a story of impact, perhaps, by graphing the percentage energy prices make up of total household budget – but I suspect that would not tell the story that particular source was hoping for.

Dissecting a graph like this takes a little practice, but once you get into the habit it’s difficult *not* to do. We can all ask these tricky questions about what graphs are trying to tell us, and why. It’s worth investigating what other stories the data could tell, given the opportunity. Because as Greg said on Make Me Data Literate, data is as simple as counting things and telling stories. How great would it be if we could make sure those stories were true?

PS please leave a comment if you can spot more crimes in this graph!

Is this the correct link to Greg Jericho’s item: https://adsei.org/podcast/greg-jericho-on-communicating-with-data/

good catch, thank you! Fixed.

this xkcd always plays through my head (buy the t-shirt!) when I see someone do this

https://xkcd.com/552/

also this is worth a browse

https://tylervigen.com/spurious-correlations

I love the lemons imported from Mexico and US highway safety correlation

I love that site. I even refer to it in my book. 🙂 And that xkcd is so perfect.