
Graph crimes. (part 1)

Graphs are fabulous tools for helping understand your own data, and for telling the story of your data to others. Unfortunately, like any means of communication, graphs can be used to tell stories that are… well… closer to fiction than we’d like.

Take this graph, which I spotted when Greg Jericho shared it on Twitter (thanks, Greg!). If you haven’t checked out Greg’s episode of Make Me Data Literate, get onto it, it’s amazing!

Greg described it as possibly “the worst graph crime I have seen in a few years”.

A line graph with two different scales. There is a blue line representing renewable share of generation, and a red line representing consumer electricity prices. Neither scale starts at zero, and the lines are very close together.

The lines look so similar that it’s hard to resist the assumption that they are related, or correlated, to use the technical term. The original poster used this graph to tell the story “renewables cause higher electricity prices.” The trouble is that graphs and data can tell you what. They can’t tell you why or how. That’s a much longer, more complicated story. I’ll write a post about correlation and causation soon, but when I looked at the graph more closely I noticed several things.

Firstly, there are two different y axes, which makes sense because the two sets of values are measured differently. One is a percentage, the other is… actually we don’t know what it is. Presumably it represents money, but whether it’s the average price per kilowatt hour, the cost of a typical bill, or something else, we don’t know. Leaving that aside for a moment, the really sneaky bit is that neither axis starts at zero. Let’s try a couple of different versions of this graph. First, what happens if you put them on the same axis?

A line graph with a wobbly red line at the top, trending upwards from around 70 to around 160, with a blue line at the bottom trending much less upwards from around 5 to around 39. The red line is electricity prices, the blue is renewable share.

Doesn’t look remotely the same, does it? But maybe it’s not fair to graph them against the same axis, because it does rather flatten the blue, renewables line, and they’re not actually the same kind of values. So, ok, let’s try giving them their own scales back, but starting them both at 0. Any vertical axis that doesn’t start from 0 should be examined very closely indeed.

Left (my version): A line graph with two different scales, and a red line that kicks up sharply at the end, and a blue line that trends upwards from the start. The two lines cross over around 3/4 of the way through.
Right (the original): A line graph with two different scales. There is a blue line representing renewable share of generation, and a red line representing consumer electricity prices. Neither scale starts at zero, and the lines are very close together.

Notice how the lines start at the same point in the original published version (right), and finish at the same point, too. But in my graph, on the left, starting at 0 in both of the y axes makes them look quite separate.
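The effect of those axis limits can be put into numbers. Here is a minimal Python sketch using the approximate start and end values visible in the graphs above (prices from around 70 to around 160, renewable share from around 5 to around 39). The function name `plot_height` is hypothetical, and the truncated axis limits are an assumption for illustration, since the original graph's exact limits aren't known.

```python
def plot_height(value, axis_min, axis_max):
    """Position of `value` on a vertical axis as a fraction of plot height.

    0.0 is the bottom of the plot, 1.0 is the top.
    """
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (value - axis_min) / (axis_max - axis_min)


# Approximate values read off the graphs in this post (not exact data).
PRICE_START, PRICE_END = 70, 160   # electricity prices, units unknown
SHARE_START, SHARE_END = 5, 39     # renewable share, percent

# Assumed truncated axes: each axis runs from its own series' lowest value
# to its highest, so both lines are stretched to fill the whole plot.
truncated = {
    "prices": (plot_height(PRICE_START, 70, 160), plot_height(PRICE_END, 70, 160)),
    "renewables": (plot_height(SHARE_START, 5, 39), plot_height(SHARE_END, 5, 39)),
}

# Zero-based axes: same tops, but both axes start at 0.
zero_based = {
    "prices": (plot_height(PRICE_START, 0, 160), plot_height(PRICE_END, 0, 160)),
    "renewables": (plot_height(SHARE_START, 0, 39), plot_height(SHARE_END, 0, 39)),
}

for name in ("prices", "renewables"):
    t_start, t_end = truncated[name]
    z_start, z_end = zero_based[name]
    print(f"{name}: truncated {t_start:.2f} -> {t_end:.2f}, "
          f"zero-based {z_start:.2f} -> {z_end:.2f}")
```

With the truncated axes, both lines run from the bottom corner of the plot to the top corner, so they are almost guaranteed to sit on top of each other. With zero-based axes, the price line starts a bit under halfway up the plot while the renewables line starts near the bottom, which is why the two look so much more separate.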

Now let’s look at the last point on both graphs. Notice it’s 2023/24, which we don’t have prices for yet, so it’s labelled “Projection” on the original. That means it’s a guess. Maybe an educated guess, but still a guess. So let’s take out the last point on both graphs.

A line graph with two vertical scales, both starting at zero. There is a red line that trends roughly upwards, and a blue line that does the same. They end at around the same place.

Doesn’t look nearly so correlated now, does it? The graphs certainly both trend upwards, but they’re hardly closely intertwined the way they looked on the first graph.

As for trending upwards: yes, prices are trending upwards. So are renewable energy installations. Maybe rising prices actually encourage more people to install renewable energy sources like solar panels and windmills. Maybe rising electricity costs and renewable energy are both correlated with climate change. Maybe prices are rising because of inflation. All three are almost certainly true. But there may be other reasons, too. Remember that graphs can tell you what, but they can’t tell you why.
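To see how little a shared upward trend proves on its own, here is a small Python sketch. The two series are made up purely for illustration (they are not the data behind the graph), and the helper names `pearson` and `yearly_changes` are hypothetical. The series are built so that both rise over time, but whenever one takes a big step the other takes a small one.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def yearly_changes(series):
    """Differences between consecutive values."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Made-up series for illustration only; NOT the data behind the graph.
series_a = [10, 13, 14, 17, 18, 21, 22, 25]
series_b = [100, 101, 104, 105, 108, 109, 112, 113]

level_r = pearson(series_a, series_b)
change_r = pearson(yearly_changes(series_a), yearly_changes(series_b))

print(f"correlation of the raw values: {level_r:.2f}")
print(f"correlation of the year-to-year changes: {change_r:.2f}")
```

In this toy example the raw values correlate at about 0.98, even though the year-to-year changes move in exactly opposite patterns. Two things that both go up over time will look correlated whether or not they have anything to do with each other.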

The other big issue, of course, is that these are not the same kinds of values. The red line represents electricity prices, the blue represents renewable share, which is presumably the percentage of total electricity generation in that year that came from renewables. Anyone who puts two wildly unrelated values on the same graph, whether the axes are the same or not, is, as the Dread Pirate Roberts might say, selling something. It’s a dubious move at the best of times. It doesn’t make sense to compare prices with percentages. It doesn’t give you a meaningful story. And graphs are supposed to tell you a meaningful and valid story. As in this case, they often don’t!

Now, let’s look at one more part of the crime: the graph title. “Impact of Renewable Energy on Australian Electricity Consumers.” Is that what the graph shows? Even if you buy the story the author presumably hopes this graph is telling, we don’t see impact. We see prices. I’ll admit I’m a bit of a pedant for correct labelling of graphs, but prices are not an impact, they are simply prices. You could try to tell a story of impact, perhaps, by graphing energy costs as a percentage of total household budget, but I suspect that would not tell the story that particular source was hoping for.

Dissecting a graph like this takes a little practice, but once you get into the habit it’s difficult to stop. We can all ask these tricky questions about what graphs are trying to tell us, and why. It’s worth investigating what other stories the data could tell, given the opportunity. Because as Greg said on Make Me Data Literate, data is as simple as counting things and telling stories. How great would it be if we could make sure those stories were true?

PS please leave a comment if you can spot more crimes in this graph!
