Teaching Kids to Change the World

When simple values are complex

Today I want to explore a simple question. What temperature is it in Melbourne right now?

An incredibly simple question, you might think. So I head to the Bureau of Meteorology website and I look at the weather observations for the Melbourne area.

Now you can quibble about which measures are actually properly in Melbourne, but even if you just take the top two values, Melbourne Airport and Melbourne (Olympic Park), they are not the same. Melbourne Airport was 26.7 at 9:50am, and Melbourne (Olympic Park) was 28.2. If a classic Melbourne cool change were coming in, those values could differ by more than 20 degrees Celsius, which is a bit mind-boggling!

Suddenly, even when we are asking a simple question like “what’s the temperature right now?”, we have to define our terms and techniques. Where exactly do we want to measure? Perhaps the problem is in the scale of Melbourne? It’s a large, sprawling city, after all. We could measure at the Melbourne General Post Office, which is often used as a kind of proxy for Melbourne in things like distance measurements. (Sadly the Melbourne GPO building no longer houses the actual GPO, but that’s a different story…)

We could use one of the weather stations the Bureau of Meteorology (BoM) has about the place. That says it’s 28.8 degrees in my suburb. Or we could measure the temperature ourselves somewhere, but that comes with its own complexity. I measured a range of places outside my house, and this is what I found (all temperatures are in degrees Celsius):

That’s a difference of 4.6 degrees from lowest to highest, and all within about 30 metres of each other, within 5 minutes of each other, measured using the same device.
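If you wanted to check that kind of lowest-to-highest spread in code, a minimal sketch might look like the one below. It uses the three BoM figures quoted earlier in this post rather than my own front-yard readings, and they were not necessarily all taken at the same moment. The names `readings` and `spread` are made up for illustration and don't come from any BoM tool.

```python
# BoM readings quoted in this post, in degrees Celsius.
# The dictionary and function names are illustrative only.
readings = {
    "Melbourne Airport": 26.7,
    "Melbourne (Olympic Park)": 28.2,
    "My suburb": 28.8,
}


def spread(values):
    """Return the gap between the highest and lowest value, rounded to 0.1."""
    values = list(values)
    return round(max(values) - min(values), 1)


coolest = min(readings, key=readings.get)
warmest = max(readings, key=readings.get)

print(f"Coolest: {coolest} ({readings[coolest]})")
print(f"Warmest: {warmest} ({readings[warmest]})")
print(f"Spread: {spread(readings.values())} degrees")
```

Even with just three official numbers, "the temperature in Melbourne" turns out to be a range rather than a single value.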

What if I want to compare the temperature at my house with the temperature at my sister Jane’s house, some 16km away? How can we get numbers we can compare? What if Jane measures in the sun on her black driveway, and I measure in the shade by my front door? What if we both measure by our front doors but my front door happens to be in the sun while Jane’s is in the shade? It seems we need to specify where we take our temperature measurements, and under what conditions, in order to get a consistent result.
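One way to keep ourselves honest is to record the conditions alongside every number. The sketch below is only an illustration of that idea: the `Reading` class, its fields, and the `comparable` check are all hypothetical, and the temperatures are invented for the example rather than anything Jane or I actually measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """A temperature plus the conditions it was taken under (hypothetical fields)."""
    place: str
    temp_c: float
    in_sun: bool
    surface: str


def comparable(a, b):
    """Treat two readings as comparable only if sun exposure and surface match."""
    return a.in_sun == b.in_sun and a.surface == b.surface


# Invented values, purely for illustration.
mine = Reading("my front door", 27.0, in_sun=False, surface="concrete")
janes = Reading("Jane's driveway", 31.0, in_sun=True, surface="black asphalt")

if comparable(mine, janes):
    print(f"Difference: {round(janes.temp_c - mine.temp_c, 1)} degrees")
else:
    print("Conditions differ, so the gap says little about our two suburbs.")
```

A real protocol would need far more than two fields (time of day, height above ground, shielding from wind, and so on), but even this much forces the question of how each number was collected.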

The BoM has a 92-page document on the requirements for weather station sites, but this line sums it up for me: “The selection of a site for meteorological installations is a complex process, and a degree of judgement is normally required.” High-level considerations include how well the site represents the surrounding area, so under the trees in my front yard might not be terribly representative, given the number of trees that have been cut down in this area lately. Probably out in the open and next to the road is better, but that wouldn’t be representative of the more leafy patches… You can see it’s not straightforward.

Like many problems in Data Science, there’s no simple, clear answer, or unambiguous set of rules to follow. It’s complicated, messy, and you just have to do the best you can with the tools you have. Whenever you’re shown a dataset, always ask exactly what was measured, and how. You might be surprised how much a little tweaking of measurement technique can change the data.

It’s all very Princess Bride, really. “Data Science is pain, Highness. Anyone who says differently is selling something.”
