Why get excited about Data Science?

This is an edited version of a talk I gave in Perth for the Innovation Institute, for the opening of Data Science Saturdays, aimed at 12-18 year olds. Huge thanks to the folks at Pawsey Supercomputing Centre, NeSI, and NCI for awesome examples!

Hi there! I’m going to start by recognising that I am coming to you from unceded Wurundjeri land, and pay my respects to the Wurundjeri people and their elders, past, present and emerging. 

My name is Dr Linda McIver, and I’m the Founder and Executive Director of The Australian Data Science Education Institute, ADSEI, a charity dedicated to empowering every student with critical thinking, data literacy, and STEM skills in the context of projects that matter.

I’ve been asked to talk to you today because I get crazy excited about Data Science, and I want you to know why. You’re welcome to mock me for that, but before you do, let me tell you that I wasn’t terribly keen on maths at school. I didn’t understand logarithms, and I found calculus terribly dull. I couldn’t see the point of a lot of the stuff we were learning. I just needed marks on the exam, and that’s not particularly exciting or motivating.

So how did I get from there to someone who’s crazy excited about data science? I founded ADSEI because I found out that Data Science is a superpower. It says so on my shirt, which was a gift from the San Diego Supercomputer Center. I don’t know how clearly you can see it, but it says “I do Data Science. What’s your superpower?”

Data Science is a superpower because it gives you the power to solve problems. It gives you the power to prove that there are problems – like proving that your classroom is way too hot for compulsory blazers, or showing that the noise level in the gym is actually a health and safety issue (I hated sport at school!) – and it gives you the power to figure out how to fix them, as well as the power to show how well you’ve fixed them. 

So I want to start with some examples of really amazing data science applications in the real world.

Oddly enough, I’m going to start with my physiotherapist, Joshua Heerey. A lot of physios approach the job somewhat unscientifically. They poke, prod, and wrangle you about, pronounce their diagnosis and then give you some fiendishly painful exercises to do that may or may not solve the problem. When I developed hip problems, I was in a lot of pain. I saw a physio who poked, prodded, and diagnosed me with bursitis. He gave me a few things to do, applied ultrasound and heat, and made no difference at all. He then diagnosed something different, gave me more exercises, and again we achieved nothing. If anything, it was getting worse.

So I went to see Josh. Josh’s approach to physiotherapy is rather different. After listening to the problem and asking questions, Josh measures weakness in different muscle groups using a dynamometer – a force meter.  He uses repeated measurements to ensure accuracy. He finds the weak muscles and records just how weak they are. He also measures the angles each joint can bend to. He assigns exercises (they still hurt, btw) to strengthen the muscles that are weak. Each time I went back he’d measure them again, see which ones were improving, and by how much. In short, he applied data science to physiotherapy, and voila! Together we cured my hip. 

This is a very scientific approach to healthcare. Measure the problem. Work to fix it. Measure it again to see how well the fix has worked. Adjust treatment if necessary. Measure it again. It’s not rocket science, but it absolutely is data science. 
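
If you like, you can think of Josh’s records as a tiny dataset. Here’s a minimal sketch of that measure-fix-measure loop in Python – the muscle names and dynamometer readings are invented, purely for illustration:

```python
# Hypothetical dynamometer readings in newtons across four sessions.
# Muscle names and values are invented for illustration.
sessions = {
    "hip abductors": [112, 118, 127, 140],
    "hip extensors": [98, 101, 99, 115],
    "hip flexors":   [140, 141, 143, 144],
}

for muscle, readings in sessions.items():
    baseline, latest = readings[0], readings[-1]
    change = (latest - baseline) / baseline * 100
    print(f"{muscle}: {baseline} N -> {latest} N ({change:+.1f}%)")
```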

The next story is about a study by Professor Rosalind Picard at MIT, using a wearable device that measured skin conductivity as a proxy for stress (this first study was done before wearable devices were common). Your skin conducts more electricity when you sweat, and you sweat when you’re stressed, so in theory higher conductivity means more stress. Of course, there are other reasons why you might be sweating, or why your skin’s conductivity might change; hence the study. The researchers wanted to figure out how good the device was at measuring stress. The device recorded measurements throughout the day, which were then matched against a diary kept by the participant, so that the researchers could track whether people were actually stressed when the data made it look as though they were.

The researcher loaned the device to a student who wanted to use it to measure his autistic brother’s anxiety levels.  One day this device gave a massive spike in readings. Nothing the researchers could do in the lab could trigger a reading this high. They tried all sorts of stressors and exercise tests, and simply could not get a reading like that. You could show someone a massive tarantula and not get a response like that.

They thought it must be an anomaly. But rather than throw away the data as an outlier, they carefully tracked it back to the matching diary and discovered that the spike in data happened right before an epileptic seizure.
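
We don’t know what the team’s actual analysis looked like, but the core idea – flag readings far outside the norm, then check them against the diary instead of silently deleting them – can be sketched in a few lines. The timestamps, values, and diary entry below are all invented:

```python
import statistics

# Hypothetical (timestamp, microsiemens) skin conductance readings;
# the 09:20 spike is exactly the kind of value you might be tempted to bin.
readings = [
    ("09:00", 2.1), ("09:05", 2.3), ("09:10", 2.2),
    ("09:15", 2.4), ("09:20", 11.8), ("09:25", 2.5),
]
diary = {"09:20": "felt strange, then had a seizure"}

values = [v for _, v in readings]
median = statistics.median(values)
# Median absolute deviation: robust to the very outliers we're hunting.
mad = statistics.median(abs(v - median) for v in values)

for timestamp, value in readings:
    if abs(value - median) > 10 * mad:   # a wildly unusual reading...
        note = diary.get(timestamp, "no diary entry")
        print(f"{timestamp}: {value} uS -- check the diary: {note}")
```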

So those researchers could have ignored a value that wasn’t relevant to the study they were doing, or thrown it away as an outlier. What they did instead was develop this device – the Embrace – a seizure-monitoring watch that not only detects epileptic seizures but can message caregivers to let them know a seizure has occurred, and uses accelerometers, or motion sensors, to figure out whether the wearer has collapsed. The Embrace has provided epilepsy sufferers with a new level of independence and safety. And it couldn’t have been done without data science.

This next story is about Jennifer Yeung, a Canadian plane spotter, aerospace engineer, and PhD student. Jennifer’s PhD uses a system called Artemis, which is designed for real-time monitoring of newborn babies, sending data from regional hospitals to specialists elsewhere in the world, so that babies can receive the best of healthcare even if their doctors are thousands of kilometres away. In 2019 Jennifer visited the Pawsey Supercomputing Centre and used Artemis with machine learning to track changes in babies’ vital signs BEFORE their health crashed, so that they could receive lifesaving treatment before their condition became critical. Incidentally, Jennifer’s main PhD project is to adapt Artemis to monitor the vital signs of astronauts in real time. How cool is that?! And, again, it’s all data science.

Now we’re off to New Zealand, where Dr Céline Cattoën-Gilbert analysed 40 years of climate data on a supercomputer named Maui at New Zealand eScience Infrastructure (NeSI) to create high resolution weather and river flow forecasts that predict floods up to 48 hours in advance. This is obviously amazing news for people in the path of those floods, who used to have to wait until the water was lapping at their doorstep to know there was a problem! Now we can use data science to warn people in time to take precautions, or even evacuate if the flood levels are going to be dangerously high.

We tend to think of data as numbers – counting things, measuring things, monitoring things. But data can also be sound and images. For example, Dr Giacomo Giorli is an oceanographer at the National Institute of Water and Atmospheric Research (NIWA) in New Zealand, where his team tracks marine mammal populations around New Zealand through underwater acoustic monitoring, again using NeSI supercomputers. Dr Giorli is particularly interested in whales, and wants to track their movements. But it’s hard to detect and monitor whales 24/7: it’s expensive, it’s often cold and wet, you get seasick, and whales can be just plain hard to find. If you can place microphones underwater, suddenly you can do 24/7 monitoring from the comfort of your local supercomputer.
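
As a toy illustration of how sound becomes data – nothing like NIWA’s real pipeline, and using a synthetic signal instead of real hydrophone audio – you can turn a recording into a spectrogram and flag time windows with unusual energy in the low-frequency band where many whale calls sit:

```python
import numpy as np
from scipy import signal

# Synthetic "hydrophone" audio: quiet noise plus a fake low-frequency
# "call" between seconds 4 and 5. Entirely made up for illustration.
fs = 1000                              # sample rate, Hz
t = np.arange(0, 10, 1 / fs)
audio = 0.1 * np.random.randn(t.size)
call = (t > 4) & (t < 5)
audio[call] += np.sin(2 * np.pi * 40 * t[call])   # a 40 Hz tone

# Spectrogram: how much energy at each frequency, over time.
freqs, times, power = signal.spectrogram(audio, fs=fs, nperseg=256)

# Sum the energy in a 20-60 Hz band and flag unusually loud windows.
band = (freqs >= 20) & (freqs <= 60)
band_energy = power[band].sum(axis=0)
threshold = 10 * np.median(band_energy)
for when in times[band_energy > threshold]:
    print(f"possible call around t = {when:.1f} s")
```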

Now off to space! The craters on a planet’s surface tell its history. Volcanic activity tends to smooth the planet’s surface by covering it with lava, so the more craters we can see, the older the surface is since a volcanic event wiped it ‘clean’. The current database for Mars contains 385,000 identified craters with diameters of 1 km or larger, but it took at least six years to construct before it was published in 2012. Planetary scientist Professor Gretchen Benedix at Curtin’s Space Science and Technology Centre used machine learning and the Pawsey Supercomputing Centre’s systems to identify 94 MILLION craters in just 24 hours. Even cooler, they can now identify craters as small as 5 metres across – 200 times more sensitive!

Now let’s get physical. Curtin graduate student Jordan Makins, with the help of the Pawsey Supercomputing Centre, has developed an open source tool for analysing soccer player performance. Feed the tool data about recent games, and it can tell you how well players are performing and where their weaknesses are. Data Science is heavily used in sport to monitor and improve performance.

Any trainspotters here? Let’s talk about how Data Science caught Singapore’s rogue train. In 2016 the Circle Line in Singapore suffered a series of strange disruptions. Trains on the line, apparently at random, lost contact with the control system, which triggered the emergency braking system, leaving the trains dead on the tracks. This is obviously a bit of a problem for a busy train line! The events seemed so random, though, that the train company had no idea what was going on. They called in some data scientists and gave them a dataset containing the date and time of each incident, where it had happened, the ID of the train, and the direction the train was travelling in.

The data scientists tried everything to find a pattern in the data, but it wasn’t always the same train, and it didn’t seem to be in the same place, or even the same set of places. It was bizarre. They visualised a whole range of different aspects of the data, using complicated graphs, simple ones, anything they could think of. They crunched all kinds of numbers. Nothing. An extra complication was that a small number of shutdowns are normal, so there was noise in the data. Eventually they spotted a small pattern in all of that noise: when a train lost signal, another train behind it, headed in the same direction, would often lose signal directly afterwards. They started to think that perhaps there was a rogue train causing signal interference with other trains. Complicating their investigation was the fact that the rogue train never interfered with itself, so it did not appear in their data. But that, in itself, was a clue!
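
Here’s a sketch of how you might hunt for that follow-on pattern, assuming one record per incident of time, station, train ID, and direction. The rows below are invented; the real dataset had this shape but obviously different values:

```python
from datetime import datetime, timedelta

# Invented incident records shaped like the real dataset:
# (time, station, train ID, direction of travel).
incidents = [
    (datetime(2016, 11, 2, 7, 31), "Serangoon", "PV12", "inner"),
    (datetime(2016, 11, 2, 7, 33), "Lorong Chuan", "PV30", "inner"),
    (datetime(2016, 11, 2, 9, 10), "Bishan", "PV07", "outer"),
    (datetime(2016, 11, 2, 9, 12), "Marymount", "PV22", "outer"),
]

# The clue: pairs of incidents close together in time, heading in the
# same direction -- consistent with one interfering train in between.
window = timedelta(minutes=5)
for (t1, s1, id1, d1), (t2, s2, id2, d2) in zip(incidents, incidents[1:]):
    if d1 == d2 and t2 - t1 <= window:
        gap = (t2 - t1).seconds // 60
        print(f"{id1} at {s1}, then {id2} at {s2}: "
              f"both {d1}-direction, {gap} minutes apart")
```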

Eventually, after a lot of work, they zeroed in on a possible suspect, and checked when that train, Passenger Vehicle 46, was not in service. Lo and behold, very few shutdowns happened during those times! Culprit identified! Passenger Vehicle 46 was repaired to prevent the interference, and the Circle Line went back to normal. Another problem that would have been really hard to solve without data science.

Now let’s talk about something particularly close to my heart, since I’m in Victoria and only just out of lockdown! Professor Linsey Marr is a scientist who proved back in 2011 that the flu was airborne rather than droplet-spread. Droplet and airborne transmission might sound similar, but the technical difference is crucially important. Droplet-spread diseases travel in droplets – large particles emitted when you cough or sneeze. Droplets are heavy. They don’t stay in the air, but they CAN land on surfaces and make you sick if you touch those surfaces and then touch your face or your food. They can also land straight in your mouth, nose, and eyes if someone coughs or sneezes nearby. (How gross is that!?) That’s why social distancing is really important for droplet-spread diseases.

In contrast, diseases that are airborne make you sick if you breathe them in. And, crucially, airborne particles stay in the air for much longer. Marr took samples of the air in different rooms, in places like up near ceiling air vents, where droplets simply couldn’t be (because they fall, they don’t fly!), and she found enough flu virus to make people sick. The trouble, though, was that she couldn’t get published, because the medical establishment was convinced that the flu was droplet-transmitted.

The reason? In the 1930s a study of tuberculosis found that only particles smaller than 5 microns could infect people with the disease. This somehow got translated into “only particles smaller than 5 microns can be airborne.” 

The thing is that Professor Marr is an expert in airborne pollutants and indoor air systems, and her engineering training told her quite clearly that the physics of this assumption was all wrong. Particles larger than 5 microns hang in the air all the time!

When Covid came around, Professor Marr was quite sure it was also airborne, while the WHO and the American CDC, among many others, were busy saying it was droplet-spread, so social distancing and hand sanitising were promoted as the way to stop the spread, rather than masks and ventilation.

Frustrated, Professor Marr teamed up with a history researcher by the name of Katherine Randall, who conducted what was effectively research archaeology – digging down into the history of a topic to figure out where certain ideas come from. Randall discovered that the original tuberculosis study, from the 1930s, did indeed establish that only particles smaller than 5 microns can infect a person with tuberculosis, but not because larger particles don’t hang around. Tuberculosis can only make you sick if it gets deep into your lungs, and our lungs very efficiently filter out particles larger than 5 microns well before they get that deep.

Particles larger than 5 microns DO hang around in the air, and while they can’t give you tuberculosis, they can certainly give you Covid-19 or the flu, because those can make you sick if they get anywhere in your respiratory system. They don’t need to get anywhere near as deep as tuberculosis does.

Linsey Marr challenged scientific orthodoxy, and she’s one of the heretics I talk about in my book, Raising Heretics, because we need people to challenge orthodoxy, but only on the basis of evidence, data, and rational evaluation. Not on the basis of YouTube rabbit holes, Reddit, and TikTok!

We desperately need people who are prepared to be rationally heretical.

Who are prepared to ask “why?” “how can we be sure?” “what have we missed?” “how can we do better?” “who are we hurting?” “how can we fix this for everyone?” “how will we know how well it works?”

These questions are often heretical. By asking them, I’ve sometimes made people very unhappy. These questions are uncomfortable. But they are crucial to building an ethical, sustainable, positive future for all of us.

Heresy has been crucial to our scientific development. In the 1840s Ignaz Semmelweis came up with the radical heresy that doctors washing their hands before (and after) surgeries prevented disease. Prior to this doctors went from autopsies to childbirth without washing their hands or changing their clothes. And they wondered why people died. The idea that this could cause disease was considered so ludicrous that it took decades for the idea of washing hands to be accepted. Semmelweis was so ridiculed and pilloried that his colleagues committed him to an asylum where he was beaten and died.

In 1917 Alice C Evans made the laughably heretical suggestion that milk should be heated to a high temperature, or pasteurised, to kill bacteria that could be harmful to humans. She was not taken seriously, being a woman without a PhD (and PhDs, by the way, were not offered to women at the time), and it took over a decade before milk was regularly pasteurised in the US. After her discovery but before its general acceptance, Evans became significantly ill with undulant fever, a disease caused by one of the bacteria found in raw milk.

In the 1940s and 50s, Barbara McClintock discovered that genes aren’t static sets of instructions passed from generation to generation, but that they can be regulated – turned on and off – by other parts of the genome. She described the reaction to this discovery as “puzzlement, even hostility”, but in the end her research radically changed our understanding of genetics.

In the 1960s, Frances Kelsey of the American Food and Drug Administration refused to approve Thalidomide for use as a morning sickness drug, because she was concerned about the lack of data about whether the drug could cross the placenta, and directly affect babies’ development in the womb. This averted thousands of birth defects in American babies. Sadly, other countries were not so cautious.

More recently, Marshall and Warren’s original paper on ulcers being caused by bacteria rather than stress was rejected and consigned to the bottom 10% of submissions. Barry Marshall eventually drank Helicobacter pylori – the bacteria that causes ulcers – to prove it, thus inducing an ulcer which he then cured with antibiotics.

It might surprise you to know that Florence Nightingale was one of the first data scientists, and her use of statistics saved a lot of lives. Nightingale discovered that the way field hospitals were recording deaths was wildly inconsistent, which made it very difficult to understand why soldiers were dying. By standardising the way deaths were recorded, she was able to analyse the data and figure out that by far the greatest proportion of soldiers were dying from infections spread in the hospital itself, rather than from injuries received in battle. Knowing what the problem actually was meant that they could work to fix it. Once hygiene was improved throughout the hospital, deaths and illnesses dramatically reduced, and many lives were saved.
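
Nightingale’s standardise-then-count approach is exactly what data scientists now call data cleaning. Here’s a toy sketch of the same idea, with invented records:

```python
from collections import Counter

# Invented, deliberately messy hospital records: the same causes of
# death written several different ways.
records = ["fever", "Wound - battle", "FEVER", "cholera",
           "battle wound", "Fever ", "dysentery"]

# Step 1: standardise the inconsistent labels into agreed categories...
categories = {
    "fever": "preventable disease",
    "cholera": "preventable disease",
    "dysentery": "preventable disease",
    "wound - battle": "battle injury",
    "battle wound": "battle injury",
}
standardised = [categories[r.strip().lower()] for r in records]

# Step 2: ...and only then count. Now the comparison means something.
for cause, count in Counter(standardised).most_common():
    print(f"{cause}: {count}")
```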

You can see that there is no practical limit to the ways we can use Data Science to solve problems. To change the world. From sport to disease, from the ocean to space, Data Science is a tool that empowers us to understand the world, and change it for the better. 

We need you to be data scientists. Not necessarily professionally, but to have enough data literacy to ask difficult questions, to challenge the status quo, to be heretics.  And we need you to do it on the basis of evidence and data. 

Raising Heretics to Save the World

This is an excerpt from Raising Heretics, available now online in ebook and paperback format (check out adsei.org for international links & ebooks).

It’s time to change the world. We need creative problem solvers to address catastrophic climate change, income inequality, pandemics, ecological collapse, misinformation, radicalisation, and many more problems facing humanity. We need critical thinkers. Rational Sceptics. People willing to challenge the status quo.

Unfortunately, we have an education system that’s exceptionally good at turning out obedient people full of “facts” and unshakeable opinions. This book proposes a new approach to education that empowers our children to solve real problems, to challenge their own results, and to shake up the status quo on the basis of evidence and data.

I founded the Australian Data Science Education Institute in 2018 because I wanted to show kids that they are capable of working with technology, that it is relevant to them, and that they don’t have to look like Sheldon from The Big Bang Theory in order to learn to program.

It’s well known that the technology industry has a diversity problem when it comes to women, but lack of diversity goes way beyond gender. By trying to increase the number of women and girls in STEM, we are only tackling the easy part – though it’s actually not that easy, judging by the sheer volume of women-in-STEM programmes and the stubborn failure of the numbers to actually shift.

The problem is that we consistently attract the kinds of people to tech that are already there. We are missing big chunks of the population – boys included. Boys who don’t see themselves as nerdy, or who don’t see the point of tech. Girls who don’t see it as relevant to them. Non-binary and gender-queer kids who don’t see themselves as represented or welcome in any of the tech programmes available to them.

If we had true diversity in technology and Data Science, we’d have a range of ethnic and cultural backgrounds, as well as people with a wide range of physical abilities. We’d have people on our design teams who are mobility-compromised or vision-impaired, who have allergies, varied gender identities and sexualities, and every possible skin tone and body shape. We’d have people who act differently, dress differently, think differently, and have different needs. I have headphones that don’t work well with long hair, for goodness’ sake! Guess who was on that design team?

This lack of diversity is bad for the technology industry, but it’s even worse for the rest of us, because technology is changing the shape of our world at an alarming rate, and we currently have very little say in our own future. Companies like Uber and DoorDash are radically changing our working conditions and eliminating hard-won entitlements and protections, while Facebook and YouTube spread misinformation and encourage radicalisation, all in the name of keeping people on their platforms and maximising their profits. Our world is being directly shaped by technology companies that are working in ways we don’t understand and have no control over.

Meanwhile we see human resources companies using AI to filter job applicants, claiming that their system eliminates “human bias”, without admitting the possibility that it introduces new forms of machine bias. We see “predictive policing” algorithms being used to predict crime and target particular communities in disturbing ways. We see a rush towards machine learning and artificial intelligence systems for their own sake, rather than for problems they can legitimately solve, and we have a wholly unwarranted confidence in the accuracy, reliability, and objectivity of their output.

It turns out that diversity in the technology industry is only a small part of the reason why teaching all kids Data Science and STEM skills matters. The big part is that we need a technology and data literate population who are trained to think critically and creatively, and, in particular, trained to believe that they can solve problems. That’s the world we need to build. And the foundation stone of world building has to be education.

We have a choice. We can train kids to be obedient process followers who don’t rock the boat, or we can train them to be challenging, critical and creative thinkers who ask difficult questions and come up with innovative solutions to our worst problems.

Above all, we need people who are prepared to be heretical.

Who ask “why?”

Who ask “how can we be sure?”

Who ask “what have we missed?”

Who ask “how can we do better?”

Who ask “who are we hurting?”

Who ask “how can we fix this for everyone?”

Who ask “how will we know how well it works?”

These questions are often heretical. By asking them, I’ve sometimes made my bosses very unhappy. They make people uncomfortable. But they are crucial to building an ethical, sustainable, positive future for all of us.

I have a PhD in Computer Science Education and over twenty years’ experience teaching Computational and Data Science at both Secondary and Tertiary levels. Now I’m the Founder and Executive Director of the Australian Data Science Education Institute (ADSEI) – a registered charity dedicated to ensuring every student is empowered with data literacy, Data Science, and STEM skills. I started ADSEI because I figured out how to engage kids with STEM and Data Science skills, and I wanted to engage all kids, not just the kids in my own classes. I thought this would help improve diversity in the technology industry, but I have come to realise the problem is far more fundamental than that.

All of my time in education has made it clear to me just how badly wrong education has gone. We continue to make the same educational mistakes we’ve been making for decades. We are failing our children, and, in doing so, we are sabotaging our future. If we want to build a future that is evidence based, rational, and inclusive, then our education system clearly needs to change.

There are so many signs that our current education system is missing the mark. When my teenager gets frustrated because she doesn’t understand how what she’s learning in maths could ever be useful. When a primary school kid says science is boring. When a high school kid says maths is too hard, or science isn’t for them, or they aren’t smart enough to program a computer. None of these things would happen if education was working. It’s obvious that it’s not.

And that’s unsurprising, since the primary focus of education is facts, rote learning, and the mindless application of procedures. By giving kids “experiments” to do that have known inputs and known results, we teach science as confirmation bias. This trains them that the important thing is to get the right, expected answer (and if you get a different answer, fudge things until it’s right!), rather than exploring the unknown and looking for new things.

Although the importance of STEM is widely acknowledged, it is frequently taught as a matter of tech toys, rather than a crucial tool for solving real problems. This commonly comprises a day of robotics play, or the installation of a maker space where kids can tinker with 3D printers and laser cutters. These toys are frequently error prone and difficult to use, so when kids don’t find them fun, or have trouble using them, they assume that STEM is something they can’t do.

Even when problem solving tools like Design Thinking are introduced in the classroom, they are often only used to solve toy problems that don’t relate to challenges that kids can tackle in real life. Design Thinking plays with trips to Mars, or responding to a famine in Ethiopia, instead of taking one of the many problems in our own schools and communities and empowering kids to solve it. You can’t teach problem solving properly if you skip the really tough part: implementing your solution and then troubleshooting all the ways it doesn’t work the way you thought it would.

By doing this, we tell kids that they can’t make a difference until they are grown up, when we could be giving them the tools to make a positive difference in their world today.

The truth is, with this kind of education we have got really good at turning out obedient kids who follow the rules and do as they are told. And those are not the kind of people we need to overcome the huge crises we’re facing. We need people who are confident, skilled, knowledgeable, and prepared to stand their ground and argue a point. We need people who see things differently, who look for new answers, who understand uncertainty, and who ask hard questions. We need people who are “unbossable”, who don’t do what they’re told without first understanding why it’s the right thing to do. We need people who challenge the status quo. We need people who consider ethics first, rather than as an afterthought or not at all.

Meanwhile, Science has somehow become a partisan political football. Australia’s response to the Covid-19 crisis was effective, largely because the Government followed the advice of experts in epidemiology. Unfortunately, we face a larger and more serious existential crisis in the form of climate change, and in this case, the Government is ignoring experts and investing deeply in denialism and cheap grabs for immediate power and profit.

Policy in this country (and most of the world) is largely driven by ideology, powerful lobby groups, and manipulative media organisations, rather than by science and evidence. This kind of destructive behaviour is justified with dodgy data and deeply suspect visualisations, and all too often even the media lack either the scepticism or the skill to call them out.

Inequality is rising under the influence of capitalism-driven globalisation that promises better lives for all via the concept of “trickle down economics”, which the data shows quite clearly does not work. We resist Universal Basic Income on the basis that people would stop working out of laziness, when the data from the trials so far shows not only that people don’t stop working, but also that they become more entrepreneurial. Our governments sell off natural assets, log native forests, privatise essential services like health and education, and give tax cuts to big business despite evidence showing that the best way to stimulate the economy is to give money to poor people. As a population, we swallow the line that it is all for our own benefit, and vote the same people back in.

Social media also drags us by the nose, constructing ever more cunning ways to tie us to their platforms, milk us for data and profit, and manipulate our behaviour, all without our informed consent. Our social and workplace gains are casually undermined by disruptive technologies, while we have no input into, and even less control over, the way they shape our future.

This is why we need a rationally sceptical population. We need to stop being irrationally sceptical of climate science and vaccines and start being rationally sceptical of government policy, business motives, and media beatups.

We need to build a new world. And world building has to start with education.

Raising Heretics

This is the text of my keynote from the NSW ICT Educators Conference in Sydney earlier this year.

I am Dr Linda McIver, Founder and Executive Director of ADSEI, a registered charity dedicated to putting students in the driver’s seat of our data driven world.

Today I want to talk to you about the importance of heresy.

I’m going to take you through the place of heresy in science, as well as the phenomenon known as survivorship bias, and how these relate to the extraordinary claims being made in the field of AI, and then I’m going to talk about how I’m aiming to fix it all.

Much of our Science, Technology, Engineering, and Maths education starts from a foundation of facts and known answers. This teaches our kids that the point of STEM is to Get The Right Answer, whereas the actual point of real world STEM disciplines is to fix things, understand things, and solve problems. In this talk I will show why I founded the Australian Data Science Education Institute, why we’re dedicated to raising Heretics, and why Heresy is something we desperately need right now, both in the Data Science industry and the world as a whole.

First of all, let’s define our terms. Heresy is an opinion profoundly at odds with what is generally accepted. And heresy has been crucial to our scientific development.

Let’s talk about some historical scientific heresies:

  • In the 1840s Ignaz Semmelweis came up with the radical heresy that doctors washing their hands before (and after) surgeries prevented disease. Prior to this, doctors went from autopsies to childbirth without washing their hands or even changing their clothes. And they wondered why people died. The idea that this could cause disease was considered so ludicrous that it took decades for the idea of washing hands to be accepted. By the way, Semmelweis was so ridiculed and pilloried that his colleagues committed him to an asylum, where he was beaten and died.
  • In a well known heresy, Galileo Galilei so outraged the church with the idea that the earth revolves around the sun, rather than the other way around, that he was accused of literal heresy and placed under house arrest. He only narrowly escaped death.
  • In 1912 Alfred Wegener began to publicly advocate the idea that the continents moved over time – what became known as continental drift. He, too, was widely ridiculed, and did not live to see his ideas finally vindicated.
  • More recently, Marshall and Warren’s original paper on ulcers being caused by bacteria rather than stress was rejected and consigned to the bottom 10% of submissions. Barry Marshall eventually drank Helicobacter pylori – the bacteria that causes ulcers – to prove it, thus inducing an ulcer which he then cured with antibiotics.

It seems like heresy is a pretty dangerous business!

In fact a lot of scientific breakthroughs have been considered heretical. Especially in medicine!

Now let me digress for a moment to tell the story of the WWII planes that were examined for bullet holes to work out where to armour them. Researchers figured that the places with the most holes – the wings and the fuselage – needed the most armour. They found no holes on engines or fuel tanks so they figured they didn’t need armouring… until statistician Abraham Wald pointed out that the planes they were studying were the ones that made it back. The planes needed armour where none of the planes had holes, because clearly the planes that had holes in those other places (the engine and the fuel tanks, btw) were the ones that DIDN’T COME BACK.

I love this story, because it’s a classic example of the obvious conclusion being dead wrong. In a similar vein, the introduction of helmets in WWI resulted in more head injuries being treated in the field hospitals, so the first reaction was “stop using helmets!”… What data was missing?

Both of these are examples of survivorship bias – where there is a chunk of data missing from the study. In these cases it’s literally survivor bias because it fails to take into account those who don’t make it back.
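
You can watch survivorship bias appear in a ten-line simulation. The hit probabilities here are entirely made up: damage falls evenly across the plane, but most engine hits bring the plane down, so the surviving sample shows holes almost everywhere except the engine:

```python
import random

random.seed(1)
sections = ["wings", "fuselage", "tail", "engine"]
holes_seen = {s: 0 for s in sections}

for _ in range(10_000):
    hit = random.choice(sections)              # damage falls evenly...
    survived = hit != "engine" or random.random() < 0.1
    if survived:                               # ...but we only examine
        holes_seen[hit] += 1                   # the planes that return

print(holes_seen)  # engine looks barely hit -- because those planes died
```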

Have you heard of HireVue? They’re a Human Resources Tech company that uses artificial intelligence to select or reject candidates in job interviews based on… well, nobody actually knows.

They say it’s a machine, therefore it’s without bias. And we could laugh and snort, but over 100 companies are already using it, including big companies like Hilton and Unilever.

According to Nathan Mondragon, HireVue’s chief industrial-organizational psychologist, “Humans are inconsistent by nature. They inject their subjectivity into the evaluations. But AI can database what the human processes in an interview, without bias. … And humans are now believing in machine decisions over human feedback.”

Which is all kinds of disturbing, when they make all sorts of claims for the system but can’t actually explain how it makes its decisions.

They say that the system employs “superhuman precision and impartiality to zero in on an ideal employee, picking up on telltale clues a recruiter might miss.”

Of course, HireVue won’t tell us how their algorithm works – in part to protect trade secrets, and in part because they don’t really know…

I am fairly confident I’m not the only one who finds that an incredibly disturbing idea.

Luke Stark, a researcher who studies emotion and AI at Microsoft, describes this as the “charisma of numbers”. If an algorithm assigns a number to a person, we can rank them objectively, right? Because numbers are objective. And simple. What could possibly go wrong, reducing a complex and multifaceted human being to a simple numerical rank? (Helllooo ATAR – Australian Tertiary Admission Rank…)

I think Cathy O’Neil sums it up beautifully: models are opinions embedded in mathematics, and algorithms are opinions formalised in code. It’s incredibly important that we dispel this pervasive myth that algorithms are unbiased, objective statements of truth.

This whole HireVue system is a textbook example of survivorship bias: looking only at the people who made it through the same hiring process that we are now calling fatally flawed… and thinking we can predict ideal new hires with only that data. It completely ignores the people who didn’t make it through the initial processes, who might have been amazing.

It also highlights an issue I’ve seen raised again and again in works like “Weapons of Math Destruction”, “Made by Humans,” and “Automating Inequality” – that people believe in numbers, computers, and algorithms over other people, even when they’ve been explicitly told those systems are broken. And I have a story about that, too.

Some years ago, Niels Wouters, a researcher at the University of Melbourne, designed a system called Biometric Mirror. It was a deliberately simple, utterly naive machine learning system that took a picture of the user’s face and then claimed to be able to tell a whole lot about the person, just from that picture.

The system spat out a rating of ethnicity, gender, sexual preference, attractiveness, trustworthiness, etc. And Niels created the system to start a conversation with people about how transparently ludicrous it was to believe a system that does this. So he set up booths where people would come, have a photo taken, and read all of this obviously false information about themselves, and then have a conversation about trust and ethics and the issues with Artificial Intelligence. So far so good. A noble goal. But there are two postscripts to this story that are horrifying in their implications.

First of all, Niels would overhear people walking away from the display, having had the conversation about how obviously false the “conclusions” drawn by the system were, saying “But it’s a computer, it must be right, and it doesn’t think I’m attractive…”

And secondly, after speaking publicly about all of the issues with Biometric mirror, Niels was contacted by HR companies wanting to buy it…

So here is where we start to make the connection between education and the tech industry.

One of the problems in Data Science is that we often don’t have a lot of time to challenge even our own results, never mind anyone else’s. The rush to data riches (Step 3, Profit!) means we don’t really have time to be cautiously sceptical. We get a result, report it, and move on to the next dataset. And people are all too willing to believe in those results.

When I asked a group of data scientists if they had ever had to release or report a result that they felt hadn’t been fully tested, that they couldn’t bet their lives on, around half put their hands up. And then when I asked how many hands would have gone up if the question had been anonymous, the other half put up their hands.

So all of the discoveries I mentioned in the first half of this talk were made by people being sceptical. Challenging the status quo. Questioning accepted wisdom. By people who were quite prepared to examine new evidence and consider that “what everybody knows” might be wrong. Of course, we need educated heretics, so that our scepticism is rational and fact based, rather than denialism and wishful thinking, which is what we are seeing quite a lot of now. So education is clearly key.

But let’s consider STEM education. We mostly teach Science, Technology, and Maths in schools as a matter of facts and known outcomes (and, yes, I know there’s one more letter in STEM, but we rarely, if ever, actually teach any Engineering).

Consider the average school Chemistry experiment. We take known substances, apply a known process, and achieve an expected outcome. What do kids who don’t get the results they expect do then? Do they go back and try to find a reason for their results? Do they ask questions and challenge outcomes?

Nope. They don’t have time for that, and they get no credit for it. They copy their friends’ results. Or they simply adjust the results to get the outcome they expected to get. Marks are allocated for the expected results. For the right graph.

Occasionally we’ll run a prac with unknown reagents and ask the students to identify the inputs. But here, again, marks are for the correct answer.

But this isn’t science education. This is an education in confirmation bias. In finding what you are supposed to find. In seeing what you expect to see. It is the exact opposite of the way science should work. Science should be about disproving theories. And you only accept a theory as plausible when you have tried your hardest to disprove it, and failed.

Maths is much the same. The emphasis is on correct answers and known outcomes. On cookie cutter processes that produce the same result every time.

Technology education is often even worse. With a severe shortage of teachers with programming skills, we tend to default to education using toys. Drawing pretty pictures. Making robots follow lines. Writing the same code. Producing the same output.

What if we could teach with experiments where we don’t know the answers?

Well, with data, we can easily do that. Can we find a dataset that hasn’t been fully analysed and thoroughly understood? I could probably hit a dozen with a bread roll from where I’m standing.

How do you mark it, then, when you don’t know the right answer? You mark the process. You mark the testing. You ask the students to test and challenge their answers really thoroughly. You give points for their explanation of how they know their answer is right, for how they confirmed it by trying their hardest to prove it wrong.

It has been said, most famously by Grace Hopper, that the most dangerous phrase in the English language is “we’ve always done it that way”. Now, more than ever, we need people who challenge the status quo, who come up with new ideas, who are prepared to be heretical.

By teaching Data Science in the context of real projects, where the outcome isn’t known, we can actually teach kids to challenge their own thinking and their own results. We can teach them to think critically and analyse the information they’re presented with. We can teach them to demand higher standards of validation and reproducibility.

The trouble with this is that it requires a significant amount of setup work. Finding the datasets isn’t hard, but making sense of them can be really challenging. For example, when I downloaded a voting dataset from the AEC, I tried to find someone who could explain to me how the two-dimensional Senate ballot paper translated into a one-dimensional data string, and I literally couldn’t find anyone at the AEC who knew. I mean… presumably there is someone! But I couldn’t find them. It took me hours and hours to make sense of the dataset and design a project that would engage the kids, and give them room to stretch their wings and really fly.

The only reason I was able to commit that kind of time is that I was only teaching part time, so I used my own time to build these engaging projects. In year 10 we did projects on climate, on elections, on microbats. In year 11 we worked with scientists to solve their computational and data needs, in fields like marine biology, conservation ecology, neuroscience, astrophysics and psychology. The possibilities are truly endless.

But a teacher with a full time load doesn’t have the capacity to take on that kind of extra work. It’s just too time consuming, even if they have the skills to start with.

So that’s why I created the Australian Data Science Education Institute, or ADSEI. To develop project ideas and lesson plans that empower kids to explore data and become rational sceptics. To develop their data literacy, critical thinking, and technical skills in the context of projects they really care about. And also to provide professional development training to teachers right across the curriculum – not just in Digital Technologies – to integrate real data science into their teaching. To use data to make sense of the world.

At ADSEI we have created projects where kids use real datasets to explore the world. To solve problems in their own environments and communities, and most importantly: to measure and evaluate their solutions to see if they worked. We’ve got projects that do things like:

  • calculate how much carbon is embodied in the trees on their school grounds and then compare that with the school’s carbon emissions from electricity (there’s a sketch of this calculation after this list).
  • construct a set of criteria for good science journalism and then evaluate a bunch of different sources according to those criteria and visualise the results
  • analyse the litter on the school grounds, find ways to fix it, and then analyse it again to see if they worked
  • record and analyse the advertising they see around them in a week and explore its impact on their behaviour
  • use solar energy production & power usage data to explore a household’s impact on the environment
  • use the happiness index data to explore world differences in measures like income inequality and social support
  • use data from scientific observational studies to learn about whales, turtles, climate, and more
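
For instance, the tree carbon project boils down to a calculation like this sketch. The allometric constant below is a rough generic placeholder (real projects use species-specific equations), and the trees are invented, but the carbon-to-CO2 conversion is standard: dry biomass is roughly half carbon, and you multiply carbon by 44/12 to get CO2.

```python
# Invented trees: (trunk diameter at breast height in cm, height in m).
trees = [(35, 12), (50, 18), (22, 9)]

total_co2_kg = 0.0
for dbh_cm, height_m in trees:
    # Very rough above-ground biomass estimate in kg. The 0.06 is a
    # generic placeholder; real studies use species-specific equations.
    biomass_kg = 0.06 * dbh_cm**2 * height_m
    carbon_kg = 0.5 * biomass_kg         # dry biomass is roughly half carbon
    total_co2_kg += carbon_kg * 44 / 12  # carbon -> CO2 by molar mass

print(f"Roughly {total_co2_kg:.0f} kg of CO2 embodied in these trees")
```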

When I was teaching Computer Science at John Monash Science School in Melbourne – a school for kids who are passionate about science, who you might be forgiven for assuming were already engaged with tech – we started by teaching them with toys. We had them draw pretty pictures, and program robots to push each other out of circles. And the number one piece of feedback we got was “This isn’t relevant to me. Why are you making me do this? I’m never going to use it.”

When we shifted to teaching the same coding skills – variables, if statements, for loops, etc – in the context of Data Science, using real datasets and authentic problems, that feedback disappeared. Instead we heard “This is so important. This is so useful. I’m using this in my other subjects.” And the number one thing I live to hear when teaching tech: “I didn’t know I could do this!”

So not only does teaching tech skills in the context of data science teach the kids that STEM skills empower them to solve problems and find out more about their own world, it gives them the motivation to succeed. To actually learn the skills and put them to good use.

And make no mistake, motivation is the single most important factor in learning.

So Data Science empowers students to learn technical and other STEM skills in the context of real problems. It gives them the capacity to create positive change in their own communities – and to prove that they have. It teaches them to communicate their results.

And most importantly, it teaches them that this is something they can all do.

And that point is crucial, because at the moment we have hordes of students – even at a high performing STEM school like John Monash – believing that tech is not something they can do. Not something that interests them. Not something that’s relevant to them.

Which means that we are continuing to get the same kinds of people choosing to go into tech who have been choosing it for decades now. We are actively perpetuating the stereotypes, because those stereotypes are now so strong that everyone believes that only those types of people should or can go into tech.

One of my friends who works in data science recently met someone who, on learning her occupation, literally said to her: “You work in tech. So, are you on the spectrum?”

Because if I ask you to picture a computer scientist, or a data scientist, chances are you will imagine a young white male who is on the spectrum.

Current figures suggest that women make up as little as 15% of the Data Science industry.

And it’s this lack of diversity in the tech industry that leads to systems like the HireVue AI – because there are not enough voices in the room prepared to say things like “Um, have we really thought this through?” or “What are the ethical issues with doing that?”

It also leads to tech solutions that work beautifully for the types of people represented on the development team, but that have serious limitations for everyone else.

And lest you think that women simply aren’t cut out for tech, and there isn’t actually any bias in the field, allow me to remind you of the 2016 study of open source code on GitHub, which found that code submitted by a woman was more likely to be accepted than code submitted by a man – but only if the woman’s gender was not identifiable from her GitHub ID.

ADSEI’s work isn’t going to turn every student into a data scientist. But it will give kids the option of being data scientists, who wouldn’t have had it otherwise. Because they will understand the power of data science, and they will know that it’s something they can do. And that is phenomenally empowering.

Measuring with Added Data Science – Primary School Lesson

You can add a little Data Science into any lesson, but Measurement in Primary School is just crying out for a little added Data Science. And when I say Added Data Science, I really mean added critical thinking and scepticism. Here is a Grade 6 lesson that I just trialled at Gillen Primary School in Alice Springs, where we took a basic measurement lesson on height and injected some cool data concepts. This lesson might be worth splitting over two lesson times, depending on how the discussion goes.

The goal here is to be asking questions and evaluating what you’re doing at every step.

  1. Pick two students that are very different heights, and have them stand at opposite corners of the room. Have the kids guess who is taller.
  2. Now pick two students that are very close in height, and do the same thing. Have the two students stand back to back and work out who is actually taller. Now ask the kids: which was easier to guess? Why?
  3. Class discussion: what does it mean to “estimate” a value? What’s the difference between an estimate and a guess? If an estimate is an educated guess, what factors did you use to “educate” your estimate of who was taller? (One student today said that the taller person came further up the board than the shorter person, which was a great way of using comparisons to inform your estimate!)
  4. Have your students make a list of the people in their class who are here today and rank them by height, without talking to each other or comparing answers.
  5. Class discussion: Did you all rank every person the same? Which positions were easiest to rank? Often the tallest and shortest students are really easy to rank, but sometimes there are a few students very close in height that make it difficult. The middle positions tend to be the hardest, and you can have some discussion about why this is.
  6. Ask the class who is the tallest student. Take one answer and then ask if there are any different answers, until you have the set. Then do the same for shortest. You can do some back-to-back measuring at this point to settle these questions.
  7. Ask the class why their answers might be different, and discuss how estimates are not exact.
  8. Now get the class to stand up and sort themselves into height order. You might want to get the tallest and shortest up first, and then gradually fill in the middle one or two students at a time, to avoid chaos.
  9. Class Discussion: How much easier was it to do in person than try to compare them in your head? What made it easier?
  10. Now for the measurement! Put the class into groups of 3-5. Each group picks one person to measure, and every other person in the group should measure that person and write down their height, without telling the other members of their group what height they got. 
  11. Groups compare their results and see how similar they were. Each group should record the range of their measurements – the difference between their highest and lowest values. So a group that recorded measurements of 143, 145, and 146 would record a range of 3, because the lowest value was 143 and the highest was 146 (146 − 143 = 3).
  12. Come back together as a class. Class Discussion: How accurate do you think your measurements were?
  13. Class Discussion: Did every student use the same measuring technique? What were some different ones people used?
  14. Class Discussion: How big was the biggest difference between measurements? What factors made the measurements hard? We heard things like:
    1. The person we were measuring was taller than us.
    2. The person was taller than the tape measure (at this point you can explore strategies for solving this problem! Eg measuring against the wall, marking where the tape measure stops, and putting the tape measure above that mark to measure the remaining length, or measuring them lying down on the floor).
    3. It was hard to hold the tape measure straight.
    4. It was hard to hold the tape measure still.
    5. It was hard to read off the exact value because of the distance between the tape measure and the actual top of the person’s head.
    6. The actual measuring part of the tape measure starts a few centimetres in from the start of the tape, so getting it exactly in the right spot on the floor is hard!
  15. As a class, brainstorm techniques for making the measurements more accurate.
  16. To wrap up the class, ask them again how accurate they thought their measurements were, and then ask them if they think they were accurate enough. Think of several scenarios where you might need to measure height, and ask how accurate each needs to be. The goal here is to consider that data is rarely completely accurate, but it can still be accurate enough. Eg.
    1. Measuring the length of bed someone needs. Because beds come in fixed sizes, you only need to know which range the person fits into.
    2. Measuring whether someone will fit through the doorway. As you are very unlikely to have primary school kids who won’t fit through your doorway, it’s reasonable to think they don’t need to be very accurate! “Are you less than <however tall your doorway is>?” can usually be estimated rather than measured! Consider whether they might know someone for whom this would not be sufficient – eg a professional basketballer.
    3. Measuring whether a cape would fit
    4. Pilots in some aircraft have to be under a certain height to fit in a cockpit
    5. Sailors in a submarine (because the ceilings are low)
    6. What others can you think of?

There are many more questions you can explore using this lesson, and many more types of inaccuracies you could consider. As always, these steps are a starting point, and some points to ponder. You can use a subset of the steps, or expand on them.

If you modify the lesson it would be wonderful if you could share it back by emailing it to contact@adsei.org so that other teachers can learn from your approach.