Questioning the Answers: The Role of Scepticism in Data Science (and Education!)

A large screen showing a picture of Inigo Montoya with the words "You keep using that word, I do not think it means what you think it means." with Linda McIver standing at a lectern near the screen looking serious.

This is an edited version of the keynote I gave at the Melbourne Data in Schools Conference last week.

It’s a privilege to be invited here to deliver a keynote at this year’s Melbourne Data in Schools. I’d like to acknowledge that this talk was written on the unceded lands of the Bunurong people, and is being delivered on those of the Wurundjeri people, both of the Kulin nation, and I recognise the First Nations people of this land as our first scientists, environmentalists, teachers, and storytellers, who we could learn a lot from, if we choose to listen.

I’ll tell my story as we go along, but to set the scene: I have a PhD in Computer Science Education, and was an academic and a secondary teacher before I founded the Australian Data Science Education Institute. I spent the first part of my career trying to make programming easier to learn, and figuring out how to engage reluctant learners in coding, until I got to the point where I realised it wasn’t the coding that matters. But we’ll get to that!

There’s a lot of talk about how Artificial Intelligence is breaking the education system, but the education system was already dying.

How do we know it’s dying? How do we measure its vital signs?

You tell me – how do we measure our education system?

How should we measure our education system?

At the moment we’re mostly measuring ATAR and student retention as the most important outcomes of the system. How many kids make it to year 12 and what do they score? We should be measuring: Covid deniers, flat earthers, anti-vaxxers, sovereign citizens, people with blind faith in technology.

We should be measuring whether people understand how diseases spread, how food poisoning happens, what food allergies are and how to avoid killing people who have them.

We should be measuring how much people understand about sexual and reproductive health, and what consent means. Whether they understand how sun exposure increases your risk of skin cancer, how sleep impacts wellbeing, or how wet roads affect stopping distance, and what kinds of things change our reaction times. How well people understand ecosystems, their importance, and our impact on them.

In other words, we should be measuring how well our kids are equipped to survive and thrive in the world we find ourselves in.

For that, we should be measuring problem solving and critical thinking. How easily we believe things that are not true. We believe too many things there is no evidence for. We are even prone to believing things that contradict all the evidence we have.

We believe all too easily that chatbots – statistical text extruders – are intelligent enough to correctly analyse data, to effectively and accurately summarise reports, to write meaningful text, to act as search engines and give us reliable results.

Recently I had an appointment with a new medical specialist. He used AI to summarise our appointment (without my consent, but that’s another issue), so I was interested to see the letter he sent back to my GP and me afterwards. He assured me that he would carefully check the details of the summary before sending the letter, so it was rather horrifying to find that there were at least 9 significant errors in that letter.

The AI summary system he used, specifically designed for medical appointments, had changed the number of pregnancies I had experienced, included drugs I was not on and conditions I did not have, and left off drugs I was on and conditions I did have, even though they were relevant to the issues I was seeing him for. This is despite his “careful checking” and the referral letter that contained many of the details that the summary got wrong. When I raised this problem at our next appointment, he assured me that it would have been much worse if he had not used AI. Which raises some serious questions about his competence and his quality of care!

Our faith in AI, our susceptibility to misinformation, our blindness to bias (our own, and that of others), our belief that we, personally, are not racist, sexist, or biased – these are all much better measures of the success, or rather failure, of our education system than how many students at your school get ATARs of over 90.

When my kids were in high school I was on the school council for a time, and we would occasionally have presentations from the heads of different faculties. I remember the Head of English coming in and saying “Our goal is to improve NAPLAN results.”

No. No it is not. Our goal is to improve literacy, communication skills, critical thinking… We hope that NAPLAN provides a measure of skills like literacy and numeracy, but it is not the goal. But the way we measure our schools becomes the thing they optimise for. So they optimise for NAPLAN, for ATAR, for attendance, when we need to optimise for empathy, for critical thinking, for literacy, numeracy, data literacy, and problem solving skills.

When I first started teaching at a secondary school, we were designing a new year 10 subject, and there was a bit of a tug of war around the content. I kept trying to pull us back to the big picture: what is the goal of this subject? But the head of faculty wasn’t interested in that. He didn’t see the point. So we never actually defined the purpose, and the tug of war continued. Which is a shame, because identifying the goal would have answered a lot of questions about the subject. Those big, purpose questions can help to steer us in the right direction.

So I’d like to ask you: What is the purpose of our education system? I want you to sit with that question for a moment, while I tell you my story, and then we’ll come back to it.

You’ve heard that I am the Founder and Executive Director of the Australian Data Science Education Institute, or ADSEI. I got here via a somewhat roundabout route. I have a PhD in Computer Science Education, and the goal of my PhD was to figure out why learning to program is hard, and how to fix it. With hubris almost reaching tech bro level, I started by designing a new programming language, and by the end of my PhD I knew one thing for sure: the programming language doesn’t really matter. Sure, some of them are hard to use, and some of them are worse, but what really matters, what makes all the difference, is motivation. If a student doesn’t care about learning programming, it doesn’t matter what language you give them, they’re not going to get very far.

After my PhD I was an academic for a while, doing research in the same field, and I got frustrated that the work the Computer Science Education community was doing was slow to translate into the classroom. I felt like I wasn’t having any real impact. And so I left academia, had a couple of kids, tried a range of different jobs, and eventually settled on high school teaching. Here, I could see my impact. I could trigger those magical “Ahah!” moments that take kids from thinking they couldn’t do this, to having a breakthrough and achieving a goal. It was a powerful drug!

While I was wrestling over the content of our year 10 subject, I had complete freedom to create something innovative in year 11, so we did what I thought were the most interesting parts of Computer Science. We had a unit on AI, where we started by trying to come up with a definition of intelligence (something the tech industry could usefully learn from). We did units on Data Science, Visualisation, Usability, and programming, among other things. This was in 2011 when it wasn’t common to teach this stuff at high school level.

Most importantly, we had a capstone project where students worked with scientists to meet their data needs. These were real projects, with real impact. My first class of year 11s had the choice of working on cancer research or Marine Biology. The scientists knew they were student projects and might not produce anything they could use in their research, but they also knew they had data and programming needs that they couldn’t fill by themselves, so they were happy to take a chance on some talented year 11s.

Two of the year 11s working on the cancer project produced a program that the researcher went on to use in his research, because it did a task in 20 minutes that had been taking him weeks. As they went into year 12 they were still working on the program, improving it, making it faster and more accurate. Their teachers told them to stop working on it, and to focus on what was really important: their ATAR.

Over the years, in addition to cancer research and Marine Biology, we did projects in Astrophysics, Psychology, Neuroscience, and Microbiology, among many others. I repeatedly had students who were completely disengaged in the year 10 subject, where they were doing “fun” stuff involving robots following maps, and making code to draw pictures, and other such pointless games, who would hit the year 11 subject and just burst into life. Suddenly they were working at all hours, giving themselves wholeheartedly to this opportunity to make a difference. It didn’t make me popular with their other teachers, mind you!

The year before I left that job, I had a student who wasn’t even enrolled in my year 11 subject come to me and beg to be allowed to work on the project.

It was clear that doing something real and meaningful was incredibly motivating, so I agitated for that in the year 10 subject as well. Teaching the same skills, but using real, meaningful datasets. Giving the students the chance to make real discoveries, and teaching them data literacy as well.

It took a long time to persuade the leadership team that this was a good idea – largely, I think, because most people who are not already data nerds are actually quite terrified of Data Science.

Eventually I persuaded them to test drive a short data science unit towards the end of the year. A low stakes trial. And it changed things radically.

The year 10 subject was always challenging, not least because the kids who had been programming since they were in the womb left the kids who were new to it massively intimidated, and convinced it was too hard for them. And doing “fun” things that these kids found anything but fun left them absolutely unmotivated. It wasn’t fun. They consistently said the subject was too hard, not interesting, and not relevant to their careers. They deeply resented having to do it.

But as soon as we started working with real data, relevance became obvious – it was built in, and they could see how they could apply it to any other data – so they now had a reason to stick with it, to push past the hard parts and get results. Now they were telling me that the subject was so useful, and so relevant to everything else they were doing. They were coming up to me outside class and telling me how they’d used what they learned in that class when they were doing their science projects, and in their maths exam. They’d accost me during yard duty, outraged, to tell me how they’d seen a graph on the news last night and it was SO MISLEADING, there was no zero on the scale!

Suddenly this irrelevant subject that they hated had a meaningful place, not just in their schoolwork, but also in their lives. Motivation went from being entirely absent to being built in.

It also helped that we didn’t start with code. We started with data literacy. There’s no point, after all, in getting them to make graphs before they understand what graphs are for.

Starting with data literacy had the bonus side effect of having everyone on a level playing field. Because, as it turns out, data literacy is rarely taught, and poorly understood. By the time these kids had got to year 10 they knew the Maths curriculum’s approach to graphs, which was technical and didn’t cover communication at all. So they knew that you use a bar chart for discrete data and a line graph for continuous data. Which, from a communications point of view, is actually not correct. Sometimes a line graph shows the trend of discrete data better than a bar chart.

Graphs are actually about communication. There’s no point in a graph being technically correct if your audience doesn’t understand what it’s supposed to say. In 2020 I saw a graph of covid case numbers published by the ABC, and it used a log scale. Now, a log scale was technically correct in this instance, but hands up if you intuitively read a log scale? You’ll notice my hand isn’t up. We see a line on a graph, we interpret it as though the scale is linear, because that’s what we’re used to. It’s how we understand line graphs. So technically correct is important, but it’s not the only thing you should be thinking about when you’re making graphs.
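To make the log scale point concrete, here is a minimal Python sketch. It is not from the talk: the function name `axis_position` and the case numbers are illustrative assumptions. It works out how far up the axis a value is drawn on a linear scale versus a log scale.

```python
import math

def axis_position(value, low, high, scale="linear"):
    """Return how far up the axis a value is drawn (0.0 = bottom, 1.0 = top)."""
    if scale == "log":
        # A log axis spaces values by their logarithms, so equal distances
        # mean equal multiplication factors, not equal differences.
        span = math.log10(high) - math.log10(low)
        return (math.log10(value) - math.log10(low)) / span
    return (value - low) / (high - low)

# Hypothetical axis running from 10 to 100,000 cases.
for cases in (10, 1_000, 50_000, 100_000):
    linear = axis_position(cases, 10, 100_000)
    log = axis_position(cases, 10, 100_000, scale="log")
    print(f"{cases:>7} cases: linear {linear:.2f}, log {log:.2f}")
```

On this made-up axis, 1,000 cases sits halfway up the log scale but barely above the bottom of the linear one. A reader who assumes the scale is linear will badly misjudge how fast the numbers are growing.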

While I was developing the Data Science unit, one of the other teachers for the subject, a Maths teacher, kept bugging me about why I was teaching graphs. “We already teach graphs in Maths, there’s no point. It’s a waste of time.”

But then she saw me teach the content and suddenly she caught fire. “OMG, this is so useful! I’m going to use this when I teach Maths!”

So. That first trial unit we used an election dataset that I found on the website of the Australian Electoral Commission. It was over three million lines of text, in CSV format, which was too big to even open in Excel. Gosh darn it, they were going to have to learn to code in order to extract the data. They each had to come up with their own question to ask of the data, so there were rich conversations about what questions the data could actually answer.
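For a sense of how little code this kind of extraction needs, here is a minimal Python sketch. It is an illustration only, not the code the students wrote: the column names (`PollingPlace`, `Party`), the sample rows, and the function name `count_by_column` are all made up. It reads the file one row at a time, so the size that defeats a spreadsheet is not a problem.

```python
import csv
from collections import Counter

def count_by_column(lines, column, where=None):
    """Tally the values in one column, reading one row at a time.

    `lines` is any iterable of text lines (an open file works), so the
    whole file never has to fit in memory or in a spreadsheet.
    `where` is an optional dict of {column_name: required_value} filters.
    """
    where = where or {}
    tally = Counter()
    for row in csv.DictReader(lines):
        if all(row.get(col) == wanted for col, wanted in where.items()):
            tally[row[column]] += 1
    return tally

# Tiny made-up sample standing in for the real multi-million-line file.
sample = [
    "PollingPlace,Party\n",
    "Town Hall,Party A\n",
    "Town Hall,Party B\n",
    "Primary School,Party A\n",
    "Town Hall,Party A\n",
]

print(count_by_column(sample, "Party", where={"PollingPlace": "Town Hall"}))

# With a real file (hypothetical name) the call looks the same:
# with open("votes.csv", newline="") as f:
#     print(count_by_column(f, "Party"))
```

The hard part is not the loop. It is deciding which column and which filter actually answer the question being asked.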

They all wanted to ask “who’s the best party?” which was great, because then I had them try to define “best”, until we all gave up and agreed that the questions would have to be more specific. They asked: How did the people at their closest polling booth vote? How did the below-the-line votes fall when gender was taken into account? Where were Pauline Hanson voters mostly concentrated? Which party’s voters were more likely to follow the party’s how-to-vote cards? And many, many more.

Of course, as year 10s, they hadn’t actually voted themselves yet, so we had to go through the rules for Senate voting. Once we had, the innocent darlings assumed that voters followed those rules, which led to broken code and wonderful conversations about what you can (and can’t) assume about people, and rules, and data (not much, as it turns out, aside from “it’ll be messed up”).
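Here is a minimal Python sketch of what “don’t assume people followed the rules” can look like in code. It is an illustration under a loudly simplified assumption, not the real Senate formality rules and not the students’ code: it treats a ballot as well formed if the marked boxes are numbered 1, 2, 3 and so on with no gaps or repeats. The function name `check_preferences` is hypothetical.

```python
def check_preferences(cells):
    """Return a list of problems found in one ballot's preference cells.

    Simplifying assumption for this sketch: a well-formed ballot numbers
    its marked boxes 1, 2, 3, ... with no gaps or repeats, and a blank
    cell means the box was left empty.
    """
    problems = []
    numbers = []
    for cell in cells:
        text = cell.strip()
        if not text:
            continue  # box left empty
        try:
            numbers.append(int(text))
        except ValueError:
            # Ticks, crosses, and stray text all end up here.
            problems.append(f"not a number: {text!r}")
    if not numbers:
        problems.append("no numbered preferences")
    elif sorted(numbers) != list(range(1, len(numbers) + 1)):
        problems.append(f"preferences {sorted(numbers)} are not 1..{len(numbers)}")
    return problems

print(check_preferences(["1", "2", "3"]))   # a tidy ballot
print(check_preferences(["1", "1", "2"]))   # a repeated number
print(check_preferences(["1", "X", ""]))    # a cross instead of a number
```

Code that converts every cell straight to a number will crash on the third ballot. Collecting the problems instead lets you count how messy the data really is before deciding what to do about it.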

Most students only needed a tiny bit of code to extract the data they needed, but there was a fair bit of thinking required to figure out exactly what data would answer their question, and how to get it. Those who were new to programming had the opportunity to achieve success (teaching kids to believe that programming is something they are capable of doing is MUCH much harder than actually teaching them to program, and it’s a necessary precondition).

Those who had been programming forever could take the challenge and run with it, creating sophisticated machine learning programs to crunch the data in a myriad of ways. One student created a whole system to calculate the Senate positions resulting from that CSV file, which, if you know anything about Senate voting rules, is EXTREMELY impressive, AND he went on to use it in subsequent elections.

Differentiation was built in. They differentiated themselves on interest (finding their own question) and skill. Most of the visualisations were done by hand, because we emphasised the communication aspect, and making the visualisations valid, interesting, and compelling, which even now, ten years later, is something that I have not found really good software for.

We talked about qualitative versus quantitative, and the way data can tell you what, but can’t tell you why. We talked about data quality, about communication, and about interpreting results.

After that unit, the number of girls choosing the elective year 11 Computer Science course doubled, and the number of students overall increased by about a third. It was changing the way they viewed data, and programming, and their future.

Having developed this new course that made sense to the students, was relevant, had built in motivation and differentiation, and achieved a whole new level of engagement, both in the year 10 core subject and the year 11 elective one, I quit.

I quit because I wanted all students to have these kinds of experiences, not just the kids in my classes. But the only reason I could build these projects was that I was half time, so I was able to use my own time to research and design the projects. Of course, it also helped that I have a PhD in Computer Science – not exactly a typical skillset among teachers.

My first thought was to look around and find an organisation that was doing this kind of work and join them. But the more I looked, the more I realised there was a significant gap in both skills and opportunities in data science education. I started the Australian Data Science Education Institute to try to close that gap. To build resources, train teachers, advocate for change, and, ultimately, put myself out of business. The goal was not to build an empire, the goal was to effect change. I created ADSEI as a charity, to ensure that funding was never a barrier to access.

And initially the goal was to build data literacy and STEM skills, but over time it has become clear to me that it’s not so much about particular skills. The really crucial gap in education is critical thinking. Because we say we teach it, but we teach it the same way we teach data science. With toy examples, and problems you can look up in the back of the textbook.

Data seems straightforward, but it typically needs a lot of interpretation. Unfortunately, the extremely sanitised datasets, the toy examples we use, typically only have one possible interpretation. Just look at any data science teaching examples online. They are simple, neat numbers that make simple, neat graphs and provide simple, neat answers. So there’s a correct answer they can look up in the back of the textbook. You don’t need data literacy to teach with these toy datasets. There’s only one way to graph it, only one way to interpret it, no critical thinking required. No thinking at all, really, just rote application of standard techniques. But real datasets have multiple interpretations.

My body is a festival of chaos, and I have a close relationship with my physiotherapist, so I’m going to use a physiotherapy metaphor here. Teaching Data Science with toy datasets is like teaching people how to engage their core muscles by lifting a small teddy bear. It’s nonsensical. They get out into the real world having never lifted anything more substantial than a teddy bear, and the first time they lift anything heavy they pop something. Now that core strength matters, they haven’t got it. They don’t even know what it is, how to get it, or how to use it effectively once they have it. Teaching Data Science with toy datasets is just like that.

When my kids were little, they’d often come home with tales of woe about someone doing something upsetting… and then they’d say something like “she obviously hates me” or “my teacher obviously thinks I’m dumb”. I used to ask them to come up with 3 other possible reasons for the behaviour they saw, on the basis that, although we can see someone’s behaviour, we can’t see inside their heads, and any “there’s obviously only one reason why they would have done that” story was a lie told to us by our insecurities. We can see what people do, but we can’t see why. Maybe he was having a bad day. Maybe she was in pain. Maybe they weren’t looking at me, but at someone behind me. Maybe they were frustrated with themselves, not with me.

It’s not often taught in Data Science courses, but we need to do the same with data. Because data, particularly quantitative data, can tell us what, but it can’t tell us why. When we come up with an explanation for what we see in the data, it’s important to remember that it’s just a guess, and there could well be more guesses that fit the patterns that we see. It could be a highly educated guess, but we can’t be sure it’s the right one. Or even a right one. So if we want to get the best out of our data, and understand it really well, we need to learn to question our own work. To be sceptical of what we think we understand. To test it, challenge it, and try to disprove it as much as possible. Which leads to questions like “If that were true, what else would we see in the data?” and “What other data could we collect, that might prove this idea wrong?” Not to mention the most important question of all: what’s wrong with this data?

We often hear Science described as a way to prove things, but Science, at its best, is not about proof of our rightness. It’s about trying our hardest to prove our wrongness. It’s about testing things carefully, rigorously, and thoroughly, to try to find flaws in our understanding. To try to find something that does not fit what we think we know.

Science, in fact, never proves things. It only disproves things. And the things we think we know are all up for challenge, if – but only if – new information comes along that doesn’t fit.

What we’re talking about here is rational, evidence-based scepticism. Critical Thinking.

And the great news is that when you teach students using authentic problems in their own communities, using real datasets, critical thinking is built in. Because there’s no such thing as a perfect answer to a real problem. Solutions to real problems help some people, and harm others. They improve some metrics, but worsen others. They are messy and complicated.

Even measuring real problems in the first place is messy and complicated – are the measurements I took today representative of every day? If you’re measuring traffic on a particular road, is this a “normal” day? Is it a weekday or a weekend? Is it a rostered day off on nearby building sites? Is the local high school offsite for its swimming sports today? Is the nearby university closed for semester break? Is it raining, or is there a public transport strike? Does the local train line currently have buses replacing trains, so more people are choosing to drive? What is a “normal” day anyway?

When you implement your solution and then re-measure, in what ways are the measurements likely to be different to the originals, aside from your solution? For example, if you’re collecting and counting litter in the yard, was one of the counts taken just after a windy day when some of the litter might have blown away? Or were the grade 5s out on camp that week? Was there a rainy day timetable the day before so no one was in the yard? What other differences could there be? More answers you can’t look up in the back of the textbook!

So not only are you trying to critically evaluate how your solution works, and where it falls down, you’re also having to critically evaluate the measurement process. And the analysis. And the communication. There’s no part of this that’s rote application of textbook formulae.

This idea of critically evaluating your own work is incredibly powerful for a number of reasons, but here are two you might not have thought of:

Number 1: Perfectionists and neurodivergent folks with Rejection Sensitivity get used to the idea that perfect is not the goal, and that finding flaws in your own work is a good thing.

If you haven’t met Rejection Sensitivity before, it’s a wonderful feature of ADHD that, among other things, makes it incredibly traumatic to receive negative feedback. None of us likes negative feedback, but folks with rejection sensitivity are constantly looking for proof that we absolutely suck, and negative feedback is the worst possible evidence of how awful we are. So making finding flaws in our work a good thing is incredibly powerful, because rather than get defensive and dig our heels in, now we want to make it even better!

Imagine your own workplace, if finding a problem with someone’s work was a success moment, not a failure moment. Because we assume flaws exist, so finding flaws means solving problems. It’s making progress. It’s making things better. How would your workplace change if that was the default?

I’ve seen techniques like saying FAIL is First Attempt In Learning – trying to reframe mistakes as learning opportunities, which is great. But as long as you continue to mark mistakes down, you’re giving the lie to that reframing. You’re making finding mistakes costly. When you actually reward the finding of mistakes in their solutions, then you change the game.

And number 2: We produce generations of adults who are used to critically evaluating their own work, and the outcomes of programmes, and who bring that to the workplace, and to government.

Let’s be real, hands up if you’ve ever worked somewhere where implementing the latest, greatest education or management fad was high on the agenda. Now keep your hands up if evaluating the results of implementing that fad was on the agenda at all.

On my podcast, Make Me Data Literate, a while ago, I interviewed Jarrod Hughes, Impact & Evaluation specialist at the Aurora Foundation, and he talked about how the government’s measures of success in education don’t necessarily match the Indigenous Communities’ priorities for education. I’d like you to listen to this bit:

“Some of the really strong themes that have been coming in this work are a real emphasis on social and emotional wellbeing above kind of just narrowing sets of academic outcomes. So things like social connections, connections to community, emotional wellbeing, those kinds of things. And then we’re also thinking much more about what it might look like to measure cultural outcomes or cultural related outcomes. So things like students’ opportunities to learn about their culture, their sense of pride and strengthening their culture and other kinds of outcomes along those lines.”

Hands up if you feel like those outcomes are important to all communities, as well as Indigenous ones?

So I asked you a question earlier – what is the purpose of our education system?

I suggest that it’s to support our young people to

  • Grow and flourish, physically and mentally
  • Think Critically
  • Solve Problems
  • Build a compassionate, inclusive, and equitable society


The Mparntwe Education Declaration from 2019 has as one of its key goals that:
All young Australians become:

  • confident and creative individuals
  • successful lifelong learners
  • active and informed members of the community.

I like that. I just don’t think we’re achieving it.

In fact, I think we’re sometimes doing the opposite. We’re so busy teaching all the facts we think are important that we don’t have time to teach the thinking. And Science actually isn’t about facts. It’s ALL about thinking. Remember at the start of the Covid19 pandemic when we were all very zealous about washing our hands, and very conscious of things we touched that might have virus on them? Hand sanitiser was EVERYWHERE. And remember how it turned out that Covid was actually airborne, and that the riskiest thing to do was actually to breathe the air in the same room as someone with the virus – or even in a room that someone with the virus had been in up to two hours earlier? Remember how this was taken by some folks as proof that Science was untrustworthy, because the Science had changed its tune?

We know, of course, that Science changes its tune all the time, and that this is, in fact, proof of Science doing what it should – changing our understanding based on new evidence. Unfortunately, this is not the way we teach it. All too frequently, we teach confirmation bias and adherence to orthodoxy.

This actually cost lives during the pandemic.

Let’s talk about Linsey Marr, an Aerosol Scientist from Virginia Tech who also studies infectious diseases. In 2011, Marr tried to publish her findings from a study she conducted that sampled the air in various public spaces. Those samples found the flu virus where doctors said it couldn’t be – suspended in the air. At that time the accepted orthodoxy was that flu was transmitted by droplets, which do not remain airborne. In other words, to catch the flu you need to pick up an infected droplet from contaminated surfaces, or be in the direct line of someone’s cough or sneeze. Marr proved that there was actually enough flu virus suspended in the air to give people the flu. The paper was rejected as wildly implausible, because the accepted orthodoxy was that only particles of 5 microns or smaller could remain airborne.

Marr persisted, and when the WHO insisted that Covid19 was not transmitted by aerosol particles, she and many others beat their heads against the brick wall of this entrenched 5 micron theory. Eventually it was proven that Covid19 is actually primarily transmitted by airborne particles, and Marr’s work was vindicated – but not before hundreds of thousands of people died unnecessarily, due to the WHO’s mistake about how the virus was transmitted.

The kicker to this story is that the rigid 5 micron idea came from a study of Tuberculosis in the 1930s, which showed that only particles smaller than 5 microns in the atmosphere could actually lead to Tuberculosis infections. The study was accurate, but misinterpreted, because Tuberculosis needs to lodge deep in the lungs to cause infection, and only tiny particles (smaller than 5 microns) can make it that deep into the lungs. But most viruses don’t need to go anywhere near that deep in order to cause infection, which means that much larger particles can be dangerous.

So the original study didn’t even show that only particles smaller than 5 microns could remain in the air. It showed that only particles smaller than 5 microns could transmit Tuberculosis. But that “fact” was rigidly applied in medical Science for nearly a hundred years, until Marr and her fellow heretics successfully challenged it. And though they’ve won the battle with respect to Covid, I bet many other diseases are still being treated according to this “fact”.

Science doesn’t fail when it “changes its tune”. Science fails when we believe in “facts” instead of theories, even when those “facts” don’t fit the evidence.

We teach facts a lot more than we teach examining the evidence. We might say that Science is about examining the evidence, but kids are very astute in picking up the disparity between what we say and what we do, and unfortunately what we do is we assess the kids based on facts and known processes. We teach them that Science is done by doing experiments we know the outcomes of, and by looking up the answer in the back of the textbook. We teach them that the important thing is to get the right answer on the exam.

My friend Kathryn is a fantastic Physics teacher, and during the Covid lockdowns she went in to school to film herself doing a thermodynamics experiment, meticulously recording all the data as she went, so that even though the kids couldn’t do the experiment themselves, they could at least see her doing it, and still have real data to work with. It was a simple experiment, where she had two samples, one cold, one hot, and when placed in close proximity, their temperature gradually equalised. But something interesting happened when the samples got close in temperature – they actually crossed over for a moment. The cold thing went PAST the hot thing, instead of coming to equilibrium and stopping.

Kathryn was overjoyed, because this was a weird result, and there was so much to talk about. Such a rich conversation to be had about why this unexpected thing had happened. Was it an issue with the calibration of the temperature sensors? Was there something in the environment that the experiment didn’t account for? What a great opportunity for learning!

Unfortunately, Kathryn’s fellow teachers were appalled, and demanded that the anomalous results be deleted from the data, lest they confuse the students and lead to them putting the wrong answers on the end of year exam.

And, given that this was a VCE subject, from the point of view of how we measure our kids, and our education, those teachers were right.

And how depressing is that?

Science is actually about questioning, and understanding. It’s not about knowledge. But that’s not how we teach it.

Years ago I got into an argument with Ted, the leader of the faculty I was working in at the time. (Names have been changed to protect the infuriating.) I argued that the course we were teaching was fundamentally flawed. The students we were teaching were not learning what we wanted them to learn, and they couldn’t see the point of the subject. Ted argued that the course was great, the kids loved it, and we didn’t need to change a thing. I wanted to run an anonymous survey to find out for sure, which he finally agreed to – with one small flaw. He didn’t make it anonymous. Out of 200 students he got 20 replies, and, what do you know? They said “the course is great, it doesn’t need changing”.

Ted was outraged when I suggested that collecting the kids’ identities made it unsafe for them to say what they truly believed, directly to the teacher responsible for the course. We argued a lot, and in the end we asked a neutral party – a teacher from another faculty – to run a focus group. Now the kids were reporting on the subject to someone who was uninvolved in it, and uninvested in the outcomes. And what they said was horrifying to Ted, but no surprise to me. They hated it. They couldn’t see the point. Some of the kids in the focus group had also answered the survey, but their responses were very different.

That survey did not answer our questions. What we wanted to know was “what do kids really think about this course?” and what we wound up asking was “what will kids say about this course when asked by the guy who designed it, and when he knows who they are?”

Had we based the continuing shape of the course on that survey feedback, we would have been basing our ideas on data that didn’t say what we thought it did. This is one of the big problems with all data, not just assessment data. No dataset is perfect. No collection technique is foolproof. It’s very easy to ask leading questions on a survey that get you the result you are hoping for (such as “how awesome was the course?” rather than “how did you feel about the course?”), to survey a subset of people who don’t represent the entire population, or to assume the data is complete when there are significant parts missing. So we need to be super cautious when we use data to justify our actions, and to shape changes in our systems. Does the data say what we think it does? How can we be rationally sceptical and test our assumptions?

Our education system is heavily measured. Between standardised testing such as NAPLAN and PISA, external final year exams, and all of the performance measures imposed on teachers, we are measuring outcomes constantly. Unfortunately, we don’t seem to pay a lot of attention to the question of whether those outcomes are the ones we really want to aim for.

We mostly seem to be looking for assessment reliability – will we get the same result if we do the same test again? – rather than validity – are we measuring what we think we’re measuring? – or, indeed, asking the question: what should we be measuring? And that, in a nutshell, is the issue with education.

We shape our education systems to maximise outcomes. Unfortunately, the outcomes we are trying to maximise are PISA scores and exam results. In an ideal world, these would be measures of learning, but they don’t always measure what we think they do. The other issue, of course, is that we also shape our education systems to minimise cost.

Any dataset we work with has flaws. Usually, they are not exactly the information we want, rather they are as close as we can easily/cheaply/quickly get to that data, or simply the data that we have access to. For example, population datasets are nearly always data from a sample of the population, rather than from everyone, which means some people will not be represented by that data.

Some of the data we want is just not easy to measure. We want to measure kids’ learning, so we have them sit exams. They are great for measuring recall of facts, and application of known procedures. They are rarely used to measure problem solving, creativity, or ethics – the attributes we consistently say we care about. Plus they don’t necessarily even measure recall very well, depending on how well the exam was written, what the conditions were on the day, how good students are at doing exams, etc. We tend to use exam results as a proxy measure for learning, especially when we use those results, for example, to decide who gets into a particular university course.

Unfortunately in education we sometimes forget that what we are measuring is not actually what we want to know. We tend to shape our education to be measurable, rather than to be meaningful.

All datasets have issues like these. The challenge is to identify the issues, and take them into account when we’re using the data to shape our future.

The end product of the Australian school system for many kids is the ATAR, and which university course they can get into with it. A particularly disturbing aspect of this focus on a final ranking is that kids often choose – and are encouraged to choose – subjects in which they are more likely to do well, rather than subjects that they are actually interested in.

The other obvious problem with this system is that what we are doing is training kids to be very good at exams. We then use how good they are at exams to select the courses they will do, and those courses mostly use exams to determine how well they do in those courses. So we are selecting kids to train to be engineers, doctors, architects, lawyers, teachers, scientists, etc, on the basis of how well they do in exams. And then we rank them as engineers, doctors, architects, lawyers, teachers, scientists, etc, on the basis of how well they do in exams. And then we send them out to be engineers, doctors, architects, lawyers, teachers, scientists, etc, where they will be required to do a huge range of things that bear no resemblance to sitting exams at all.

If we truly want creative, ethical, rational, critically thinking problem solvers, then it makes sense to ask if our school system is actually producing kids with those characteristics. It’s not clear that we’re even turning out kids who value these kinds of characteristics. The system currently runs on marking criteria and constrained outcomes that punish the kind of kids who see a problem with the assignment definition and create a whole system to solve that problem. The kids who misinterpret exam questions because they think laterally. The kids who solve problems differently and come up with creative solutions that don’t fit the rubric. The kids who write more, like my year 11 student, who struggled so much with the word limit on an assignment that he made a separate web page to hold the in-depth data that explored the topic in so much more detail. We teach kids not to do that. We teach them to meet the criteria and stop. Do the minimum.

Some systems even punish kids for exceeding the word limits. My son is studying Literature at university and he is literally marked on being within 10% of the word limit. As though the length of an essay is any measure of quality. “Oh, but he’ll need to learn to meet word limits in the real world.” Will he, though? Are government reports ever judged for being too short?

We wrap our assessments with these arbitrary metrics not because they are truly important, but because they are measurable. And the idea that we can take a student and judge them by a single number – be it ATAR, or NAPLAN level, or whatever – is nonsensical.

Of course, judging students by their marks is foolish. It is an attempt at objectivity. If we make decisions based on a number, then surely we can’t be accused of bias, or prejudice. It is a way of wriggling out of the ethical and emotional complexity of a decision. How do we choose which kids are given the chance to become doctors? How do we choose which kids could become engineers? How do we select teachers, nurses, or data scientists? We remove ourselves from any possibility of making it personal – no-one can say “She just didn’t like me”, or “He didn’t like the colour of my skin”, or “He didn’t let me in because I’m a girl.” It’s all down to this simple number. Objectivity guaranteed.

But if there is a correlation between socioeconomic status and exam results… if girls are driven out of particular subjects by the perception that they are not suited to them… if rural kids don’t have access to the same range of subjects… if some schools don’t have great teachers or support structures… then what we have is the pretence of objectivity and fairness, rather than actual objectivity and fairness.

It’s time we rebuilt the education system into something that gives every child the safety, support, encouragement, and motivation to reach their full potential. Something that lays the foundation of a world that is evidence based, rational, and compassionate.

You might all be wondering why I haven’t spent this talk ranting about AI – you will find many rants on the ADSEI website. I spend an irritating amount of time dissecting the hype and wild claims around AI. On Monday I recorded an episode of my podcast about the environmental impacts of the AI-based data center craze, and whether the value of chatbots is worth the price we will all pay. But actually, I think the problem of AI is a symptom, not a cause. It has grown out of the deeper problems of our society, and they are largely rooted in a catastrophic failure of critical thinking.

If we were all thinking critically, being rationally sceptical, and evaluating our systems systematically, we’d have rebuilt the education system already. We’d have tackled climate change decades ago. Income inequality wouldn’t have had the opportunity to develop. And we’d be focusing on building the types of AI that solve real problems, rather than the types that encourage psychosis and make billionaires rich.

It all comes back to education. To return to the goals of education I proposed earlier, we need to build a new, healthy education system that allows all of our children to:

  • Grow and flourish, physically and mentally
  • Think Critically
  • Solve Problems
  • Build a compassionate, inclusive, and equitable society

Will you help me do it?
