The Power of Data Science

When it comes to learning, there’s nothing as important as motivation. I want to share with you some of my students’ comments on the impact of authentic projects using real data.

If you’d asked me what I thought I was capable of 6 months ago, the answer would be very different to now. From just one single project that really pushed the limits of what I could achieve, I have done so much more than what I thought I’d ever do. When provided with potential gigabytes of raw data, and the task to interpret it all, one’s abilities are truly pushed, and a level of understanding of how to algorithmically dissect data and create intelligent tools to make discoveries presents itself as an opportunity to learn and grow.

— Sean

What blows my mind, and I think most others, is that in the presence of a huge magnitude of information – that could not possibly be conceived by the human brain alone – a high school student with a few lines of Python can interpret such immense data and create something innovative and frankly, wondrous.

— Reena

I found this course was very interesting as we were able to analyse the data from real data sets. This is in stark contrast to what a friend of mine is doing at his school, which is textbook coding. I found that being able to apply things we learned during class to real data immediately helped me understand how the code worked.

— anonymous survey feedback

Authentic projects, with real, measurable impact, are accessible from primary school onwards. We tell kids to be the change they want to see, but then add “But not yet. You’re not old enough to effect change yet.”

But kids can measure, analyse, and change their environment, and their community, easily. How about collecting rubbish from the school playground and examining where the worst of the rubbish comes from? Is it clingwrap from school lunches, or the wrapper off a particular item from the canteen? How could we develop strategies to change that? Nude food days, or changing the food available at the canteen? Give kids the challenge to impact their own environment and they invest in the outcome – as well as in the data science techniques they need to get there: collecting the data (picking up and sorting the rubbish, recording the type of each piece), analysing the data (working out what the most common items are), visualising the data (presenting a graph, or a more sophisticated visualisation), and then solving the problem.
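To give a feel for how little code the rubbish project needs, here is a minimal Python sketch of the collect, analyse, and visualise steps. It is only an illustration: the rubbish categories and counts are invented, and the function names `summarise` and `text_bar_chart` are my own, not part of any library.

```python
from collections import Counter

# Hypothetical tally sheet: one entry per piece of rubbish picked up.
# The categories and counts are invented for illustration only.
collected = [
    "clingwrap", "canteen wrapper", "clingwrap", "juice box",
    "canteen wrapper", "clingwrap", "paper", "juice box", "clingwrap",
]

def summarise(items):
    """Analyse: return (item, count) pairs, most common first."""
    return Counter(items).most_common()

def text_bar_chart(summary):
    """Visualise: build a simple text bar chart, one line per rubbish type."""
    width = max(len(name) for name, _ in summary)
    return "\n".join(
        f"{name:<{width}} {'#' * count} {count}" for name, count in summary
    )

summary = summarise(collected)
print(text_bar_chart(summary))
```

A class could swap the text chart for a spreadsheet graph or a plotting library; the point is that the counting step is only a line or two.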

Biology and Enviroscience? What about doing a bird or bug survey and looking at their preferred habitats and food sources?

Social Studies & Politics? Let’s look at voting patterns in different areas – you can actually download vote data from the Australian Electoral Commission website, or you can use existing analyses from organisations like the ABC. (Or this one, which is really cool and interactive.)

You could look at census data, climate data, or live parking data for the Melbourne CBD.

You want to explore it in class? There’s a dataset to make it possible. ADSEI’s mission is to support the teaching and learning of Data Science in schools, so if you want help making this happen in your school, contact us today.


Authentic Projects, Authentic Motivation

This Monday, eight Year 12 students from John Monash Science School will present their Computational Science projects from last year as a poster at the Lorne Genome Conference.

These students worked with Dr Sonika Tyagi, the Monash Bioinformatics Platform Manager, to develop software that identifies microRNA sequences and works out their likely structure.

They worked in two teams, so that one group worked on the identification, and the other on the structure. They took experimentally verified results and used them to train their software, using machine learning techniques.

Their project was submitted in October, for credit in their Year 11 Computing class, but both groups are continuing to work on the software to improve it. They are keen to make the software faster and more accurate. And the key reason they are still working on it is that it’s not a toy project purely for classroom credit. It’s a real project that has the potential to have an impact in science.

Most of the students in these groups had no background in Biology, but they were keen to learn more about machine learning, so out of the range of projects on offer, they chose this one as an opportunity to learn new skills and produce something useful. In fact one of the students in the project wasn’t even studying Computing. He was just really excited by the opportunity to work on a real and challenging project.

When I was teaching I ran projects like this every year. Not every student produces something that goes on to be used, but every student has the chance to work on real projects, with real data, and real outcomes.

Every year I had students who kept working on their projects long after the subject finished. Every year the students consistently rated the projects as both the most challenging and the best parts of the course. It was HARD. And they loved it.

This is the beauty of Computing, and of Data Science. Kids can do something real, and have an impact, while they’re still at school. Even primary schools can run Data Science projects that have an impact on their local community. I’ll write more about that soon.

Imagine what a difference we could make in the world if all students had opportunities to do real projects and learn Data Science and Computational Skills from the start.

Mixing Data and Politics

One of my early Data Science projects in class came about because of a student’s fascination with politics. I’m obsessed with politics myself, so this student, Jack, and I had lots of conversations about the federal election as it happened. During one of these conversations, Jack told me that he had been playing with the data on the Australian Electoral Commission website.

When I went and had a look, I discovered that it was possible to download a spreadsheet file that contained every single vote in the federal election. So I downloaded the Victorian Senate votes. Each line in the file contained the electorate, polling booth, and box-by-box information about how each voter had numbered their ballot paper. Obviously the votes are anonymous, but there’s a wealth of information in that file that had my head spinning.

You could then analyse the data to find out not only the simple stuff like who got the most first preferences, but more complex questions like where did people who voted 1 for The Australian Greens put their 2? Or how many people voted below the line? Was there a difference in the average first preference of people who voted below the line or above it?
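To make those questions concrete, here is a rough Python sketch of how they might be answered. Everything about the data is an assumption: the `ballots` list is a tiny invented sample in a simplified layout of my own, not the real AEC file format, and "Party A" and "Party B" are placeholders. The helper `choice` is hypothetical too.

```python
from collections import Counter

# Hypothetical, simplified ballots: NOT the real AEC file layout.
# Each ballot records whether it was marked below the line and the
# preference number written against each party (invented sample data).
ballots = [
    {"below_line": False, "prefs": {"Greens": 1, "Party A": 2, "Party B": 3}},
    {"below_line": False, "prefs": {"Party A": 1, "Greens": 2, "Party B": 3}},
    {"below_line": True,  "prefs": {"Greens": 1, "Party B": 2, "Party A": 3}},
    {"below_line": False, "prefs": {"Party B": 1, "Party A": 2, "Greens": 3}},
    {"below_line": True,  "prefs": {"Greens": 1, "Party A": 2, "Party B": 3}},
]

def choice(ballot, number):
    """Return the party given this preference number, or None if unused."""
    for party, pref in ballot["prefs"].items():
        if pref == number:
            return party
    return None

# Who got the most first preferences?
first_prefs = Counter(choice(b, 1) for b in ballots)

# Where did people who voted 1 for the Greens put their 2?
second_of_greens = Counter(
    choice(b, 2) for b in ballots if choice(b, 1) == "Greens"
)

# How many people voted below the line?
below_line_count = sum(b["below_line"] for b in ballots)

print(first_prefs.most_common())
print(second_of_greens.most_common())
print(below_line_count)
```

With a real download, most of the work goes into getting the file into a shape like this; once it is there, each question is only a few lines.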

(If you are not familiar with the Australian voting system and these terms aren’t making sense to you, you can read some of the details here. It’s bizarrely complex.)

It was a tricky one because we had to first understand the format of the data. The boxes on the ballot paper are not numbered, so it took us some time to match each box with the correct field in the spreadsheet.

It was great, too, because being a real dataset, it didn’t always follow the rules. In the Australian Senate you are only allowed to use the number 1 once, and you either vote below the line OR above it. Not both. Some voters, though, had not followed the rules, so an analysis that assumed valid votes was doomed to failure. It was a great lesson in the complexity of real datasets.
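A check for the two rules mentioned above might look something like this in Python. It is a sketch under assumptions: the function `check_ballot` and its list-of-boxes input are my own simplification, and it covers only those two rules, not the full set of formality rules the AEC applies.

```python
def check_ballot(above, below):
    """Return a list of problems found with one ballot.

    `above` and `below` are lists of the numbers written in the boxes
    above and below the line; None means the box was left blank.
    This is a hypothetical layout, not the real AEC file format.
    """
    problems = []
    marked_above = [n for n in above if n is not None]
    marked_below = [n for n in below if n is not None]

    # Rule: vote above the line OR below it, not both.
    if marked_above and marked_below:
        problems.append("marked both above and below the line")

    # Rule: the number 1 may be used only once on the whole ballot
    # (a ballot with no 1 at all is flagged here as well).
    if (marked_above + marked_below).count(1) != 1:
        problems.append("number 1 not used exactly once")

    return problems


print(check_ballot([1, None], [None, None, None]))
print(check_ballot([1, None], [1, 2, 3]))
```

Running a check like this over every row first, and counting how many ballots it flags, is a useful question for students in itself before they start any other analysis.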

I had the students first come up with a question the data could answer – and that was fascinating to start with, because some would ask questions such as “which is the best party?” which, of course, the data cannot answer. But it can answer which party got the most votes.

Some students chose to explore the differences between rural and urban voters, which necessitated finding a way of categorising a particular electorate – an interesting can of worms in its own right.

Some looked at the voting patterns of their own electorates. Some looked at which party’s voters followed the How To Vote cards provided by their parties – now that was an interesting one!

This idea of students taking a rich and complex dataset and exploring the questions they find most interesting is a really powerful one. It provides built-in differentiation, and gives the students a lot of motivation when they have the freedom to explore.


Welcome to the Australian Data Science Education Institute

We are a not-for-profit organisation using Data Science to engage students with technology and give them the tools they need to create real and positive change in their communities.

ADSEI aims to support teachers to use Data Science to create genuine learning opportunities within their own disciplines. Not only does this provide students with the opportunity to learn about real issues within their communities and create solutions, it also ticks off the learning requirements of the Digital Technologies Curriculum.

Join us as we explore the fascinating world of Data Science and change the world, one project at a time.