Make Me Data Literate

Kate Carruthers on Data Governance and the people side of data

A fascinating chat with Kate Carruthers, Chief Data Officer at the University of New South Wales, and Head of Business Intelligence for the Research UNSW AI Institute, on accidentally becoming a data person, data governance, and the people side of data.

“Oh, I don’t have any formal education in this area. So I don’t even know how this happened to me. I started out doing an arts degree. So I was doing history, anthropology and philosophy. I have no idea how this happened to me.”

“And the thing is it’s different for every organisation because every organisation is a unique special snowflake.”

“Behind every business problem is a human being with some kind of need. And if we understand that, we can solve it. Increasingly now every business is a data-driven business, but you can’t let data be the only thing; we need to focus in on the human problems we’re trying to solve. And that’s probably one of the things that is making me really think about this AI revolution that’s happening now. And a lot of people seem to be putting forward crazy solutions and not keeping the human being with a real problem and real needs in mind with their solutions.”

Transcript

Linda: Thank you for joining us for another episode of Make Me Data Literate. This is going to be great fun and I don’t know why it’s taken me so long to get Kate onto the podcast.

But welcome. Who are you and what do you do?

Kate: Thanks for having me, Linda. I’m Kate Carruthers.

I’m the Chief Data Officer at the University of New South Wales in Sydney, Australia.

And I’m also the Head of Business Intelligence for our Research UNSW AI Institute.

Linda: That is so cool.

What does the Chief Data Officer do?

Because it sounds like a job that could be many, many things.

Kate: Many fine things.

And the thing is it’s different for every organisation because every organisation is a unique special snowflake.

So for me, I always conceptualise the data at UNSW in three sort of realms.

First of all, there’s the enterprise slash administrative data: how we run it as a business.

Then there’s the learning and teaching data.

And then there’s the research data of which we have every conceivable kind of data you can imagine.

So I work across all of those areas.

I look after things like data governance.

And I also have a team that does BI and AI and ML.

Linda: Oh boy, that must keep you on your toes.

Kate: My team is fantastic.

They are so good.

They just keep coming and saying, we’ve just done this thing that’s amazing.

And I’m like, oh my God, this is amazing.

Linda: That sounds fabulous.

You’ve obviously got a really nice set up there.

What did you have to learn to do this job?

Is there anything missing from your formal education?

Kate: Oh, I don’t have any formal education in this area.

So I don’t even know how this happened to me.

I started out doing an arts degree.

So I was doing history, anthropology and philosophy.

I have no idea how this happened to me.

Linda: That’s an interesting segue.

So how did you bridge that gap?

How did you wind up here?

Kate: I ended up working in a charity, a not-for-profit, the National Trust of Australia.

And I ended up being in charge of IT for them and started having to manage things like databases and stuff.

And that was the start of my journey.

So I had to learn on the job and discover that I was quite good at it.

And then went on to work on big data warehousing projects for big organisations like finance companies and banks and insurance companies.

Linda: It’s so interesting to me.

I’m not sure that I’ve ever interviewed anyone in data who actually had qualifications in data and set out to work in data.

It’s always been some kind of happy accident.

Hopefully happy.

Do you feel like there is a part of your formal education that really helped in this role?

Or was it really everything from scratch?

Kate: Oh no. So I actually found the arts stuff that I did was really formative for me because it taught me how to think and it taught me how to write.

And back in my day, we were sort of the tail end of the Oxbridge style tutorial systems in the big Sandstone universities.

So I was the beneficiary of that where you had quite small tutorials and you’d have to write papers and read them and people would just tear them apart in front of you.

And it was quite character building.

But it was also quite good because you had to be able to defend your ideas on your feet in the room.

So it built your skills to think on your feet.

Linda: That’s one of the things I love about the projects that I build: the idea of actually having to critically evaluate your own work, go in and think about what was wrong and what was right, and actually apply that critical thinking that we don’t seem to do often with the way we do education now, with the textbook approach of: did you get the right answer? Yes or no.

Kate: It’s very much a part of sort of the world of an arts degree because every assignment basically starts with critically evaluate something.

And when we do IT or engineering education, it’s very much do you have the right answer or not?

So it’s a different kind of focus.

Linda: Yeah, I don’t think it has to be. I guess there are historical reasons why the two disciplines have kind of evolved in those separate directions.

Kate: Well, you know, it’s pretty basic.

In engineering, we want things to work.

So we want people to be able to build bridges and buildings that don’t fall down.

So there’s an awful lot of factual material that they need to cover and they need to understand it and how it all hangs together.

So it’s a really big thing that we focus on in engineering education to make sure that people understand the facts that they need and the knowledge that they need to master to be able to be good engineers.

So, you know, having worked in engineering education for the last 10 years, I’m really conscious of the fact that we want our engineers to be able to be certified and then to build solid things.

Linda: Things that don’t fall down.

Kate: Yeah.

Linda: You’ve obviously worked in a range of different data roles in different contexts now.

Is there something, any kind of one magic bit of understanding that if everyone got this about data, your life would be easier?

Kate: Look, I honestly think it’s not about data and what we need to really be doing is focusing in on what are the business problems we’re trying to solve?

What are the people’s problems that we’re trying to solve?

Because behind every business problem is a human being with some kind of need.

And if we understand that, we can solve it. Increasingly now every business is a data-driven business, but you can’t let data be the only thing; we need to focus in on the human problems we’re trying to solve.

And that’s probably one of the things that is making me really think about this AI revolution that’s happening now.

And a lot of people seem to be putting forward crazy solutions and not keeping the human being with a real problem and real needs in mind with their solutions.

Linda: I’ve been thinking about that a lot lately.

There seems to be a type of person who I sort of value and collect, who sees the big picture and remembers why we’re here and is driving towards that ultimate goal.

In education, the big picture is the students as well as the society that we want to create.

But it’s very easy to get head down into the nitty gritty of the task that you’ve been given or the thing that’s in front of you and forget the big picture, and that actually maybe this isn’t even the right problem to solve.

Kate: Yeah, yeah.

Linda: Putting skills in context.

Kate: I’ve worked with people who, you know, are obsessed with their particular data model and the endless perfection of the data model.

And it’s like, actually, that’s not the most important thing.

Linda: Yeah.

Kate: But getting those right, getting those sort of fundamentals right in what you’re doing is actually important.

And it is an important practice in the discipline.

So there are the practitioners, who need the kind of knowledge and understanding of what they’re doing as practitioners of a discipline, very much like engineers.

So, you know, we have data engineers and they have to be able to build robust solutions that can meet people’s needs.

But we need to be able to translate between our customers and our techies.

Linda: That’s the key, isn’t it?

Those people who can do that translation, who can bridge both worlds are immensely valuable.

Kate: Well, it’s one of the reasons why I make it one of my team’s KPIs.

Everybody in my team has a KPI of doing some public speaking; they have to do two talks a year.

Linda: Oh, that’s great.

Kate: Because it’s positively correlated with career success and it also helps them do their day job.

So, you know, when they’re talking to other people, the more confident they are in their public speaking, the more confident they are to articulate their ideas and have a dialogue with people.

Because, you know, the problem with a lot of data people and I am generalising here, but on the whole, data people are happy to never have to talk to anybody again in their lives.

Linda: True story.

Kate: So there’s a prevalence of people who prefer not to talk to anybody.

Linda: I grew up in a computer science department, so I understand the type of person you’re talking about.

Kate: Or, you know, they love COVID.

They didn’t have to leave the house.

Linda: Yeah, the irony in our house was that of the four people in this household, the one person who did have to leave the house because his work was considered essential was the engineer who would really be quite happy not talking to anyone.

He was the one who was getting out and about and seeing people.

It was, it was very unfair for the rest of us.

What are some of the worst data mistakes that you’ve seen?

Kate: I think the biggest mistake that I’ve seen people make is not understanding that they actually need to collect data.

They don’t, a lot of, hold on, I’ll rewind.

Let me go back to first principles.

Some people in the world do not understand that for data to exist you have to collect it.

So I once had somebody come and ask me, we’ve done this new process.

We’d like to see how it’s going.

And I said, that’s fine.

Where’s your data?

They said, what, no, you’ve got the data.

And I was like, no, no, no.

Did you benchmark your process before you improved it?

And where did you put that data?

And how are you tracking the new process so that we can do some analytics on it?

And they did not understand that to have the data for analytics, you needed to collect the data.

Linda: Wow.

Kate: So, and that was a fairly senior business person that had that misunderstanding.

So you can end up making some really big assumptions about the data literacy of the people that you’re dealing with.

And the other big thing that I see is the other side of that: people collecting all of the data just in case they might need it.

And we’ve seen, with things like the Optus, Medibank and Latitude data breaches, what an egregious practice that is.

So we should stop doing that.

But it would be good to lift the data literacy of those people who really have no idea how analytics happens.

Linda: Yes, definitely.

There’s just a level of magical thinking, I think, that goes with the idea that data science is too difficult for us mere mortals to understand.

And therefore, it’s magic.

And therefore, it can magically do all of the things: we just have to point a data person at our question and it will be answered.

Kate: Yeah. Yeah.

It’s fascinating. Like, I genuinely think that data science has been totally overrated, because, you know, I look at the care and feeding that most data scientists need.

And they’re not really very useful because they don’t know how to get the data.

They don’t know how to manipulate the data.

So, you know, I see a combination data engineer, data scientist as a much more valuable member of the team than a pure play data scientist who needs to be fed.

Linda: Like a machine that you can feed data in one end and turn the handle and get the data out of the other.

Kate: Well, you know, one of the things we discovered when we first moved to the cloud was that we thought that data scientists would want raw data.

And actually, they don’t know what to do with raw data because they don’t understand the data.

So they want semi structured data.

So they were like, oh, well, we don’t understand this raw data.

So, you know, one of the classic ones in the university is offers: students apply and then we give them offers.

We’ll say, you can come and study here and do X.

And sometimes we say you can come here and study to do X if you get a certain score in your English test. Conditional offers, they’re called.

And they want all the conditional offers categorised for them because they don’t know that code XYZ123 means a certain thing.

So they don’t know that.

So they want that.

So they need the data semi structured, which is hilarious to the data engineers who are just going, but why don’t they know that about the data?
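To make the decoding Kate describes a little more concrete, here is a minimal sketch of that kind of “semi-structuring” step in Python. The offer codes, field names and category labels are invented for illustration; they are not UNSW’s actual coding scheme.

```python
# Hypothetical sketch: decoding raw offer codes into human-readable categories,
# the kind of semantic layer a data engineer adds before data scientists see the data.
# All codes, field names and labels below are made up for illustration.

RAW_OFFERS = [
    {"student_id": "S001", "offer_code": "UNCOND"},
    {"student_id": "S002", "offer_code": "COND-ENG-6.5"},   # conditional on an English test score
    {"student_id": "S003", "offer_code": "COND-ACAD-85"},   # conditional on an academic result
]

# The mapping that otherwise lives only in the heads of the people who know the data.
OFFER_CATEGORIES = {
    "UNCOND": "Unconditional offer",
    "COND-ENG": "Conditional on English language test",
    "COND-ACAD": "Conditional on prior academic results",
}

def categorise(offer_code: str) -> str:
    """Return a human-readable category for a raw offer code."""
    for prefix, label in OFFER_CATEGORIES.items():
        if offer_code.startswith(prefix):
            return label
    return "Unknown offer type"   # surfaces codes nobody has documented yet

# Attach the category to each record: raw data becomes semi-structured data.
semi_structured = [
    {**offer, "offer_category": categorise(offer["offer_code"])}
    for offer in RAW_OFFERS
]

for row in semi_structured:
    print(row)
```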

Linda: The idea of understanding the context is something that is fascinatingly absent from a lot of data science courses.

I know people who’ve come through and never seen a real data set and believe that they can simply apply cookie cutter processes to, you know, standard data and turn that handle and crank out the results.

And like, well, you can’t do that because how do you know if it’s valid if you don’t understand the context and understand the data that you’re dealing with and how one field relates to another?

How do you … you can’t just blindly apply statistical processes.

It doesn’t work that way.

Kate: Yeah.

And that’s the other thing is understanding the relative weights of the various variables and understanding the, like you said, the context.

So the organizational context is so important.

You know, I can remember when I was in banking, you know, we did a lot of stuff back in the day building behavioral models.

And it turned out that there were only a handful of really important variables in our statistical modeling.

And you could throw most of the rest of them away.

Linda: Yeah, it’s interesting that isn’t it?

And it’s one of the things that worries me about the machine learning slash AI trend: that you, you know, basically tip this congealed mass of data into the funnel and out comes meaning. Well, how do you know that it’s meaning?

How do you know that it’s real if you don’t understand the context and understand the processes that are going on?

Kate: Absolutely.

I keep joking that most modern AI seems to be a bunch of transformers in a raincoat standing on top of each other.

Linda: I like it. Six transformers in a trench coat.

Kate: Yeah, pretty much, pretty much.

So it’s like people come to me and sort of go, well, ChatGPT can do this?

And I was like, how would ChatGPT actually know that?

Where does it get that knowledge to answer that particular question from?

And they’re like, Oh, I don’t know.

I haven’t thought about this yet.

You haven’t thought about this, have you?

So people are not really understanding what sort of things that they’re going to need to put together to get sensible and real answers out of this technology.

Linda: Yeah.

Yeah.

And, and, you know, it’s back to that magical thinking that we just have to poke the AI in the right way and it’ll tell us what we need to know.

Well, how can you be confident in that result if you, you know, if you don’t understand the processes underneath and you don’t know the data that’s gone into it in the first place?

You know, I think the whole explainable AI thing has a long way to go.

Kate: Yeah.

And I did a session on my podcast with Fiona Tweedie where we’re talking about that.

And the consensus was, yeah, we need to do a lot more work on that.

But I think one of the things about this is that people are assuming, like you said, almost magical thinking, that data can do magic when data is just data.

It’s basically inert.

And we need to put meaning over it.

And that’s the role of the data professional: putting that sort of semantic layer over the data so that it is understandable.

Because if you don’t have that, you can’t make sense of it. And increasingly, machines will be able to do it.

They’re not very good at it just yet, but it’s coming along very soon.

But right now, we need humans to add that meaning so that we can make sense of it.

And then we can feed it into the AI models.

Linda: Yeah.

So that you get results that you’re actually confident make some kind of sense and that, you know, take the context into account and all that sort of stuff.

Kate: Yeah. Yeah.

Linda: We’re definitely not there yet.

Have you ever seen data deliberately misused?

Kate: Oh, all the time.

This reminds me of something that I’ve heard a lot of older practitioners say, you know, about statistics: if you torture the data long enough, it’ll confess to anything.

You know, one of the things I always look at is: have they fiddled with the axes?

Like, that’s an oldie but a goodie, you know: have they fiddled with the axes to make the data look misleading?

Linda: Yeah, I just did a blog post on that today.

Kate: Yeah.

Often you get something really strange when you look at it, when you get really close, and you’re like, oh my God, look at that.

Look at those axes.

Linda: Yeah. Yeah. And sometimes it’s not even deliberate.

Sometimes, you know, you can, you can produce a scientifically correct graph that’s nonetheless misleading.

Like early on in the pandemic, the ABC published a COVID graph that used a log scale, which is perfectly scientifically accurate, perfectly valid for that type of data, but no one in the general public can read a log scale.

It’s not, it’s not what we’re used to.

It’s not what we understand intuitively.

And in that sense, it was quite misleading.
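A quick way to see the effect Linda describes is to plot the same numbers on both scales. The sketch below assumes matplotlib is available and uses invented figures rather than real case data: the identical growth curve reads very differently on a linear and a log axis.

```python
# Illustration of how axis choice changes the impression a chart gives.
# The numbers are invented; this is not real epidemic data.
import matplotlib.pyplot as plt

days = list(range(30))
cases = [10 * 1.2 ** d for d in days]   # hypothetical exponential growth

fig, (ax_linear, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

ax_linear.plot(days, cases)
ax_linear.set_title("Linear scale: growth looks explosive")
ax_linear.set_xlabel("Day")
ax_linear.set_ylabel("Cases")

ax_log.plot(days, cases)
ax_log.set_yscale("log")                # same data, logarithmic axis
ax_log.set_title("Log scale: a near-straight line")
ax_log.set_xlabel("Day")
ax_log.set_ylabel("Cases (log scale)")

fig.tight_layout()
plt.show()
```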

Kate: I think, too, that that’s really talking to the role of data practitioners in data storytelling: understanding the audience and crafting a narrative that the audience will understand, rather than just what we might understand amongst ourselves.

And the other thing is, you know, a lot of times people just don’t apply the normal statistical tests against their data to make sure that it is valid.

So that’s also a kind of thing.

And it might just be partly because people are not statistically trained, or it might be because people are sloppy, or it might be because they don’t want to answer that question.

Linda: Never ask a question you don’t want the answer to.

So you look for the axes when you look at graphs in the media, what else do you look for?

How do you spot the ones that are trying to mislead?

Kate: Well, what I always look at, apart from the axes, is the narrative they’re trying to tell, to see if they’re trying to push a certain angle, especially in the political space, because quite often people are quite partisan and they want to spin a particular narrative.

So I always look for what the narrative is and whether the data seems to support the narrative and the axiom.

Linda: It’s interesting, isn’t it?

That idea of data as storytelling.

I interviewed Greg Jericho a while back and he said, when he teaches journalists about data journalism, the big idea is that data is counting things and telling stories.

And we don’t think of graphs and visualizations often as communication, but that’s what they are.

And all of the rules of communication apply to them, you know, that apply to giving a talk or writing an essay or writing an article for the newspaper.

It’s that idea of understanding your audience. And what are you trying to say and how do you want the audience to react?

How do you want them to walk away from your piece of communication, data journalism or data visualisation? It’s just that: telling stories.

Kate: Yeah. And I think that wasn’t always the case. I remember when I first started working on data warehousing projects last century, because I’m an older person.

And in the early days, there was no idea of crafting something that made sense to the person who would be using it.

So, you know, we were so excited that we had our data warehouse and we were constructing our reports and we had our, you know, our pivot table front ends and stuff.

And, you know, we were just in love with the technology.

And there was no notion of, of crafting something that made sense to another human being.

So I think we’ve come a long way since those days.

Linda: Well, it’s progress. It’s like the early days of computing where we didn’t have any concept of usability or accessibility.

People were lucky to use our technology.

They should suck it up and cope with whatever it was we threw at them.

But the idea that, well, you can actually make it really hard for people to use and comprehend or you can make it easy.

Perhaps we should try and make it easy.

That might be novel.

Kate: Yeah.

Yeah.

Well, it’s funny because my husband came to me years ago now and he was like, can I talk to you?

And I was like, yes.

I was thinking, oh, what does he want to talk to me about?

Anyway, he wanted to talk to me about Facebook.

And he was like, Facebook is so easy to use.

Why aren’t work programs like that?

And I was like, oh, you need to go and talk to your IT department about that.

But he was just genuinely confused, as a normal, non-technical human being, as to why Facebook was so easy to use and why every other system he used in his day job was not.

Linda: It’s a fair question.

Kate: And I think about this a lot now. I was looking at some of the new stuff out from some of the vendors in the visualisation space.

Some of the generative AI overlays with that are really exciting.

So ordinary people are going to be able to ask plain language questions of the data and get a customised visualisation on the fly, which is really interesting.

Linda: Yeah, that has so much potential, the idea of making that translation happen.

Kate: Yeah, but, you know, the devil is going to be in the detail and there’s going to have to be a lot of data engineering in the back end to make that possible.

So, you know, we won’t be out of a job anytime soon.

Linda: No, indeed.

No, I think it has a while to go yet.

What excites you about data?

Kate: Well, the reason that I took this job here at the university, and it was a long time ago, about eight and a half years ago, was that I could see that we were going to be undergoing a very big digital transformation, the whole world would be, and that data underpins digital transformation.

So I was really interested in helping the university get our data organised ready for the digital transformation that had to happen.

And then COVID happened, and it kind of gave us a push along.

So I did my master’s thesis in 2019 on the state of digital transformation in Australian higher education.

Linda: Wow. That was timely!

Kate: And at the end of 2019, there was none.

Let me tell you the short answer, there was none.

I went away on vacation at the start of 2020.

And when I came back, it was COVID.

And suddenly, all of my colleagues who were like, oh, my course is unique and special, it can never go online.

Their courses were online.

Everything was online.

But the thing that we hadn’t done as a business, and nobody in the higher education sector had done, was actually to transform what we were doing to take advantage of the new affordances of the digital age.

So all we’d done was plonk whatever we did in the classroom online.

So record a lecture, do whatever.

And I think I see that as kind of akin to what they did in the early days of television.

I don’t know if you’ve ever seen it, but you can see some really old black and white footage of early television with a bunch of people in evening dress standing around a microphone with scripts in their hands reading it.

And so they were doing radio on TV because they hadn’t worked out how to use the new medium.

I sort of see what we’re doing in the digital education space at about the same level where we’re all standing around our equivalent of our microphone in our evening dress trying to do the old thing in the new technology.

So I think that we as an industry, the higher education industry have to grapple with how we use the new medium and how we shift education in compelling ways that make sense to the kids that are coming through.

And the kids that are coming through now are quite different to previous generations.

You know, they are focused, they are sort of the TikTok generation.

They’re not big readers.

They really don’t first go to books like my generation did.

So there’s a whole lot of different forces on the education sector, but underpinning everything is data.

And that’s why I was really interested in this job and really interested in starting to get things organized so that we would be able to leverage the opportunity.

Linda: That’s interesting to me, the idea of organizing data, because, I must say, I’m not great at being organized.

I won’t show you a photo of my desk because it’s a disaster area.

But it seems to me fundamental that, you know, systematic and structured approaches to data yield results.

But I’ve worked with scientists who, you know, happily tell me that they cannot compare the results of one experiment to the results of the one before, because the system wasn’t calibrated.

And they will go on and do the next experiment and not calibrate it either.

And it’s like, well, how do you know whether you’ve achieved anything?

You know, they’re trying to build newer and better things.

And they can’t really compare them to the old ones.

And it’s horrifying to me.

But we don’t teach our scientists how to do that.

We don’t teach them how to be systematic and structured with their data.

And then we wonder why they handle their data in very horrifying ways.

Kate: Well, we do five year projects for our research data management efforts.

So we’re doing a new five year project right now.

And, you know, we’ve done a lot of work about improving research data management here at UNSW.

And we’ve got another five years of working on that.

It’s really exciting stuff in the pipeline.

So we’re trying to help people do the right thing.

And one of the reasons I got this job was because, back in 2013, 2014, I was doing a research project with some folks in medicine.

And we were working with some indigenous data.

And I was like, where are we allowed to store this data?

And they were like, I don’t know.

I was like, hold on, you do this all the time.

How come you don’t know?

And then I started asking around.

And then I realized that what we did as an organization was get everybody to jump through the ethics approval hoop.

And once you got your approval, they would just wave and say goodbye.

See you later.

And there was no guidance as to how to store the data safely.

So no data governance.

There was nothing about that.

So that was one of the reasons that I ended up in this role.

It’s because I started asking questions about, well, how are we supposed to store data?

And where are we supposed to store it?

And how are we supposed to store it so that it’s safe?

Linda: I’m increasingly of the belief that the people who ask the difficult questions are the people who are going to, you know, solve our problems and change the world.

So I’m glad to hear that you’re asking the difficult questions and getting in there and solving the data problems.

It’s been a fabulous chat, Kate, thank you so much for coming on.

Kate: Thank you very much for having me.
