Mark Gray on Physics, Supercomputing, Robodebt, and the future of medicine

Make Me Data Literate

A super interesting conversation today with Mark Gray, Head of Strategic Partnerships at Pawsey Supercomputing Research Centre. We roamed everywhere from cloud physics to robodebt and personalised medicine, passing by NASA, climate science, and log scales on the way.

“A lot of physicists are very competent programmers. You kind of have to be, because there aren’t any problems in physics that can be solved non-computationally. You need a computer to solve any problem.”

“I think all degrees should teach programming at some level. At least the logic of how programming works. There aren’t many areas of science anymore where you can really progress without a reasonably good understanding of the algorithms that you’re using and how they’re implemented. In order to be able to do that, you need to have a pretty good understanding of some programming techniques. But then there’s other areas, I think where that’s useful as well.”

“People have developed a kind of implicit trust in the way that technology works. And I think partly it’s because they don’t actually understand how it works. And so, they just choose to trust it. I mean, I’ve got a lot of experience in programming. …I would not trust a computer-driven car to not hurt me.”

Transcript

Linda
Thank you for joining us for another episode of Make Me Data Literate. I’m excited about this one today. It’s not often I get to interview a good friend, but also I don’t know why it’s taken me so long to get someone from Pawsey on the podcast. So welcome Mark Gray. Thanks for joining us.

Mark
Thanks, Linda. It’s great to be here.

Linda
Excellent. Who are you and what do you do?

Mark
Well, my name is Mark Gray. You already said that. I’m one of the managers at a place called Pawsey Supercomputing Research Centre. We are one of Australia’s tier one supercomputing centers, and we provide compute and storage and cloud facilities for research to Australian researchers.

Linda
What does tier one mean?

Mark
Well, depends who you ask, but if you ask us, it means we’re kind of just the largest computer facilities in Australia. So there’s small facilities that are located at universities and some of the research infrastructures, and then there’s the big tier ones, and the big tier one infrastructures are probably identified by two things. One is just their size, they’re very big facilities, but they’re also used by a large group of researchers. They’re not just a single institution facility, they’re available to everybody.

Linda
Awesome. I hope I get to talk about some of the cool projects you folks do because there’s some amazing science that gets done at Pawsey. Tell me, what did you have to learn to do your work? What was missing from your formal education?

Mark
Linda, so my career started with an undergraduate degree in physics, and I got into physics because I really liked clouds. So when I was choosing what to study for my honors degree thesis topic, I decided to look at cloud physics. I liked it a lot, and after that, I got offered a place to go to a place called the University of Wisconsin-Madison, where I studied graduate work in atmospheric and oceanic physics.

Then after that, I went off to a place called NASA, where I worked on a program called the Earth Observing System. And we were building software to run polar orbiting satellites for environmental observation, and my specific area of expertise was in writing algorithms for cloud physics, which was my area. What I’ve ended up doing is working in computing. I guess in one way, the answer is lots of things were missing from my formal education because the job that I’m doing is not the job that I was trained for at all. But on the other hand, I think that the skills that I developed during my education and my early career, and some of the other stuff that I’ve done, have made me good at what I do. For the job that I do, and I think for a lot of management jobs in general, I’m not sure that a lot of formal training is really that valuable.

There’s no better value training than doing the job, I think when it comes to managing people.

Linda
I used to walk around the supercomputing conference in the US and interview people on how they got into supercomputing. I did not ever interview anyone who was like, “Well, I set out to go into supercomputing.” It’s always “well I was doing stuff that required supercomputing and accidentally wound up working in a supercomputing center and now it’s my whole life.”

Mark
That’s also what happened to me.

I came back to Australia in 2005, and I was doing work in the same kind of satellite environmental work that I’d been doing in the United States. In order to get that done, I needed access to some good compute facilities. So I started doing my work at a place called iVEC, which was the precursor to what is now Pawsey. In doing that, I got to know a lot of people that worked there, and a few years after that started, a job came up at Pawsey, and I put my hand up for it. I started my career at Pawsey doing research software engineering, basically, and especially initially web development, and then some other stuff. The skills that got me into this spot, I guess initially, right, the programming skills, were a part of my formal education. A lot of physicists are very competent programmers. You kind of have to be, because there aren’t any problems in physics that can be solved non-computationally.

You need a computer to solve any problem.

Linda
Were you taught to program as part of your degree, or did you have to pick that up on the fly in order to solve the problems you had to solve?

Mark
No, it was part of my degree. So we did Fortran and assembly language programming, and some other stuff like that in my degree program. It was a pretty good background in programming, to be honest. I picked up a lot of languages since then, but I think that my early education in programming was really helpful in my ability to be able to pick up stuff later in life.

Linda
It’s interesting to track which degrees or disciplines actually teach programming and which kind of fling their students to the wolves and go “learn some programming cos you’re gonna need it but we’re not helping you with that.”

Mark
I think all degrees should teach programming at some level. At least the logic of how programming works. There aren’t many areas of science anymore where you can really progress without a reasonably good understanding of the algorithms that you’re using and how they’re implemented. In order to be able to do that, you need to have a pretty good understanding of some programming techniques. But then there’s other areas, I think where that’s useful as well.
Probably 10 years ago, if you’d said to me, “Do art students need to be programmers?” I would have said, “No, that’s crazy.” I mean, why would they? But today, there probably are some areas where art students probably do need to have a level of skill in that area. As things like AI and stuff become more useful tools for a broader range of disciplines, then the need to understand how these things work then becomes really obligatory as well. Pretty soon, everybody needs to be a programmer at some point. But that’s not what I see happening. I mean, it’s to me more like people have developed a kind of implicit trust in the way that technology works.

And I think partly it’s because they don’t actually understand how it works.

And so, they just choose to trust it. Anyone who’s a… I mean, I’ve got a lot of experience in programming. I have a lot of friends who have a lot of experience in programming. I would not trust a computer-driven car to not hurt me.

So, if I was flying a plane and I knew that the entire thing was being flown by an algorithm and there was no pilot up front, I would be very worried. So, yeah.

Linda
Yep, the more you know the less safe you feel.

Mark
Yeah, right. I think if more people had that level of understanding, they’d be less trusting, less easily manipulated.

Linda
Yeah, well, as you know, that’s a big part of why I do what I do just to try to build the data and computational literacy of people so that we can all ask the right questions and be the right level of skeptical and cautious when dealing with these systems.

What do you wish everyone knew about data? Is there something that you figure if everyone understood this, it would change the world or at least make your life easier? [pause]

Mark
Oh, so I think the one thing would be that if everybody understood that the data itself, it doesn’t make decisions about things. It doesn’t have judgement in it. It’s what you do with it that has judgement. So, when someone says, “Oh, the data says this, this, and this.” That’s not really true. The data presents some information. You have decided it says this, this, and this.

And so, I dislike it when people either innocently or non-innocently use data and compute tools to kind of absolve themselves of responsibility. So, there’s that.

So, I think the other one is people talk a lot, I guess, about the value of data. And I think part of that is that this word data, what it means.

When we sit down and when you ask a question, what is the value of data? What do you wish people knew about data? It’s like, well, first, you really need to understand exactly what it is you’re talking about when you say data.

Because that can mean a lot of different things to a lot of different people. As a previous experimental physicist, I’ve got very specific ideas about what I think data means. And it might be different to what other people think. I think that, yeah, the value of data and the usability of data and the responsibility of data, I think those things all rest with people. Data is just a tool.

Linda
I like that. I think that’s… There was a study recently where some very high number of researchers were given all the same data and they came up with almost as many interpretations of the data as there were researchers looking at it. It’s… I think it’s a red flag, isn’t it, when someone says, “The data says we have to do this.”

You’re like, “Whoa! Hang on a minute.”

Mark
Right. Yeah. Yeah. The data, I mean, that’s that kind of language. The data says this thing, you know, I think is a little bit of a slippery slope. Um, because it’s not the data that says that. You say that. You’ve read the data and you’ve made a judgement. And this is what you think. So the correct statement is I think that the data shows this, this and this.

Um, you know, in, in physics and other science areas, we like to be very specific about the meaning of things. So, you know, colloquially, we don’t, you know, when we’re just talking, we’re pretty, pretty relaxed about exactly what we mean by things.

It causes a lot of confusion. Um, but, you know, when we’re making judgements about things that are important, it’s really useful to be very clear about what the meanings of things are.

In legal terms, you know, they do this when you’re in a law court or, you know, or someone’s running a contract, they define the meaning of every term at the beginning of a contract. And every time you see that word in the contract, you must mentally go back to the definition of the word at the beginning of your contract and kind of insert that definition in that spot into the contract.

That’s what it actually means, right? So, you know, when, when we say data, you know, and we’re talking colloquially, we really need to be referring back to our original contract with reality and remembering what data means.

Because when we say that, we’re including a whole bunch of stuff that, you know, and then the problem with doing that, of course, is when you have a conversation with somebody on that, not everyone might be on the same page as to exactly what you’ve included in your definition of data.

And then we are, and then if you talk about different things, then, you know, conversation is hard.

Linda
Yeah.

And it’s, it’s tricky too when, when terms within the discipline don’t match from, from one discipline to another, you know, you’ll find that one side, like physics uses terms slightly differently to the way biologists use them. Indeed, in the curriculum, maths and digital technologies and science all have different ways of describing the same thing and use the same terms to mean different things.

And it’s, it’s really problematic.

Mark
It is really problematic. And there’s a great example from my own career in, in physics, when we’re studying… In gravity, you learn about the nature of space time and one of the outcomes of our understanding of the nature of space time is that gravity waves were predicted to exist. If there was a large change in the gravity at a location in space time, there would be ripples in space time called gravity waves that would propagate out. And, you know, there was a big gravity wave detector built at UWA some years ago, and other organizations started building those too, and they have since been detected, you know, and they, they’ve actually, you know, experimentally proven that they exist.

And then I went to study atmospheric science, and people started talking about gravity waves, and I was like, What are you talking about?

I mean, but actually, gravity waves in atmospheric science are actually buoyancy waves. And so in atmospheric science, there are two forces. Buoyancy… Well, no, no, there are more than two forces. There are two forces that act vertically on a block, on a body of air. The buoyancy wants to move things up and gravity wants to move something down. And so when you move a body of air up very quickly, it moves up and then gravity pulls it back down, but then buoyancy moves it back up and then gravity moves it back down and you get a wave.

Linda
Ahah!

Mark
And of course, one of the sources of that wave is gravity. So it’s a gravity wave. And it really bothered me coming from a fundamental physics background to be talking about gravity waves in the downstream air of mountains, which is where these things sometimes occur, or in thunderstorms, in the ripples around them. And yeah, so yeah, terms are … definitions of things are important. Definitely.

Linda
And which is, which is great as an example of why context matters as well, that you can’t just take your data and analyze it as though numbers are, you know, exist in their own right. The context of them is really important and, you know, actually understanding what’s been measured and how it relates to all the other things that have been measured and all that sort of stuff, you know, even in terms of your definitions, you’ve got to know what you’re looking at in order to analyze it in a way that makes sense.

Mark
That’s why the whole data conversation, you know, is so complicated because context is very, very important.

So, yeah, some, I mean, sometimes, sometimes I guess kind of not, you know, if you’re measuring, I don’t know, temperature at the surface, you might think, well, that’s not very contextually important.
But it is contextually important. So for instance, when you’re measuring surface temperature, and this is a big deal these days because we’d like to get surface temperature measured very accurately.
The local geography is very important in determining the temperature of things. So right next to me right now about 15 minutes away is Perth Airport.

And Perth Airport has a weather monitoring station where they measure temperature. It also turns out that Perth Airport’s in a bit of a dip in the ground, like a like a broad valley, right. And so air kind of pools, you know, in the Perth Airport area. And so that temperature can be a little bit different to the surrounding temperature.
So you look at the temperature and it’s not necessarily the temperature of Perth because where it’s measured is very important.

Linda
Yeah, I’ve noticed that just riding, riding to work, I used to ride through a dip next to our local park, and you would actually go into, it was like going to a cold bath and, and out, as you went through the little valley,

Mark
Yeah.

Linda
it was just this, this little puddle of cold air.

Mark
Because cool air will pool in these in these dips.

Linda
Yeah.

Mark
And yeah, so yeah, so context is important everywhere. You know, one of the biggest challenges and you know, I used to work in climate science, one of the biggest challenges is normalizing data because the local context of all these measurements is incredibly important. And when you’re trying to compare things, you must first remove the local effects. So you’re actually comparing apples to apples when it comes to say comparing temperature at two locations.

Linda
Yeah.

Mark
You know, there are other reasons why those things might not be the same or when you’re looking for those large scale trends, right.
You have to get rid of all these local effects and it’s complicated work.
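
As a rough illustration of what removing local effects can look like, one common approach is to compare each station against its own baseline rather than comparing raw readings. The sketch below is not the method Mark's teams used; the station names, temperatures, and the three-year baseline are all made-up assumptions, and real normalisation work is far more involved.

```python
# Hypothetical yearly mean temperatures (degrees C) for two stations.
# "valley" sits in a dip where cool air pools, so it always reads lower.
hill = [18.0, 18.1, 18.3, 18.2, 18.5, 18.6]
valley = [16.0, 16.1, 16.3, 16.2, 16.5, 16.6]

BASELINE_YEARS = 3  # assumption: the first three years are the reference period


def anomalies(series, baseline_years=BASELINE_YEARS):
    """Return each reading minus the station's own baseline average."""
    baseline = sum(series[:baseline_years]) / baseline_years
    return [round(t - baseline, 2) for t in series]


hill_anom = anomalies(hill)
valley_anom = anomalies(valley)

# The raw readings differ by two degrees, but the anomalies match,
# so the shared trend is visible once the local offset is removed.
print(hill_anom)
print(valley_anom)
```

In this toy case the two stations disagree on the raw temperature but agree on how much it has changed, which is the apples-to-apples comparison described above.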

Linda
So it’s surprisingly complicated to say even what is the temperature in Perth compared to the temperature in Melbourne, like define Perth, define Melbourne. It’s not that straightforward.

Mark
Yeah, I know.

I do it all the time when it’s really hot and I say hey, it’s 43 degrees in Perth. I also know that in my car, you know, on a hot day, if it’s 43 degrees outside, it’s probably 48 to 50 degrees inside my car. And that measurement at Perth is probably measured in Perth City at the Bureau of Meteorology, which is on a bit of a hill. And so, you know, you have to look at the, if you really want to get measurements of temperature that are useful, you know, you really have to either measure yourself in your backyard or just look at like an average over the area, which will tell you something.

Linda
Yeah. Yeah.

Linda
But then, you know, how do you, how do you average, you know, which, which, which points do you use to contribute to your average. And it fascinates me that even the definition of the average global temperature is actually quite complicated. And there are different versions of it.

Mark
Yep.

Linda
It’s like, something that we intuitively understand as simple and straightforward is actually wildly complicated when it comes to the specific implementation.

Mark
Yeah, and you know, in physics, this is a big deal. So physics, a lot of people misunderstand physics. Physics is fundamentally the science of measurement. It’s not, you know, that is what physics is. Physics is the science of measurement, right.

Linda
I’ve not heard that description before. That’s great.

Mark
And in physics, we really care most of all about the accuracy and precision of a measurement. That’s it. That’s everything comes from that.

Linda
Nice.

Mark
And the, all of the theory that we have and the hypotheses that we have, you know, live or die on the basis of measurement.
And so that’s why we get very, in physics, we get very particular about the definitions of things because we’re very particular about measurement.

Linda
Yep. That’s super cool. I’m seeing physics in a completely new light now. What are, what are the worst data mistakes you’ve seen?

Mark
I was thinking about this. And recently, I think the worst data mistake I’ve seen is something called robodebt.

Linda
Yeah.

Mark
Right. So, and for anyone that’s not familiar with that, there’s, there was a period in the previous government where data about people was used inappropriately, probably, to make determinations about debts that they owed to the federal government. And it turns out a lot of those debts were not real.

Linda
Do you, can you explain the sort of a simplified version of what went wrong there, exactly what they were doing?

Mark
My, so my kind of simple understanding of it is this concept, the thing they were talking about was income averaging. And so they would, they would average people’s income over a year and then essentially estimate their debt based on that income average and then send people debt notices.

Linda
Yep.

Mark
But a lot of people, especially who are on very low incomes, either work itinerantly on jobs that come and go and just don’t have those kind of regular kind of sources of income.

Linda
Yep.

Mark
And so when you average their income over a long period, you actually just get wrong answers. They’re not, you know, it’s not a question of, you know, interpretation, it’s simply wrong.
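
A small sketch can show why averaging gives wrong answers for irregular income. This is a simplified illustration only: the per-fortnight threshold, the income figures, and the function names are hypothetical and are not the actual rules or calculations of the scheme.

```python
# Illustrative only: the threshold and incomes are made-up numbers,
# not the real rules of any benefits scheme.
THRESHOLD = 500  # hypothetical income limit per fortnight


def fortnights_over_threshold(incomes, threshold=THRESHOLD):
    """Count the fortnights in which income exceeded the limit."""
    return sum(1 for income in incomes if income > threshold)


def averaged(incomes):
    """Spread the yearly total evenly across every fortnight."""
    mean = sum(incomes) / len(incomes)
    return [mean] * len(incomes)


# A casual worker: 10 fortnights of work at 2000, then 16 with no income.
actual = [2000] * 10 + [0] * 16

actual_over = fortnights_over_threshold(actual)
averaged_over = fortnights_over_threshold(averaged(actual))

print(actual_over)    # 10 fortnights really were over the limit
print(averaged_over)  # 26: averaging flags all of them, including 16 with zero income
```

Under these assumptions the averaged figure marks the person as over the limit in every fortnight of the year, including the sixteen in which they earned nothing.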

Linda
Yeah.

Mark
And so, you know, the problem occurred because a lot of those people were also very, very vulnerable. And a lot of those debt notices got, as I understand it, handed over pretty quickly to non-governmental debt collectors. And, you know, they’re pretty mean. And so people got quite traumatized by it. As I understand it, quite a number of people actually committed suicide based on notices that they got about debts that they owed.

Linda
That they allegedly owed.

Mark
That they allegedly owed, right. And the other part of it that really, like, bothered me was that the onus was on the individual to prove that they didn’t owe the debt.

Linda
Yep.

Mark
Right? So, you know, it wasn’t for the government to prove that they did. And so that’s, that’s to me, you know, the most blatant kind of grievous misuse of data that I can think of, especially when you think about the human cost that it resulted in.

Linda
Yep.

Mark
Thank you. Thank you.

Linda
So the question I asked was about mistakes, and it’s arguable, perhaps not even open to argument anymore, whether this was a mistake or rather a deliberate, calculated misuse of data. But the, the other question, I guess you’ve, you’ve already given one example, I suppose, is have you ever seen data deliberately misused?

Mark
I, yes, and in, I’ve seen data deliberately misused and my example is in climate.

Linda
Mmhm.

Mark
And the example is in people who wish to assert that climate change is not happening. And there are many blatant and I think simply disingenuous uses of data to try and prove a point.

Linda
Yeah.

Mark
You know, things like, you know, showing temperature for the last 5,000 years, but leaving off the last 100 and then, you know, saying, oh, there’s no big deal, you know, when you’ve left off the bit where the temperature goes up enormously high.

Linda
Yeah.

Mark
And so, you know, those kind of things, those, those are blatant and intended to mislead.

Linda
Yeah.

Mark
Right. And, you know, the thing about it, I think a lot of people that do that, they make uses of those, you know, trying to deceive people, are doing so knowledgeably. They, they, in order to come up with their deceptive use of the data, they must first have looked at the actual data.

Linda
Yeah.

Mark
Right. And then they made a decision to come up with a presentation of it, which is deceptive. And they know it’s deceptive. Right. So that’s, yeah, that’s, that’s, that’s one of my, one of my things. So, yeah, I don’t really like it. It’s frustrating.

Linda
How do we spot things like that?

Mark
So, whenever you see like, someone show like a chart of data, like a plot, like in the newspaper or online or whatever, you should be asking yourself lots of questions about what you’re looking at, because there are so many ways to display data in a way that is deceptive.

Mark
So things like fiddling with the, the axes on a chart, you know, the vertical axis, horizontal axis, you can fiddle with those numbers to make things look especially bad or especially good, depending on the point you’re trying to make.

Linda
Yep.

Mark
Things like switching between a linear plot and a log plot, a logarithmic plot, you know, they’ll show trends differently and they mean different things, and they’re not comparable. Right. And a great one during the pandemic, you know, was looking at exponential rises in cases, but when you look at them logarithmically they don’t look that scary.

Linda
Yeah.

Mark
Unless you understand what a logarithmic plot is, and when you see a straight line on a logarithmic plot, then you know that thing is going up very, very rapidly. So, you know, that kind of thing.
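
The point about straight lines on a log plot can be checked with a few lines of arithmetic. The sketch below uses hypothetical case counts that double every three days; the numbers are assumptions chosen for illustration, not pandemic data.

```python
import math

# Hypothetical case counts that double every 3 days, starting from 100.
days = list(range(0, 31, 3))
cases = [100 * 2 ** (d / 3) for d in days]

# On a linear axis, the gap between successive points keeps growing.
linear_steps = [b - a for a, b in zip(cases, cases[1:])]

# On a log axis we effectively plot log10(cases). The gaps are constant,
# so the points fall on a straight line even though growth is exponential.
log_values = [math.log10(c) for c in cases]
log_steps = [b - a for a, b in zip(log_values, log_values[1:])]

print([round(s) for s in linear_steps])
print([round(s, 3) for s in log_steps])
```

Every step on the log axis is the same size, which is why a gently sloping straight line on a logarithmic chart can hide counts that are doubling again and again.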

Linda
Yep. And I would argue that even people who understand log graphs, like I understand log graphs, but I don’t intuitively read a line graph as a log graph, so that unless you’re really careful and really think about it, we tend to look at a graph and take the surface impression, the first quick understanding and run away with it rather than stop and look at it in detail, and that’s where using a log graph is really problematic.

Mark
Right. So, because a graph of data is, you know, the, the product, a graph, it’s a whole bundle of information, and every part of that information is equally important. So if you take a chart, and you just lop off the numbers on the vertical axis, and just say this thing is changing, right.

Linda
Yeah.

Mark
That’s, that’s a hint that someone’s trying to trick you. Right. The, the, yeah, there are other ways to trick people in data, like leaving out data points that are important, or, you know, the, you know, and then some like there’s legitimate reasons to take outliers out of data systems.

But sometimes they’re important, and they show things. And the other one, of course, is correlation. You know, you can look at data points and you can say how well do they correlate, you know, how well do they fit, you know, a line. And, you know, someone shows a scatter plot, and then draws a line through it and goes, oh, here’s the correlation. You know, it’s like, well, no, you know, if it’s a big, you know, disorderly looking scatter plot and you just look at it, you can see there’s no order to it.

Linda
Yeah.

Mark
Then, you know, you can draw a line through it, but it doesn’t mean anything. So that’s the other thing is people adding stuff to charts and information that actually doesn’t mean anything. It doesn’t mean you can’t make one, you know, if you just take any data set, you can come up with a, with a linear regression that matches the data. It doesn’t mean that data follows that linear regression, right, because you can draw a line through any amount of data.
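
To make that concrete, here is a minimal sketch that fits a least-squares line to data with no relationship at all. The data is randomly generated for illustration; the variable names and the choice of 500 points are assumptions, and R squared is used here as one common measure of how much of the variation the line explains.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
n = 500
xs = [random.random() for _ in range(n)]
ys = [random.random() for _ in range(n)]  # generated independently of xs

mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Ordinary least squares always returns a line, whatever the data looks like.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R squared: the fraction of the variation in y that the line accounts for.
r_squared = sxy ** 2 / (sxx * syy)

print(f"line: y = {slope:.3f} * x + {intercept:.3f}")
print(f"R squared = {r_squared:.4f}")
```

The fit succeeds and produces a slope and intercept, but R squared stays close to zero, which is the numerical version of looking at the scatter plot and seeing no order to it.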

Linda
Yeah. Yeah.

Mark
So, you know, there’s, there’s a lot of detail at the, the, the log, the log versus, you know, linear plots.
I mean, the plot of data cannot be understood without, you know, clear, you know, articulation of what the axes are, what the numbers are and what they mean. And in physics, of course, what the units are, it always bothers me enormously when someone puts anything anywhere and they don’t tell you what the units are.

Linda
Yep.

Mark
So, and in physics, we get very, very, very particular about the units of things because it’s the, it’s how you understand what a measurement is.

Linda
And it’s, it’s sometimes this is just, you know, carelessness or even the default result from whatever graphing package you’re using and sometimes it’s woefully misleading and it can be difficult to draw a line between the two.

Mark
So, it’s not always malicious. I guess I, I always try to, you know, assume the best intentions of people, right, so, you know, I never attribute to malice what you can attribute to incompetence, right, basically.

Linda
Yep.

Mark
So, you know, people make mistakes and they just look at a thing and they think it means something and maybe it doesn’t, but they, they, they talk about it like it does and then it becomes a story which they, which they send out and, you know, it doesn’t mean they’re trying to manipulate people, or trying to deceive people.

Linda
Yep.

Mark
It doesn’t, but on the question of, sometimes the question of intention isn’t important if people are deceived anyway.

Linda
Now I’m going to put you on the spot here. I didn’t warn you about this question, but I think since since I have you here from Pawsey, tell me your favorite use of data at Pawsey.

Mark
Yeah.

Linda
What, what’s something really cool you’ve seen done with data.

Mark
So, my favorite project at Pawsey that I’ve seen on our systems right now is a project that’s actually being operated out of one of the hospitals in Perth. So, recently, one of the researchers, a doctor and machine learning expert at Royal Perth Hospital, came to us to tell us about a project they’ve been running on our systems. And they have developed a machine learning algorithm that can predict brain swelling in traumatic brain injured patients in the ICU.

Linda
Yeah.

Mark
And as I understand it, if I recall the numbers correctly, they can predict it with about 90% accuracy with about 30 minutes warning. I was talking to the doctor in charge of this project, Dr. Rob Macnamara, and he’s from both Curtin University and from Royal Perth Hospital and the projects are supported by those orgs. When patients arrive in the ICU with traumatic brain injuries, it’s not always the initial traumatic brain injury which is the life threatening or disabling event. Frequently, it’s the brain swelling that happens to the patient while they’re in the ICU that is the fatal or permanently disabling event. And you’d really like to be able to predict when those things are going to happen so that you can have a chance at stopping them because once someone’s brain has started to swell, it’s a critical moment and you have to act fast and there are only so many things you can do. Right.

So if you had warning of it coming, at the very least you could be ready, but you also can take preventative actions. Right. But how do you know? So they took a bunch of data from hospitals at Royal Perth and Royal Alfred and Royal Sydney and collated a data set which they’re able to use to run through an algorithm to, after a lot of work, come up with an algorithm that could actually be predictive about people in the ICU and whether their brains would begin to swell or not based on what’s happening with their bodies. Because there are indications that stuff’s about to happen. As I understand it, they’re in the latest stages of Therapeutic Goods Administration approval to use that system clinically. So still very much in the development phase.

Linda
Yeah.

Mark
But it’s super cool, right? Because they’re using data and computers to literally save people’s lives. And you know, of all the work that we do and you know, at Pawsey, we support thousands of researchers across Australia. And the work is all great. Right.

Linda
Yep.

Mark
There’s lots of super interesting work happening on our systems. But you know, when we’re able to provide the facilities that, you know, help people help other people, I think that’s when, you know, at least our teams feel like they are achieving the best that they can achieve.

Linda
You really know why you’re going to work when you’re supporting that kind of project.

Mark
Right. So, you know, that kind of stuff is, you know, what gets you out of bed in the morning and, you know, and getting to work and, you know, because it’s hard work, running these big compute facilities. The, the, I guess one of my jokes at work when, when we do something with computers is I say something like, Oh, I mean, we’re going to do this thing with a computer. I’m like, well, that’s a computer. It’ll just work. And it’ll always happen exactly the same way.

Right. How could it not? Of course, it’s funny because that’s not how they work. And they’re very complicated. And so sometimes you do do a thing twice and in, you know, and two things, two different things happen on a computer each time.

Linda
Yep.

Mark
That’s upsetting. And also, you know, we run very, very high performance compute facilities. And, you know, that also means that they’re kind of running at the bleeding edge of technology. You know, it’s called the bleeding edge because everything’s bleeding, and the people. So they’re all injured by what they’re doing…

Linda
Yeah. All the edges are sharp.

Mark
But yeah, right. It is genuinely hard work and very demanding work. And it doesn’t really rest. You know, we run our systems 24/7. If our systems are offline, then there are major radio telescope instruments that don’t collect data.

Linda
Yeah.

Mark
There are other things that don’t happen. Thousands of people across Australia are immediately affected. So, you know, that’s a, that’s a pressure that we all feel when it comes to the operating of equipment.

Linda
It can be pretty intense.

Mark
Mmm. But, you know, that’s the hard part. And the good part is someone comes along and says, yeah, we used your computer to save someone’s life last week. So thanks.

Linda
Yeah.

Mark
Right. And it is pretty cool. Or someone comes along and they say, I’ve used your computer to, you know, take the data that I have here, and I’ve just found a new kind of stellar object, which has never been identified before. That’s really cool. And that kind of stuff happens as well. So that’s what makes the work kind of cool. You know, I was working in science before this career, and I kind of got out of science and I started working in compute.

And now I’m kind of helping to run these big computer facilities. So I’ve gone from doing science to helping other people do science. And it is actually very rewarding to do that. If I think about the kind of impact that I could have had as a probably pretty mediocre guy working in climate science versus, you know, helping other, smarter people than me get their work done, I’d rather do that. Because I think there’s actually more opportunity to have impact that’s going to be persistent from that work than from, you know, just being yet another middle-rung climate researcher somewhere in a lab.

Linda
Well, you’re facilitating a lot more science than you could ever have done as a single person.

Mark
Yeah, exactly. So that’s what makes it cool.

Linda
That’s a good segue into the last question.

What is it that excites you about data?

Mark
So what excites me about data, and I’m interpreting this to be very broad, so data in this conversation includes data, but it includes AI and all this other stuff, is that we are really on the cusp of some very, very cool stuff in probably the next 15 years.

The thing I’m most looking forward to coming out of that is what’s going to happen to medicine over the next 20 years. So, you know, a lot of medicine is fundamentally trial and error. Right. So you go to a doctor with a problem. And the doctor says, well, you know, I’ve got eight different drugs that we can try on you with this thing. We’ll start with the drug with the least side effects. And then we’ll see how you go, and we’ll find the one that works best for you, that mitigates most of your symptoms while creating the least side effects. And that’s what we’ll go with. You know, people with migraines and other kinds of conditions, they experience this stuff all the time, right? It’s hard to get to the right drug.

So data can solve this problem. I actually think data will solve this problem. You can compute based on someone’s DNA, what a drug will do to them. It’s just kind of computationally expensive. And because it’s computationally expensive, it’s monetarily expensive. But that won’t always be true.

There’s kind of two barriers to it. One is DNA sequencing needs to become cheap. Well, we’re there. That’s done. You know, there’s these things like Nanopore and other kinds of sequencers now that you can take into a field, and they’re pretty good for short reads. And, you know, sequencing in general has gone from being a millions-of-dollars activity per person to thousands, and will become a hundreds-of-dollars activity. And at that point, you’ve got the data that you need to actually do the compute to do something predictive.

And on the predictive side, you know, these models that you can make for drug interactions, well, they exist. So, you know, one day in the, I think, not too distant future, you’ll go to your GP with a problem and your GP, he or she, will say, oh, I’ll just click, click, click, click, click. We’ve got your DNA on file, click, click, click, click, click. Ah, okay, we’re going to give you drug A; it has an 87% chance of fixing your problem. And if that doesn’t work, we’ll give you drug B; it has an 11% chance of fixing your problem. And the other drugs we won’t even bother with. While you’re on drug A, it has a 23% chance of the following side effects based on your DNA. And you can compute all that, right? So that means that doctors can go straight to the most effective choice of pharmaceuticals. And that is going to, I think, have two big positive impacts and one negative one.

So the two big positive impacts are that medicine will become a lot cheaper, because people will simply need to be attended to less frequently, and they will be healthier, because they’ll be getting the most effective treatment more often.

Linda
Yeah.

Mark
The negative part is that in order to make a system like that, you probably first need to build a system that has a digital copy of everyone’s DNA.

Linda
[sighs]

Mark
And, yeah, and so the big sigh there is that historically, we’re not very good at protecting such information.

Linda
Yeah.

Mark
And that information, we talked about the value of information at the beginning of this conversation, the value, the monetary value of a database of everyone’s DNA would be pretty big.

Linda
Yep.

Mark
And with these new computational tools in the future, there’ll be a lot that you can know about a person with a copy of their DNA and half a day on a very fast computer, right? So that’s dangerous. And, you know, we’ll have to thread the needle very carefully on that. I think when it happens, and it will happen, governments will be irresistibly drawn to it because of the level of savings that they will get in their national health systems.

Linda
Yep.

Mark
And so they’ll probably rush to it because the savings will be enormous, right?

Linda
Yep.

Mark
And historically, when we rush into solutions with people’s personal data and we’re not very careful, not so great things happen. So, you know, it’s kind of a challenge. Right.

Linda
Yeah. Yeah, it’s a big one.

I was just reading today about, I think it was TissuPath, who were hacked, and a whole bunch of people’s pathology results wound up on the dark web.

Mark
Right. Yeah. So which is, you know, a horrible invasion of privacy. And of course it’s not only an invasion of privacy; there are malicious things people can do with that data if they get their hands on it.

And so that’s pretty bad. I think the other big thing I’m looking forward to in data is the things that are going to come out of a project called the Square Kilometre Array.

Linda
Yeah.

Mark
So the first one is about people, right? People are going to live better lives and healthier lives with data in our lifetimes, which is kind of cool. The Square Kilometre Array is a big project that’s being operated here in Western Australia. An international consortium is building what will be the world’s largest radio telescope. Not only the world’s largest telescope, it will be the largest radio telescope that it is possible to build on the surface of the Earth. And that instrument, when they turn it on towards the end of this decade, is going to see some very, very cool things. A lot of things that we can kind of predict and also some things we can’t predict. But if you’re like me and you like science fiction, and you’ve been following the SETI project for the last 30 years, the search for extraterrestrial intelligence: if there is an extraterrestrial intelligence in the immediate vicinity of our little planet, then an instrument like the SKA, the SKAO telescope, I think will in fact be able to tell you categorically, for the closest, I don’t know how many stars, but a lot, whether there’s someone there or not. So that would be very interesting.

Linda
Amazing.

Mark
Yeah, right. So either they’ll find someone or they’ll find no one. And then we will have a measurement on the density of life in the universe. Because right now we don’t. Right now we’ve only got ourselves. And one data point is not a good measurement. So we’re physicists, right? I’m a physicist. I like measurement. So soon we will actually have a measurement.

Linda
Yeah.

Mark
I think that will be very exciting. It will also be very cool if they find someone, because I think that will be, you know, neat. The SKAO, of course, will do a lot of other cool science as well. But, you know, I think the one that will probably most resonate with people will be the ET thing.

Linda
Yeah. Yeah, it’s the easiest to kind of relate to.

Mark
Yeah, because the other stuff they’re going to look into, you know, what dark matter is, the origins of the universe, looking at things like fast radio bursts and trying to understand the weird stuff in the universe, is all really cool. Most people don’t really understand it. And radio astronomers will love it, and physicists like myself will love it as well. But, you know, if I’m thinking about things that will change society and things I’m looking forward to, I think, yeah, the ET thing will probably change us in ways that are a little bit hard to predict. And the medicine one, well, that will change us in ways that are very easy to predict. It’ll just be really good.

Linda
That’s super cool. Thank you so much for joining us. Mark, this has been an amazing conversation. I have really loved it.

Mark
Oh, thanks, Linda. And thanks for inviting me. I’m glad we finally got to do this because we’ve been talking about this for ages.

Linda
Yes.

Mark
And every time we’ve tried to sync up to chat, it didn’t happen. So finally, yes, yeah, you’re welcome.

Linda
Yeah, it’s been complicated, but we managed it. Well done. Thanks so much.

Mark
Yeah you’re welcome.

Published by Dr Linda McIver
