Ray Hilton on AI & Deep Tech


A fantastic conversation with Ray Hilton.

“On its own, data is relatively inert and doesn’t really have much value – the value comes from what you do with it and how you interpret it. And it can obviously be interpreted in many different ways.”

“It’s not just the fact you’ve identified these people. You’ve identified them and associated them with this treasure trove for a hacker: information about their health history. That’s a terrible situation, obviously, and we don’t want that to happen. And one of the easiest ways to avoid it happening is simply not collecting the data in the first place.”

Transcript

Linda: Welcome back to another episode of Make Me Data Literate. One of the great joys of this podcast is the wide range of people I get to talk to, and I have been pursuing this guest for some time – finally we’ve got a chance to have the conversation. So I’m very happy to have him here. Welcome, Ray Hilton.

Ray: Thank you. Thank you. It’s great to be here. 

Linda: So let’s start with the obvious question. Who are you and what do you do? 

Ray: Yes. Well, my name is Ray, as we’ve already established. I’m a principal consultant and I specialize in data and AI.

I’m also the CTO of a women’s health wearable-tracker startup called Kyrie, and I’m advising a few different deep tech startups across healthcare and climate.

And I also do some research on space domain awareness at Swinburne. So there’s a few things on my plate, and in between all that I have a family and fairly active weekends that I try to squeeze in as much as possible – burning the candle from every end using a flamethrower.

Linda: Do you eat and sleep at all? 

Ray: Ah, they’re overrated. Yeah, it is a bit of a squeeze. My supervisor said it best when he said he’s got academic ADHD and finds it hard to sit in any one domain. He’s an astrophysicist by training but is now supervising people across the life sciences – especially medical imagery using AI – and a few other things like UIs, visualization, augmented reality, that kind of stuff. I guess once you get to that kind of level, you’re able to pursue all your different interests under a single banner. But super interesting.

Linda: I’ve always thought that all the interesting stuff happens on the interdisciplinary edges anyway, so that really speaks to me. I’m not interested in a discipline for its own sake. I’m interested in what it can add to other disciplines, especially computing. It’s so obviously a service discipline: what can we do with it? What’s the point? Where can we get some really interesting insights? That’s really cool. You’re also, I believe, one of the organizers of the Melbourne Deep Tech Meetup. How did that happen?

Ray: Well, it was quite interesting, because COVID obviously put a bit of a dampener on in-person meetups. And post-pandemic, when things started opening up again, lots of them had moved online and very few were coming back to being in person. I think a lot of us, after being isolated for so long, were keen to get back to face-to-face things – not just in a work context, but with peers who shared common interests.

And I was trying to form one around space tech. Another friend of mine was trying to form one around biotech, or life sciences in general. And I had another friend who was looking at it from a climate point of view. And we were like, “Hang on a second. These are all interrelated in a way. If we take a step back from a specific topic, what we’re talking about is deep tech” – this longer-term translation of science, or, to say it pithily, putting science into production.

These things originate in research, and then founders take that, understand how they can map it to a product, and go on this journey. The “deep” part refers to the depth of knowledge, as well as the depth of pockets, required to bring this stuff to fruition.

But yeah, so we put that together and we started promoting it. And we get an interesting cross-section of people. We’ve obviously got students and academics, but we’ve also got business founders who are technical, as well as more non-technical product people. And they all have this shared interest. So to your point about the cross-disciplinary, the intersection between these different areas – I think that’s where this has hit a very positive vibe. People come from different subjects, different domains, and yet they have this common shared interest of trying to bring these research-based ideas to light.

And we actually had an interesting example of this, maybe a few months ago now, where we had one company come and talk – Mass Dynamics, who provide mass spectrometry as a service for the life sciences – and another company, Hone AG, who provide handheld grain and soil measurement, which uses spectroscopy. So while one was agriculture and the other life sciences, the technical underpinnings had a large overlap.

And so the technical people from both teams started chatting: “Oh, I didn’t realize you were using it that way. That’s interesting.” That’s what we hope for – that people meet others doing interesting things they may not have known about before. And through that, a rising tide lifts all boats. We hope we can interconnect these people and interesting things will come from it.

Linda: That’s fantastic. So it’s all about connections and about impact – doing real stuff with these research ideas. I love that. That’s awesome. Your specialization is data and AI. What did you have to learn to do your work, and how much of that came from your formal education? How much are you self-taught? I hesitate to use the term specialization with someone like you, but… your nominal specialization.

Ray: Yes. My specialization is my generalism. Jack of all trades, master of some, maybe? Well, I don’t think I ever formally trained in this area. I do have a computer science degree, but I only got that after working in computer science, so it’s a bit of a retroactive qualification.

But it did help. It certainly helped me appreciate computer science from a more fundamental science point of view, rather than just as coding websites or whatever I was doing at the time. It exposed me to a whole bunch of other languages and platforms, and some of the history – why things work certain ways. So without the computer science degree, it’s certainly possible to learn this stuff on the job. But you’re less likely to, let’s say, learn MIPS assembly language on the job, because it’s just not something that’s in demand or that people want to do. But it’s an interesting experience to go through to understand: okay, this is actually, at a low level, how this computer works. And that’s why we have higher-level languages, which are ultimately compiled down to this. Then you can appreciate that when I write code like this, it’s going to turn into code like that in the assembly language. The compiler’s responsibility is to work out what your intention is and produce the most optimal assembly. But sometimes it doesn’t get it right. These days compilers are pretty awesome and really good at this, but back in the day you had to be mindful that if you wrote that tight loop, or did this thing in that particular way, this was the impact it would have on how the assembly came out. So to debug performance, or whatever it may be, having that lower-level knowledge can help.

But yeah, I originally started in technology – I remember my dad buying a computer when I was about, gosh, I can’t remember, maybe seven or eight years old. It was an Einstein. It was back in the time when everyone had their own little brands of computers with their own operating systems. They weren’t all PC clones.

Linda: I had a Commodore 64. 

Ray: Ah, yeah, you had a cool computer. Mine didn’t have much software support. I think later we got a ZX Spectrum, and so then that was my one. I spent a lot of time writing code in BASIC on there, trying to make games and build out various silly little things. I can’t even remember what I used to do in it, but I loved playing around with graphics in BASIC and just trying to plot stuff on the screen. And it was actually an interesting way of learning other topics. So when we were learning about vectors in physics at school, I was like, ah, I’ve been trying to draw a spacecraft that moves around the screen. And I was going, oh, if you’re going this way, you move forward by this many pixels, and if you turn left, you go this way by that many pixels. But if I just represent this as a vector, I can dynamically calculate the orientation and the number of pixels, and then it’s much smoother and it acts more like a real thing. And, oh, then I can apply another vector to act as thrust.

And so it actually helped me learn topics like physics at school, by going home and trying to implement some of the math in code.

Linda: That’s amazing. I love that.

Ray: And so code is a great learning tool in that regard – back to the interdisciplinary aspect. I think these days, obviously, computers have come a long way and it’s much easier for kids to jump in and learn different languages, visual coding, things like that. So it’s much more normal to use computers as a teaching tool to explore things like physics.
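A minimal Python sketch of the vector-based movement Ray describes above – the frame count, turn rate, and thrust values are invented for illustration:

```python
import math

# A toy version of vector-based spacecraft movement:
# position, velocity, and heading are the "vectors" in question.
x, y = 0.0, 0.0      # position in pixels
vx, vy = 0.0, 0.0    # velocity in pixels per frame
heading = 0.0        # orientation in radians
thrust = 0.5         # acceleration applied along the current heading

for frame in range(10):
    heading += 0.1                    # the ship turns a little each frame
    vx += thrust * math.cos(heading)  # thrust decomposed into x and y
    vy += thrust * math.sin(heading)
    x += vx                           # velocity carries the ship forward
    y += vy
    print(f"frame {frame}: position = ({x:.1f}, {y:.1f})")
```

Because the orientation and speed fall out of the vector math, the motion curves smoothly instead of jumping by hand-tuned pixel offsets.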

Linda: Did you teach yourself the data and the AI parts? Did they just happen as part of your job and you learnt them on the job, or where did they come from?

Ray: Well, I think it was around the 2004–2005 mark, which is a long time ago. I did this computer science degree, and part of that was a module on AI. But it’s not AI as we think of it today – it was simple neural networks, like a Kohonen network. It was a really interesting topic, because I guess it was the first time I’d really been introduced to these ideas. The example they gave was putting RGB values into this Kohonen network and running it through a training process. And even though it starts out randomized, it ends up sorting the colors. The great thing is that you can visually see how it’s sorted the colors into what looks like a gradient – but every time you run it, it’s different, because you start off random and all it’s trying to do is bring colors that are close in value together. It’s a great teaching tool in that regard, because you can run it, visually see the output, play around with it, and build the whole thing end to end yourself in not many lines of code. And that intrigued me, because I was like, oh wow, this is interesting. I can build this notion of a neuron – this fundamental component – and I can wire it up in this way and get this effect. But if I wire it up in a different way, now I can build different things. As a foundational concept, you suddenly have this paradigm shift of, wow, what else can you do?
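For readers who want to see this, a minimal sketch of a one-dimensional Kohonen map (self-organizing map) sorting random RGB colours, along the lines Ray describes. The map size, decay schedule, and parameters are illustrative choices, not from the episode:

```python
import numpy as np

rng = np.random.default_rng(1)

# A 1-D Kohonen map: a row of nodes, each holding an RGB triple
n_nodes = 50
weights = rng.random((n_nodes, 3))   # start from random colours
samples = rng.random((500, 3))       # random training colours

n_steps = 2000
for t in range(n_steps):
    frac = t / n_steps
    lr = 0.5 * (1 - frac)                            # learning rate decays
    radius = max(1, int(n_nodes / 4 * (1 - frac)))   # neighbourhood shrinks
    c = samples[rng.integers(len(samples))]
    # Best-matching unit: the node whose colour is closest to the sample
    bmu = int(np.argmin(((weights - c) ** 2).sum(axis=1)))
    lo, hi = max(0, bmu - radius), min(n_nodes, bmu + radius + 1)
    # Pull the winner and its neighbours toward the sample colour
    weights[lo:hi] += lr * (c - weights[lo:hi])

# Adjacent nodes now hold similar colours: the rows read as a gradient
print(np.round(weights, 2))
```

Each row of `weights` is one node’s colour; after training, neighbouring rows blend into a gradient, and a different random seed gives a different gradient each run – exactly the behaviour described above.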

There’s an approximation result – the universal approximation theorem – that a large enough neural network can approximate essentially any continuous function. And so that’s really interesting: that’s why these large-scale neural networks can approximate very, very complex things. You don’t actually need to know what the function is, in the traditional software engineering style of “this is how the algorithm is going to work and I’m going to write the code.” In a neural network it can be learned, and if the network is large enough and complex enough and there’s enough data, you can pretty much do anything – you can synthesize entire songs based on examples of other people’s songs. And it’s really, technically, a function at the end of the day. It’s quite fascinating how that complexity can come out of what is really just a composition of simple mathematical expressions in a very high-dimensional space. But it’s fascinating.
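To make the “learned function” idea concrete, a hedged, minimal sketch: a one-hidden-layer network trained by plain gradient descent to approximate sin(x). The architecture and hyperparameters are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)                      # the "unknown" function to learn

# One hidden layer of 32 tanh units, trained by plain gradient descent
W1 = rng.normal(0, 1.0, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.1, (32, 1)); b2 = np.zeros(1)

lr = 0.05
for step in range(20_000):
    h = np.tanh(x @ W1 + b1)       # hidden activations
    pred = h @ W2 + b2             # network output
    err = pred - y                 # mean-squared-error residual
    # Backpropagate the gradients through both layers
    dW2 = h.T @ err / len(x); db2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)
    dW1 = x.T @ dh / len(x); db1 = dh.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("final MSE:", float((err ** 2).mean()))  # shrinks as the net fits sin(x)
```

No one wrote down how to compute a sine; the network is just a parameterized function whose parameters were adjusted until its outputs matched the examples.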

But to answer your question about learning that stuff: I did that in the comp sci degree, then I went on to do more and more data engineering as part of my day-to-day software engineering work. Big data obviously came around, and so I was doing more data engineering and learning how to efficiently manipulate data and build upon it. And it wasn’t until a few years later that I ended up working at a data and AI consultancy, which was an interesting experience because it was a brand-new company at the time. The industry wasn’t like it is today, where everyone and their dog wants a gen AI prototype. Back then nobody saw the value in it, and we had to spend a lot of time convincing people that there was value in their data, and that we could extract some of it through machine learning or AI techniques.

And that was the first time I really started using a lot of these data science and AI techniques on the job. Since then I’ve done a few other things, but the recurring theme is that data is now at the forefront of both product development and how businesses work, especially in startup spaces. You’re looking to be as efficient as possible, whether that’s understanding your customer or understanding the business itself. You want to make sure you’re doing the right things, mitigating risks, identifying potential revenue, all that kind of stuff. It’s all just data at the end of the day.

Linda: Yep. So it sounds like you’ve applied a reasonably rigorous scientific approach to your data analysis and your machine learning. Do you have feelings about the generative AI scene and the kinds of uses we’re seeing now – people leaping on board without necessarily knowing why they’re leaping on board?

Ray: Yeah. I guess fundamentally my concern is that we want to avoid another AI winter. If the reality doesn’t match up with the hype, then people will become disappointed, and that will lead to a reduction in investment, or at least more conservative investment in these areas. And I think many people would say that point is coming – the bubble will burst. That’s not to say it will necessarily be a winter, but it might be harder to raise money just on the promise of having a .ai suffix in your business name.

Linda: Yeah. 

Ray: But what we don’t want is for the well to be poisoned so much that people actively turn away from it. So I think we need to strike a very careful balance between talking about the potential of something versus overhyping it to an extent where it can never be satisfied, and potentially sends us back into a winter for another decade or so.

Linda: Yep. 

Ray: I don’t necessarily think it will go that far, but some of the claims around gen AI are pushing too far in that direction – all this talk of AGI, artificial general intelligence, human-like flexible intelligence across a number of areas. I understand why people do it: they want to create excitement in this area. But it’s probably quite a while away, and it is bordering on sci-fi claims. Of course, maybe even ChatGPT would seem like sci-fi to someone from a few years ago, so things do progress. But it’s hard to see how all the things being talked about can be fulfilled in such a short space of time. The reality of gen AI, I think, can be seen in the various projects we do, like prototypes.

Someone plays with ChatGPT, or they play with an example chatbot, and they go, oh, this is great, this is interesting. Then they want to use it in their own situation. But once they look at the effort involved in taking what they see as a free tool, or at least a very cheap tool, customizing it for their particular domain, and doing all the additional work, it suddenly stops being a free tool. It becomes not just a large initial project to build and deploy, but also an ongoing cost in terms of maintenance, retraining, and things like access control and operational monitoring – all these things we potentially don’t consider.

We just think, oh yeah, we type into the chatbot, the chatbot gives us an answer – but there’s all this other stuff. If you want to productionize it and make it reliable, it’s just like any other software project: there’s all this infrastructure and governance that needs to go behind it as well. So all of a sudden the cost becomes larger. And on the value side – okay, some businesses manage to deliver a product that makes more sense with gen AI, but sometimes it’s this feature on the side. When it’s actually deployed, some people might use it initially, but unless it’s offering significant value, or a significant new way of exploring your product, it may only remain that way. So the total cost–benefit often doesn’t make sense.

And you link that with the other risks. I think Air Canada suffered this, where their chatbot just made up a policy, which it was then found they had to honor. When risks like that start popping up, people hear about it and say, okay, if we’re going to build a gen AI project, we need to make sure that doesn’t happen. But listing all the things you don’t want to happen is quite tricky. You can’t simply say to the chatbot, oh yeah, and don’t do anything I don’t want you to do.

And there’s this tension: something that’s flexible and works in natural language inherently has a generative element to it, and the generative element means it is going to make stuff up. Controlling that, keeping it under wraps, and not letting it go off-piste is actually really hard. It’s a long-term project, and you’re going to constantly need to change it, because the model keeps changing, user behavior changes, the prompts change. So you have to constantly re-evaluate all these negative cases and put in guardrails that defend against them. It’s a much larger money pit than people realize.

Another aspect is that this is currently massively subsidized via essentially hype funding – so much venture capital has poured into Anthropic, OpenAI, etc. that it’s essentially covering most of the costs. But if you look at the stats around these things, both in terms of energy consumption and cloud computing costs, they are hungry beasts. They use a lot of energy, and therefore incur a lot of cloud computing cost. OpenAI had that relationship with Microsoft to provide a lot of that at cost or below cost, but that’s not being passed on to the general consumer. So down the track, if that funding doesn’t continue, these businesses will have to pass those costs on to end users. And anyone who has built things on top of these platforms will then face a choice: either work out how to pay for it – which changes the cost–benefit calculation – or try to spin up their own self-hosted AI solution. And then the cost and complexity increase yet again if you have to in-house it. It’s a very long-winded answer.

Linda: Not to mention water. Yeah, it’s going to be interesting to see how all of that plays out. Is there something you wish everyone knew about data? You do consulting – you work with companies who want data stuff but don’t necessarily have the expertise – is there something that would make your life easier if everybody understood it from the start?

Ray: Yeah, that’s a good question. I think, on its own, data is relatively inert and doesn’t really have much value – the value comes from what you do with it and how you interpret it. And it can obviously be interpreted in many different ways.

As an industry, I think we used to have this philosophy of: just collect everything you possibly can, and we’ll work out how to monetize it later. That sounds great on paper, because data is cheap to store, but the process of capturing it and then interpreting it can get very, very expensive. So now – maybe because we’ve refined and matured as an industry – we’re much more likely to say: don’t collect everything. Let’s think about this. Let’s think about things like PII; we don’t want that going into long-term storage, as much as possible.

Linda: Can you explain that for our non-technical audience? 

Ray: Oh – PII, personally identifiable information. We’ve seen some big data breaches over the last few years, and this is an example of: the more data you collect in one place, especially if it has that kind of identifying information in it, the bigger the honeypot you create. Eventually, if these things get hacked and exported, not only is that critical business data, it’s identifying information – it could be healthcare information. It’s not just the fact you’ve identified these people; you’ve identified them and associated them with this treasure trove for a hacker: information about their health history. That’s a terrible situation, obviously, and we don’t want it to happen. One of the easiest ways to avoid it happening is simply not collecting the data in the first place. And there’s a tension there – the most secure system is one you cannot access, or that has nothing inside it – between that and the practical reality that we do need some data.

Yeah, so I think these days we’re a bit more nuanced about it. We try not to identify things; we try to scrub things that are identifying. Maybe we use something like differential privacy, where you can remove or randomize certain data while keeping a consistent statistical shape to the overall data in aggregate. And that takes thinking, and that takes work.
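A minimal sketch of the kind of technique Ray mentions – here the Laplace mechanism from differential privacy, which releases a noisy mean rather than the raw values. The function name, data, and parameters are illustrative:

```python
import numpy as np

def private_mean(values, lower, upper, epsilon, rng):
    """Differentially private mean via the Laplace mechanism.

    Clipping each value to [lower, upper] bounds how much any one
    person can change the mean, so the noise scale can be
    (upper - lower) / (n * epsilon).
    """
    values = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(values)
    return values.mean() + rng.laplace(0.0, sensitivity / epsilon)

rng = np.random.default_rng(42)
ages = rng.integers(18, 90, size=10_000)   # pretend user data
print("true mean:             ", ages.mean())
print("private mean (eps=0.5):", private_mean(ages, 18, 90, 0.5, rng))
```

The aggregate stays statistically useful while any individual’s exact value is hidden behind the noise – the “consistent statistical shape” Ray refers to.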

And because we’re cautious about the risks of amassing too much data, we’re now shifting left a bit – like I say, not collecting everything blindly, making more deliberate decisions about it. But that also impacts how we can use the data, because if we don’t collect as much, or we’re scrubbing information, it means we cannot answer certain questions later on. So we have to be a lot more mindful about the whole process, rather than just collecting everything and working it out later.

Linda: Yeah. 

Ray: Yes – it’s not quite a data question, but going back to the gen AI question, the data used to train things is inherently historical. So what you’re really doing is looking backwards, as it were: you’re using the past to try and predict the future, and there are obviously limitations to that. As much as we can build very, very clever gen AI things that can generate images and text, really what they’re doing is piecing things together – synthesizing things based on historical data. One could argue there’s not really anything technically new being generated; it’s just a combination of other things that have already existed.

You can see that in style transfer, where images are created in the style of an artist using examples of that artist, combined with knowledge of cats or whatever, to generate a new image. But it’s not really creating something new. It’s not really thinking about it. It’s not conceptualizing it. It’s not contributing fundamentally new thought or ideas. I think that’s an important thing to keep in mind, especially when people are using these gen AI chatbots: when you’re asking it a question, it’s really just predicting, statistically, the best word to come after the words you just put there. That’s all it’s doing – chaining together these words, or in the case of an image, pixels, or whatever it may be. From a Turing test point of view, perhaps it passes, and it convinces us it’s actually a thinking, feeling agent capable of conceptualizing what we’re asking. But under the hood it’s just probabilistic. What was the phrase? The stochastic parrot.
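A toy illustration of “predicting the best next word”: a bigram model built from a made-up scrap of text. Real LLMs condition on long contexts with learned representations, but the sampling loop is conceptually similar:

```python
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat ate the fish and the dog sat".split()

# Count bigrams: how often each word follows each other word
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(prev):
    counts = follows[prev]
    if not counts:                        # dead end: no observed continuation
        return random.choice(corpus)
    words, weights = zip(*counts.items())
    # Sample the next word in proportion to how often it followed `prev`
    return random.choices(words, weights=weights)[0]

word = "the"
out = [word]
for _ in range(8):
    word = next_word(word)
    out.append(word)
print(" ".join(out))   # plausible-looking text, no understanding behind it
```

Everything the model can “say” is a recombination of what it has already seen – the stochastic parrot in miniature.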

Linda: Yes, I think that’s very convincing. It is – it’s extremely convincing – but it’s really interesting when you start coming up against the edges of it. A friend of mine tried to use the image generators to generate a three-legged puffin. And it was so clear, as you go through the different prompts she used and the different attempts: it can show you a puffin, because it’s seen pictures labelled “puffin” before, but it has no concept of what a leg is or where one might realistically be, and so it can’t give you a three-legged puffin. It hasn’t seen a three-legged puffin. There’s just no understanding there – no “I can give you what you ask for because I understand the pieces and can put them together.” A five-year-old could sketch a three-legged puffin. It’d be a bit rudimentary, but they know what legs are and they know what birds are, and they can put the ideas together. Whereas this can just give you things it’s seen before in slightly different arrangements, which is something I don’t think we really intuitively understand. I think we don’t want to understand it – we want them to be intelligent in some sense.

Ray: Yeah, I think a good example of that is the new video generation. I’m sure we’ve all seen examples of the very trippy, psychedelic videos where there’s no coherence. Each frame is being interpreted individually, so as the video progresses, things that weren’t anything in a scene suddenly manifest into a person or a dog and then transmute. Because there’s no underlying concept or continuity in those examples, every bit of noise or shape can subsequently be interpreted as something completely different. That’s what creates this dream-like, psychedelic video.

But modern ones have better coherence in time, and you can still see how they struggle with physics. If you’re generating something new and there’s an object – let’s say it’s a car crash or something like that – it’s seen examples of that from movies, and it can generate some of it. But you’ll notice the physics always feels off, not quite right, because there’s no underlying understanding of how the physical world works.

It’s replicating images from videos; it’s not replicating the underlying mechanisms that cause them to happen. So I think sometimes those edge cases are telling about how the internals actually work. But for the vast majority of cases – if we’re just generating something like a fun image of a puffin with three legs – you’re not asking it for something photorealistic or a real-world thing. You’re asking for something crazy that hasn’t really been seen before, so that kind of creative interpretation is perfectly acceptable. Yeah.

Linda: We’ve been wandering around AI and I’ve lost sight of the questions a little bit, but it’s been super interesting. What are the worst data mistakes you’ve seen?

Ray: Well, as I mentioned before, these big data breaches are probably one of the larger risks when you’re amassing data, especially private data. Yes, there are fines associated with them, and there’s loss of trust in the brand, but the big impact is on the individuals – you can’t claim that data back. You can’t scrub it once it’s out there. When the – was it Medibank? – breach occurred, I think they were providing free replacement driving licences and things like that, so people’s old ones were scrubbed. But that obviously doesn’t get rid of the old information. The information is already out there. It doesn’t get rid of all the medical records associated with their identity.

So it’s not enough. The best thing to do, like I said before, is avoid collecting stuff you don’t need to collect, and don’t keep it for any longer than you need it. And this is a good recent development: consent. Don’t use people’s data willy-nilly – only if people opt in, for that specific purpose, for that specific period of time. That way they have at least a degree of control over when their data is being replicated or used for something. I think the general public are much more aware of these things now, because of these incidents, and so they demand that companies notify them when their data is going to be used. We have seen examples where a company goes and trains its AI models on private personal data without the customers’ knowledge – that is now an outrageous thing to do, and there will be media articles about it. And that’s a good thing. At least we’re holding people to account and raising it as an issue now, whereas in the past I think nobody would even have known.

Linda: Yeah, it is complicated there, and I think our understanding is still lagging behind industry behaviors. I was in a medical appointment recently where the doctor said, “I use AI to transcribe my appointments. But it’s fine, because it doesn’t keep the audio, it just keeps the text, and I delete it after two weeks, so it won’t be in the system.” But I went and looked at the privacy policy of the system she uses, and it clearly states that it reserves the right to share your personal data with third parties, its software developers – whoever it wants, basically – for the purposes of improving the system.

So she says it’s deleted after two weeks, but it’s in the training, and they reserve the right to share it with anyone. And I don’t think she knew that – the doctor – and we certainly weren’t informed of it as the patients in the appointment. I don’t think most people would know the questions to ask to drill down to what’s actually happening with the data in that situation. That’s going to come out, and it’s just outrageous.

Ray: Yeah, it’s a fundamental issue of who owns your data. I think most people would assume they own their data, but it seems many companies in this space can get away with laying claim to it, and they move it around as an asset. I fear that without adequate legislation that puts ownership squarely, by default, in the hands of the citizen – so that they have to actively opt in and give consent for it to be used in certain ways – it’s doubtful that companies are going to change their behavior around this much. We do need a common framework.

Linda: We do, and we need legislators who understand it. 

Ray: Yes, yes. And I think about this recent trend in AI, but gen AI specifically, of training on, basically, the internet – comments on Reddit or discussion groups, things that may once have existed only where you’d find them in a Google search, but that now make up part of the training corpus of these large language models. That person wrote that comment with the intention of replying to someone on some comment thread somewhere. They didn’t write it, maybe ten years ago, with the intent of it contributing to building a product like ChatGPT. That wasn’t their intent. They didn’t consent to that.

Yeah, but I guess somewhere down in the terms and conditions it says Reddit’s allowed to do that, or Google will come along and just scrape the whole internet. This has happened before – people were annoyed about Google scraping the internet, indexing things and displaying search results. French news agencies were upset about that back in the day. And this is another iteration of that: now other products are being built on people’s public data. But does “public” mean it’s up for grabs?

Linda: Yeah. 

Ray: Just because you wrote it in one context doesn’t mean it’s legitimate in another context. 

Linda: Yeah, and I feel that the things you tell your doctor are not public data. 

Ray: I think it’s almost a leakage of the mentality of how we treat public data into other things. Oh, if we scrape enough data from enough places on the internet, we can build this. If we want to build a model for private data, we’ll just scrape all the private data, put it into one place, and do it there. And it kind of skips over the part of: who owns this? How do you retain privacy in all this?

Linda: Yeah.

Ray: What about the individual user? It’s their data. That’s their life. There are techniques to find a balance – federated learning, for example, where – I believe Apple do this – they can learn on the device. They can use your photos or messages and things like that, and learn from them, but then they take just the weights and biases from that. Your data remains on your device; without exposing it to the internet, they can learn some classifications and things like that, upload only those learned parameters, and combine them with everyone else’s, along with some core models, to create a new model. That way you can learn from people’s personal information without their personal information ever leaving the device. So there are things like that you can do – you can still do ML without compromising privacy – but it is much more complicated.
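A minimal sketch of the federated-averaging idea Ray outlines, with a linear model standing in for the on-device learner. The device data, model, and round counts are all invented for illustration (real systems like Apple’s add secure aggregation and much more):

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, steps=50):
    """Gradient steps on one device's private data (linear model, MSE)."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Three devices, each with data that never leaves the device
true_w = np.array([2.0, -1.0])
devices = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(0, 0.1, 100)
    devices.append((X, y))

global_w = np.zeros(2)
for round_ in range(10):
    # Each device trains locally; only the updated weights are shared
    local_ws = [local_update(global_w, X, y) for X, y in devices]
    global_w = np.mean(local_ws, axis=0)   # the server averages the weights

print("recovered weights:", global_w)      # close to [2, -1]
```

The server never sees any `X` or `y` – only weight vectors – yet the averaged model still learns the pattern present across all the devices.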

Linda: Yeah.

Ray: And, you know, obviously people end up wanting to do the cheaper thing – that’s the motivation. But if legislation were put in place to motivate people… there are options out there. It’s not like it’s impossible.

Linda: Yeah. It requires legislation. I think we’ve seen very clearly over the decades that we can’t expect companies to act in society’s best interest unless we force them to. Have you ever seen data deliberately misused? And what do you look for – how do you spot it?

Ray: Well, I guess there are cases where it has famously been deliberately misused. A good example is the Centrelink fiasco, where people’s data was being used to make probabilistic assessments as to whether people owed money, and demands were sent out without a human in the loop.

Linda: Yeah.

Ray: And someone obviously designed this, signed off on it, and put it into production. People can point fingers as much as they want; the fact is it happened. It was a misuse of data – a callous misuse of data with no thought. Well, maybe they did think about the individuals involved, but they were optimizing for cost recovery.

Linda: Yeah. 

Ray: Not what’s best for the individuals. So I think that’s a shameful misuse of data by our government. And maybe that puts another side to the legislation thing: the people who are going to legislate it are also misusers of it.

Linda: Hmm.

Ray: But I think that one is particularly egregious, because these are people who are supposed to be in a position of trust and authority. If they cannot be trusted, it leaves a very sour taste: well, what else is going on? The government has the ability to hold huge amounts of data across any number of different areas.

Linda: Hmm.

Ray: Do we just have to wait until there’s a whistleblower or a leak to find out the next thing – how many other things are going on like this? Even if there is nothing else going on, it sows a level of distrust between citizens and their government, and that is just not necessary and not constructive in any way.

And I don’t think the data was collected with that intention in mind, either. That’s the other thing: the data was just there.

Linda: Yeah. 

Ray: And someone looked at it and said, yes, we can use this data to do this thing. So that –

Linda: Comes back to consent, doesn’t it?

Ray: Comes back to consent. I think there was also something recently where a medical imaging startup was using customers’ medical images to train machine learning models. And that, again, was without customers’ consent.

Linda: Yeah. 

Ray: And so that’s been in the media recently. That’s an interesting one. I’m sure somewhere in their fine print they say, oh, if you give us your images, we’re allowed to use them for other purposes. But putting it in the terms and conditions is very different from getting people’s enthusiastic and positive consent –

Linda: And informed.

Ray: Informed consent, yeah. People need to have a full understanding of what something is – otherwise, can it really be considered consent? If they don’t know what machine learning is, they don’t know what this AI model is for, they’re not even aware that this is another thing the company does – they can’t consent if they don’t know those things.

Linda: Yeah. It’s like consenting to an operation – you think you’re consenting to an appendectomy, and they decide while you’re in there that they’re going to take a leg. That definitely requires a different consent form.

Ray: “While we’re in the area, we thought we’d do a few other things.” Yeah.

Linda: Yeah, exactly. No, I’d like to be involved in that conversation, please. We have a lot to learn, I think, about consent where data is concerned. What’s legal and what’s ethical are not necessarily the same thing.

Ray: Yes, absolutely. 

Linda: What’s the first question you ask when you look at graphs in the media?

Ray: I saw a good example of this the other day – I can’t remember exactly, but a station in the US had a graph of opinion-poll results. It was like: yes, no, maybe, don’t know. And the 13% result was the largest bar, while the 22% and 25% ones were really small.

Linda: Oh, it burns!

Ray: There was no real y-axis – the bar heights were whatever fit the narrative. That was a particularly bad example, in that there was no y-axis at all. But that would be my answer: what are the axes? What are they saying? What is the scale? And I did see another one the other day, I think from Australia. I can’t remember what it was tracking – it might have been COVID cases or something like that – but the y-axis went in steps of, I think, tens, then thirties, then fifties.

Linda: That’s creative!

Ray: And it wasn’t logarithmic or anything like that. They obviously had a maximum number and then fiddled around with the y-axis to make the graph look a particular way. It’s such a dishonest practice. And for a lot of people, especially when something is flashed up on a TV screen, you’re not sitting there pausing and analyzing the graph. You just go: oh, well, visually that thing’s bigger than the other thing, it must be bad. Or good. Depending on what they’re trying to say.

Linda: Yeah. 

Ray: So I think that’s a hard one. It’s different if it’s in print or online, because you can sit there and study it a bit more.
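The axis trick is easy to reproduce. A small matplotlib sketch with invented poll numbers, plotting the same data twice – once honestly, once with a truncated y-axis:

```python
import matplotlib.pyplot as plt

# The same invented poll results, presented two ways
labels = ["Yes", "No", "Don't know"]
values = [25, 22, 13]

fig, (honest, misleading) = plt.subplots(1, 2, figsize=(8, 3))

honest.bar(labels, values)
honest.set_ylim(0, 30)          # axis starts at zero: a fair comparison
honest.set_title("y-axis from 0")

misleading.bar(labels, values)
misleading.set_ylim(12, 26)     # truncated axis inflates small gaps
misleading.set_title("y-axis from 12")

for ax in (honest, misleading):
    ax.set_ylabel("% of respondents")

plt.tight_layout()
plt.show()
```

On the truncated panel, a 3-point gap looks like a landslide – the distortion Ray describes, produced by nothing more than one `set_ylim` call.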

Linda: How many people do, though – that’s the question. How many people do, and how many people know what questions to ask? That’s part of why I do what I do. What excites you about data?

Ray: I think there’s a lot of potential in using data to solve some pressing challenges: energy, climate, agriculture, food security. We need to be more efficient – we can’t be so wasteful with things like energy or produce. Sure, we can buy electric cars, but the better thing might be to not have cars as much, right? More public transport. But these are logistical problems. What are the best routes to have? What impact does pricing have on adoption of public transport? How busy does the tram need to be before people won’t get the tram and will drive their car instead? Things like that.

So if we understand all these factors, and we can do some predictive modelling, we might be able to come up with better policies. Personally, I think we should have free public transport. Modelling that and seeing what the impact would be, how much it would change things – we could potentially impact climate, energy, and even have a healthier workforce with policies like that. It’s a multifaceted impact. But yeah, data helps us look across all these different areas.

And it’s a tough job, right? We have to model all these different things and pull it all together. But it is possible, and I think that’s the power there. Like I said before, data itself is inert. What we do with it – how we model on it, how we use it to solve these problems – is where the value lies.

And efficiency is a big thing. We always talk about wanting more power, more energy generated, and there are discussions about new power plants – whether we should have more coal power plants because we need more energy. Okay, on the one hand we might need more energy, but we should also be very conscious about how we’re using it. Simple things: we have a huge amount of rooftop solar but not many batteries, which means we’ve got lots of free energy during the day. So run your energy-hungry devices, like the tumble dryer, during the day when there’s plenty of sun, and you’re more likely to offset that energy cost. As soon as you use it at night, you’re drawing power from the grid – and therefore, in Australia, probably from coal. Partly that’s about educating the public, but it’s also, again, the logistical thing of working out a model of the best time to use certain types of power. Oh, it’s particularly windy today – that means the wind farms are going to be generating more power, which means this is how we should use it, and how we should proactively adapt to using the power as it’s available.

So yeah, overall I’m optimistic and excited about using data in those areas a lot more. And – going back to the interdisciplinary thing – it draws on so many different areas of science: modelling so many different aspects of the world in order to build these kinds of ensemble solutions. Think of simple things like an individual’s behavior in that place, with that kind of weather condition. Well, there’s a cohort of people like that, and together they’re going to create a spike in demand at this time of day in that region. There are so many different things you need to model just to make that one prediction. And then we can map this out across the entire nation, look at different areas, and also look at the climate impact as well. It’s not a simple undertaking, but it’s definitely deeply, deeply fascinating.

Linda: Thank you so much. This has been a fabulous conversation. I love the way every one of these goes in a different direction and brings up a whole range of new issues. It’s been really interesting talking to you. Thanks for coming. 

Ray: Thanks for having me. 
