Defining Artificial Intelligence
“AI is a strange science. It tries to define what it studies, while studying what it defines.” – Andy Kitchen
Linda used to teach Computer Science to year 11 students, and that year-long class always started with an introduction to Artificial Intelligence. Step 1 was: Define Intelligence. The class typically expected this to be easy, but the classroom debates and discussions went for hours, and the class could never really pin down what intelligence is – which was, of course, the point. We suspect that’s because intelligence is, at heart, indefinable. Intelligence, like ethics, is not one thing. It’s pluralistic. It can be expressed and measured in a multitude of ways. We recognise intelligence in certain behaviours, but it’s impossible to put together an essential list that defines intelligence without stumbling over a host of caveats and exceptions.
In the end, the class usually came up with a list of attributes that seemed necessary for intelligence. The list included things like the ability to:
- Be creative
- Be self-aware
- Feel and express emotion
- Solve problems
- Recognise patterns
- Be imaginative
- Have empathy
- Apply things learned in one context to a new context
- Use and understand language
The trouble is that every definition of intelligence requires a definition of another surprisingly challenging term. For example, we can assert that to be intelligent requires creativity, but what exactly do we mean by creativity? Well, the ability to create something. But if I create something by following a recipe – say, making a tasty curry – is that creative? I haven’t done anything new, I’ve simply followed someone else’s process to create something that has been created before. Perhaps if I vary the recipe by adding different vegetables, or a different combination of spices, that might be properly creative? So is creativity creating something that is wholly unlike anything created before? How new is new? Does painting a statue blue make it a new statue?
Another problem with trying to formalise intelligence with a precise definition is that it raises uncomfortable questions about our own intelligence (and hence our humanity). If you’re bad at pattern recognition, does that make you unintelligent? If you can’t visualise things in your mind, you have aphantasia, which is estimated to affect around 3.9% of the population to some degree. Does that make you unintelligent? What about if you can’t recognise faces, because you have prosopagnosia, which is estimated to affect around 3% of the population?
There is increasing evidence that dolphins and great apes can recognise themselves in a reflection, and some great apes have been taught to use elements of sign language. Crows can make and use tools. How many of the attributes of intelligence are necessary to meet the definition of intelligence? Which, if any, can we do without and still be considered intelligent? How, indeed, do we define and test for some of those attributes? What is our goal in defining intelligence? Is it to draw a line with humans on one side and all other creatures on the other, or is the goal to define intelligence precisely and see who – and what – falls on our side of the line?
The field of psychology has wrestled with these questions for decades, without reaching complete agreement on an answer. Since we can’t even define intelligence yet, it is fascinating that we seem to think we can create it. But that’s a classic trope in the world of technology, where tackling a problem without understanding, or even defining, it first often seems to be the default approach. We are obviously not the first people to ask these questions. So why do companies like Google DeepMind, OpenAI, and Anthropic claim to be building, or “solving”, intelligence? The likely answer is that they know they’re not, but they have to make hyperbolic claims to get venture capitalists to fill their coffers with investment funds. While bad, the alternative – that they truly think they can create machines with human-like or super-human intelligence (without a scientifically accepted definition, and working under the market incentives of capitalism) – seems even worse.
Activity – what is intelligence?
This is an excellent activity to run in class, at almost any year level. Have a class discussion on what it means to be intelligent. Come up with a list of attributes which define intelligence. Then, brainstorm examples of exceptions for as many of these attributes as possible.
It’s not only Linda’s classes that expected intelligence to be simple to define. Back in 1956, Marvin Minsky, John McCarthy, and their colleagues proposed the Dartmouth Summer Research Project on Artificial Intelligence, expecting that a small group could make significant progress on the problem in a single summer. Here we are in 2024, and not only have we not succeeded in creating truly intelligent software, we’re not even sure it’s possible.
It turns out that there is no definitive answer. No simple “if it does this, it’s definitely intelligent” criterion. Some basics are generally agreed upon, though. Intelligence requires, among other things, the ability to learn, to adapt to new situations, and to be creative.
Who/What passes the test?
In 1950, Alan Turing proposed a test that he thought could help determine whether a computer program is intelligent or not. He based it on a parlour game called the Imitation Game, in which participants in different rooms passed notes, and the aim was to see whether a man could be distinguished from a woman through their written conversation alone. In the computing version, now known as the Turing test, a person sits in a room with a keyboard and screen and chats, via the keyboard, with two different agents in other rooms. One agent is human, and one is a machine. If the person chats with a computer program and cannot tell that it is a program and not a person, then perhaps that program could be called intelligent.
And yet, along come Large Language Models, which could happily pass the Turing test but don’t have any level of understanding or awareness. They simply use a statistical process to calculate a plausible string of words. Are they intelligent?
Activity – Do Chatbots pass the Turing Test? Why/Why not?
Have your students hold conversations with a range of different chatbots. You could use ChatGPT, Claude, Gemini, service chatbots from company websites, or any others you can find. Ask them to identify responses that seem human, and responses that don’t. What makes them seem human? What makes them seem inhuman? How do they differ from conversations with friends, or with strangers? Can the students tell they are talking to an AI? If so, how?
There’s a wonderful quote by Adrian Tchaikovsky, from his novel “Service Model”:
“Humans have been reading personality and self-determination into inanimate phenomena since long before Alan Turing ever proposed a test. The level of complexity in interaction required for an artificial system to convince a human that it is a person is pathetically low.”
One thing is certain – there is no direct path from Large Language Models to truly intelligent systems. They are not, as they are sometimes hyped to be, the very last step before machines become intelligent. They are not even a logical step along the way.
A system that most of us would think of as real AI – something that can, more or less, think like us – is known in Computer Science as Artificial General Intelligence (AGI), and it is nowhere on the horizon. The term Artificial Intelligence is used instead to apply to anything produced using algorithmic and statistical techniques designed in the quest for real AI. It’s not intelligent. It just does some stuff that AI researchers came up with, and that might look a bit smart. In dim light. From the right angle. If you squint.
Most of these systems use one of a family of statistical algorithms known in the field as “Machine Learning”. “Learning” is, again, something of a misnomer. They’re not really doing what we think of as learning, which should involve understanding. They’re just getting progressively better, with feedback, at one very specific task, as the sketch below illustrates.
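To make that concrete, here is a minimal sketch of the feedback loop at the heart of machine learning, using a toy task and made-up numbers of our own (learning a single weight so that predictions match examples). Real systems adjust millions or billions of such weights, but the principle is the same:

```python
# A toy "machine learning" loop: the system gets better, with feedback, at one
# very specific task (here, learning the weight w in prediction = w * x).
# All numbers are made up for illustration.

data = [(1, 2), (2, 4), (3, 6)]  # training examples where the ideal rule is y = 2x

w = 0.0              # initial guess: the model starts out useless
learning_rate = 0.05

for step in range(100):           # repeat: guess, measure the error, adjust
    for x, y in data:
        prediction = w * x
        error = prediction - y    # feedback: how far off was the guess?
        w -= learning_rate * error * x   # nudge w to shrink the error

print(round(w, 3))  # ≈ 2.0 – better at this one task, with no understanding of it
```

Notice there is no comprehension anywhere in the loop: just repeated guessing, measuring, and adjusting.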
Even voice recognition AI struggles at times. I’m currently staying in a highly automated house, with friends, and this morning my “Hey Google, turn the A/C off” was greeted with “Sure, I’ll tell you a joke…”. And while it’s true that many people will tell you that my accent is a little unusual, Google has had plenty of time to get used to it, and “turn the A/C off” doesn’t sound much like “tell me a joke” – even in my weird accent.
Activity – How good is Voice Recognition?
Experiment with voice recognition – you could use voice recognition on Google Maps, on phones, text transcription in video conferencing systems like Google Meet, Teams, or Zoom, or the built in speech recognition in Windows or on a Mac. What words does it get right? What words does it get wrong? Is it different for different people? Does it consistently get some words wrong, or does it vary? Can you identify circumstances where it goes wrong, and circumstances where it’s mostly right, or does it seem random? Afterwards, have a class discussion about different uses for Voice Recognition (such as meeting transcription, medical appointment transcription, asking Siri or Google to switch lights or AC on or off, controlling equipment, etc), and consider where mistakes might be problematic. Are there situations where using voice recognition could be dangerous if it goes wrong?
It’s a shame, really, that the term AI has morphed into referring to systems that are really quite horribly dumb. And even if we don’t have to worry about AI becoming sentient and taking over the world any time soon, there are plenty of dangers in the cavalier way we use AI and machine learning. We tend to trust them too easily, and fail to evaluate them critically. That’s why it’s so important that kids learn about technology, including how to be rationally sceptical of it.
Large Language Models are problematic for a number of reasons, which we will discuss in more detail in future chapters, but for now, let’s have a quick look at the highlights (or, more accurately, lowlights) reel:
- Theft
- LLMs are trained on text data collected from the internet. Anyone with a web page or an article published somewhere online – and many book authors – has probably had their work slurped into the training data for one or more chatbots, without permission.
- Some folks argue that systems being trained on content from the internet are rather like artists learning from the works of the artists who came before them. After all, everything we experience becomes part of the inspiration for our creativity, so what’s the difference? The difference is that LLMs are not using these materials as inspiration. They are using them as content. Think of eating a cake, and liking it so much that you try to figure out how to make a similar cake. That’s using the existing cake as inspiration. But LLMs don’t do that. They break the cake into its component crumbs and use those crumbs as ingredients for their “new” cake. Sometimes you get the crumbs combined in a different way. Sometimes you get chunks of the original cake. LLMs don’t really generate. They regurgitate.
Activity – Fair use?
Find an opinion piece or other post arguing that training LLMs on creative work is theft, and one arguing that it’s fair use. Compare the arguments. Which do you believe? Are there merits to both, or is one argument much stronger than the other? Who benefits from defining this form of data use as fair use? Who is harmed by it? What is the financial cost or benefit, and to whom, of training LLMs on other people’s data, and what would be the cost or benefit of ruling it illegal (and to whom)?
- Energy and water use
- The energy and water needs for training and running Large Language Models are horrifying, particularly in the context of climate change. Training GPT-3 consumed an estimated 1,287 megawatt-hours of electricity. For context, that would power over 60,000 average Australian four-person homes for a day (see the back-of-envelope calculation below). That results in a dangerous amount of CO2 being released, for unspecified, and in many cases unrealised, benefit.
- What’s more, a paper by researchers at the University of California and the University of Texas estimated that training GPT-3 alone directly evaporated 700,000 litres of fresh water. They go on to estimate that AI systems may be responsible for the withdrawal of up to 6 billion cubic metres of water annually by 2027, which is over half the total annual water withdrawal of the entire United Kingdom.
- The water and energy use of these systems is a clear environmental threat, and directly contributes to climate change. Microsoft founder Bill Gates says the amount of energy used by AI doesn’t matter, because AI will solve climate change for us. But there is no evidence of AI ever having solved novel problems. Sure, in some very specific cases it is excellent at doing things it has been very precisely and carefully trained to do – things we already know how to do – but it has never yet solved a problem it hasn’t been taught to solve by someone who has already solved it. We have no evidence that it will ever be capable of solving climate change, but ample evidence that its voracious appetite for energy and water is rapidly making climate change worse.
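Here is the back-of-envelope calculation behind the homes-for-a-day comparison above. The daily household figure is our own assumption, so treat the result as an order-of-magnitude estimate, not a measurement:

```python
# Rough sanity check of the energy comparison (assumed figures, not measurements).
training_energy_kwh = 1_287 * 1_000   # 1,287 MWh of training energy, in kWh
home_daily_kwh = 21                   # assumed daily use of a 4-person Australian home

homes_for_a_day = training_energy_kwh / home_daily_kwh
print(f"{homes_for_a_day:,.0f} homes powered for one day")  # ≈ 61,000
```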
- Accuracy, or lack thereof
- It’s important to remember that large language models are not thinking. They are putting together statistically plausible text. Each word is carefully calculated to be a likely candidate to follow the words before it. The calculations are based on every piece of text the model has ever consumed. How does it figure out which word is most likely to be next? Simply put, by looking at which next words are most common in all of the text it has ever seen. There’s a bit more maths than that, but that’s basically the approach (there is a minimal sketch of the idea just after this list).
- This is also why the models have no concept of truth or accuracy, and the cause of the so-called “hallucinations” that resulted in Google’s Gemini system recommending people eat a rock a day, or use glue to stick toppings to pizza. It is only trying to put together words that commonly appear together – patterns it has seen before – not answers that are true. Sometimes it’s lucky, and those common phrases add up to a reasonably accurate answer. But sometimes they don’t. And it can’t tell you either way, because it is not designed to. Generative AI takes every bit of data it has ever been trained on and mixes it up, according to statistical filters, to produce some output.
- In some cases, LLMs have been found to regurgitate whole chunks of this stolen text in their answers. In others, they simply use the stolen text as fodder for text that they recompile into something “original”. Either way, if any private data is fed into a chatbot, that private information can be exposed when someone else uses that chatbot. Entering your personal information, or other private data such as student assignments or company documents, into a chatbot is potentially making it public.
- Inappropriate applications of Chatbots
- Another problem with this lack of truth and accuracy is that some of the ways chatbots are being used are wildly inappropriate. Many of these chatbots, such as the examples below, have never been independently evaluated to see whether they are helpful or harmful. Do they give dangerously inaccurate information? Do they amplify bias? Do they expose sensitive personal information without the consent of their users? In many cases we simply don’t know.
- For example, asking a chatbot for recipes can lead to issues such as being told to use glue to keep the toppings on your pizza, or to eat one small rock every day, as the chatbot takes things it has seen in forums like Reddit and regurgitates them. When chatbots are used as customer service agents, things can go horribly wrong, as in the case where an Air Canada customer service chatbot cited a refund policy that did not exist. When the customer challenged the airline before a tribunal, the company was told it had to honour the policy, because the customer had acted in good faith on the advice the chatbot gave.
- Chatbots also lack judgement. They do not think in any sense, and can’t even reliably do maths, much less exercise the kind of fine judgement used for marking, rating, or ranking things. Ask a chatbot to summarise an article in a certain number of words, and it will almost always fail. We asked Claude to summarise something in under 200 words and it gave us 232. Given that maths and counting are the one area where computers have (mostly) been reliable before, this might be a surprise, unless you really understand the way chatbots work. (We go deeper into this later in the book.)
- One crucial result of this lack of judgement is that, although many chatbot-based systems are marketed as being good at assignment marking, there is ample evidence that they are not capable of the kind of judgement a marker should exercise. Chatbots recognise patterns, so where your students’ work matches patterns the chatbot has seen before, it may allocate a plausible mark. But where a student takes a novel approach, or applies lateral thinking to a problem – both surely traits we wish to encourage – the chatbot will very likely miss the point and assign a wildly inappropriate grade.
- Chatbot- or LLM-based systems are increasingly popular as recruitment tools, judging applications, resumes, and interview responses. Once again, though, this is a very dangerous approach. There is growing evidence that these systems amplify bias, disqualify highly qualified candidates, and lack transparency and accountability.
- Chatbots are increasingly popular in healthcare, as doctors are sold systems that, they are told, can transcribe appointments, freeing them from taking notes, and allowing them to concentrate on the patients. Sounds wonderful, but there are many concerns with these systems. Recent evidence shows that the ongoing issue of AI hallucinations applies to these “transcriptions”, to the extent that they have fabricated both symptoms and treatments.
- Another issue is that of informed consent. Linda was recently present as the support person in a medical appointment where the doctor asked the patient’s consent to use AI transcription, saying it was perfectly safe because the audio was not retained, and she would delete the transcription after two weeks. Careful reading of the terms and conditions of that software, however, showed that the company providing the service reserves the right to share patients’ personal data with just about anyone, in order to improve the software.
- A lot of these issues boil down to a lack of:
- Independent evaluation
- Transparency, and
- Accountability
- We will talk more about these key issues later in the book.
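As promised above, here is a minimal sketch of the “most likely next word” idea, using a tiny made-up corpus of our own. Real LLMs condition on far more context and use neural networks rather than simple counts, but the flavour of “statistically plausible, not true” is the same:

```python
import random
from collections import Counter, defaultdict

# A toy next-word predictor (a bigram model) built from a tiny made-up corpus.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count, for each word, which words follow it and how often.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def generate(start, length=5):
    words = [start]
    for _ in range(length):
        options = following[words[-1]]
        if not options:          # dead end: no word ever followed this one
            break
        # Choose the next word in proportion to how often it followed this one.
        choices, weights = zip(*options.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the cat sat on the mat" – plausible, never checked for truth
```

Notice that nothing in the process checks whether the output is accurate; it is assembled purely from patterns in the training text.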
Activity – Putting the AI to the test
With everyone working with the same chatbot, set the class the task of finding questions that the chatbot answers incorrectly. Once there’s a selection of questions, try the same questions on different chatbots. Do they give the same wrong answers? How are they different? How can you test the answers given by a chatbot?
OR
With the class using a range of different chatbots, ask them the questions from a recent test the class sat – whether it’s Digital Technologies, Science, Humanities, Maths… – and then mark the chatbots. How many questions did each get right? Were different chatbots more or less successful? Were any of the answers ambiguous or hard to understand?
One other problem we need to touch on is that of equality of access. With chatbots increasingly offering low levels of access for free, and higher-level, better-performing systems coming at a cost, we risk further entrenching disadvantage, as the wealthiest people buy access to the best tools while the poorest people only have access to the low-performing free tier.
The rest of this book will contain more about things AI can do, and things it can’t (and why). It will talk about what problems we can use AI to solve right now, and what it might be able to do in the future. We’re also going to spend some time talking about hype cycles and hyperbole, and the very manipulative ways AI is being marketed and discussed right now. We’ll cover some of the ways in which AI can be harmful to us, and to the world. That naturally leads into the issue of bias in Machine Learning. Where and how can it appear, why, and what impact does that have on AI outcomes?
No discussion of AI would be complete without a look at our rights in AI systems, and what reasonable expectations we can and should have of the way these systems operate, which don’t always align well with the ways AI companies operate. And then we’ll look at a path to a better future, where AI systems are built with transparency, fairness, safety, social good, and wellbeing built in.
Most importantly, we’ll give you practical activities that you can use to explore AI for the classroom, and a list of helpful resources you and your students can use to go deeper into the ideas discussed here.
What’s our position on Artificial Intelligence? Well, like AI, our position is evolving. For now, we can say with certainty that there is no such thing as Artificial General Intelligence (AGI), or human-like intelligence. At some point in the future there might be, but despite all of the hype, it’s not imminent, and it certainly doesn’t exist yet. We don’t even have any good evidence that it’s possible to create true AI, though, equally, there’s no reason to believe that it isn’t. Our brains are physical things – just biological computers, really. A mass of electrical connections bathed in a sea of hormones. We should be able to puzzle them out and mimic them in some way. Truthfully, though, we are a surprisingly long way from being able to do that in any kind of meaningful fashion.
Activity – Computers versus Humans
Describe the things that humans are good at that computers aren’t. Describe the things that computers are good at that humans aren’t. Are there some ways that computers have already surpassed human abilities? Are there ways that human abilities can never be matched by a computer, and why/why not?
Extension: Discuss whether it’s even a good goal to try to reproduce all forms of human intelligence with computers.
https://codebots.com/artificial-intelligence/the-3-types-of-ai-is-the-third-even-possible
The important thing about our response to Artificial Intelligence systems, now and forever, is that we evaluate them critically and rationally, and demand evidence of their strengths and weaknesses, rather than simply taking the hype at face value.
