It’s a little frustrating when I hear people say that “sometimes” chatbots hallucinate.
“Oh, sure it makes mistakes, but it’s pretty good a lot of the time.”
“Occasionally it hallucinates, but it saves me so much time!”
An academic friend of mine who really should know better recently used a chatbot to find a gluten-free restaurant we could go to for dinner in London, which is how we wound up at a restaurant that served literally nothing gluten-free at all. Apart from some of the alcohol. That was an interesting night.
We have this idea that a Large Language Model chatbot is trying to give you the right answer. It is trying to be accurate. It’s just that sometimes it messes up.
Press releases tout the virtues of the “Advanced reasoning” in newer versions, as though reasoning is something these models do.
Even though the design of the systems deliberately reinforces the idea that they are sentient reasoners*, chatbots are still merely statistical text extruders, or, as Professor Emily Bender says, racist piles of linear algebra.
They are not trying to produce correct answers. They are not trying to be accurate. They are simply producing statistically plausible strings of text that pattern-match, in some unspecified way, the text you used as a prompt. Consider this example, produced by ChatGPT today:

It doesn’t merely give the wrong answer. It actually shows you the “reasoning” it used to get the answer. The thing is, it’s not reasoning. It’s not a reasoning machine. The fact that the statistically plausible text is occasionally correct is the surprise. That it regularly gets things wrong is exactly how these systems are intended to work.
It’s not predictably wrong, of course – that would actually make it a little more useful!
Let’s try babbling baby bonobos.

Two bs in babbling, two in baby, three in bonobos. Sometimes too few, sometimes right, sometimes too many. Okey dokey then.
(I particularly like the way, when it says a word contains 3 bs, it gives you the bs in brackets: (b,b,b), as though that helps in some way. At least the number of bs in the brackets seems to correspond with the number in its answer, I suppose!)
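Counting letters, for what it’s worth, is a trivial job for anything that actually counts. Purely as a point of contrast (this is ordinary code, and has nothing to do with how a chatbot produces its answers), a few lines of Python get it right every single time:

```python
# Counting letters is deterministic: the same input always gives the same, correct
# answer, because the code actually counts rather than predicting likely text.
for word in ["babbling", "baby", "bonobos"]:
    print(f"{word}: {word.count('b')} bs")

# babbling: 3 bs
# baby: 2 bs
# bonobos: 2 bs
```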
The argument that it’s easy to spot the mistakes, or to understand what it will get right and what it will get wrong, really badly misses the point. The point is that to Large Language Models there is no right or wrong. There are just patterns, and strings of text. It doesn’t understand them. It doesn’t reason about them. It doesn’t try to get them right. It simply produces strings of words placed together by a statistical process that takes all the words it has ever been fed (all that stolen creative work, plus an alarming proportion of the festering cesspits of the internet) and calculates which words are most likely to come next.
It’s not even entirely due to garbage in, garbage out, or false information in its training data. When your chatbot’s “writing” is done by a process of statistical likelihood, even if your training data is entirely accurate and correct, your output is going to be wrong sometimes. Because it’s not even reproducing any facts that might exist in its data. It’s constructing strings of text by statistical processes.
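To make “statistical processes” a little more concrete, here is a deliberately toy sketch in Python. The words and probabilities are entirely invented for illustration, and nothing about it resembles the scale or machinery of a real model, but it captures the essential point: the next word is chosen by weighted chance, and no step anywhere checks whether the result is true.

```python
import random

# A toy "language model": for each short context, an invented distribution over
# possible next words. The numbers are made up; they stand in for "how often did
# word sequences like this appear in the training text".
next_word_probs = {
    ("the", "restaurant", "is"): {"gluten": 0.4, "fully": 0.35, "closed": 0.25},
    ("restaurant", "is", "gluten"): {"free": 0.9, "friendly": 0.1},
}

def next_word(context):
    """Pick the next word by weighted chance. No step checks whether it is true."""
    dist = next_word_probs[tuple(context[-3:])]
    return random.choices(list(dist.keys()), weights=list(dist.values()), k=1)[0]

sentence = ["the", "restaurant", "is"]
sentence.append(next_word(sentence))      # maybe "gluten", maybe "fully", maybe "closed"
if sentence[-1] == "gluten":
    sentence.append(next_word(sentence))  # almost certainly "free"

print(" ".join(sentence))
# Plausible-sounding text comes out either way; whether the restaurant actually
# serves anything gluten-free never enters into the calculation.
```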
Regardless of what the frantic hype machine tells us, chatbots don’t think, or reason, or try to give us correct answers. They are racist lie generators that happen, by bizarre coincidence, to occasionally get the right answer.
Incidentally, I asked ChatGPT if it was a racist pile of linear algebra and my prompt was removed as offensive.
* (see Emily Bender and Alex Hanna’s book The AI Con)
