
Set an AI to Catch an AI

[Photo: Linda McIver]

More and more stories are coming out about universities and schools using AI to detect the use of AI in student assignments, and it’s no big surprise. AI companies love a grift, and if you’ve got one grift to help students cheat, the obvious next step is to have another to help the teachers “detect” the cheating. Leaving aside for a moment the ethics, which are the equivalent of paying the people setting land mines to tell you where the land mines are, let’s talk about the practicalities of using Large Language Models to detect the use of Large Language Models.

“Set a thief to catch a thief” makes perfect sense, doesn’t it? Because thieves know all the little tricks, all the sneaky shortcuts, how thieves think. They can analyse the scene and judge the most likely entry points or techniques used. I wouldn’t be surprised if it really works. For thieves. But setting an LLM to catch an LLM? That’s where the analogy falls down. Because LLMs don’t know anything. They don’t think anything. They cannot analyse, assess, judge, or understand. They just spit out stuff that sounds plausible.

They can detect patterns quite well, but that’s problematic in itself, because any patterns arising in LLM writing are patterns that came from the stolen human writing that was used to train them. LLM output sounds like human writing because it is a mashup of the human writing it has read before. So this fuss about the use of the word “delve”, or of em dashes, is nonsensical. LLMs use them because they have seen humans use them. I love a good em dash, and anyone who suggests I’ve used LLMs in my writing will get a short (spoiler alert: it will not, in fact, be short) but painful lesson in the realities of the racist piles of linear algebra* that the tech industry lovingly calls AI.
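To make that concrete, here’s a deliberately naive toy sketch in Python. It is entirely hypothetical (the SUSPECT_WORDS watch-list and the naive_ai_score function are invented for illustration, and no real detector is quite this crude), but it shows the false positive problem: it “scores” text by counting surface tells like “delve” and em dashes, and promptly flags a perfectly human sentence.

```python
import re

# Hypothetical watch-list of "AI-sounding" words, invented purely for this sketch.
SUSPECT_WORDS = {"delve", "tapestry", "moreover"}

def naive_ai_score(text: str) -> float:
    """Crude 'AI-likeness' score based only on surface patterns."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    word_hits = sum(1 for w in words if w in SUSPECT_WORDS)
    em_dash_hits = text.count("\u2014")  # count em dash characters
    return (word_hits + em_dash_hits) / len(words)

# A perfectly human sentence that happens to love em dashes and the word "delve"
# gets a non-zero score, i.e. the human writer gets flagged.
human_prose = "Let's delve into the archives\u2014carefully\u2014before we judge."
print(f"{naive_ai_score(human_prose):.2f}")  # prints 0.33
```

Swap the word list for fancier statistics and the underlying problem remains: the “tells” were learned from human writing in the first place, so humans who write that way get caught in the net.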

It’s quite likely that some of the work flagged as AI-enabled cheating is, in fact, AI-enabled cheating, but it’s also quite likely that a lot of it is not. And that means we cannot trust the output of LLMs used to “detect” cheating, because we have no idea whether what they’ve told us is true. Which, really, is no surprise, since we cannot trust the output of LLMs on any other topic. They don’t produce facts; they produce sentences that look like facts, quack like facts, but often wind up smelling quite fowl in the courtroom.

There is no algorithmic, computational, reliable way to detect AI output. The one thing LLM output is really good at is looking plausible, and it turns out that it’s even more “computationally plausible” than it is humanly plausible. So while human beings might detect a certain wrongness in LLM-generated slop, that’s because human beings have fine judgement and reason. Well. Some of us do. I’m reserving judgement on the people running the AI industry.

We have yet to create a computer program that has fine judgement and reason. It’s difficult, after all, to design and build something to perform a task that we do not understand and cannot even define. The likelihood of intelligence, judgement, and reason arising spontaneously out of larger and larger AI models is rather like the actual likelihood of an infinite number of monkeys at an infinite number of typewriters writing a Shakespearean sonnet. It makes for a good line, but it’s much more likely that those monkeys will continue to produce random noise indefinitely. It’s magical thinking. It’s not real.

To produce rational, reasoning systems, we will need a much better understanding of rationality and reasoning than we presently seem rational enough, or reasonable enough, to achieve!

So no, we can’t detect AI-enabled writing with any reasonable degree of confidence. And no, we can’t mark student work with LLMs, or solve real-world problems with LLMs, or replace the people doing jobs requiring judgement, compassion, or rationality with LLMs. Not without suffering painful regrets when it comes time to deal with the consequences!

* The term “racist pile of linear algebra” was coined by Professor Emily Bender.
