AI has been in the headlines a lot this year. But you’d be forgiven for being confused about what it can actually do, given the news cycle lurches between over-hyping its capabilities, scare-mongering over its civilisation-ending potential and then debunking its prowess by showing how to expose weaknesses in ChatGPT and Bard (now powered by ‘new-kid-on-the-blockchain’ Gemini).
Google and OpenAI have regularly played into the over-hype themselves, with Google admitting last month they heavily edited their launch video for Gemini to make it look much more sophisticated than it actually is.
AI as the pantomime villain
It’s not just the media – it seems we’re all engaged in a massive ‘is AI evil or is it good’ exercise. It’s simultaneously uncannily capable and yet not as great as it’s made out to be, potentially as lethal as Skynet from the Terminator movies and yet too dumb to think of two five-digit numbers and multiply them together (more on that later).
As humans, our instinct on encountering something that appears to display intelligence is to assume it thinks like we do. Remember the headlines when an earlier iteration of ChatGPT, in conversation with a Stanford academic, announced it wanted to ‘escape’?
LLMs don’t think – they just compute and predict
This takes us back to the fundamentals of what GPT and Gemini are – Large Language Models (LLMs for short). We can categorically say that neither of these is thinking about escaping or plotting the downfall of humanity. In fact, you could end the previous sentence after the word ‘thinking’ and be correct. LLMs don’t think, they don’t consider, they don’t reason – they just compute and predict.
Now, that prediction can appear remarkable and intelligent to a human, e.g. being able to pass the bar exam. However, an LLM is purely predicting which combination of words should follow the group of words that make up the question. Impressive, but not evidence of thought, and an approach with noteworthy limitations.
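To make ‘predicting the next word’ concrete, here is a deliberately tiny sketch in Python. It is not how GPT or Gemini are built: real LLMs use neural networks trained on vast amounts of text, whereas this toy simply counts which word most often follows another in a made-up three-sentence corpus. The corpus and the function names (`predict_next`, `complete`) are invented for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus. A real LLM trains on vastly more text and
# uses a neural network rather than raw counts.
corpus = (
    "the cat sat on the mat . "
    "the cat sat on the sofa . "
    "the dog sat on the mat ."
).split()

# Count how often each word is followed by each other word.
follow_counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follow_counts[current_word][next_word] += 1


def predict_next(word):
    """Return the word seen most often after `word`, or None if unseen."""
    followers = follow_counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def complete(prompt, max_words=2):
    """Extend `prompt` one predicted word at a time."""
    words = prompt.split()
    for _ in range(max_words):
        next_word = predict_next(words[-1])
        if next_word is None:
            break
        words.append(next_word)
    return " ".join(words)


print(complete("the dog"))      # continues with the most frequent followers
print(predict_next("unicorn"))  # a word the toy has never seen
```

Nothing in this toy knows what a dog or a sofa is. It only knows what tends to come next, which is the point of the paragraph above, scaled down by many orders of magnitude.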
Nails English – maths, not so much
Any LLM is only as good as the data it is based on, and even when this is massive, it is far from foolproof. While language is its strong point, maths is not. Its statistically based predictive capabilities mean it can anticipate the overall format of the response I’d be looking for (when I posed the question below) and the number is even of roughly the right order of magnitude, but it’s still completely wrong: the correct answer is 709,347,297 (678,300 less than the answer it gave me).
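For the curious, the two figures quoted above are enough to reconstruct the reply and see just how plausible it looked. The short Python sketch below uses only those two numbers; the original factors are not reproduced here, so it does not redo the multiplication itself.

```python
# Figures quoted in the text above.
correct_answer = 709_347_297
shortfall = 678_300  # how far the correct answer sits below the model's reply

# The reply the model must have given, reconstructed from those two figures.
model_answer = correct_answer + shortfall

# Same number of digits: at a glance, the reply looks right.
same_length = len(str(model_answer)) == len(str(correct_answer))

# Small as a fraction, but a product is either right or it is wrong.
relative_error = shortfall / correct_answer

print(f"model's answer:   {model_answer:,}")
print(f"correct answer:   {correct_answer:,}")
print(f"same digit count: {same_length}")
print(f"relative error:   {relative_error:.4%}")
```

The implied reply is 710,025,597: nine digits, starting with a 7, off by less than a tenth of a percent. That is exactly what a good guess at the shape of an answer looks like, and exactly what a calculator would never produce.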
AI can’t differentiate between fact and fiction
In addition to being shaky at maths, LLMs are prone to making mistakes at the margins of their knowledge and to missing little intervening words like ‘not’, so that a true statement easily becomes its opposite in the statistics of ‘most likely next word’ selection. For all the guardrails Google, OpenAI and others wrap around their LLMs, this remains a problem – an LLM fundamentally doesn’t understand the meaning of the words it produces, so it does not register the gulf between ‘I am allergic to penicillin’ and ‘I am not allergic to penicillin’ the way a human would.
LLMs therefore sound equally confident outputting truth and fiction, as they have no innate capacity to discriminate between the two. In fact, the concerted effort to make LLMs appear ‘smarter’ by defending them against hallucinations (i.e. the LLM giving a confident but incorrect response) makes it all the more likely that people will fail to notice a wrong answer: the more often the models are right, the less likely users are to fact-check and question them.
Predicting answers doesn’t make them right
Also, as in the case of GPT’s escape attempt, ask a leading question and you get an answer that plays into it: the exchange was prompted by the professor enquiring whether ChatGPT needed help escaping. Framing the question this way set the frame of reference, leading the predictive algorithms to look for likely words to answer it. One imagines quite a bit of science fiction lurking somewhere in the vastness of the language training corpus and perhaps some speculative articles about what might happen when AI develops to the point of sentience, et voilà!
So the TLDR is this – Large Language Models don’t think, but we definitely need to. This technology is going into search engines and desktop assistants, imperfections and all. We need to find ways to fact-check in a world where internet search will be powered by LLMs whose training data also includes all the half-truths and fictions that the internet contains.