Why Your Smartest Model Might Be Your Most Unreliable One
Threading the needle between intellectual creativity and hallucination
OpenAI recently released a new crop of models, and so begins another upswing in the hype cycle. Some are touting o3’s “near genius” level of performance, while others are pointing to the stubborn persistence of hallucinations, a problem which, by OpenAI’s own admission, has actually worsened. For all its raw computational power, o3, like every transformer model of its kind, lacks an internal concept of truth. The more meaningful signs of progress I’ve noticed are actually coming from the smaller, more modest models, such as 4.1 and 4o-mini. They exhibit a trait you might call epistemic caution, and it’s the true path forward for making these AI systems more reliable.
Epistemic caution is commonly exhibited by scientists and other careful reasoners who, through their training, carry a practiced doubt: the awareness that they might be wrong, or that they might not yet have all the information they need and more is still out there, prompting further investigation. Thus, instead of blurting out overconfident assertions, they hedge their bets a bit, pause to elicit more information, or package their claims as hypotheses and best guesses. While tinkering with 4.1 in my coding projects, I noticed that rather than rushing ahead, it would often pause to ask me to confirm something on my end, and it seemed to implicitly understand that there was more to the story than what had been presented to it. It simulates “knowing what it doesn’t know” more convincingly, and this, I argue, is the only true solution to hallucination.
While others have offered what is (to me) an unsatisfying technical explanation for why o3 hallucinates more (with one researcher at Transluce blaming some details of its reinforcement learning), I think the real explanation is deeper and has to do with the fundamental inability of these models to learn from experience.
The Knowledge Argument
Mary is a brilliant neurophysiologist who studies color perception. She lives in a black-and-white room and has never experienced color. Mary has spent many years perfecting her knowledge of the biophysical mechanics of color perception, yet some philosophers argue she still has less knowledge of color than someone who has actually experienced it, and that Mary would learn something new, and altogether distinct, if and when she saw color for the first time. No amount of study can add up to an experiential understanding of color, the argument goes.
This philosophical thought experiment, known as the Knowledge Argument, goes a long way toward explaining the stubborn truth issues of language models. No matter how much we scale them, they remain much like Mary. They are a lot like naive bookworms who have studied whole libraries’ worth of books but lack any real-world experience. They’ve obtained all this symbolic knowledge, but it is purely formal—it doesn’t “stick” to anything “out there” in the real world. At best, these models can weigh the consistency of statements relative to each other, but they never map statements “in correspondence” to experience. A model can learn that “up” means the opposite of “down”, but it can’t look up or look down.
Because these models lack an internal concept of truth for fundamental, constitutive, dare I say metaphysical reasons, my money is on the epistemically cautious ones. Their more scientific state of mind is a winning formula for compensating for that missing grasp of truth.
Epistemically cautious models short-circuit hallucination by orienting themselves toward external sources of truth and by presenting their best guesses as best guesses, which is both more honest and more useful than acting like a brash know-it-all.
Don’t get me wrong, o3 appears to be a substantial model. Its integration of tool use into its “reasoning” process could unlock a whole new suite of capabilities. However, it’s interesting that it evidently hallucinates more than its less advanced predecessors. What if this is a direct consequence of its enhanced capabilities?
A few hypotheses present themselves:
1. The combination of capabilities such as tool use in the reasoning process opens up the space of generative possibilities the model can work with. More moving parts means more room for confabulation.
2. In pushing the model to be intellectually bolder and more assertive—which is what really impresses when it gets things right—OpenAI also inadvertently increased its bullshitting powers.
3. The absence of epistemic caution reflects how difficult it is to train these finer reasoning skills into more advanced, complex models.
It’s very hard to be both factual and inventive. Novelists have it easier than scientists in some respects, because they can just make stuff up. For them, truth is measured only in narrative consistency, a consistency they dictate by keeping the plot cohesive. Creative scientists of the genius variety have to somehow both make stuff up (in the sense of proposing novel ideas) and remain factual, a feat that requires a good deal of epistemic caution, second-guessing, fact-checking, and self-correction. An intellectually creative model, an AI genius, has to balance on a knife’s edge between factual creativity and hallucination.
After all, the history of science is littered with ingenious, albeit ultimately discredited, theories: the phlogiston theory of fire, the luminiferous aether, Ptolemaic geocentrism. The only difference between these and AI hallucinations is that they were ultimately tested against an external reality and later disproven.
Of course, there’s always the chance I’m wrong (gotta be epistemically cautious, after all). Maybe o3 just hallucinates more because of a technical bug that will be found and fixed. Or there could be deeper, more fundamental reasons why generative intelligence and novelty, untempered by scientific discipline, tend to produce seemingly plausible but ultimately false explanations.
I suspect this perspective is not what the model proprietors want you to hear. They want you to ooh and ahh at flashy displays of synthetic insight. In highly structured problem-solving environments where the right boundaries have been set up, o3 may indeed impress. But for messy, open-ended, real-world problem solving, caution is advised.