IDK: Why LLMs Hallucinate
The irreplaceable value of knowing when you don't know
One of the largest hurdles in AI today is the problem of hallucination—the tendency for LLMs to make stuff up. People make stuff up when they deliberately lie, or when they believe they are correct but are in fact wrong; LLMs do it for different reasons. LLMs don’t actually have minds, so the term “hallucination” is a bit of an unfortunate misnomer. LLMs are really just performing a surface-level syntactic transformation of their training data without critical judgment, so not only are they inclined to reproduce biases present in that data, they also don’t know when the data is lacking answers.
LLMs technically don’t know anything, so they don’t know that they don’t know anything, either. They appear knowledgeable and erudite because they have been exposed to vast data sets, but they have no means to “push back” against that training data, to doubt and question it and weigh its parts against each other. Talking about what AI “knows” is an anthropomorphic error. If we saw a human with such encyclopedic scope, we would be right to think they are well read and therefore knowledgeable. But LLMs are routinely trained on everything, with no inbuilt judgment filter to sort the wheat from the chaff. If we weren’t around to supervise them, they would unwittingly parrot anything we gave them.
The reasons why AI models hallucinate are deep. They aren’t just technical, engineering reasons, but—as is a recurring theme—they are philosophical in essence. They point to deep epistemological and psychological conundrums relating to the nature of minds, truth, and the world.
The Wisdom of Not Knowing
This metacognitive skill of “knowing what you don’t know” is usually what prevents a well-meaning human being from fabricating information the way an LLM does when it hallucinates. A well-meaning person will say “I don’t know” and leave it at that, or will speculate as to why something might be the case, but always with the qualification that they might be wrong and that this is just a guess. In either case, they won’t inadvertently deceive you. When you are uninformed and seeking information, it is better to fail and stay uninformed than to be misinformed and mistakenly believe you have been informed.
Why is it better to be plain ignorant than mistaken? Both the plain ignorant and the mistaken lack knowledge, but the mistaken are worse off—they are doubly ignorant. Being plain ignorant means you remain open to new knowledge, whereas being mistaken requires you to first detect and remove the erroneous belief standing in the way of obtaining knowledge in its place. It’s much better to realize you don’t know something, admit it, and then go investigate or learn, than to believe you know, make stuff up, and propagate errors. Ironically, the fact that people can know they don’t know things, the feature of our minds Socrates identified with wisdom, is what keeps us far ahead of LLMs.
LLMs, in contrast to a well-meaning but uninformed person, are computationally compelled to respond to a prompt. Rather than simply say “I don’t know”—an honest but underwhelming answer—they may play Mad Libs and conjure up a fanciful reply to fill in the blanks. Any skepticism they exhibit is second-hand, reflecting skepticism expressed in the data set rather than first-hand, careful contemplation. LLMs are under a kind of “obligation” to respond, and the pressure is on them to seem confident and knowledgeable. This pressure is a bias impressed upon them by the model designers and by the human desire to petition an epistemic authority, someone or something more informed than us, for answers. A model that is skeptical and uncertain, while more intellectually honest, isn’t as impressive and doesn’t satisfy what people really want from information technology: any answer whatsoever, relief from the irritant of doubt, and an excuse not to think in order to save on cognitive effort and cost (a phenomenon known in the literature as cognitive miserliness or, in common parlance, laziness).
In a manner of speaking, the most powerful feature of the human mind is that we are capable of knowing when we don’t know. This is what allows us to be curious, to hypothesize, to investigate and ask questions. In doing so, we perform a trick that lets us supersede our own internal information limit and seek answers that go beyond our current experience. This metacognitive feature allows us to be data independent—we aren’t imprisoned by our past experience, unlike LLMs, which are data dependent and locked behind their training data. In my humble opinion, it is this data independence which is the true mark of intelligence and reasoning. It’s the special sauce we have that AI, in its incarnations past and present, conspicuously lacks.
Novelty versus Representativeness
As any honest person working in AI will tell you, a language model is really just a wrapper around its training data, transforming that data into myriad recombinations using the power of probabilistic prediction. The prompt acts as a kind of “random seed” that activates a general area of the model’s nodes, but never the same exact run-through twice, because of its probabilistic workings. It stands to reason, then, that the model does best with FAQs (frequently asked questions). Such prompts will likely have responses that are well represented in the data. The probability of a model hallucinating therefore correlates positively with the unusualness of the prompt.
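To make the “random seed” picture concrete, here is a minimal sketch of the sampling step, using a made-up shortlist of candidate tokens and made-up scores rather than any real model’s numbers. The point is only that a softmax sampled with a temperature can give a different continuation for the very same prompt on different runs.

```python
import math
import random

def sample_next_token(scores, temperature=0.8):
    """Sample one token from a softmax over raw scores.

    The randomness in this step is why the same prompt rarely produces
    the same exact run-through twice.
    """
    scaled = [s / temperature for s in scores.values()]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]   # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    return random.choices(list(scores.keys()), weights=weights, k=1)[0]

# Made-up next-token scores after a prompt like
# "The first president of the United States was" -- illustrative only.
candidates = {"George": 9.1, "John": 5.2, "Thomas": 4.8, "Alexander": 3.0}

for _ in range(5):
    print(sample_next_token(candidates))
# Mostly "George", but every so often something else: the same mechanism
# that gives the model variety is the one that lets it wander off-script.
```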
Here’s the rub. The set of frequently asked questions/frequently sent prompts is necessarily finite, whereas the set of rarely asked questions/rarely sent prompts is infinite. That’s because the set of rarely asked questions/prompts also includes the set of all questions/prompts that have never been asked, as well as the set of all uniques: prompts that have only ever been sent once. The set of all possibles is necessarily larger than the set of all actuals: there is no way to have more actual things than possible things, because everything actual must be possible, but not everything possible is actualized. This entails that the potential for LLMs to hallucinate is infinite, and all we have to do is venture a little off the beaten path to start encountering it. Concerning.
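A quick back-of-the-envelope calculation makes the asymmetry vivid. The figures below are illustrative assumptions, not measurements of any particular model, but the conclusion doesn’t depend on the specifics.

```python
# Possible prompts vs. prompts any model could ever have seen.
# All figures are illustrative assumptions.

vocab_size = 50_000          # rough order of magnitude for an LLM tokenizer
prompt_length = 20           # a modest 20-token prompt
possible_prompts = vocab_size ** prompt_length

training_tokens = 10 ** 13   # roughly the scale of a large public training corpus

print(f"Possible 20-token prompts: about 10^{len(str(possible_prompts)) - 1}")
print("Tokens in the training data: about 10^13")
# Even granting that most of those sequences are gibberish, the gap is so
# astronomical that well-trodden prompts can only ever be a sliver of the whole.
```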
Thankfully, the realm of actualized human knowledge is quite vast, even if it is vastly smaller than the set of all possible knowledge, and even if the realm of undiscovered potential knowledge is infinitely vast and dwarfs the sum of existing human knowledge. Relative to what any given individual doesn’t know, the collective of humanity already knows much more. This is why LLMs are still useful despite their limitations—they’re useful for the same reason search engines are. The data they draw from is essentially the same, and there are indeed answers in that data, even if there are infinitely many unasked, and therefore unanswered, questions. At best, LLMs are a distillation of common knowledge.
To illustrate all this, consider a prompt asking who the first president of the US was. The probability of a model hallucinating by saying anything other than George Washington is quite low, since “George Washington” is well represented in the data. The statistical overlap of all those data examples is overwhelming, so the probability mass concentrated on what should fill in the blank is high. Now suppose you send the prompt “What is George Washington’s fondest childhood memory?” Expect all sorts of captivating fiction. This type of prompt is likely to be far less represented in the data, so there are bigger blanks to fill, and the probability of any particular filler, across all the possibilities, is much lower.
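One way to picture the difference between these two prompts is to compare how peaked the model’s answer distribution is. The distributions below are made up for illustration; the point is only the contrast between a sharp distribution and a flat one.

```python
import math

def entropy_bits(dist):
    """Shannon entropy in bits: higher means the model is less sure
    about what should fill in the blank."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Made-up answer distributions for the two prompts discussed above.
first_president = {"George Washington": 0.97, "John Adams": 0.02, "other": 0.01}
fondest_memory = {
    "the cherry tree": 0.12,
    "riding at Ferry Farm": 0.10,
    "fishing the Rappahannock": 0.09,
    "countless other stories": 0.69,
}

print(f"'First president' entropy: {entropy_bits(first_president):.2f} bits")
print(f"'Fondest memory' entropy:  {entropy_bits(fondest_memory):.2f} bits")
# A sharply peaked distribution tends to track a well-represented fact;
# a flat one is exactly where confident-sounding fiction gets sampled.
```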
Hence the novelty-factuality dilemma. The most valuable questions are often the hardest to get answers to, and also the most likely for LLMs to lie about. The more unique your problem, the more LLMs will struggle.
RAQs: Rarely Asked Questions
The statistical relationship between how frequently a question is asked and how difficult it is for a model to answer is strong, but not perfect. Not all rarely asked questions are meaningful, deep, or important. Some questions never get asked because their answers are so tacitly obvious that they get avoided as “stupid questions”, or because the answer is already known and formulating the question would be a waste of time. But the opposite is also true: some rarely asked questions, and their associated answers, are the real humdingers, hinting at the secrets of the universe and pointing toward groundbreaking explorations and new frontiers of discovery. Generally, the relationship is that more frequently asked questions are better answered: more attentional resources are dedicated to them, so they will be well represented in the data, as there is more commitment to meet that inquisitive demand. Rarer questions will have had less of a history of thought pinned to them, so language models will have less training data to cover them, just as search engines might return fewer results for them. FAQs only have meaning for limited topical domains that can be targeted; there is no FAQ for the world at large. And when billions of users are in play, the law of large numbers takes effect—even rare events (questions) will be observed (asked) frequently.
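That last point is simple arithmetic of expectation. With illustrative figures (a billion prompts a day, a one-in-a-hundred-million question), the “rare” case still turns up daily.

```python
# Expected sightings of a "rare" question at scale.
# All figures are illustrative assumptions.

prob_rare_question = 1e-8            # one in a hundred million prompts
prompts_per_day = 1_000_000_000      # a billion prompts a day across all users

expected_per_day = prob_rare_question * prompts_per_day
print(f"Expected sightings per day:  {expected_per_day:.0f}")        # ~10
print(f"Expected sightings per year: {expected_per_day * 365:.0f}")  # ~3,650
# Individually rare questions stop being rare in aggregate, which is why the
# long tail of prompts matters so much in practice.
```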
Retrieval-Augmented Generation (RAG) is great at reducing hallucination because it fixes the model’s attention on a provided source of truth rather than leaving it open-ended. In this case, vector databases or similar inspectable knowledge sources are tacked onto the architecture and made visible to the model. We curate what truth the model sees: what’s in the retrieval system constitutes the truth, and the model’s responses can be compared against this hardened reference point. This is also part of why LLMs are so great at coding—it’s not that they are always right, but that the truthfulness of code is objective: it must compile, and it must output what we want it to output. The open world, and the open internet, does not have these strict, objective verification criteria, which is why falsehoods can slip through the cracks and models can mix things up. The world itself is ambiguous. Not every proposition about it can be answered with a simple true or false. The truth doesn’t always compute.
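A minimal sketch of the retrieve-then-generate loop looks something like the following. The keyword-overlap retriever here is a toy stand-in for a real vector database, the documents and policy numbers are invented for illustration, and the assembled prompt would be handed to whichever model API you actually use.

```python
# A toy retrieve-then-generate pipeline. The keyword-overlap retriever stands
# in for a real vector database; the documents are invented for illustration.
import re

documents = [
    "Policy 14.2: refunds are available within 30 days of purchase.",
    "Policy 9.7: shipping to EU countries takes 5 to 7 business days.",
    "Policy 3.1: warranty claims require the original receipt.",
]

def words(text):
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, docs, k=1):
    """Rank documents by naive word overlap with the question."""
    q = words(question)
    ranked = sorted(docs, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, docs):
    """Pin the model to the retrieved text instead of the open internet."""
    context = "\n".join(retrieve(question, docs))
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say \"I don't know.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# The assembled prompt would then be sent to whichever model API you use.
print(build_prompt("Are refunds available after purchase?", documents))
```

Because the retrieved passage is inspectable, the model’s answer can be checked against it, which is exactly the hardened reference point the open internet lacks.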
Frequently/Rarely Experienced Contexts
How frequently a question is asked only tenuously relates to how difficult it is to answer; context plays a role here too. “How do I get rich?” is a question many people would like answered. One can find answers to it, but they will likely be poorly tailored, under-fitted to the peculiarities of each asker’s case. Generally, however, the inquirer asking a rarely asked question or sending a novel prompt is tugging at the boundaries of the unexplored. Context is key.
Even when a prompt is nominally well represented in the data, the response the model gives can be too vague to be usable. This is because LLMs also struggle with context. Many truths are context-dependent: the answer to “How old am I?” depends on who asks it and when. LLMs typically lack a grasp on changing context, and are therefore best at answering questions or solving problems that are general, well-established, and invariable. Context-specific prompts must supply that context to the model if they are to be answered successfully and with minimal chance of error. Supplying such context is not always possible, however. For one thing, the context we experience in life is “thick”, which is to say it contains all sorts of entities and classes: words, other people, money, psychological states, material objects, histories, and so on. The only context LLMs can typically work with is language. Not everything relevant to answering a prompt can be made visible to the model. The less context provided, the more the model will hallucinate, or simply be wrong. In this case, we can’t blame it; an intelligent person would struggle to answer too. The difference is that a really intelligent person would notice they are missing context and ask the prompter for more before taking a stab at the question, whereas a typical AI model will simply take what you give it and run with it.
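As a small sketch of what “supplying context” amounts to in practice, consider the age example. The helper below is hypothetical; the point is simply that the question only becomes answerable once the missing facts are packed into the prompt.

```python
from datetime import date

def with_context(question, birth_year=None, today=None):
    """Wrap a context-dependent question with the facts needed to answer it.

    Both keyword arguments are assumptions the asker has to supply; without
    them the model can only guess.
    """
    lines = []
    if today is not None:
        lines.append(f"Today's date is {today.isoformat()}.")
    if birth_year is not None:
        lines.append(f"The asker was born in {birth_year}.")
    lines.append(f"Question: {question}")
    return "\n".join(lines)

# Without context, the model has nothing to anchor on and will have to guess:
print(with_context("How old am I?"))
print()
# With context, the same question becomes answerable:
print(with_context("How old am I?", birth_year=1990, today=date(2025, 6, 1)))
```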
Known Unknowns
Hallucination is more a feature than a bug, which is to say, it’s a byproduct of the flexibility and surprise of these models. A model that never hallucinated would always return the same output given the same input. Human trainers could then systematically label these outputs true or false and negatively reinforce the false-labeled responses away. Such a model would be wrong less often, but it would also lack the magical dynamism that makes generative AI seem so powerful and inexhaustible. Its ability to generate different responses every time it is prompted is therefore as much a strength as it is a weakness.
Taking all of these considerations together, I would be quite confident in saying that LLMs are not the road to AGI, if by AGI we mean AI that is more intelligent than its creators. LLMs reflect our own knowledge, and our ignorance, back at us, and no linear increase in data will ever spell an exponential increase in model cleverness. True intelligence is data independence, and for that to occur, we need models that can confidently say “I don’t know”. When one starts asking us questions, then we’ll know everything has changed forever.


