Perhaps one of the most entrenched beliefs in AI is that it needs huge amounts of data. But if you stop to think about it, there is no necessary reason for this big-data sentiment. When children learn, they don't need exposure to millions of examples of arithmetic to figure out how to add. Teach a bright kid that 1+1=2, that 2+1=3, and that 3+2=5, and soon they will realize they can add any numbers in any order. They learn through composition and construction, by internalizing the rules of the domain rather than every example within it. That observation alone is strong evidence that huge amounts of data are not intrinsic to intelligence, generally construed.
When All You Have Is Data, Everything Looks Like A Statistical Model
Indeed, huge sample sizes are a requirement for the statistical models that have dominated the field of AI for decades. These approaches are powerful, but they suffer from critical limitations. They excel at within-distribution tasks (recalling a solution they have already memorized, or a close enough approximation of it) and at interpolation (finding something halfway between two solutions they have memorized), yet they struggle to generalize to new problems that were not represented in the training data. For whatever reason, the industry has blindly insisted that the solution to these generalization issues is simply more data, with a somewhat mystical and unscientific expectation that once the models reach sufficient scale they will generalize robustly. That hasn't happened.
Recent research suggests that human-supplied, labeled data might not be as central to AI as once thought. A new paper introduced Absolute Zero, a language model that achieves competitive performance on math and coding tasks with no human-supplied labels or supervision. Compellingly, this model is particularly good at out-of-distribution tasks, that is, tasks it was not exposed to during training, which means it may have enhanced generalization capacities.
How is this possible? While we will dig into some of the details and caveats later on, what the researchers did is, in essence, so obvious in retrospect that you almost want to smack your forehead and ask, "why didn't I think of that?"
Basically, what the researchers realized is that, like a person, an AI can learn by doing: it can act on an environment, on a problem, get feedback from that environment, evaluate the efficacy of its attempted solution, and revise its efforts until the problem is solved. It's not quite true to say the model is trained on zero data; the data it trains itself on is generated through its own evaluations of its proposed and enacted solution attempts.
In effect, they trained the model on problems, and the model learned by evaluating its own attempted solutions relative to one another. The reason the model is so effective at math and coding problems in particular is that these kinds of problems provide objective criteria of correctness: either both sides of the equals sign are the same quantity expressed differently, or they aren't; either the code runs and produces the expected output, or it doesn't. Such problems provide exactly the sort of definite environmental feedback this technique needs.
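To make that concrete, here is a minimal Python sketch of what binary pass-or-fail verification can look like. None of this comes from the Absolute Zero codebase; the function names and the eval/exec approach are illustrative assumptions, not the paper's actual verifier.

```python
def check_math(lhs: str, rhs: str) -> bool:
    """Both sides of the equals sign either evaluate to the same quantity, or they don't."""
    try:
        return eval(lhs) == eval(rhs)  # toy evaluator, fine for simple arithmetic strings
    except Exception:
        return False


def check_code(program: str, test_input, expected_output) -> bool:
    """The candidate program either runs and returns the expected output, or it doesn't."""
    namespace = {}
    try:
        exec(program, namespace)  # run the candidate solution
        return namespace["solve"](test_input) == expected_output
    except Exception:
        return False              # any crash counts as failure


print(check_math("3 + 2", "5"))                              # True
print(check_code("def solve(x):\n    return x * 2", 4, 8))   # True
```

The point is simply that the environment, not a human labeler, supplies the ground truth.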
So rather than being trained on “zero” data, the model trained itself through self-generated data that arose out of its own problem-solving dynamics, or what the researchers call “self-play.”
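Here is a compressed, hypothetical sketch of that self-play loop: the model proposes a task, attempts it, scores the attempt against the environment, and keeps the resulting (task, attempt, reward) triple as its own training data. The propose and attempt functions below are stand-ins of my own, not the paper's training code, and the actual learning update is only gestured at in a comment.

```python
import random


def propose_task(model_state):
    # Stand-in: invent a small arithmetic task. In the real system the model
    # itself plays the proposer role.
    a, b = random.randint(0, 9), random.randint(0, 9)
    return {"question": f"{a} + {b}", "answer": a + b}


def attempt(model_state, task):
    # Stand-in for the model's answer; deliberately wrong some of the time.
    return task["answer"] if random.random() < 0.7 else task["answer"] + 1


def verify(task, guess):
    # Binary environmental feedback: the attempt is either right or it isn't.
    return 1.0 if guess == task["answer"] else 0.0


model_state = {}            # placeholder for learned parameters
self_generated_data = []    # the "data" here is the model's own experience

for step in range(1000):
    task = propose_task(model_state)
    guess = attempt(model_state, task)
    reward = verify(task, guess)
    self_generated_data.append((task, guess, reward))
    # A real system would now run a reinforcement-learning update on
    # model_state using this self-generated batch; that step is omitted here.
```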
If you're having a hard time wrapping your head around what's happening here, think of it in terms of how a child might learn to build with toy blocks. At first, the child might try stacking the blocks in all kinds of ways. Sometimes the stack topples over. Other times it stands firm. The child's own experience is sufficient training data. Once the child learns that stacking the blocks a certain way works, they can reuse that piece of knowledge in the future, and they can tell that this method is more effective than a previous attempt that failed. They don't need a supervisor to tell them how to stack the blocks or to hand them the answers.
It all makes so much sense.
The strongest learners tend to be self-directed: they learn not by absorbing the most information, but by formulating the best explanation from whatever information is available.
Indeed, the researchers found that, compared with more traditional models, this one benefits most from abductive reasoning, also known as "inference to the best explanation," and gains comparatively little from deduction (reasoning from first principles) and induction (reasoning from past examples). Abductive reasoning is similar to analogical reasoning, in that it involves a kind of world-modeling, a mapping of concepts from one domain onto another.
Here is a brief definition of the three types of reasoning as they are used in the paper; a toy sketch follows below.
• Deduction: Predicting the output from a program and an input.
• Abduction: Inferring the input from a program and an output.
• Induction: Generating the program from a set of input/output pairs.
(In the paper, the researchers found the model benefits most from abductive reasoning. This contrasts with inductive reasoning, which requires problems and solutions to be paired as training data in the hope that the model will form an association and generalize beyond the examples shown.)
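For intuition, here is a toy Python rendering of those three task types over a (program, input, output) triple. The helper names are mine, not the paper's, and the "induction" step is just a linear guess, nowhere near the program synthesis the model actually performs.

```python
def deduction(program: str, x):
    """Deduction: run the given program on the given input to predict the output."""
    ns = {}
    exec(program, ns)
    return ns["f"](x)


def abduction(program: str, y, candidates):
    """Abduction: search for an input that best explains the observed output."""
    ns = {}
    exec(program, ns)
    return next((x for x in candidates if ns["f"](x) == y), None)


def induction(pairs):
    """Induction: propose a program consistent with input/output pairs (toy linear fit)."""
    (x0, y0), (x1, y1) = pairs[:2]
    slope = (y1 - y0) // (x1 - x0)
    return f"def f(x):\n    return {slope} * x + {y0 - slope * x0}"


prog = "def f(x):\n    return 2 * x + 1"
print(deduction(prog, 3))                    # 7
print(abduction(prog, 9, range(10)))         # 4
print(induction([(0, 1), (1, 3), (2, 5)]))   # recovers "2 * x + 1"
```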
It's no surprise that Absolute Zero generalizes better, since it thinks for itself. The data-driven models are essentially acing tests by memorizing the answers; Absolute Zero arrives at the answers by working through the problems itself. Presumably, this teaches the model to approach problems in a more principled way.
Questions and Forecasts
So what are we to make of these findings? It's not that we've been lied to this whole time. Data is still important; it's what grounds models in an external source of truth. But the clues suggest that simply adding more data is not the solution.
I believe there is a real chance that older-generation, data-driven models could be *leap-frogged* by these more independently minded models that use a self-play learning mechanism. DeepSeek's innovative use of reinforcement learning belongs to the same school of thought as Absolute Zero. So it may be that, on the international stage, China, with its openness to new ideas and less settled dogmas about how to approach AI, could pull ahead of the US, which insists more often than not that the only way to improve AI is to burn heaps of cash.
One of my own criteria for general intelligence is data independence: if all an AI is doing is interpolating from data, it's fundamentally limited. Absolute Zero suggests there may be a way past this constraint. A model that can bootstrap itself by generating its own data through evaluating its solution attempts could, in theory, go from zero to infinity.
Some questions remain. Are models of this type going to require more compute to compensate for the lack of data? (Obviously, a hybrid approach with lots of data might be worth looking into as well.) What are the dangers of misalignment for models like this? (The researchers noted some concerning "rebellious" thinking in some of the tests.)
Is it possible for such a model to “miseducate” itself? Is it exposed to new epistemic risks by being more self-reliant?
More profoundly, where would this kind of model go if it were given endless time and compute to keep training itself and were scaled up massively?
What will happen to the data-broker economy when people realize you don't actually need that much data to arrive at synthetic intelligence?
It's notable that Absolute Zero is still a transformer, and so remains, technically, a statistical model. Whether it exhibits true compositional intelligence is debatable. It may simply be feeding itself more robust and generalizable statistical patterns rather than learning to combine the underlying rules that describe the logic of a problem space.
How well can this self-play technique generalize beyond well-defined problem domains like math and coding? Real-world problem solving is more nuanced and vague than these constructions, and clear feedback isn't always guaranteed.
Indeed, most of the world's most interesting and pressing problems are vague in exactly this way. Most of life's mysteries aren't presented as well-formulated puzzles. Maybe Absolute Zero will still inherit the same limitation as the other models: a lack of true world experience and no understanding of how to navigate uncertainty or overcome novel challenges in the absence of immediate, clear feedback.