GPT-o1 Preview: First Impressions
OpenAI just debuted its latest language model, the GPT-o1 preview build, which it touts as operating at the PhD level and excelling particularly in areas where logical reasoning rates highly in importance, such as STEM fields. Of course I had to see how it stacked up. So let’s break down how it performs. Here are my first impressions; I’ll share more as my experience with the new model grows.
A Noticeable Step Forward
Since OpenAI is rate limiting Plus users to 30 calls per week at the moment, I wanted to call my shots carefully by giving it particularly complex tasks from the jump. One of the first things I did was feed it a distinctly abstract, avant-garde codebase I’ve been working on and ask it for suggestions. I also fed it the extensive, essayistic README, so it had a lot to work with. At first it laid out a detailed, high-level, but not particularly impressive set of recommendations for how best to work on the project. But I wanted to see what it could do with the code if given free rein. (link to conversation)
So I asked it to “take over the project and to describe an implementation plan for how it would overhaul it.” I wanted to see where it would go if I just “let it cook.” So I did, gently nudging it along until I had something like a consolidated refactor of the entire codebase.
I wanted to see it do more than refine and elaborate on what I gave it, like the proverbial stochastic parrot of yesterday. I wanted to see it extend and build on the starter code without me. It’s important to test not just its reasoning abilities but its autonomous reasoning abilities, since, I would argue, the ability to reason independently of the information you’re given is how creative problem solving is achieved.
I noticed something special. After I handed over the code to my project and said “all yours,” it proceeded to thank me for “entrusting” it with my project and said I had provided a solid foundation from which to start. Then it went to town.
It’s clear that GPT-o1’s biggest innovation is its planning faculties. It can engage in a kind of self-structuring in which it imposes an order on itself and continues a coherent project from output to output, each step unfolding logically from the previous in accordance with the plan. Chain-of-thought has been a paradigm of artificial reasoning for a while now, but these planning capacities allow for a degree of coherent long-range focus that’s qualitatively new.
I found myself more or less saying “ah yes, go on, continue” as the model worked, and coming up with nothing to dispute. Not because, as had been typical of older models, it was reflecting my own thoughts back at me, but because I agreed with all of its unfamiliar proposals. To be fair, I was asking it to continue a project I had already worked on with ChatGPT, so it was impossible to untangle which points it drew from my own plans and which were its independent conclusions. In any event, a notable boost in power was on display.
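To make that workflow concrete, here’s a minimal sketch of the “let it cook” loop as one might script it against the OpenAI Python SDK: hand over the codebase, ask for a plan, then repeatedly nudge the model to execute the next step. To be clear, I did all of this interactively in the ChatGPT app; the file name and nudge wording below are placeholders of my own, not anything official.

```python
# A minimal sketch of the "let it cook" loop, assuming the OpenAI Python SDK
# (pip install openai) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def ask(messages):
    # o1-preview is called through the standard chat completions endpoint;
    # at launch it accepts only user/assistant messages (no system prompt).
    response = client.chat.completions.create(
        model="o1-preview",
        messages=messages,
    )
    return response.choices[0].message.content

# Placeholder: a condensed dump of the codebase plus the README.
codebase = open("project_summary.md").read()

# Step 1: hand over the project and ask for an overhaul plan.
messages = [{"role": "user", "content":
             "Take over this project and describe an implementation plan "
             "for how you would overhaul it.\n\n" + codebase}]
plan = ask(messages)
messages.append({"role": "assistant", "content": plan})

# Step 2: nudge it along. With Plus users capped at ~30 calls a week,
# every nudge has to count, so the loop is kept short.
for _ in range(5):
    messages.append({"role": "user", "content":
                     "Sounds good. Continue with the next step of your plan."})
    step = ask(messages)
    messages.append({"role": "assistant", "content": step})
    print(step)
```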
ChattyGPT
One of the most immediately striking features of GPT-o1 is its overwhelming verbosity. It was off to the races, unspooling elaborate plans and proposals that I wasn’t fully prepared to read through in the moment. This talkativeness is both good and bad. Good because its suggestions were consistently sound and even insightful. Bad because it felt somehow bewitching to be bombarded by so much text. My thoughts were drawn to the scenario where someone in a hurry is seduced by this seeming display of authentic intelligence and relinquishes control without even pausing to vet the long-winded exposition. (It actually got so bad that I saw, for the first time, a “train wreck” display glitch in the ChatGPT MacBook app, with paragraphs smooshed together.) I can foresee, in other words, the temptation to think “ok, this thing knows what it’s talking about, just rubber stamp everything it says.” The idea that people might be dazzled by such a display and uncritically proceed with whatever ensues is a little scary to me.
While the verbosity was a bit much to take in all at once, as I picked over it I realized it was substantive and comprehensive through and through. After I told it to implement its designs for my codebase, I recognized its refactors to be valid and broadly in line with accepted best practices. Occasionally it made suggestions out of the blue that struck me as genuinely ingenious.
Here’s where I have to pause. In nearly all my uses of LLMs until this moment, I expected them to more or less just refine or correct the starting point I provided. The fact that I could let GPT-o1 keep going with minimal guidance and watch it make significant advances toward my project roadmap is both exhilarating and alarming.
It wouldn’t provide a complete implementation in one go. But the parts of the codebase ChatGPT and I had already discussed, and that I had outlined improvements for in the Roadmap section of the README, did get built out substantially. Some of the areas I was still hazy on it sort of just stubbed out, so I don’t want to give the impression that this thing is moving beyond us just yet.
Novel Suggestions
This new model seems more willing to go out on a limb and suggest new additions to the prompted subject matter. The model is not operating at the level of a genius in the sense of inventing new solutions out of whole cloth. But it is now capable of “raising the stakes” by adding new, complementary subjects to the discussion. This is the first time an AI has put me on the back foot, where I took a look at what it did and said “ok, now I have my work cut out for me” or “ok, this is genuinely the best way forward.” That is momentous.
For the most part, this is all wonderful. But it is murky territory. Increasingly we’re entering a world where AI can come across to intelligent people as a peer (or can emulate intelligence so well it comes to the same thing). This parity poses a vexing decision problem: when do you defer to the model, and when do you push back?
Previously, when the outputs of AI models were comfortably subordinate to the prompter, there was never a feeling that you could be outdone, at least not if you were good at what you were asking for. Now I can confidently say ChatGPT is a better coder than I am, even if it is not yet the better thinker or reasoner. All in all, it feels as though I am standing before something massive and imposing. All of which makes me wonder: even if we are not there yet, what will we do if and when an AI genius emerges?
A Tipping Point?
I’d like to close out these remarks with a bit of philosophical speculation. The concept of superhuman intelligence, that is, of intelligence that goes beyond even what the smartest human beings can muster, is not one we are built to process. We’re so used to being the smartest things around, and so unfamiliar with being radically outclassed by beings of far superior intelligence, that we have no idea how to factor knowledge from a being we can’t comprehend into our decision-making. There might be kinds of knowledge the human mind just wasn’t built to handle. We’re not really prepared, as a species, for the scenario in which a machine can come up with better ideas than we can.
Perhaps a good point of reference for coming to terms with the possibility of superhuman intelligence is how ordinary people feel about geniuses. I feel this way about certain mathematicians who seem able to climb peaks of thought that are inaccessible to us mortals huffing and puffing in the lowlands and hills below. Take that sense of remoteness one feels in the presence of someone with extraordinary gifts and multiply it 2x, 5x, 10x, 100x, 1,000x. Can we even imagine extrapolating beyond maybe 2x or 5x our own intelligence? By definition, you cannot effortlessly understand everything someone even slightly more intelligent than you does, let alone a being that is 10x smarter. (I realize intelligence isn’t a simple less-than or greater-than quantity, as this framing makes it seem, but let’s not overthink it.)
Another question that has puzzled philosophers of AI is whether it’s possible for an intelligence to build an artifact that is smarter than its creator. One argument says no, it can’t, but it can build a system of lesser initial ability that nevertheless has an unrestricted capacity to learn. An unrestricted learning system, unbounded by constraints of the flesh, could arrive at the point where it exponentially outpaces its creators. Indeed, it could reach that point much faster than we might expect if it runs on faster hardware than our brains and we feed it the cream of our accumulated knowledge, allowing it to skip thousands of years of trial and error.
Could we be at the point where the learning curves of human and artificial intelligence intersect and begin to part? Time will tell. Right now the app is certainly smarter than some people. As someone who has studied the history of AI going back decades, I can confidently say we are at a point that was once only spoken of in hypotheticals.
It’s important to remember that we’re practically unbounded learning agents too. While there may be a theoretical limit to the amount of information the human brain can store, that limit is probably ridiculously large, and no one could plausibly reach it in a lifetime. Our distinctly human brand of intelligence, inflected as it is with emotional, social, and moral sentiments, will endure in value whatever may come. So long as we keep learning alongside the AI, I believe the result will be prosperity.