Claude's Cycles: Claude Opus 4.6 solves a problem posed by Don Knuth [pdf]

(www-cs-faculty.stanford.edu)

99 points by fs1234 hours ago

5 comments

Pat441132 minutes ago
I asked Claude to solve the pentominoes puzzle made famous by Arthur C. Clarke. It struggled mightily until I told it how I'd solved the problem using 64-bit unsigned integers to represent the board and pieces. Then it created a C# program that solved the problem very quickly. However, in the 20x3 case it found four solutions when there are only two. Turns out it had incorrectly mapped one of the pentominoes. Sort of a silly mistake; the sort a human might make.
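A minimal sketch of the bitboard idea described above, written in Python rather than the C# the commenter mentions. The 20x3 board has 60 cells, so one bit per cell fits in a 64-bit word; Python integers are unbounded, so the width here is a convention and not a constraint. The piece table and the names `cells_of`, `orientations`, and `count_solutions` are illustrative assumptions, not the commenter's code. The count includes rotated and mirrored copies of the whole board, so the two distinct tilings appear as 8.

```python
# Each pentomino drawn once; '#' marks a covered cell.
PIECES = {
    "F": [".##", "##.", ".#."],
    "I": ["#####"],
    "L": ["#.", "#.", "#.", "##"],
    "N": [".#", ".#", "##", "#."],
    "P": ["##", "##", "#."],
    "T": ["###", ".#.", ".#."],
    "U": ["#.#", "###"],
    "V": ["#..", "#..", "###"],
    "W": ["#..", "##.", ".##"],
    "X": [".#.", "###", ".#."],
    "Y": [".#", "##", ".#", ".#"],
    "Z": ["##.", ".#.", ".##"],
}


def cells_of(drawing):
    """Convert a drawing into a list of (row, col) cells."""
    return [(r, c) for r, row in enumerate(drawing)
            for c, ch in enumerate(row) if ch == "#"]


def orientations(cells):
    """All distinct rotations/reflections, shifted so the first cell
    in row-major order sits at (0, 0)."""
    seen = set()
    cur = list(cells)
    for _ in range(2):
        for _ in range(4):
            cur = [(c, -r) for r, c in cur]      # rotate by 90 degrees
            ordered = sorted(cur)
            r0, c0 = ordered[0]
            seen.add(tuple((r - r0, c - c0) for r, c in ordered))
        cur = [(r, -c) for r, c in cur]          # mirror left-right
    return sorted(seen)


SHAPES = {name: orientations(cells_of(d)) for name, d in PIECES.items()}


def count_solutions(rows, cols):
    """Count tilings of a rows x cols board (60 cells) using each piece once."""
    full = (1 << (rows * cols)) - 1

    def solve(occupied, remaining):
        if occupied == full:
            return 1
        free = ~occupied & full
        idx = (free & -free).bit_length() - 1    # lowest empty cell
        r, c = divmod(idx, cols)
        total = 0
        for name in remaining:
            for shape in SHAPES[name]:
                mask = 0
                for dr, dc in shape:
                    rr, cc = r + dr, c + dc
                    if rr >= rows or not 0 <= cc < cols:
                        break                    # piece would leave the board
                    mask |= 1 << (rr * cols + cc)
                else:
                    if not mask & occupied:      # one AND detects any overlap
                        total += solve(occupied | mask, remaining - {name})
        return total

    return solve(0, frozenset(PIECES))


if __name__ == "__main__":
    print(count_solutions(20, 3))
```

A mis-drawn piece in a table like `PIECES` is exactly the kind of error the comment describes: the search still runs and still returns tilings, only the count is wrong. Checking the total number of orientations is a cheap guard against that.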
mccoyb1 hour ago
It's fascinating to think about the space of problems which are amenable to RL scaling of these probability distributions. Before, we didn't have a fast way to try problems (we had to rely on human cognition), even if the techniques and workflows were known by someone. Now, we've baked these patterns into probability distributions, and anyone can access them with the correct "summoning spell". Experts will naturally use these systems more productively, because they know how to coerce models into the correct conditional distributions which light up the right techniques. One question this raises to me is how these models are going to keep up with the expanding boundary of science. If RL is required to get expert behavior into the models, what happens when experts start pushing the boundary faster? In 2030, how is Anthropic going to keep Claude "up-to-date" without either (a) continual learning with a fixed model (expanding context windows? seems hard) or (b) continual training (expensive)? Crazy times.
- Aerroon1 hour ago
 A bit related: open-weights models are basically time capsules. These models have a knowledge cutoff and essentially live in that time forever.
 - bitexploder28 minutes ago
 This is the most fundamental argument that they are not, directly, an intelligence. They are not ever storing new information on a meaningful timescale. However, if you viewed them on some really large macro time scale, where LLMs are now injecting information into the universe and then re-ingesting it, maybe in some very philosophical way they are a /very/ slow oscillating intelligence right now. And as we narrow that gap (maybe with a totally new non-LLM paradigm), perhaps that is ultimately what gen AI becomes. Or some new insight lets the models update themselves in some fundamental way without the insanely expensive training costs they have now.
 - mlyle1 minute ago
 There's nothing to say that you can't build something intelligent out of them by bolting a memory onto them, though. Sure, it's not how we work, but I can imagine a system where the LLM does a lot of heavy lifting and allows more expensive, smaller networks that train during inference and RAG systems to learn how to do new things and keep persistent state and plan.
 - anematode17 minutes ago
 But they're not "slow"! Unlike biological thinking, which has a speed limit, you can accelerate these chains of thought by orders of magnitude.
- lxgr1 hour ago
 Data sharing agreements permitting, today's inference runs can be tomorrow's training data. Presumably the models are good enough at labeling promising chains of thought already. I could totally imagine "free" inference for researchers under the condition that the reasoning traces get to be used as future training data.
 - mccoyb1 hour ago
 Agreed, there's no doubt this will happen. It's likely already happening (it feels safe to assume that Anthropic is curating data from what they record from Claude Code?). As far as I understand RL scaling (we've already maxed out RLVR), these machines only get better as long as they have expert reasoner traces available. Having an expert work with an LLM and successfully solve a problem is high-signal data; it may be the only path forward? My prior is that these companies will take as much of this data as they can without asking you.
 - lxgr10 minutes ago
 Yup, or functionally equivalently, asking you in paragraph 37 of a 120-page PDF (bonus points: in an agreement update). And importantly, this can be cross-lab/model too. I suspect there's a reason why e.g. Google has been offering me free Claude inference in Google Antigravity on a free plan...
- DeathArrow46 minutes ago
 They can use LoRA.
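For readers unfamiliar with the term: LoRA (low-rank adaptation) freezes a pretrained weight matrix W and trains only a small low-rank update, so the effective weight is W + (alpha / r) * B * A. The sketch below is a toy illustration in plain Python with made-up dimensions and hypothetical helper names (`matmul`, `lora_weight`); it says nothing about how any lab actually updates its models.

```python
import random


def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def lora_weight(w, a, b, alpha):
    """Effective weight W + (alpha / r) * B @ A, where r is the rank."""
    rank = len(a)                       # A has shape (rank, d_in)
    delta = matmul(b, a)                # (d_out, rank) @ (rank, d_in)
    scale = alpha / rank
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy sizes, chosen only for illustration.
d_out, d_in, rank = 6, 8, 2
rng = random.Random(0)

W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable
B = [[0.0] * rank for _ in range(d_out)]                              # trainable, zero init

full_params = d_out * d_in              # what full fine-tuning would update
lora_params = rank * (d_in + d_out)     # what LoRA updates instead

if __name__ == "__main__":
    print(full_params, lora_params)
```

Because B starts at zero, the adapted weight equals W before any training, and only `lora_params` values are ever updated. At realistic layer sizes the gap between the two parameter counts is far larger than in this toy case.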
ecshafer41 minutes ago
I wonder how long we have until we start solving some truly hard problems with AI. How long until we throw AI at "connect general relativity and quantum physics", give the AI 6 months and a few data centers, and have it pop out a solution?
- rustyhancock13 minutes ago
 I think a very long time, because part of our limit is experiment. We need enough experimental results to explain in order to solve these theoretical mismatches, and we don't have them; at present we can't explore that frontier. Once we have more results at that frontier, we'd build a theory out from there that has two nearly independent limits for QFT and GR. What we'd be asking of the AI is something that we can't expect a human to solve even with a lifetime of effort today. It'll take something on par with Newton realising that the heavens and apples are under the same rules to do it. But at least Newton got to hold the apple and only had to imagine he could hold a star.
- worldsavior14 minutes ago
 If AGI ever comes, then. Currently, AI is only a statistical machine, and solutions like this are purely based on distribution, with no logic or actual intelligence.
 - rustyhancock10 minutes ago
 I don't even think that's the issue. The issue to my mind is a lack of data at the meeting of QFT/GR. After all, few humans historically have been capable of the initial true leap between ontologies. But humans are pretty smart, so we can't say that is a requirement for AGI.
ainiriand1 hour ago
Are not LLMs supposed to just find the most probable word that follows next like many people here have touted? How can this be explained under that premise? Is this way of problem solving 'thinking'?
- qsera5 minutes ago
 Yes, that is exactly what they do. But that does not mean that the results cannot be dramatic. Just like stacking pixels can result in a beautiful image.
- dilap38 minutes ago
 That description is really only fair for base models†. Something like Opus 4.6 has all kinds of other training on top of that which teaches it behaviors beyond "predict most probable token," like problem-solving and being a good chatbot. (†And even then it is kind of overly dismissive and underspecified. The "most probable word" is defined over some training data set. So imagine if you train on e.g. mathematicians solving problems... To do a good job at predicting [w/o overfitting] your model will have to in fact get good at thinking like a mathematician. In general "to be able to predict what is likely to happen next" is probably one pretty good definition of intelligence.)
 - gpm4 minutes ago
 I'd disagree: the other training on top doesn't alter the fundamental nature of the model, which is that it's predicting the probabilities of the next token (and then there's a sampling step which can roughly be described as picking the most probable one). It just changes the probability distribution that it is approximating. To the extent that thinking is making a series of deductions from prior facts, it seems to me that thinking can be reduced to "pick the next most probable token from the correct probability distribution"...
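The two steps named in this comment (a distribution over next tokens, then a sampling step) can be sketched in a few lines of Python. Everything here is a toy assumption: the four-word vocabulary, both sets of logits, and the function names are invented for illustration and do not come from any real model.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution over tokens."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(probs):
    """'Pick the most probable one': index of the largest probability."""
    return max(range(len(probs)), key=probs.__getitem__)


def sample(probs, rng):
    """Draw a token index at random, weighted by the distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


VOCAB = ["the", "cycle", "graph", "proof"]

# Hypothetical scores for the same context before and after further training.
logits_base = [2.0, 1.0, 0.5, 0.1]
logits_tuned = [0.5, 1.0, 0.5, 2.5]

if __name__ == "__main__":
    rng = random.Random(0)
    for logits in (logits_base, logits_tuned):
        probs = softmax(logits)
        print(VOCAB[greedy(probs)], VOCAB[sample(probs, rng)])
```

The selection code is identical for both sets of logits; only the distribution differs, which is the point the comment makes about training on top of a base model. Lowering `temperature` concentrates probability on the top token, so sampling behaves more like `greedy`.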
 - ericd14 minutes ago
 I think it's pretty likely that "intelligence" is emergent behavior that comes when you predict what comes next in physical reality well enough, at varying timescales. Your brain has to build all sorts of world model abstractions to do that over any significant timescale. Big LLMs have to build internal world models, too, to do well at their task.
- IgorPartola1 hour ago
 In some cases solving a problem is about restating the problem in a way that opens up a new path forward. “Why do planets move around the sun?” vs “What kind of force exists in the world that makes planets tethered to the sun with no visible leash?” (Obviously very simplified but I hope you can see what I am saying.) Given that a human is there to ask the right questions, it isn’t just an LLM. Further, some solutions are like running a maze. If you know all the wrong turns/next words to say and can just brute force the right ones, you might find a solution like a mouse running through the maze not seeing the whole picture. Whether this is thinking is more philosophical. To me this demonstrates more that we are closer to bio computers than an LLM is to having some sort of divine soul.
 - ainiriand1 hour ago
 Thanks for your input. The way I saw this, and how it looks like Knuth interpreted it, is that Claude took some reasoning steps independently: some internal decisions in the model made it try different things until it finally succeeded.
- tux346 minutes ago
 >Are not LLMs supposed to just find the most probable word that follows next like many people here have touted? The base models are trained to do this. If a web page contains a problem, and then the word "Answer: ", it is statistically very likely that what follows on that web page is an answer. If the base model wants to be good at predicting text, at some point learning the answers to common questions becomes a good strategy, so that it can complete text that contains these. NN training tries to push models to generalize instead of memorizing the training set, so this creates an incentive for the model to learn a computation pattern that can answer many questions, instead of just memorizing. Whether they actually generalize in practice... it depends. Sometimes you still get copy-pasted input that was clearly pulled verbatim from the training set. But that's only base models. The actual production LLMs you chat with don't predict the most probable word according to the raw statistical distribution. They output the words that RLHF has rewarded them to output, which includes acting as an assistant that answers questions instead of just predicting text. RLHF is also the reason there are so many AI SIGNS [1] like "you're absolutely right" and way more use of the word "delve" than is common in western English. [1]: https://en.wikipedia.org/wiki/WP:AISIGNS
- wrsh0720 minutes ago
 Imagine training a chess bot to predict a valid sequence of moves or valid game using the standard algebraic notation for chess. Great! It will now correctly structure chess games, but we've created no incentive for it to create a game where white wins or to make the next move be "good". Ok, so now you change the objective. Now let's say "we don't just want valid games, we want you to predict the next move that will help that color win". And we train towards that objective and it starts picking better moves (note: the moves are still valid). You might imagine more sophisticated ways to optimize picking good moves. You continue adjusting the objective function, you might train a pool of models all based off of the initial model, each of them gets a slightly different curriculum, and then you have a tournament and pick the winningest model. Great! Now you might have a skilled chess-playing model. It is no longer correct to say it just finds a valid chess game, because the objective function changed several times throughout this process. This is exactly how you should think about LLMs, except the ways the objective function has changed are significantly more complicated than for our chess bot. So to answer your first question: no, that is not what they do. That is a deep oversimplification that was accurate for the first two generations of the models and sort of accurate for the "pretraining" step of modern LLMs (except not even that accurate, because pretraining does instill other objectives. Almost like swapping our first step "predict valid chess moves" with "predict Stockfish outputs").
- crocowhile12 minutes ago
 Those people still exist? I only know one guy who is still fighting those windmills.
 - qsera8 minutes ago
 Yes, I am one.
- esafak45 minutes ago
 Are you feigning ignorance? The best way to answer a question, like completing a sentence, is through reasoning, an emergent behavior in complex models.
miroljub1 hour ago
Solves? It's a part of the training set. Nothing more, nothing less.
- rpdillon59 minutes ago
 Opening sentences:> Shock! Shock! I learned yesterday that an open problem I’d been working on for several weeks had just been solved by Claude Opus 4.6— Anthropic’s hybrid reasoning model that had been released three weeks earlier! It seems that I’ll have to revise my opinions about “generative AI” one of these days. What a joy it is to learn not only that my conjecture has a nice solution but also to celebrate this dramatic advance in automatic deduction and creative problem solving.
- jcims40 minutes ago
 Prove it.
- mwigdahl1 hour ago
 Did you read the article? It was an open problem.
 - bluGill49 minutes ago
 Was it? It was an open problem to Knuth, who generally knows how to search literature. However, there is enough literature to search that it wouldn't be a surprise at all to discover it was already solved but he just used slightly different terms and so didn't find it. Or maybe it was solved because this is a specialization of something that looks unrelated, and so he wouldn't have realized it when he read it. Or... Overall I'm going with unsolved, because Knuth is a smart person who I'd expect to not miss the above. I'm also sure he falls for the above all the time even though the majority of the time he doesn't.
 - mwigdahl28 minutes ago
 Agreed with all of that, but with the added point that Knuth has done a lot of work in this exact area in The Art of Computer Programming Volume 4. If he considers this conjecture open given his particular knowledge of the field, it likely is (although agreed, it's not guaranteed).