Aren't LLMs supposed to just find the most probable word that follows next, like many people here have touted? How can this be explained under that premise? Is this way of problem solving 'thinking'?
Yes, that is exactly what they do.<p>But that does not mean that the results cannot be dramatic. Just like stacking pixels can result in a beautiful image.
That description is really only fair for base models†. Something like Opus 4.6 has all kinds of other training on top of that which teaches it behaviors beyond "predict most probable token," like problem-solving and being a good chatbot.<p>(†And even then it is kind of overly dismissive and underspecified. The "most probable word" is defined over some training data set. So imagine if you train on e.g. mathematicians solving problems... To do a good job at predicting [w/o overfitting] your model will have to in fact get good at thinking like a mathematician. In general "to be able to predict what is likely to happen next" is probably one pretty good definition of intelligence.)
I'd disagree: the other training on top doesn't alter the fundamental nature of the model, which is that it's predicting the probabilities of the next token (and then there's a sampling step which can roughly be described as picking the most probable one).<p>It just changes the probability distribution that it is approximating.<p>To the extent that thinking is making a series of deductions from prior facts, it seems to me that thinking can be reduced to "pick the next most probable token from the correct probability distribution"...
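To make the "probabilities, then a sampling step" point concrete, here is a minimal sketch in plain Python. The vocabulary and both sets of logits are made-up numbers for illustration, not output from any real model; `tuned_logits` just stands in for "the same model after extra training shifted the distribution".

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(vocab, logits):
    """Always pick the single most probable token."""
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best]

def sample(vocab, logits, temperature=1.0, rng=random):
    """Draw a token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical three-word vocabulary and scores (assumptions, not real data).
vocab = ["Paris", "London", "banana"]
base_logits = [2.0, 1.5, 0.1]    # stand-in for a base model
tuned_logits = [4.0, 0.5, -3.0]  # stand-in for the model after further training

print(greedy(vocab, base_logits))
print(softmax(base_logits))
print(softmax(tuned_logits))
```

The sampling code is identical in both cases; only the distribution it draws from differs, which is the claim being made above.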
I think it's pretty likely that "intelligence" is emergent behavior that comes when you predict what comes next in physical reality well enough, at varying timescales. Your brain has to build all sorts of world model abstractions to do that over any significant timescale. Big LLMs have to build internal world models, too, to do well at their task.
In some cases solving a problem is about restating the problem in a way that opens up a new path forward. “Why do planets move around the sun?” vs “What kind of force exists in the world that makes planets tethered to the sun with no visible leash?” (Obviously very simplified but I hope you can see what I am saying.) Given that a human is there to ask the right questions it isn’t <i>just</i> an LLM.<p>Further, some solutions are like running a maze. If you know all the wrong turns/next words to say and can just brute force the right ones you might find a solution like a mouse running through the maze not seeing the whole picture.<p>Whether this is thinking is more philosophical. To me this demonstrates more that we are closer to bio computers than an LLM is to having some sort of divine soul.
>Are not LLMs supposed to just find the most probable word that follows next like many people here have touted?<p>The base models are trained to do this. If a web page contains a problem, and then the word "Answer: ", it is statistically very likely that what follows on that web page is an answer. If the base model wants to be good at predicting text, at some point learning the answers to common questions becomes a good strategy, so that it can complete text that contains these.<p>NN training tries to push models to generalize instead of memorizing the training set, so this creates an incentive for the model to learn a computation pattern that can answer many questions, instead of just memorizing. Whether they actually generalize in practice... it depends. Sometimes you still get copy-pasted input that was clearly pulled verbatim from the training set.<p>But that's only base models. The actual production LLMs you chat with don't predict the most probable word according to the raw statistical distribution. They output the words that RLHF has rewarded them for outputting, which includes acting as an assistant that answers questions instead of just predicting text. RLHF is also the reason there are so many AI SIGNS [1] like "you're absolutely right" and way more use of the word "delve" than is common in western English.<p>[1]: <a href="https://en.wikipedia.org/wiki/WP:AISIGNS" rel="nofollow">https://en.wikipedia.org/wiki/WP:AISIGNS</a>
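For a feel of what "the raw statistical distribution" means in the smallest possible case, here is a toy bigram counter. It is only a sketch: real base models are neural networks conditioned on long contexts, not lookup tables, and the three-sentence corpus is invented for the example.

```python
from collections import Counter, defaultdict

def train_bigram(texts):
    """Count, for every word, which words follow it and how often."""
    counts = defaultdict(Counter)
    for text in texts:
        tokens = text.split()
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Invented toy corpus (an assumption for illustration only).
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

counts = train_bigram(corpus)
print(predict_next(counts, "the"))
print(predict_next(counts, "sat"))
print(predict_next(counts, "zebra"))
```

A table like this can only memorize. The incentive described above, to learn something that generalizes, only shows up once the predictor is a model that has to compress far more text than it can store.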
Imagine training a chess bot to predict a valid sequence of moves or a valid game using the standard algebraic notation for chess.<p>Great! It will now correctly structure chess games, but we've created no incentive for it to create a game where white wins or to make the next move be "good".<p>Ok, so now you change the objective. Now let's say "we don't just want valid games, we want you to predict the next move that will help that color win".<p>And we train towards that objective and it starts picking better moves (note: the moves are still valid).<p>You might imagine more sophisticated ways to optimize picking good moves. You continue adjusting the objective function; you might train a pool of models all based off of the initial model, each with a slightly different curriculum, and then hold a tournament and pick the winningest model. Great!<p>Now you might have a skilled chess-playing model.<p>It is no longer correct to say it just finds a valid chess game, because the objective function changed several times throughout this process.<p>This is exactly how you should think about LLMs, except the ways the objective function has changed are significantly more complicated than for our chess bot.<p>So to answer your first question: no, that is not what they do. That is a deep oversimplification that was accurate for the first two generations of the models and sort of accurate for the "pretraining" step of modern LLMs (except not even that accurate, because pretraining does instill other objectives. Almost like swapping our first step "predict valid chess moves" with "predict Stockfish outputs").
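A rough sketch of the same idea, with two loud caveats: chess is swapped for a tiny take-away game (one pile, take 1 to 3 stones, whoever takes the last stone wins) so it fits in a few lines, and "training" is replaced by exhaustive search. The function names and the `MAX_TAKE` rule are assumptions made up for this example. What it shows is two objectives over the same set of valid moves.

```python
from functools import lru_cache

MAX_TAKE = 3  # assumption: a player may take 1 to 3 stones per turn

def valid_moves(pile):
    """All legal moves from a pile of this size."""
    return [n for n in range(1, MAX_TAKE + 1) if n <= pile]

@lru_cache(maxsize=None)
def can_win(pile):
    """True if the player to move can force a win (taking the last stone wins)."""
    return any(not can_win(pile - m) for m in valid_moves(pile))

def valid_only_policy(pile):
    """Objective 1: output any valid move. Here, always the first legal one."""
    return valid_moves(pile)[0]

def winning_policy(pile):
    """Objective 2: among valid moves, prefer one that leaves the opponent lost."""
    for m in valid_moves(pile):
        if not can_win(pile - m):
            return m
    return valid_moves(pile)[0]  # no winning move exists; still play a valid one

pile = 7
print(valid_only_policy(pile), winning_policy(pile))
```

Both policies only ever emit legal moves, but from a pile of 7 the first leaves the opponent a winning position while the second leaves them a lost one. Describing the second as "just outputs a valid move" is true and misses the point.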
Those people still exist? I only know one guy who is still fighting those windmills
Are you feigning ignorance? The best way to answer a question, like completing a sentence, is through reasoning, which is an emergent behavior in complex models.