Not to be a Luddite, but large language models are fundamentally not meant for tasks of this nature. And listen to this:<p>> Most notably, it provides confidence levels in its findings, which Cheeseman emphasizes is crucial.<p>These 'confidence levels' are suspect. You can ask Claude today, "What is your confidence in __?" and it will, unsurprisingly, hand you a 'confidence level'. I'd like to better understand the system Cheeseman implemented. Otherwise I find the whole thing, heh, cheesy!
I've spent the last ~9 months building a system that, amongst other things, uses a VLM to classify and describe >40 million images of house-number signs across all of Italy. I wish I were joking, but that aside.<p>When asked about their confidence, these things are almost entirely useless. If the Magic Disruption Box is incapable of knowing whether or not it read "42/A" correctly, I'm not convinced it's gonna revolutionize science by doing autonomous research.
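For context, "asking about their confidence" usually looks something like the sketch below: the model transcribes the sign and self-reports a score in the same response. This is a rough illustration, not my actual pipeline; the OpenAI client, model name, prompt wording, and JSON shape are all stand-ins.

    import base64
    import json
    from openai import OpenAI

    client = OpenAI()

    def read_house_number(image_path: str) -> dict:
        # Ask a vision model to transcribe a house-number sign and
        # self-report a 0-100 confidence. The score is produced the same
        # way as the transcription itself: it's just more generated text.
        with open(image_path, "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode()
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model choice
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Transcribe the house number in this photo. "
                             'Reply with bare JSON: {"number": "...", "confidence": 0-100}'},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                ],
            }],
        )
        # Assumes the model actually returns bare JSON.
        return json.loads(response.choices[0].message.content)

Nothing in that "confidence" field is anchored to an actual error rate, which is why it tells you so little about whether "42/A" was read correctly.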
How exactly are we asking for the confidence level?<p>If you give the model the image <i>and</i> a prior prediction, what can it tell you? Asking it to produce a 1-10 figure in the same token stream as the actual task seems like a flawed strategy.
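One thing that seems less flawed than a self-reported 1-10 figure: run the verification as a separate pass and read the token probability of the answer rather than its text. A rough sketch, assuming an OpenAI-style chat API with logprobs; the model name, prompt, and the house-number framing from upthread are all placeholders.

    import math
    from openai import OpenAI

    client = OpenAI()

    def verification_confidence(image_b64: str, predicted: str) -> float:
        # Second pass: show the model the image plus the prior prediction,
        # ask a yes/no question, and use the probability mass on "yes"
        # tokens as the confidence signal instead of a self-reported score.
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder
            logprobs=True,
            top_logprobs=5,
            max_tokens=1,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"A previous pass transcribed this house number as "
                             f"'{predicted}'. Is that correct? Answer yes or no."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                ],
            }],
        )
        top = response.choices[0].logprobs.content[0].top_logprobs
        # Sum probability over "yes"-like variants among the top tokens.
        return sum(math.exp(t.logprob) for t in top
                   if t.token.strip().lower() == "yes")

It's still only a proxy, though; you'd have to check it against labelled ground truth before trusting it as a confidence level.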
A blind mathematician can do revolutionary work despite not being able to see
> large language models are fundamentally not meant for tasks of this nature<p>There should be some research results showing their fundamental limitations, as opposed to empirical observations. Can you point to them?<p>What about VLMs, VLAs, LMMs?
Can't LLMs be fed the entire corpus of literature to synthesise useful intersections (if not "insight")? Not to mention much better search than what was available when I was a lowly grad...
Finding patterns in large datasets is one of the things LLMs are really good at. Genetics is an area where scientists have already done impressive things with LLMs.<p>However you feel about LLMs (and I'm guessing you're not a fan, since you don't have to use them for very long before you witness how useful they can be with large datasets), they are undeniably incredible tools in some areas of science.<p><a href="https://news.stanford.edu/stories/2025/02/generative-ai-tool-marks-a-milestone-in-biology-and-accelerates-the-future-of-life-sciences" rel="nofollow">https://news.stanford.edu/stories/2025/02/generative-ai-tool...</a><p><a href="https://www.nature.com/articles/s41562-024-02046-9" rel="nofollow">https://www.nature.com/articles/s41562-024-02046-9</a>
In reference to the second article: who cares? What we care about is experimental verification. I could maybe see accurate prediction being helpful in focusing funding, but you still gotta do the experimentation.<p>Not disagreeing with your initial statement about LLMs being good at finding patterns in datasets, btw.
As a scientist, I find the two links you provided severely lacking in utility.<p>The first developed a model to calculate protein function based on DNA sequence, yet provides no results from testing the model. Until it does, it’s no better than the hundreds of predictive models thrown on the trash heap of science.<p>The second tested a model’s “ability to predict neuroscience results” (which reads really oddly). How did they test it? They pitted humans against LLMs in determining which <i>published abstracts were correct</i>.<p>Well yeah? That’s exactly what LLMs are good at - predicting language. But science is not advanced by predicting which abstracts <i>of known science</i> are correct.<p>It reminds me of my days working with computational chemists - we had an X-ray structure of the molecule bound to the target. You can’t get much better than that for hard, objective data.<p>“Oh yeah, if you just add a methyl group here you’ll improve binding by an order of magnitude.”<p>So we went back to the lab, spent a week synthesizing the molecule, and sent it to the biologists for a binding study. And the new molecule was <i>50% worse at binding</i>.<p>And that’s not to blame the computational chemist. Biology is <i>really damn hard</i>. Scientists are constantly being surprised by results that contradict current knowledge.<p>Could LLMs be used in the future to help come up with broad hypotheses in new areas? Sure! Are the hypotheses going to prove fruitless most of the time? Yes! But that’s science.<p>Any claim of a massive leap in scientific productivity (whether from LLMs or something else) should be taken with a grain of salt.
> Finding patterns in large datasets is one of the things LLMs are really good at.<p>Where by "good at" you mean "are totally shit at"?<p>They routinely hallucinate things even on tiny datasets like codebases.
I don't follow the logic that "it hallucinates so it's useless". In the context of codebases I know for sure that they can be useful. Large datasets too. Are they also really bad at some aspects of dealing with both? Absolutely. Dangerously, humorously bad sometimes.<p>But the latter doesn't invalidate the former.
> I don't follow the logic that "it hallucinates so it's useless".<p>I... don't even know how to respond to that.<p>Also, I didn't say they were useless. Please re-read the claim I responded to.<p>> Are they also really bad at some aspects of dealing with both? Absolutely. Dangerously, humorously bad sometimes.<p>Indeed.<p>Now combine "Finding patterns in large datasets is one of the things LLMs are really good at" with "they hallucinate even on small datasets" and "Are they also really bad at some aspects of dealing with both? Absolutely. Dangerously, humorously bad sometimes."<p>Translation, in case the logic somehow eludes you: if an LLM finds a pattern in a large dataset, and it often hallucinates (dangerously, humorously badly), what are the chances that the pattern it found isn't itself a hallucination (often a subtle one)?<p>Especially given the undeniable, verifiable fact that LLMs are shit at working with large datasets (unless they are explicitly trained on them, and even then the problem of hallucinations doesn't go away).
I made a toy order-item cost extractor out of my pile of emails. Claude added confidence-percentage tracking, and it couldn't be more useless.
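If you want to put a number on "useless", one quick check is a calibration table over a hand-labelled sample: log the stated confidence next to whether the extraction was actually right, then bucket. A minimal sketch in plain Python; the record format is made up.

    from collections import defaultdict

    def calibration_report(records):
        # records: iterable of (stated_confidence_pct, was_correct) pairs
        # from a hand-labelled sample.
        buckets = defaultdict(lambda: [0, 0])  # decile -> [correct, total]
        for confidence, correct in records:
            b = min(int(confidence // 10), 9)
            buckets[b][0] += int(correct)
            buckets[b][1] += 1
        for b in sorted(buckets):
            correct, total = buckets[b]
            print(f"{b*10:>3}-{b*10+9}% stated: "
                  f"{correct/total:6.1%} observed accuracy over {total} items")

A well-calibrated score should roughly track the bucket midpoints; a useless one won't.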
This is what Yann LeCun means when he talks about how research is at a dead end at the moment, with everyone all-in on LLMs to a fault.
I'm just a noob, but LeCun seems obsessed with the idea of world models, which I assume means a more rigorous physical approach, and I don't understand (again, confused noob here) how that would help with precise abstract thinking.