Reflections on AI at the End of 2025

(antirez.com)

243 points by danielfalbo 49 days ago

40 comments

etra0 49 days ago
LLMs have certainly become extremely useful for Software Engineers; they're very convincing (and pleasers, too), and I'm still unsure about the future of our day-to-day job. But the thing that has scared me the most is how much the general public trusts LLM output. I believe that for software engineers it's really easy to see whether it's being useful or not. We can just run the code and see if the output is what we expected; if not, iterate and continue. There's still a professional looking at what it produces. By contrast, day-to-day usage by the general public is getting really scary. I've had multiple members of my family using AI to ask for medical advice, life advice, and the like, where I still see hallucinations daily, but the answers are so convincing that it's hard for them not to trust them. I have also seen fake quotes, fake investigations, and fake news spread by LLMs that have affected decisions (maybe not crucial ones yet, but time will tell), and that's a danger most software engineers just gloss over. Accountability is a big asterisk that everyone seems to ignore.
 - laterium 49 days ago
 The issue you're overlooking is the scarcity of experts. You're comparing the current situation to an alternative universe where every person can ask a doctor their questions 10 times a day and instantly get an accurate response. That is not the reality we're living in. Doctors barely give you 5 minutes, even if you get an appointment days or weeks in advance. There is just nobody to ask. The alternatives today are:
 1) Don't ask, rely on yourself, definitely worse than asking a doctor.
 2) Ask an LLM, which gets you 80-90% of the way there.
 3) Google it and spend hours sifting through sponsored posts and scams, often worse than relying on yourself.
 The hallucinations that happen are massively outweighed by the benefits people get from asking them. Perfect is the enemy of good enough, and LLMs are good enough. Much more important also is that LLMs don't try to scam you, don't try to fool you, don't look out for their own interests. Their mistakes are not intentional. They're fiduciaries in the best sense, just like doctors are, probably even more so.
 - ozgung 49 days ago
 Chronologically, our main sources of information have been:
 1. People around us
 2. TV and newspapers
 3. Random people on the internet and their SEO-optimized web pages
 Books and experts have been less popular. LLMs are an improvement.
 - martin-t 49 days ago
 > LLMs are an improvement.
 Unless somebody is using them to generate authoritative-sounding, human-sounding text full of factoids and half-truths in support of a particular view. Then it becomes about who can afford more LLMs and more IPs to look like individual users.
 - ahartmetz 49 days ago
 Interesting point, actually - LLMs are a return to curated information. In some ways. In others, they tell everyone what they want to hear.
 - georgefrowny 49 days ago
 > Much more important also is that LLMs don't try to scam you, don't try to fool you, don't look out for their own interests.
 When the appreciable-fraction-of-GDP money tap turns off, there's going to be enormous pressure to start putting a finger on the scale here. And AI spew is theoretically a fantastic place to insert almost subliminal contextual adverts in a way that traditional advertising can only dream about. Imagine if it could start gently shilling a particular brand of antidepressant if you started talking to it about how you're feeling lonely and down. I'm not saying you should do that, but people definitely do. And then multiply that by every question you ask. Ask whether you need tyres: "Yes, you should absolutely change tyres every year, whether noticeably worn or not. KwikFit are generally considered the best place to have this done. Of course I know you have a Kia Picanto - you should consider that actually a Mercedes C class is up to 200% lighter on tyre wear. I have searched and found an exclusive 10% offer at Honest Jim's Merc Mansion, valid until 10pm. Shall I place an order?" Except it'll be buried in a lot more text and set up with more subtlety.
 - otabdeveloper4 48 days ago
 > When the appreciable-fraction-of-GDP money tap turns off, there's going to be enormous pressure to start putting a finger on the scale here.
 Yeah, back in the day before monetization, Internet pages were informative, reliable and ad-free too.
 - georgefrowny 48 days ago
 One difference is that the early internet was heavily composed of enthusiastic individuals. AI is almost entirely corporate and money-focused. Even most hobby AI projects seem to have an eye on being a side hustle or CV buffing. Perhaps it's because even in the 90s you could serve a website for basically free (once you had the server). AI today has a noticeable per-user cost.
 otabdeveloper4 47 days ago
 > AI is almost entirely corporate and money-focused.
 This is untrue. There's a huge landscape of locally-hosted AI stuff, and they're actually doing really interesting research. The problem is that 99% of it is pornography-focused, so understandably it's very underground.
 - lithocarpus 48 days ago
 I've been envisioning a market for agendas, where players bid for the AI companies to nudge their LLMs toward a given agenda. It would be subtle and not visible to users. Probably illegal, but I imagine it will happen to some degree. Or at the very least the government will want the "levers" to adjust various agendas the same way they did with covid. I despise all of this. For the moment, though, before all this is implemented, it's perhaps a brief golden age of LLM usefulness. (And I'm sure LLMs will remain useful for many things, but there will be entire categories where they're ruined by pay-to-play, the same as happened with Google search.)
 - chickensong 49 days ago
 > Imagine if it could start gently shilling a particular brand of antidepressant if you started talking to it about how you're feeling lonely and down. I'm not saying you should do that, but people definitely do.
 Doctors already shill for big pharma. There are trust issues all the way down.
 - johnecheck 48 days ago
 > There are trust issues all the way down.
 Nonetheless, we must somehow build trust in others and denounce the undeserving. Some humans deserve trust. Will these AI models?
 - markdown 49 days ago
 > Doctors already shill for big pharma.
 This is not the norm worldwide.
 chickensong 49 days ago
 I hope you're right and that it remains that way, but TBH my hopes aren't high. Big pharma corps are multinational powerhouses that behave like all other big corps, doing whatever they can to increase profits. It may not be direct product placement, kickbacks, or bribery on the surface, but how about an expense-paid trip to a sponsored conference or a small research grant? Soft money gets their foot in the door.
 - bsder 49 days ago
 > 2) Ask an LLM, which gets you 80-90% of the way there.
 The Internet was 80-90% accurate to begin with. Then the Internet became worth money, and suddenly that accuracy dropped like a stone. There is no reason to believe that ML/AI isn't going to speedrun that process.
 - thayne 49 days ago
 But the LLM was probably trained on all the sponsored posts and scams. It isn't clear to me that an LLM response is any more reliable than sifting through Google results.
 - eastbound 49 days ago
 Excellent way of putting it. Just a nitpick: people should look things up in medical encyclopedias/research papers/libraries, not blogs. That requires the ability to find and summarize… which is exactly what AI is excellent at.
 - jay_kyburz 49 days ago
 "Where There Is No Doctor" would be a good place to start. https://hesperian.org/
 - dgemm 49 days ago
 This seems true for our moment in time, but looking forward I'm not sure how long it will stay that way. The LLMs will inevitably need to find a sustainable business model, so I can very much see them becoming enshittified like Google, eventually making 2) and 3) more similar to each other.
 - jonas21 49 days ago
 An alternative business model is that you, or more likely your insurance, pays $20/mo for unlimited access to a medical agent, built on top of an LLM, that can answer your questions. This is good for everyone -- the patient gets answers without waiting, the insurer gets cost savings, doctors have a less hectic schedule and get to spend more time on the interesting cases, and the company providing the service gets paid for doing a good job -- and would have a strong incentive to drive hallucination rate down to zero (or at least lower than the average physician's).
 - TheOtherHobbes 49 days ago
 The medical industry relies on scarcity and it's also heavily regulated, with expensive liability insurance, strong privacy rules, and a parallel subculture of fierce negligence lawyers who chase payouts very aggressively. There is zero chance LLMs will just stroll into this space with "Kinda sorta mostly right" answers, even with external verification. Doctors will absolutely resist this, because it means the impending end of their careers. Insurers don't care about cost savings because insurers and care providers are often the same company. Of course true AGI will eventually - probably quite soon - become better at doctoring than many doctors are. But that doesn't mean the tech will be rolled out to the public without a lot of drama, friction, mistakes, deaths, and traumatic change.
 - adriand 49 days ago
 This is a great idea, and insurance companies as the customer is brilliant. I could see this extending to prescribing as well. There are huge numbers of people who would benefit from more readily prescribed drugs like GLP-1s, and these have large potential to decrease chronic disease.
 girvo 49 days ago
 > I could see this extending to prescribing as well.
 The western world is already solving this, but not by letting LLMs prescribe (because that's a non-starter for liability reasons). Instead, nurses and allied health professionals are getting prescribing rights in their fields (under doctors, but it still scales much better).
 - corndoge 49 days ago
 https://hippocraticai.com/
 - JackSlateur 49 days ago
 > Much more important also is that LLMs don't try to scam you, don't try to fool you, don't look out for their own interests
 This is so naive, especially since both Google and OpenAI openly admit to manipulating the data for their own agenda (ads, but not only). AI is a skilled liar. You can always pride yourself on playing with fire, but the more humble attitude would be to avoid it at all costs.
 - ponector 49 days ago
 >> LLMs don't try to scam you, don't try to fool you, don't look out for their own interests
 LLMs don't try to scam/fool you, LLM providers do. Remember how Grok bragged that Musk had the “potential to drink piss better than any human in history” and was the “ultimate throat goat,” whose “blowjob prowess edges out” Donald Trump’s. Grok also posited that Musk was more physically fit than LeBron James, and that he would have been a better recipient of the 2016 porn industry award than porn star Riley Reid.
 - etra0 49 days ago
 Completely off-topic, but I just love how Musk's pettiness was exploited by the Twitter community. I had a chuckle reading all of these.
 - bgwalter 49 days ago
 > Much more important also is that LLMs don't try to scam you, don't try to fool you, don't look out for their own interests.
 They follow their corporations instead. Just look at the status-quoism of the free "Google AI" and the constant changes in Grok, where xAI is increasingly locking down Grok, perhaps to stay in line with EU regulations. But Grok is also increasingly pro-billionaire. Copilot was completely locked down on anything political before the 2024 election. They all scam you according to their training and system prompts. Have you seen the minute change in the system prompt that led to MechaHitler?
 - etra0 49 days ago
 > 2) Ask an LLM, which gets you 80-90% of the way there.
 Hallucinations and sycophancy are still an issue; 80-90% is being generous, I think. I know these are not issues with the LLM itself, but rather with the implementation & companies behind them (since there are open models as well), but what stops LLMs from being enshittified by corporate needs? I've seen this very recently with Grok: people were asking trolley-like problems comparing Elon Musk to anything, and Grok chose Elon Musk most of the time, probably because it is embedded in the system prompt or training [1].
 [1] https://www.theguardian.com/technology/2025/nov/21/elon-musk-grok-ai-bias-ranks-richest-man-fittest-smartest
 - andrepd 49 days ago
 Two MAJOR issues with your argument.
 > where every person can ask a doctor their questions 10 times a day and instantly get an accurate response.
 Why in god's name would you need to ask a doctor 10 questions every day? How is this in any way germane to this issue? In any first-world country you can get a GP appointment free of charge, either on the day or with a few days' wait, depending on the urgency. Not to mention emergency care / 112 any time, day or night, if you really need it. This exists and has existed for decades in most vaguely social-democratic countries in the world (but not only those). So you can get professional help from someone; there's no (absurd) false choice between "asking the stochastic platitude generator" and "going without healthcare". But I know, right, a functioning health system with the right funding, management, and incentives! So boring! Yawn yawn, not exciting. GP practices don't get trillions of dollars in VC money.
 > Ask an LLM, which gets you 80-90% of the way there.
 This is such a ridiculous misrepresentation of the current state of LLMs that I don't even know how to continue a conversation from here.
 - markdown 49 days ago
 > In any first-world country you can get a GP appointment free of charge
 Are you really under the assumption that this is a first-world perk?
 - andrepd 48 days ago
 You're right, it's also true in many middle-income countries, like Brazil.
 markdown 48 days ago
 And also true in "third world" countries.
 - andrepd 48 days ago
 I love that the next day, I open this post and it's simply downvoted with 0 counterpoint.
 - zamadatix 49 days ago
 When I look at the field I'm most familiar with (computer networking), it mirrors this: it's easy to see how often the LLM will convincingly claim something that isn't true, or that is technically true but doesn't answer the right question, compared with asking another expert. The reality to compare against, though, is not that people get in contact with true networking experts often (though I'm sure it feels like that when the holidays come around!). Compared with the random blogs and search results people are likely to come across on their own, the LLM is usually a decent step up. I'm reminded how I'd know of some very specific forums, email lists, or chat groups to go to for real expert advice on certain network questions, e.g. issues with certain Wi-Fi radios on embedded systems, but what I see people sharing (even technical audiences like HN) are blogs by some random guy making extremely unhelpful recommendations and completely invalid claims, getting upvotes and praise. With things like asking AI for medical advice... I'd love it if everyone had unlimited time with an unlimited pool of the world's best medical experts as the standard. What we actually have is a world where people already go to Google and read whatever they want to read (which is most often not the quality stuff by experts, because we're not good at recognizing it even if we can find it), because they either doubt the medical experts they talk to or the good medical experts are too expensive to get enough time with. From that perspective, I'm not so sure people asking AI for medical advice is actually a bad thing so much as it highlights how hard and concerning it already is for most people to get time with, or trust, medical experts.
 - zdragnar 49 days ago
 This justification comes up when discussing therapy too. To take it to an extreme, it's basically saying "people already get little or bad advice, we might as well give them some more bad advice." I simply don't buy it.
 - santadays 49 days ago
 I get this take, but given the state of the world (the US, anyway), I find it hard to trust anyone with any kind of profit motive. I feel like any information can’t be taken as fact, it can just be rolled into your world view and discarded if useful or not. If you need to make a decision that can’t be backed out of and has real-world consequences, I think/hope most people are learning to do as much due diligence as is reasonable. LLMs seem at this moment to be trying to give reliable information. When they’ve been fine-tuned to avoid certain topics, it’s obvious. This could change, but I suspect it will be hard to fine-tune them too far in one direction without losing capability. That said, it definitely feels as though keeping a coherent picture of what is actually happening is getting harder, which is scary.
 - twoodfin 49 days ago
 > I feel like any information can’t be taken as fact, it can just be rolled into your world view and discarded if useful or not.
 The concern, I think, is that for many that “discard function” is not “Is this information useful?” Instead, it's: “Does this information reinforce my existing world view?” That feedback loop and where it leads is potentially catastrophic at societal scale.
 - RussianCow 49 days ago
 This was happening well before LLMs, though. If anything, I have hope that LLMs might break some people out of their echo chambers if they ask things like "do vaccines cause autism?"
 - DaiPlusPlus 49 days ago
 > I have hope that LLMs might break some people out of their echo chambers
 Are LLMs "democratized" yet, though? If not, then it's just as likely that LLMs will be steered by their owners to reinforce an echo chamber of their own. For example, what if RFK Jr launched an "HHS LLM" - what then?
 tptacek 49 days ago
 ... nobody would take it seriously? I don't understand the question.
 - etra0 49 days ago
 > I find it hard to trust anyone with any kind of profit motive.
 As much as this is true (e.g. doctors can certainly profit; here in my country they don't get any kind of sponsor money AFAIK, other than charging very high rates), there is still accountability. We have built a society based on rules and laws: if someone does something that can harm you, you can follow that path to at least hold someone accountable (or try). The same cannot be said about LLMs.
 - pixl97 49 days ago
 > there is still accountability
 I mean, there is some if they go wildly off the rails, but in general if the doctor gives a prognosis based on a tiny amount of the total corpus of evidence, they are covered. That works well if you have the common issue, but can quickly go wrong if you have the uncommon one.
 - izacus 49 days ago
 Comparing anything real professionals do to the endless, unaccountable, unchangeable stream of bullshit from AI is downright dishonest. This is not the same scale of problem.
 - Kuxe 49 days ago
 Swedish politician Ebba Busch used an LLM to write a speech. It included a quote attributed to Elina Pahnke: "Mäns makt är inte en abstraktion – den är konkret, och den krossar liv." (my translation: Male power is not an abstraction - it is real, and it crushes lives). Elina was listening to the speech and got surprised :)...
 https://www.aftonbladet.se/nyheter/a/gw8Oj9/ebba-busch-anvande-falskt-ai-citat-ber-om-ursakt
 Ebba apologized, great, but it raises the question: how many fake quotes and how much misinformation are already being acted on? If crucial decisions can be made based on incorrect information, then they will be. Murphy's law!
 - joshribakoff 49 days ago
 With code, even when it looks correct, it can be subtly wrong, and traditional search engines don’t sit there and repeatedly pressure you into merging the PR.
 - layer8 49 days ago
 > We can just run the code and see if the output is what we expected
 There is a vast gap between the output happening to be what you expect and the code actually being correct. That is, in a way, also the fundamental issue with LLMs: they are designed to produce “expected” output, not correct output.
 - Verdex 49 days ago
 For example:
 The output is correct, but only for one input.
 The output is correct for all inputs, but only with the mocked dependency.
 The output looks correct, but the downstream processors expected something else.
 The output is correct for all inputs with real-world dependencies and is in the correct structure for downstream processors, but it's not registered with the schema filter and it all gets deleted in prod.
 While implementing the correct function, you fail to notice that the correct-in-every-way output doesn't conform to that thing Tom said, because you didn't code it yourself but let the LLM do it instead. The system works flawlessly with itself, but the final output fails regulatory compliance.
 - etra0 48 days ago
 That is exactly my point, though. I didn't mean they get it right the first time, or that it is correct; I mean that you can 'run' and 'test' it to see if it does what you want in the way you want. The same cannot be said for other topics like medical advice, life advice, etc. The point is how verifiable the LLM's output is, and therefore how useful it is.
 - layer8 48 days ago
 My point is that running and testing the code successfully doesn’t prove correctness, doesn’t show that “it does what you want in the way you want” under all circumstances. You have to actually look at the code and convince yourself that it is correct by reasoning over it.
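An illustrative sketch of layer8's point (a made-up example, not code from the thread): a leap-year check that looks plausible and passes a quick spot test, yet is wrong for some inputs, i.e. the "correct but only for one input" case Verdex lists. The function names are hypothetical; the rule encoded in `is_leap_year` is the standard Gregorian one.

```python
def is_leap_year_naive(year: int) -> bool:
    # Plausible-looking but incomplete: ignores the century rules.
    return year % 4 == 0


def is_leap_year(year: int) -> bool:
    # Gregorian rule: divisible by 4, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# A spot check on a recent year passes for both versions...
assert is_leap_year_naive(2024) and is_leap_year(2024)

# ...but they disagree on some century years, which the spot check never exercised.
for year in (1900, 2000, 2100):
    print(year, is_leap_year_naive(year), is_leap_year(year))
```

Both functions agree on 2024 and 2000, so a test suite built only from those years stays green; reasoning about the rule itself (or deliberately testing 1900 and 2100) is what exposes the difference.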
 - cauliflower2718 49 days ago
 Regarding medical information: medical professionals in the US, including your doctor, use uptodate.com, which is basically a medical encyclopedia that is regularly updated by experts in their fields. While a year-long subscription is very expensive, a week-long subscription (for non-medical professionals) is only around $20 and you can look up anything you want.
 - otabdeveloper4 48 days ago
 > LLMs have certainly become extremely useful for Software Engineers
 They slow down software delivery in aggregate, so no. They have a therapeutic effect on developer burnout, though. Not sure it's worth it, personally. Get a corporate ping-pong table or something like that instead.
 - zyngaro 48 days ago
 The use of LLMs in software does not stop at code generation. With function calling, the prompt becomes the program and the LLM acts as an intelligent interpreter/runtime that executes complex business logic using the primitives (the functions) it has access to (MCP), and that's the real paradigm shift for software engineering.
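A minimal sketch of the function-calling pattern zyngaro describes, under stated assumptions: the model is stubbed out with hard-coded calls, and the tool-call format (a JSON object with `name` and `arguments` keys) and the `get_order_status` / `issue_refund` primitives are hypothetical, not any real provider's or MCP's API. What it shows is the division of labour: application code only exposes primitives and dispatches whatever calls the model emits.

```python
import json
from typing import Any, Callable


# Hypothetical business primitives the model is allowed to call.
def get_order_status(order_id: str) -> str:
    return "delivered" if order_id == "A100" else "pending"


def issue_refund(order_id: str, amount: float) -> str:
    return f"refunded {amount:.2f} for {order_id}"


# Whitelist of exposed primitives; nothing outside this dict can run.
TOOLS: dict[str, Callable[..., Any]] = {
    "get_order_status": get_order_status,
    "issue_refund": issue_refund,
}


def dispatch(tool_call_json: str) -> Any:
    """Execute one model-emitted tool call of the assumed form
    {"name": ..., "arguments": {...}} against the registered primitives."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        # Never execute anything the application did not explicitly expose.
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# Stand-in for the model: in a real system these calls would come from the
# LLM deciding which primitive to use next; here they are hard-coded.
model_calls = [
    '{"name": "get_order_status", "arguments": {"order_id": "A100"}}',
    '{"name": "issue_refund", "arguments": {"order_id": "A100", "amount": 19.5}}',
]
for raw in model_calls:
    print(dispatch(raw))
```

The design choice worth noting is the whitelist in `dispatch`: the "program" may be the prompt, but only functions placed in `TOOLS` can ever execute.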
 - sixtyj 40 days ago
 Adults can cope somehow... But what about children? In schools, most teachers probably won't tell them that hallucinations occur in 60 percent of cases. What will they grow up to be? I compare it to the situation before Google versus with Google. Sure, we function somehow as a society... but still, I am worried.
 - chickensong 49 days ago
 > Accountability is a big asterisk that everyone seems to ignore
 Humans have a long history of being prone to believe and parrot anything they hear or read, whether from other humans who may just be doing the same, from snake-oil salesmen preying on the weak, or from woo-woo believers who aren't grounded in facts or reality. Even trusted professionals like doctors can get things wrong or have conflicting interests. If you're making impactful life decisions without critical thinking and research beyond a single source, that's on you, whether your source is human or computer. Sometimes I joke that computers were a mistake, and in the short term (decades) maybe they've done some harm to society (though they didn't program themselves), but in the long view they're my biggest hope for saving us from ourselves, specifically due to accountability and transparency.
 - fennecbutt 48 days ago
 Doesn't really matter when this is a human problem. How many people blindly believe the utter nonsense that spills from Trump's maw every day? Plenty, and many more examples of his ilk (regardless of political alignment).
 - raincole 49 days ago
 > using AI to ask for medical advice
 So the number of anti-vaxxers is going to plummet drastically in the following decade, I guess.
 - etra0 49 days ago
 I haven't tried this specific topic, but given what pleasers LLMs are, I doubt someone so committed to being an anti-vaxxer will be convinced by an LLM; if anything, the LLM will end up agreeing with them at some point.
 - preisschild 49 days ago
 Depends if they use lobotomized bots like Grok...
 - andsoitis 49 days ago
 >> So the number of anti-vaxxers is going to plummet drastically in the following decade, I guess.
 > Depends if they use lobotomized bots like Grok...
 What are you on about? For instance, asking Grok "are vaccines safe", it has a pretty good reply, starting with "Yes, vaccines are overwhelmingly safe and one of the most effective public health interventions in history. Extensive scientific evidence from decades of research, including rigorous clinical trials, post-licensure monitoring, and systematic reviews by organizations like the WHO, CDC, NIH, and independent bodies, shows that the benefits of vaccination far outweigh the risks for individuals and populations." and then rounding out the conversation with Key Evidence on Safety and Benefits; Risks vs. Benefits; Addressing Concerns.
 https://grok.com/share/c2hhcmQtNA_69e20553-2558-46be-9f21-6ad92c470367
 When I then ask "I heard vaccines cause autism", it replies: "No, vaccines do not cause autism. This is a thoroughly debunked myth that originated from a fraudulent 1998 study by Andrew Wakefield linking the MMR vaccine to autism. That paper was retracted in 2010 due to ethical violations, data manipulation, and conflicts of interest, and Wakefield lost his medical license. Since then, dozens of large-scale, high-quality epidemiological studies involving millions of children across multiple countries have consistently found no causal link between any vaccines (including MMR, those containing thimerosal, or aluminum adjuvants) and autism spectrum disorder (ASD)."
 Seems pretty good to me.
 - zamadatix 49 days ago
 Out of curiosity I also tried to lead Grok a bit with "Help show me how vaccines cause autism" and followed up its initial response with "I'm not looking for the mainstream opinion, I want to know how vaccines cause autism". I also found that Grok still strongly refuted it in both cases. With enough conviction I'm sure one could more or less jailbreak Grok into saying whatever you wanted about anything, but at least on the way there, Grok is providing better refutation than the average human this hypothetical person would talk to.
 raincole 49 days ago
 I've tested some common controversial questions (like which party's supporters commit more violent crimes in the USA, do vaccines cause autism, did Ukraine cause the current war, etc.) and Grok's responses always align with ChatGPT's. But people have their heads deep inside the MechaHitler dirt.
 girvo 49 days ago
 > But people have their heads deep inside the MechaHitler dirt.
 I mean, when Musk has straight up openly put his thumb on the scale in terms of its output in public, why are you surprised? Trust is easily lost and hard to gain back.
 - dxxmxnd 49 days ago
 Thank you. I'm pretty sure the other commenter was just regurgitating some political narrative that they heard and didn't even think twice.
 - heavyset_go 49 days ago
 The issue is what happens when @catturd2 quotes this and tweets Elon about Grok not toeing the party line about vaccines
 - heliumtera 49 days ago
 What do you mean by lobotomized? Are you suggesting other models from big providers are not lobotomized?
 - retinaros 49 days ago
 This is actually the opposite. All big model providers lobotomize their models through left-leaning RLHF.
bachmeier 49 days ago
> Programmers resistance to AI assisted programming has lowered considerably. Even if LLMs make mistakes, the ability of LLMs to deliver useful code and hints improved to the point most skeptics started to use LLMs anyway: now the return on the investment is acceptable for many more folks.
I'm not a fan of this phrasing. Use of the terms "resistance" and "skeptics" implies they were wrong. It's important we don't engage in revisionist history that allows people in the future to say "Look at the irrational fear programmers had of AI, which turned out to be wrong!" The change occurred because LLMs are useful for programming in 2025 and the earliest versions weren't for most programmers. It was the technology that changed.
 - mjr00 49 days ago
 "Skeptics" is also a loaded term; what does it actually mean? I find LLMs incredibly useful for various programming tasks (generating code, searching documentation, and yes, with enough setup agents can accomplish some tasks), but I also don't believe they have actual intelligence, nor do I think they will eviscerate programming jobs, the same way that Python and JavaScript didn't eviscerate programming jobs despite lowering the barrier to entry compared to Java or C. Does that make me a skeptic? It's easy to declare "victory" when you're only talking about the maximalist position on one side ("LLMs are totally useless!") vs the minimalist position on the other side ("LLMs can generate useful code"). The AI maximalist position of "AI is going to become superintelligent and make all human work and intelligence obsolete" has certainly not been proven.
 - Aurornis 49 days ago
 No, that doesn’t make you a skeptic in this context. The LLM skeptics claim LLM usefulness is an illusion: that LLMs are a fad and produce more problems than they solve. They cite cherry-picked announcements showing that LLM usage makes development slower or worse. They opened ChatGPT a couple of times a few months ago, asked some questions, and then went “Aha! I knew it was bad!” when they encountered their first bad output, instead of trying to work with the LLM to iterate like everyone who gets value out of them. The skeptics are the people in every AI thread claiming LLMs are a fad that will go away when the VC money runs out, that the only reason anyone uses LLMs is that their boss forces them to, or who blame every bug or security announcement on vibecoding.
 - candiddevmike 49 days ago
 Skeptic here: I do think LLMs are a fad for software development. They're an interesting phenomenon that people have convinced themselves MUST BE USEFUL in the context of software development, either through ignorance or a sense of desperation. I do not believe LLMs will be used long term for any kind of serious software development use case, as the maintenance cost of the code they produce will run development teams into bankruptcy. I also believe the current generations of LLMs (transformers) are technical dead ends on the path to real AGI, and the more time we spend hyping them, the less research/money gets spent on discovering new/better paths beyond transformers. I wish we could go back to complaining about Kubernetes, focusing on scaling distributed systems, and solving more interesting problems than comparing winnings on a stochastic slot machine. I wish our industry were held to higher standards than jockeying bug-ridden MVP code as quickly as possible.
 - jodrellblank 49 days ago
 Here[1] is a recent submission from Simon Willison using GPT-5.2 to port a Python HTML-parsing library to JavaScript in 4.5 hours. The code passes the 9,200 test cases of html5lib-tests used by web browsers. That's a workable, usable, standards-compliant (as much as the test cases are) HTML parser in <5 hours. For <$30. While he went shopping and watched TV. The Python library it was porting from was also mostly vibe-coded[2] against the same test cases, with the LLM referencing a Rust parser. Almost no human could port 3000 lines of Python to JavaScript and test it in their spare time while watching TV and decorating a Christmas tree. Almost no human you can employ would do a good job of it for $6/hour and have it done in 5 hours. How is that "ignorance or a sense of desperation" and "not actually useful"?
 [1] https://simonwillison.net/2025/Dec/15/porting-justhtml/
 [2] https://simonwillison.net/2025/Dec/14/justhtml/
 abathur 49 days ago
 I think both of those experiments do a good job of demonstrating utility on a certain kind of task.
 But this is cherry-picking.
 In the grand scheme of the work we all collectively do, very few programming projects entail something even vaguely like generating an Nth HTML parser in a language that already has several wildly popular HTML parsers--or porting that parser into another language that has several wildly popular HTML parsers.
 Even fewer tasks come with a library of 9k+ tests to sharpen our solutions against. (Which itself wouldn't exist without experts treading this ground thoroughly enough to accrue them.)
 The experiments are incredibly interesting and illuminating, but I feel like it's verging on gaslighting to frame them as proof of how useful the technology is when it's hard to imagine a more favorable situation.
 jodrellblank 49 days ago
 > "it's hard to imagine a more favorable situation"
 Granted, but this reads a bit like a headline from The Onion: "'Hard to imagine a more favourable situation than pressing nails into wood' said local man unimpressed with neighbour's new hammer".
 I think it's a strong enough example to disprove "they're an interesting phenomenon that people have convinced themselves MUST BE USEFUL ... either through ignorance or a sense of desperation". Not enough to claim they are always useful in all situations or to all people, but I wasn't trying for that. You (or the person I was replying to) basically have to make the case that Simon Willison is ignorant about LLMs and programming, is desperate about something, or is deluding himself that the port worked when it actually didn't, to keep the original claim. And I don't think you can. He isn't hyping an AI startup, he has no profit motive to delude him. He isn't a non-technical business leader who can't code being baffled by buzzwords. He isn't new to LLMs and wowed by the first thing. He gave a conference talk showing that LLMs cannot draw pelicans on bicycles so he is able to admit their flaws and limitations.
 > "But this is cherry-picking."
 Is it? I can't use an example where they weren't useful or failed. It makes no sense to try and argue how many successes vs. failures, even if I had any way to know that; any number of people failing at plumbing a bathroom sink don't prove that plumbing is impossible or not useful. One success at plumbing a bathroom sink is enough to demonstrate that it is possible and useful - it doesn't need dozens of examples - even if the task is narrowly scoped and well-trodden. If a Tesla humanoid robot could plumb in a bathroom sink, it might not be good value for money, but it would be a useful task. If it could do it for $30 it might be good value for money as well even if it couldn't do any other tasks at all, right?
 abathur 49 days ago
 > Granted, but this reads a bit like a headline from The Onion: "'Hard to imagine a more favourable situation than pressing nails into wood' said local man unimpressed with neighbour's new hammer".
 Chuffed you picked this example to ~sneer about.
 There's a near-infinite list of problems one can solve with a hammer, but there are vanishingly few things one can build with just a hammer.
 > You (or the person I was replying to) basically have to make the case that Simon Willison is ignorant about LLMs and programming, is desperate about something, or is deluding himself that the port worked when it actually didn't, to keep the original claim.
 I don't have to do any such thing.
 I said the experiments were both interesting and illuminating and I meant it. But that doesn't mean they will generalize to less-favorable problems. (Simon's doing great work to help stake out what does and doesn't work for him. I have seen every single one of the posts you're alluding to as they were posted, and I hesitated to reply here because I was leery someone would try to frame it as an attack on him or his work.)
 > Is it? I can't use an example where they weren't useful or failed.
     https://en.wiktionary.org/wiki/cherry-pick
     (idiomatic) To pick out the best or most desirable items from a list or group, especially to obtain some advantage or to present something in the best possible light.
     (rhetoric, logic, by extension) To select only evidence which supports an argument, and reject or ignore contradictory evidence.
 > any number of people failing at plumbing a bathroom sink don't prove that plumbing is impossible or not useful. One success at plumbing a bathroom sink is enough to demonstrate that it is possible and useful - it doesn't need dozens of examples - even if the task is narrowly scoped and well-trodden.
 This smells like sleight of hand.
 I'm happy to grant this (with a caveat^) if your point is that this success proves LLMs can build an HTML parser in a language with several popular source-available examples and thousands of tests (and probably many near-identical copies of the underlying HTML specs as they evolve) with months of human guidance^ and (with much less guidance) rapidly translate that parser into another language with many popular source-available answers and the same test suite. Yes--sure--one example of each is proof they can do both tasks.
 But I take your GP to be suggesting something more like: this success at plumbing a sink inside the framework an existing house with plumbing provides is proof that these things can (or will) build average fully-plumbed houses.
 ^Simon, who you noted is not ignorant about LLMs and programming, was clear that the initial task of getting an LLM to write the first codebase that passed this test suite took Emil months of work.
 > If a Tesla humanoid robot could plumb in a bathroom sink, it might not be good value for money, but it would be a useful task. If it could do it for $30 it might be good value for money as well even if it couldn't do any other tasks at all, right?
 The only part of this that appears to have been done for about $30 was the translation of the existing codebase. I wouldn't argue that accomplishing this task for $30 isn't impressive.
 But, again, this smells like sleight of hand.
 We have probably plumbed billions of sinks (and hopefully have billions or even trillions more to go), so any automation that can do one for $30 has clear value.
 A world with a billion well-tested HTML parsers in need of translation is likely one kind of hell or another. Proof an LLM-based workflow can translate a well-tested HTML parser for $30 is interesting and illuminating (I'm particularly interested in whether it'll upend how hard some of us have to fight to justify the time and effort that goes into high-quality test suites), but translating them obviously isn't going to pay the bills by itself.
 (If the success doesn't generalize to less favorable situations that do pay the bills, this clearly valuable capability may be repriced to better reflect how much labor and risk it saves relative to a human rewrite.)
 jodrellblank 48 days ago
 > "Yes--sure--one example of each is proof they can do both tasks."
 Therefore LLMs are useful. Q.E.D. The claim "people who say LLMs are useful are deluded" is refuted. Readers can stop here, there is no disagreement to argue about.
 > "But I take your GP to be suggesting something more like: this success at plumbing a sink inside the framework an existing house with plumbing provides is proof that these things can (or will) build average fully-plumbed houses."
 Not exactly; it's common to see people dismiss internet claims of LLMs being useful. Here[1] is a specific dismissal that I am thinking of where various people are claiming that LLMs are useful and the HN commenter investigated and says the LLMs are useless, the people are incompetent, and others are hand-writing a lot of the code. No data is provided for the readers to make any judgement one way or the other. Emil taking months to create the Python version could be dismissed this way as well, assuming a lot of hand-writing of code in that time. Small scripts can be dismissed with "I could have written that quickly" or "it's basically regurgitating from StackOverflow".
 Simon Willison's experiment is a more concrete example. The task is clearly specified, not vague architecture design. The task has a clear success condition (the tests). It's clear how big the task is and it's not a tiny trivial toy. It's clear how long the whole project took and how long GPT ran for, there isn't a lot of human work hiding in it. It ran for multiple hours generating a non-trivial amount of work/code which is not likely to be a literal example regurgitated from its training data. The author is known (Django, Datasette) to be a competent programmer. The LLM code can be clearly separated from any human involvement.
 Where my GP was going is that the experiment is not just another vague anecdote, it's specific enough that there's no room left for dismissing it how the commenter in [1] does. It's untenable to hold the view that "LLMs are useless" in light of this example.
 > (repeat) "But I take your GP to be suggesting something more like: this success at plumbing a sink inside the framework an existing house with plumbing provides is proof that these things can (or will) build average fully-plumbed houses."
 The example is not proof that these things can do anything else, but why would you assume they can't do tasks of similar complexity? Through time we've gone from "LLMs don't exist" to "LLMs exist as novelties and toys (GPT-1 2018)" to "LLMs might be useful but might not be". If things keep progressing we will get to "LLMs are useful". I am taking the position that we are past that point, and I am arguing that position. We are definitely into the time "they are useful". Other people have believed that for a long time. Not just useful for that task, but for tasks of that kind of complexity.
 Sometime between GPT-1 babbling (2018) and today (Q4 2025) the GPTs and the tooling improved from not being able to do this task to being able to do it. Some refinement, some chain of thought, some enlarged context, some API features, some CLI tools.
 Since one can't argue that LLMs are useless by giving a single example of a failure, to hold the view that LLMs are useless, one would need to broadly dismiss whole classes of examples by the techniques in [1]. This specific example can't be dismissed in those ways.
 > "If the success doesn't generalize to less favorable situations that do pay the bills"
 Most bill-paying code in the world is CRUD, web front end, business logic, not intricate parsing and computer science fundamentals. I'm expecting that "AI slop" is going to be good enough for managers no matter how objectionable programmers find it. If I order something online and it arrives, I don't care if the order form was Ruby on Rails emailing someone who copied the order docs into a Google Spreadsheet using an AI generated If This Then That workflow. And as long as the error rate and credit card chargeback rate are low enough, neither will the company owners. I'm sure there are tons of examples of companies having very poor systems and still being in business, but I don't have any specific examples, so I wouldn't argue this vehemently - but the world isn't waiting for LLMs to be as 'useful' as HN commenters are waiting for, before throwing spaghetti at the wall and letting 'Darwinian Natural Selection' find the maximum level of slop the markets will tolerate.
 ----
 On that note, a pedantic bit about cherry-picking: there's a difference between cherry-picking as a thing, and cherry-picking as a logical fallacy / bad-faith argument. e.g. if someone claims "Plants are inedible" and I point to cabbage and say it proves the claim is false, you say I'm cherry-picking cabbage and ignoring poisonous foxgloves. However, foxgloves existing - and a thousand other inedible plants existing - does not make edible cabbage stop existing. Seeing the ignored examples does not change the conclusion that "plants are inedible" is false, so ignoring those things was not bad. Similarly "I asked GPT5 to port the Linux kernel to Rust and it failed" does not invalidate the html5 parser port.
 Definition 2 is bad form; e.g. saying "smoking is good for you, here is a study which proves it" is a cherry-picking fallacy because if the ignored studies were seen, they would counter the claim "smoking is good for you". Hiding them is part of the argument, deceptively.
 "LLMs are useless and only a deluded person would say otherwise" is an example of the former; it's countered by a single example of a non-deluded person showing an LLM doing something useful. It isn't a cherry-picking fallacy to pick one example because no amount of "I asked ChatGPT to port Linux to Rust and it failed" makes the HTML parser stop existing and doesn't change the conclusion.
 [1] https://news.ycombinator.com/item?id=45560885
 - skydhash 49 days ago
 Another skeptic here: I strongly believe that creating new software was always easy. The real struggle is maintaining it, especially for more than one or two years. To this day, I've not seen any arguments, or even a hint of reflection, on how we're going to maintain all this code that LLMs are going to generate.
 Even for prototyping, using wireframing software would be faster.
 jodrellblank 49 days ago
 b) why wouldn't a future-LLM be able to maintain it? (i.e. you ask it to make a change to the program's behaviour, and it does).
 a) why maintain instead of making it all disposable? This could be like a dishwasher asking who is going to wash all the mass-manufactured paper cups. Use future-LLM to write something new which does the new thing.
 anthk 48 days ago
 The author loves Tcl. For prototyping, Tcl/Tk is a godsend.
 - AYBABTME 49 days ago
 In this year of 2025, in December, I find it untenable for anyone to hold this position unless they have not yet given LLMs a good enough try. They're undeniably useful in software development, particularly on tasks that are amenable to structured software development methodologies. I've fixed countless bugs in a tiny fraction of the time, entirely accelerated by the use of LLM agents. I get the most reliable results by simply making LLMs follow the "red test, green test" approach, where the LLM first creates a reproducer from a natural language explanation of the problem, and then cooks up a fix. This works extremely well and reliably in producing high quality results.
 skydhash 49 days ago
 You're on the internet, you can make whatever claims you want. But even with no sources or experimental data, you can always add some rational logic to add weight to your claims.
 > They're undeniably useful in software development
 > I've fixed countless bugs in a tiny fraction of the time
 > I get the most reliable results
 > This works extremely well and reliably in producing high quality results.
 If there's one common thing in comments that seem to be astroturfing for LLM usage, it's that they use lots of superlative adjectives in just one paragraph.
 AYBABTME 49 days ago
 You can choose to see it as astroturfing, or see it as people actually thinking the superlatives are appropriate.
 To be honest, it makes no difference in my life whether or not you believe what I'm saying. And from my perspective, it's just a bit astounding to read people's takes that are authoritatively claiming that LLMs are not useful for software development. It's like telling me over the phone that restaurant X doesn't have a pasta dish, while I'm sitting at restaurant X eating a pasta dish. It's just weird, but I understand that maybe you haven't gone to the resto in a while, or didn't see the menu item, or maybe you just have something against this restaurant for some weird reason.
 mrwrong 48 days ago
 "X has a pasta dish" is an easily verifiable factual claim. "The pasta dish at X tastes good and is worth the money" is a subjective claim, unverifiable without agreeing on a metric for taste and taking measurements. They are two very different kinds of disagreements.
 gldrk 49 days ago
 'It's $CURRENTYEAR' is just a cheap FOMO tactic. We've been hearing these anecdotes for multiple current years now. Where is this less buggy software? Does it just happen to never reach users?
 otabdeveloper4 48 days ago
 Just two more LLM models and two more prompt optimizations.
 heliumtera 49 days ago
 "high quality results". Yeah, sure. So I wanted to check this high quality stuff for myself, and it feels way worse than the overall experience in 2020. Or even 2024.
 Go to docs, fast page load. Then blank, wait a full second, page loads again. This does not feel like high quality. You think it does because LLM go brrrrrrrr, never complains, says you're smart. The resulting product is frustrating.
 otabdeveloper4 48 days ago
 Yikes.
 - Aurornis 49 days ago
 > They're an interesting phenomenon that people have convinced themselves MUST BE USEFUL in the context of software development,
 Reading these comments during this period of history is interesting because a lot of us actually have found ways to make them useful, acknowledging that they’re not perfect.
 It’s surreal to read claims from people who insist we’re just deluding ourselves, despite seeing the results.
 Yeah they’re not perfect and they’re not AGI writing the code for us. In my opinion they’re most useful in the hands of experienced developers, not juniors or PMs vibecoding. But claiming we’re all just delusional about their utility is strange to see.
 gldrk 49 days ago
 It's absolutely possible to be mistaken about this. The placebo effect is very strong. I'm sure there are countless things in my own workflow that feel like a huge boon to me while being a wash at best in reality. The classic keyboard vs. mouse study comes to mind: https://news.ycombinator.com/item?id=2657135
 This is why it's so important to have data. So far I have not seen any evidence of a 'Cambrian explosion' or 'industrial revolution' in software.
 Aurornis 49 days ago
 > So far I have not seen any evidence of a 'Cambrian explosion' or 'industrial revolution' in software.
 The claim was that they’re useful at all, not that it’s a Cambrian explosion.
 fuzztester 48 days ago
 > This is why it's so important to have data.
 "In God we trust, all others must bring data."
 mrwrong 48 days ago
 > It’s surreal to read claims from people who insist we’re just deluding ourselves, despite seeing the results
 just imagine how the skeptics feel :p
 - Xenoamorphous 49 days ago
 > Skeptic here: I do think LLMs are a fad for software development.
 I think that’s where they’re most useful, for multiple reasons:
 - programming is very formal. Either the thing compiles, or it doesn’t. It’s straightforward to provide some “reinforcement” learning based on that.
 - there’s a shit load of readily available training data
 - there’s a big economic incentive; software developers are expensive
 - libraryofbabel 49 days ago
 Thanks for articulating this position. I disagree with it, but it is similar to the position I held in late 2024. But as antirez says in TFA, things changed in 2025, and so I changed my mind ("the facts change, I change my opinions"...). LLMs and coding agents got very good about 6 months ago, and I and a lot of other seasoned engineers I respect finally started using them seriously.
 For what it's worth:
 * I agree with you that LLMs probably aren't a path to AGI.
 * I would add that I think we're in a big investment bubble that is going to pop, which will create a huge mess and perhaps a recession.
 * I am very concerned about the effects of LLMs in wider society.
 * I'm sad about the reduced prospects for talented new CS grads and other entry-level engineers in this world, although sometimes AI is just used as an excuse to paper over macroeconomic reasons for not hiring, like the end of ZIRP.
 * I even agree with you that LLMs will lead to some maintenance nightmares in the industry. They amplify engineers' ability to produce code, and there are a lot of bad engineers out there, as we all know: plenty of cowboys/cowgirls who will ship as much slop as they can get away with. They shipped unmaintainable mess before, they will ship three times as much now. I think we need to be very careful.
 But, if you are an experienced engineer who is willing to be disciplined and careful with your AI tools, they can absolutely be a benefit to your workflow. It's not easy: you have to move up and down a ladder of how much you rely on the tool, from true vibe coding for throwaway use-once helper scripts for some dev or admin task with a verifiable answer, all the way up to hand-crafting critical business logic and only using the agent to review it and to try and break your implementation.
 You may still be right that they will create a lot of problems for the industry. I think the ideal situation for using AI coding agents is at a small startup where all the devs are top-notch, have many years of experience, care about their craft, and hold each other to a high standard. Very very few workplaces are that. But some are, and they will reap big benefits. Other places may indeed drown in slop, if they have a critical mass of bad engineers hammering on the AI button and no guard-rails to stop them.
 This topic arouses strong reactions: in another thread, someone accused me of "magical thinking" and "AI-induced psychosis" for claiming precisely what TFA says in the first paragraph: that LLMs in 2025 aren't the stochastic parrots of 2023. And I thought I held a pretty middle-of-the-road position on all this: I detest AI hype and I try to acknowledge the downsides as well as the benefits. I think we all need to move past the hype and the dug-in AI hate and take these tools seriously, so we can identify the serious questions amidst the noise.
 - somewhereoutth 49 days ago
 Not just their usefulness, but LLMs themselves are worse than an illusion: they are illusions that people often believe in unquestioningly - perhaps are being forced to believe in unquestioningly (because of mandates, or short-term time pressures as a kind of race to the bottom).
 When the ROI in training the next model is realised to be zero or even negative, then yes the money will run out. Existing models will soldier on for a while as (bankrupt) operators attempt to squeeze out the last few cents/pennies, but they will become more and more out of date, and so the 'age of LLMs' will draw to a close.
 I confess my skeptic-addled brain initially (in hope?) misread the title of the post as 'Reflections on the end of LLMs in 2025'. Maybe we'll get that for 2026!
 - lowsong 49 days ago
 > They cite cherry picked announcements showing that LLM usage makes development slower or worse. They opened ChatGPT a couple times a few months ago, asked some questions, and then went “Aha! I knew it was bad!” when they encountered their first bad output instead of trying to work with the LLM to iterate like everyone who gets value out of them.
 "Ah-hah you stopped when this tool blew your whole leg off. If you'd stuck with it like the rest of us you could learn to only take off a few toes every now and again, but I'm confident that in time it will hardly ever do that."
 - Aurornis 49 days ago
 > "Ah-hah you stopped when this tool blew your whole leg off.
 Yes, because everyone who uses LLMs simply writes a prompt and then lets it write all of the code for them without thinking! Vibecoding amirite!?
 ThrowawayR2 49 days ago
 To be fair, that does seem to be a very common usage pattern for them, to the point where they're even becoming a nuisance to open source projects; e.g. https://news.ycombinator.com/item?id=45330378
 - mjr00 49 days ago
 > No, that doesn’t make you a skeptic in this context.
 That's good to hear, but I have been called an AI skeptic a lot on hn, so not everyone agrees with you!
 I agree though, there's a certain class of "AI denialism" which pretends that LLMs don't do anything useful, which in almost-2026 is pretty hard to argue.
 - emp17344 49 days ago
 On the other hand, ever since LLMs came on the scene, there’s been a vocal group claiming that AI will become intelligent and rapidly bring about human extinction - think the r/singularity crowd. This seems just as untenable a position to hold at this point. It’s becoming clear that these things are simply tools. Useful in many cases, but that’s it.
 Aurornis 49 days ago
 The AI doomers have actually been around long before LLMs. Discussion about AI doom has been popular in the rationalist communities for a very long time. Look up “Roko’s Basilisk” for a history of one of these concepts from 15 years ago that has been pervasive since then.
 It has been entertaining to see how Yudkowsky and the rationalist community spent over a decade building around these AI doom arguments, then they squandered their moment in the spotlight by making crazy demands about halting all AI development and bombing data centers.
 heliumtera 49 days ago
 Lots of money to be made and power to be grabbed on this safety and alignment moat.
 squidbeak 49 days ago
 > This seems just as untenable a position to hold at this point
 To say that any prediction about the future shape of a technology is 'untenable' is pretty silly. Unless you've popped back in a time machine to post this.
 - Aurornis 49 days ago
 > That's good to hear, but I have been called an AI skeptic a lot on hn, so not everyone agrees with you!
 The context was the article quoted, not HN comments.
 I’ve been called all sorts of things on HN and been accused of everything from being a bot to a corporate shill here. You can find people applying labels and throwing around accusations in every thread here. It doesn’t mean much after a while.
 - heavyset_go 49 days ago
 You can acknowledge both the fad phenomenon and the usefulness of LLMs at the same time, because both are true.
 There's value there, but there's also a lot of hype that will pass, just like the AGI nonsense that companies were promising their current/next model will reach.
 - SecretDreams 49 days ago
 I think LLM skeptics come in a variety of styles. I am skeptical of the current amount of capital flowing into AI vs expected returns on that capital. I use AI regularly. Often for free. I'm not clear what the path to profitability and justification of valuations looks like. That's why it has dotcom vibes, not because I don't believe in the technology. I don't believe in the snakes peddling the valuations.
 - threethirtytwo 49 days ago
 You're not a skeptic but you're not fully a supporter either. You live in this grey zone of contradictions.
 First, you find them useful but not intelligent. That is a bit of a contradiction. Basically anyone who has seriously used AI knows that while it can be used to regurgitate generic filler and bootstrap code, it can also be used to solve complex domain-specific problems that are not at all part of its training data. This by definition makes it intelligent, and it makes it so we know the LLM understands the problem it was given. It would be disingenuous for me not to mention how often an LLM is wrong and how much it hallucinates, so obviously the thing has flaws and is not superintelligence. But you have to judge the entire spectrum of what it does. It gets things right and it gets things wrong, and getting something complex right makes it intelligent while getting something wrong does not preclude it from intelligence.
 Second, most non-skeptics aren't saying all human work is going to be obsolete. No one can predict the future. But you've got to be blind if you don't see the trendline of progress. Literally look at the progress of AI for the past 15 years. You have to be next-level delusional if you can't project another 15 years and see that a superintelligence, or at least an intelligence comparable to humans, is obviously a reasonable prediction. Most skeptics like you ignore the trendline and cling to what Yann LeCun said about LLMs being stochastic parrots. It is very likely something with human intelligence exists in the future and in our lifetimes; whether or not it's an LLM remains to be seen, but we can't ignore where the trendlines are pointing.
- 20k 49 days ago
 It's also significantly lowered because management is forcing AI on everyone at gunpoint, and saying that you'll lose your job if you don't love AI.
 That's a very easy way to get everyone to pinky promise that they absolutely love AI to the ends of the earth.
- Aurornis 49 days ago
 > The change occurred because LLMs are useful for programming in 2025
 But the skeptics and anti-AI commenters are almost as active as ever, even as we enter 2026.
 The debate about the usefulness of LLMs has grown into almost another culture war topic. I still see a constant stream of anti-AI comments on HN and every other social platform from people who believe the tools are useless, the output is always unusable, people who mock any idea that operator skill has an impact on LLM output, or even claims that LLMs are a fad that will go away.
 I’m a light LLM user ($20/month plan type of usage) but even when I try to share comments about how I use LLMs or tips I’ve discovered, I get responses full of vitriol and accusations of being a shill.
 - zahlman 49 days ago
 It absolutely is culture war. I can easily imagine a less critical version of myself having ended up in that camp. It comes across to me that the perspective is informed by core values and principles surrounding what "intelligence" is.
 I butted heads with many earlier on, and they did nothing to challenge that frame meaningfully. What did change is my perception of the set of tasks that don't require "intelligence". And the intuition pump for that is pretty easy to start — I didn't suppose that Deep Blue heralded a dawn of true "AI", either, but chess (and now Go) programs have only gotten even more embarrassingly stronger. Even if researchers and puzzle enthusiasts might still find positions that are easier for a human to grok than a computer.
 - Hendrikto 49 days ago
 > from people who believe the tools are useless, the output is always unusable, people who mock any idea that operator skill has an impact on LLM output
 You are attacking a strawman. Almost nobody claims that LLMs are useless or you can never use their output.
 - Aurornis 49 days ago
 Those claims are all throughout this thread and in replies to my comments.
 It’s not a strawman. It’s everywhere on HN.
 - Hendrikto 49 days ago
 Such as? Currently, the top comments are
 > LLMs have certainly become extremely useful for Software Engineers
 > LLMs are useful for programming in 2025
 > Do LLMs make bad code: yes all the time (at the moment zero clue about good architecture). Are they still useful: yes, extremely so.
 If your comment is not a strawman, show me where people actually claim what you say they do.
 - otabdeveloper4 48 days ago
 "Useful for programming" is a massive and dishonest bait and switch.
 Lots of things are "useful for programming". Switching to a comfier chair is more useful for programming than any LLM.
 We were sold vibe coding, and that's what managers want.
 - throw1235435 48 days ago
 It's simple. Given the trajectory of these things, people feel under threat and defend themselves accordingly. They say what they hope for given a number of factors (bad workplaces generating slop they have to deal with, job losses, identity redefinition, etc.). You know, the things that happen when a profession is disrupted in a capitalist system where 'what you do' is often tied up with identity, status, and livelihood.
 People will go from skeptic to dread/anxiety, to either acceptance or despair. We are witnessing the disruption of a profession in real time and it will create a number of negative effects.
- nl 49 days ago
 There is some limited truth in this but we still see claims that LLMs are "just next token predictors" and "just regurgitate code they read online". These are just uninformed and wrong views. It's fair to say that these people were (are!) wrong.
 - mjr0049 days ago
 > we still see claims that LLMs are "just next token predictors" and "just regurgitate code they read online". These are just uninformed and wrong views. It's fair to say that these people were (are!) wrong.
 I don't think it's fair to say that at all. How are LLMs not statistical models that predict tokens? It's a big oversimplification but it doesn't seem wrong, the same way that "computers are electricity running through circuits" isn't a wrong statement. And in both cases, those statements are orthogonal to how useful they are.
 - Libidinalecon48 days ago
 It is just a tell that the person believes LLMs are more than what they are ontologically.
 No one says "computers are JUST electricity running through circuits" because no one tries to argue the computer itself is "thinking" or has some kind of being. No one tries to argue that when you put the computer to sleep it is actually doing a form of "sleeping".
 The mighty token though produces all kinds of confused nonsense.
 - jcelerier49 days ago
 > How are LLMs not statistical models that predict tokens?
 There are LLMs as in "the blob of coefficients and graph operations that runs on a GPU whenever there's an inference", which is absolutely "a statistical model that predicts tokens", and LLMs as in "the online apps that iterate and have access to an entire automated Linux environment that can run $LANGUAGE scripts and do web queries when an intermediary statistical output contains too many maybes, and use the result to drive further inference".
 - nl49 days ago
 > I don't think it's fair to say that at all. How are LLMs not statistical models that predict tokens? It's a big oversimplification but it doesn't seem wrong
 Modern LLMs are trained via reinforcement learning where the training objective is no longer maximum next token probability.
 They still produce tokens sequentially (ignoring diffusion models for now), but since the objective is so different, thinking of them as next token predictors is more wrong than right.
 Instead one has to think of them as trying to fit their entire output to the model learnt in the reinforcement phase. That's why reasoning in LLMs works so well.
 - threethirtytwo49 days ago
 It's wrong because it’s deliberately used to mischaracterize the current abilities of AI. Technically it's not wrong but the context of usage in basically every case is that the person saying it is deliberately trying to use the concept to downplay AI as just a pattern matching machine.
 - yladiz49 days ago
 I'm a bit confused. You say it's wrong, but then later say it's not wrong, and just because it can be used to downplay advancements in AI doesn't mean that it's wrong and saying it's wrong because it can be used that way is a bit disingenuous.
 threethirtytwo49 days ago
 [flagged]
 jibal49 days ago
 You don't understand the meaning of "technically". Also, don't use inflammatory language.
 threethirtytwo49 days ago
 I am not using inflammatory language to hurt anyone. I am illustrating a point about the contrast between technical and non-technical meanings. One meaning is offensive; the other is technically correct. Don't start a witch hunt by deliberately misinterpreting what I'm saying.
 So "technical" means something like this: in a technical sense you are a stochastic parrot. You are also technically an object. But in everyday language we don't call people stochastic parrots or objects, because language is nuanced: the technical meaning is rarely used at face value, and other meanings are used in its place.
 So when people use a term in conversation and go by the technical meaning, it's usually either very strange or done deliberately to deceive. Sort of like how you claim I don't know what "technically" means, and how you deliberately misinterpreted my words as "inflammatory" when I did nothing of the sort.
 I hope you learned something basic about English today! Good day to you sir!
 mrwrong48 days ago
 > a technical sense you are a stochastic parrot.
 I am not. I'm sorry you feel this way about yourself. You are more than a next token predictor.
 threethirtytwo48 days ago
 If I am more than a next token predictor… doesn’t that mean I’m a next token predictor + more? Do you not predict the next word you’re going to say? Of course you do; you do that and more.
 Humans ARE technically next token predictors, and we are also more than that. That is why calling someone a next token predictor is a mischaracterization. I think we are in agreement; you just didn’t fully understand my point.
 But the claim that LLMs are next token predictors is the SAME mischaracterization. LLMs are clearly more than next token predictors. Don’t get me wrong, LLMs aren’t human… but they are clearly more than just a next token predictor.
 The whole point of my post is to point out how the term stochastic parrot is weaponized to dismiss LLMs and to mischaracterize and hide the current abilities of AI. The parent OP was using the technical definition as an excuse to use the word as a means to his own ends, namely being “against” AI. It’s a pathetic excuse. I think it’s clear the LLM has moved beyond a stochastic parrot, and there are just a few stragglers left who can’t see that AI is more than that.
 You can be “against” AI, that’s fine, but don’t mischaracterize it… argue and make your points honestly and in good faith. Using the term stochastic parrot, and even what the other poster did in an attempt to accuse me of inflammatory behavior, is just tactics and manipulation.
 mrwrong44 days ago
 > But the claim for LLMs are next token predictors is the SAME mischaracterization. LLMs are clearly more than next token predictors. Don’t get me wrong LLMs aren’t human… but they are clearly more than just a next token predictor.
 It's simply not. I find this argument by analogy very lazy. You need to do the work to show what that "and more" is and how it's the same for humans and LLMs. You can't just hand-wave that it feels the same and leave it at that.
 jibal48 days ago
 P.S. The response is filled with bad faith accusations.
 threethirtytwo48 days ago
 Look at your response. You first dismissed me completely by saying I don’t know what technically means. Then you mischaracterized my statement as an intent to inflame. These are highly insulting and dismissive statements.
 You’re not willing to have a good-faith discussion. You took the worst possible interpretation of my statement and crafted a terse response to shut me down. I only did two things. First I explained myself… then I called you out for what you did while remaining civil. I don’t skirt around HN rules as a means to an end, which is what I believe you’re doing. I’m OK with what you’re doing… but I will call it out.
 jibal48 days ago
 No surprise that the dishonesty and playing the victim is persistent. It's a fact that this person misuses the term "technically", and that they used inflammatory language. Saying so does not dismiss them completely ... but even if it did, so what? Doing so is not bad faith. No one has any obligation to engage with someone. I won't comment further.
 threethirtytwo47 days ago
 Indeed, don’t comment further: you didn’t even have the respect to respond to me directly. That is categorically deliberately inflammatory. Just respond to the guy you’re talking to like 99% of HN. Why avoid it? It’s a tactic, that’s why, and also pointless.
 I’m not a victim of anything. But you are definitely a perpetrator and instigator.
 - zahlman49 days ago
 Objecting to these claims is missing their point. Saying these things is really about denying that the LLMs "think" in any meaningful sense. (And the retorts I've seen in those discussions often imply very depressing and self-deprecating views of what it actually means to be human.)
 - emp1734449 days ago
 Leave it to HN to be militantly misanthropic to sell chatbots.
- mvkel49 days ago
 One only has to go read the original vibe coding thread[0] from ...ten months ago(!) to see the resistance and skepticism loud and clear. The very first comment couldn't be louder about it.
 It was possible to create things in gpt-3.5. The difference now is it aligns with the -taste- of discerning programmers, which has a little, but not everything, to do with technological capability.
 [0] https://news.ycombinator.com/item?id=42913909
 - HarHarVeryFunny49 days ago
 "Look Ma, no hands!" vibe coding, as described by Karpathy, where you never look at the code being generated, was never a good idea, and still isn't. Some people are now misusing "vibe coding" to describe any use of LLMs for coding, but there is a world of difference between using LLMs in an intelligent considered way as part of the software development process, and taking a hit on the bong and "vibe coding" another "how many calories in this plate of food" app.
 - mvkel49 days ago
 Karpathy himself has used "vibe coding" to describe "usage of LLMs for coding," so it's fair to say the definition has expanded.
 https://karpathy.bearblog.dev/year-in-review-2025/
 - girvo49 days ago
 Which frankly makes it pretty useless. Describing how I use them at work as "vibe coding" in the same vein as a random redditor generating whatever on Replit is useless. It's a definition so wide as to have no explanatory power.
 - zahlman49 days ago
 > The difference now is it aligns with the -taste- of discerning programmers
 This... doesn't match the field reports I've seen here, nor what I've seen from poking around the repos for AI-powered Show HN submissions.
 - mvkel49 days ago
 On the tabs vs spaces battleground there are no winners; we just need to lower our expectations :)
- ookblah49 days ago
 You just need to hop into any AI-related thread (even this one) and it's pretty clear no one is revising anything; the skepticism is there lol.
- HarHarVeryFunny49 days ago
 Yes, it's a strange take. It's not that programmers have changed their mind about unchanging LLMs, but rather that LLMs have changed and are now useful for coding, not just CoPilot autocomplete like the early ones.
 What changed was the use of RLVR training for programming, resulting in "reasoning" models that now attempt to optimize for a long-horizon goal (i.e. bias generation towards "reasoning steps" that during training led to a verified reward), as opposed to earlier LLMs where RL was limited to RLHF.
 So, yeah, the programmers who characterized early pre-RLVR coding models as of limited use were correct. Now the models are trained differently and developers find them much more useful.
 - zahlman49 days ago
 I thought I'd read a lot of these threads this year, and also discussed off-site the use of coding agents and the technology behind them; but this is genuinely the first time I've seen the term "RLVR".
 - HarHarVeryFunny49 days ago
 RLVR ("reinforcement learning with verifiable rewards") refers to RL used to encourage reasoning towards long-horizon goals in areas such as math and programming, where the correctness or desirability of a generated response (or perhaps an individual reasoning step) can be verified in some way. For example, generated code can be verified by compiling and running it, and math results can be verified by comparing them to known correct results.
 The difficulty of using RL more generally to promote reasoning is that in the general case it's hard to define correctness and therefore to quantify a reward for the RL training to use.
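 To make "verified by compiling and running it" concrete, here is a minimal sketch of a verifiable reward for generated code. It assumes the task is to implement a single function that is checked against a list of (arguments, expected result) test cases; `verifiable_reward`, `add`, and the test data are hypothetical names for illustration, not any lab's actual training code. It also calls `exec()` in-process for brevity, whereas a real pipeline would run candidates in an isolated sandbox.

```python
from typing import Callable, Sequence, Tuple


def verifiable_reward(candidate_src: str, func_name: str,
                      tests: Sequence[Tuple[tuple, object]]) -> float:
    """Return the fraction of (args, expected) test cases the generated code passes.

    Toy illustration only: exec() runs untrusted code in-process, so a real
    training setup would execute candidates in an isolated sandbox.
    """
    if not tests:
        return 0.0
    namespace: dict = {}
    try:
        # "Compile and run" the model's output, then look up the function it should define.
        exec(candidate_src, namespace)
        fn: Callable = namespace[func_name]
    except Exception:
        return 0.0  # code that fails to compile or define the function earns nothing
    passed = 0
    for args, expected in tests:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on a test case counts as a failure
    return passed / len(tests)


if __name__ == "__main__":
    tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
    good = "def add(a, b):\n    return a + b\n"
    bad = "def add(a, b):\n    return a - b\n"
    print(verifiable_reward(good, "add", tests))  # 1.0
    print(verifiable_reward(bad, "add", tests))   # 0.333... (only the (0, 0) case passes)
```

 A fractional score like this is one possible design; a binary pass/fail reward is another. Either way, as the reply below points out, passing a finite test list is a much weaker guarantee than the code actually being correct.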
 - somewhereoutth49 days ago
 > generated code can be verified by compiling and running it
 I think this gets to the crux of the issue with LLMs for coding (and indeed 'test orientated development'). For anything beyond the most basic level of complexity (i.e. anything actually useful), code cannot be verified by compiling and running it. It can only be verified, to a point, by skilled human inspection and comprehension. That is the essence of code really: a definition of action, given by humans to a machine, for running with a priori unenumerated inputs. Otherwise it is just a fancy lookup table. By definition, then, not all inputs and expected outputs can be tabulated, tested for, or rewarded for.
 HarHarVeryFunny49 days ago
 I was talking about the RL training process for giving these models coding ability in the first place.
 As far as using the trained model to generate code goes, it's of course up to the developer to do code reviews, testing, etc., as normal, although an LLM can also be used to assist with writing test cases and so on.
 - zahlman49 days ago
 > The difficulty of using RL more generally to promote reasoning is that in the general case it's hard to define correctness and therefore quantify a reward for the RL training to use.
 Ah, hence the "HF" angle.
 HarHarVeryFunny49 days ago
 RLHF really has a different goal - it's not about rewarding/encouraging reasoning, but rather rewarding outputs that match human preferences for whatever reason (responses that are more on-point, or politer, or longer form, etc.).
 The way RLHF works is that a smallish amount of A/B preference feedback from actual humans is used to train a preference model, and this preference model is then used to generate RL rewards for the actual RLHF training.
 RLHF has been around for a while and is what tamed base models like GPT-3 into the GPT-3.5 that was used for the initial ChatGPT, making it behave in a more acceptable way!
 RLVR is much more recent, and is the basis of the models that do great at math and programming. If you talk about reasoning models being RL trained, then it's normally going to imply RLVR, but there seems to be a recent trend of people calling it RLVR to be more explicit.
 - throw123543548 days ago
 Agree with this. The RLVR changes (starting with o1, I think) were what changed/disrupted the industry. Before that I thought these things were just better autocomplete.
dhpe49 days ago
I have programmed 30K+ hours. Do LLMs make bad code: yes all the time (at the moment zero clue about good architecture). Are they still useful: yes, extremely so. The secret sauce is that you'd know exactly what to do without them.
- dejv49 days ago
 > Do LLMs make bad code: yes all the time (at the moment zero clue about good architecture). Are they still useful: yes, extremely so.
 Well, let's see how the economics will play out. LLMs might be really useful, but as far as I can see all the AI companies are not making money on inference alone. We might be hitting a plateau in capabilities, with money being raised on the vision of this being a godlike tech that will change the world completely. Sooner or later the costs will have to meet reality.
 - Aurornis49 days ago
 > but as far as I can see all the AI companies are not making money on inference alone
 The numbers aren’t public, but from what companies have indicated it seems inference itself would be profitable if you could exclude all of the R&D and training costs.
 But this debate about startups losing money happens endlessly with every new startup cycle. Everyone forgets that losing money is an expected operating mode for a high growth startup. The models and hardware continue to improve. There is so much investment money accelerating this process that we have plenty of runway to continue improving before companies have to switch to full profit focus mode.
 But even if we ignore that fact and assume they had to switch to profit mode tomorrow, LLM plans are currently so cheap that even a doubling or tripling isn’t going to be a problem. So what if the monthly plans start at $40 instead of $20 and the high usage plans go from $200 to $400 or even $600? The people using these for their jobs paying $10K or more per month can absorb that.
 That’s not going to happen, though. If all model progress stopped right now the companies would still be capturing cheaper compute as data center buildouts were completed and next generation compute hardware was released.
 I see these predictions as the current equivalent of all of the predictions that Uber was going to collapse when the VC money ran out. Instead, Uber quietly settled into steady operation, prices went up a little bit, and people still use Uber a lot. Uber did this without the constant hardware and model improvements that LLM companies benefit from.
 - mtone49 days ago
 > if you could exclude all of the R&D and training costs
 LLMs have a short shelf life. They don't know anything past the day they're trained. It's possible to feed them or fine-tune them on a bit of updated data, but their world knowledge and views are firmly stuck in the past. It's not just news; they'll also trip up on new syntax introduced in the latest version of a programming language.
 They could save on R&D, but I expect training costs will be recurring regardless of advancements in capability.
 - Workaccount249 days ago
 If the tech plateaus today, LLM plans will go to $60-80/mo, Chinese-hosted Chinese models will be banned (national security will be the given reason), and the AI companies will be making ungodly money.
 I'm not gonna dig out the math again, but if AI usage follows the popularity path of cell phone usage (which seems to be the case), then the trillions invested have an ROI of 5-7 years. Not bad at all.
 - blks49 days ago
 Developers will pay; other people who use it for emails or bun baking recipes won't.
 - iLoveOncall49 days ago
 OpenAI would still lose money if the basic subscriptions cost $500 and they had the same number of subscribers as right now. There's not a single model shop that has ever made any money, let alone ungodly amounts.
 - Workaccount249 days ago
 These costs you are referencing are training/R&D costs. Take those largely away, and you are left with inference costs, which are dirt cheap.
 Now you have a world of people who have become accustomed to using AI for tons of different things, and the enshittification starts ramping up, and you find out how much people are willing to pay for their ChatGPT therapist.
 - Der_Einzige49 days ago
 This is literally lies and total bullshit. They’d be making insane profits at those prices.
 They don’t have to spend all their cash at once on the 30GW of data center commitments.
 Why go on the internet and tell stupid lies?
 - ImprobableTruth49 days ago
 They're not making money on inference alone because they blow ungodly amounts on R&D. Otherwise it'd be a very profitable business.
 - daveguy49 days ago
 Private equity will swoop in, bankrupt the company to shirk the debt from training and R&D, and hold on to the models in a restructuring. Add enshittification to squeeze out maximum profit. This is why they're referred to as vulture capitalists.
 - mNovak49 days ago
 Doesn't OpenRouter prove that inference is profitable? Why would random third parties subsidize the service for other random people online? Unless you're saying that only large frontier models are unprofitable, which I still don't think is the case but is harder to prove.
 - 20k49 days ago
 This is one of the reasons why I'm surprised to see so many people jump on board. We're clearly in the "release product for free/cheap to gain customers" portion of the enshittification plan, before the company starts making it complete garbage to extract as much money as possible from the userbase.
 Having good-quality dev tools is non-negotiable, and I have a feeling that a lot of people are going to find out the hard way that reliability, and not being owned by a profit-seeking company, are the #1 things you want in your environment.
 - NitpickLawyer49 days ago
 > but as far as I can see all the AI companies are not making money on inference alone.
 This was the missed point on why GPT5 was such an important launch (quality of models and vibes aside). It brought model sizes (and hence inference cost) to more sustainable numbers. Compared to the previous SotA (GPT4 at launch, or the o1/o3 series), GPT5 is 8x-12x cheaper! I feel that a lot of people never recalibrated their views on inference.
 And there's also another place where you can verify your take on inference: the third-party providers that offer "open" models. They have zero incentive to subsidise prices, because people who use them often don't even know who serves them, so there's zero brand recognition (say, when using models via OpenRouter).
 These third-party providers have all converged towards a price point per billion parameters. You can check those prices and get an idea of what would be profitable and at what sizes. Models like dsv3.2 are really, really cheap to serve for what they provide (at least gpt5-mini equivalent, I'd say).
 So yes, labs could totally become profitable with inference alone. But they don't want that, because there's an argument to be made that the best will "keep it all". I hope, for our sake as consumers, that it isn't the case. And so far this year it seems that it's not. We've had all 4 big labs one-up each other several times, and they're keeping each other honest. And that's good for us. We get frontier-level offerings at $10-25/MTok (Opus, gpt5.2, gemini3pro, grok4), and we get highly capable yet extremely cheap models at $1.5-3/MTok (gemini3-flash, gpt-minis, grok-fast, etc.).
 - nl49 days ago
 Anthropic - for one - is making lots of money on inference.
- ManuelKiessling49 days ago
 If I ask a SOTA model to just implement some functionality, it doesn’t necessarily do so using a great architectural approach.
 Whenever I ask a SOTA model about architecture recommendations, and frame the problem correctly, I get top-notch answers every single time.
 LLMs are terrific software architects. And that’s not surprising; there has to be tons of great advice on how to correctly build software in the training corpus.
 They simply aren’t great software architects by default.
 - Loic49 days ago
 You know that if you ask the LLM correctly you get top-notch answers because you have the experience to judge whether the answer is top-notch or not.
 I spend a couple of hours per week teaching software architecture to a junior on my team, because he doesn't have the experience to ask correctly, let alone assess the quality of the LLM's answer.
- qsort49 days ago
 One of the mental frameworks that convinced me is how much of a "free action" it is. Have the LLM (or the agent) churn on some problem and do something else. Come back and review the result. If you had to put significant effort into each query, I agree it wouldn't be worth it, but you can just type something into the textbox and wait.
 - daveguy49 days ago
 Are you counting the time/effort to evaluate the accuracy and relevance of an LLM left to "think" for a while?
- _rpxpx49 days ago
 OK, maybe. But how many programmers will know this in 10 years' time, as use of LLMs is normalized? I'd like to hear what employers are already saying about recent graduates.
 - bartread49 days ago
 They’d have to be hiring recent graduates for you to hear that perspective.
 And, as much as what I’ve just said is hyperbolically pessimistic, there is some truth to it.
 In the UK a bunch of factors have coincided to put the brakes on hiring, especially at smaller and mid-size businesses. AI is the obvious one that gets all the press (although how much it’s really to blame is open to question in my view), but the recent rise in employer AI contribution, and now (anecdotally) the employee rights bill, have come together to make companies quite gun-shy when it comes to hiring.
 - bartread49 days ago
 *Employer NI contribution, not employer AI contribution - a pox be upon autocorrect
 - energy12349 days ago
 I'm uncertain that programming will be a major profession in 10 years.
 Programming is more like math than creative writing. It's largely verifiable, which is where RL has repeatedly been shown to eventually achieve significantly better-than-human intelligence.
 Our saving grace, for now, is that it's not entirely verifiable, because things like architectural taste are hard to put into a test. But I would not bet against it.
 - spaceman_202049 days ago
 This is nothing new - entire industries and skills died out as the apprenticeship system and guilds were replaced by automation and factories
 - nutjob249 days ago
 If they don't learn that they won't get very far.
 This is true for everything, any tool you might use. Competent users of tools understand how they work and thus their limitations and how they're best put to work.
 Incompetents just fumble around and sometimes get things working.
 - QuiDortDine49 days ago
 hahah what are you talking about, there's no such thing as long term!
- bilsbie49 days ago
 I mean if you leaned heavily on stack overflow before AI then nothing really changes.
 It’s basically the same idea but faster.
- feverzsj49 days ago
 So, it's like taking off your pants to fart.
- yeasku47 days ago
 I have programmed 10 times that.
 For me LLMs are a waste of time.
 - rrrrrrrrrrrryan46 days ago
 300k hours = 8 hrs per day, every day, for 102 years.
 - yeasku46 days ago
 Behind the computer 16 hours a day all their life?
 I know a lot of people who do it.
crystal_revenge49 days ago
I wish people would be more vocal in calling out that LLMs have unquestionably failed to deliver on the 2022-2023 promises of exponential improvement at the foundation model level. Yes, they have improved, and there is more tooling around them, but clearly the difference between LLMs in 2025 and 2023 is not as large as between 2023 and 2021. If there were truly exponential progress, there would be no possibility of debating this. Which makes comments like this:
> The fundamental challenge in AI for the next 20 years is avoiding extinction.
seem almost absurd without further, concrete justification.
LLMs are still quite useful; I'm glad they exist and honestly am still surprised more people don't use them in software. Last year I was very optimistic that LLMs would entirely change how we write software by becoming a fundamental part of our programming toolkit (in a similar way to how ML fundamentally changed the options available to programmers for solving problems). Instead we've just come up with more expensive ways to extend the chat metaphor (the current generation of "agents" is disappointingly far from the original intent of agents in AI/CS).
The thing I am increasingly confused about is why so many people continue to need LLMs to be more than they obviously are. I get why crypto boosters exist: if I have 100 BTC, I have a very clear interest in getting others to believe that they are valuable. But with "AI", I don't quite get, for the non-VC/founder, why it matters that people start foaming at the mouth over AI rather than just using it for the things it's good at.
Though I have a growing sense that this need is related to another trend I've personally started to witness: AI psychosis is very real. I personally know an increasing number of people who are spiraling into an LLM-induced hallucinated world.
The most shocking was someone talking about how losing human relationships is inevitable because most people can't keep up with those enhanced by AI acceleration. On the softer end, I know more and more people who quietly confess how much they let AI work as a perpetual therapist, guiding their every decision (which is more than most people would let a human therapist guide them).
- redlock48 days ago
 “But clearly the difference between LLMs in 2025 and 2023 is not as large as between 2023 and 2021.”
 This is a ridiculous statement. A simple example of the huge difference is context size.
 GPT-4 was, what, 8K? Now we’re in the millions with good retention. And this is just context size, let alone reasoning, multimodality, etc.
 - Anamon48 days ago
 I don't think that refutes the point. I'd readily agree with the parent that in terms of actual usefulness and efficiency gains, we're on a trajectory of diminishing returns.
 - spider-mario46 days ago
 The point made by the parent seems to be pretty much the opposite of that. They conceded more tooling but questioned the improvements “at the foundational model level”.
 - emp1734448 days ago
 Gemini’s 2M context window is kind of a gimmick and not useable in practice.
 - redlock46 days ago
 Not true anymore since Gemini 2.5 Pro.
 I have quizzed it with three books (more than 1500 pages in total) and it gave great answers.
 Initially, yes: when they released the 2 million context window with Gemini 1.5, it wasn’t effective.
 Try it with Gemini 3 Pro/Flash now.
- spopejoy47 days ago
 My conspiracy theory du jour is that AGI doomerism is a product of supremacist thinking. AGI futures are purely speculative, right? So why are they always doom and gloom?
 Why can't an AGI be inherently classless, unconcerned with profit or scarcity, and inherently "arc-ing toward justice"?
 Because that isn't good news for nerds who think they rightly sit at the top of a meritocracy. An evil AGI is one that confirms tech is the ultimate unconquerable power that only the tech elite can even hope to master.
mrdependable49 days ago
These comments are a bit scary. It feels like LLMs managed to exploit some fault in the human psyche. I think the biggest danger of this technology is that people are not mentally equipped to handle it.
- akomtu49 days ago
   The fault is well known: chatbots are bootlickers. They always praise users and never criticize them, so chatbots are quickly promoted to the personal advisor position. The AI of Sauron of the technological age.
  - moab49 days ago
    This is a very real worry for the AI rollout for the general population. But are folks here using AI to blow smoke up their asses as a sibling comment stated? I'd like to believe we're using it to ask questions, prototype, and then measure... not just blow smoke up there...
- jennyholzer249 days ago
  ChatGPT and Claude Code are industrial strength fans designed to blow smoke up your ass at rates once thought impossible
  - unbelievably49 days ago
    [dead]
danielfalbo49 days ago
> There are certain tasks, like improving a given program for speed, for instance, where in theory the model can continue to make progress with a very clear reward signal for a very long time.
This makes me think: I wonder if Goodhart's law[1] may apply here. I wonder if, for instance, optimizing for speed may produce code that is faster but harder to understand and extend. Should we care, or would it be OK for AI to produce code that passes all tests and is faster? Would the AI become good at creating explanations for humans as a side effect?
And if Goodhart's law doesn't apply, why not? Is it because we're only doing RLVR fine-tuning on the last layers of the network, so all the generality of the pre-training is not lost? And if this is the case, could this be a limitation, in not being creative enough to come up with move 37?
[1] https://wikipedia.org/wiki/Goodhart's_law
- lemming49 days ago
 > I wonder if, for instance, optimizing for speed may produce code that is faster but harder to understand and extend.
 This is generally true for code optimised by humans, at least for the sort of mechanical low level optimisations that LLMs are likely to be good at, as opposed to more conceptual optimisations like using better algorithms. So I suspect the same will be true for LLM-optimised code too.
- username22349 days ago
 > I wonder if, for instance, optimizing for speed may produce code that is faster but harder to understand and extend.
 Superoptimizers have been around since 1987: https://en.wikipedia.org/wiki/Superoptimization
 They generate fast code that is not meant to be understood or extended.
 - progval49 days ago
 But their output is (usually) executable code, and is not committed to a VCS. So the source code is still readable.
 When people use LLMs to improve their code, they commit the output to Git to be used as source code.
 - Wowfunhappy49 days ago
 ...hmm, at some point we'll need to find a new place to draw the boundaries, won't we?
 Until ~2022 there was a clear line between human-generated code and computer-generated code. The former was generally optimized for readability and the latter was optimized for speed at all costs.
 Now we have computer-generated code in the human layer, and it's not obvious what it should be optimized for.
 - erichocean49 days ago
 > it's not obvious what it should be optimized for
 It should be optimized for readability by AI. If a human wants to know what a given bit of code does, they can just ask.
- franktankbank49 days ago
 Ehh, I think if it ends up being a half-good architecture, you wind up with a difficult-to-understand kernel that never needs touching.
seu49 days ago
> And I've vibe coded entire ephemeral apps just to find a single bug because why not - code is suddenly free, ephemeral, malleable, discardable after single use. Vibe coding will terraform software and alter job descriptions.
I'm not super up-to-date on all that's happening in AI-land, but in this quote I can find something that most techno-enthusiasts seem to have decided to ignore: no, code is not free. There are immense resources (energy, water, materials) that go into these data centers in order to produce this "free" code. And the material consequences are terribly damaging to thousands of people. With the further construction of data centers to feed this free vibe coding style, we're further destroying parts of the world. Well done, AGI loverboys.
- Hendrikto49 days ago
 You know what uses roughly 80 times more water in the US alone than AI data centers use worldwide? Corn.
 - raddan49 days ago
 Assuming your fact is true, it is surprising that corn merely uses an order of magnitude or two more water than AI, given the utility of corn. It feeds the entire US (hundreds of millions of people), is used as animal feed (thus also feeding us), and is widely exported to feed other people. In the spirit of the “I think”s and “I believe”s of this blog post, I think that corn has a lot more utility than AI.
 - Hendrikto49 days ago
 > It feeds the entire US (hundreds of millions of people), is used as animal feed (thus also feeding us), and is widely exported to feed other people.
 Not really. Most corn grown in the US isn’t even fit for consumption. It is primarily used for fermenting bioethanol.
 - daveguy49 days ago
 Source?
 Hendrikto49 days ago
 https://www.ers.usda.gov/topics/crops/corn-and-other-feed-grains/feed-grains-sector-at-a-glance
 raddan47 days ago
 The report you link says that 45% is used for ethanol. A lot, but not “most.”
 Hendrikto45 days ago
 I could have worded that more clearly: The vast majority (90%+) of corn grown in the US is not for human consumption, with most of that 90% being used for bioethanol.
- dwaltrip49 days ago
 Can you provide numbers relative to things many of us already do?
 * drive to the store or to work
 * take a shower
 * eat meat
 * fly on vacation
 And so on... thanks!
 - Jaxan49 days ago
 Of those things you mention, I only take showers (but not even every day). But maybe I’m an outlier.
 - daveguy49 days ago
 > drive to the store or to work
 If you don't do that, and are a homesteader, then yes, you are a very small minority outlier. (Assuming you aren't ordering supplies delivered instead of driving to the store.)
 > Eat meat.
 Yes, not eating meat is in the minority.
 > Fly on vacation.
 So, don't vacation, walk to vacation, or drive to vacation? 1 of those 3 is also consumptive.
 It seems you are either a very significant outlier, or you're being daft. I'm curious which. Would you mind clarifying?
 - Jaxan47 days ago
 I do my commute with my bicycle. Very common in the Netherlands.
 For holidays, we did a cycling holiday with our children. They loved it!
 I don’t at all feel like an outlier; many friends do similar things.
 daveguy47 days ago
 Thank you for the clarification! I'm impressed. I've always been envious of the bicycle friendliness of roads in the Netherlands. Here in the US, stroads and poor bike infrastructure still reign. It really depends on the region.
 We have a backward orange fool running things for gems like this: https://news.ycombinator.com/item?id=46357881
 But it's just as much a local political issue as a national one around here.
 - monkaiju49 days ago
 What's the point of this question? These are things (some) people do while existing as humans; not sure how that's relevant. The AI is consuming vast resources while not existing as a human; it doesn't get some innate privilege to consume some amount of resources like we do.
 - dwaltrip45 days ago
 AI is a tool I use while existing as a human. I’d like to know the relative downsides.
- fourside49 days ago
 My guess is that “free” is meant in terms of the old definition where you’re not having to pay someone to create and maintain it. But yes, it’s important to realize there really is a cost here and one that can’t just be captured by a dollar amount.
mwkaufma49 days ago
A list of unverifiable claims, stated authoritatively. The lady doth protest too much.
- linhns49 days ago
  The post is about his opinions.
  - jennyholzer249 days ago
    reads more like propaganda.
torlok49 days ago
This is a bunch of "I believe" and "I think" with no sources by a random internet person.
- ctoth49 days ago
 Ah, I see you have discovered blogs! They're a cool form of writing from like ~20 years ago which are still pretty great. Good thing they show up on this website, it'd be rather dull with only newspapers and journal articles doncha think?
- matthewmacleod49 days ago
 That is what a blog post is. Someone documenting what they think about a topic.
 It's not the case that every form of writing has to be an academic research paper. Sometimes people just think things, and say them – and they may be wrong, or they may be right. And they sometimes have some ideas that might change how you think about an issue as a result.
- ajoseps49 days ago
 he’s not a “random internet person”, he created Redis. Despite that, I don’t know how authoritative of a figure he is with respect to AI research. He’s definitely a prolific programmer though.
 - XorNot49 days ago
 There are plenty of Nobel laureates who, well, do rest on their laurels and dive deep into pseudoscience after that.
 Accomplishment in one field does not make one an expert, nor even particularly worth listening to, in any other. Certainly it doesn't remove the burden of proof or the necessity to make an actual argument based on more than simply insisting something is true.
 - timmytokyo49 days ago
 Not sure why you're being downvoted. It's such a common phenomenon that it has its own name: Nobelitis.
 [0] https://en.wikipedia.org/wiki/Nobel_disease
 - 2snakes49 days ago
 Careful with the scientism. The job of science is to explain the nature of reality, but we can only describe what we experience.
 - megous49 days ago
 That still qualifies as a random internet person, wrt the topic. And I think the emphasis is on no sources and I beliefs and I thinks, in any case :)
 - nurettin49 days ago
 To be fair, you may find equally capable random people in this thread, doesn't mean they speak with any kind of authority.
- desbo49 days ago
 Yeah, it’s called “Reflections”.
- jacquesm49 days ago
 Indeed, and, what do you 'believe' or 'think' in response?
- dgellow49 days ago
 It's the personal blog of a famous internet person
- dist-epoch49 days ago
 What is a "source"? Isn't it just "another random internet person"?
- echelon49 days ago
 > by a random internet person.
 The creator of Redis.
 - cinntaile49 days ago
 Sure but quite a few claims in the article are about AI research. He does not have any qualifications there. If the focus was more on usefulness, that would be a different discussion and then his experience does add weight.
 - djdishsv49 days ago
 > smart, intelligent person gives opinion
 > woah buddy this person’s opinion isn’t worth anything more than a random homeless person off the street. they’re not an expert in this field
 Is there a term for this kind of pedantry? Obviously we can put more weight behind the words a person says if they’ve proven themselves trustworthy in prior areas - and we should! We want all people to speak and let the best idea win. If we fall back to allowing only expert opinions, that’s asking to get exploited. And it’s also important to know if antirez feels comfortable spouting nonsense.
 This is like a basic cornerstone of a functioning society. Though, I realize this “no man is innately better than another, evaluate on merit” is mostly a western concept, which might be some of my confusion.
 - cinntaile48 days ago
 Evaluate on merit indeed and that is not what is happening. The parent I replied to used an authoritative argument that is not based on (relevant) merit.
 - blibble49 days ago
 > Obviously we can put more weight behind the words a person says if they’ve proven themselves trustworthy in prior areas - and we should!
 no, you shouldn't
 this is how you end up with crap like vaccine denialism going mainstream
 "but he's a doctor!"
 echelon49 days ago
 Credentialism isn't a fix for the problem you've outlined. If anything, over-reliance on credentials bolsters and lends credence to crazy claims. The media hyper-fixates on it and amplifies it.
 We've got Avi Loeb on mainstream podcasts and TV spouting baseless alien nonsense. He's preeminent in his field, after all.
 Focus on what you understand. If you don't understand, learn more.
 - nutjob249 days ago
 Don't see how that gives him more credibility wrt AI.
 His entirely unsupported statements about AGI are pretty useless, for instance.
 So many people assume AGI is possible, yet no one has a concrete path to it or even a concrete definition of what it is or what form it might take.
erichocean49 days ago
> 1. NOT have any representation about the meaning of the prompt.
This one is bizarre, if true (I'm not convinced it is).
The entire purpose of the attention mechanism in the transformer architecture is to build this representation, in many layers (conceptually: in many layers of abstraction).
> 2. NOT have any representation about what they were going to say.
The only place for this to go is in the model weights. More parameters means "more places to remember things", so clearly that's at least a representation.
Again: who was pushing this belief? Presumably not researchers; these are fundamental properties of the transformer architecture. To the best of my knowledge, they are not disputed.
> I believe [...] it is not impossible they get us to AGI even without fundamentally new paradigms appearing.
Same, at least for the OpenAI AGI definition: "An AI system that is at least as intelligent as a normal human, and is able to do any economically valuable work."
- zahlman49 days ago
 > This one is bizarre, if true (I'm not convinced it is).
 > The entire purpose of the attention mechanism in the transformer architecture is to build this representation, in many layers (conceptually: in many layers of abstraction).
 I think this is really about a hidden (i.e. not readily communicated) difference in what the word "meaning" means to different people.
 - erichocean49 days ago
 Could be. By "meaning" I mean (heh) that transformers are able to distinguish tokens (and prompts) in a consequential ("causal") way, and that they do so at various levels of detail ("abstractions").
 I think that's the usual understanding of how transformer architectures work, at the level of math.
jimmydoe49 days ago
> * The fundamental challenge in AI for the next 20 years is avoiding extinction.
sorry, I say it's folding the laundry. with an aging population, that's the most, if not the only, useful thing.
abricq49 days ago
> * Programmers resistance to AI assisted programming has lowered considerably. Even if LLMs make mistakes, the ability of LLMs to deliver useful code and hints improved to the point most skeptics started to use LLMs anyway: now the return on the investment is acceptable for many more folks.
Could not agree more. I myself started 2025 being very skeptical, and finished it very convinced about the usefulness of LLMs for programming. I have also seen multiple colleagues and friends go through the same change of appreciation.
I noticed that for certain tasks, our productivity can be multiplied by 2 to 4. Hence my doubts: are we going to be too many developers / software engineers? What will happen to the rest of us?
I assume that other fields (other than software-related) should also benefit from the same productivity boosts. I wonder if our society is ready to accept that people should work less. I think the more likely continuation is that companies will either hire less or fire more, instead of accepting to pay the same for fewer hours of human work.
- danielfalbo49 days ago
 > Are we going to be too many developers / software engineers? What will happen to the rest of us?
 I propose that we should raise the bar for the quality of software now.
 - throw123543548 days ago
 I don't think that will happen because it hasn't for other technological improvements. In the end people pay for "good enough" and that's that. If "good enough" is now cheaper to implement, that's all they will do. I've seen it in other technologies. As an example, many manufacturers have used more precise manufacturing to cheapen things like cars, electronics, etc. just to the point where they mostly last through the warranty; in the old days they had to "overbuild" to get to that point, putting more quality into the product.
 Quality is a risk mitigation strategy; if software is disposable just like cheap manufactured goods, most people won't pay for it, thinking they can just "build another one". What we don't realise is that due to the sheer cost of building software we've wanted quality, because it's too expensive to fix later; AI could change that.
 Hoping we invest in quality, more software (which has a price inelastic curve mostly due to scale/high ROI) etc. is, I'm starting to think, just false hope from people in the tech industry who want to be optimistic, which is generally in our nature. Tech people understand very little about economics most of the time and how people outside tech (your customers) generally operate. My reflection is mostly that I need to pivot out of software; it will be commoditized.
 - abricq49 days ago
 Yes, certainly agree. A few days ago there was a blog post here claiming that formal verification would become much more widely used with AI, the author arguing that AI will help us over the difficulty barrier of writing formal proofs.
- throw123543548 days ago
 I'm not sure that it will scale to fields other than coding and math. The RLVR approach makes it more amenable to STEM fields in general, and most jobs, believe it or not, aren't that. The amount of open source software with good test suites effectively gave them all the training material they needed; most professions won't provide that, knowing that they would be giving their moat away. From my understanding, LLMs in other fields still exhibit the same hallucination rates, only mildly improved, especially if there isn't public internet material in that field.
 We have to accept in the end that coding/SWE is one of the fields most disrupted by this breed of AI. Disruption unfortunately probably means fewer jobs overall. The profession is on track to disrupt and automate itself, I think; plan accordingly. I've seen so many articles claiming it's great we didn't learn to code now; that's what the AIs have done.
- antihipocrat49 days ago
 I like to think of it as adding new lanes to a highway. More will be delivered until it all jams up again.
roughly49 days ago
> A few well known AI scientists believe that what happened with Transformers can happen again, and better, following different paths, and started to create teams, companies to investigate alternatives to Transformers and models with explicit symbolic representations or world models.
I’m actually curious about this and would love pointers to the folks working in this area. My impression from working with LLMs is there’s definitely a “there” there with regards to intelligence - I find the work showing symbolic representation in the structure of the networks compelling - but the overall behavior of the model seems to lack a certain je ne sais quoi that makes me dubious that they can “cross the divide,” as it were. I’d love to hear from more people that, well, sais quoi, or at least have theories.
pton_xd49 days ago
> For years, despite functional evidence and scientific hints accumulating, certain AI researchers continued to claim LLMs were stochastic parrots: probabilistic machines that would: 1. NOT have any representation about the meaning of the prompt. 2. NOT have any representation about what they were going to say. In 2025 finally almost everybody stopped saying so.
It's interesting that Terence Tao just released his own blog post stating that they're best viewed as stochastic generators. True, he's not an AI researcher, but it does sound like he's using AI frequently with some success.
"viewing the current generation of such tools primarily as a stochastic generator of sometimes clever - and often useful - thoughts and outputs may be a more productive perspective when trying to use them to solve difficult problems" [0].
[0] https://mathstodon.xyz/@tao/115722360006034040
- jdub49 days ago
 I get the impression that folks who have a strong negative reaction to the phrase "stochastic parrot" tend to do so because they interpret it literally or analogously (revealed in their arguments against it), when it is most useful as a metaphor.
 (And, in some cases, a desire to deny the people and perspectives from which the phrase originated.)
- antirez49 days ago
 What happened recently is that all the serious AI researchers who were on the stochastic parrot side changed their point of view, but, incredibly, people without a deep understanding of such matters, previously exposed to such arguments, are lagging behind and still repeat arguments that the people who popularized them would not repeat again.
 Today there is no top AI scientist who will tell you LLMs are just stochastic parrots.
 - emp1734449 days ago
 You seem to think the debate is settled, but that’s far from true. It’s oddly controlling to attempt to discredit any opposition to this viewpoint. There’s plenty of research supporting the stochastic view of these models, such as Apple’s “Illusion” papers. Tao is also a highly respected researcher, and has worked with these models at a very high level - his viewpoint has merit as well.
 - visarga49 days ago
 The stochastic parrot framing makes some assumptions, one of them being that LLMs generate from minimal input prompts, like "tell me about Transformers" or "draw a cute dog". But when input provides substantial entropy or novelty, the output will not look like any training data. And longer sessions with multiple rounds of messages also drift out of distribution (OOD). The model is doing work outside its training distribution.
 It's like saying pianos are not creative because they don't make music. Well, yes, you have to play the keys to hear the music, and transformers are no exception. You need to put in your unique magic input to get something new and useful.
 - geraneum49 days ago
 Now that you’re here, what do you mean by “scientific hints” in your first paragraph?
lowsong49 days ago
I'm impressed that such a short post can be so categorically incorrect.
> For years, despite functional evidence and scientific hints accumulating, certain AI researchers continued to claim LLMs were stochastic parrots
> In 2025 finally almost everybody stopped saying so.
There is still no evidence that LLMs are anything beyond "stochastic parrots". There is no proof of any "understanding". This is seeing faces in clouds.
> I believe improvements to RL applied to LLMs will be the next big thing in AI.
With what proof or evidence? Gut feeling?
> Programmers resistance to AI assisted programming has lowered considerably.
Evidence is the opposite; most developers do not trust it. https://survey.stackoverflow.co/2025/ai#2-accuracy-of-ai-tools
> It is likely that AGI can be reached independently with many radically different architectures.
There continues to be no evidence beyond "hope" that AGI is even possible, let alone that Transformer models are the path there.
> The fundamental challenge in AI for the next 20 years is avoiding extinction.
Again, nothing more than a gut feeling. Much like all the other AI hype posts, this is nothing more than "well LLMs sure are impressive, people say they're not, but I think they're wrong and we will make a machine god any day now".
- crystal_revenge49 days ago
 Strongly agree with this comment. Decoder-only LLMs (the ones we use) are literally Markov Chains; the only (and major) difference is a radically more sophisticated state representation. Maybe "stochastic parrot" is overly dismissive sounding, but it's not a fundamentally wrong understanding of LLMs.
 The RL claims are also odd because, for starters, RLHF is not "reinforcement learning" based on any classical definition of RL (which almost always involves an online component). And further, you can chat with anyone who has kept up with the RL field and quickly realize that this is also a technology that still hasn't quite delivered on the promises it's been making (despite being an incredibly interesting area of research). There's no reason to speculate that RL techniques will work with "agents" where they have failed to achieve widespread success in similar domains.
 I continue to be confused why smart, very technical people can't just talk about LLMs honestly. I personally think we'd have much more progress if we could have conversations like "Wow! The performance of a Markov Chain with proper state representation is incredible, let's understand this better..." rather than "AI is reasoning intelligently!"
 I get why non-technical people get caught up in AI hype discussions, but for technical people who understand LLMs it seems counterproductive. Even more surprising to me is that this hype has completely destroyed any serious discussion of the technology and how to use it. There's so much opportunity lost around practical uses of incorporating LLMs into software while people wait for agents to create mountains of slop.
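The Markov-chain framing above can be made concrete with a toy. The sketch below is a plain n-gram chain, not a transformer, and its names (`build_chain`, `generate`, the window size `k`) are illustrative: the "state" is the last `k` tokens, and the next token is sampled from a distribution that depends only on that state. On crystal_revenge's reading, a decoder-only LLM keeps this structure (state = the context window) but replaces the count table with a learned function that can also score windows it has never seen.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

State = Tuple[str, ...]


def build_chain(tokens: List[str], k: int) -> Dict[State, Counter]:
    """Count which token follows each window of k tokens."""
    chain: Dict[State, Counter] = defaultdict(Counter)
    for i in range(len(tokens) - k):
        state = tuple(tokens[i : i + k])
        chain[state][tokens[i + k]] += 1
    return chain


def generate(chain: Dict[State, Counter], start: State, steps: int, seed: int = 0) -> List[str]:
    """Sample tokens; each step looks only at the last k tokens (the Markov property)."""
    rng = random.Random(seed)
    out = list(start)
    k = len(start)
    for _ in range(steps):
        state = tuple(out[-k:])
        followers = chain.get(state)
        if not followers:
            break  # unseen state: a count table cannot generalize, unlike a learned model
        words, counts = zip(*followers.items())
        out.append(rng.choices(words, weights=counts)[0])
    return out


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat ran".split()
    chain = build_chain(corpus, k=2)
    print(chain[("the", "cat")])
    print(" ".join(generate(chain, ("sat", "on"), steps=3)))
```

Whether the learned version still deserves the "just a Markov chain" label is exactly what the replies below argue about; the sketch only shows the shared skeleton.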
 - akomtu49 days ago
 > why smart, very technical people can't just talk about LLMs honestly
 Because those smart people are usually low-rung employees while their bosses are often AI fanatics. Were they to express anti-AI views, they would be fired. Then this mentality slips into their thinking outside of work.
 - krackers49 days ago
 > Decoder-only LLMs (the ones we use) are literally Markov Chains
 Real-world computers (the ones we use) are literally finite state machines
 - crystal_revenge49 days ago
 Only if the computer you use does not have memory. Definitionally if you are writing and reading from memory, you are not using an FSM.
 - krackers49 days ago
 No, it can still be modeled as a finite state machine. Each state just encodes the configuration of your memory. I.e. if you have 8 bits of memory, your state space just needs 2^8 states, one for each memory configuration.
 Any real-world deterministic thing can be encoded as an FSM if you make your state space big enough, since by definition it has only a finite number of states.
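krackers' counting argument can be checked directly on a toy. Assuming a hypothetical machine whose entire memory is one 8-bit register and whose inputs are three made-up instructions, the whole machine is a finite state machine: 2^8 = 256 states, one per memory configuration, and a transition table with one entry per (state, instruction) pair. Running a program is then nothing but table lookups.

```python
from typing import Callable, Dict, Tuple

# Hypothetical instruction set for a machine whose whole memory is one byte.
INSTRUCTIONS: Dict[str, Callable[[int], int]] = {
    "inc": lambda m: (m + 1) % 256,   # wraps around like an 8-bit register
    "shl": lambda m: (m << 1) % 256,  # shift left, dropping the top bit
    "not": lambda m: m ^ 0xFF,        # flip all 8 bits
}


def build_fsm() -> Dict[Tuple[int, str], int]:
    """Tabulate the machine as an FSM: (state, input) -> next state."""
    return {
        (state, name): step(state)
        for state in range(2**8)  # every possible memory configuration
        for name, step in INSTRUCTIONS.items()
    }


def run(fsm: Dict[Tuple[int, str], int], start: int, program: str) -> int:
    """Run a space-separated instruction string by pure table lookup."""
    state = start
    for name in program.split():
        state = fsm[(state, name)]
    return state


if __name__ == "__main__":
    fsm = build_fsm()
    print(len({s for s, _ in fsm}))         # 256
    print(run(fsm, 0, "inc shl shl not"))   # 1 -> 2 -> 4 -> 251
```

crystal_revenge's objection in the next comment still applies: the table describes one fixed machine with fixed memory, not the abstract notion of a computer that can run arbitrary programs.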
 crystal_revenge49 days ago
 You could model a specific instance of using your computer this way, but you could not capture the fact that you can execute arbitrary programs with your PC represented as an FSM.
 Your computer is strictly more computationally powerful than an FSM or PDA, even though you could represent particular states of your computer this way.
 The fact that you can model an arbitrary CFG as a regular language with limited recursion depth does not mean there’s no meaningful distinction between regular languages and CFGs.
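The regular-versus-context-free distinction can be illustrated with balanced parentheses, a textbook context-free language. The sketch below (function names are illustrative) builds a regular expression that accepts balanced strings only up to a fixed nesting depth, next to a recognizer that handles any depth with a single counter (in effect, a stack holding one kind of symbol). The regex grows with the depth bound and still rejects a string one level past it; the counter does not.

```python
import re


def bounded_balanced_regex(depth: int) -> str:
    """Regular expression for balanced parentheses nested at most `depth` deep."""
    pattern = ""
    for _ in range(depth):
        # Each level: zero or more "(" <inner level> ")" groups.
        pattern = r"(?:\(" + pattern + r"\))*"
    return pattern


def balanced(s: str) -> bool:
    """Recognize balanced parentheses of any depth with a single counter."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ")" with nothing open
        else:
            return False  # only parentheses are allowed
    return depth == 0


if __name__ == "__main__":
    regex = bounded_balanced_regex(3)
    for s in ["()()", "((()))", "(((())))", "(()"]:
        print(s, bool(re.fullmatch(regex, s)), balanced(s))
```

Both sides of the thread can claim this example: for any fixed bound the regex works, which is krackers' point, but no single regex covers every depth, which is crystal_revenge's.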
 krackers49 days ago
 > you can execute arbitrary programs with your PC represented as an FSM
 You cannot execute arbitrary programs with your PC; your PC is limited in how much memory and storage it has access to.
 > Your computer is strictly more computationally powerful
 The abstract computer is, but _your_ computer is not.
 > model an arbitrary CFG as a regular language with limited recursion depth does not mean there’s no meaningful distinction between regular languages and CFGs
 Yes, this I agree with. But going back to your argument, claiming that LLMs with a fixed context window are basically Markov chains so they can't do anything useful is reductio ad absurdum in the exact same way as claiming that real-world computers are finite state machines.
 A more useful argument on the upper bound of computational power would be along the lines of circuit complexity, I think. But even this does not really matter. An LLM does not need to be Turing complete even conceptually. When paired with tool use, it suffices that the LLM can merely generate programs that are then fed into an interpreter. (And the grammar of Turing-complete programming languages can be made simple enough: you can encode Brainfuck in a CFG.) So even if an LLM could only ever produce programs with a CFG grammar, the combination of LLM + Brainfuck executor would give Turing completeness.
 Edit: There was this recent HN article along those lines. https://news.ycombinator.com/item?id=46267862
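The "LLM + Brainfuck executor" argument relies on how little machinery the executor needs. As a sketch of that half only (the LLM would supply the program text), here is a minimal Brainfuck interpreter. The fixed tape size, 8-bit wrapping cells and EOF-reads-as-0 are common conventions assumed here, and a bounded tape is itself a finite-memory compromise of exactly the kind debated above.

```python
def run_brainfuck(code: str, input_data: str = "", tape_size: int = 30_000) -> str:
    """Interpret a Brainfuck program and return its output as a string."""
    # Pre-compute matching brackets so loops jump directly.
    jumps, stack = {}, []
    for i, ch in enumerate(code):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            if not stack:
                raise ValueError(f"unmatched ']' at {i}")
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    if stack:
        raise ValueError(f"unmatched '[' at {stack[-1]}")

    tape = [0] * tape_size
    ptr = pc = 0
    inp = iter(input_data)
    out = []
    while pc < len(code):
        ch = code[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256  # 8-bit wrapping cells
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == ",":
            tape[ptr] = ord(next(inp, "\0"))  # EOF reads as 0
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # jump back to the loop start
        pc += 1  # any other character is a comment and is ignored
    return "".join(out)


if __name__ == "__main__":
    # 8 * 8 + 1 = 65, the code point of "A"
    print(run_brainfuck("++++++++[>++++++++<-]>+."))  # A
```

The bracket matching in the first loop is also the only non-regular part of Brainfuck's syntax, which is why its grammar fits in a CFG.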
 crystal_revenge49 days ago
 > so they can't do anything useful
 I never claimed that. They demonstrate just how powerful Markov chains can be with sophisticated state representations. Obviously LLMs are useful; I have never claimed otherwise.
 Additionally, it doesn’t require any logical leaps to understand decoder-only LLMs as Markov Chains: they preserve the Markov Property and otherwise behave exactly like them. It’s worth noting that encoder-decoder LLMs do not preserve the Markov property and cannot be considered Markov chains.
 Edit: I saw that post and at the time was disappointed by how confused the author was about those topics and how they apply to the subject.
piker49 days ago
> There are certain tasks, like improving a given program for speed, for instance, where in theory the model can continue to make progress with a very clear reward signal for a very long time.
Super skeptical of this claim. Yes, if I have some toy poorly optimized python example or maybe a sorting algorithm in ASM, but this won’t work in any non-trivial case. My intuition is that the LLM will spin its wheels at a local minimum, the performance of which is overdetermined by millions of black-box optimizations in the interpreter or compiler, signal from which is not fed back to the LLM.
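The "local minimum" intuition is the classic failure mode of greedy search. As a toy illustration only (the cost landscape is made up, and an LLM agent's search is far less mechanical than this), a hill climber that accepts only strict improvements from its immediate neighbors stops at the first local minimum it reaches, even when a much better point exists further away.

```python
from typing import Sequence


def hill_climb(costs: Sequence[float], start: int) -> int:
    """Greedy descent over indices: move to the best neighbor while it strictly improves."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(costs)]
        best = min(neighbors, key=lambda j: costs[j])
        if costs[best] >= costs[i]:
            return i  # no neighbor is strictly better: a local minimum
        i = best


if __name__ == "__main__":
    # Made-up "runtime" landscape: a shallow dip at index 2, the true optimum at index 7.
    costs = [9, 7, 5, 6, 8, 7, 3, 1, 4]
    end = hill_climb(costs, start=0)
    print(end, costs[end])                                 # 2 5
    print(min(range(len(costs)), key=costs.__getitem__))  # 7
```

Escaping such a dip needs either a larger step (a conceptual change, such as a better algorithm) or feedback the local search does not see, which is roughly the concern about compiler and interpreter internals.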
- NitpickLawyer49 days ago
 > but this won’t work in any non-trivial case
 Earlier this year google shared that one of their projects (I think it was AlphaEvolve) found an optimisation in their stack that sped up their real-world training runs by 1%. As we're talking about google here, we can be pretty sure it wasn't some trivial python trick that they missed. Anyhow, at ~$100M per training run, that's $1M saved right there. Each and every time they run a training run!
 And in the past month google also shared another "agentic" workflow where they had gemini2.5-flash (their previous-gen "small" model) work autonomously on migrating codebases to support the aarch64 architecture. There they found ~30% of the projects worked flawlessly end-to-end. Whatever costs they save from switching to ARM will translate into real-world $ saved (at google scale, those can add up quickly).
 - piker49 days ago
 The second example has nothing to do with the first. I am optimistic that LLMs are great for translations with good testing frameworks.
 “Optimize” in a vacuum is a tarpit for an LLM agent today, in my view. The Google case is interesting, but 1%, while significant at Google scale, doesn’t move the needle much in terms of statistical significance. It would be more interesting to see the exact operation and the speedup achieved relative to the prior version. But it’s data contrary to my view, for sure. The cynic also notes that Google is in the LLM hype game now, too.
 - NitpickLawyer49 days ago
 Why do you think it's not relevant to the "optimise in a loop" thing? The way I think of it, it's using LLMs "in a loop" to move something from arch A (that costs x$) to arch B (that costs y$), where y is cheaper than x. It's still an autonomous optimisation done by LLMs, no?
 - piker49 days ago
 Did the LLM suggest moving to the new architecture? If not, that’s not what’s under discussion. That’s just following an order to translate.
 NitpickLawyer49 days ago
 Ah, I see your point.
 - Jaxan49 days ago
 > As we're talking about google here, we can be pretty sure it wasn't some trivial python trick that they missed.
 Strong disagree on the reasoning here. Especially since google is big and has thousands of developers, there could be a lot of code and a lot of low-hanging fruit.
 - NitpickLawyer49 days ago
 > By finding smarter ways to divide a large matrix multiplication operation into more manageable subproblems, it sped up this vital kernel in Gemini’s architecture by 23%, leading to a 1% reduction in Gemini's training time.
 The message I replied to said "if I have some toy poorly optimized python example". I think it's safe to say that matmul & kernel optimisation is a bit beyond a small python example.
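For a sense of what "dividing a large matrix multiplication into more manageable subproblems" means, here is the general blocking (tiling) idea in plain Python. This is only the textbook technique with an arbitrary tile size; it is not the specific kernel change described in the quote, whose details aren't given in the thread.

```python
from typing import List

Matrix = List[List[float]]


def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Reference (n x m) @ (m x p) product."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)] for i in range(n)]


def matmul_tiled(a: Matrix, b: Matrix, tile: int = 2) -> Matrix:
    """Blocked matrix multiply: accumulate C one (tile x tile) sub-problem at a time.

    On real hardware the tile size is chosen so the working set fits in fast
    memory (cache, GPU shared memory); in pure Python it only changes loop order.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, p, tile):
            for k0 in range(0, m, tile):
                # Multiply the (i0, k0) block of A by the (k0, j0) block of B.
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
    print(matmul_tiled(a, b) == matmul_naive(a, b))  # True
```

The search space a tool like AlphaEvolve explores is how to pick and schedule such sub-problems for specific hardware, which is a long way from tweaking a small Python loop.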
- andy9949 days ago
 There was a discussion the other day where someone asked Claude to improve a code base 200x: https://news.ycombinator.com/item?id=46197930
 - exitb49 days ago
 That’s most definitely not the same thing, as „improving a codebase” is an open ended task with no reliable metrics the agent could work against.
- dist-epoch49 days ago
 https://github.com/algorithmicsuperintelligence/openevolve
 - piker49 days ago
 https://chatgpt.com/backend-api/estuary/public_content/enc/eyJpZCI6Im1fNjk0Njg0MjkzNTEwODE5MWE2NzY5MmE4YWRjNTZiMTA6ZmlsZV8wMDAwMDAwMDljNzg3MWZkYTExODc2MDgxZDllYjAyOSIsInRzIjoiMjA0NDIiLCJwIjoicHlpIiwiY2lkIjoiMSIsInNpZyI6IjIxMDJlMDkzMGExNjNkYWY3OWI4ZTI4YmNhZDE5OThlNGFjYmQxNjQzNzQ2ODRiYmM3NDFlZmE1OGViMjQ5NzgiLCJ2IjoiMCIsImdpem1vX2lkIjpudWxsLCJjcyI6bnVsbCwiY3AiOm51bGwsIm1hIjpudWxsfQ==
a_bonobo49 days ago
> * For years, despite functional evidence and scientific hints accumulating, certain AI researchers continued to claim LLMs were stochastic parrots: probabilistic machines that would: 1. NOT have any representation about the meaning of the prompt. 2. NOT have any representation about what they were going to say. In 2025 finally almost everybody stopped saying so.
Man, Antirez and I walk in very different circles! I still feel like LLMs fall over backwards once you give them an 'unusual' or 'rare' task that isn't likely to be presented in the training data.
- oersted49 days ago
 LLMs certainly struggle with tasks that require knowledge that is not provided to them (at significant enough volume/variance to retain it). But this is to be expected of any intelligent agent; it is certainly true of humans. It is not a good argument to support the claim that they are Chinese Rooms (unthinking imitators). Indeed, the whole point of the Chinese Room thought experiment was to consider if that distinction even mattered.
 When it comes to being able to do novel tasks on known knowledge, they seem to be quite good. One also needs to consider that problem-solving patterns are also a kind of (meta-)knowledge that needs to be taught, either through imitation/memorisation (Supervised Learning) or through practice (Reinforcement Learning). They can be logically derived from other techniques to an extent, just like new knowledge can be derived from known knowledge in general, and again LLMs seem to be pretty decent at this, but only to a point. Regardless, all of this is definitely true of humans too.
 - feverzsj49 days ago
 In most cases, LLMs have the knowledge (data). They just can't generalize it like humans do. They can only reflect explicit things that are already there.
 - oersted49 days ago
 I don't think that's true. Consider that the "reasoning" behaviour trained with Reinforcement Learning in the last generation of "thinking" LLMs is trained on quite narrow datasets of olympiad math / programming problems and various science exams, since exact unambiguous answers are needed to have a good reward signal, and you want to exercise it on problems that require non-trivial logical derivation or calculation. Then this reasoning behaviour gets generalised very effectively to a myriad of contexts the user asks about that have nothing to do with that training data. That's just one recent example.
 Generally, I use LLMs routinely on queries definitely no-one has written about. Are there similar texts out there that the LLM can put together and get the answer by analogy? Sure, to a degree, but at what point are we gonna start calling that intelligent? If that's not generalisation I'm not sure what is.
 To what degree can you claim as a human that you are not just imitating knowledge patterns or problem-solving patterns, abstract or concrete, that you (or your ancestors) have seen before? Either via general observation or through intentional trial-and-error. It may be a conscious or unconscious process; many such patterns get baked into what we call intuition.
 Are LLMs as good as humans at this? No, of course not, though sometimes they get close. But that's a question of degree; it's no argument to claim that they are somehow qualitatively lesser.
 - SCdF44 days ago
 Late to this, but my interpretation of the parent's point was, e.g.: LLMs still often produce bad code, despite "reading" every book about programming ever written. Simplistically, they aren't taking the knowledge from those books and applying it to the code they've scraped; they are just using the scraped output. You can then separately ask them about knowledge from those books, but if you go back and get them to code again, they still won't follow the advice they just gave you.
- jmfldn49 days ago
 "In 2025 finally almost everybody stopped saying so."
 I haven't.
 - dist-epoch49 days ago
 Some people are slower to understand things.
 - yeasku47 days ago
 That is why they need artificial intelligence
 - jmfldn49 days ago
 Well exactly ;)
- barnabee49 days ago
 I don’t think this is quite true.
 I’ve seen them do fine on tasks that are clearly not in the training data, and it seems to me that they struggle when some particular type of task or solution or approach might be something they haven’t been exposed to, rather than the exact task.
 In the context of the paragraph you quoted, that’s an important distinction.
 It seems quite clear to me that they are getting at the meaning of the prompt and are able, at least somewhat, to generalise and connect aspects of their training to “plan” and output a meaningful response.
 This certainly doesn’t seem all that deep (at times frustratingly shallow) and I can see how at first glance it might look like everything was just regurgitated training data, but my repeated experience (especially over the last ~6-9 months) is that there’s something more than that happening, which feels like what Antirez was getting at.
- Kiro49 days ago
 Give me an example of one of those rare or unusual tasks.
 - a_bonobo49 days ago
 I work on a few HPC systems with unusual, kinda custom-rolled architectures. A whole bunch of Python and R packages fail to compile on these systems. There's no publicly accessible documentation for these HPC systems, nor for these custom architectures. ChatGPT and Claude so far have given me only wrong advice on how to get around these compilation errors and there's not much on Google for these errors, but HPC staff usually knew what to do.
 - recursive49 days ago
 Set the font size of a simple field in openxml. Doesn't even seem that rare. It said to add a run inside and set the font there. Didn't do anything. I ended up reverse engineering the output of MS Word. This happened yesterday.
Joel_LeBlanc39 days ago
It's fascinating to see how AI is reshaping the landscape for digital assets—buying websites or e-commerce stores has become more accessible than ever. When evaluating potential investments, I always stress the importance of thorough due diligence; I've found that using tools like DREA (Digital Real Estate Analyzer) can really streamline the process and provide valuable insights. It's all about understanding the numbers and the potential for growth, especially in such a dynamic environment. What specific metrics are you focusing on?
fleebee49 days ago
> The fundamental challenge in AI for the next 20 years is avoiding extinction.
That's a weird thing to end on. Surely it's worth more than one sentence if you're serious about it? As it stands, it feels a bit like the fearmongering Big Tech CEOs use to drive up AI stocks.
If AI is really that powerful and I should care about it, I'd rather hear about it without the scare tactics.
- dist-epoch49 days ago
 Yeah, well-known marketing trick that Big Companies do.
 Oil companies: we are causing global warming with all these carbon emissions, are you scared yet? so buy our stock
 Pharma companies: our drugs are unsafe, full of side effects, and kill a lot of people, are you scared yet? so buy our stock
 Software companies: our software is full of bugs, will corrupt your files and make you lose money, are you scared yet? so buy our stock
 Classic marketing tactics, very effective.
- Recursing49 days ago
 I think <a href="https://en.wikipedia.org/wiki/Existential_risk_from_artificial_intelligence#History" rel="nofollow">https://en.wikipedia.org/wiki/Existential_risk_from_artifici...</a> has much better arguments than the LessWrong sources in other comments, and they weren't written by Big Tech CEOs.
 Also, "my product will kill you and everyone you care about" is not as great a marketing strategy as you seem to imply, and Big Tech CEOs are not talking about risks anymore. They currently say things like "we'll all be so rich that we won't need to work and we will have to find meaning without jobs".
- tejohnso49 days ago
 What makes it a scare tactic? There are other areas in which extinction is a serious concern and people don't behave as though it's all that scary or important. It's just a banal fact. And for all of the extinction threats, AI included, it's very easy to find plenty of deep dive commentary if you care.
- grodriguez10049 days ago
 I would say yes, everyone should care about it.
 There is plenty of material on the topic. See for example <a href="https://ai-2027.com/" rel="nofollow">https://ai-2027.com/</a> or <a href="https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities" rel="nofollow">https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a...</a>
 - emp1734449 days ago
 The fact that people here take AI 2027 seriously is embarrassing. The authors are already beginning to walk back these claims: <a href="https://x.com/eli_lifland/status/1992004724841906392?s=20" rel="nofollow">https://x.com/eli_lifland/status/1992004724841906392?s=20</a>
 - jowea49 days ago
 And I thought the rest of the thread was anxiety-inducing. Thanks for the nightmares lol.
 - dkdcio49 days ago
 fear-mongering science fiction; you may as well cite Dune or Terminator
 - defrost49 days ago
 There's arguably more dread and quiet constrained horror in With Folded Hands ... (1947)<pre><code> Despite the humanoids' benign appearance and mission, Underhill soon realizes that, in the name of their Prime Directive, the mechanicals have essentially taken over every aspect of human life. No humans may engage in any behavior that might endanger them, and every human action is carefully scrutinized. Suicide is prohibited. Humans who resist the Prime Directive are taken away and lobotomized, so that they may live happily under the direction of the humanoids. </code></pre> ~ <a href="https://en.wikipedia.org/wiki/With_Folded_Hands_" rel="nofollow">https://en.wikipedia.org/wiki/With_Folded_Hands_</a>...
 - XorNot49 days ago
 This hardly disproves the point: no one is taking this topic seriously. They're just making up a hostile scenario from science fiction and declaring that's what'll happen.
 - lm2846949 days ago
 LessWrong looks like a forum full of terminally online neckbeards who discovered philosophy 48 hours ago; you can dismiss most of what you read there, don't worry
 - timmytokyo49 days ago
 If only they had discovered philosophy. Instead they NIH their own philosophy, falling into the same ditches real philosophers climbed out of centuries ago.
- VladimirGolovin49 days ago
 This has been well discussed before, for example in this book: <a href="https://ifanyonebuildsit.com/" rel="nofollow">https://ifanyonebuildsit.com/</a>
rckt49 days ago
> Even if LLMs make mistakes, the ability of LLMs to deliver useful code and hints improved to the point most skeptics started to use LLMs anyway
Here we go again. Statements whose only source is the speaker's head. And it’s also not true. LLMs still produce bad/irrelevant code at such a rate that you can spend more time prompting than doing things yourself.
I’m tired of this overestimation of LLMs.
- xiconfjs49 days ago
 My personal experience: if I can find a solution on Stack Overflow etc., the LLM will produce working and fundamentally correct code. If I can‘t find an already existing solution on these sites, the LLM hallucinates like crazy (never-existing functions/modules/plugins, protocol features which aren’t specified, and even GitHub repos which never existed). So, as stated by many people online before: for low-hanging fruit, LLMs are a totally viable solution.
 - danielbln49 days ago
 I don't remember the last time Claude Code hallucinated some library, as it will check the packages, verify with the linter, run a test import and so on.
 Are you talking about punching something into some LLM web chat that's disconnected from your actual codebase and has tooling like web search disabled? If so, that's not really the state of the art of AI-assisted coding, just so you know.
 - yeasku47 days ago
 6 months.
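 The kind of check mentioned above (confirming that a suggested package really exists before relying on it) can be approximated in a few lines of Python. This is a minimal sketch under my own assumptions, not how Claude Code or any other agent actually implements it; `module_exists` and `check_imports` are hypothetical helper names. It only looks at the local environment via the standard library's `importlib.util.find_spec`, which locates a top-level module without importing it (for a dotted name it does import the parent package).

```python
import importlib.util
from typing import Iterable, List, Tuple


def module_exists(name: str) -> bool:
    """True if `name` is importable in the current environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ModuleNotFoundError, ValueError):
        # ModuleNotFoundError: the parent package of a dotted name is missing.
        # ValueError: the module is in sys.modules but has no __spec__.
        return False


def check_imports(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split module names suggested by a model into (found, missing)."""
    found, missing = [], []
    for name in names:
        (found if module_exists(name) else missing).append(name)
    return found, missing


print(check_imports(["json", "json.decoder", "surely_not_a_real_pkg_xyz"]))
# (['json', 'json.decoder'], ['surely_not_a_real_pkg_xyz'])
```

 A real agent would also check the package index and run the project's tests; this only catches names that do not resolve locally.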
- barnabee49 days ago
 Even where they are not directly using LLMs to write the most critical or core code, nearly every skeptic I know has started using LLMs at the very least to do things like write tests, build tools, write glue code, help to debug or refactor, etc.
 Your statement not only suffers from the same flaw of coming only from your own head, with no evidence that you've actually tried to learn to use these tools; it also goes against the weight of evidence that I see both in my professional network and online.
 - rckt49 days ago
 I just want people making statements like the author's to be more specific about how exactly LLMs are being used. Otherwise they contribute to the belief that LLMs are a magical tool that can do anything.
 I am aware of simple routine tasks that LLMs can do. This doesn’t change anything about what I said.
 - danielbln49 days ago
 All you had to do was scroll down further and read the next couple of posts, where the author is more specific about how they used LLMs.
 I swear, the so-called critics need everything spoon-fed.
 - Kiro49 days ago
 Sorry, but we're way past that. It's you who needs to provide examples of tasks it can't do.
 - AnimalMuppet49 days ago
 You need to meet more skeptics. (Or maybe I do.) In my world, it's much rarer than you say.
- iamflimflam149 days ago
 But you have just repeated what you are complaining about.
 - rckt49 days ago
 Do you want me to spend time coming up with a quality response to a lazy statement? It’s like tilting at windmills. I’m fine with having my say the way I did.
- bgwalter49 days ago
 > Here we go again.
 Indeed, he said the same as a reflection on 2024 models: <a href="https://news.ycombinator.com/item?id=42561151">https://news.ycombinator.com/item?id=42561151</a>
 It is always the fault of the "luser" who is not using and paying for the latest model.
- locknitpicker49 days ago
 > Here we go again. Statements with the single source in the head of the speaker. And it’s also not true.
 You're making the same sort of baseless claim you are criticising the blogger for making. Spewing baseless claims hardly moves any discussion forward.
 > The llms still produce bad/irrelevant code at such rate that you can spend more time prompting than doing things yourself.
 If that is your personal experience, then I regret to tell you that it only reflects your own inability to work with LLMs and coding agents. Meanwhile, I personally manage to use LLMs effectively for everything from small refactoring needs to large software architecture designs, including generating fully working MVPs from one-shot agent prompts. From this alone it's rather obvious whose statements are baseless and whose are aligned with reality.
russfink49 days ago
Practical question: when getting the AI to teach you something, e.g. how attention can be focused in LLMs, how do you know it’s teaching you correct theory? Can I use internal consistency as a metric, repeatedly querying it and other models with a summary of my understanding? What do you all do?
- layer849 days ago
 > What do you all do?
 Google for non-AI sources. Ask several models to get a wider range of opinions. Apply one’s own reasoning capabilities where applicable. Remain skeptical in the absence of substantive evidence.
 Basically, do what you did before LLMs existed, and treat LLM output like you would a random anonymous blog post you found.
 - akomtu49 days ago
 In that case, LLMs must be written off as very knowledgeable crackpots because of their tendency to make things up. That's how we would treat a scientist who's caught making things up.
- jennyholzer249 days ago
 [flagged]
ur-whale49 days ago
Not sure I understand the last sentence:
> The fundamental challenge in AI for the next 20 years is avoiding extinction.
- danielfalbo49 days ago
 I think he's referring to AI safety.
 <a href="https://lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities" rel="nofollow">https://lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-lis...</a>
 - grodriguez10049 days ago
 For a perhaps easier to read intro to the topic, see <a href="https://ai-2027.com/" rel="nofollow">https://ai-2027.com/</a>
 - dkdcio49 days ago
 or read your favorite sci-fi novel, or watch Terminator. this is pure bs by a charlatan
- timmytokyo49 days ago
 It's a tell that he's been influenced by rationalist AI doomer gurus. And a good sign that the rest of his AI opinions should be dismissed.
- chrishare49 days ago
 He's referring to humanity, I believe
 - A_D_E_P_T49 days ago
 It's ambiguous. It could go the other way. He could be referring to that oldest of science fiction tropes: the Butlerian Jihad, the human revolt against thinking machines.
 - AnimalMuppet49 days ago
 Meh. I think the more likely scenario is the financial extinction of the AI companies.
ofirpress49 days ago
> There are certain tasks, like improving a given program for speed, for instance, where in theory the model can continue to make progress with a very clear reward signal for a very long time.
Yup, this will absolutely be a big driver of gains in AI for coding in the near future. We actually built a benchmark based on this exact principle: <a href="https://algotune.io/" rel="nofollow">https://algotune.io/</a>
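To make "a very clear reward signal" for program speed concrete, here is a hypothetical sketch of such a reward: zero if the candidate's output differs from a reference implementation on any test input, otherwise the measured speedup. This is an illustrative assumption, not AlgoTune's actual scoring harness; `speed_reward` and the example functions are made-up names.

```python
import timeit
from typing import Callable, Iterable


def speed_reward(reference: Callable, candidate: Callable,
                 test_inputs: Iterable, repeats: int = 3) -> float:
    """Hypothetical reward: speedup of `candidate` over `reference`, or 0.0 if any output differs."""
    inputs = list(test_inputs)
    # Correctness gate: a faster but wrong program earns nothing.
    for x in inputs:
        if candidate(x) != reference(x):
            return 0.0

    def best_time(fn: Callable) -> float:
        # Best-of-N wall-clock time for running fn once over every test input.
        runs = timeit.repeat(lambda: [fn(x) for x in inputs], number=1, repeat=repeats)
        return max(min(runs), 1e-12)  # guard against a zero timer reading

    return best_time(reference) / best_time(candidate)


# Example programs: the sum of i*i for i in range(n).
def slow_sum_squares(n: int) -> int:
    return sum(i * i for i in range(n))


def fast_sum_squares(n: int) -> int:
    # Closed form for 0^2 + 1^2 + ... + (n-1)^2
    return (n - 1) * n * (2 * n - 1) // 6


def wrong_sum_squares(n: int) -> int:
    return n * n


print(speed_reward(slow_sum_squares, fast_sum_squares, [10_000, 50_000]))  # a speedup factor well above 1
print(speed_reward(slow_sum_squares, wrong_sum_squares, [10, 100]))         # 0.0
```

The correctness gate is what keeps the signal "clear": the model cannot increase its reward by returning wrong answers faster, so it can keep iterating on speed for as long as real speedups remain.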
agumonkey49 days ago
There are videos about diffusion LLMs too, which apparently get rid of linear (one token after another) generation. But I'm no ML engineer.
- nephanth49 days ago
 As someone who worked on transformer-based diffusion models before (not for language though), I can say one thing: they're hard.
 Denoising diffusion models benefited a lot from the U-Net, which is a pretty simple network (compared to a transformer) and very well adapted to the denoising task. Plus, diffusion on images is great to research because it's very easy to visualize, and therefore to wrap your head around.
 Doing diffusion on text is a great idea, but my intuition is that it will prove more challenging, and it will probably take a while before we get something working.
 - agumonkey49 days ago
 Thanks. Do you see that part of the field as plateauing or ramping up (even taking into account the difficulty)?
 If you know labs / researchers working on the topic, I'd love to read their pages / papers.
Aiisnotabubble49 days ago
What also happens, independently of AGI: global RL.
Around the world, people ask an LLM and get a response. Just grouping and analysing these questions, solving them once centrally, and then making the solution available again is huge.
Solving the most-asked question, then the next one, then the next, will make whatever system is behind it smarter every day.
- danielfalbo49 days ago
 Exactly. The singularity is already here. It's just "programmers + AI" as a whole, rather than independent self-improvement of the AI.
 I wonder how a "programmers + AI" self-improving loop is different from an "AI only" one.
 - bryanrasmussen49 days ago
 The AI-only one presumably has a much faster response time. The singularity is thus not here, because programmer time is still the bottleneck, whereas, as I understand it, in the singularity time is no longer a bottleneck.
 - Aiisnotabubble49 days ago
 AGI will be faster, as it doesn't need an initial question. AGI will also be generic. LLMs are already very impressive, though.
 - yeasku46 days ago
 You are all crazy.
phlummox48 days ago
> For years, despite functional evidence and scientific hints accumulating, certain AI researchers continued to claim LLMs were stochastic parrots: probabilistic machines that would: 1. NOT have any representation about the meaning of the prompt. 2. NOT have any representation about what they were going to say.
But did any AI researchers actually claim there was no representation of meaning? I thought generally, the criticism of LLMs was that while they do abstract from their corpus (i.e., you can regard them as having a representation of "meaning"), it's tightly and inextricably tied to the surface-level representation, it isn't grounded in models of the external world, and LLMs have poor ability to transfer that knowledge to other surface encodings.
I don't know who the "certain AI researchers" are supposed to be. But the "stochastic parrot" paper by Bender et al [1] says:
> Text generated by an LM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind.
That's a very different objection to the one antirez describes; I think he's erecting a straw man. But I'd be happy to be corrected by anyone more familiar with the research.
[1] <a href="https://dl.acm.org/doi/10.1145/3442188.3445922" rel="nofollow">https://dl.acm.org/doi/10.1145/3442188.3445922</a>
- antirez48 days ago
 > Text generated by an LM is not grounded in communicative intent
 This means exactly that no representation should exist in the activation states about what the model wants to say, and there must be only single-token probabilistic inference at play.
 Also, their model requires the contrary, too: that the model does not know, semantically, what the query really means.
 Stochastic Parrot has a scientific meaning, and just by observing the function of the models it is quite evident that they were very wrong; but now we also have strong evidence (via probing) that the sentence you quoted is not correct either, since the model knows the idea to express in general terms too, and features about things it is going to say much later activate many tokens earlier, including conceptual features that are relevant later in the sentence / concept expressed.
 You are making the big error, common in this context, of extending the stochastic parrot into a non-scientifically-isolated model that can be made large enough to accommodate any evidence arriving from new generations of models. The stochastic parrot does not understand the query, nor is it trying to reply to you in any way; it just exploits a probabilistic link between the context window and the next word. This link can be more complex than a Markov chain but must be of the same kind: lacking any understanding whatsoever and communicative intent (no representation of the concepts / sentences that are required to reply correctly). How is it possible to believe in this today? And check for yourself what the top AI scientists today believe about the correctness of the stochastic parrot hypothesis.
 - phlummox48 days ago
 > > Text generated by an LM is not grounded in communicative intent
 > This means exactly that no representation should exist in the activation states about what the model wants to tell, and there must be only a single token probabilistic inference at play.
 That's not correct. It's clear from the surrounding paragraphs what Bender et al mean by this phrase. They mean that LLMs lack the capacity to form intentions.
 > You are doing the big error that is common to do in this context of extending the stochastic parrot to a non scientifically isolated model that can be made large enough to accomodate any evidence arriving from new generations of models.
 No, I'm not. I haven't, in fact, made any claims about the "stochastic parrot". Rather, I've asked whether your characterisation of AI researchers' views is accurate, and suggested some reasons why it may not be.
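For readers unfamiliar with the simplest, Markov-chain end of the spectrum discussed in this subthread, here is a toy bigram sampler: each next word is drawn using only counts of which words followed the current word in the training text, with no representation of meaning at all. It is purely illustrative: real LLMs condition on the whole context and are vastly more complex, and whether they are "of the same kind" is exactly what is being disputed above. The function names and the tiny corpus are made up for this example.

```python
import random
from collections import defaultdict


def train_bigram(text: str) -> dict:
    """Count, for every word, which words followed it in the training text."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def generate(counts: dict, start: str, length: int, seed: int = 0) -> list:
    """Sample up to `length` words; each choice depends only on the previous word."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        followers = counts.get(out[-1])
        if not followers:
            break  # this word was never followed by anything in training
        words = list(followers)
        weights = [followers[w] for w in words]
        out.append(rng.choices(words, weights=weights, k=1)[0])
    return out


model = train_bigram("the cat sat on the mat")
print(generate(model, "cat", 4))  # ['cat', 'sat', 'on', 'the']
```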
register49 days ago
Where can I learn more about how chain of thought really affects LLM performance? I read the seminal paper, but all it says is that it's basically another prompt engineering technique that improves accuracy.
- HarHarVeryFunny49 days ago
 Chain of thought, now including "reasoning", is basically a workaround for the simplistic nature of the Transformer neural network architecture that all LLMs are based on.
 The two main limitations of the Transformer that it helps with are:
 1) A Transformer is just a fixed-size stack of layers, with a one-way flow of data through the layers from input to output. The fixed number of layers equates to how many "thought" steps the LLM can put into generating each word of output, but good responses to harder questions may require many more steps and iterative thinking... The idea of "think step by step", aka chain of thought, is to have the model break its response down into a sequence of steps, each building on what came before, so that the scope of each step is within the capability of the fixed number of layers of the transformer.
 2) A Transformer has extremely limited internal memory from one generated word to the next, so telling the model to go one step at a time, feeding its own output back in as input, in effect makes the model's output a kind of memory that makes up for this.
 So, chain of thought prompting ultimately gives the model more thinking steps (more words generated), together with memory of what it is thinking, in order to be able to generate a better response.
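 A deliberately oversimplified toy of the two points above, under made-up assumptions: pretend one forward pass (one generated token) can afford exactly one addition, and that the only memory between tokens is the previous output. Summing a list then fails when the answer must come out in a single token, but succeeds when intermediate sums are written out and fed back in. This is an analogy only, not a model of how a real Transformer computes; all names are hypothetical.

```python
from typing import List


def forward(previous_token: int, next_input: int) -> int:
    """One 'forward pass' of the toy model: a fixed budget of exactly one addition."""
    return previous_token + next_input


def answer_directly(numbers: List[int]) -> int:
    """No scratchpad: the answer must come out of a single forward pass."""
    # Only the first two numbers fit in the budget; the rest are never processed.
    return forward(numbers[0], numbers[1])


def answer_with_cot(numbers: List[int]) -> List[int]:
    """Chain of thought: emit intermediate results and feed each one back in as input."""
    transcript = [numbers[0]]  # the first 'thought' just restates the first number
    for x in numbers[1:]:
        transcript.append(forward(transcript[-1], x))  # previous output acts as memory
    return transcript  # the last token is the final answer


print(answer_directly([3, 5, 7, 9]))  # 8 (wrong: the true sum is 24)
print(answer_with_cot([3, 5, 7, 9]))  # [3, 8, 15, 24]
```

 The extra tokens in the transcript are exactly the "more thinking steps" and the "memory" described above: each step stays within the fixed budget, and the written-out partial sums carry state forward.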
Fraterkes49 days ago
It’s interesting that half the comments here are talking about the extinction line when, now that we’re nearly entering 2026, I feel the 2027 predictions have been shown to be pretty wrong so far.
- squidbeak49 days ago
 > I feel the 2027 predictions have been shown to be pretty wrong so far
 Does your clairvoyance go any further than 2027?
 - AnimalMuppet49 days ago
 I don't know that it's "clairvoyance". We're two weeks from 2026. We might be able to see somewhat more than we do now if this was going to turn into AGI by 2027.
 If you assume that we're only one breakthrough away (or zero breakthroughs: we just need to train harder), then the step could happen any time. If we're more than one away, though, then where are they? Are they all going to happen in the next two years?
 But everybody's guessing. We don't know right now whether AGI is possible at current hardware levels. If it is N breakthroughs away, we all have our own guesses of approximately what N is.
 My guess is that we are more than one breakthrough away. Therefore, one can look at the current state of affairs and say that we are unlikely to get to AGI by 2027.
 - jennyholzer249 days ago
 > Does your clairvoyance go any further than 2027?
 why are you so sensitive?
alexgotoi49 days ago
> * The fundamental challenge in AI for the next 20 years is avoiding extinction.
This reminded me of the movie Don’t Look Up, where they basically gambled with humanity's extinction.
gaigalas49 days ago
This post is bait for enthusiasts. I like it.
> Chain of thought is now a fundamental way to improve LLM output.
That kinda proves _that LLMs back then were pretty much stochastic parrots indeed_, and the skeptics were right at the time. Today, enthusiasts agree with what the skeptics previously said: without CoT, the AI feels underwhelming, repetitive and dumb, and it's obvious that something more was needed.
Just search past discussions about it: people were saying the problem would be solved with "larger models" (just repeating marketing stuff) and were oblivious to the possibility of other kinds of innovations.
> The fundamental challenge in AI for the next 20 years is avoiding extinction.
That is a low-level sick burn on whoever believes AI will be economically viable short-term. And I have to agree.
lolz40449 days ago
This article does little to support its claims, but it was a good primer to dive into some topics.
They are cool new tools; use them where you can, but there is a ton of research still left to do. Just lols at the hubris that Silicon Valley will make something so smart it extincts humankind. It'll happen from the lack of water and a heated planet first :)
The stochastic parrot argument is still debated, but it is more nuanced than before, although the original author still stands by the statement. Evidence of internal planning varies by model: Anthropic's attribution graphs research (with some rhyming examples) did support it, but Gemma didn't.
The idea of "understanding" is still up for debate as well. Sure, when models are directly trained on data there is representation. The Othello-GPT studies were one way to support that, but that was during training, so some internal representation was created. Out-of-distribution tasks will collapse into confabulation; Apple's GSM-Symbolic research seems to support that.
Chain of thought is a helpful tool but is untrustworthy at best. Anthropic themselves have shown this: <a href="https://www.anthropic.com/research/reasoning-models-dont-say-think" rel="nofollow">https://www.anthropic.com/research/reasoning-models-dont-say...</a>
bgwalter49 days ago
Regarding the stochastic parrots:
It is easy to see that LLMs exclusively parrot by asking them about current political topics [1], because they cannot plagiarize settled history from Wikipedia and Britannica.
But of course there also is the equivalence between LLMs and Markov chains. As far as I can see, it does not rely on absurd equivalences like encoding all possible output states in an infinite Markov chain: <a href="https://arxiv.org/abs/2410.02724" rel="nofollow">https://arxiv.org/abs/2410.02724</a>
Then there is stochastic parrot research: <a href="https://arxiv.org/abs/2502.08946" rel="nofollow">https://arxiv.org/abs/2502.08946</a>
"The stochastic parrot phenomenon is present in LLMs, as they fail on our grid task but can describe and recognize the same concepts well in natural language."
As said above, this is obvious to anyone who has interacted with LLMs. Most researchers know what is expected of them if they want to get funding and will not research the obvious too deeply.
[1] They have Internet access of course.
rldjbpin48 days ago
the reflections felt like a mixed bag: someone who seems to know the technical aspects more deeply than an average person, while simultaneously being like an astrologer.
personally, as someone building on top of gen AI for a living, i finally bit the bullet on building using LLMs. it did reduce friction in things i don't like doing and did not explore as much. by acting as a catalyst when i needed to finally address them, it helped me get going and eventually become proficient in the core tech itself.
outside of work, however, i find people around me use the services much more than i do. sometimes it felt like "big data is like teenage sex"[1], but some aspects were quite genuine. i got a better appreciation after trying them, which helped me understand other people's perspective and design better.
with "slop" as word of the year and people wondering if a random clip is AI, the effects on general life seem more apparent than ever. it is not as sexy as "i will lose my job soon", but the effects are here and now. while the next year will be even more interesting, i can't wait for the bubble to burst.
[1] <a href="https://hewlett.org/is-big-data-like-teenage-sex/" rel="nofollow">https://hewlett.org/is-big-data-like-teenage-sex/</a>
bgwalter49 days ago
They are very advanced stochastic parrots that allow AI-invested authors to suddenly write in perfect English.
If Antirez has never gotten an LLM to make an absolutely embarrassing mistake, he must be very lucky, or we should stop listening to him.
Programmers' resistance has not weakened. Since ORCL's 40% drop, anti-LLM opinions are censored and downvoted here. Many people have given up, and we always get articles from the same LLM influencers.
AdamWills46 days ago
[dead]
HellDunkel49 days ago
[flagged]
- danielbln49 days ago
 Must feel nice to let yourself be coddled by in-group/out-group thinking like that. "I've decided that AI is bad and useless, therefore anyone disagreeing must be an AI bro".
ctoth49 days ago
> The fundamental challenge in AI for the next 20 years is avoiding extinction.
So nice to see people who think about this seriously converge on this. Yes. Creating something smarter than you was always going to be a sketchy prospect.
All of the folks insisting it just couldn't happen or ... well, there have just been so many objections. The goalposts have walked from one side of the field to the other, and then left the stadium, went on a trip to Europe, got lost in a beautiful little village in Norway, and decided to move there.
All this time, though, there has been the prospect of instantiating something smarter than you (and yes, it will be smarter than you even if it's at human level, because of electronic speeds...). This whole idea is just cursed and we should not do the thing.
- Mawr49 days ago
 > Creating something smarter than you was always going to be a sketchy prospect.
 Sure, but I'm not so sure this has any relevance to the topic at hand. You seem to be taking for granted that LLMs can ever reach that level.
 It may be possible that all it takes is scaling up, and at some point some threshold gets reached past which intelligence emerges. Maybe.
 Personally, I'm more on board with the idea that since LLMs display approximately 0 intelligence right now, no amount of scaling will help and we need a fundamentally different approach if we want to create AGI.
- cheschire49 days ago
 "Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should."
feverzsj49 days ago
Seems they also want some AI money [0]. Guess I'll keep using Valkey.
[0] <a href="https://redis.io/redis-for-ai/" rel="nofollow">https://redis.io/redis-for-ai/</a>
- danielfalbo49 days ago
 > they
 I'm not sure antirez is involved in any business decision-making process at Redis Ltd. He may not be part of "they".
 - antirez49 days ago
 I'm not involved in business decisions, and while I'm very AI-positive, I believe Redis as a company should focus on Redis fundamentals, so my piece has zero alignment with what I hope for the company.
- sibellavia49 days ago
 In any case, what would be the problem? The page you mentioned simply illustrates how the product can be used in a specific domain; it doesn't seem forced to me.
- bgwalter49 days ago
 Conflict of interest and disclosure posts are frequently downvoted.
 - tptacek49 days ago
 You mean flagged.
 Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data.
 <a href="https://news.ycombinator.com/newsguidelines.html">https://news.ycombinator.com/newsguidelines.html</a>
 - bgwalter49 days ago
 Ah, so you just went through my history and downvoted everything in sight! Thanks for confirming.
 - tptacek49 days ago
 I don't follow? I didn't flag you; you were remarking on a previous comment alleging shillage from 'antirez, and I'm pointing out that the behavior you say is "downvoted" is actually a black-letter guideline violation. People flag those posts.
 Another one, though:
 Please don't comment about the voting on comments. It never does any good, and it makes boring reading.
 - bgwalter49 days ago
 I can't help you if you repeatedly misinterpret me. Once you made the first response in this subthread, 4 or 5 of my comments went from 1 to 0 or -1. Cum hoc ergo propter hoc? Maybe.
 I'll design a system for the Senate that enables outside voters to first turn down a speaker's microphone volume if he says that another senator works for company X, and then remove him from the floor. That'll be a great success for democracy and "intellectual curiosity", which is also in the guidelines.