As much as I like Claude Code, Boris has done a lot of harm by encouraging software engineering practices that lead to slopware. We have two camps of people at work. The first camp is the "agent goes brrr" crowd. They don't understand the code they write. They have loops running, agent orchestrators, or whatever the agent hype du jour is. The second camp is people who are inundated with PRs, are holding the line on quality, and are just exhausted. We've also had some management pressure from people who think engineers are wasting time looking at code, perhaps because somebody on a podcast they listen to says coding is largely solved.<p>> I don’t prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops.<p>This is going to be a net negative on software quality for people who take this up, in my opinion.<p>I call out Boris but I also don't think he's being malicious. He's at the center of an important technological revolution and it would be hard not to get excited. I just wish he advocated for a more balanced and realistic perspective.
This sums up the dynamic: <a href="https://x.com/danhockenmaier/status/2021617680525172840" rel="nofollow">https://x.com/danhockenmaier/status/2021617680525172840</a>
Right, which basically means that judgment compounds. AI just makes the slope steeper in both directions.
I wonder how much of the turbo-brain developers' time is spent reviewing the code of slop cannons, and how frustrated they must feel.<p>Previously, everyone was limited in their blast radius, but now AI has levelled the playing field to the point where even a single slop cannon can be the bottleneck for PR reviews or even for the company itself. If not right now, then later, as the codebase turns sloppier and the ramifications catch up.<p>Everyone might look at the diagram and suggest simply being a turbo brain, but I believe we also need to stop rewarding the metrics that slop cannons excel at (surprisingly LOC, which is a pretty bad indicator), be more realistic about recognizing such bottlenecks, and follow good security practices throughout.
Reading through the thread, it's striking how many people are feeling the same mix of excitement and exhaustion you describe. I'm in that camp too... the tools are incredible, but the pace and expectations around them can feel overwhelming.<p>I fully agree with what you say regarding Boris, but I would emphasize that I don't think he has malicious intent either. He is still doing his job, which is to showcase the features their product offers.
I am coming to the realization that this technology is something like the moving assembly line (though probably not as impactful as that), where it will allow the bosses to lower quality a bit while greatly speeding up the overall process. Instead of software objects being a low-volume, bespoke product (like automobiles before Ford), they will be a little worse in the specifics but there will be rivers of them available basically on-demand to whoever wants them (like automobiles after Ford). Just as Fordism drastically reshaped the nature of production and therefore the social contract, "Claudeism" will drastically reshape the political economy of at least the software industry.<p>I don't have a good sense of whether any of this holds outside software companies (it certainly seems less likely there), and I also don't know what the actual contours of Claudeism will look like. I'll leave that to actual social scientists, or perhaps future historians, but we should remember how Ford's seemingly droll improvement to assembling cars reverberated through history. I feel this more than most because I live in rust belt Michigan, but even elsewhere you are likely still feeling the effects: If nothing else, Detroit built the weapons to win two world wars, but also had major impacts on the design of our cities, on the labor movement, the social status of women and minorities, the nature of the healthcare system, the course of the Cold War and the history of finance -- the list is nearly endless.<p>AI hasn't won any wars yet, but even "drastically reshapes the nature of production in the software industry" is a big deal, at least for people working in software.
> We have two camps of people at work<p>I don't like the framing of this, but I suppose it's inevitable given how every other political issue has been framed to date.<p>There are many camps in between the two being described. There are those who use maybe $20 worth of unsubsidized tokens per month to accelerate their work. Full-time human in the loop, every LOC still reviewed by hand. Also, the developer who still does copy-paste in and out of ChatGPT. Those tribes exist too and are way more typical than these extremes.
Yes, I am exhausted. Most of my company is obsessed with agents, because everyone wants to be seen as AI first. There is little thought going into usage. No care for long term maintainability and quality. Our product is actively worse by many metrics, but no one cares because marketing can say “agents”.<p>The sad part is that this technology is incredible. It’s us choosing to turn it into a slop cannon (and the labs sure seem to encourage this).<p>I want to leave the industry as soon as I can.
Same here. I think the tool is fantastic; it has helped me in lots of ways professionally and for hobby projects involving skills I'm not experienced with, but I'm absolutely exhausted.<p>The worst part is that I'm not exhausted from my work. I've been through a heavy burnout before and had to adapt my ways of working to not fall into that kind of exhaustion again. This is different: it's not pressure to deliver anything specific, there are no stupid deadlines, and it's so much more vague than any other kind of work pressure I've been through in the past 20+ years that it is much harder to find ways to keep my head straight.<p>It's exhausting to be reviewing boatloads of PRs of very varied quality. I have resigned myself to blocking merges at the first sign of a bad/weird design instead of trying to understand why it might have needed a compromise in the design. Many times people couldn't even explain to me why something was written the way it was... After many rounds of "it was Claude that decided this way" it became so tiring that I don't even ask anymore; I just leave a comment on the first blocker I find and reject the PR. If people are not putting any effort into reviewing it before throwing it at me, I will just return the favour.
This too shall pass.<p>The journey was as important as the destination, if not more, because it gave us confidence in ourselves and made us grow. I mean to suggest coding for the sake of coding.<p>"I wish to code for myself, nothing else" (I had written this somewhere else on HN): <a href="https://news.ycombinator.com/item?id=48609962">https://news.ycombinator.com/item?id=48609962</a><p>If I may ask you and others, I assume that you are at a relatively decent position within your company and have some say in it. Why aren't employees more honest about AI and concerns regarding it? I'd imagine that people chalk it up to a push from the investor side to use AI, with management forcing it, but couldn't more effort be directed towards pointing out that agents are still finicky sometimes, and that the sustainability of projects moving forward is going to be a major issue for investors, given that they want sustainable growth?<p>It is my opinion that engineers hold more hidden power than even we realize, and yet some of us, like you, want to leave the industry. What a sad tale of tragic events, when the tech behind AI is good and could be helpful, but we are butchering it so badly in so many regards, perhaps even under the influence of AI companies and their messaging around it.
> If I may ask you and others, I assume that you are at a relatively decent position within your company and have some say in it. Why aren't employees more honest about AI and concerns regarding it?<p>We've tried it where I work. People put in the effort to compile studies, collect testimonials, and catalog and organise the strengths and weaknesses based on engineers' experiences with the tooling.<p>The result was the person driving this effort getting asked by HR, their manager, and their skip manager to stop with the anti-AI rhetoric, because AI is the future of the industry and if it's not rolled out in this way we will be left behind.<p>> I'd imagine that people must chalk it up as saying that there is push from investor side into using AI and management is forcing it but couldn't more effort be also redirected towards the fact that agents are still finicky sometimes and that the sustainability of projects moving forward is going to be a major issue to investors given that they want sustainable growth.<p>The C-level doesn't seem to care. They believe they are facing an existential risk, where any price paid to roll out these tools as widely and broadly as possible across the whole org is small compared to being left behind.<p>It's all fueled by executives' and investors' anxieties, there's very little left for reason when fear takes over.
Was the effort driven by just a single developer, though? I have written elsewhere on HN[0] about something I have witnessed: if only a single developer is actively against AI, management tends to believe that this single person is the bottleneck/issue, since nobody else is complaining and so everything must be fine.<p>Although I get the overall statement better now as well.<p>> It's all fueled by executives' and investors' anxieties, there's very little left for reason when fear takes over.<p>This seems to be the biggest reason, and I imagine that because all the growth in the economy appears to be concentrated in the AI-related sector, they want a piece of that pie. Do note that most of the issues here are the frothy valuations, the question of how much actual growth is happening downstream rather than just money changing hands, and the bubble nature of things.<p>But in theory one can see that there is no winner in all of this, no matter how deep down the layers one goes. Perhaps the layer of DRAM and chip production is the one that is actually the most profitable at the moment, and even they are just witnessing some temporary growth, as nobody (including said companies) expects prices to stay at this ceiling, and yet they have incredibly floaty (but comparatively less so) valuations compared to other AI things.<p>On one hand, I expect money to be rational, yet on the other hand I am actively witnessing money being irrational. Yet the whole industry is so muddled now, with so many people wanting a piece of that investor's dollar, that I imagine the market is somewhat saturated.<p>Could nothing rational happen? Because if so, it seems that the bursting of the bubble might be the only thing that adds realism to the market, but I imagine that it might spook investors too much and cost too much bloodshed in the markets.<p>Am I explaining myself well, and do any of you believe that there is a rational course of action which can be taken to help stabilize the economy in some respects?<p>[0]: <a href="https://news.ycombinator.com/item?id=48630281">https://news.ycombinator.com/item?id=48630281</a>
Inertia is on our side
I have given up on spending more time reviewing PRs and helping fix bad decisions made by AI, because all it does is encourage the same thing without any learning. The next PR will most likely contain similar design / code problems.<p>I can't speak for all orgs, but at the end of the day, the only metric that my org cares about is how AI is improving our work. Holding the line on quality is definitely a good thing, but when your org doesn't care about the pressure on PR reviewers, or the effort it took to fix those PR mistakes and their repeats, those reviews only help make the case for loops and for the idea that AI is without flaws. I am not saying AI is a bad thing altogether, but when the org ignores those metrics, I am only contributing to proving that the other side is right.
> I call out Boris but I also don't think he's being malicious.<p>From a market perspective, he's acting completely rationally in his own interests. Bottom line is that these companies need to do whatever they can to keep growing token consumption because that's their goal.<p>If the nation's drinking skyrocketed, we wouldn't be sitting here wondering why the CEO of Budweiser isn't advocating for temperance. His job is to move kegs, just like Boris' job is to move tokens.
I never understood this perspective. Just because a person's behavior is market-rational, it does not mean they can't be criticized for externalities.<p>That is, in fact, an important thing to do. It turns those externalities into public perception, which turns into market forces that adjust the behavior, if you want to think purely in market terms.<p>The analogy with Budweiser is not a good one. This would be the CEO of Budweiser actively pushing more drinking while the nation's drinking was increasing. And yes, people would be right, and effective, to oppose this (see Oxycontin).
> can't be criticized for externalities<p>As soon as you open up the externalities discussion, the wider question of increasing electricity prices and turbocharging global warming comes up, not to mention RAM prices. AI is a machine for turning negative externalities into stock prices.
Of course they can be criticized, we can criticize the ExxonMobil CEO the same way and get nowhere<p>The point is, blaming them is pointless, if it wasn't them it would be someone else. How do we react?
"If it's not them it would be someone else" probably should just be shorthanded to the banality of evil. But to answer your question, idk, sensible regulations? (Knowing full well this just kicks the buck of responsibility down the chain)
His job is to do what’s in the <i>long term</i> interests of his company.<p>[Edit: was thinking of the ‘CEO’. This doesn’t apply as cleanly to Boris.]
FWIW, the CEO doesn't seem to be the CEO? If I understand what I read properly, he has one direct report — his "chief of staff". So arguably the true chief executive in a functional sense (the person to whom the organisation ultimately reports through hierarchy) is his sister, who appears to be grounded in reality.<p>I suspect that the reason Anthropic is generating such developer-negative, toxic, insensitive influencer sentiments like these from Boris has to do with Dario being a fantasy-fiction-reading quasi-mascot who thinks it's his job to tell scary stories, who is being allowed to do just that by a sibling who perhaps prefers he's not involved in the day-to-day.
Respectfully disagree. From my view, the tech industry hasn't behaved in a way that regards long-term interests over short-term interests in a very, very long time. Much of the innovation is simply finding new and creative ways to shrink this loop even further, and vibe-coding/slop is just the latest manifestation of that.
> we wouldn't be sitting here wondering why the CEO of Budweiser isn't advocating for temperance<p>But we would start to wonder if the CEO of Budweiser started advocating binge drinking.
> This is going to be a net negative on software quality for people who take this up, in my opinion.<p>The silver lining appears to be that long term most people won't be able to afford producing slop at current rates.
Like Mario tried to do... But no one listened
I think at the end of this we'll have a new software engineering paradigm.<p>Almost nobody worries about binaries or instructions now because those are for the compilers; even undefined behavior is mostly ignored.<p>You can either tailor the development pattern for the LLM, or have the LLM write for the same old development pattern. I think there is going to be a difference.
Quoting the creator of CC holds little value in my opinion. I too call my product good.<p>> opting out of this fully machine-driven future may not be an option.<p>I am contemplating whether I want to stay inside this rat race.<p>I completely agree with the conclusion of this blog post, by the way. I feel uneasy, and I do not enjoy the work I deliver using LLMs. I think OP did a really good job on capturing at least my current state.
I and my friends go back and forth, every day, on whether coding with LLMs is a net plus or a net negative.<p>I'm at the point where I think it's dumb to not do it but also dumb to do it. I have no real answer.<p>I have settled on using LLMs for everything but to spend more time honing the quality and cleanliness with LLM passes afterwards than I generally would have taken to write it well myself in the first place. This is in some ways the worst of both worlds, but it somehow lets me bypass akrasia while still getting pretty good code out, so I consider it superior to how I worked before. I get more done in three months even if I get less done in a day.
I am with you here but don't get overly pessimistic: devising hooks and stopgaps and flows and constantly tuning what to watch out for does not only improve the quality of the LLM-output code. It hones and refines your own abilities.<p>CC has made some pretty dumb stuff in my projects but I don't resent those occurrences. They taught me (more accurately: reminded me, because I already knew but was not applying that knowledge too often) very valuable lessons on code quality -- that's still a dark area to this day and every ray of light on it is valuable for future programming.<p>For me, programming with LLMs made me a better programmer. But yes, I don't just rubber-stamp PRs.<p>It also finally allowed me to be less of a code monkey and more of an architect and a backend lead than before. Which I was really missing.
> I am contemplating whether I want to stay inside this rat race.<p>I'm in the same boat. I'm hoping to go back to school in 2027 and be out of work that revolves around programming in 5 years.<p>I'm not enthusiastic about the field anymore, which sucks, because I used to love working in programming.
> I am contemplating whether I want to stay inside this rat race.<p>Same. I'm currently trying to find _my next thing_ and all anyone wants to talk about is how I'm using AI and it's absolutely maddening. It's become a lazy, lossy proxy for productivity. I've had a few intros for the types of orchestration engineering roles which are described in this post and they're just completely unappealing -- especially the prescriptive aspects. Like, the sort of JDs I'm seeing are variants of, "we want a back-end developer who has experience with XYZ but they must use agentic harnesses to do their work." Why does any <i>serious</i> person give a flying fuck how the end result is reached? The flip side of all this is that rates are also being driven through the floor by loop cowboys who are generating steaming piles of shit which are _good enough_ ... until they aren't. I'm being completely serious when I say that stocking shelves at Tractor Supply is becoming more appealing by the day and I also just thought to myself, "Maybe I should just join the Army while they'll still take me?"
> I feel uneasy, and I do not enjoy the work I deliver using LLMs.<p>I have basically stopped writing code in my spare time since the advent of AI. Before, I felt like I was working on a classic car. Was it a practical use of my time? No. I could go out and download software that did what I wanted. Did I have fun doing it? Yes, the act of working on it was important, I felt I was still learning and improving as I did.<p>Nowadays I see people doing far more in a month than I could in a year and I feel like it's all a waste, like I just spent the past few years transcribing a phonebook while standing next to a photocopier.<p>I don't know if that'll ever change. I can't even pretend I was doing something prestigious and artisanal like watchmaking because I wasn't a good programmer beforehand.
This piece changed how I work with LLMs and made me much more optimistic about how "fun" it can be to work with them: <a href="https://nolanlawson.com/2026/05/25/using-ai-to-write-better-code-more-slowly/" rel="nofollow">https://nolanlawson.com/2026/05/25/using-ai-to-write-better-...</a><p>Before I would just throw prompts at the LLM and it'd end up building a pile of crap (but semi-working crap, and 100x faster than I ever could) - it was pretty depressing. Using tools like `grill-me` (or `grill-with-docs`) I feel like I'm actually building my understanding of the system and helping shape it, and the results are much better.
The fun part about that `grill-me` command is that when the questions are over, I've found that I can go right into implementation without needing to dump a PRD or some sort of broken up plan. Now this is obviously completely predicated on what you are asking it to grill you on. But for tasks that are semi complicated, it's fantastic.
I used to think I'd be into coding for the long haul, contributing to open source, and working on multi-year side projects.<p>Nearly all of that passion vanished this year, and I've been struggling to replace it. I know I'm much better than the machine now, but the lines are starting to blur, and some of the small puzzles of day-to-day have been completely automated away.<p>We've birthed a lot of puzzle solvers that enjoyed programming, and I'm sure many of them will move on to something else that scratches the same itch. I'm keen on learning what that will turn out to be.
Be your own customer and write the software just for you. AI is so effective that you could build something meaningful for yourself just in your spare time.<p>Here is a similar perspective: <a href="https://isene.org/2026/05/Audience-of-One-Numbers.html" rel="nofollow">https://isene.org/2026/05/Audience-of-One-Numbers.html</a><p>I may have misunderstood you if you intend to write code by hand. I still do: I use AI to learn by example, but I write the real code myself, and AI can help me improve the code. I have learned a lot.
I'm the opposite, couldn't be bothered to work on code outside of work. Barely did at work because I was more focused on wrangling a small army of shitty contractors (thanks strategic partner initiative for firing all of our small shop contractors and replacing them with morons from "offshore").<p>Now with LLMs I find myself doing small projects that interest me or have some utility for me outside of work, and doing a lot more development in the codebases at work outside of just review/docs/arch than I was before. Also making small tools that I find pleasant/useful but were not important enough to spend time on before.
Agreed - there was always a set of things I wanted to do that I knew the magic core for, but wanted a team of implementers for the cruft, the 100k of actual testing harnesses, hyperparameter exploration, etc. I now have that team of implementers. All the problems seem research-y though - optimal binary transport systems that are zero-copy and compatible with languages, fast physical simulation optimizers, etc. So, things that all had a _LOT_ of busywork around the magic core.
> You Cannot Quite Opt Out<p>I am so over this. I cannot take anyone seriously who claims inevitability for their ideas, and that you must adopt them or "be left behind". If these tools are so good and so capable, the results should be able to speak for themselves without this FOMO-inducing, emotional language.
I couldn't agree more. Thus far I'm still objectively more productive than all of the AI enthusiasts I've worked with. I think a lot of the activity with these tools is coming from people who just enjoy using them more than they enjoyed coding. They feel more productive not because they are producing more but because they are producing somewhat less with much less effort. It takes them roughly the same amount of time even if it changes the distribution of time spent on each task.<p>> and in recent weeks it has started to dominate the Twitter discourse.<p>As a general rule, I don't waste my time with the advice of people who still think Twitter is a source of wisdom.
the point of that section is that attackers and security researchers will use / are using loops, and you as the maintainer are not able to opt out of others doing this. an unwilling participant.
In my experience, some language like this is the result of witnessing it speak for itself.
You're being uncharitable. I don't read it as intentionally FOMO inducing. I read it as the exhausted sigh of resignation from someone who sees where the wind is blowing whether they like it or not. I see it as someone watching tech management and execs listening rapt as Boris pours the poison of AI maximalism in their ears. I read it as someone who sees developers around them either drinking from that same poisoned well or bowing under the pressure from those leaders to adopt AI or lose their livelihoods.<p>It is true that the author is incorrect: you can certainly opt out, but you won't be opting out of AI, you'll be opting out of the industry.
"the industry" is not some monolith, and treating it as such is not productive. There are all types of software and many ways in which it is created. If the companies that are "AI enabled" are <i>so</i> much better we should see some big changes soon. But I'm still waiting for products I use from "AI enabled" companies to start churning out features at unprecedented speed.
Loops work when you spend the proper amount of time to understand what you want ahead of time. The prerequisite is clarity: enough clarity that you could write a careful specification that you could hand off to a junior colleague.<p>Often, it takes 5-6 broken crappy versions of a thing until you understand that. There is no accelerating the 5-6 broken crappy versions - there’s no agent tech that’s going to help your meat brain avoid thinking time.<p>So most of my time is iterating between these two phases: I don’t understand what I want, I need to read and write and play with code, okay it’s been long enough I think I know what I want (it is extremely easy to deceive yourself) … okay now I do actually know what I want and I can write a loop.<p>Many people think they can jump ahead with agents. You cannot fake understanding or clarity. It is painfully obvious when someone skipped that meat brain understanding phase.
I had codex write a tool to extract all my pi sessions. (Had to filter out my prompts from the agents talking to subagents.)<p>Then I had it analyze the patterns I was making and turned that into the flowchart for the outer guidance-creating-prompt.<p>I didn't have to spend too much time thinking about what I wanted. I wanted it to do that.<p>The result is still mixed, and I'm not trusting it with delicate code bases, but for a game I've been building I dropped my check-in time to 1/5th of what I was previously spending on it.<p>That's not a good thing per se. I'm sure I'm missing good ideas by _not_ spending time with it. But previously I really had stagnated, with my prompts becoming mechanical #now-do-this and #now-review-that, with 90% of its suggestions being correct.<p>Just need to (automatically) remind it to "do the hard stuff first, clean up & refactor as you go" as well as a "reflect on your work" after its first return to get it to spill the beans on any crap left behind, and then process that in the guidance-creating-prompt to dish out new work.
Code is part of a shared and built understanding of an information system.<p>If these loopers mean we all have to move with this continuous wave of software happening, then we get to the highest levels of logical information system design, and it's all human judgement and balancing of business requirements to fit a given niche in a company or market. So all the programmers have to become business analysts/market researchers/businessmen...except in the specific niches where AI tooling can't really clank well...or until the end of the subsidized AI token era makes all this looping too expensive to continue. This feels like expert systems and Symbolics Lisp machines redux, where we briefly ran into the fact that it's not so much the code itself not being able to do stuff, it's that your company's org always gets shipped, so if you can't change your company org, your software only has so much flexibility.<p>Dataflow diagrams and domain knowledge / domain modeling / ubiquitous languages may become the metalanguage that we use to set the quality, functional, and non-functional standards and conventions. We make the "looper clankers" ensure that they fulfill those data / behavior / performance contracts before saying what "done" is, because "done" is no longer just code that compiles, code that builds, code that deploys, or even code that sits in production; it's code that fulfills all of the user requirements, operator requirements, and maintainer requirements. So, the language used may require us all to turn into business analysts and software architects more than syntax knowers. The revenge of UML and the return of declarative / logical design / BDD triumphing?<p>(Typo scan by gemma4-12b but I didn't let it alter my message)
>Yet even with a lot of manual steering, that type of code does not come out of LLMs naturally, and even if the code comes out naturally like that, they will still attempt to handle now impossible errors.<p>This is something I’ve struggled to fight against in many PR reviews. Especially once already written, convincing someone that their excessive null checking is harmful is an uphill battle. Short of better modeling (and languages that allow for sum types to enable it), I haven’t been able to come up with a universally convincing argument against this kind of “shotgun parsing.”<p>Maybe it really just isn’t that big of a deal? But when actually reading through and refactoring a codebase I’ve always found it frustrating to manage these unnecessary checks. Sometimes they’re nearly impossible to delete safely once present without first adding some kind of logging or broad investigation.
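One way to picture the alternative to shotgun parsing, as a minimal sketch rather than anyone's actual code: validate untrusted input once at the boundary, then hand the rest of the code a value that is already known to be well-formed. The `Address` type and the `parse_address` and `format_label` functions are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Address:
    """Hypothetical domain type. By convention it is only built by parse_address."""
    street: str
    city: str


def parse_address(raw: Mapping[str, Any]) -> Address:
    """Validate untrusted input exactly once, at the system boundary."""
    street = raw.get("street")
    city = raw.get("city")
    if not isinstance(street, str) or not street.strip():
        raise ValueError("street is required")
    if not isinstance(city, str) or not city.strip():
        raise ValueError("city is required")
    return Address(street=street.strip(), city=city.strip())


def format_label(addr: Address) -> str:
    # No null checks here: values that went through parse_address
    # already have both fields populated.
    return f"{addr.street}, {addr.city}"
```

The downstream function stays free of null checks because the check lives in exactly one place, which also makes it easy to find and delete or change later.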
How impossible are we talking?<p>I tend to be a fairly defensive programmer - maybe nothing currently sends this function a negative value, but how hard is it for a future code change to alter that assumption? I always figured a clear error was best. It lets even someone unfamiliar with the code know what assumptions are being made about the valid range of inputs, so they don't have to consider impossible outliers.
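For what it's worth, the kind of check described here might look like the following minimal sketch. The function name `remaining_stock` and its parameters are hypothetical, chosen only to illustrate a single precondition with a clear error message at the public entry point.

```python
def remaining_stock(current: int, sold: int) -> int:
    """Hypothetical example: `sold` is assumed to be a non-negative count."""
    if sold < 0:
        # One explicit check that documents the valid input range
        # for anyone reading or changing the callers later.
        raise ValueError(f"sold must be >= 0, got {sold}")
    return current - sold
```

A single check like this documents the assumption. The complaint elsewhere in the thread is about the same check being repeated at every layer beneath it.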
And AI code reviews encourage delusional levels of defensive paranoia. Triple null checking deep inside a function technically guards against a real risk, but in practice it should never be hit, because you've already checked for nulls in every function that calls or could call the function in question, so it is not necessarily worth guarding against.
> the right fix is not "handle every malformed case." ... [LLMs] will still attempt to handle now impossible errors.<p>This is the number one code smell from LLMs and I don't know why they are so obsessed with it. In Python, it often comes as `hasattr` checks on types that are defined to have that attribute, in a code base that is fully type-checked.<p>Why do they do that? Is it from pre-training or reinforcement? If the latter, can the labs please fix this?
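A minimal sketch of the smell being described, assuming a hypothetical `User` dataclass and two hypothetical greeting functions. It contrasts the redundant checks with the version that simply trusts the declared type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical type: `name` is declared as a required str."""
    name: str
    email: str


def greeting_defensive(user: User) -> str:
    # The smell: the annotation already says `user` is a User with a `name`,
    # so in a fully type-checked code base these branches are dead code.
    if user is not None and hasattr(user, "name") and user.name is not None:
        return f"Hello, {user.name}"
    return "Hello"


def greeting(user: User) -> str:
    # Trust the type; a type checker rejects callers that pass None.
    return f"Hello, {user.name}"
```

Both functions return the same result for every well-typed caller. The defensive one just adds a fallback branch that nobody can safely delete later without first proving it is unreachable.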
Likely just that they err on the side of unnecessary error handling rather than missing error handling. They likely penalize runtime errors harshly in training.
I suspect it's mostly the training data. I am also on team "make illegal states unrepresentable". It may get talked about a lot on HN, but I'm still at the point that I'm surprised when I see a code base that I didn't write in the wild that does a really good job of it, either open source or at work. Most programmers still think in terms of picking up pieces and fixing errors at the point where the error message pops out rather than making it so the error can't happen and the data reflects that.<p>I say "mostly" because I think there's also a problem with AIs thinking this way in their current state. That last level of human understanding of a code base, where the human holistically understands the flow of those guarantees, is a challenge to give them right now. On the raw code level, this sort of thing often involves enough code to easily blow out their context window. Trying to summarize it in memories-style files has its own problems; just because there is text written down about the guarantees doesn't mean that the AI is going to get the right info out of it, any more than a human might from just reading the code. I won't say it's "impossible" to give an AI this understanding because I'm not sure it is, but it is a level of understanding of the code that even if you get them to have it, their practices tend to fight against it.<p>My own solution to this problem has largely been to give up on them getting this. I prompt a solution to the problem the way that most people do, then if I want to make bad illegal states unrepresentable I prompt the AI through the process of the necessary refactorings, unless it's so small that I just do it myself. Given a lot of code that uses maps/dicts and arrays and strings and ints, if you prompt it through making those more thoroughly typed, it's actually pretty good at it. I've not had a lot of luck getting good designs out of single prompts, even when I get detailed. 
Treating it as two separate tasks seems to work out well.<p>And watch the diffs on the types carefully; AI loves to sneak past a ".JustSetItAndIgnoreAllThePreAndPostConditions(string)" method. After all, I suspect there's plenty of training data of "types that are nicely structured to make error states unrepresentable and then a later maintainer came along and added a 'JustEffingDoIt' method that broke everything" in the field. One of the best defenses is to make sure that the type implementing these things is in its own file and you can easily look at all the methods it adds on those types and smack it when it does that. I've tried slathering warnings about not doing this and explaining the pre- and post-conditions being maintained in the docs but the change seems marginal.
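For anyone who hasn't seen the pattern, here is a minimal Python sketch of what "making illegal states unrepresentable" can look like when refactoring dict-shaped data into types. The order domain and every name in it (`Pending`, `Shipped`, `parse_order`, `ship`) are hypothetical, made up for illustration and not taken from the comment above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)  # frozen: no "just set it" mutation after construction
class Pending:
    order_id: int


@dataclass(frozen=True)
class Shipped:
    order_id: int
    tracking_number: str

    def __post_init__(self) -> None:
        # Invariant enforced at construction: shipped implies a tracking number.
        if not self.tracking_number:
            raise ValueError("a shipped order needs a tracking number")


# Hypothetical domain type: an order is exactly one of these two shapes.
Order = Union[Pending, Shipped]


def parse_order(raw: dict) -> Order:
    """Validate once at the edge; code downstream can trust the type."""
    status = raw.get("status")
    if status == "pending":
        return Pending(order_id=int(raw["id"]))
    if status == "shipped":
        return Shipped(order_id=int(raw["id"]),
                       tracking_number=raw.get("tracking") or "")
    raise ValueError(f"unknown status: {status!r}")


def ship(order: Pending, tracking_number: str) -> Shipped:
    """The only transition: a pending order plus a tracking number."""
    return Shipped(order.order_id, tracking_number)
```

With this shape there is no way to hold a "shipped" order without a tracking number, so downstream code needs no fallback for that case. A later "just set the field" shortcut would have to unfreeze the dataclass, which is easy to spot in a diff when the type lives in its own file.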
Sorry to say, but the solution is to stop using Python. The models are trained to code defensively, assuming historically representative Python codebases. The models trust the types a lot more in languages where the canonical historical examples trust the types, because the language is constructed around that premise.
It’s because it matches the patterns they are trained to follow. They don’t understand the code. They can’t reason about the actual logic flow. They can only work with patterns.
I'm a software developer from way back, using tools and languages that coding agents are far less familiar with.<p>So when I use an agent to write code, it's in languages <i>I'm</i> less familiar with, and often using libraries I know nothing about.<p>All to say, my part of the process often ends up being:<p>1. "Here's what I'm looking for, in detail"
2. "That's not right. Here's one way it's not right, and a specific example. Please fix that."
3. <i>Sometimes</i> I give suggestions for how what is going wrong might be happening, or conceptually how to work around the issue.
4. And iterate on 2-3 until the result is close enough.<p>That's a loop I'd love to automate.
I am 100% for fully agentic loops... for tasks other than engineering.<p>I'm not willing to outsource the <i>understanding how things work</i> part of myself. That part of myself is what got me into computing in the first place.<p>If this work becomes simply a matter of describing intent to a machine (probably through an Issue, like a user), and going to check on the result when you get the 'done' notification: I'm done.<p>It's possible to use the tools to do awesome things without letting go of full system understanding of the parts that you look after.
I keep thinking about the point at which I should stop forcing myself into the loop. As a developer I really like working on the code structure, making it clearer, thinking about good abstractions, breaking it into modules, etc. I really take pleasure in it. At the same time I understand that at some point I am becoming the limiting factor.<p>If the point of the software is to benefit people, should I still care about how the code looks?<p>Right now, I still think that the answer is yes, but in 3 years? In 10 years?
> My current status is that I have not had much success with this way of working for code I deeply care about<p>If something is judgement-heavy, "code I care deeply about", then I don't really agree with the direction of travel here. Don't try to delegate decisions you care deeply about.<p>I do like the framing of agent loop vs harness loop, but only delegate stuff that you can accurately specify in advance. In my case that usually means stuff that's repeatable ("hey, go see how I did X, do that but for Y"), and that inherently means stuff that's predictable.<p>For stuff where the lack of my judgement as input is just going to cause me to say "no", we're down to collaborating in the "agent loop", as Armin puts it. And that's totally fine. It's fast, but also safe.<p>Remember before AI coding assistants, when sometimes you'd get an engineer join your team who was SUPER productive? Your peers would be jealous: "oh yeah, but you guys only got all that done because you have X on your team!" They didn't live the curse of having that kind of person around: if you don't have them PERFECTLY aligned, they run off at breakneck speed in the wrong direction.
Part of the problem is that models don't have a strong sense of taste, part of the problem is that the context in which projects exist is incompletely represented in the LLM context, and part of the problem is that LLMs tend to be myopic.<p>The lack of taste can be mitigated to some degree by improved training. Though taste is not a stationary distribution in humans (see trends/fads/etc), we can at least better track the cutting edge. I think this area still has low-hanging fruit, but frontier labs are more concerned with being able to solve problems than with the style of the solution right now (for evidence of this, just look at the Opus 4.5 -> 4.8 arc).<p>The problem of incomplete context is partly a human problem and partly a harness/interconnectivity problem.<p>LLM myopia is a harder problem to solve, just by virtue of training models on question/answer pairs. Countering this requires emphasizing RL on solution paths rather than just prompt/response, which is doable but harder.
This.<p>> Present-day models tend to produce code that is too defensive, too complex, too local in its reasoning. They avoid strong invariants. They add fallbacks instead of making bad states impossible. They duplicate code, invent bad abstractions, and paper over unclear design with more machinery. Worse though: I so far see very little progress of this improving.<p>Context-smithing can help to a degree and cyclomatic-like complexity rules tend to make matters worse. So, you either roll up your sleeves or close your eyes and hope for the best. I've had limited success with the latter.
We used a “loop” before it was called that to drive MS-DOC support into Tritium. Based on that experience, I take issue with this:<p>“There are already <i>impressive</i> examples of large automatic porting efforts, including the reported work around moving parts of Bun from Zig to Rust.” (Emphasis added.)<p>It will be impressive if/when the Bun team is able to pick up and continue extending and supporting Bun. For us, MS-DOC remains read-only and probably perpetually buggy until we reimplement with a better understanding. Until then, it’s definitely not “impressive”. Functional? Maybe. Impressive, no.
In my own ham-fisted experiments with coding loops, one pathology I have noticed is that the LOC just spirals out of control. That's likely because of the layers of defensive fixes, etc., that get built. That inevitably causes context bloat (or at least navigational friction) and results in quality decline.<p>I wonder how many loop-related issues could be addressed by simply fixing a LOC budget, or assigning a cost in some way. Unclear how you would dial in the right numbers, though.
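A minimal sketch of the "LOC budget" idea, assuming the harness can see the files before and after an agent iteration. `count_loc`, `loc_growth`, and `within_budget` are hypothetical helpers, counting non-blank lines is just one possible cost measure, and the budget of 3 in the usage lines is arbitrary.

```python
def count_loc(source: str) -> int:
    """Count non-blank lines; a crude stand-in for a real cost metric."""
    return sum(1 for line in source.splitlines() if line.strip())


def loc_growth(before: dict[str, str], after: dict[str, str]) -> int:
    """Net change in non-blank lines across all files (path -> contents)."""
    total_before = sum(count_loc(text) for text in before.values())
    total_after = sum(count_loc(text) for text in after.values())
    return total_after - total_before


def within_budget(before: dict[str, str], after: dict[str, str], budget: int) -> bool:
    """Accept an iteration only if it grows the codebase by at most `budget` lines."""
    return loc_growth(before, after) <= budget


before = {"app.py": "def f(x):\n    return x + 1\n"}
after = {"app.py": "def f(x):\n    if x is None:\n        return 0\n    return x + 1\n"}
print(loc_growth(before, after))        # 2
print(within_budget(before, after, 3))  # True
```

A harness could reject or re-prompt an iteration that blows the budget. How to pick the number remains the open question raised above.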
The issue is that whilst the loops will initially lead to good results, the results will get worse and worse as the context gets bigger and tougher to understand for both humans and AI.<p>So it really depends on the size of your project.
We've had great success with agents thus far at my job. A year into Clauding and all our dev metrics are up while our downtime has remained steady.<p>Being an iOS engineer, much of my engineering cycle these days is going from Figma/PRD → spec → code. After being handed off to QA, we handle the bugs and product slips as they come through, while we simultaneously build/spec the upcoming addition. This is basically the same agile style that's been popular for 20y, just super-powered with agents.<p>How might someone accomplish the same goals using loops instead?
I personally have not had good luck with loops due to similar issues as the post author, but if you were to port your flow to "looping" it would be something like:<p>- An automation that periodically checks for PRDs at a given location that have not yet been implemented.<p>- If it sees one that is not implemented, it puts a lock on it (so other agents don't pick it up later while it's still working) and implements the PRD in code, assuming it has the Figma link and all the specs required.<p>- When it's done, it makes a PR, waits to see if it passes, and in some cases even automatically merges into your staging/preview environments and just pings you with a build/URL. You can then leave feedback, and it can also poll for pending feedback. Or you just mark it as looking good; the agent then merges the PR, moves the PRD to implemented status, maybe even writes/updates docs, and cleans up any temporary work.<p>- Repeat checking for new PRDs every T units of time (10 minutes, 1 hour, etc).<p>This is how people say you should be looping: you never even cared about or looked at the code, and also never prompted the agent yourself.<p>But I find most agents are still pretty bad at replicating UI vs making something from scratch, and most design specs are still not detailed enough about how things look at all sizes, in all scenarios, etc. Design seems to be one of those things that still requires a human to validate. And then there are all the things the post author mentions about it not being willing to apply hard constraints, minimize impossible states, validate at the edges, and avoid horrendous overchecking of things.
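The steps above can be sketched roughly as follows. This is an in-memory illustration only: `PRD`, `Status`, `claim_next`, and `run_once` are hypothetical names, the `implement` callable stands in for whatever the agent actually does (write code, open a PR, wait for checks), and a status flag on an object is not a real lock across processes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    NEW = "new"
    LOCKED = "locked"          # claimed by an agent, work in progress
    IN_REVIEW = "in_review"    # PR opened, waiting for human feedback
    IMPLEMENTED = "implemented"


@dataclass
class PRD:
    name: str
    status: Status = Status.NEW


def claim_next(prds: list[PRD]) -> Optional[PRD]:
    """Find the first unimplemented PRD and lock it so no other agent picks it up."""
    for prd in prds:
        if prd.status is Status.NEW:
            prd.status = Status.LOCKED
            return prd
    return None


def run_once(prds: list[PRD], implement: Callable[[PRD], bool]) -> Optional[PRD]:
    """One tick of the loop: claim a PRD, hand it to the agent, record the outcome."""
    prd = claim_next(prds)
    if prd is None:
        return None  # nothing to do this tick
    checks_passed = implement(prd)  # hypothetical: agent codes, opens PR, runs checks
    # Release the lock on failure so a later tick can retry.
    prd.status = Status.IN_REVIEW if checks_passed else Status.NEW
    return prd


# A scheduler would call run_once(...) every T units of time, for example
# inside `while True: run_once(prds, agent); time.sleep(600)`.
```

The human step (leaving feedback or marking it as good, which would move the PRD to `IMPLEMENTED`) is deliberately left out, since that is the part the comment argues still needs a person.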
Use Appium or XCTest or Swift Testing; generate the tests first (failing) from the spec.<p>The loop is then basically a while loop:<p>while (tests fail) { trigger agent: spec, failures list }<p>For bugs, write failing tests.<p>It's basically TDD.<p>Loops do nothing useful beyond making the “spec -> code” step more “hands off” and letting you be confident that the code you write does what is intended.<p>Obviously you see the issue: writing the loop harness is more effort than not having it…<p>…but the <i>idea</i> is that you run “spec first” and are totally hands off on the code, just updating the validation step and then waiting while the agent iterates over and over to find some solution that passes the loop harness.<p>People suggest that it is possible to go, e.g., directly from Figma/Jira to harness via (random tool here), saving even more time and involving even fewer humans, but that's currently, as far as I can tell, just hype.<p>No one is actually doing that effectively.<p>Loops are currently carefully hand-crafted, which makes them tedious and of questionable value imo.
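Spelled out, the while loop above might look like the sketch below. `run_tests` and `trigger_agent` are hypothetical callables supplied by the harness (for example a wrapper around the test runner and a wrapper around the agent CLI), and the iteration cap is an added assumption so the loop cannot spin forever.

```python
from typing import Callable


def tdd_loop(
    spec: str,
    run_tests: Callable[[], list[str]],               # returns names of failing tests
    trigger_agent: Callable[[str, list[str]], None],  # agent edits code in place
    max_iterations: int = 10,
) -> bool:
    """Re-prompt the agent with the spec and current failures until tests pass.

    Returns True if the suite is green, False if the cap was hit first.
    """
    for _ in range(max_iterations):
        failures = run_tests()
        if not failures:
            return True
        trigger_agent(spec, failures)
    # One last check after the final agent run.
    return not run_tests()
```

All the real effort sits in `run_tests`: the loop is only as trustworthy as the tests generated from the spec, which is the "harness is more effort than not having it" point.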
Would you have a breakdown of costs/benefit? Can you say with certainty that this workflow has increased productivity so much that you are seeing profit increases that you wouldn't have otherwise noticed just by hiring more people?
Asking with no ill intention; I just crave actual business cases that make sense, and yet no one seems to be able to reliably produce them.
Generally interesting reflections here, yet I see the same kind of myopia and fatalism that is rampant in our (fashion) industry:<p>> yet I have no doubts that this looping future is going to be our future despite the fact that I presently resent it<p>Why would anyone conclude this? LLMs are just one kind of application of ML to software production. There is a vast solution space for automating parts of software production. The idea that slop loops are the inevitable future because they happen to be accelerating output at the moment just seems profoundly short-sighted and lacking in vision.
Was everyone collectively lying over the past fifty years of software development when they repeatedly said more != better?<p>For specific use cases, performance, security, and all sorts of tuning, it could be truly amazing. But maybe loops should be a tool we choose to use when it is the optimal one.<p>I just wonder if in the future we’ll come to realize that we don’t have to throw the baby out with the bath water. That you can take a beat to understand your code and do change management, and choose the right tool for the job, and curate and say no and have agency.<p>An observation: no one writes code like Google. “You’re not Google” is something that gets thrown around in software shops all the time. Why is it we all think we’re going to be writing code like Anthropic?
These new AI trends are very tiresome, very similar to the 2021 crypto mania: both trigger a lot of FOMO. If we have loops that write code and we don't need to verify anything, why are the devs still here? What's the point of even learning this new trick as a dev if you truly believe that this can be used without any intervention? If loops work, then it follows that a loop of loops works too, so why hire any people at all? Just run a bunch of loops and build a profitable business. But then what's your moat? Anyone can now launch loops on top of loops.
A friendly reminder to just do 9 to 5 and touch lots of grass. None of this shit represents industry trends; the majority of people still use chat interfaces and copy blocks of code. There’s zero early adopter advantage here, only FOMO and lots of anxiety.
I honestly wonder if this kind of stuff really brings something to the table. I have used Opus for some time, and I can certainly put it to good use and optimize some parts of my day-to-day job (programmer). But it fails so hard at such simple tasks that it seems to me that putting it in a loop can't just magically make everything better, unassisted. Does anyone actually use agents and loops to create new software, new technology? Has anyone created, with those systems, software they couldn't otherwise have produced, technologically speaking? Or is it at best just an accelerator, cutting down on build time?
I think this is a common sentiment among heavy users of AI who also still care about code quality.<p>I've built up a skill harness and review flow that makes Opus generate slop-free code 90% of the time. But the remaining 10% requires me to stay at the helm, especially in the early stages.<p>I would love to use loops to automate more, but I couldn't do it with the current generation of models.<p>And in the back of my mind I'm still evaluating the possible future where we are forced onto API pricing. I'm currently paying $400 for Opus, and use around 1.5-2 billion tokens per day. This would cost around $20k/m with API pricing. And I don't even want to imagine the possible scenario of getting locked out of frontier models because of politics.<p>Will the models get good enough to cut me out of the loop completely? I believe so.
Will the open source models catch up to SOTA models, and diversify from China-only? I hope so. Otherwise 2 superpowers will wield a soft power that can cripple the tech industries of all other countries.
This is really terrible advice right now for most people.<p>I've had to rip out a lot of pretty terrible code made by engineers who have tried this.<p>I don't disagree that eventually, "loops" when combined with unlimited tokens and amazing models in the hands of people who know how to set them up right will be amazing. But for the typical Claude Code user, it's a disaster.<p>The problem is not that loops write bad code once. Humans do that too. The problem is that loops apply local pressure repeatedly: add a fallback, add a guard, special-case the failing input, quiet the exception, satisfy the test. Over time that selects for code that is more survivable in the short term but less intelligible in the long term.
The post suggests fear about a surge of increasing amounts of code produced by loops and loops of agents.<p>I don’t know if I like the current world without it, though.<p>Across 80% of teams, the code is poorly tested. The code doesn’t handle data consistency or asynchronous code properly because the engineers don’t know better (and frankly don’t care enough).<p>Dependency handling is poorly managed, leading to low-quality operations with improper dashboards, alarms, and ops.<p>Badly managed processes lead to people doing monkey work signing off checklists rather than automating.<p>Frankly… why is keeping any of that good? It really pisses me off seeing people accept any of that low quality, but that standard is the default and not the outlier.
This is a very fatalistic take. While I understand where it's coming from, I try not to share the same mindset: engineers getting increasingly distant from how things are getting built is not something that will "undoubtedly happen, whether we like it or not".<p>Also:<p>> Now there is obviously a question if this desire to understand the code is one that I will still have a few years from now.<p>I do not think we should be having doubts like this. Either you consider understanding the code you ship and allowing your future self to be able to work on the system you're building to be a value, or you don't. I, for one, do, and I do not think using LLMs and coding agents will affect my point of view on that.
I think there are 2 important, but separate, ideas in this post:<p>- Models are not good at, or getting better at, creating strong invariants, which is fundamental to good software<p>- It is unclear how to keep tabs on what the agent is doing, so you, a human, can intervene.<p>These are related, obviously: one of the highest-leverage things you can do is force your agent to use a strong, minimal set of types or data invariants or other constraints. They get much better when your codebase broadly supports this!<p>I do suspect they're separable, though.<p>If you had the right levers and visibility, you should be able to get the model to produce code that doesn't feel like slop. But every time I've had a model try to keep me in the loop, it inundates me with irrelevant decisions and busywork. Its inability to see what's structurally important still shows up, just differently.<p>[If the models get better at defining and respecting invariants, maybe there's a new flavor of slop, that's less obvious today.]
This is the best essay on agentic coding I've read. Clear thinking and writing, pragmatic about the future of agent-led coding.<p>If you usually skip straight to the comments, you might want to actually read this one.
> Present-day models tend to produce code that is too defensive, too complex, too local in its reasoning. They avoid strong invariants. They add fallbacks instead of making bad states impossible. They duplicate code, invent bad abstractions, and paper over unclear design with more machinery. Worse though: I so far see very little progress of this improving.<p>It’s almost as though these models were trained on a vast corpus of largely mediocre code. They will never outperform the median Github user - it is all they know, it is all they can do.
No.<p><i>(I mean, OK, so I can't just write that, because of some rule or another. But really, just no, at this point, surely? How is it we are so actively turning software development into some weird blend of cargo cults, homeopathy, prosperity gospel and penal treadmills? Chasing trend after trend from corporate influencers with motivated reasoning who are trying to create new cultures that entrench their weird inscrutable metered intelligence taps? Handing over this level of control to and for the benefit of the sociopathic or solipsistic executives of two companies that cannot demonstrate they will ever make a profit, backed by a handful of other companies who are stirring money around in a pot to make it look like they are still generating value? Can we stop? Please? Please?)</i><p>No.<p><i>(And also no: it was not, in fact, this bad before. No matter how you try to retcon it. It was not good. But it was also not this insanely nihilistic)</i>
Show me the billion dollar solopreneur startup, or the profit increase for companies and at that point I’ll start thinking that this tasteless high level wanking might make sense in some way
my experimental looping build on top of pi and zx
mostly pi deep seek and some skills ;-)
<a href="https://github.com/topce/pizx" rel="nofollow">https://github.com/topce/pizx</a>
There's a deep insight in this post about the value of looping for throwaway code to explore a problem space, rather than brute-forcing a problem by just applying more tokens and hoping.<p>The more I play in this space, the more I’m drawn to the idea that some kind of backtracking constraint solver is a better solution than the current naive while loop / brute force approach here.<p>The results I see are similar to what you get from a greedy brute force constraint solver: it solves trivial problems, sometimes solves harder problems after a long time, and takes too long to solve really hard problems; solutions are increasingly non-optimal on average as complexity goes up.<p>We have so much existing knowledge about building good constraint solvers, if we could just figure out how to apply it here somehow.
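To make the greedy vs backtracking contrast concrete, here is a toy Python sketch. It says nothing about how to map agent loops onto a solver, which is the open question above; `greedy`, `backtrack`, `all_distinct`, and the three-slot puzzle are all made up for illustration.

```python
from typing import Callable, Optional

Check = Callable[[list[int]], bool]


def greedy(slots: list[list[int]], is_consistent: Check) -> Optional[list[int]]:
    """Commit to the first workable candidate per slot; never revisit a choice."""
    chosen: list[int] = []
    for candidates in slots:
        pick = next((c for c in candidates if is_consistent(chosen + [c])), None)
        if pick is None:
            return None  # stuck: an earlier choice painted us into a corner
        chosen.append(pick)
    return chosen


def backtrack(slots: list[list[int]], is_consistent: Check,
              chosen: Optional[list[int]] = None) -> Optional[list[int]]:
    """Depth-first search that undoes earlier choices when a later slot fails."""
    chosen = chosen or []
    if len(chosen) == len(slots):
        return chosen
    for candidate in slots[len(chosen)]:
        attempt = chosen + [candidate]
        if is_consistent(attempt):
            result = backtrack(slots, is_consistent, attempt)
            if result is not None:
                return result
    return None


def all_distinct(partial: list[int]) -> bool:
    """Toy constraint: no value may be used twice."""
    return len(set(partial)) == len(partial)


slots = [[1, 2], [1], [2, 3]]
print(greedy(slots, all_distinct))     # None
print(backtrack(slots, all_distinct))  # [2, 1, 3]
```

The greedy version fails because it takes 1 for the first slot and cannot undo that once the second slot also needs 1. The backtracking version revisits the first choice and finds a solution.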
> For now I have not moved past the point of comprehension being important to me.<p>Ah ! This is me too... at least for what I have to ship at work. Not so much for my toy/weekend projects. But it turns out agents are also good at explaining.
I think it's insane to suggest that software developers should ever get to the point where they don't even comprehend their code.<p>Before someone else says it, no I don't read the assembly code that is produced by my compilers. However, I can generally predict what kind of assembly will be produced, and the result is deterministic unlike LLMs. It seems like most vibe coders scoff at the idea of even looking at the code, and it just seems untenable to me when we're working with (usually correct) stochastic parrots.
One of the biggest problems with LLMs has turned out to be the cost of actually running them and this strategy functions as a usage multiplier.
Great article and good description of LLM code quality problems and problems that derive from that. And fair to not want a tidal wave of slop to displace your entire craft.<p>But this article is strangely lacking in foresight in terms of rapidly evolving model capabilities and output. One visual way to see this is to compare levels of SOTA video generation models. Look at outputs from Sora, to Veo, to Seedance 2.0, and now just released Seedance 2.5.<p>Or compare LLMs/VLMs as they have progressed: GPT-2, GPT-3, GPT-4, Opus, Fable/Mythos.<p>You can see the level of sloppiness or poor world understanding progress from comical nonsense to junior to senior with a few holes in their brain to an engineer you can actually almost trust to produce clean code if you mention the right guidelines in your instructions (such as avoiding overly local code).<p>As the model size/complexity increases, the intelligence increases, and so does code quality. We will also start specifically putting more high level code quality tasks into training datasets and training harnesses. I mean, Karpathy will probably see this article and make a huge dent in the issues without even larger models.<p>One thing people may not be aware of is that there is still a lot of room for hardware efficiency improvements and model size to grow. The compute-in-memory paradigm is just getting started in a way. Look at companies like Tensordyne and Mythic AI, but they are going to get blown out of the water by fully in-memory approaches.<p>For example look at the recent wurtzite ferroelectric nitrides breakthrough from the University of Michigan team (one of them tragically jumped from height after intense interrogation regarding national security concerns). 
The military is providing significant funding to move this towards development and scaling out of the lab.<p>That type or level of truly new paradigm system is going to boost efficiency by multiple orders of magnitude.<p>I know there are people who think Fable 5 was the end of the public LLM/VLM frontier moving, or that it is impossible to scale models further due to energy consumption. But there is zero chance that every high level VLM/LLM research team on the planet is going to stop publishing models or that the rapid progress in compute efficiency will stop.<p>Point being, within a year or two, the code coming out will be much cleaner. And within five or six years what you may see is that the leading models are 100+ trillion parameters and have sophisticated persistent context management etc. and they do not even produce application source code.<p>Instead, the database is in the context and is neurally rendered at 24 fps into whatever UI, schema and business logic you prompt it with in a broad way. The whole application is just precise thinking in an artificial brain ten times the complexity of an equivalent human brain.<p>And if you are disturbed by the current level of outsourcing for thinking to AI, it is just getting started. In a way it will be incredible, from another perspective horrific, but what I think we are seeing is the evolution of an ExoCortex. There will be an AI glasses stage where the integration is closer but still somewhat low bandwidth.<p>But sooner than later we are headed towards high bandwidth brain computer interfaces that make AI into an actual new cognitive layer.<p>So the waves of slop might make you feel sick, but that is nothing compared to the transhuman cyborgs powered by superhuman AI that are around the corner.
I'm willing to be persuaded otherwise: Looping seems to (currently) be a side effect of token subsidies.<p>If token costs are nil, then you can afford to run verification and generation through the same models. If token costs are high, then you will go broke verifying code sprawl.<p>Currently costs are (mostly) absent from the conversation, even though costs are what decide the limits which shape experience.<p>Also: Firms can be held liable for the products they sell, so if code cannot be reviewed then that code is essentially a lawsuit waiting to happen. I believe this is what customers will be demanding in the future: someone to hold accountable when things go wrong.
My own thoughts on this, with examples <a href="https://github.com/nfcampos/loop-dev/blob/main/README.md" rel="nofollow">https://github.com/nfcampos/loop-dev/blob/main/README.md</a>
There's _way_ more than one way to do "loops". I just asked one of my supervisors/auditors to document how it's been working while monitoring a few other agents that have long-term goals:<p><a href="https://gist.github.com/rcarmo/4922b550ab48bf0b4246c77e606a5508" rel="nofollow">https://gist.github.com/rcarmo/4922b550ab48bf0b4246c77e606a5...</a>
Yeah, I don't know. Don't get me wrong, the article's points make sense. But sometimes I think that we're going to stay near this current point of productivity for a little while.<p>Currently my org of 8 people uses around 1000 euro worth of tokens per month. We recently had a discussion near the water cooler that if the cost climbs 5x-10x, it may just be more worthwhile to hire more developers (we're EU based). While the tools work and are definitely nice, even in our little org with our little budget, using Opus 4.8, we've noticed code quality going down.<p>If I had to bet money, I'd bet that the models will get 30-50% nicer and around 2x more expensive, and we will settle into some mode where we use LLMs for some tasks and do others manually, while calling the places that focus on speed at any cost some funny name like "gulags, 996, sweatshops, etc" and collectively trying to avoid them, so that they will need to offer a premium to attract talent. Wishful thinking.