No more AI thought pieces until you tell us what you build!<p>AI is a general-purpose tool, but that doesn't mean best practices and wisdom are generalizable. Web dev is different from compilers, which is different from embedded, and all the differences of opinion in the comments never explain who does what.<p>That said, I would take this up a notch:<p>> If you ask AI to write a document for you, you might get 80% of the deep quality you’d get if you wrote it yourself for 5% of the effort. But, now you’ve also only done 5% of the thinking.<p>Writing _is_ the thinking. It's a critical input in developing good taste. I think we all ought to consider a maintenance dose. Write your own code without assistance on whatever interval makes sense to you, otherwise you'll atrophy those muscles. Best practices are a moving train, not something you learn once and are done with.
> No more AI thought pieces until you tell us what you build!<p>Absolutely agree with this, the ratio of talk to output is insane, especially when the talk is all about how much better output is. So far the only example I've seen is Claude Code, which is mired in its own technical problems and is literally built by an AI company.<p>> Write your own code without assistance on whatever interval makes sense to you, otherwise you'll atrophy those muscles<p>This is the one thing that concerns me, for the same reason that "AI writes the code, humans review it" does. The fact of the matter is, most people will get lazy and complacent pretty quickly, and the depth to which they review the code / the frequency with which they "go it alone" will get less and less until eventually it just stops happening. We all (most of us anyway) do it, it's just part of being human, for the same reason that thousands of people start going to the gym in January and stop by March.<p>Arguably, AI coding was at its best when it was pretty bad, because you HAD to review it frequently and there were immediate incentives to just take the keyboard and do it yourself sometimes. Now the faults are still serious, they're just not as immediate, which will lead to complacency for a lot of people.<p>Maybe one day AI will be able to reliably write 100% of the code without review. The worry is that we stop paying attention first, which all in all looks quite likely.
> Absolutely agree with this, the ratio of talk to output is insane, especially when the talk is all about how much better output is.<p>Those of us building are having so much fun we aren't slowing down to write think pieces.<p>I don't mean this flippantly. I'm a blogger. I love writing! But since a brief post on December 22 I haven't blogged because I have <i>been too busy</i> implementing incredible amounts of software with AI.<p>Since you'll want receipts, here they are:<p>- <a href="https://git.sr.ht/~kerrick/ratatui_ruby/tree/trunk/item/README.rdoc" rel="nofollow">https://git.sr.ht/~kerrick/ratatui_ruby/tree/trunk/item/READ...</a><p>- <a href="https://git.sr.ht/~kerrick/rooibos/tree/trunk/item/README.rdoc" rel="nofollow">https://git.sr.ht/~kerrick/rooibos/tree/trunk/item/README.rd...</a><p>- <a href="https://git.sr.ht/~kerrick/tokra/tree" rel="nofollow">https://git.sr.ht/~kerrick/tokra/tree</a><p>Between Christmas and New Year's Day I was on vacation, so I had plenty of time. Since then, it's only been nights & weekends (and some early mornings and lunch breaks).
But simonw said that dark factories are the way forward /s
The first sentence of the blog post has a link to the product he is building - <a href="https://www.monarch.com/" rel="nofollow">https://www.monarch.com/</a>
> No more AI thought pieces until you tell us what you build!<p>Let me fix that for you:<p><i>No more AI thought pieces until you SHOW us what you build!</i><p>And I think it can safely be generalised to:<p><i>No more thought pieces until you show us what you build!</i><p>And that goes double for posts on LinkedIn.
I’d take it even further: I want to see the prompts! I’ll concede we’ve arrived at a point where the coding agents work, but I’m still unconvinced they’re actually saving anyone time. From my AI-pilled coworkers, it appears they’re all spending half an hour creating a plan with hyper-specific prompts, like “make this specific change to lines XXX-YYY in this file”, “move this function over here and change the args”, etc. As far as I can tell, this is current best practice, and I’ve tried it, but I’m always disappointed with either the quality or the speed, usually both.
Yea, if I were interested in that level of micromanagement, I would have become a manager.
Look, learning vim is hard; some people just want to describe their edits using natural language.
The product they build is literally mentioned in the post?
It’s one of the more popular personal finance/budgeting apps, and it’s a pretty good one in my opinion as someone who has used a variety of them.
Even if writing is thinking (which I don't think is the case, since writing is also feeling), thinking isn't exclusively writing. Form is very important, and AI can very well help you materialize an insight in a way that makes sense for a wider audience. The thinking is still all yours; only the aesthetics are refined by the AI with your guidance, depending on how you use AI.
> No more AI thought pieces until you tell us what you build!<p>We build a personal finance tool (referenced in the article). It's a web/mobile/backend stack (mostly React and Python). That said, I think a lot of the principles are generalizable.<p>> Writing _is_ the thinking. It's a critical input in developing good taste.<p>Agree, but I'll add that _good_ prompt-writing actually requires a lot of thought (which is what makes it so easy to write bad prompts, which are much more likely to produce slop).
“Writing is the thinking” is a controversial and open to interpretation take, so everyone’s gonna argue about it. You muddied the water with that one.
Ample evidence of production software being produced with the aid of AI tools has been provided here on HN over the last year or more. This is a tiresome response. A later response says exactly what they produce.
Most of what I see are toys. Could you point us to examples of production software from AI? I feel like I see more "stop spamming us with AI slop" stories from open source than production software from AI. Would love some concrete examples, specifically of major refactors or ground-up projects. Not "we just started using AI in our production software", because it can take a while to change the quality (better or worse) of a whole existing codebase.
I imagine people who are shipping with AI aren’t talking about it. Doing so makes no business sense.<p>Those not shipping are talking about it.
So, “trust me bro”? When people find a good tool, they can’t stop talking about it. See all the tech conferences.<p>Absence of evidence, while not the only signal, is a huge fucking signal.
Sounds like you’ve got multiple ways to write off any example you’re given charged up and at the ready.
I don't get why people think AI takes the thought out of writing. I write a lot, and when I start hitting keys on keyboards, I already know what I'm going to say 100% of the time, the struggle is just getting the words out in a way that maximizes reader comprehension/engagement.<p>I'd go so far as to say that for the vast majority of people, if you don't know what you're going to say when you sit down to write, THAT is your problem. Writing is not thinking, thinking is thinking, and you didn't think. If you're trying to think when you should be writing, that's a process failure. If you're not Stephen King or Dean Koontz, trying to be a pantser with your writing is a huge mistake.<p>What AI is amazing for is taking a core idea/thesis you provide it, and asking you a ton of questions to extract your knowledge/intent, then crystallizing that into an outline/rough draft.
> If you ask AI to write a document for you, you might get 80% of the deep quality you’d get if you wrote it yourself for 5% of the effort. But, now you’ve also only done 5% of the thinking.<p>This, but also for code. I just don't trust new code, especially generated code; I need time to sit with it. I can't make the "if it passes all the tests" crowd understand and I don't even want to. There are things you think of to worry about and test for as you spend time with a system. If I'm going to ship it and support it, it will take as long as it will take.
Yep, this is the big sticking point. Reviewing code properly is and was the bottleneck. However, with humans I trusted, I could ignore most of their work and focus on where they knew they needed a review. That kind of trust is worth a lot of money and lets you move really fast.<p>> I need time to sit with it<p>Everyone knows doing the work yourself is faster than reviewing somebody else’s if you don’t trust them. I’d argue that if AI ever gets to the point where you fully trust it, all white-collar jobs are gone.
Yes, regression tests are not enough. One generally has to think through code repeatedly, with different aspects in mind, to convince oneself that it is correct under all circumstances. Tests only point-check, they don’t ensure correct behavior under all conceivable scenarios.
Unless you are in the business of writing flight control software, OS kernels, or critical financial software, I don't think your own code will reach the standards you mention. The only way we get "correct under all conceivable scenarios" software is to have a large team with long time horizons and large funding working on a small piece of software. It is beyond an individual to reach that standard for anything beyond code at the function level.
I think what LLMs do with words is similar to what artists do with software like cinema4d.<p>We have control points (prompts + context) and we ask LLMs to draw a 3D surface which passes through those points satisfying some given constraints. Subsequent chats are like edit operations.<p><a href="https://youtu.be/-5S2qs32PII" rel="nofollow">https://youtu.be/-5S2qs32PII</a>
You're countering vibes with vibes.<p>If the tests aren't good enough, break them. Red team your own software. Exploit your systems. "Sitting with the code" is some Henry David Thoreau bullshit, because it provides exactly 0 value to anyone else, whereas red teamed exploits are objective.
The way you come up with ideas on how to break, red team and exploit; when to do this and how to stop: that part is not objective. The machine can't do this for you sufficiently well. There is a subjective process in there that you're not acknowledging.<p>It's a good approach! It's just more 'negative space' than direct.
You're over-rotating on security. Not that it isn't important, but there are other dimensions to software that benefit heavily from the author having a deep understanding of the code that's being created.
Honest question: why is this not enough?<p>If the code passes tests, and also works at the functionality level, what difference does it make if you’ve read the code or not?<p>You could come up with pathological cases like: it passed the tests by deleting them, and the code it wrote is extremely messy.<p>But we know that LLMs are way smarter than this. There’s a very, very low chance of this happening, and even if it does, a quick glance at the code can fix it.
You can't test everything. The input space may be infinite. The app may feel janky. You can't even be sure you're testing all that can be tested.<p>The code may seem to work functionally on day 1. Will it continue to seem to work on day 30? Most often it doesn't.<p>And in my experience, the chances of LLMs fucking up are hardly very very low. Maybe it's a skill issue on my part, but it's also the case that the spec is sometimes discovered as the app is being built. I'm sure this is not the case if you're essentially summoning up code that exists in the test set, even if the LLM has to port it from another language, and they can be useful in parts here and there. But turning the controls over to the infinite monkey machine has not worked out for me so far.
If you care about performance, test it (stress test).<p>If you care about security, test it (red teaming).<p>If you care about maintainability, test it (advanced code analysis)<p>Your eyeballs are super fallible, this is why bad engineers exist. Get rigorous.
Good question. Several reasons.<p>1. Since the same AI writes both the code and the unit tests, it stands to reason that both could be influenced by the same hallucinations.<p>2. Having a dev on call reduces time to restore service because the dev is familiar with the code. If developers stop reviewing code, they won't be familiar with it and won't be as effective. I am currently unaware of any viable agentic AI substitute for a dev on call capability.<p>3. There may be legal or compliance standards regarding due diligence which won't get met if developers are no longer familiar with the code.<p>I have blogged about this recently at <a href="https://www.exploravention.com/blogs/soft_arch_agentic_ai/" rel="nofollow">https://www.exploravention.com/blogs/soft_arch_agentic_ai/</a>
It depends on the scale of complexity you’re working at and who your users are going to be. I’ve found that it’s trivial to have Claude Code spit out so much functionality that even just manually verifying it properly becomes a gargantuan task. I end up just manually testing the pieces I’m familiar with, which is fine if there’s a QA department who can do a full run-through of the feature and are prepared to deal with vibe coding pitfalls, but not so much on open source projects where slop gets shipped and unfamiliar users get stuck with bugs they can’t possibly troubleshoot. Writing the code from scratch The Old Way™ leaves a lot less room for shipping convincing but non-functional slop, because the dev has to work through it before shipping.<p>The most immediate example I can think of is the beans LLM workflow tracker. It’s insane that it’s measured in the hundreds of thousands of LoC, and getting that thing set up in a repo is a mess. I had to use GitHub Copilot to investigate the repo to get the latest method. This wouldn’t fly at my employer, but a lot of projects are going to be a lot less scrupulous.<p>You can see the effects in popular consumer-facing apps too: Anthropic has drunk way too much of its own koolaid, and now I get 10-50% failure rates on messages in their iOS app depending on the day. Some of their devs have publicly said that Claude writes 100% of their code, and it’s starting to show. Intermittent network failures and retries have been a solved problem for decades, ffs!
> If the code passes tests, and also works at the functionality level<p>Why doesn’t outsourcing work if this is all that is needed?
We haven’t fully proven that it is any different. Not at scale anyway. It took a decade for the seams of outsourcing to break.<p>But I have a hypothesis.<p>The quality of the output, when you don’t own the long term outcome or maintenance, is very poor.<p>This is not the case with AI in the same sense it is with human contractors.
Why do we have managers if managers don’t have accountability?
This is a very sound take:<p>> Will AI replace my job?<p>> If you consider your job to be “typing code into an editor”, AI will replace it (in some senses, it already has). On the other hand, if you consider your job to be “to use software to build products and/or solve problems”, your job is just going to change and get more interesting.
This resonates a lot. I’ve found that staying slightly behind the bleeding edge with AI tools actually leads to more consistent productivity. The early-stage tools often look impressive in demos but add cognitive overhead and unpredictability in real workflows.<p>Waiting until patterns stabilize (better UX, clearer failure modes, community best practices) tends to give a much better long-term payoff.
This is one of the more true and balanced articles.<p>On the verification loop: I think there’s so much potential here. AI is pretty good at autonomously working on tasks that have a well-defined and easy-to-process verification hook.<p>A lot of software tasks are “migrate X to Y”, and this is a perfect job for AI.<p>The workflow is generally straightforward: map the old thing to the new thing and verify that the new thing works the same way. Most of this can be automated using AI.<p>Wanna migrate a codebase from C to Rust? I definitely think it should be possible autonomously if the codebase is small enough. You do have to ask the AI to intelligently come up with extensive ways to verify that they work the same. Maybe a UI check, sample input/output checks on the API, and a functionality check.
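To make the "verification hook" idea concrete, here is a minimal differential-test sketch: run the old and new implementations on the same inputs and report any divergence. The binary names (./legacy_tool, ./ported_tool) and the sample inputs are hypothetical placeholders, not anything from the article.

```python
# Hypothetical differential harness: feed identical inputs to the legacy
# binary and the ported binary, then diff their outputs. The agent can loop
# on this report until it comes back empty.
import json
import subprocess

SAMPLE_INPUTS = ["", "hello", "1,2,3\n4,5,6", "unicode: héllo"]


def run(binary: str, data: str) -> str:
    """Run a CLI tool with `data` on stdin and return its stdout."""
    result = subprocess.run([binary], input=data, capture_output=True,
                            text=True, check=True)
    return result.stdout


def main() -> None:
    mismatches = []
    for sample in SAMPLE_INPUTS:
        old_out = run("./legacy_tool", sample)   # e.g. the original C build
        new_out = run("./ported_tool", sample)   # e.g. the new Rust port
        if old_out != new_out:
            mismatches.append({"input": sample, "old": old_out, "new": new_out})
    print(json.dumps({"mismatches": mismatches}, indent=2))


if __name__ == "__main__":
    main()
```

The same shape works for API endpoints or UI snapshots; the point is only that the check is cheap to run and unambiguous to read.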
<i>> On the verification loop: I think there’s so much potential here. AI is pretty good at autonomously working on tasks that have a well defined and easy to process verification hook.</i><p>It's scary how good it's become with Opus 4.5. I've been experimenting with giving it access to Ghidra and a debugger [1] for reverse engineering and it's just been plowing through crackmes (from sites like crackmes.one where new ones are released constantly). I haven't bothered trying to have it crack any software but I wouldn't be surprised if it was effective at that too.<p>I'm also working through reverse engineering several file formats by just having it write CLI scripts to export them to JSON then recreate the input file byte by byte with an import command, using either CLI hex editors or custom diff scripts (vibe coded by the agent).<p>I still get routinely frustrated trying to use it for anything complicated but whole classes of software development problems have been reduced to vibe coding that feedback loop and then blowing through Claude Max rate limits.<p>[1] Shameless plug: <a href="https://github.com/akiselev/ghidra-cli" rel="nofollow">https://github.com/akiselev/ghidra-cli</a> <a href="https://github.com/akiselev/debugger-cli" rel="nofollow">https://github.com/akiselev/debugger-cli</a>
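For the file-format round trip described above, the verification hook can be as simple as a byte-for-byte comparison after export and re-import. Here is a sketch of that loop, assuming hypothetical export_to_json.py / import_from_json.py scripts written by the agent; this is not the author's actual tooling.

```python
# Hypothetical round-trip check: export an opaque file to JSON, rebuild the
# binary from that JSON, and compare the result byte for byte.
import subprocess
import sys
from pathlib import Path


def round_trip(original: Path, workdir: Path) -> bool:
    as_json = workdir / "dump.json"
    rebuilt = workdir / "rebuilt.bin"
    # 1. Export the unknown format to JSON with the agent-written exporter.
    subprocess.run([sys.executable, "export_to_json.py", str(original), str(as_json)], check=True)
    # 2. Recreate the original file from the JSON with the matching importer.
    subprocess.run([sys.executable, "import_from_json.py", str(as_json), str(rebuilt)], check=True)
    # 3. The byte-for-byte diff is the pass/fail signal the agent iterates on.
    return original.read_bytes() == rebuilt.read_bytes()


if __name__ == "__main__":
    src = Path(sys.argv[1])
    out = Path(sys.argv[2]) if len(sys.argv) > 2 else Path(".")
    print("round-trip OK" if round_trip(src, out) else "round-trip FAILED")
```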
I'm in the same loop where I find the more access I give it to systems and feedback mechanisms the more powerful it is. There's a lot of leverage in building those feedback systems. With the obvious caveat about footguns :P<p>Gave one of the repos a star as it's a cool example of what people are building with AI. Most common question on HN seems to be "what are people building". Well, stuff like this.
<i>> Most common question on HN seems to be "what are people building". Well, stuff like this.</i><p>Hear, hear! I’ve got my altium-cli repo open source in Github as well, which is a vibe coded CLI for editing vibe reverse engineered Altium PCB projects. It’s not yet ready for primetime (I’m finishing up the file format reverse engineering this weekend) and the code quality is probably something twelve year old me would have been embarrassed by, but I can already use it and Claude/Gemini to automate a lot of the tedious parts of PCB design like part selection and footprints. I’m almost to the point where Claude Code can use it for the entire EE workflow from part selection to firmware, minus the PCB routing which I still do by hand.<p>I just ain’t wasting time blogging about it so unless someone stumbles onto it randomly by lurking on HN, they won’t know that Claude Code can now work on PCBs.
I'm very happy with the chat interface thanks.<p>* The interface is near identical across bots<p>* I can switch bots whenever I like. No integration points and vendor lock-in.<p>* It's the same risk as any big-tech website.<p>* I really don't need more tooling in my life.
You don't know what you're missing.
I think the agents are also becoming fungible at the integration layer.<p>Any coding agent should be easy to plug into whatever IDE or workflow you need.<p>The agents are not fully fungible, though. Each has its own characteristics.
Ok?
Love Monarch. I would love to see the team apply AI to build missing institutions (Robinhood CC, Accrue) faster.
The hardest part of staying a step behind is knowing when something has crossed over.
How do you decide when a tool is mature enough to adopt?
> “Their (ie the document’s) value stems from the discipline and the thinking the writer is forced to impose upon himself as [she] identifies and deals with trouble spots”.<p>Real quote<p>> "Hence their value stems from the discipline and the thinking the writer is forced to impose upon himself as he identifies and deals with trouble spots in his presentation."<p>I mean seriously?
The one thing I disagree with is having the AI do its own verification. I explicitly instruct it never to check anything unless I ask it to.<p>This is better because I use my own testing as a forcing function to learn and understand what the AI has done. Only after primary testing might I tell it to do checking for itself.
Your point about avoiding the "bleeding edge" touches on a fundamental principle of endurance that is often ignored in the current AI gold rush. This philosophy is a calculated defense of a legacy—the invisible ledger of trust built over generations.<p>As a former local banker in Japan who spent decades appraising the intangible assets of businesses that have survived for centuries, I’ve learned that true mastery is found in stability, not novelty. In an era of rapid AI acceleration, the real risk is gambling your institutional reputation on unproven, volatile tools.<p>By 2026, when every “How” is a cheap commodity, the only thing that commands a premium is the “Why”—the core of human judgment. Staying a step behind the hype allows you to keep your hands on the steering wheel while the rest of the market is consumed by the noise. Stability is the ultimate luxury.