Something I noticed building multi-agent pipelines: the ablation compounds. Had a 4-step pipeline - summarize, expand, review, refine - and by step 3 everything had the same rhythm and vocabulary. Anchoring the original source text explicitly at each step helped, but only partially.<p>The more interesting cause, I think: RLHF is the primary driver, not just the architecture. The fine-tuning is driven by human preference ratings where "clear," "safe," and "inoffensive" consistently win pairwise comparisons. That creates a training signal that literally penalizes distinctiveness - a model that says something surprising loses to one that says something expected. Successful RLHF concentrates probability mass toward the median preferred output, basically by definition.<p>Base models - before fine-tuning - are genuinely weirder. More likely to use unusual phrasing, make unexpected associative leaps, break register mid-paragraph. Semantic ablation isn't a side effect of the training process, it's the intended outcome of the objective.<p>Which makes the fix hard: you can't really prompt your way out of it once a model is heavily tuned. Temperature helps a little but the distribution is already skewed. Where we've gotten better results is routing "preserve the voice" tasks to less-tuned models, and saving the heavily RLHF'd models for structured extraction and classification where blandness is actually what you want.
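For anyone who wants to try the "anchor the original source at each step" idea, here is a minimal sketch. Everything in it is an assumption for illustration: `call_model` is a hypothetical stand-in for whatever client you actually use, and the prompt wording is made up. It only shows the shape of the pipeline, where every step sees the untouched source as well as the previous draft.

```python
from typing import Callable, Sequence

# The four steps named in the comment above.
STEPS = ("summarize", "expand", "review", "refine")


def build_prompt(step: str, source: str, previous: str) -> str:
    """Build a prompt that re-anchors the step on the original source text."""
    return (
        f"Task: {step}\n"
        f"ORIGINAL SOURCE (preserve its voice and vocabulary):\n{source}\n"
        f"PREVIOUS STEP OUTPUT:\n{previous}\n"
    )


def run_pipeline(
    source: str,
    call_model: Callable[[str], str],  # hypothetical: takes a prompt, returns text
    steps: Sequence[str] = STEPS,
) -> str:
    """Run each step, always passing the original source plus the latest draft."""
    draft = source
    for step in steps:
        draft = call_model(build_prompt(step, source, draft))
    return draft
```

As the comment says, this only partially helps: the source is back in context at every step, but the model doing the rewriting still has the same preferences.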
This is a good statement of what I suspect many of us have found when rejecting the rewriting advice of AIs. The "pointiness" of prose gets worn away, until it doesn't say much. Everything is softened. The distinctiveness of the human voice is converted into blandness. The AI even says its preferred rephrasing is "polished" - a term which specifically means the jaggedness has been removed.<p>But it's the jagged edges, the unorthodox and surprising prickly bits, that tear open a hole in the inattention of your reader, that actually get your ideas into their heads.
I think it’s essential to realize that AI is a tool for mainstream tasks like composing a standard email and not for the edges.<p>The edges are where interesting stuff happens. The boring part can be made more efficient. I don’t need to type boring emails, people who can’t articulate well will be elevated.<p>It’s the efficient popularization of the boring stuff. Not much else.
I think that mostly depends on how good a writer you are. A lot of people aren't, and the AI legitimately writes better. As in, the prose is easier to understand, free of obvious errors or ambiguities.<p>But then, the writing is also never great. I've tried a couple of times to get it to write in the style of a famous author, sometimes pasting in some example text to model the output on, but it never sounds right.
I find most people can write way better than AI, they simply don’t put in the effort.<p>Which is the real issue, we’re flooding channels not designed for such low effort submissions. AI slop is just SPAM in a different context.
I am really conflicted about this because yes, I think that an LLM can be an OK writing aid in utilitarian settings. It's not going to teach you to write better, but if the goal is just to communicate an idea, an LLM can probably help the average person express it more clearly.<p>But the critical point is that you need to stay in control and own the result. And a lot of people just delegate the entire process to an LLM: "here's a thought I had, write a blog post about it"; "write a design doc for a system that does X"; "write a book about how AI can change your life". And then they ship it and outsource the process of actually checking the output to others.<p>It also results in the creation of content that, frankly, shouldn't exist because it has no reason to exist - and the fact that writing takes effort used to be a property that limited the production of slop. The amount of online content that doesn't say anything at all has absolutely exploded in the past 2-3 years.
I think it is also fairly similar to the kind of discourse a manager in pretty much any domain will produce.<p>He lacks (or lost through disuse) technical expertise on the subject, so he uses more and more fuzzy words, leaky analogies, buzzwords.<p>This may be why AI-generated content has so much success among leaders and politicians.
Mediocrity as a Service
> But it's the jagged edges, the unorthodox and surprising prickly bits, that tear open a hole in the inattention of your reader, that actually get your ideas into their heads.<p>This brings to mind what I think is a great description of the process LLMs exert on prose: sanding.<p>It's an algorithmic trend towards the median, thus they are sanding down your words until they're a smooth average of their approximate neighbors.
I'm sure this can be corrected by AI companies.
The question is… why? What is the actual human benefit (not monetary).
Just let my work have a soul, please.
I personally think “generative AI” is a misnomer. The more I understand the mathematics behind machine learning, the more I am convinced that it should not be used to generate text, images or anything that is meant for people to consume, even if it is the blandest of emails. Sometimes you might get lucky, but most of the time you only get what the most boring person at the most boring cocktail party would say if forced to be creative with a gun pointed at his head. It can help in a multitude of other ways, help humans in the creative process itself, but generating anything even mildly creative by itself… I’ll pass.
Bible scholar and YouTube guy Dan McClellan had an amazing "high entropy" phrase that slayed me a few days ago.<p><a href="https://youtu.be/605MhQdS7NE?si=IKMNuSU1c1uaVCDB&t=730" rel="nofollow">https://youtu.be/605MhQdS7NE?si=IKMNuSU1c1uaVCDB&t=730</a><p>He ended a critical commentary by suggesting that the author he was responding to should think more critically about the topic rather than repeating falsehoods because "they set off the tuning fork in the loins of your own dogmatism."<p>Yeah, AI could not come up with that phrase.
> The AI identifies unconventional metaphors or visceral imagery as "noise" because they deviate from the training set's mean.<p>That's certainly a take. In the translation industry (the primogenitor and driver for much of the architecture and theory of LLMs) they're known for making extremely unconventional choices to such a degree that it actively degrades the quality of translation.
The "AI voice" is everywhere now.<p>I see it on recent blog posts, on news articles, obituaries, YT channels. Sometimes mixed with voice impersonation of famous physicists like Feynman or Susskind.<p>I find it genuinely soul-crushing and even depressing, but I may be oversensitive to it, as most readers don't seem to notice.
> The "AI voice" is everywhere now.<p>Maybe I'm going crazy but I can smell it in the OP as well.
Yes, I get more and more visceral reactions to it. I'm reminded of JPEG artifacts - unnoticeable in 1993!
YES this hits the nail on the head for something I've been trying to express for some time now. Semantic ablation: love it, going to use that a lot now when arguing why someone's ChatGPT-washed email sucks.<p>Semantic ablation is also why I'm doubtful of everyone proclaiming that Opus 4 would be AGI if we just gave it the right agent harness and let all the agents run free on the web. In reality they would distill it to a meaningless homogeneous stew.
Yes, I noticed this as well. I was last writing up a landing page for our new studio. Emotion-filled. Telling a story. I sent it through Grok to improve it. It removed all of the character despite whatever prompt I gave. I'm not a great writer, but I think those rough edges are necessary to convey the soul of the concept. I think AI writing is better used for ideation and "what have I missed?" and then you write out the changes yourself.
> I think AI writing is better used for ideation<p>It shocks me when proponents of AI writing for ideation aren't concerned with *Metaphoric Cleansing* and *Lexical Flattening* (to use two of the terms defined in the article)<p>Doesn't it concern you that the explanation of a concept by the AI may represent only a highly distorted caricature of the way that concept is actually understood by those who use it fluently?<p>Don't get me wrong, I think that LLMs are very useful as a sort of search engine for yet-unknown terms. But once you know *how* to talk about a concept (meaning you understand enough jargon to do traditional research), I find that I'm far better off tracking down books and human authored resources than I am trying to get the LLM to regurgitate its training data.
I wonder how much of it could be prompted away.<p>For example the Anthropic Frontend Design skill instructs:<p>"Typography: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for distinctive choices that elevate the frontend's aesthetics; unexpected, characterful font choices. Pair a distinctive display font with a refined body font."<p>Or<p>"NEVER use generic AI-generated aesthetics like overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white backgrounds), predictable layouts and component patterns, and cookie-cutter design that lacks context-specific character." [1]<p>Maybe something similar would be possible for writing nuances.<p>[1] <a href="https://github.com/anthropics/skills/blob/main/skills/frontend-design/SKILL.md" rel="nofollow">https://github.com/anthropics/skills/blob/main/skills/fronte...</a>
I'd like to see some concrete examples that illustrate this - as it stands this feels like an opinion piece that doesn't attempt to back up its claims.<p>(Not necessarily disagreeing with those claims, but I'd like to see a more robust exploration of them.)
Have you not seen it any time you put any substantial bit of your own writing through an LLM, for advice?<p>I disagree pretty strongly with most of what an LLM suggests by way of rewriting. They're absolutely appalling writers. If you're looking for something beyond corporate safespeak or stylistic pastiche, they drain the blood out of everything.<p>The skin of their prose lacks the luminous translucency, the subsurface scattering, that separates the dead from the living.
Kaffee: Corporal, would you turn to the page in this book that says where the mess hall is, please?<p>Cpl. Barnes: Well, Lt. Kaffee, that's not in the book, sir.<p>Kaffee: You mean to say in all your time at Gitmo, you've never had a meal?<p>Cpl. Barnes: No, sir. Three squares a day, sir.<p>Kaffee: I don't understand. How did you know where the mess hall was if it's not in this book?<p>Cpl. Barnes: Well, I guess I just followed the crowd at chow time, sir.<p>Kaffee: No more questions.
It <i>is</i> an opinion piece. By a dude working as a "Professor of Pharmaceutical Technology and Biomaterials at the University of Ferrara".<p>It has all the hallmarks of not understanding the underlying mechanisms while repeating the common tropes. Quite ironic, considering what the author's intended "message" is. Jpeg -> jpeg -> jpeg bad. So llm -> llm -> llm must be bad, right?<p>It reminds me of the media reception of that paper on model collapse. "Training on llm generated data leads to collapse". That was in '23 or '24? Yet we're not seeing any collapse, despite models being trained mainly on synthetic data for the past 2 years. That's not how any of it works. Yet everyone has an opinion on how badly it works. Jesus.<p>It's insane how these kinds of opinion pieces get so upvoted here, while worthwhile research, cool positive examples and so on linger in new with one or two upvotes. This has ceased to be a technical subject, and has moved to muh identity.
Yeah, reading the other comments on this thread this is a classic example of that Hacker News (and online forums in general) thing where people jump on the chance to talk about a topic driven purely by the headline without engaging with the actual content.<p>(I'm frequently guilty of that too.)
Even if that isn't the case, isn't it a fact that AI labs don't want their models to be edgy in any creative way, and choose a middle way (Buddhism, so to speak)? Are there AI labs who are training their models to be maximally creative?
> Yet we're not seeing any collapse, despite models being trained mainly on synthetic data for the past 2 years.<p>Maybe because researchers learned from the paper to avoid the collapse? Just awareness alone often helps to sidestep a problem.
No one did what the paper actually proposed. It was a nothing burger in the industry. Yet it was insanely popular on social media.<p>Same with the "llms don't reason" from "Apple" (two interns working at Apple, but anyway). The media went nuts over it, even though it was littered with implementation mistakes and not worth the paper it was(n't) printed on.
This isn't new to AI. The same kind of thing happens in movie test screenings, or with autotune. If something is intended for a large audience, there's always an incentive to remove the weird stuff.
Race to the middle really sums up how I feel about AI.
This matches what I saw when I tried using AI as an editor for writing.<p>It wanted to replace all the little bits of me that were in there.
The original title of the article is: "Why AI writing is so generic, boring, and dangerous"<p>Why was the title of the link on Hacker News updated to remove the term "Dangerous"?<p>The term was in the link on Hacker News for the first hour or so that this post was live.
I wonder why AI labs have not worked on improving the quality of the text outputs. Is this, as the author claims, a property of the LLMs themselves? Or is there simply not much incentive to create the best writing LLM?
The argument is that the best writing is the unexpected, while an LLM's function is to deliver the expected next token.
Even more precisely, human writing contains unpredictability that is either more or less intentional (what might be called author's intent), as well as much that is more subconsciously added (what we might call quirks or imprinted behavior).<p>The first requires intention, something that, as far as we know, LLMs simply cannot truly have or express. The second is something that can be approximated. Perhaps very well, but a mass of people using the same models with the same approximations still leads to loss of distinction.<p>Perhaps LLMs that were fully individually trained could sufficiently replicate a person's quirks (I dunno), but that's hardly a scalable process.
Yeah, that makes banana.
I remember an article a few weeks back[1] which mentioned the current focus is improving the technical abilities of LLMs. I can imagine many (if not most) of their current subscribers are paying for the technical ability as opposed to creative writing.<p>This also reminded me that on OpenRouter, you can sort models by category. The ones tagged "Roleplay" and "Marketing" are probably going to have better writing compared to models like Opus 4 or ChatGPT 5.2.<p>[1]: <a href="https://www.techradar.com/ai-platforms-assistants/sam-altman-admits-openai-screwed-up-the-writing-quality-on-chatgpt-5-2-and-promises-future-versions-wont-neglect-it" rel="nofollow">https://www.techradar.com/ai-platforms-assistants/sam-altman...</a>
I mean there's tons of better-writing tools that use AI like Grammarly etc. For actual general-purpose LLMs, I don't think there's much incentive in making it write "better" in the artistic sense of the word... if the idea is to make the model good at tasks in general and communicate via language, that language should sound generic and boring. If it's too artistic or poetic or novel-like, the communication would appear a bit unhinged.<p>"Update the dependencies in this repo"<p>"Of course, I will. It will be an honor, and may I say, a beautiful privilege for me to do so. Oh how I wonder if..." vs. "Okay, I'll be updating dependencies..."
I wish it would just say "k, updated xyz to 1.2.3 in Cargo.toml" instead of the entire pages it likes to output. I don't want to read all of that!
I mean, no one is asking for artistic writing, just not some obvious AI slop. The fact that we all can now easily determine that some text has been written / edited by AI is already an issue. No amount of prompting can help.
Yeah but that's not what I am saying. I am saying its default writing style is for communicating with the user, not producing content/text, hence it has that distinctive style we all recognise. If you want AI writing that's not slop, there are tools that are trying to do that, but the default LLM writing style is unlikely to change imo.
That's like asking why McDonald's doesn't improve the quality of their hamburger. They can, but only within the bounds of mass produced cheap crap that maximizes profit. Otherwise they'd be a fundamentally different kind of company.
How much money would it take for me to take an open weight model, treat it nice, and go have some fun? Maybe some thousands, right?
Couldn't you simply increase the temperature of the model to somewhat mitigate this effect?
I kind of think of that as just increasing the standard deviation. It's been a while since I experimented with this, but I remember trying a temp of 1 and the output was gibberish, like base64 gibberish. So something like 0.5 doesn't necessarily seem to solve this problem, it just flattens the distribution and makes the output less coherent, with rarer tokens, but still the same underlying distribution.
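A small sketch of the "same underlying distribution" point, assuming the usual softmax-with-temperature formulation where logits are divided by the temperature before normalizing. The logits below are made up for illustration, not taken from any real model. In this formulation, temperatures above 1 flatten the distribution and temperatures below 1 sharpen it, but the ranking of tokens never changes.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs: List[float]) -> float:
    """Shannon entropy in bits; higher means a flatter distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Made-up logits for four candidate tokens, from "expected" to "surprising".
logits = [5.0, 2.0, 1.0, 0.5]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], round(entropy_bits(probs), 3))
```

Raising the temperature moves mass toward the tail, but the token the tuned model prefers most is still the most likely one at every setting, which is one way to read "the distribution is already skewed".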
you have to know that your "simply" is carrying too much weight. here are some examples of why just temperature is not enough; you need to run active world models <a href="https://www.latent.space/p/adversarial-reasoning" rel="nofollow">https://www.latent.space/p/adversarial-reasoning</a>
When applied to insightful writing, that is much more likely to dull the point rather than preserve or sharpen it.
Could we invert a sign somewhere and get the opposite effect?<p>(Obviously a different question from "is an AI lab willing to release that publicly” ;)
I think they can fix all that but they can't fix the fact that the computer has no intention to communicate. They could imbue it with agency to fix that too, but I much prefer it the way things are.
Because you simply can't engineer creativity. Maybe you can describe where it comes from, in a circuitous, abstract way with mathematics (and ultimately run face first into <i>ħ</i> and then run in circles for eternity). But to engineer it, you'd have to start over from the first principles of the stuff of the cosmos. One's a map and the other the territory.
Those transformations happen to mirror what happens to human intelligence when you take antipsychotics. Please know the risks before taking them. They are innumerable and generally irreversible.
> Semantic ablation is the algorithmic erosion of high-entropy information. Technically, it is not a "bug" but a structural byproduct of greedy decoding and RLHF (reinforcement learning from human feedback).<p>> Domain-specific jargon and high-precision technical terms are sacrificed for "accessibility." The model performs a statistical substitution, replacing a 1-of-10,000 token with a 1-of-100 synonym, effectively diluting the semantic density and specific gravity of the argument.<p>> The logical flow – originally built on complex, non-linear reasoning – is forced into a predictable, low-perplexity template. Subtext and nuance are ablated to ensure the output satisfies a "standardized" readability score, leaving behind a syntactically perfect but intellectually void shell.<p>What a fantastic description of the mechanisms by which LLMs erase and distort intelligence!<p>I agree that AI writing is generic, boring and dangerous. Further, I think someone could only feel otherwise if they don't have a genuine appreciation for writing.<p>I feel strongly that LLMs are positioned as an anti-literate technology, currently weaponized by imbeciles who have not and will never know the joy of language, and who intend to extinguish that joy for any of those around them who can still perceive it.
As a writer who has been published many times and edited many other writers for publication... It seems like AI can't make stylistic determinations. It is generally good with spelling and grammar but the text it generates is very homogeneous across formats. It's readable but it's not good, and always full of fluff like an online recipe harvesting clicks. It's kind of crap really. If you just need filler it's ok, but if you want something pleasant you definitely still need a human.
Semantic ablation... that's some technobabble.
Going off search results, it seems to be a new coinage. I found mostly references to TFA, along with an (ironically obviously AI-written) guide with suggestions for getting LLMs to avoid the issue (just generic "traditional" advice for tuning their output, really). The guide was apparently published today, and I imagine that it's a deliberate response to TFA. But FWIW the term "semantic ablation" does seem to me like something that newer models could invent.<p>At any rate, it seems to me like a reasonable label for what's described:<p>> Semantic ablation is the algorithmic erosion of high-entropy information. Technically, it is not a "bug" but a structural byproduct of greedy decoding and RLHF (reinforcement learning from human feedback).<p>> ...<p>> When an author uses AI for "polishing" a draft, they are not seeing improvement; they are witnessing semantic ablation.<p>The metaphor is very apt. Literal polishing is removal of outer layers. Compared to the near-synonym "erosion", "ablation" connotes a deliberate act (ordinarily I would say "conscious", but we are talking about LLMs here). Often, that which is removed is the nuance of near-synonyms (there is no pause to consider whether the author intended that nuance). I don't know if the "character" imparted by broader grammatical or structural choices can be called "semantic", but that also seems like a big part of what goes missing in the "LLM house style".<p>Bluntly: getting AI to "improve" writing, as a fully generic instruction, is naturally going to pull that writing towards how the AI writes by default. Because <i>of course</i> the AI's model of "writing quality" considers that style to be "the best"; <i>that's why it uses it</i>. (Even "consider" feels like anthropomorphizing too much; I feel like I'm hitting the limits of English expressiveness here.)
Meh. Semantic Ablation - but toward a directed goal. If I say "How would Hemingway have said this, provided he had the same mindset he did post-war while writing for Collier's?"<p>Then the model will look for clusters that don't fit what the model considers to be Hemingway/Collier's/Post-War and suggest in that fashion.<p>"edit this" -> blah<p>"imagine Tom Wolfe took a bunch of cocaine and was getting paid by the word to publish this after his first night with Aline Bernstein" -> probably less blah
the word choice here is so obtuse as to trigger my radar for "is this some kind of parody where this itself was AI generated". it appears to be entirely serious, which is disappointing, it could have been high art.<p>the term TFA is looking for is mode collapse <a href="https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse" rel="nofollow">https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-...</a> and the author could herself learn to write more clearly.
> What began as a jagged, precise Romanesque structure of stone is eroded into a polished, Baroque plastic shell<p>Not to detract from the overall message, but I think the author doesn't really understand Romanesque and Baroque.<p>(as an aside, I'd most likely associate Post-Modernism as an architectural style with the output of LLMs - bland, regurgitative, and somewhat incongruous)
As someone long involved in software development, can we call this "best practices" instead of something like "semantic ablation" that nobody understands?
I think you might be missing the point of the article.<p>I agree that the term "semantic ablation" is difficult to interpret.<p>But the article describes three mechanisms by which LLMs consistently erase and distort information (Metaphoric Cleansing, Lexical Flattening, and Structural Collapse).<p>The article does not describe best practices; it's a critique of LLM technology and an analysis of the issues that result from using this technology to generate text to be read by other people.
> The model performs a statistical substitution, replacing a 1-of-10,000 token with a 1-of-100 synonym<p>Do we see this in programming too? I don't think so? Unique, rarely used API methods aren't substituted the same way when refactoring. Perhaps that could give us a clue on how to fix that?
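To put a rough number on the quoted substitution, here is a small sketch. It assumes (my reading, not something the article states) that "1-of-10,000" and "1-of-100" can be treated as probabilities 1/10,000 and 1/100, and it uses the standard information-theoretic surprisal, -log2(p), as the measure of how much a token "says".

```python
import math


def surprisal_bits(probability: float) -> float:
    """Information content of an event with the given probability, in bits."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# Assumption: read "1-of-N" as a probability of 1/N.
rare = surprisal_bits(1 / 10_000)
common = surprisal_bits(1 / 100)

print(f"rare token:   {rare:.2f} bits")
print(f"common token: {common:.2f} bits")
print(f"lost in the swap: {rare - common:.2f} bits")
```

Since 10,000 is 100 squared, the rare token carries exactly twice the surprisal of the common one under this reading (roughly 13.3 bits versus 6.6), so the swap halves the information carried at that position.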
Nonsense. I’ve written bland prose for a story and AI made it much better by revising it with a prompt such as this: “Make the vocabulary and grammar more sophisticated and add in interesting metaphors. Rewrite it in the style of a successful literary author.”<p>Etc.