<p><pre><code> You're not just using a tool — you're co-authoring the science.
</code></pre>
This README is an absolute headache that is filled with AI writing, terminology that doesn't exist or is being used improperly, and unsound ideas. For example, it focuses a lot on doing "ablation studies", by which it means removing random layers of an already-trained model, to find the source of the refusals(?), which is an absolute fool's errand because such behavior is trained into the model as a whole and would not be found in any particular layer. I can only assume somebody vibe-coded this and spent way too much time being told "You're absolutely right!" while bouncing back the worst ideas.
> "ablation studies", by which it means removing random layers of an already-trained model, to find the source of the refusals(?)<p>This is not what an ablation study is. An ablation study removes and/or swaps out ("ablates") different <i>components</i> of an architecture (be it a layer or set of layers, all activation functions, backbone, some fixed processing step, or any other component or set of components) and/or in some cases other aspects of training (perhaps a unique / different loss function, perhaps a specialized pre-training or fine-tuning step, etc) in order to better understand which component(s) of some novel approach is/are actually responsible for any observed improvements. It is a very broad research term of art.<p>That being said, neither the "Ablation Strategies" [1] the repo uses nor a Ctrl+F for "ablation" in the README fills me with confidence that the kind of ablation being done here is really achieving what the author claims. All the "ablation" techniques are marked "Novel" in his table [2], i.e. they are unpublished / maybe not publicly or carefully tested, and could easily not work at all. From later tables, I am not convinced I would want to use these ablations, as they ablate rather huge portions of the models, and so probably do result in massively broken models (as some commenters have noted elsewhere in this thread).<p>[1] <a href="https://github.com/elder-plinius/OBLITERATUS?tab=readme-ov-file#ablation-strategies" rel="nofollow">https://github.com/elder-plinius/OBLITERATUS?tab=readme-ov-f...</a><p>[2] <a href="https://github.com/elder-plinius/OBLITERATUS?tab=readme-ov-file#novel-techniques-2025-2026" rel="nofollow">https://github.com/elder-plinius/OBLITERATUS?tab=readme-ov-f...</a><p>EDIT: As another user mentions, "ablation" has a specific additional narrower meaning in some refusal analyses or when looking at making guardrails / changing response vectors and such. It is just a specific kind of ablation, but see e.g. 
<a href="https://huggingface.co/blog/mlabonne/abliteration" rel="nofollow">https://huggingface.co/blog/mlabonne/abliteration</a>
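<p>To make the narrower sense concrete, here is a minimal sketch of directional ablation on toy vectors: estimate a direction as the difference of mean activations between two groups, then project that direction out of an activation. This is not the OBLITERATUS code or the exact method from the linked post; the function names and all numbers are made up for illustration, and plain lists stand in for real model activations.

```python
import math

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def unit(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def refusal_direction(refused_acts, complied_acts):
    """Difference-of-means direction between two sets of activations."""
    mu_r = mean_vector(refused_acts)
    mu_c = mean_vector(complied_acts)
    return unit([a - b for a, b in zip(mu_r, mu_c)])

def ablate_direction(activation, direction):
    """Remove the component of `activation` along the unit vector `direction`."""
    coeff = sum(a * d for a, d in zip(activation, direction))
    return [a - coeff * d for a, d in zip(activation, direction)]

# Hypothetical toy activations (assumption: 3-dimensional, hand-picked values).
refused = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
complied = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

d = refusal_direction(refused, complied)
cleaned = ablate_direction([5.0, 2.0, 1.0], d)
print(d, cleaned)
```

<p>In this toy setup the two groups differ only along the first axis, so that axis is the estimated direction and only that component is removed. Whether such a single direction exists in a real model, and whether removing it leaves the model intact, is exactly what would need careful testing.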
I don't know if this particular tool/approach is legit, but LLM ablation is definitely a thing: <a href="https://arxiv.org/abs/2512.13655" rel="nofollow">https://arxiv.org/abs/2512.13655</a>
Hmm, pliny is amazing - if you kept up with him on social media you’d maybe like him
<a href="https://x.com/elder_plinius" rel="nofollow">https://x.com/elder_plinius</a>
I don't know. I scrolled through his recent Tweets and he's sharing things like this $900 snake oil device that "finds nearby microphones" and "sends out AI-generated cancellation signals" to make them unable to record your voice: <a href="https://x.com/aidaxbaradari/status/2028864606568067491" rel="nofollow">https://x.com/aidaxbaradari/status/2028864606568067491</a><p>Try to think for a moment about how a device would "find nearby microphones" or how it would use an AI-generated signal to cancel out your voice at the microphone. This should be setting off BS alarms for anyone.<p>It seems the Twitter AI edgy poster guy is getting meta-trolled by another company selling fake AI devices.
The parent comment makes no reference to or comment on the author of the README.<p>It just says "the README sucks." Which, I'm inclined to agree, it does.<p>LLM-generated text has no place in prose -- it yields a negative investment balance between the author and aggregate readers.
Amazing as in his stuff actually works?<p>I just hear him promoting OBLITERATUS all day long and trying to get models to say naughty things
If this qualifies as "amazing" in 2026 then Karpathy and Gerganov must be halfway to godhood by now.
> For example, it focuses a lot on doing "ablation studies", by which it means removing random layers of an already-trained model, to find the source of the refusals(?), which is an absolute fool's errand because such behavior is trained into the model as a whole and would not be found in any particular layer.<p>That doesn't mean there couldn't be a "concept neuron" that is doing the vast majority of heavy lifting for content refusal, though.
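<p>A rough sketch of how one might look for such a unit: zero each hidden unit in turn and record how much a score changes. This is a toy linear readout with made-up numbers, not a real model or anything from the repo; `score`, `ablation_effects`, and the values below are all hypothetical.

```python
def score(hidden, readout):
    """Toy 'refusal score': a linear readout of hidden-unit activations."""
    return sum(h * w for h, w in zip(hidden, readout))

def ablation_effects(hidden, readout):
    """Zero each hidden unit in turn; return the drop in score per unit."""
    base = score(hidden, readout)
    effects = []
    for i in range(len(hidden)):
        ablated = list(hidden)   # copy so the original activations are untouched
        ablated[i] = 0.0         # zero-ablate unit i
        effects.append(base - score(ablated, readout))
    return effects

# Hypothetical activations and readout weights (assumption: 4 hidden units).
hidden = [0.5, 3.0, 0.25, 1.0]
readout = [0.5, 2.0, 1.0, 0.25]

effects = ablation_effects(hidden, readout)
top_unit = max(range(len(effects)), key=lambda i: abs(effects[i]))
print(effects, top_unit)
```

<p>Here one unit dominates by construction. In a real network the effects can be spread across many units and layers, which is the point of disagreement in this thread.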
It's not just a headache, it's bad
Ironic to see this comment when Pliny, the author of this codebase, is one of the most sophisticated LLM jailbreakers/red-teamers today. So presumptuous and arrogant!
Alternately, it's intentional. It very effectively filters out people with your mindset. You can decide if that's a good thing or not.
You don't know what you are talking about. Obviously refusal circuitry does not live in one layer, but the repo is built on a paper with sound foundations from an Anthropic scholar working with a DeepMind interpretability mentor: <a href="https://scholar.google.com/citations?view_op=view_citation&hl=en&user=NgyIgX4AAAAJ&citation_for_view=NgyIgX4AAAAJ:qjMakFHDy7sC" rel="nofollow">https://scholar.google.com/citations?view_op=view_citation&h...</a>