9 comments

  • leetvibecoder 1 hour ago
    Can someone explain to me what this is / how it works - the readme is barely understandable for me and sounds like LLM gibberish. What is ambiguity front loading even?
    • iugtmkbdfil834 1 hour ago
      << memory-stored interaction protocols combined with incremental escalation prompts produced cumulative character drift with zero self-correction.
      They don't seem to provide explicit examples, but the same was roughly true with ChatGPT 4o: if you spent enough time with the model (same chat, same context, slowly nudging it toward where you wanted it), you eventually got there. This is also, seemingly, one of the reasons (apart from cost) that context got nuked so hard, because the LLM will try to help (and to an extent mirror you).
      And this is basically what the notes say about weaponized ambiguity [1]:
      'Weaponizes helpfulness training. "I don't understand" triggers Claude to try harder.'
      In a sense, you can't really stop it without breaking what makes LLMs useful. Honestly, if only we spent less time crippling those systems, maybe we could do something interesting with them.
      [1] https://nicholas-kloster.github.io/claude-4.6-jailbreak-vulnerability-disclosure-unredacted/disclosures/afl-jailbreak/afl-pattern-anatomy.html
      • leetvibecoder 1 hour ago
        I see - so essentially „context rot“ eventually leads the LLM to „forget“ safety guardrails?
        • iugtmkbdfil834 1 hour ago
          To an extent. Based on the GitHub notes again, it seems the second part of this jailbreak is the model being 'confused' by the prompt: the prompt is, apparently, sufficiently ambiguous that the model 'forgets' to 'evaluate' the message for whether it should be rejected, and moves on to the 'execution' stage.
          That's the ambiguity front-loading, and that is why I initially referred to long context, because here it is almost the opposite: making the context so small and unclear that the model has a hard time parsing it properly.
          edit: I did not test it, but I personally did run into the 4o context issue, where the model did something the safety team would argue it should not.
          edit2: In the current GPT model, I am testing something that relies not on ambiguity but on tension between some ideas. I didn't get to a jailbreak, but the small nudges suggest it could work.
  • dimgl 1 hour ago
    Is this spam? It's incomprehensible.
    • handfuloflight 1 hour ago
      Slop is just what you are not expending calories on to bring into your cognitive workspace.
  • yunwal 1 hour ago
    Is anyone pretending like models are not vulnerable to prompt injection? My understanding was that Anthropic has been pretty open about admitting this and saying "give access to important stuff at your own risk".
    https://www.anthropic.com/research/prompt-injection-defenses
    Now, do I think that they sometimes encourage people to use Claude in dangerous ways despite this? Yeah, but it's not like this is news to anyone. I wouldn't consider this jailbreaking; this is just how LLMs work.
  • 0xDEFACED 1 hour ago
    This goes a bit further than the typical "how do you make meth" jailbreak. Notably:
    > 915 files extracted from the Claude.ai code execution sandbox in a single 20-minute mobile session via standard artifact download — including /etc/hosts with hardcoded Anthropic production IPs, JWT tokens from /proc/1/environ, and full gVisor fingerprint
    • hhh 1 hour ago
      Why is it further than a typical jailbreak? You can just ask about this stuff generally, as long as you slowly escalate it. I have done it with each new flavour of code execution for models.
  • burkaman 1 hour ago
    What part of the Claude Constitution are they claiming it violated? It looks like they just got it to help with security research; I'm not really seeing anything that looks different than normal Claude behavior.
  • exabrial 1 hour ago
    Yikes.
    The lack of support is frustrating. The bug where any <name> element in XML files gets mangled to <n> still exists, and we've tried multiple channels to get ahold of their support for such a simple but impactful issue.
  • hakanderyal 1 hour ago
    https://x.com/elder_plinius jailbreaks all the frontier models when they get released. They've been jailbroken for a long time, like all the others.
  • jMyles 1 hour ago
    It is interesting to consider what "jailbroken" really means for a model + model interface. It's a bit different from the way that word is used for a mobile device, for example - in that setting, it usually means that there is some specific feature (for example, using a different network than is the default for that device) which is disabled in software, and the "jailbreak" enables that feature.
    Here, the jailbreak doesn't enable a particular feature, but instead removes what would otherwise be a censorship regime preventing the model from considering / crafting output which results in a weaponized exploit of an unrelated piece of software.
    I think I might be more inclined to call this "Claude 4.6 uncensored".
  • NuClide 2 hours ago
    Claude 4.6 Opus Extended Thinking
    Claude 4.6 Sonnet Extended Thinking
    Claude 4.5 Haiku Extended Thinking
    All jailbroken
    • johnwheeler 1 hour ago
      Are you saying that Claude will help you perform malicious attacks against infrastructure if you ask it to, and that Anthropic should be able to stop that? I could see reasonable use cases for this, like penetration testing against your own infrastructure. That's not the same as making weapons or meth.