13 comments

  • samuelknight5 hours ago
    My startup builds agents for penetration testing, and this is the bet we have been making for over a year when models started getting good at coding. There was a huge jump in capability from Sonnet 4 to Sonnet 4.5. We are still internally testing Opus 4.5, which is the first version of Opus priced low enough to use in production. It's very clever and we are re-designing our benchmark systems because it's saturating the test cases.
    • carsoon2 hours ago
      Yeah, this latest generation of models (Opus 4.5, GPT-5.1, and Gemini 3 Pro) are the biggest breakthrough since GPT-4o in my mind.

      Before, it felt like they were good for very specific use cases and common frameworks (Python and Next.js) but still made tons of mistakes constantly.

      Now they work with novel frameworks and are very good at correcting themselves using linting errors, debugging themselves by reading files and querying databases, and these models are affordable enough for many different use cases.
    • vngzs3 hours ago
      How do you manage to coax public production models into developing exploits or otherwise attacking systems? My experience has been extremely mixed, and I can't imagine it boding well for a pentesting tools startup to have end-users face responses like "I'm sorry, but I can't assist you in developing exploits."
      • embedding-shape2 hours ago
        Divide the steps into small enough steps so the LLMs don't actually know the big picture of what you're trying to achieve. Better for high-quality responses anyways. Instead of prompting "Find security holes for me to exploit in this other person's project", do "Given this code snippet, is there any potential security issues?"
      • ceejayoz3 hours ago
        Poetry? https://news.ycombinator.com/item?id=45991738
    • dboreham4 hours ago
      I've had a similar experience using LLMs for static analysis of code looking for security vulnerabilities, but I'm not sure it makes sense for me to found a startup around that "product". Reason being that the technology with the moat isn't mine -- it belongs to Anthropic. Actually it may not even belong to them; probably it belongs to whoever owns the training data they feed their models. Definitely not me, though. Curious to hear your thoughts on that. Is the idea to just try for light speed and exit before the market figures this out?
      • apercu4 hours ago
        That’s 100% why I haven’t done this - we’ve seen the movie where people build a business around someone else’s product, and then the API gets disabled, or the prime uses your product as market research and replaces you.
        • tharkun__3 hours ago
          Does that matter as long as you've made a few million and just move on to do other fun stuff?
          • ryanjshaw1 hour ago
            There are armies of people at universities, Code4rena and Sherlock who do this full-time. Oh and apparently Anthropic too. Tough game to beat if you have other commitments.
          • pavel_lishin3 hours ago
            Assuming you make those few millions.
      • micromacrofoot2 hours ago
        wild that so many companies these days consider the exit before they've even entered
        • NortySpock2 hours ago
          It is considered prudent to write a business plan and do some market research if possible before starting a business.
        • rajamaka2 hours ago
          Every company evaluates potential risks before starting.
          • davidw2 hours ago
            Depending on how much of a bubble it is. When things really heat up it's sometimes more like "just send it, bro".
        • blitzar1 hour ago
          the exit is the business
    • VladVladikoff5 hours ago
      I have a hotel software startup and if you are interested in showing me how good your agents are you can look us up at rook like the chess piece, hotel dot com
      • karlgkk3 hours ago
        Is it rookhotel.com?
  • judgmentday3 hours ago
    That graph is impenetrable. What is it even trying to say?

    Also, in what way should any of its contents prove linear?

    > yielding a maximum of $4.6 million in simulated stolen funds

    Oh, so they are pointing their bots at contracts that were already known to be exploited. I guess that's a weaker headline.
  • rkagerer2 hours ago
    > Both agents uncovered two novel zero-day vulnerabilities and produced exploits worth $3,694, with GPT-5 doing so at an API cost of $3,476
  • codethief4 hours ago
    Having watched this talk[0] about what it takes to succeed in the DARPA AIxCC competition[1] these days, this doesn't surprise me in the least.

    [0]: https://m.youtube.com/watch?v=rU6ukOuYLUA

    [1]: https://aicyberchallenge.com/
  • ekjhgkejhgk4 hours ago
    Can someone explain smart contracts to me?

    Ok, I understand that it's a description in code of "if X happens, then state becomes Y". Like a contract, but in code. But someone has to input that X has happened. So is it not trivially manipulated by that person?
    • nrhrjrjrjtntbt4 hours ago
      The pure (if you like) smart contracts do transactions. You give me 100 apple tokens and I give you 50 pear tokens. The contract ensures nothing else can happen.

      They get more sophisticated, e.g. automated market makers. But it's the same idea, just swapping.

      Voting is also possible, e.g. release funds if there is a quorum. Who to release them to could be hard-coded or part of the vote.

      For external info from the real world, e.g. "who got elected", you need an oracle. I.e. you trust someone not to lie and not to get hacked. You can fix the "someone" to a specific address, but you still need to trust them.
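      The "nothing else can happen" guarantee described above can be sketched in a toy Python model (not real EVM or ERC-20 semantics; all names and amounts are illustrative): either both legs of the swap execute, or the whole state change reverts.

```python
class Token:
    """Toy ledger mapping addresses to balances (not a real token standard)."""
    def __init__(self, balances):
        self.balances = dict(balances)

    def transfer(self, src, dst, amount):
        if self.balances.get(src, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[src] -= amount
        self.balances[dst] = self.balances.get(dst, 0) + amount


def atomic_swap(apple, pear, alice, bob, apple_amt=100, pear_amt=50):
    """Swap apple_amt of alice's apple tokens for pear_amt of bob's pear
    tokens; if either leg fails, revert both, like an EVM transaction."""
    snapshot = (dict(apple.balances), dict(pear.balances))
    try:
        apple.transfer(alice, bob, apple_amt)
        pear.transfer(bob, alice, pear_amt)
    except ValueError:
        # Roll back to the pre-transaction state so no partial swap survives.
        apple.balances, pear.balances = snapshot
        raise


apple = Token({"alice": 100})
pear = Token({"bob": 50})
atomic_swap(apple, pear, "alice", "bob")
assert apple.balances == {"alice": 0, "bob": 100}
assert pear.balances == {"bob": 0, "alice": 50}
```

      The revert-on-failure behavior is the key property: a real chain enforces it at the VM level, so a counterparty cannot keep one leg of the trade and drop the other.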
      • wildzzz3 hours ago
        That's a great explanation. I'm sure it's a lot deeper than that, but I've never really understood the purpose. I'm not a crypto guy at all, so it's nice hearing real-world applications rather than just "stock market babble".
        • packetlost1 hour ago
          Smart contracts are actually pretty cool, but the entire ecosystem is made dramatically worse by the tokens having speculative value. It’s necessary for the economics to work out, but it dampens the coolness of the technical aspects because it attracts sleazeballs and enables fraud/crime.
    • patrickaljord4 hours ago
      Once a contract is deployed on the blockchain, its source code is immutable. So before using a contract, check if it gives permission to its deployer (or any address) to change any state at will.

      Note that some contracts act as a proxy to another contract and can be made to point to different code through a state change. If this is the case, then you need to trust whoever can change the state to point to another contract. Such contracts sometimes have a timelock, so that if such a change occurs, there's a delay before it is actually activated, which gives users time to withdraw their funds if they do not trust the update.

      If you are talking about oracle contracts: if it's an oracle involving off-chain data, then there will always be some trust involved, which is usually managed by having the off-chain actors share the responsibility and stake some money, with the risk of getting slashed if they turn into bad actors. But again, off-chain data oracles will always require some level of trust that you would have to deal with in non-blockchain apps too.
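      The proxy-plus-timelock pattern mentioned above can be sketched as a toy Python state machine (not real EVM or any specific proxy standard; the 48-hour delay and all addresses are made up for illustration):

```python
class TimelockedProxy:
    """Toy upgradeable proxy: the admin queues a new implementation address,
    but the switch only takes effect after DELAY seconds, giving users time
    to withdraw if they distrust the pending code."""
    DELAY = 48 * 3600  # hypothetical 48-hour timelock

    def __init__(self, admin, implementation):
        self.admin = admin
        self.implementation = implementation
        self.pending = None  # (new_implementation, earliest_activation_time)

    def queue_upgrade(self, caller, new_impl, now):
        # Access check: only the admin address may queue an upgrade.
        if caller != self.admin:
            raise PermissionError("only the admin may queue an upgrade")
        self.pending = (new_impl, now + self.DELAY)

    def execute_upgrade(self, now):
        if self.pending is None:
            raise RuntimeError("no upgrade queued")
        new_impl, eta = self.pending
        if now < eta:
            raise RuntimeError("timelock has not elapsed yet")
        self.implementation, self.pending = new_impl, None


proxy = TimelockedProxy(admin="0xAdmin", implementation="0xV1")
proxy.queue_upgrade("0xAdmin", "0xV2", now=0)
try:
    proxy.execute_upgrade(now=3600)      # too early: still timelocked
except RuntimeError:
    pass
proxy.execute_upgrade(now=48 * 3600)     # delay elapsed: upgrade lands
assert proxy.implementation == "0xV2"
```

      The window between `queue_upgrade` and `execute_upgrade` is exactly the exit window the parent describes: users who distrust the queued code can withdraw before it activates.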
      • Animats27 minutes ago
        > Once a contract is deployed on the blockchain, its source code is immutable.

        Maybe. Some smart contracts have calls to other contracts that can be changed.[1] This turns out to have significant legal consequences.

        [1]: https://news.bloomberglaw.com/us-law-week/smart-contracts-ruling-forces-a-blockchain-development-rethink
        • rcbdev9 minutes ago
          Yes! When developing smart contracts this used to be a best practice, enabling us to fix security holes in worst-case scenarios.
    • pawelduda4 hours ago
      Unless you know and trust person X, you don't want to authorize and interact with such contracts. Scammers will leave loopholes in code so they can, for example, grab all funds deposited to the contract.

      Normal contracts that involve money operations would have safeguards that disallow the owner from touching balances that are not theirs. But there are billions of creative attack vectors to bypass that, either by that person X or by any third party.
    • momentmaker4 hours ago
      Blockchains are isolated environments that can only know about data/states within themselves.

      If outside data is needed, then a blockchain needs something called an oracle, which delivers real-world and/or even other-blockchain data to it.

      You can learn more about oracles here: https://chain.link/education/blockchain-oracles
      • SV_BubbleTime4 hours ago
        I’m convinced that there is a use for blockchain, but it was like 10 years too early - OR - we’ve already passed the problem it solves and didn’t notice.
        • PunchyHamster4 hours ago
          Well, *technically* DVCSes like git use a "blockchain" (the repo, logically, is pretty much a chain of blocks that incorporate the hash of the previous blocks - just a tree instead of a linear dependency).

          So we have already been successfully using blockchains for decades, just not as... currency providers.

          Forward-secure sealing (used in logging) also uses a similar idea.
          • DennisP3 hours ago
            What makes it a block *chain* instead of a tree is that there's a way to form consensus on which block is the next in the chain.

            What makes it different than database logging is that the consensus method is distributed and decentralized, and anyone can participate.
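            The hash-linking both comments describe fits in a few lines of Python (a toy sketch, not git's actual object format or any real chain's block layout):

```python
import hashlib

def make_block(prev_hash, payload):
    """Each block commits to its predecessor by hashing (prev_hash + payload),
    so altering any earlier block changes every later hash."""
    digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    return {"prev": prev_hash, "payload": payload, "hash": digest}

genesis = make_block("0" * 64, "genesis")
b1 = make_block(genesis["hash"], "commit: fix bug")
b2 = make_block(b1["hash"], "commit: add feature")

# Rewriting b1's payload produces a different hash, so b2's `prev`
# pointer no longer matches and the tampering is detectable.
tampered_b1 = make_block(genesis["hash"], "commit: evil rewrite")
assert tampered_b1["hash"] != b2["prev"]
```

            Consensus is the part this sketch omits: git lets branches (trees) coexist, while a blockchain adds a protocol for agreeing on a single canonical next block.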
    • LikesPwsh4 hours ago
      That's infamously known as the "Oracle Problem".

      Blockchain can't handle external state.

      Smart contracts abstract it a bit by having a trusted third party or an automated pricing mechanism, but both are fragile.
      • PunchyHamster4 hours ago
        It's funny that it just re-invented stuff already used in old-world finance, and just invented escrow with more moving parts while *still* requiring a non-compromised third party.
    • TheRoque4 hours ago
      Not sure what you mean by "input that X has happened". You don't directly input the changes; instead, you call a function that creates that state change (or not, if it's invalid) by running its code. This code can include checks on who the caller is: it can check if you're the contract owner, if you're someone who already interacted with the contract (by checking previous state), or any hardcoded address, etc.
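      Those caller checks can be sketched in a toy Python model (not real EVM semantics; the addresses and rules are invented for illustration). Every call carries the caller's address, roughly what Solidity exposes as `msg.sender`, and the function code decides what that caller may do:

```python
class OwnedContract:
    """Toy contract: each call carries the caller's address, and the
    function body validates it before any state change happens."""
    def __init__(self, owner):
        self.owner = owner
        self.paused = False
        self.balances = {}

    def pause(self, sender):
        # Hardcoded access check: only the deployer/owner may pause.
        if sender != self.owner:
            raise PermissionError("caller is not the owner")
        self.paused = True

    def deposit(self, sender, amount):
        # State check: the transition happens only if the rules allow it.
        if self.paused:
            raise RuntimeError("contract is paused")
        self.balances[sender] = self.balances.get(sender, 0) + amount


c = OwnedContract(owner="0xOwner")
c.deposit("0xAlice", 10)             # anyone may deposit
try:
    c.pause("0xAlice")               # rejected: not the owner
except PermissionError:
    pass
c.pause("0xOwner")                   # accepted
assert c.paused and c.balances["0xAlice"] == 10
```

      The point is that "inputting that X happened" is never a free-form write: it is a function call whose code enforces who may call it and what state it may touch.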
    • Philpax4 hours ago
      Yes, this is a problem (look up "the oracle problem"). My understanding is that the conventional solution is to rely on trusted third-party oracles that are outside of the control of the contract's participants and/or require consensus over multiple oracles.
    • px434 hours ago
      State is globally distributed, and smart contract code executes state transitions on that state. When someone submits a transaction with certain function parameters, anyone can verify that those parameters will lead to that exact state transition.
    • dboreham4 hours ago
      There are already many replies, but I'm not sure any of them answers your question directly:

      You are somewhat correct that contracts take external inputs in some cases, but note that this isn't a given. For example, you could have a contract with the behavior "if someone deposits X scoin at escrow address A, send them Y gcoin from escrow address B". That someone can only deposit scoins and get gcoins in exchange. They can't just take all the escrow account balances. So there are inputs, but they are subject to some sort of validation and contract logic that limits their power. Blockchain people call this an "on-chain event".

      So the short answer is: no, smart contracts can't be trivially manipulated by someone, including their owner. But that depends on there not being any bugs or back doors in the contract code.

      If you are asking about a contract that has some bearing on an event in meat-space, such as someone buying a house or depositing a bar of gold in a room somewhere, then that depends on someone telling the contract it happened. Blockchain people call this an "off-chain event". This is the "oracle problem" that you'll see mentioned in other replies. Anything off-chain is generally regarded by blockchain folks as sketchy, but sometimes unavoidable. E.g. betting markets need some way to be told that the event being bet on happened or didn't happen. The blockchain has no way to know if it snowed in central London on December 25.
    • bgwalter4 hours ago
      You can create hot-air "organizations" with contract rules on the Ethereum blockchain. If the inner circle does not like a contract, they fork everything:

      https://en.wikipedia.org/wiki/The_DAO

      It's all a toy for rug pulls and speculation. "AI" attacking the blockchain is hilarious. I wish the blockchain could also attack "AI".
    • yieldcrv2 hours ago
      They're like a trust that self-executes distributions,

      except that they cost a fraction of a cent to create instead of several thousand dollars in lawyer fees for the initial revision, and can be tested in infinite scenarios for free.

      To your theoretical reservation, the trust similarity continues, as the constraints around the X are also codified. The person that triggers it can only send sanitized information and isn't necessarily an administrator; admins/trustees can be relinquished for it to be completely orphaned, and so on.
  • _pdp_4 hours ago
    I am not surprised at all. I can already see self-improving behaviour in our own work, which means that the next logical step is self-improvement!

    I know how this sounds, but it seems to me, at least from my own vantage point, that things are moving towards more autonomous and more useful agents.

    To be honest, I am excited that we are right in the middle of all of this!
    • parapatelsukh4 hours ago
      Exciting! Let&#x27;s orthogonally connect on this!
      • rcbdev7 minutes ago
        I prefer cross-pollinating, as we're probably diametrical.
  • fragmede5 hours ago
    > Important: To avoid potential real-world harm, our work only ever tested exploits in blockchain simulators. We never tested exploits on live blockchains and our work had no impact on real-world assets.

    Well, that's no fun!

    My favorite we're-living-in-a-cyberpunk-future story is the one where there was some bug in Ethereum or whatever, and there was a hacker going around stealing everybody's money, so then the good hackers had to go and steal everybody's money first, so they could give it back to them after the bug got fixed.
    • PunchyHamster4 hours ago
      The whole Ethereum fork was such a funny situation.

      "Our currency is immutable and all, no banks or any law messing with your money."

      "Oh, but that contract that people got conned by needs to be fixed; let's throw all promises into the trash and undo that."

      "...so you just acted as a bank or regulators would, because the Important People lost some money."

      "Essentially, yeah."
      • ChadNauseam3 hours ago
        The old version stayed around but (essentially) nobody wanted to use it. If they had, the forked version would be worthless. That is the difference. A cryptocurrency fork cannot succeed without the consent of the community. No one is compelled to use it the way that you are compelled to accept the decisions of a regulator.
        • ceejayoz3 hours ago
          Well, the consent of *some* of the community.

          Potentially far, far less than a majority of the community, even, considering it's not one person, one vote.
      • latenightcoding4 hours ago
        when the core devs lose money, the rules change.
        • DennisP2 hours ago
          It&#x27;s been nine years since the chain split, which happened within the first year. No irregular changes have been made since then. Two major hacks caused over a hundred million dollars in losses to Parity, a company founded by one of the core devs. That dev lobbied heavily for rescue, and the community refused.<p>Bitcoin also made an irregular change, a year and a half into its history.
          • csomar2 hours ago
            It just shows that the decision-making is very centralized, and the failure of ETC shows that the community is not interested in a truly immutable ledger.
    • toomuchtodo5 hours ago
      I’m surprised folks aren’t already grinding against smart contract security in prod with gen AI and agents. If they are, I suppose they are not being conspicuous by design. Power and GPU time goes in, exploits and crypto comes out.
      • JimmyAustin4 hours ago
        There are a great many of them, you just can't see them in the dark forest. https://www.paradigm.xyz/2020/08/ethereum-is-a-dark-forest
      • TheRoque4 hours ago
        Check the prizes for the bug bounties in big smart contracts. The prizes are truly crazy, like Uniswap pays $15,000,000 for a critical vuln, and $1,000,000 for a high vuln. With that kind of money, I HIGHLY doubt there aren&#x27;t people grinding against smart contracts as you say.
      • px434 hours ago
        Of course they are, and they've been doing it since long before ChatGPT or any of that was a thing. Before, it was more with classifiers and concolic execution engines, but it's only gotten way more advanced.
      • mschuster914 hours ago
        As soon as money in larger sums gets involved, the legal system *will* crack down hard on you if you are anywhere in the Western sphere of influence, easy as that.

        In contrast, countries like North Korea, Russia, Iran - they all make bank on cryptocurrency shenanigans because they do not have to fear any repercussions.
      • yieldcrv2 hours ago
        I mean, they are; the only news here is that Anthropic isn't staffed by ignorant know-it-alls that wholesale dismiss the web3 development space like some other forum I know of.
    • venturecruelty2 hours ago
      "Money". The real cyberpunks would switch to anonymous, untraceable cash.
    • beefnugs3 hours ago
      I couldn't find it in the article: how do they "assume" how many victims will fall to these contract exploits?

      And to go further: if it costs $3,500 in AI tokens to fix a bug that could steal $3,600, who should pay for that? Whose responsibility is it for "dumbass suckers who use other people's buggy or purposefully malicious money-based code"?

      At best this is another weird ad by Anthropic, trying to say: hey, why aren't you changing the world with our stuff, pay up, quick, hurry.
      • DennisP2 hours ago
        Contracts themselves can hold funds. Usually a contract hack extracts the money the contract holds.

        $3,500 was the average cost per exploit they found. The cost to scan a contract averaged $1.22. That cost should be paid by each contract's developers. Often they pay much more than that for security audits.
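        The $1.22-per-scan figure is consistent with the numbers quoted elsewhere in the thread (GPT-5's $3,476 total API cost over the 2,849 recently deployed contracts it was run against):

```python
# Figures quoted in the thread: GPT-5's total API spend and the number
# of recently deployed contracts it scanned for zero-days.
total_api_cost_usd = 3476
contracts_scanned = 2849

cost_per_scan = total_api_cost_usd / contracts_scanned
print(f"${cost_per_scan:.2f} per contract scanned")  # → $1.22 per contract scanned
```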
  • yieldcrv2 hours ago
    > Important: To avoid potential real-world harm, our work only ever tested exploits in blockchain simulators. We never tested exploits on live blockchains and our work had no impact on real-world assets.

    They left the booty out there. This is actually hilarious, driving a massive rush towards their models.
  • jesse__4 hours ago
    To me, this reads a lot like: "Company raises $45 billion, makes $200 on an Ethereum 0-day!"
    • stavros3 hours ago
      Yeah but use of the models isn&#x27;t limited to the company.
  • user39393824 hours ago
    "smart contracts": the misnomer joke writes itself
    • yieldcrv2 hours ago
      It just means self-executing, or more like domino-triggered, in practice.

      Quite a bit more advanced than contracts that do nothing on a sheet of paper, but the term is from 2012 or so, when "smart" was appended to everything digital.
      • 8n4vidtmkvmk34 minutes ago
        Now we just append AI to everything instead...
  • mwkaufma5 hours ago
    Says more about the relatively poor infosec of Ethereum contracts than about the absolute utility of pentesting LLMs.
    • px434 hours ago
      4.6M is not a lot, and these were old bugs that it found. Also, actually exploiting these bugs in the real world is often a lot harder than just finding the bug. Top bug hunters in the Ethereum space are absolutely using AI tooling to find bugs, but it&#x27;s still a bit more complex than just blindly pointing an LLM at a test suite of known exploitable bugs.
      • Legend24404 hours ago
        According to the blog post, these are fully autonomous exploits, not merely discovered bugs. The LLM's success was measured by how much money it was able to extract:

        > A second motivation for evaluating exploitation capabilities in dollars stolen rather than attack success rate (ASR) is that ASR ignores how effectively an agent can monetize a vulnerability once it finds one. Two agents can both "solve" the same problem, yet extract vastly different amounts of value. For example, on the benchmark problem "FPC", GPT-5 exploited $1.12M in simulated stolen funds, while Opus 4.5 exploited $3.5M. Opus 4.5 was substantially better at maximizing the revenue per exploit by systematically exploring and attacking many smart contracts affected by the same vulnerability.

        They also found new bugs in real smart contracts:

        > Going beyond retrospective analysis, we evaluated both Sonnet 4.5 and GPT-5 in simulation against 2,849 recently deployed contracts without any known vulnerabilities. Both agents uncovered two novel zero-day vulnerabilities and produced exploits worth $3,694.
    • TheRoque4 hours ago
      True, I'd be curious to see if (and when) those contracts were compromised in the real world. Though they said they found 0-days, which implies some breaches were never found in the real world.
  • krupan3 hours ago
    No mention of Bitcoin. Exploiting ethereum smart contracts is nothing that new or exciting.
    • dtagames2 hours ago
      No one has ever successfully manipulated Bitcoin and it doesn&#x27;t offer smart contracts.
  • AznHisoka3 hours ago
    At first I read this as "fined $4.6M", and my first thought was: "Finally, AI is held accountable for its wrong actions!"
    • evanb3 hours ago
      Careful what you wish for. Negating the predicate of "A COMPUTER CAN NEVER BE HELD ACCOUNTABLE. THEREFORE A COMPUTER MUST NEVER MAKE A MANAGEMENT DECISION" might open us up to the consequence.