7 comments

  • Retr0id 4 hours ago
    > As proof, ABP with opus 4.6 as the driver scores 90.5% on the Online Mind2Web benchmark

    And what does opus score with "regular" browser harnesses?
    • 9wzYQbTYsAIc 3 hours ago
      90% easy or 90% average?
      • theredsix 2 hours ago
        90% average with 85.51% hard!
        • 9wzYQbTYsAIc 2 hours ago
          Nice! Will take a look at this for my homelab - was debating using crawl.cloudflare.com to try it out, as browser rendering was my next stretch goal.
    • esafak 3 hours ago
      https://huggingface.co/spaces/osunlp/Online_Mind2Web_Leaderboard
      • Retr0id 3 hours ago
        Hm I can't see Opus 4.6 on there
        • theredsix 2 hours ago
          I tweeted at OSUNLP and they're backed up on eval validation. In the meantime, here's the benchmark repo with the saved runs and instructions on how to run it locally: https://github.com/theredsix/abp-online-mind2web-results
  • giancarlostoro 4 hours ago
    Interesting. I wonder if this would help with other projects too. One that comes to mind is archivebox: I don't know if they still have the issue I'm thinking of, but its Chrome instances would eventually (as the meme goes) consume all available RAM. If freezing execution could stop that, it could be useful for more than just AI agents.
    • theredsix 2 hours ago
      Yeah, I noticed CPU use goes to near zero during the pausing phase. You can also trigger pause via REST/MCP so a script can take advantage of these abilities as well.
  • gregpr07 2 hours ago
    Love it! From first principles: this kinda answers the "do we really even need CDP" question I always have in my head while building browser use...
    • theredsix 2 hours ago
      Totally, I feel that CDP was designed for a different category of automations.
  • theredsix 5 hours ago
    OP here, happy to answer any questions!
    • esafak 3 hours ago
      How does it compare with https://agent-browser.dev/ ? It would be great if you could add it to your table: https://github.com/theredsix/agent-browser-protocol?#comparison
      • theredsix 2 hours ago
        agent-browser's biggest selling point is a CLI wrapper around CDP/puppeteer for context management. It'll have mostly the same pros/cons as CDP in the table.
      • theredsix 1 hour ago
        Updated the table!
  • octoclaw 2 hours ago
    [dead]
  • bhekanik 3 hours ago
    [dead]
  • webpolis 1 hour ago
    [dead]
    • sebmellen 20 minutes ago
      Does it feel good to be botting HN with ads for your own product?

      I'm so sick of reading OpenClaw comments! No activity for 7 months, and then in the past day, five comments from an LLM pitching your tool. What are you doing man? This degrades the quality of HN so badly.
    • theredsix 1 hour ago
      Great insight! ABP exposes display resolution controls right now. I've noticed almost zero reCAPTCHAs during testing compared to puppeteer stealth or other packages. Regarding the freezing mechanic, virtual time is paused as well and the entire browser clock is captured, so it would be very hard for a page's JavaScript to notice the time drift unless it were querying an external API clock.
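
To make the last point concrete, here is a minimal sketch of the kind of check a page's script could run if it did have access to an external clock. It is not taken from ABP or from anything posted in the thread: `ClockSample`, `clockDriftMs`, `looksPaused`, and the 500 ms tolerance are all hypothetical names and values chosen for illustration, and the samples are simulated rather than fetched from a real time API.

```typescript
// Hypothetical helper names; nothing here is part of ABP or any real library.
// Idea: compare elapsed time on the page's local clock with elapsed time
// reported by an external clock over the same interval.

interface ClockSample {
  localMs: number;    // e.g. Date.now() read inside the page
  externalMs: number; // e.g. a timestamp returned by a remote time API
}

// Returns how much further the external clock advanced than the local one.
function clockDriftMs(start: ClockSample, end: ClockSample): number {
  const localElapsed = end.localMs - start.localMs;
  const externalElapsed = end.externalMs - start.externalMs;
  // Positive drift is what a pause that also freezes the page clock
  // would look like from the page's point of view.
  return externalElapsed - localElapsed;
}

// Flags an interval as suspicious when drift exceeds an assumed tolerance.
function looksPaused(
  start: ClockSample,
  end: ClockSample,
  toleranceMs: number = 500,
): boolean {
  return clockDriftMs(start, end) > toleranceMs;
}

// Simulated samples: the page saw 2 s pass, the external clock saw 32 s.
const before: ClockSample = { localMs: 1_000, externalMs: 1_000 };
const after: ClockSample = { localMs: 3_000, externalMs: 33_000 };

console.log(clockDriftMs(before, after)); // 30000
console.log(looksPaused(before, after));  // true
```

Without the external sample, the page only has `localMs` to go on, and a frozen local clock shows no gap at all, which is consistent with the claim that the drift is hard to notice from inside the page.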