13 comments

  • binalpatel 23 days ago
    Cool to see lots of people independently come to "CLIs are all you need". I'm still not sure if it's a short-term bandaid because agents are so good at terminal use or if it's part of a longer-term trend, but it's definitely felt much more seamless to me than MCPs.

    (My contribution, one of many: https://github.com/caesarnine/binsmith)
    • cosinusalpha 23 days ago
      I am also not sure if MCP will eventually be fixed to allow more control over context, or if the CLI approach really is the future for Agentic AI.

      Nevertheless, I prefer the CLI for other reasons: it is built for humans and is much easier to debug.
      • cjp 23 days ago
        Have a look at code mode MCP: https://github.com/universal-tool-calling-protocol/code-mode
      • binalpatel 23 days ago
        100%. Once I'd done it enough, sharing CLIs with the agent felt like another channel for interacting with it, like a task manager the agent and I can both use through the same interface.
    • fudged71 23 days ago
      Thank you for posting binsmith. I've built something similar over the past few days and you've made some great decisions in here.
    • 0x696C6961 23 days ago
      MCP lets you hide secrets from the LLM.
      • pylotlight 23 days ago
        You can do the same thing with a CLI via env vars, no?
        • verdverm 23 days ago
          Yes. I'm using Dagger and it has great secret support: it obfuscates them, so even if the agent, for example, cats the contents of a key file, it will never be able to read or print the secret value itself.

          tl;dr: there are a lot of ways to keep secret contents away from your agent, some without actually having to keep them "physically" separate.
    • desireco42 23 days ago
      Hey this looks cool. So each agent or session is one thread. Nice. I like it.
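On the env-var point in the sub-thread above, here is a minimal sketch of one way a CLI tool could keep a secret out of the agent's view. It is not how Dagger or MCP do it. The variable name `DEMO_API_TOKEN` and the functions `redact` and `run_tool` are hypothetical, chosen only for illustration: the tool reads the token from its environment and scrubs the value from anything it returns.

```python
import os


def redact(text: str, secrets: list[str]) -> str:
    """Replace every occurrence of each secret value with a placeholder."""
    for secret in secrets:
        if secret:  # skip empty values; replacing "" would corrupt the text
            text = text.replace(secret, "[REDACTED]")
    return text


def run_tool(env: dict[str, str]) -> str:
    """Hypothetical tool body: uses the token, then returns agent-visible output."""
    token = env.get("DEMO_API_TOKEN", "")  # assumed variable name
    # A careless tool might echo the token in a debug line; redaction catches that.
    raw_output = f"GET /items with Authorization: Bearer {token}\nstatus: 200"
    return redact(raw_output, [token])


if __name__ == "__main__":
    print(run_tool(dict(os.environ)))
```

The agent only ever sees the redacted string. This does not stop an agent that can read the process environment directly, so it complements sandboxing and does not replace it.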
  • the_mitsuhiko 23 days ago
    At this point I'm fully down the path of the agent just maintaining its own tools. I have a browser skill that continues to evolve as I use it. Beats every alternative I have tried so far.
    • dtkav 23 days ago
      Same. Claude Opus 4.5 one-shots the basics of the Chrome debug protocol, and then you can go from there.

      Plus, now it is personal software... just keep asking it to improve the skill based on your usage. Bake in domain knowledge or business logic or whatever you want.

      I'm using this for e2e testing and debugging Obsidian plugins and it is starting to understand Obsidian inside and out.
      • chrisweekly 23 days ago
        Cool! Have you written more about this? (EDIT: from your profile, is that what https://relay.md is about?)
        • dtkav 23 days ago
          https://relay.md is a company I'm working on for shared knowledge management / AI context for teams, and the Obsidian plugin is what I am driving with my live-debug and obsidian-e2e skills.

          I can try to write it up (I am a bit behind this week though...), but I basically opened Claude Code and said "write a new skill that uses the chrome debug protocol to drive end to end tests in Obsidian" and then whenever it had problems I said "fix the skill to look up the element at the x,y coordinate before clicking" or whatever.

          Skills are just markdown files, sometimes accompanied by scripts, so they work really naturally with Obsidian.
          • chrisweekly 23 days ago
            Hey FWIW Relay is AWESOME!! The granular sharing of a given dir within a vault (vs the whole thing) finally solves the split-brain problem of personal (private) vault on my own hardware vs mandated use of a company laptop... it's fast, intuitive, and SOLVES this long-time thorn in my side. Thanks for creating it, high five, hope it leads to massive success for you! :)
            • dtkav 22 days ago
              Thank you for the kind words <3
        • dtkav 21 days ago
          Sorry it took me a while. Hopefully this helps:

          https://notes.danielgk.com/Obsidian/Obsidian+E2E+testing+Claude+Skill
          • chrisweekly 18 days ago
            Thanks! It does help, it's a great blog. You should consider posting a "Show HN".
    • cosinusalpha 23 days ago
      Do you experience any context pollution with that approach?
      • dtkav 21 days ago
        Writing your own skill is actually a lot better for context efficiency.

        Your skill will be tuned to your use case over time, so if there's something that you do a lot you can hide most of the back-and-forth behind the Python script / CLI tool.

        You can even improve the skill by saying "I want to be more token efficient, please review our chat logs for usage of our skill and factor out common operations into new functions".

        If anything, context waste/rot comes from documentation of features that other people need but you don't. The skill should be a sharp knife, not a multi-tool.
      • the_mitsuhiko 22 days ago
        Not really. Less bad than the MCPs I used.
    • kinduff 23 days ago
      What's the name of the skill?
      • lgas 23 days ago
        Why would that matter?
  • gregpr07 23 days ago
    Creator of Browser Use here. This is cool, a really innovative approach with ARIA roles. One idea we have been playing around with a lot is just giving the LLM raw HTML and a really good way to traverse it: no heuristics, just BS4. It seems to work well, but it is much more expensive than the current prod-ready [index]<div ... notation.
    • cosinusalpha 23 days ago
      Thanks!

      I actually tried a raw HTML approach when I was exploring solutions. It worked for "one-off" tasks, but I ran into major issues with replayability on modern SPAs.

      In React apps, the raw DOM structure and auto-generated IDs shift so frequently that a script generated from "Raw HTML" often breaks 10 minutes later. I found ARIA/semantics to be the only stable contract that persists across re-renders.

      You mentioned the raw HTML approach is "expensive". Did you feed the full HTML into the context, or did you create a BS4 "tool" for the LLM to query the raw HTML dynamically?
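To make the closing question concrete, here is a rough sketch of the "tool" variant: instead of pasting the whole page into the context, the model calls a query function and gets back only the matching elements. It uses the standard library's `html.parser` as a stand-in for BS4 so it runs without dependencies. `TagCollector` and `query_html` are hypothetical names, not part of Browser Use or webctl.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect elements with a given tag name, plus their attributes and text."""

    def __init__(self, wanted_tag: str):
        super().__init__()
        self.wanted_tag = wanted_tag
        self.matches: list[dict] = []
        self._open: list[dict] = []  # matching elements that are still open

    def handle_starttag(self, tag, attrs):
        if tag == self.wanted_tag:
            entry = {"attrs": dict(attrs), "text": ""}
            self.matches.append(entry)
            self._open.append(entry)

    def handle_endtag(self, tag):
        if tag == self.wanted_tag and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every matching element that currently encloses it.
        for entry in self._open:
            entry["text"] += data


def query_html(html: str, tag: str, limit: int = 5) -> list[dict]:
    """Return at most `limit` compact matches, so little text reaches the model.

    Void tags such as <input> never close, so this sketch suits container tags.
    """
    parser = TagCollector(tag)
    parser.feed(html)
    parser.close()
    return [{"attrs": m["attrs"], "text": m["text"].strip()}
            for m in parser.matches[:limit]]
```

For example, `query_html(page, "button")` on a long page returns a short list of buttons with their attributes and labels, which is the part of the page the model usually needs.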
  • TheTaytay 23 days ago
    I really like this idea!

    I’d like to see this other browser plugin’s API be exposed via your same CLI, so I don’t have to only control a separate browser instance: https://github.com/remorses/playwriter (I haven’t investigated enough to know how feasible it is, but as I was reading about your tool, I immediately wanted to control existing tabs from my main browser, rather than “just” a debug-driven separate browser instance.)
    • cosinusalpha 23 days ago
      Thanks! To clarify: webctl allows you to manually interact with the browser window at any time. It even returns "manual interaction" breakpoints to stdout if it detects an SSO/login wall.

      But I agree, attaching to the OS "daily driver" instance specifically would be a nice addition.
  • randito 23 days ago
    If you look at the Elixir keynote for Phoenix.new (a cool agentic coding tool), you'll see some hints about browser control using an API tool call. It's called "web" in the video.

    Video: https://youtu.be/ojL_VHc4gLk?t=2132

    More discussion: https://simonwillison.net/2025/Jun/23/phoenix-new/
  • renegat0x0 23 days ago
    A little bit different, but it also allows efficient scraping. It uses JSON over HTTP rather than a CLI.

    https://github.com/rumca-js/crawler-buddy

    It is more like a framework for other mechanisms.
  • philipbjorge 23 days ago
    This looks remarkably similar to https://github.com/vercel-labs/agent-browser

    How is it different?
    • cosinusalpha 23 days ago
      To be honest, I hadn't seen that one yet!

      The main difference is likely the targeting philosophy. webctl relies heavily on ARIA roles/semantics (e.g. role=button name="Save") rather than injected IDs or CSS selectors. I find this makes the automation much more robust to UI changes.

      Also, I went with Python for V1 simply for iteration speed and ecosystem integration. I'd love to rewrite in Rust eventually, but Python was the most efficient way to get a stable tool working for my specific use case.
    • hugs 23 days ago
      vibium clicker, too. https://github.com/VibiumDev/vibium/blob/main/CONTRIBUTING.md#using-clicker

      "browser automation for ai agents" is a popular idea these days.
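The targeting difference described in this thread can be illustrated with a small sketch. The query string `role=button name="Save"` is taken from the comment above. How webctl actually parses or resolves it is not shown in the thread, so `parse_query`, `find`, and the element dictionaries below are assumptions for illustration only.

```python
import shlex


def parse_query(query: str) -> dict[str, str]:
    """Turn 'role=button name="Save"' into {'role': 'button', 'name': 'Save'}."""
    result: dict[str, str] = {}
    for token in shlex.split(query):  # shlex handles the quoted name
        key, _, value = token.partition("=")
        result[key] = value
    return result


def find(elements: list[dict], query: str) -> list[dict]:
    """Return elements whose fields match every key=value pair in the query."""
    wanted = parse_query(query)
    return [el for el in elements
            if all(el.get(key) == value for key, value in wanted.items())]


# Two renders of the same UI: generated ids differ, role and name do not.
render_a = [{"id": ":r1:", "role": "button", "name": "Save"},
            {"id": ":r2:", "role": "button", "name": "Cancel"}]
render_b = [{"id": ":r7:", "role": "button", "name": "Save"},
            {"id": ":r8:", "role": "button", "name": "Cancel"}]
```

The same query finds the Save button in both renders, while a selector built from the id `:r1:` would match nothing in `render_b`.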
  • desireco42 23 days ago
    How are you holding the session if every command is issued through the CLI? I assume this is essential for automation.
    • cosinusalpha 23 days ago
      A background daemon holds the session state between different CLI calls. This daemon is started automatically on the first webctl call and auto-closes after a timeout period of inactivity to save resources.
      • desireco42 23 days ago
        I see, nice. Is there a way to run multiple sessions?
        • cosinusalpha 23 days ago
          Yes, you can create isolated environments using the "--session NAME" flag.

          It isolates cookies and local storage for that specific run. Since it's a V1 release, there might be some edge cases in the session isolation. If you hit any, please open an issue!
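The two ideas in this thread, state that outlives a single CLI call and named sessions that do not share it, can be sketched in memory as follows. This is an illustration only, not webctl's implementation: `SessionStore` is a hypothetical class, a real daemon would sit behind a socket or pipe, and the clock is injected here so that the idle timeout can be exercised without waiting.

```python
import time
from typing import Callable, Optional


class SessionStore:
    """In-memory stand-in for a daemon that keeps per-session browser state."""

    def __init__(self, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.sessions: dict[str, dict[str, str]] = {}
        self.last_used = clock()

    def handle(self, session: str, key: str,
               value: Optional[str] = None) -> Optional[str]:
        """Set (when value is given) or get one cookie-like entry in a session."""
        self.last_used = self.clock()  # any command counts as activity
        state = self.sessions.setdefault(session, {})
        if value is not None:
            state[key] = value
        return state.get(key)

    def should_exit(self) -> bool:
        """True once no command has arrived for idle_timeout seconds."""
        return self.clock() - self.last_used >= self.idle_timeout
```

Each CLI invocation would map to one `handle` call, and the daemon's main loop would poll `should_exit` to shut itself down. Values stored under one session name are never visible under another.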
  • grigio 23 days ago
    Is there a benchmark? There are a lot of scraping agents nowadays.
    • cosinusalpha 23 days ago
      I don't have an objective benchmark yet. I tried several existing solutions, especially the MCP servers for browser automation, and none of them were able to reproducibly solve my specific task.

      An objective benchmark is a great idea, especially to compare webctl against other similar CLI-based tools. I'll definitely look into how to set that up.
  • Agent_Builder 23 days ago
    [dead]
    • cosinusalpha 23 days ago
      I actually think the CLI approach helps with those boundaries. Because webctl commands are discrete and pipeable (e.g. webctl snapshot | llm | webctl click), the "authority" is reset at every step of the pipeline. It feels easier to audit a text stream of commands than a socket connection that might be accumulating invisible context.
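The auditing argument above can be sketched without any real tools. In the sketch below each stage is a plain function that receives only the previous stage's text, and every hand-off is written to a log. The stage functions are stand-ins: none of them call webctl or a model, and `run_pipeline` is a hypothetical helper.

```python
from typing import Callable

Stage = Callable[[str], str]


def run_pipeline(stages: list[tuple[str, Stage]],
                 initial: str = "") -> tuple[str, list[str]]:
    """Run stages in order; each receives only the previous stage's output."""
    audit: list[str] = []
    text = initial
    for name, stage in stages:
        text = stage(text)
        audit.append(f"{name} -> {text!r}")  # one auditable line per hand-off
    return text, audit


# Stand-in stages for "snapshot | llm | click".
def snapshot(_: str) -> str:
    return 'button "Save"\nbutton "Cancel"'


def choose(page: str) -> str:
    # Pretend model: pick the first line that mentions Save.
    return next(line for line in page.splitlines() if "Save" in line)


def click(target: str) -> str:
    return f"clicked {target}"
```

Calling `run_pipeline([("snapshot", snapshot), ("llm", choose), ("click", click)])` returns the final text together with three log lines. Nothing passes between stages except the strings recorded in that log.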
  • AI-love 20 days ago
    [dead]