10 comments

  • mountainriver 22 hours ago
    I also did computer agents with a VC-backed startup, ran into the same issues, and we built a fairly similar thing at one point.
    It’s useful, but it has limitations: it seems to only work well in environments that are perfectly predictable; otherwise it gets in the way of the agent.
    I think I prefer RL over these approaches, but it requires a bit more data.
  • realitysballs 23 hours ago
    Funny, we are working to implement this same logic in our in-house financial categorization agent. When we have a repeat prompt, it goes to a JSON file that stores answers, and only goes to the AI for edge cases.
    It’s a good idea.
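    A minimal sketch of that pattern, assuming a flat JSON file keyed by the exact prompt and OpenAI as the edge-case fallback (the file name and model below are illustrative):

        import json
        from pathlib import Path

        from openai import OpenAI

        CACHE_PATH = Path("categorization_cache.json")  # hypothetical prompt -> answer store
        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        def categorize_with_llm(prompt: str) -> str:
            # Edge case: no stored answer yet, so ask the model.
            resp = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content

        def categorize(prompt: str) -> str:
            cache = json.loads(CACHE_PATH.read_text()) if CACHE_PATH.exists() else {}
            if prompt in cache:       # repeat prompt: answer deterministically from the store
                return cache[prompt]
            answer = categorize_with_llm(prompt)
            cache[prompt] = answer    # remember it for next time
            CACHE_PATH.write_text(json.dumps(cache, indent=2))
            return answer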
    • edunteman 22 hours ago
      Awesome to hear you’ve done similar. JSON artifacts from runs seem to be a common approach for building this in house, similar to what we did with muscle mem. Detecting cache misses is a bit hard without seeing what the model sees, which is part of what inspired this proxy direction.
      Thanks for the nice words!
  • rajit 21 hours ago
    We spoke to a number of browser agent companies who said deterministic RPA with an AI fallback was their "secret" :)
    • edunteman 21 hours ago
      Very, very common approach!
      Wrote more on that here: https://blog.butter.dev/the-messy-world-of-deterministic-agents
      • toobulkeh 15 hours ago
        What a great overview!
        I’d love your thoughts on my addition, autolearn.dev (Voyager behind MCP).
        The proxy format is exactly what I needed!
        Thanks
  • barapa 22 hours ago
    We often repeat calls to try again. Or sometimes we make the same call multiple times to get multiple answers and then score or merge them.
    Is this used only in cases where you assume the answer from your first call is correct?
    • edunteman 21 hours ago
      I’d love your opinion here!
      Right now we assume the first call is correct, and we eagerly take the first match we find while traversing the tree.
      One of the worst things that could currently happen is that we cache a bad run, and now instead of occasional failures you’re given 100% failures.
      A few approaches we’ve considered:
      - Maintain a staging tree, and only promote to live if multiple sibling nodes (messages) look similar enough. The decision to promote could be made via templating, regex, fuzzy matching, semantic similarity, or an LLM judge.
      - Add feedback APIs so a client can score end-to-end runs, letting a path develop a reputation.
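      A minimal sketch of that sibling-agreement idea, using a fuzzy string ratio as a stand-in for the judge (the node shape, threshold, and sibling count are illustrative, not Butter’s actual implementation):

          from dataclasses import dataclass, field
          from difflib import SequenceMatcher

          SIMILARITY_THRESHOLD = 0.9   # illustrative cutoff for "similar enough"
          MIN_SIBLINGS = 3             # require a few agreeing runs before promoting

          @dataclass
          class StagingNode:
              # Candidate messages observed at the same position in the staging tree.
              siblings: list[str] = field(default_factory=list)

          def similarity(a: str, b: str) -> float:
              return SequenceMatcher(None, a, b).ratio()

          def should_promote(node: StagingNode) -> bool:
              # Promote to the live tree only if every pair of sibling messages agrees.
              if len(node.siblings) < MIN_SIBLINGS:
                  return False
              return all(
                  similarity(a, b) >= SIMILARITY_THRESHOLD
                  for i, a in enumerate(node.siblings)
                  for b in node.siblings[i + 1:]
              )

      Swapping the fuzzy ratio for a regex, embedding similarity, or an LLM judge only changes the similarity function; the promotion gate stays the same.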
      • toobulkeh 15 hours ago
        I’d assume RL would be baked into the request structure. I’m surprised the OAI spec doesn’t include it, but I suppose you could hijack a conversation flow to do so.
  • invisibleink 22 hours ago
    Interesting. Isn’t the answer context-specific most of the time? Even if I ask an LLM the same question again and again, the answer depends on the context.
    What are some use cases where you need deterministic caching?
  • Jayakumark 22 hours ago
    What local models will it work with? Also, what will the pricing be for local LLMs?
    • edunteman 20 hours ago
      Good question. I imagine you’d need to set up an ngrok endpoint to tunnel to local LLMs.
      In those cases, perhaps an open-source (maybe even local) version would make more sense. For our hosted version we’d need to charge something, given the storage requirements to run such a service, but especially for local models that feels wrong. I’ve been considering open source for this reason.
  • puppycodes 22 hours ago
    I like the pricing model, but I’m skeptical it will last.
    • edunteman 22 hours ago
      I feel the same. We’ll use it as long as we can since it’s customer-aligned, but I wouldn’t be surprised if competitive or COGS pressures force us to change in the future.
  • ronbenton 23 hours ago
    Interesting... is it legal?
    • edunteman 22 hours ago
      I couldn’t see how it wouldn’t be, as it’s a free-market, opt-in decision to use Butter.
      • ronbenton 22 hours ago
        It wouldn’t be the first API service to disallow someone from selling a cache layer for their API. After all, this would likely result in OpenAI (or whatever provider) making less money.
        • edunteman 20 hours ago
          Ah yes, that makes sense. I have heard of those cases too but hadn’t put much thought into it. Thanks for pointing it out!
          • RestartKernel 20 hours ago
            I’ve seen the OpenRouter guys here on HN before, so you can probably ask them what to look out for.
  • robofanatic 23 hours ago
    So instead of OpenAI, I should pay Butter?
    • edunteman 22 hours ago
      It’s bring-your-own-key, so any calls proxied to OpenAI just end up billing directly to your account as normal.
      You’d only pay Butter for calls that *don’t* go to the provider. That’d be a separate billing account with Butter.
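      Mechanically, that tends to look like pointing the standard OpenAI client at the proxy’s base URL while still supplying your own provider key (the URL below is a placeholder, not a documented endpoint):

          from openai import OpenAI

          client = OpenAI(
              api_key="sk-your-own-openai-key",         # bring-your-own-key: provider calls bill to your account
              base_url="https://proxy.example.com/v1",  # hypothetical proxy endpoint
          )

          resp = client.chat.completions.create(
              model="gpt-4o-mini",
              messages=[{"role": "user", "content": "Categorize: ACME CLOUD 12.99 USD"}],
          )
          print(resp.choices[0].message.content)

      Calls answered from the cache never reach the provider, and that is the portion billed by Butter.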