I also worked on computer agents at a VC-backed startup, ran into the same issues, and we built a fairly similar thing at one point.

It's useful, but it has limitations: it seems to only work well in environments that are perfectly predictable; otherwise it gets in the way of the agent.

I think I prefer RL over these approaches, but it requires a bit more data.
Funny, we're working to implement this same logic in our in-house financial categorization agent. When we see a repeat prompt, it goes to a JSON file that stores answers, and we only go to the AI for edge cases.

It's a good idea.
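Roughly, the pattern looks like this (a minimal sketch only; the file name, model, and prompt are placeholders, not our production code):

```python
# Sketch of "JSON store first, AI only for edge cases": deterministic answers
# for repeat prompts, model call + persist for anything unseen.
import json, hashlib, os
from openai import OpenAI

CACHE_PATH = "category_cache.json"  # placeholder file name
client = OpenAI()

def _load_cache() -> dict:
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            return json.load(f)
    return {}

def categorize(description: str) -> str:
    cache = _load_cache()
    key = hashlib.sha256(description.strip().lower().encode()).hexdigest()
    if key in cache:
        # Repeat prompt: serve the stored answer, no model call.
        return cache[key]
    # Edge case: fall through to the model, then persist the answer.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user",
                   "content": f"Categorize this transaction: {description}"}],
    )
    answer = resp.choices[0].message.content
    cache[key] = answer
    with open(CACHE_PATH, "w") as f:
        json.dump(cache, f, indent=2)
    return answer
```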
Awesome to hear you've done something similar. JSON artifacts from runs seem to be a common approach for building this in house, similar to what we did with muscle mem. Detecting cache misses is hard without seeing what the model sees, which is part of what inspired this proxy direction.

Thanks for the kind words!
We spoke to a number of browser agent companies who said deterministic RPA with an AI fallback was their "secret" :)
We'll often repeat calls to try again. Or sometimes we make the same call multiple times to get multiple answers and then score or merge them.

Is this used only in cases where you assume the answer from your first call is correct?
I'd love your opinion here!

Right now, we assume the first call is correct, and we eagerly take the first match we find while traversing the tree.

One of the worst things that could currently happen is that we cache a bad run, and now instead of occasional failures you get 100% failures.

A few approaches we've considered (a rough sketch follows the list):
- maintain a staging tree, and only promote a node to live if multiple sibling nodes (messages) look similar enough. The decision to promote could be via templating, regex, fuzzy matching, semantic similarity, or an LLM judge
- add feedback APIs that let a client score end-to-end runs, so a path can build up a reputation over time
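To make the first idea concrete, here's a minimal sketch. The promotion threshold and the exact-match-after-normalization similarity check are placeholder choices; in practice that check could be any of the methods listed above:

```python
# Sketch of the staging-tree idea: record candidate responses per cache key,
# and promote a response to the live (served) cache only once several
# independent runs agree.
from collections import defaultdict

PROMOTE_AFTER = 3  # hypothetical threshold: agreeing sibling runs required

staging: dict[str, list[str]] = defaultdict(list)  # candidates per key
live: dict[str, str] = {}                          # promoted, served entries

def normalize(text: str) -> str:
    # Naive normalization; a real system might template or embed instead.
    return " ".join(text.split()).lower()

def observe(key: str, response: str) -> None:
    """Record a fresh model response for this key; promote if siblings agree."""
    staging[key].append(response)
    norms = [normalize(r) for r in staging[key]]
    if norms.count(normalize(response)) >= PROMOTE_AFTER:
        live[key] = response

def lookup(key: str) -> str | None:
    """Only promoted entries are served deterministically; else cache miss."""
    return live.get(key)
```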
Interesting. Isn't the answer context-specific most of the time? Even if I ask an LLM the same question again and again, the answer depends on the context.

What are some use cases where you need deterministic caching?
What local models will it work with? And what will the pricing be for local LLMs?
Good question. I imagine you'd need to set up an ngrok endpoint to tunnel to local LLMs.

In those cases, perhaps an open-source (maybe even local) version would make more sense. For our hosted version we'd need to charge something, given the storage requirements of running such a service, but for local models especially that feels wrong. I've been considering open-sourcing it for this reason.
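Roughly what that tunnel setup might look like (a sketch only: the proxy URL and the forwarding header are hypothetical, not a real API; it assumes Ollama's OpenAI-compatible endpoint on localhost:11434):

```python
# Assumed shell setup:
#   ollama serve        # local model server on localhost:11434
#   ngrok http 11434    # prints a public forwarding URL for the tunnel
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.example.com/v1",  # hypothetical caching-proxy URL
    api_key="ollama",  # Ollama ignores the key; any non-empty string works
    default_headers={
        # Hypothetical header telling the proxy where to forward cache misses.
        "X-Upstream-Base-URL": "https://<your-ngrok-subdomain>.ngrok.app/v1",
    },
)

resp = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Hello from behind the tunnel"}],
)
print(resp.choices[0].message.content)
```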
I like the pricing model but I'm skeptical it will last.
Interesting... is it legal?
So instead of paying OpenAI I should pay Butter?