2 comments

  • deet9 minutes ago
    We&#x27;ve found that methods like this substantially increase the quality and reliability of coding agent output. The ability to run code in a sandbox, drive an interactive session using a browser or API calls or other apps, and visually confirm output via vision models all adds up to plugging a big hole in the feedback loop for agent modifying a complex codebase.<p>At Kinelo we&#x27;ve had agents go as far as interactively testing how our product responds in video calls by launching our full stack in a set of docker containers (app, api, db, queues, etc.), all inside a larger sandbox, populating test data, connecting the mock system to a real video call solution like Google meet, and injecting audio and video to test the response. End-to-end, like a real user flow.<p>It&#x27;s not perfect yet, but if you are a skeptic on the ability for AI agents to productively modify a complex product, I&#x27;d highly encourage you to play with a setup like this before ossifying your conclusions.
  • Elzair47 minutes ago
    I wonder how long it will take for someone to pwn this?