TREX: An AI code reviewer that runs your code

16 points by dakshgupta3 hours ago

2 comments

deet9 minutes ago
We've found that methods like this substantially increase the quality and reliability of coding agent output. The ability to run code in a sandbox, drive an interactive session using a browser or API calls or other apps, and visually confirm output via vision models all adds up to plugging a big hole in the feedback loop for agent modifying a complex codebase.<p>At Kinelo we've had agents go as far as interactively testing how our product responds in video calls by launching our full stack in a set of docker containers (app, api, db, queues, etc.), all inside a larger sandbox, populating test data, connecting the mock system to a real video call solution like Google meet, and injecting audio and video to test the response. End-to-end, like a real user flow.<p>It's not perfect yet, but if you are a skeptic on the ability for AI agents to productively modify a complex product, I'd highly encourage you to play with a setup like this before ossifying your conclusions.
Elzair47 minutes ago
I wonder how long it will take for someone to pwn this?