Square Minus Square – A coding agent benchmark

25 points by Topfi 42 days ago

1 comment

wariatus 36 days ago
Have you tried giving those agents access to a grounded vision model to analyse that image?

In my experience, most models can't understand such input properly.

I am now experimenting with Molmo2, and it looks promising.