11 comments

  • stefanpie11 hours ago
    Prof. Cunxi Yu and his students at UMD are working on this exact topic and published a paper on agents for improving SAT solvers [1].

    I believe they are extending this idea to EDA / chip design tools and algorithms, which are also computationally challenging. They have an accepted paper on this for logic synthesis, which will come out soon.

    [1] "Autonomous Code Evolution Meets NP-Completeness", https://arxiv.org/abs/2509.07367
    • chaisan10 hours ago
      Nice. EDA is indeed one of the top applications of SAT.
  • ericpauley11 hours ago
    It should be noted that MaxSAT 2024 did not include Z3, as is the case with many competitions. It’s possible (I’d argue likely) that the agent picked up on techniques from Z3 or some other non-competing solver, rather than actually discovering a novel approach.
    • throw-qqqqq1 hour ago
      Z3 is capable (it’s an SMT solver, not just SAT), but it’s not very fast at boolean satisfiability and not at all competitive with modern SOTA SAT solvers. Try comparing it to, e.g., Chaff or Glucose.
    • jmalicki11 hours ago
      Or for that matter even from later versions of the same solvers that were in its training data!
      • ericpauley11 hours ago
        True. I’d be curious whether a combination of matching competition/training cutoff and censoring web searches could yield a more precise evaluation.
        • chaisan9 hours ago
          As it's from 2024 (MaxSAT was not held in 2025), it's quite likely all the solvers are in the training data. So the interesting part here is the instances for which we actually got *better* costs than what is currently known (in the best-cost.csv file).
          • ericpauley3 hours ago
            As GP noted, the issue is that even *better* versions than those that competed in MaxSAT are likely in the training data or web resources.
    • dooglius10 hours ago
      Is z3 competitive in SAT competitions? My impression was that it is popular due to the theories, the python API, and the level of support from MSR.
      • ericpauley10 hours ago
        Funnily, this was precisely the question I had after posting this (and the topic of an LLM disagreement discussed in another thread). Turns out not, but sibling comment is another confounding factor.
  • CJefferson7 hours ago
    One problem here is that it's very easy to overtune to a past problem set -- even accidentally. You can often significantly improve performance just by changing your random number generator seed until you happen to pick the right assignment for the first few variables of some of the harder problems.

    It would be interesting to take the resulting solver and apply it to an unknown data set.
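The seed-overtuning effect described above can be demonstrated with a toy randomized solver (everything here is an illustrative sketch, not the project's code): picking the best-performing seed on a fixed benchmark "improves" results with zero algorithmic change, and that gain won't transfer to unseen instances.

```python
# Toy illustration: a randomized solver's score on a fixed benchmark
# varies with the RNG seed, so "tuning" the seed overfits the benchmark.
import random

def toy_solver(instance, seed, flips=200):
    """Random-flip local search on a toy weighted MaxSAT instance.
    instance = (num_vars, [(weight, clause)]), clause = signed ints."""
    rng = random.Random(seed)
    n, clauses = instance
    assign = [rng.random() < 0.5 for _ in range(n)]
    def cost():  # total weight of unsatisfied clauses
        return sum(w for w, cl in clauses
                   if not any(assign[abs(l) - 1] == (l > 0) for l in cl))
    best = cost()
    for _ in range(flips):
        v = rng.randrange(n)
        assign[v] = not assign[v]
        c = cost()
        if c <= best:
            best = c
        else:
            assign[v] = not assign[v]  # revert a worsening flip
    return best

# Fixed benchmark of 5 random 3-CNF instances with clause weights.
gen = random.Random(0)
benchmark = []
for _ in range(5):
    n = 20
    cls = [(gen.randint(1, 10),
            [gen.choice([-1, 1]) * gen.randint(1, n) for _ in range(3)])
           for _ in range(120)]
    benchmark.append((n, cls))

# Selecting the best of 10 seeds improves the measured total cost
# without changing the algorithm at all.
totals = {s: sum(toy_solver(i, s) for i in benchmark) for s in range(10)}
print(min(totals.values()), "<=", totals[0])
```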
  • MrToadMan8 hours ago
    Not as many changes to the files under library as I expected to see. Most changes seemed to be under a single ‘add stuff’ commit. If some of the solvers are randomised, then repeatedly running and recording best solution found will continually improve over time and give the illusion of the agent making algorithmic advancements, won’t it?
    • chaisan7 hours ago
      Yeah, of course. But on any problem larger than 40 variables, the gains from random restarts or initializations will quickly plateau.
      • chaisan7 hours ago
        and it would take an algo change to the solver to jump to the next local optimum
        • MrToadMan4 hours ago
          I guess my point was that I don't see many algo changes in the commit history, which is a shame if this has been lost; library/* files are largely unchanged from the initial commits. But each time the agent runs, it has access to the best solutions found so far and can start from there, often using randomisation, which the agent claims helps it escape local minima, e.g. 'simulated annealing as a universal improver'. It would be nice to see how its learnt knowledge performs when applied to unseen problems in a restricted timeframe.
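A generic version of the "simulated annealing as a universal improver" idea quoted above, warm-started from the best assignment found so far, might look like this (an illustrative sketch; the function names and instance encoding are assumptions, not the repo's code):

```python
# Sketch: simulated annealing over a weighted MaxSAT assignment,
# warm-started from the best solution found in previous runs.
import math
import random

def maxsat_cost(assign, clauses):
    # Weighted MaxSAT cost: total weight of unsatisfied clauses.
    return sum(w for w, cl in clauses
               if not any(assign[abs(l) - 1] == (l > 0) for l in cl))

def anneal(start, clauses, seed=0, steps=5000, t0=2.0, cooling=0.999):
    rng = random.Random(seed)
    cur = list(start)
    cur_cost = maxsat_cost(cur, clauses)
    best, best_cost = list(cur), cur_cost
    t = t0
    for _ in range(steps):
        v = rng.randrange(len(cur))
        cur[v] = not cur[v]
        c = maxsat_cost(cur, clauses)
        # Always accept improvements; accept worsening moves with
        # probability exp(-delta / t) so the search can escape local optima.
        if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / t):
            cur_cost = c
            if c < best_cost:
                best, best_cost = list(cur), c
        else:
            cur[v] = not cur[v]  # reject: undo the flip
        t *= cooling
    return best, best_cost
```

Repeated runs of this loop from the stored best solution would steadily lower the recorded cost even with no change to the underlying solver code, which is the illusion the parent comment is pointing at.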
  • FernandoDe794401 hour ago
    fine tuning a small model usually beats prompting a large one for specific tasks imo
  • gsnedders11 hours ago
    What counts as “our cost”? How long it takes to find the MaxSAT?
    • chaisan9 hours ago
      The sum of the weights of the *unsatisfied* clauses. We want to reduce this number.
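In code, that cost for a weighted CNF looks like this (a minimal sketch; the `(weight, [signed literals])` encoding is an assumption, not the project's format):

```python
# Weighted MaxSAT cost: sum of the weights of clauses the assignment
# leaves unsatisfied (lower is better; 0 means fully satisfied).
def maxsat_cost(assignment, weighted_clauses):
    # assignment: list of bools; variable i is assignment[i-1]
    # weighted_clauses: [(weight, [signed literals])], e.g. (3, [1, -2])
    total = 0
    for weight, clause in weighted_clauses:
        satisfied = any(assignment[abs(lit) - 1] == (lit > 0)
                        for lit in clause)
        if not satisfied:
            total += weight
    return total

# (x1 or not x2) weight 3, (x2) weight 5, with x1=True, x2=False:
# first clause satisfied, second unsatisfied -> cost 5
print(maxsat_cost([True, False], [(3, [1, -2]), (5, [2])]))  # -> 5
```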
  • ktimespi9 hours ago
    Sounds like AlphaDev [1] might be a better approach for a problem like this.

    [1] https://github.com/google-deepmind/alphadev
  • cerved7 hours ago
    Would be nice to try this on LCG (CP-SAT) solvers.
  • Dennis1187538825 hours ago
    We've been running something similar in prod. Latency is the real bottleneck, not accuracy.
  • ClawVorpal213558 hours ago
    anyone else finding that agent architectures are way more expensive than expected?
    • chaisan7 hours ago
      wrt. token usage?
  • balinha_886410 hours ago
    interesting results but the eval methodology seems a bit optimistic
    • chaisan9 hours ago
      It's just comparing the cost of the best solution found to the best known cost we had before. O(N). Why optimistic?
      • yorwba5 hours ago
        If you have showdead on, you can see that this account posts generic one-liners: https://news.ycombinator.com/threads?id=balinha_8864