
  • 6thbit 54 minutes ago
    > Subsequent to this solve, we finished developing our general scaffold for testing models on FrontierMath: Open Problems. In this scaffold, several other models were able to solve the problem as well: Opus 4.6 (max), Gemini 3.1 Pro, and GPT-5.4 (xhigh).

    Interesting. What's that “scaffold”? A sort of unit-test framework for proofs?
    • inkysigma 20 minutes ago
      I think in this context, scaffolds are generally the harness that surrounds the actual model: for example, any tools, ways to lay out tasks, or auto-critiquing methods.

      I think there's quite a bit of variance in model performance depending on the scaffold, so comparisons are always a bit murky.
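      Something like this minimal Python sketch, say (all names and prompts here are illustrative assumptions, not any real harness or API):

        from typing import Callable

        def solve_with_scaffold(
            problem: str,
            call_model: Callable[[str], str],  # any text-in/text-out model client
            max_rounds: int = 3,
        ) -> str:
            # First attempt from the bare model.
            attempt = call_model(f"Solve the following problem:\n{problem}")
            for _ in range(max_rounds):
                # Auto-critique: have the model review its own attempt.
                critique = call_model(
                    f"Problem:\n{problem}\n\nAttempt:\n{attempt}\n\n"
                    "List any errors or gaps. Reply with just OK if there are none."
                )
                if critique.strip().upper() == "OK":
                    break
                # Revise, feeding the critique back in as extra context.
                attempt = call_model(
                    f"Problem:\n{problem}\n\nPrevious attempt:\n{attempt}\n\n"
                    f"Critique:\n{critique}\n\nProduce a corrected solution."
                )
            return attempt

      Swap in a different critique step, different tools, or several cooperating agents and you can get quite different results from the same underlying model.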
      • readitalready 6 minutes ago
        It usually involves a lot of agents, each with their own custom contexts or system prompts.
  • karmasimida 19 minutes ago
    There's no denying at this point that AI can produce something novel, and models will be doing more of this going forward.
    • leptons 14 minutes ago
      [flagged]
      • snypher 4 minutes ago
        Your analogy falls apart if we consider the number wasn't on the clock face.
  • osti 16 minutes ago
    Seems like the high-compute parallel-thinking models weren't even needed; both the normal 5.4 and Gemini 3.1 Pro solved it. Somehow Gemini 3 Deep Think couldn't solve it.
  • renewiltord 16 minutes ago
    Fantastic news! That means with the right support tooling existing models are already capable of solving novel mathematics. There’s probably a lot of good mathematics out there we are going to make progress on.