Maxproof

(arxiv.org)

107 points by ilreb 6 hours ago

7 comments

  • daquisu 4 hours ago
    "I thought it was interesting and a bit underappreciated that the fraction of gold medalists at the 2025 IMO (72/630 = 11.4%) is the highest it’s been since 1981.

    Crudely, IMO gold medals are awarded to the highest-scoring 1/12 of contestants. However, because scores are integers up to 42 and there’s no provision for tiebreaking, it’s possible for a lot of contestants to be tied around the threshold. In that case, either all of them get a gold medal or none do, and the fraction of gold medalists might deviate substantially from 1/12. That’s what happened this year: 46 contestants all won a gold medal by scoring exactly 35 points.

    In fact, bizarrely, 35 is the mode of the scores this year; the last time the modal score was a gold medal score was in 1994. And, of course, 35 is the same score claimed by AI systems from Google, OpenAI, and others."

    From https://blog.vero.site/post/imo-2025
    • quibono 4 hours ago
      I was under the impression that the IMO is conducted in an official "exam" capacity, on site and in a very formal setting. So I find it hard to believe _direct_ LLM usage would be a factor. Then again, it very well could be a factor in the training and preparation? I imagine "Write me a prep document for the IMO" will surface all kinds of interesting things from the training set.
      • quietbritishjim 4 hours ago
        > And, of course, 35 is the same score claimed by AI systems from Google, OpenAI, and others.

        This is the part of the quote you're replying to.

        You seemed to take "of course" as an implication that the contestants used LLMs, and that's why they got the same score as the LLMs.

        I took it to mean: since this was the modal score, there seemed to be 35 points' worth of significantly easier answers (relatively speaking) than the remaining points, so it's not a surprise that LLMs got the same easier bits right. (Though I doubt all contestants got their points on exactly the same answers.)

        But it's certainly unclear what exactly the author meant.
    • dooglius 1 hour ago
      This is not bizarre, it's a reflection of how the IMO is scored: 6 questions with scores from 0-7 but partial credit is rare. It's really a score of 5/6.
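
Editorial aside: the arithmetic in this subthread can be checked with a rough sketch like the one below. It uses only the figures quoted in the comments above (630 contestants, 72 golds, 46 tied at 35, 6 problems worth 7 points each); the variable names are illustrative, and the "tie group excluded" case is a hypothetical that assumes the cutoff had landed one point higher.

```python
# Figures come from the quoted post and the reply above;
# variable names are illustrative, not from any official source.
CONTESTANTS = 630
GOLD_MEDALISTS = 72
TIED_AT_CUTOFF = 46      # contestants who scored exactly 35
PROBLEMS = 6
MAX_PER_PROBLEM = 7
CUTOFF_SCORE = 35

# 35 points is exactly five fully solved problems out of six.
max_total = PROBLEMS * MAX_PER_PROBLEM
full_solves, leftover = divmod(CUTOFF_SCORE, MAX_PER_PROBLEM)

# With no tiebreak, the whole tied group is either in or out.
target_share = 1 / 12
share_with_tie = GOLD_MEDALISTS / CONTESTANTS
# Hypothetical: the cutoff sits one point higher and excludes the tied group.
share_without_tie = (GOLD_MEDALISTS - TIED_AT_CUTOFF) / CONTESTANTS

print(f"max total {max_total}; cutoff = {full_solves} full solves + {leftover}")
print(f"target share:       {target_share:.1%}")
print(f"tie group included: {share_with_tie:.1%}")
print(f"tie group excluded: {share_without_tie:.1%}")
```

Under these figures, the nominal 1/12 target is about 8.3%, and the tied group moves the gold share between roughly 4.1% and 11.4%, which matches the point that a large tie at the threshold makes the fraction deviate substantially either way.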
  • pfannl 4 hours ago
    The real AGI test is apparently not solving the IMO, but getting caught in the same scoring traffic jam as 46 teenagers.
  • thierrydamiba 4 hours ago
    Is the harness more valuable than the weights?
  • korbonits 3 hours ago
    Proves the need for more formal verification :)
  • minimaxir 4 hours ago
    not a good day to be named Max