Maxproof

(arxiv.org)

107 points by ilreb 6 hours ago

7 comments

daquisu 4 hours ago
"I thought it was interesting and a bit underappreciated that the fraction of gold medalists at the 2025 IMO (72/630 = 11.4%) is the highest it’s been since 1981.
Crudely, IMO gold medals are awarded to the highest-scoring 1/12 of contestants. However, because scores are integers up to 42 and there’s no provision for tiebreaking, it’s possible for a lot of contestants to be tied around the threshold. In that case, either all of them get a gold medal or none do, and the fraction of gold medalists might deviate substantially from 1/12. That’s what happened this year: 46 contestants all won a gold medal by scoring exactly 35 points.
In fact, bizarrely, 35 is the mode of the scores this year; the last time the modal score was a gold medal score was in 1994. And, of course, 35 is the same score claimed by AI systems from Google, OpenAI, and others."
From https://blog.vero.site/post/imo-2025
- quibono 4 hours ago
 I was under the impression that the IMO is conducted in an official "exam" capacity, on site and in a very formal setting. So I find it hard to believe _direct_ LLM usage would be a factor. Then again - it very well could be a factor in the training and preparation? I imagine "Write me a prep document for the IMO" will surface all kinds of interesting things from the training set.
 - quietbritishjim 4 hours ago
 > And, of course, 35 is the same score claimed by AI systems from Google, OpenAI, and others.
 This is the part of the quote you're replying to.
 You seemed to take "of course" as an implication that the contestants used LLMs, and that's why they got the same score as the LLMs.
 I took it to mean: since this was the modal score, there seemed to be 35 points' worth of significantly easier answers (relatively speaking) than the remaining points, so it's not a surprise that LLMs got the same easier bits right. (Though I doubt all contestants got their points on exactly the same answers.)
 But it's certainly unclear what exactly the author meant.
- dooglius 1 hour ago
 This is not bizarre; it's a reflection of how the IMO is scored: 6 questions, each scored from 0-7, but partial credit is rare. A 35 is really a score of 5/6.
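For anyone who wants to check the arithmetic in the quoted post, here is a minimal sketch. It uses only the figures quoted above (630 contestants, 72 golds, 46 of them at exactly 35) and assumes 35 was the gold cutoff, so 72 - 46 = 26 contestants scored above it. The function name and the all-or-nothing framing are illustrative, not the IMO's official procedure.

```python
from fractions import Fraction


def gold_fraction_options(total, above_cutoff, tied_at_cutoff):
    """Return (fraction if the tied group is excluded, fraction if included).

    With no tiebreaker, everyone tied at the cutoff gets gold or nobody
    in that group does, so these are the only two outcomes.
    """
    excluded = Fraction(above_cutoff, total)
    included = Fraction(above_cutoff + tied_at_cutoff, total)
    return excluded, included


# Figures taken from the quoted blog post.
TOTAL = 630
GOLDS = 72
TIED_AT_35 = 46
ABOVE_35 = GOLDS - TIED_AT_35  # derived, assuming 35 was the cutoff

target = Fraction(1, 12)  # the "crude" 1/12 rule from the quote
excluded, included = gold_fraction_options(TOTAL, ABOVE_35, TIED_AT_35)

print(f"target (1/12):       {float(target):.1%}")
print(f"tied group excluded: {float(excluded):.1%}")
print(f"tied group included: {float(included):.1%}")

# A score of 35 is five fully solved problems at 7 points each.
assert 5 * 7 == 35
```

Neither option lands near 1/12: the 46-way tie straddles the target, which is why the gold fraction came out so far above it this year.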
pfannl 4 hours ago
The real AGI test is apparently not solving the IMO, but getting caught in the same scoring traffic jam as 46 teenagers.
thierrydamiba 4 hours ago
Is the harness more valuable than the weights?
korbonits 3 hours ago
Proves the need for more formal verification :)
minimaxir 4 hours ago
not a good day to be named Max