Is it just me, or do they very carefully report performance only for the default GPT-5.4 and not for GPT-5.4 Pro? They also very carefully left Anthropic models out of their comparison.

I went back to the BixBench benchmark they mentioned. I couldn't find official results for Anthropic models, but I found a project that takes Opus 4.6 from 65.3% to 92.0% (which would put it above GPT-Rosalind) using nearly 200 carefully crafted skills [1]. There also appear to be competitor models with scores on par with this tuned GPT.

[1] https://github.com/jaechang-hits/SciAgent-Skills
I'm all for naming things in honor of Rosalind Franklin, but this seems like incredibly misplaced hubris instead.
[flagged]
Is society's behavior determined by the administration? That's an odd way to live your life. This model is a tool, not a servant, but in any case I think paying homage to someone who made incredible contributions is a positive. Eye of the beholder, I suppose.