Cursor Introduces Composer 2.5

(cursor.com)

106 points by asar13 hours ago

21 comments

goyozi32 minutes ago
I kind of want to try it, to see if and how far they can take an open model and improve it but I really don’t miss the Cursor user experience. Constant UI changes, half-baked features, smaller and smaller limits, useless AI change attribution; I think I’ll wait for others to report if it’s any good.
memoryleakgame9 hours ago
If these benches from their site hold up (they likely won't), wouldn't this compress AI revenue like 15x quickly? If they really have a 4.7 Opus high equivalent at 1/16 the cost, wouldn't this significantly affect all the current capex and planning? Maybe they are getting Elon to cover the cost.
- infecto8 hours ago
 The way I have read their benchmark results is that they trained a model to work insanely well in their coding workflow. It’s not a general purpose model. One of the surprisingly hardest problems to solve is to get a model to use the tools you give it access to.
- zackify8 hours ago
 this thing is so awesome on fast mode, so far i am impressed, some of its observations feel similar to opus. i use gpt 5.5 and opus 4.7 a lot every day, if i can get good results at this speed, hopefully the usage level holds up on my team plan haha
- 2001zhaozhao8 hours ago
 > compress ai revenue like 15x
 that roughly just puts it on par with OpenAI and Anthropic subscriptions in terms of pricing per token
m_mueller50 minutes ago
It's a bit confusing to me why they'd make this 'fast' version the default, as it appears to be much more expensive than Composer 2. Wasn't it supposed to be a very cheap alternative to SOTA models?
asar13 hours ago
The model is (like Composer 2) based on Kimi K2.5 and they claim SOTA performance for 1/10th of the cost. The tweet also mentions that they've started a new model from scratch on Colossus 2 (xAI/SpaceX Cluster). Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.
- antirez53 minutes ago
 How much the RL they are doing really improves Kimi K2.5 remains to be seen. So, right now, the ground truth is that they combined what they had with a strong open weights model. The RL improvement may be both marginal (since many folks report strong results with vanilla K2.6) and may mostly bias the model towards coding tasks: when a model like this is trained to be generalist, there is a tension between being good at one thing and the other, in terms of SFT and RL. You can see this in the DeepSeek v4 Flash training report for instance but it is a known fact. So if you have the GPUs and a decent RL pipeline that does not ruin the model you can indeed specialize it a bit more for a given task at the expense of tasks people will not do inside Cursor. But, so far, the measurable reality is that Cursor uses an open weight model like most could do, and the RL story could be partially a marketing move to call it Composer 2.5 more than a real strong gain, given that there is no way to verify and K2.5 was already strong. And we also know that they had to partner to do the training, which is also not good news.
- onlyrealcuzzo12 hours ago
 > Really impressive how they've made this jump from being called the vscode fork with no moat just a couple of months ago.
 Impressive, yes. But they still don't have a moat...
 - infecto8 hours ago
 I am not sure we should dismiss what they have today. Nobody has yet come close with a full package IDE that works well for coding. Is that not a moat? It is easy for me to discount it in my head, thinking that I could build something myself, but between autocomplete and their workflow for agent use, it feels like they have some tangible moat emerging.
 - alach119 hours ago
 Isn't a large user base and the data collected from those users a moat of sorts?
 - onlyrealcuzzo9 hours ago
 A moat is when you have something others can't easily get. Every MAG 7 / FAANG company already has more users and more data... That's not a moat. That's traction.
 - wilg1 hour ago
 That's not X. That's Y.
 - DonHopkins23 minutes ago
 I fear the day that large parts of perfectly valid English language and punctuation are off limits for humans to use because LLMs use them too (having learned them from humans), and somebody will always whine and post low effort "slop" comments that are much more annoying and less useful than the slop itself, or even human written text that happens to match somebody's hyper-sensitive slop detector.
 Plus you are always running the risk of being rude and insulting when incorrectly labeling text actually written by humans as slop, making a jackass of yourself, and opening yourself up to being trolled by humans purposefully inserting em-dashes and catch phrases just to trigger you. That's not clever. That's gullible.
 How much cognitive and physical effort and time do you put into trying to figure out if everything you read is slop, then complaining about it? If that's your job or calling in life, you could be easily replaced with AI.
 If you object to low effort slop, then how about instead of posting low effort whines about slop, you put in the actual effort to do something about it, and rewrite the slop in a way that won't trigger your slop detector, then post that instead, to train AI not to write slop.
 Is your problem that it's slop, or that it's AI generated? Because your whining about low effort AI generated slop without contributing to the conversation or addressing the point you're replying to is just low effort human generated slop.
 Don't post slop while complaining about slop.
 - AussieWog939 hours ago
 Honestly the data itself is probably worth heaps even if the company itself collapses. Early attention engineering when humans were still in the loop!!!
 - NitpickLawyer39 minutes ago
 > Early attention engineering when humans were still in the loop
 Exactly. Cursor was the first product used by tons of devs on real codebases. Just the signal "acceptance rate" is huge and can't be easily captured w/ synthetic data.
 - kkukshtel11 hours ago
 And it's still just a vscode fork
- wg01 hour ago
 This was the only way forward.
- liuliu12 hours ago
 Since the frontier is only 8 months ahead of DeepSeek, it is hard to see how model training can be a moat as all the tricks are available from open labs in China. You really just need <100m to bootstrap at this point.
- Lionga13 hours ago
 They are still a vscode fork with no moat? Like they lost about 70% of users in half a year which goes to show how there is not even the tiniest of moat.
 - GenerWork12 hours ago
 I feel like they've been targeting enterprise pretty hard. I know my company uses them, and the companies that hire us also use Cursor.
 - Squarex1 hour ago
 All enterprises I know use GitHub Copilot as they already have Office, Teams, … wonder how it will change with the recent pricing changes
 - kvetching7 hours ago
 Cursor will definitely win the enterprise for coding. Enterprises aren't going to trust a TUI
 - esafak1 hour ago
 Why not? That makes no sense to me.
- whywhywhywhy12 hours ago
 It's still a VsCode fork just now with a Kimi fine tune and still no moat... I won't debate that it turns out none of this mattered when it came to being a successful company though and kinda makes anyone who tried to roll their own instead of fork look a little silly.
 - hkleppe43 minutes ago
 "No moat", well... How I see this is that it's so important to bundle the model with the right tooling. Like a racecar, having the best engine doesn't help if the rest of the car lacks other winning properties (reliability, aerodynamics etc). So for Cursor, IMO, they put themselves in a strong position by having both a solid IDE __and__ a solid+cost efficient model. Those two working great in combination for the task they are designed to solve (coding) is more important than benchmarks
- aurareturn12 hours ago
 I doubt it's a brand new model. It's likely just Kimi K2.5 further trained on coding.
 - enraged_camel12 hours ago
 They didn't say it's a new model... in fact they said exactly what you just said.
PUSH_AX13 hours ago
They set themselves up for flak when they use whatever these evals are… they did the same for composer 2 which was evaled in close competition with frontier models, spoiler alert, it wasn’t even close in practice. So now 2.5 is supposed to compete with opus 4.7? Sure…
- tuo-lei12 hours ago
 they say it themselves in the post - behavior dimensions "not well captured by existing benchmarks". that was the exact problem with composer 2. not dumber on individual tasks, just bad at session-level decisions like when to stop editing, how much context to carry forward, when to re-read a file vs assume. you don't catch any of that in an isolated eval.
- infecto8 hours ago
 As I have said before in prior composer threads, the proof is in the usage. I am inclined to somewhat believe the results as I use composer and also take the results for the given context. It’s not a general purpose sota model. It’s a model that runs inexpensively in their coding workflow that is creating results similar to opus or gpt.
- criemen12 hours ago
 Well is that a statement about the quality of Opus 4.7 or about Composer 2.5? :P
uf00lme1 hour ago
I wonder why they didn’t train off Kimi 2.6, I hope it is because they already had a good base and not that they messed up that relationship.
- re-thc40 minutes ago
  That's 3.0
everfrustrated13 hours ago
Full details https://cursor.com/blog/composer-2-5
- dang1 hour ago
 Thanks! Link belatedly changed above.
big-chungus41 hour ago
Can you please train Qwen 3.5 like 0.8B to 9B using the same training techniques
granzymes9 hours ago
Surprised this got pushed off the front page so quickly! It’s exciting to see what the Cursor team has been able to do with significantly fewer resources than the frontier labs. I do wish they weren’t joining xAI. Something tells me there will be a contingent of researchers that departs Cursor if that merger is consummated.
- dang1 hour ago
 It set off the flamewar detector, a.k.a. the overheated discussion detector. We'll turn that off.
 - granzymes1 hour ago
 Thanks, dang! The blog post[1] might be a better source than the twitter thread. Also I regret my typo above (lab -> labs) but too late now!
 [1] https://cursor.com/blog/composer-2-5
 - dang1 hour ago
 Thanks! I had been just about to add that maybe the link wasn't the most informative. We've switched it now from https://twitter.com/cursor_ai/status/2056415413077233983. As for the typo, s's are cheap and I've added one :)
jtwaleson12 hours ago
Ok this might be weird but I've moved everyone in my 4 person team to our team plan and costs seem to have skyrocketed compared to the individual plans. Where before most people spent 20-100 USD, now the total bill is more like 1k USD. I haven't gone into the details but it feels like I'm being scammed.
- DedlySnek1 hour ago
 My company is shifting us from Cursor to Claude due to increased costs.
- danbrooks11 hours ago
 Check which model you're using. The fast version of composer is the default now (which costs ~3x as much).
- infecto8 hours ago
 Keep in mind I believe there is a larger buffer given to personal plans. If they have 50% extra with the personal plan you now only get 25%.
- PUSH_AX12 hours ago
 My cursor costs skyrocketed recently too
vanuatu12 hours ago
It's always great that more companies are throwing their hat in the ring, especially focusing on value (latency + intelligence + cost)
lukebrichey10 hours ago
this feels super bullish on cursor/spacexai's ability to train a frontier level model. could be truly SOTA on coding given that their RL data is this powerful
jdlyga13 hours ago
It's a bit odd that they're not comparing it against Sonnet
- jjice13 hours ago
  I don't think so. They're comparing it to the highest tier available models from Anthropic and OpenAI. Generally speaking, Opus is better than Sonnet in almost every way, so why have the redundancy?
  - 38362936484 minutes ago
    Price to performance?
- CodingJeebus12 hours ago
  The tweet specifies that the new model is geared towards long-running tasks, which is what you'd use a model like Opus for anyway.
re-thc12 hours ago
Did they just upgrade Kimi 2.5 to 2.6?
- lukebrichey10 hours ago
  still uses 2.5
polski-g9 hours ago
I don't know why their model isn't on Openrouter yet. They must not have enough capacity to offer it.
svclaws13 hours ago
Their previous Composer was already marketed as a cheap model capable of competing with SOTA on most tasks. The evals they shared back then backed this up but in my day-to-day usage it fell short across the board. Canceled my cursor subscription and switched to Claude Code a few weeks ago. It has its own shortcomings but in terms of model capability and UX quality Cursor will have a hard time competing in the long term. Elon Musk will be a very good way out for them.
sergiotapia13 hours ago
Congratulations on the launch! I'm interested in trying Cursor but it's very confusing what I should buy. What does the Pro $20 plan get me in usage if I only use Composer 2.5? How fast is the model?
- darkwi11ow12 hours ago
 I have used the $20 plan on a daily basis for more than a year now, and have yet to exhaust that limit. The plan includes $20 in api costs for non-Cursor premium models and $20 for Composer and Auto models provided by Cursor themselves. That said, I am a pretty old-fashioned coder and use LLM mostly to overcome the blank page problem, which means I review and often rewrite LLM output by hand and avoid prompt loops for a single task. People who are aiming to not read code any more might find this $20 plan lacking for their needs, however for my needs it fits perfectly.
 - kaizoku15612 hours ago
 The limits are probably even higher than that, i seem to get about 100$+ of usage on composer and about 45-50 usd on non composer models
ChrisArchitect12 hours ago
Non-x link: https://cursor.com/blog/composer-2-5 (https://news.ycombinator.com/item?id=48182126)