MiniMax M2.1: Built for Real-World Complex Tasks, Multi-Language Programming

(minimaxi.com)

228 points by 11044 days ago

25 comments

viraptor43 days ago
I've played with this a bit and it's ok. I'd place it somewhere around sonnet 4.5 level, probably below. But with this aggressive pricing you can just run 3 copies to do the same thing, choose the one that succeeded and still come out way ahead on cost. It's not as good at following instructions as Claude models and can get lost, but it's still "good enough". I'm very happy with using it to just "do things". When in-depth debugging or a massive plan is needed, I'd go with something better, but for going through the motions afterwards? It works.
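The "run 3 copies and pick the one that succeeded" argument can be checked with some back-of-the-envelope arithmetic. This is a minimal sketch, assuming each attempt succeeds independently with the same probability and that a successful run can be recognised (for example by a test suite). The success rates and prices are hypothetical placeholders, not measurements of M2.1 or Sonnet.

```python
def best_of_n_success(p_single: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p_single) ** n


def best_of_n_cost(cost_per_run: float, n: int) -> float:
    """Total cost of n attempts; every attempt is paid for, winner or not."""
    return cost_per_run * n


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    cheap_p, cheap_cost = 0.60, 0.10    # cheaper model: 60% success, $0.10 per run
    strong_p, strong_cost = 0.85, 1.00  # stronger model: 85% success, $1.00 per run

    n = 3
    print(f"cheap x{n}: success={best_of_n_success(cheap_p, n):.3f}, "
          f"cost=${best_of_n_cost(cheap_cost, n):.2f}")
    print(f"strong x1: success={best_of_n_success(strong_p, 1):.3f}, "
          f"cost=${best_of_n_cost(strong_cost, 1):.2f}")
```

The independence assumption is optimistic: if all three copies tend to fail on the same hard tasks, the real gain from parallel attempts is smaller than this estimate.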
gcanyon43 days ago
Would it kill them to use the words "AI coding agent" somewhere prominent? "MiniMax M2.1: Significantly Enhanced Multi-Language Programming, Built for Real-World Complex Tasks" could be an IDE, a UI framework, a performance library, or, or...
- spoaceman777743 days ago
 It's not an AI coding agent. It's an LLM that can be used for whatever you'd like, including powering coding agents.
 - pdyc43 days ago
 That reinforces OP’s point that it isn’t clear from their wording. I initially thought it was a speech model, then I saw Python, etc., and it took me a bit more reading to understand what it actually is
 - gcanyon43 days ago
 HA! I almost added a disclaimer to the original message that I wasn't certain in my identification, hence the request/complaint that they didn't make it clear. But I figured the message would be more effective if I "confidently got it wrong" rather than asking, so I went with it.
 - martin-t43 days ago
 Some sad irony: just like saying the wrong thing is more likely to get you a reply, using a poor title gets them more engagement.
 - gcanyon43 days ago
 Maybe :-(
- tw198443 days ago
 its main Chinese competitor GLM is like making 50 cents USD each in the past 6 months from its 40 million "developer users", calling your flagship model "AI coding agent" is like telling investors "we are doing this for fun, not for money".
Tepix43 days ago
The weights have now been released on Hugging Face: https://huggingface.co/MiniMaxAI/MiniMax-M2.1
kachapopopow43 days ago
I think people should stop comparing to sonnet and compare to opus instead, since it's so far ahead at producing code I would actually want to use (gemini 3 pro tends to be lacking in generalization and wants things to use its own style rather than adapting). Whatever benchmark opus is ahead in should be treated as a very important metric of proper generalization in models.
- azuanrb43 days ago
 I generally prefer Sonnet as the comparison too. Opus, as good as it is, is just too expensive. The "best" model is the one I can use, not the one I can't afford. These days, by default I just use Sonnet/Haiku. In most cases it's more than good enough for me, and the $20 plan is plenty. With MiniMax or GLM-4.7, some people like me are just looking for Sonnet-level capability at a much cheaper price.
 - mjburgess43 days ago
 Are you using GLM-4.7? I've just spent a fortune on Opus, and I heard GLM was close -- but after integrating it into cursor, it seems to spin forever, lose tool use, and generate partial(?) plans. I did look into using it with the claude cli tool, so it could be cursor specific -- but I haven't had the best experience despite going for the pro plan with them. Any advice on how you're using GLM effectively, if at all? At the moment Opus is the only model I can trust: even when it generates "refactoring work", it can do the refactoring.
 - azuanrb43 days ago
 I’m on the Lite plan. For coding, I still prefer Claude because the models are simply better. I mainly use CLI tools like Claude Code and OpenCode. I’m also managing a few projects and teams. One way I’m getting value from my GLM subscription is by building a daily GitHub PR summary bot using a GitHub Action. It’s good enough for me to keep up with the team and to monitor higher-risk PRs. Right now I’m using GLM more as an agent/API rather than as a coding tool. Claude works best for agentic coding for me. I’m on Claude $20 plan and I usually start with Haiku, then I switch to Sonnet or Opus for harder or longer tasks.
 - sumedh43 days ago
 > I did look into using it with the claude cli tool, so it could be cursor specific
 Claude Code with GLM seems ok to me. I just use it as a backup LLM in case I hit usage limits, but for some light refactoring it did the job well. Are you also facing issues with Claude Code and GLM?
 - kachapopopow42 days ago
 No matter the price they're far cheaper than a developer and opus / gemini 3 pro are both at a level where they're really useful pair programmers and opus at times can be given a spec to implement and it will do it after 30 minutes with no input from me.
 - baq43 days ago
 are you counting price per token or price per successful task? I'm pretty sure opus 4.5 is cheaper per task than sonnet in some use cases.
 - azuanrb43 days ago
 Per successful task. The results are mixed. Like you mentioned, it can be cheaper, but only in some use cases. I'm only on the $20 plan. If I use Opus and it's not as efficient for my current tasks, I'll burn through my limit pretty fast and end up unable to use anything for the next few hours. Whereas with Sonnet/Haiku, I'm much more likely to have AI assistance throughout my whole coding session. This matters more to me right now. Just a tradeoff I'm willing to make.
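The difference between price per token and price per successful task can be made concrete. This is a rough sketch, assuming failed attempts are simply retried until one succeeds, so the expected number of attempts is 1 / success rate. The token counts, prices, and success rates are hypothetical, not real figures for Opus or Sonnet.

```python
def cost_per_successful_task(tokens_per_attempt: int,
                             price_per_million_tokens: float,
                             success_rate: float) -> float:
    """Expected spend to get one successful result, retrying on failure."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = tokens_per_attempt / 1_000_000 * price_per_million_tokens
    # Expected attempts until first success is 1 / success_rate.
    return cost_per_attempt / success_rate


if __name__ == "__main__":
    # Hypothetical: the model that is cheaper per token wanders more
    # (more tokens per attempt) and fails more often.
    cheaper_per_token = cost_per_successful_task(200_000, 3.0, 0.5)
    pricier_per_token = cost_per_successful_task(80_000, 5.0, 0.9)
    print(f"cheaper per token: ${cheaper_per_token:.2f} per successful task")
    print(f"pricier per token: ${pricier_per_token:.2f} per successful task")
```

With these made-up inputs the model that costs more per token comes out cheaper per finished task, which is the "some use cases" situation described above. Under a subscription with rate limits rather than per-token billing, the same reasoning applies to how quickly the limit is consumed.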
 - andai43 days ago
 Opus is 3x cheaper now. I think it's still not on the $20 plan tho, which is sad.
 - azuanrb43 days ago
 It has been available for a few weeks now.
 > Claude Opus 4.5, our frontier coding model, is now available in Claude Code for Pro users. Pro users can select Opus 4.5 using the /model command in their terminal. Opus 4.5 will consume rate limits faster than Sonnet 4.5. We recommend using Opus for your most complex tasks and using Sonnet for simpler tasks.
 - sheepscreek43 days ago
 Use Claude Opus in Antigravity. Google is very generous with the limits. The best part is, if you hit your limit, you can switch to Gemini Pro High. I think Google is able to do this because they host Claude on their own TPUs in their datacentres (probably for Vertex AI customers). So they can undercut just about anyone, including Anthropic, on costs! No matter which model you start with, having the other frontier model as a backup is fantastic. Essentially you’re getting 2x the limit.
 - WiSaGaN43 days ago
 It is now. But the limit on $20 plan is quite low and easy to use up.
jondwillis44 days ago
> MiniMax has been continuously transforming itself in a more AI-native way. The core driving forces of this process are models, Agent scaffolding, and organization. Throughout the exploration process, we have gained increasingly deeper understanding of these three aspects. Today we are releasing updates to the model component, namely MiniMax M2.1, hoping to help more enterprises and individuals find more AI-native ways of working (and living) sooner.
This compresses to: “We are updating our model, MiniMax, to 2.1. Agent harnesses exist and Agents are getting more capable.” A good model and agent harness, pointed at the task of writing this post, might suggest less verbosity and complexity. It comes off as fake and hype-chasing to me, even if your model is actually good. I disengage there. I saw y'all give a lightning talk recently and it was similarly hype-y. Perhaps this is a translation or cultural thing.
- tw198443 days ago
 so when MiniMax released a pretty capable model, you chose to ignore the model itself, focus on a single sentence they wrote in the release note, and start bad-mouthing it. is it a cultural thing?
 - pembrook43 days ago
 It’s called bikeshedding and yes it’s a cultural thing on HN. [1] Most people here are big company worker bees where they take zero risks and do very little of substance. In these organizations, it’s common for large groups of people to get together in “meetings” and endlessly nitpick surface-level details of unimportant things while completely missing the big picture because it’s far too complex to allow for easy opinions or smart-sounding critique.
 [1] https://en.wikipedia.org/wiki/Law_of_triviality
 - jondwillis43 days ago
 It’s the first thing in this press release. Start with garbage? I’m going to assume it’s all garbage.
 - tw198442 days ago
 you are more than welcome to call it garbage or whatever else you like. they will just catch up fast and eat your lunch in 6-12 months' time. btw, full weights are now available for download.
 - jondwillis42 days ago
 I’m not even sitting at the table. I’m a spectator. What’s your argument/allegiance here? The original article is hyped-up drivel. The model could be amazing and that’s still the case.
 - simlevesque43 days ago
 If I use a software I need to trust it.
 - tw198443 days ago
 a model is not software, it is a bunch of weights. you are more than welcome to pick whatever model or software you choose to trust, that is totally fine. However, that is vastly different from bad-mouthing a model or software just because its release note contains a single sentence you don't like.
 - LoganDark43 days ago
 The API is software. You don't get the weights.
 logicprog43 days ago
 The weights are open.
 homarp43 days ago
 Here: https://huggingface.co/MiniMaxAI/MiniMax-M2.1 (GGUF: https://huggingface.co/unsloth/MiniMax-M2.1-GGUF)
 LoganDark42 days ago
 Huh, I couldn't find that in the article when I posted my comment. I checked again now and it's there.
- zaptrem43 days ago
 Not sure it’s a cultural thing since most of the copy coming out of DeepSeek has been pretty straightforward.
tomcam44 days ago
I still can’t figure out what it does
- esafak43 days ago
  It's an LLM for coding.
- yinuoli43 days ago
  It's a neural network model, and it could generate text following a given text.
- prmph44 days ago
  You are not alone
- tucnak43 days ago
  You should ask ChatGPT.
- dist-epoch43 days ago
  Money, it does money
  - tomcam41 days ago
    NOW I UNDERSTAND
gempir43 days ago
Very anecdotal, but for me this model has very weak prompt adherence. I compared it a tiny bit to gemini flash 3.0, and simple things like "don't use markdown tables in output" were very hard to get with m2.1. It took me like 5 prompt iterations until it finally listened. But it's very good, better than flash 3.0 in terms of code output and reasoning while being cheaper.
p5v43 days ago
Has anyone used this in earnest with something like OpenCode? Over the past few months I’ve tested a dozen models that were claimed to be nearly as good as Claude Code or Codex, but the overall experience when using them with OpenCode was close to abysmal. Not even a single one was able to do a decent code editing job on a real-world codebase.
- t1amat43 days ago
  With M2, yes - I’ve used it in Claude Code (e.g. native tool calling), Roo/Cline (e.g. custom tool parsing), etc. It’s quite good and for some time the best model to self-host. At 4bit it can fit on 2x RTX 6000 Pro (e.g. ~200GB VRAM) with about 400k context at fp8 kv cache. It’s very fast due to low active params, stable at long context, quite capable in any agent harness (its training specialty). M2.1 should be a good bump beyond M2, which was undertrained relative to even much smaller models.
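The "fits on 2x RTX 6000 Pro at 4bit" claim can be sanity-checked with simple arithmetic on weight storage. This is a rough sketch only: the total parameter count below is an assumption (the thread does not state it), and the estimate ignores quantization metadata, activations, and runtime overhead. The ~200 GB figure is taken from the comment above.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    ASSUMED_TOTAL_PARAMS = 230e9  # assumption: roughly 230B total parameters
    VRAM_GB = 200.0               # "~200GB VRAM" as quoted in the comment

    for bits in (16, 8, 4):
        weights = weight_memory_gb(ASSUMED_TOTAL_PARAMS, bits)
        headroom = VRAM_GB - weights
        status = "fits" if headroom > 0 else "does not fit"
        print(f"{bits}-bit: weights ~{weights:.0f} GB, "
              f"headroom {headroom:.0f} GB ({status})")
```

Under that assumption only the 4-bit weights leave room for a KV cache. How much context the remaining headroom supports depends on layer count, KV head configuration, and cache precision, none of which are given in the thread.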
Invictus043 days ago
How is everyone monitoring the skill/utility of all these different models? I am overwhelmed by how many there are, and by the challenge of monitoring their capability across so many different modalities.
- redman2543 days ago
 https://www.swebench.com
 https://swe-rebench.com
 https://livebench.ai/#/
 https://eqbench.com/#
 https://contextarena.ai/?needles=8
 https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
 https://artificialanalysis.ai/leaderboards/models
 https://gorilla.cs.berkeley.edu/leaderboard.html
 https://github.com/lechmazur/confabulations
 https://dubesor.de/benchtable
 https://help.kagi.com/kagi/ai/llm-benchmark.html
 https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
 - Alifatisk43 days ago
 I’d stick to artificial analysis
 - pylotlight43 days ago
 That has many of its own problems as well.
- spoaceman777743 days ago
 This is the best summary, in my opinion. You can also see the individual scores on the benchmarks they use to compute their overall scores. It's nice and simple in the overview mode though. It breaks things down into an intelligence ranking, a coding ranking, and an agentic ranking. https://artificialanalysis.ai/
 - Invictus042 days ago
 Unfortunately it's completely unusable on mobile
 - spoaceman777742 days ago
 Works fine for me, but you could also just turn on desktop view in your mobile browser if it isn't big enough on your screen. I use Firefox Mobile, so perhaps there is a difference on Chromium-based browsers?
esafak43 days ago
> It exhibits consistent and stable results in tools such as Claude Code, Droid (Factory AI), Cline, Kilo Code, Roo Code, and BlackBox, while providing reliable support for Context Management mechanisms including Skill.md, Claude.md/agent.md/cursorrule, and Slash Commands.
One of the demos shows them using Claude Code, which is interesting. And the next sections are titled 'Digital Employee' and 'End-to-End Office Automation'. Their ambitions obviously go beyond coding. A sign of things to come...
- atombender43 days ago
 Claude doesn't officially support using other, non-Anthropic models, right? So did they patch the code or fake the Claude API, or some other hack to get around that?
 - homarp43 days ago
 you have a few 'claude' proxies on github. llama.cpp recently added Anthropic API support: https://github.com/ggml-org/llama.cpp/pull/17570
- jimmydoe43 days ago
 they are going to IPO on HKEX in a few weeks. some hype is necessary, not too far fetched imo, pretty much the same as the anthropic playbook.
 - tw198443 days ago
 the anthropic playbook does include the false claim publicly made by its CEO that "in six months AI would be writing 90 percent of code". he made that claim 10 months ago. intentionally misleading investors is a criminal offence in many countries. MiniMax is like 100x more honest.
 - fluoridation43 days ago
 Does it count as misleading if you honestly believe what you're saying but are simply mistaken?
 - sumedh43 days ago
 > in six months AI would be writing 90 percent of code
 Are you still writing code by hand?
m00dy43 days ago
I used gemini-3-pro-preview on Deepwalker [0]. It was good, then I switched to gemini-3-flash. It's ok; it gets the job done. I'm looking for alternatives such as GLM and Minimax, and I'm very curious about their agentic performance, e.g. long-running tasks with reasoning.
[0]: https://deepwalker.xyz
sosodev43 days ago
I’ve spent a little bit of time testing Minimax M2. It’s quite good given the small size but it did make some odd mistakes and struggle with precise instructions.
- viraptor43 days ago
  This is an announcement for M2.1 not M2. It got a decent bump in agent capabilities.
stpedgwdgfhgdd43 days ago
Internal Server Error
- 01-_-43 days ago
  me too
big-chungus442 days ago
can you please fix the login, when I try to log in, it says Unable to process request due to missing initial state. This may happen if browser sessionStorage is inaccessible or accidentally cleared. Some specific scenarios are - 1) Using IDP-Initiated SAML SSO. 2) Using signInWithRedirect in a storage-partitioned browser environment.
jdright44 days ago
https://www.minimax.io/news/minimax-m21
- big-chungus442 days ago
 whats the difference
mr_o4743 days ago
I won't say it's on the same level as the claude models, but it's definitely good at coming up with frontend designs
integricho43 days ago
Their site crashes my phone browser while scrolling. Is that the expected quality of output of their product?
- Tepix43 days ago
  Should a website be able to crash a browser?
- jedisct143 days ago
  If a website can crash your browser, the problem is your browser...
sillyboi43 days ago
Internal server error..
p-e-w44 days ago
One of the cited reviews goes: “We're excited for powerful open-source models like M2.1 […]” Yet as far as I can tell, this model isn’t open at all. Not even open weights, never mind open source.
- viraptor43 days ago
 It's scheduled for release. They jumped the gun with the news. But as far as we know, it's still coming out, just like M2.
 - p-e-w43 days ago
 I don’t get it. What’s the holdup? Uploading a model to Hugging Face isn’t exactly difficult.
- NitpickLawyer43 days ago
 Repo made public a few minutes ago: https://huggingface.co/MiniMaxAI/MiniMax-M2.1
- bearjaws44 days ago
 Yeah, I don't see any way to download this; ollama has it as cloud only.
boredemployee43 days ago
Internal Server Error
erdemo43 days ago
The intro video is as cringe as their AI agent's name.
monster_truck44 days ago
That they are still training models against Objective-C is all the proof you need that it will outlive Swift. When is someone going to vibe code Objective-C 3.0? Borrowing all of the actual good things that have happened since 2.0 is closer than you'd think thanks to LLVM and friends.
- viraptor44 days ago
 Why would they not? Existing objective-c apps will still need updates and various work. Models are still trained on assembler for architectures that don't meaningfully exist today as well.
- victorbjorklund43 days ago
 I’m sure you can find some COBOL code in many of the training sets. Not sure I would build my next startup using COBOL.