A workflow I find useful is to have multiple CLI agents running in different Tmux panes and have one consult/delegate to another using my Tmux-CLI [1] tool + skill. Advantage of this is that the agents’ work is fully visible and I can intervene as needed.<p>[1] <a href="https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#tmux-cli-terminal-automation" rel="nofollow">https://github.com/pchalasani/claude-code-tools?tab=readme-o...</a>
Have you considered using their command line options instead? At least Codex and Claude both support feeding in new prompts in an ongoing conversation via the command line, and can return text or stream JSON back.
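For anyone who hasn’t tried it, the headless invocations look roughly like this (a sketch; exact flag names vary across CLI versions, so check each tool’s --help):<p><pre><code> # Claude Code "print" (headless) mode: one-shot prompt, JSON result on stdout
claude -p "Summarize the failing tests in this repo" --output-format json

# Follow up in the most recent conversation from the command line
claude -p --continue "Now propose a minimal fix"

# Codex CLI non-interactive mode takes the prompt as an argument
codex exec "Explain what this repository does"</code></pre>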
You mean so-called headless or non-interactive mode? Yes, I’ve considered that, but the advantage of communicating via Tmux panes is that all agent work is fully visible and you can intervene as needed.<p>My repo has other tools that leverage such headless agents; for example, there’s a resume [1] functionality that provides alternatives to compaction (which is not great since it always loses valuable context details):
The “smart-trim” feature uses a headless agent to find irrelevant long messages for truncation, and the “rollover” feature creates a new session and injects session lineage links, with a customizable extraction of context for the task to be continued.<p>[1] <a href="https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#resume-options--managing-context" rel="nofollow">https://github.com/pchalasani/claude-code-tools?tab=readme-o...</a>
I have a similar workflow except I haven’t put time into the tooling - Claude is adept at tmux and can almost prompt and respond to ChatGPT on its own, except it always forgets to press Enter when it sends keys. Have your agents been able to communicate with each other with tmux send-keys?
I had the same issue. Subagents are nice but the LLM calling them can’t have a back-and-forth conversation. I tried tmux-cli and even other options like AgentAPI[0], but the same issue persists: the agent can’t have a back-and-forth with the tmux pane.<p>To people asking why you would want Claude to call Codex or Gemini, it’s because of orchestration. We have an architect skill we feed the first agent. That agent can call subagents or even use tmux and feed in the builder skill. The architect is harnessed to a CRUD application just keeping track of what features were built already, so the builder is focused on building only.<p>0. <a href="https://github.com/coder/agentapi" rel="nofollow">https://github.com/coder/agentapi</a>
Yes, this and other edge cases are why I made the Tmux-CLI wrapper. Yes, they use send-keys with suitable delays etc.
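For reference, the generic tmux pattern the agents need to get right looks roughly like this (a sketch of the pattern, not the actual tmux-cli implementation; the pane id is hypothetical):<p><pre><code> # Find the target pane id first with: tmux list-panes -a
PANE="%3"

# Type the prompt, give the other agent's TUI a moment to settle,
# then send Enter as a separate key -- combining them is what agents get wrong
tmux send-keys -t "$PANE" "Please review the diff in src/"
sleep 1
tmux send-keys -t "$PANE" Enter

# Read back the other agent's output
tmux capture-pane -p -t "$PANE" | tail -40</code></pre>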
What are you asking/expecting Claude to do with tmux?
I find that asking Claude to develop and Codex to review the uncommitted changes will typically result in high-value code, and eliminate all of Claude’s propensity to perpetually lie and cheat. Sometimes I also ideate with Claude and then ask Claude to get ChatGPT’s opinion on the matter. I started by copy-pasting responses but I found tmux to be a nice way to get rid of the middleman.
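For anyone who wants the same develop-and-review loop without the tmux panes, a headless sketch of the review step (the prompts are illustrative and review.md is just a scratch file):<p><pre><code> # Ask Codex to review the working tree and capture its findings
codex exec "Run 'git diff' and review the uncommitted changes for bugs, missing tests and regressions. Be specific." > review.md

# Hand the findings back to Claude in the ongoing session
claude -p --continue "Address the reviewer comments in review.md"</code></pre>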
This is cool. If Codex or Gemini CLI is supported, it would be good to have a section in the README indicating shortcomings etc. (may have missed it).
The idea works well with or without direct integration. You can have a CLI agent read the state of any tmux session and have it drive work through it. I use it for everything from dev work to system debugging. It turns out a portable, callable binary with simple parameters is still easier for agents to use than protocols and skills: <a href="https://github.com/tikimcfee/gomuxai" rel="nofollow">https://github.com/tikimcfee/gomuxai</a>
There’s no special support needed; it’s just a bash command that any CLI agent can use. For agents that support skills, the corresponding skill makes it easier to leverage. I’ll add that to the README.
Claude Code, Gemini, and Codex are all supported but need more testing, so I would really value feedback and bug reports :D<p>Contributions will be highly appreciated and credited.
I will look it up indeed
What prompted you to build this?
I used to get stuck sometimes with Claude and need a different agent to take a look. Switching back and forth between those agents is a headache, and you can’t port all the context over, so I thought this might help solve real blockers for many devs on larger projects.
I have both Codex and Claude subs so I wanted one to be able to consult the other. Also it’s useful when you have a cli script that an agent is iterating on, so it can test it. Another use case is for a CLI agent to run a debugger like PDB in another pane, though I haven’t used it much.
I've had good success with a similar workflow, most recently using it to help me build out a captive-wifi debugger[0]. In short, it worked _pretty_ well, but it was quite time intensive. That said, I think removing the human from the loop would have been insanity on this: lots of situations where there were some very poor ideas suggested that the other LLMs went along with, and others where one LLM was the sole voice of reason against the other two.<p>I think my only real take-away from all of it was that Claude is probably the best at prototyping code, where Codex makes a very strong (but pedantic) code-reviewer. Gemini was all over the place, sometimes inspired, sometimes idiotic.<p>0: <a href="https://github.com/pjlsergeant/captive-wifi-tool/tree/main" rel="nofollow">https://github.com/pjlsergeant/captive-wifi-tool/tree/main</a>
This is exactly why I built Mysti: I used that flow very often and it worked well. I also added personas and skills so that it is easy to customize the agents’ behavior. If you have any ideas to make the behavior better, please don’t hesitate to share! Happy to jump on a call and discuss it as well.
Getting feedback on a plan or implementation is valuable because you get a fresh set of eyes. Using multiple models <i>may</i> help, though it always feels a bit silly to me (if nothing else, you’re increasing non-determinism because you now have to understand two LLMs’ quirks).<p>But the “playing house” approach of experts is somewhere between pointless and actively harmful. It was all the rage in June and I thought people abandoned that later in the summer.<p>If you want the model to e.g. review code instead of fixing things, or document code without suggesting improvements (for writing docs), that’s useful. But there’s no need for all these personas.
I created a simple skill in Claude Code CLI that collaborates with Codex CLI. It is just a prompt saved in the skill format. It uses subagents as well.<p>Honest question. How is Mysti better than a simple Claude skill that does the same work?
The skill would allow Claude Code CLI to call Codex CLI, but then Claude Code CLI needs to pass context to Codex, which requires writing the context out (which adds latency); this provides only limited context to Codex and also eats into the main context window. Mysti shares the context, which is very different from passing context as a parameter.
Could you share your skill and workflow? Does Claude launch Codex in a tmux session?
Codex CLI can run as an MCP server out of the box, which you can call directly from Claude Code. Together with a prompt to ask Codex for a second opinion, that works very well for me, especially in code reviews.
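A sketch of the wiring (the Codex MCP subcommand name is an assumption here; it has changed across Codex releases, so check codex --help):<p><pre><code> # Register Codex as an MCP server inside Claude Code
# (the `codex mcp` subcommand is an assumption; verify with `codex --help`)
claude mcp add codex -- codex mcp

# Then, inside Claude Code, prompt something like:
#   "Ask the codex MCP tool for a second opinion on this diff."</code></pre>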
Interesting, I was trying to implement this using AGENTS.md and the runSubagent tool in VS Code. VS Code does not yet have the capability to invoke different models as subagents, so I plan to fall back to instructing Copilot to use copilot-cli and gemini-cli. (I am quite angry about Copilot CLI offering only full-blown models and not the -mini versions though.)
Why make it a vscode extension if the point of these 3 tools is a cli interface? Meaning most of the people I know use these tools without VSCode. Is VSC required?
> Meaning most of the people I know use these tools without VSCode.<p>I guess it depends?<p>You can usually count on Claude Code or Codex or Gemini CLI to support the model features the best, but sometimes having a consistent UI across all of them is also nice - be it another CLI tool like OpenCode (that was a bit buggy for me when it came to copying text), or maybe Cline/RooCode/KiloCode inside of VSC, so you don't also have to install a custom editor like Cursor but can use your pre-existing VSC setup.<p>Okay, that was a bit of a run on sentence, but it's nice to be able to work on some context and then to switch between different models inline: "Hey Sonnet, please look at the work of the previous model up until this point and validate its findings about the cause of this bug."<p>I'd also love it if I could hook up some of those models (especially what Cerebras Code offers) with autocomplete so I wouldn't need Copilot either, but most of the plugins that try to do that are pretty buggy or broken (e.g. Continue.dev). KiloCode also added autocomplete, but it doesn't seem to work with BYOK.
That’s a great idea! I can make it a CLI too
Huh. I know hundreds that use LLMs in a VSCode based IDE, and 3 that use the CLI.
I was initially a proponent of the CLI when Claude integration with VSCode required a WSL instance, but now that it is integrated directly into VSCode I feel one grouping of tooling hiccups is ruled out in my workflow. The only (major) nitpick I have is that it won’t let you finish typing and cuts you off when asking whether/how to proceed.
I’ve never seen a profession change so fast as coding right now
Have to keep in mind that what is happening now is basically what was promised decades ago. Never mind 4GL, 5GL, expert systems, and other efforts that went nowhere... even COBOL was created with the intention of making programming look more like natural language.<p>Often, revolutions take longer to happen than we think they will, and then they happen faster than we think they will. And when the tipping point is finally reached, we find more people pushing back than we thought there would be.
>Claude Code (Anthropic), Codex (OpenAI), and Gemini (Google) have different training, different strengths, and different blind spots.<p>Do they?<p>There was a paper about HiveMind in LLMs. They all tend to produce similar outputs when they are asked open ended questions.
[2510.22954] Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond) <a href="https://share.google/1GHdUvhz2uhF4PVFU" rel="nofollow">https://share.google/1GHdUvhz2uhF4PVFU</a>
I usually switch agents when one agent gets stuck, and I’ve faced several situations where one agent solved a problem that the other agent was stuck on.
Website link on GitHub points to <a href="https://deepmyst.com/" rel="nofollow">https://deepmyst.com/</a><p>But it is actually hosted on <a href="https://www.deepmyst.com/" rel="nofollow">https://www.deepmyst.com/</a> with no forwarding from the apex domain to www, so it looks like the website is down.<p>Otherwise excited to deep dive into this, as this is a variant of how we do development and it seems to work great when the AIs fight each other.
UPDATE: License is now MIT! Super excited to see your contributions and feedback!
Does anyone know of something similar but for the terminal?<p>Update:<p>I’ve already found a solution based on a comment, and modified it a bit.<p>Inside Claude Code I’ve made a new agent that uses Gemini via MCP through <a href="https://github.com/raine/consult-llm-mcp" rel="nofollow">https://github.com/raine/consult-llm-mcp</a>. This seems to work!<p>Claude Code:<p>Now let me launch the Gemini MCP specialist to build the backend monitoring server:<p>gemini-mcp-specialist(Build monitoring backend server)
⎿ Running PreToolUse hook…
My similar workflow within Claude Code when it gets stuck is to have it consult Gemini. Works either through Gemini CLI or the API. Surprisingly powerful pattern because I've just found that Gemini is still ahead of Opus in architectural reasoning and figuring out difficult bugs. <a href="https://github.com/raine/consult-llm-mcp" rel="nofollow">https://github.com/raine/consult-llm-mcp</a>
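For the Gemini CLI path, the consult can be as simple as a one-shot non-interactive call (a sketch; crash.log stands in for whatever context you want to share):<p><pre><code> # Non-interactive Gemini CLI call with pasted context
gemini -p "Here is a stack trace and the relevant function. What are the most likely root causes? $(cat crash.log)"</code></pre>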
<a href="https://github.com/just-every/code" rel="nofollow">https://github.com/just-every/code</a> "Every Code - push frontier AI to it limits. A fork of the Codex CLI with validation, automation, browser integration, multi-agents, theming, and much more. Orchestrate agents from OpenAI, Claude, Gemini or any provider." Apache 2.0 ; Community fork;
> Note: If another tool already provides a code command (e.g. VS Code), our CLI is also installed as coder. Use coder to avoid conflicts.<p>“If”, oh, idk, just the tool 90% of potential users will have installed.
When you say orchestrate agents, what would it do? Would it allow the same context across agents, and can I make agents brainstorm?
<p><pre><code> # Plan code changes (Claude, Gemini and GPT-5 consensus)
# All agents review task and create a consolidated plan
/plan "Stop the AI from ordering pizza at 3AM"
# Solve complex problems (Claude, Gemini and GPT-5 race)
# Fastest preferred (see https://arxiv.org/abs/2505.17813)
/solve "Why does deleting one user drop the whole database?"
# Write code! (Claude, Gemini and GPT-5 consensus)
# Creates multiple worktrees then implements the optimal solution
/code "Show dark mode when I feel cranky"
# Hand off a multi-step task; Auto Drive will coordinate agents and approvals
/auto "Refactor the auth flow and add device login"</code></pre>
Here's a portable binary you drop in a directory to allow agentic cli to cross communicate with other agents, store and read state, or act as the driver of arbitrary tmux sessions in parallel: <a href="https://github.com/tikimcfee/gomuxai" rel="nofollow">https://github.com/tikimcfee/gomuxai</a>
<a href="http://opencode.ai/" rel="nofollow">http://opencode.ai/</a>
I can make it for the terminal if that would be helpful. What do you think?
Pal MCP (formerly Zen) is pretty awesome.<p><a href="https://github.com/BeehiveInnovations/pal-mcp-server" rel="nofollow">https://github.com/BeehiveInnovations/pal-mcp-server</a>
Great idea. Whether brainstorm mode is actually useful is hard to say without trying it out, but it sounds like an interesting approach. Maybe it would be a good idea to try running a SWE benchmark with it.<p>Personally, I wouldn't use the personas. Some people like to try out different modes and slash commands and whatnot - but I am quite happy using the defaults and would rather (let it) write more code than tinker with settings or personas.
How is it different from PAL MCP (ex-Zen MCP)?
> Is multi-agent collaboration actually useful or am I just solving my own niche problem?<p>I often write with Claude, and at work we have Gemini code reviews on GitHub; definitely these two catch different things. I'd be excited to have them working together in parallel in a nice interface.<p>If our ops team gives this a thumbs-up security wise I'll be excited to try it out when back at work.
How do we measure whether this is any better than just using one good model?
Anecdotal experience, but when bugfixing I personally find if a model introduces a bug, it has a hard time spotting and fixing it, but when you give the code to another model it can instantly spot it (even if it's a weaker model overall).<p>So I can well imagine that this sort of approach could work very well, although agree with your sentiment that measurement would be good.
One day someone will actually build something with an LLM and do a write-up of it, but until then we'll just keep reading about tooling.
Multi agent collaboration is quite likely the future. All agents have blind spots, collaboration is how they are offset.<p>You may want to study [1] - this is the latest thinking on agent collaboration from Google.<p>[1] <a href="https://www.linkedin.com/posts/shubhamsaboo_we-just-ran-the-biggest-ai-agent-course-ever-activity-7395301281117929472-vRRl" rel="nofollow">https://www.linkedin.com/posts/shubhamsaboo_we-just-ran-the-...</a>
> Multi agent collaboration is quite likely the future<p>Autogen from ms was an early attempt at this, and it was fun to play with it, but too early (the models themselves kinda crapped out after a few convos). This would work much better today with how long agents can stay on track.<p>There was also a finding earlier this year, I believe from the swe-bench guys (or hf?), where they saw better scores with alternating between gpt5/sonnet4 after each call during an execution flow. The scores of alternating between them were higher than any of them individually. Found that interesting at the time.
Thank you so much for sharing, Denis! I definitely believe in that as the world starts switching from single agents to agentic teams where each agent has specific capabilities. Do you know of any benchmarks that cover collaborative agents?
For me when it’s front end I usually work with Claude and have codex review. Otherwise I just work with codex… Claude also if I’m being lazy and want a thing quickly
This is very useful! I frequently copy the response of one model and ask another to review it, and I have seen really good results with that approach.<p>Can you also include Cursor CLI for the brainstorming? This would allow someone to unlock brainstorming with just one CLI, since it allows using multiple models.
Licensing with BSL is not a smart decision when the AI world is changing basically every month.
The project is now MIT!
I’m thinking of switching to MIT. What do you think? Is there any other license you would recommend?
> licensing with BSL when basically every month the AI world is changing is not a smart decision<p>This turned me off as well. Especially with no published pricing and a link to a site that is not about this product.<p>At minimum, publish pricing.
Regarding DeepMyst: in the future it will optionally offer the ability to use smart context, where the context is automatically optimized so that you won’t hit the context window limit (basically no need for compaction) and you get much higher usage limits, because the number of tokens needed is reduced by up to 80%. So you would be able to achieve with a 20 USD Claude plan the same as the Pro plan.
It is free and open source. Will make it MIT
> Would love feedback on the brainstorm mode. Is multi-agent collaboration actually useful or am I just solving my own niche problem?<p>If it's solving even your own niche problem, it is actually useful though right? Kind of a "yes or yes" question.
I have been using it for some time and it’s getting better and better. In many cases it gives better output than other tools; the comparison is a great feature too. Keep up the good work!
Have you tried executing multiple agents on a single model with modified prompts and have them try to reach consensus?<p>That may solve the original problem of paying for three different models.
I think you would still pay for three times the tokens with a single model rather than three models, but it would consolidate payment.<p>I was thinking of making the model choice more dynamic per agent, such that you can use any model with any agent and have one single payment for all, so you won’t pay for three or more different tools. Is that in line with what you are saying?
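For concreteness, the single-model consensus idea from the question above could be sketched with headless calls (the prompt wording, personas, and file name are made up; only the pattern matters):<p><pre><code> # Same model, three "personas" expressed purely in the prompt, then a merge pass
TASK="Design a rate limiter for our public API"

for persona in "cautious reviewer" "performance-minded engineer" "API ergonomics advocate"; do
  claude -p "Acting as a $persona, outline your approach to: $TASK" >> proposals.md
done

claude -p "Here are three proposals for the same task. Merge them into one consolidated plan and note any disagreements: $(cat proposals.md)"</code></pre>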
Yeah, having Codex eval its own commits, for example, is highly effective.
For only 3x the cost
That sounds like it could get expensive?
Not if you optimize the tokens used. This is what DeepMyst actually does: one of the things we offer is token optimization, where we can reduce the context by up to 80%, so even if you use twice the optimized context you end up with 60% fewer tokens than before.<p>Note that this functionality is not yet integrated with Mysti, but we are planning to add it in the near future and happy to accelerate.<p>I think token optimization will help with larger projects, longer context, and avoiding compaction.
Any benchmarks? For example vs a single model?
This reminds me a lot of eye2.ai, but outside of coding
Why limit to 2 agents? I typically use all 3.
Sounds very similar to LLM council<p><a href="https://github.com/karpathy/llm-council" rel="nofollow">https://github.com/karpathy/llm-council</a>
how would using multiple services that are incapable of performing the work correctly result in better work?