A workflow I find useful is to have multiple CLI agents running in different Tmux panes and have one consult/delegate to another using my Tmux-CLI [1] tool + skill. Advantage of this is that the agents’ work is fully visible and I can intervene as needed.<p>[1] <a href="https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#tmux-cli-terminal-automation" rel="nofollow">https://github.com/pchalasani/claude-code-tools?tab=readme-o...</a>
Have you considered using their command line options instead? At least Codex and Claude both support feeding in new prompts in an ongoing conversation via the command line, and can return text or stream JSON back.
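For anyone who hasn’t tried it, the headless invocations look roughly like this (a sketch; exact flag names vary across CLI versions, so check each tool’s --help):<p><pre><code> # Claude Code "print" (headless) mode: one-shot prompt, JSON result on stdout
claude -p "Summarize the failing tests in this repo" --output-format json

# Follow up in the most recent conversation from the command line
claude -p --continue "Now propose a minimal fix"

# Codex CLI non-interactive mode takes the prompt as an argument
codex exec "Explain what this repository does"</code></pre>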
You mean so-called headless or non-interactive mode? Yes, I’ve considered that, but the advantage of communicating via Tmux panes is that all agent work is fully visible and you can intervene as needed.<p>My repo has other tools that leverage such headless agents; for example, there’s a resume [1] functionality that provides alternatives to compaction (which is not great since it always loses valuable context details):
The “smart-trim” feature uses a headless agent to find irrelevant long messages for truncation, and the “rollover” feature creates a new session and injects session lineage links, with a customizable extraction of context for the task to be continued.<p>[1] <a href="https://github.com/pchalasani/claude-code-tools?tab=readme-ov-file#resume-options--managing-context" rel="nofollow">https://github.com/pchalasani/claude-code-tools?tab=readme-o...</a>
I have a similar workflow except I haven’t put time into the tooling - Claude is adept at tmux and can almost prompt and respond to ChatGPT on its own, except it always forgets to press Enter when it sends keys. Have your agents been able to communicate with each other with tmux send-keys?
I had the same issue. Subagents are nice but the LLM calling them can’t have a back-and-forth conversation. I tried tmux-cli and even other options like AgentAPI[0], but the same issue persists: the agent can’t have a back-and-forth with the tmux pane.<p>To people asking why you would want Claude to call Codex or Gemini, it’s because of orchestration. We have an architect skill we feed the first agent. That agent can call subagents or even use tmux and feed in the builder skill. The architect is harnessed to a CRUD application just keeping track of what features were built already, so the builder is focused on building only.<p>0. <a href="https://github.com/coder/agentapi" rel="nofollow">https://github.com/coder/agentapi</a>
Yes, this and other edge cases are why I made the Tmux-CLI wrapper. Yes, they use send-keys with suitable delays etc.
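For reference, the generic tmux pattern the agents need to get right looks roughly like this (a sketch of the pattern, not the actual tmux-cli implementation; the pane id is hypothetical):<p><pre><code> # Find the target pane id first with: tmux list-panes -a
PANE="%3"

# Type the prompt, give the other agent's TUI a moment to settle,
# then send Enter as a separate key -- combining them is what agents get wrong
tmux send-keys -t "$PANE" "Please review the diff in src/"
sleep 1
tmux send-keys -t "$PANE" Enter

# Read back the other agent's output
tmux capture-pane -p -t "$PANE" | tail -40</code></pre>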
What are you asking/expecting Claude to do with tmux?
I find that asking Claude to develop and Codex to review the uncommitted changes will typically result in high-value code, and eliminate all of Claude’s propensity to perpetually lie and cheat. Sometimes I also ideate with Claude and then ask Claude to get ChatGPT’s opinion on the matter. I started by copy-pasting responses but I found tmux to be a nice way to get rid of the middleman.
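For anyone who wants the same develop-and-review loop without the tmux panes, a headless sketch of the review step (the prompts are illustrative and review.md is just a scratch file):<p><pre><code> # Ask Codex to review the working tree and capture its findings
codex exec "Run 'git diff' and review the uncommitted changes for bugs, missing tests and regressions. Be specific." > review.md

# Hand the findings back to Claude in the ongoing session
claude -p --continue "Address the reviewer comments in review.md"</code></pre>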
This is cool. If Codex or Gemini CLI is supported, it would be good to have a section in the README indicating shortcomings etc. (may have missed it).
The idea works well with or without direct integration. You can have a CLI agent read the state of any tmux session and have it drive work through it. I use it for everything from dev work to system debugging. It turns out a portable, callable binary with simple parameters is still easier for agents to use than protocols and skills: <a href="https://github.com/tikimcfee/gomuxai" rel="nofollow">https://github.com/tikimcfee/gomuxai</a>
There’s no special support needed; it’s just a bash command that any CLI agent can use. For agents that support skills, the corresponding skill makes it easier to leverage. I’ll add that to the README.
Claude Code, Gemini, and Codex are all supported but need more testing, so I would really value feedback and bug reports :D<p>Contributions will be highly appreciated and credited.
I will look it up indeed
What prompted you to build this?
I used to get stuck sometimes with Claude and need a different agent to take a look. Switching back and forth between those agents is a headache, and you can’t port all the context over, so I thought this might help solve real blockers for many devs on larger projects.
I have both Codex and Claude subs so I wanted one to be able to consult the other. Also it’s useful when you have a cli script that an agent is iterating on, so it can test it. Another use case is for a CLI agent to run a debugger like PDB in another pane, though I haven’t used it much.
I've had good success with a similar workflow, most recently using it to help me build out a captive-wifi debugger[0]. In short, it worked _pretty_ well, but it was quite time intensive. That said, I think removing the human from the loop would have been insanity on this: lots of situations where there were some very poor ideas suggested that the other LLMs went along with, and others where one LLM was the sole voice of reason against the other two.<p>I think my only real take-away from all of it was that Claude is probably the best at prototyping code, where Codex makes a very strong (but pedantic) code-reviewer. Gemini was all over the place, sometimes inspired, sometimes idiotic.<p>0: <a href="https://github.com/pjlsergeant/captive-wifi-tool/tree/main" rel="nofollow">https://github.com/pjlsergeant/captive-wifi-tool/tree/main</a>
This is exactly why I built Mysti: I used that flow very often and it worked well. I also added personas and skills so that it is easy to customize the agents’ behavior. If you have any ideas to make the behavior better, please don’t hesitate to share! Happy to jump on a call and discuss it as well.
Getting feedback on a plan or implementation is valuable because you get a fresh set of eyes. Using multiple models <i>may</i> help, though it always feels a bit silly to me (if nothing else, you’re increasing non-determinism because you now have to understand two LLMs’ quirks).<p>But the “playing house” approach of experts is somewhere between pointless and actively harmful. It was all the rage in June and I thought people abandoned that later in the summer.<p>If you want the model to e.g. review code instead of fixing things, or document code without suggesting improvements (for writing docs), that’s useful. But there’s no need for all these personas.
I created a simple skill in Claude Code CLI that collaborates with Codex CLI. It is just a prompt saved in the skill format. It uses subagents as well.<p>Honest question. How is Mysti better than a simple Claude skill that does the same work?
The skill would allow Claude Code CLI to call Codex CLI, but then Claude Code CLI needs to pass context to Codex, which requires writing the context out (which adds latency); this provides only limited context to Codex and also eats into the main context window. Mysti shares the context, which is very different from passing context as a parameter.
Could you share your skill and workflow? Does Claude launch Codex in a tmux session?
Codex CLI can run as an MCP server out of the box, which you can call directly from Claude Code. Together with a prompt to ask Codex for a second opinion, that works very well for me, especially in code reviews.
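A sketch of the wiring (the Codex MCP subcommand name is an assumption here; it has changed across Codex releases, so check codex --help):<p><pre><code> # Register Codex as an MCP server inside Claude Code
# (the `codex mcp` subcommand is an assumption; verify with `codex --help`)
claude mcp add codex -- codex mcp

# Then, inside Claude Code, prompt something like:
#   "Ask the codex MCP tool for a second opinion on this diff."</code></pre>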
Interesting, I was trying to implement this using AGENTS.md and the runSubagent tool in VS Code. VS Code does not yet have the capability to invoke different models as subagents, so I plan to fall back to instructing Copilot to use copilot-cli and gemini-cli. (I am quite angry about Copilot CLI offering only full-blown models and not the -mini versions though.)
Why make it a vscode extension if the point of these 3 tools is a cli interface? Meaning most of the people I know use these tools without VSCode. Is VSC required?
> Meaning most of the people I know use these tools without VSCode.<p>I guess it depends?<p>You can usually count on Claude Code or Codex or Gemini CLI to support the model features the best, but sometimes having a consistent UI across all of them is also nice - be it another CLI tool like OpenCode (that was a bit buggy for me when it came to copying text), or maybe Cline/RooCode/KiloCode inside of VSC, so you don't also have to install a custom editor like Cursor but can use your pre-existing VSC setup.<p>Okay, that was a bit of a run on sentence, but it's nice to be able to work on some context and then to switch between different models inline: "Hey Sonnet, please look at the work of the previous model up until this point and validate its findings about the cause of this bug."<p>I'd also love it if I could hook up some of those models (especially what Cerebras Code offers) with autocomplete so I wouldn't need Copilot either, but most of the plugins that try to do that are pretty buggy or broken (e.g. Continue.dev). KiloCode also added autocomplete, but it doesn't seem to work with BYOK.
That’s a great idea! I can make it a CLI too
Huh. I know hundreds that use LLMs in a VSCode based IDE, and 3 that use the CLI.
I was initially a proponent of the CLI when Claude integration with VSCode required a WSL instance, but now that it is integrated directly into VSCode I feel one grouping of tooling hiccups is ruled out in my workflow. The only (major) nitpick I have is that it won’t let you finish typing and cuts you off when asking whether/how to proceed.
I’ve never seen a profession change so fast as coding right now
Have to keep in mind that what is happening now is basically what was promised decades ago. Never mind 4GL, 5GL, expert systems, and other efforts that went nowhere... even COBOL was created with the intention of making programming look more like natural language.<p>Often, revolutions take longer to happen than we think they will, and then they happen faster than we think they will. And when the tipping point is finally reached, we find more people pushing back than we thought there would be.
>Claude Code (Anthropic), Codex (OpenAI), and Gemini (Google) have different training, different strengths, and different blind spots.<p>Do they?<p>There was a paper about HiveMind in LLMs. They all tend to produce similar outputs when they are asked open ended questions.
[2510.22954] Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond) <a href="https://share.google/1GHdUvhz2uhF4PVFU" rel="nofollow">https://share.google/1GHdUvhz2uhF4PVFU</a>
I usually switch agents when one agent gets stuck, and I’ve faced several situations where one agent solved a problem that the other agent was stuck on.
Website link on GitHub points to <a href="https://deepmyst.com/" rel="nofollow">https://deepmyst.com/</a><p>But it is actually hosted on <a href="https://www.deepmyst.com/" rel="nofollow">https://www.deepmyst.com/</a> with no forwarding from the apex domain to www, so it looks like the website is down.<p>Otherwise excited to deep dive into this, as this is a variant of how we do development and it seems to work great when the AIs fight each other.
UPDATE: License is now MIT! Super excited to see your contributions and feedback!
Does anyone know of something similar but for the terminal?<p>Update:<p>I’ve already found a solution based on a comment, and modified it a bit.<p>Inside Claude Code I’ve made a new agent that uses Gemini via MCP through <a href="https://github.com/raine/consult-llm-mcp" rel="nofollow">https://github.com/raine/consult-llm-mcp</a>. This seems to work!<p>Claude Code:<p>Now let me launch the Gemini MCP specialist to build the backend monitoring server:<p>gemini-mcp-specialist(Build monitoring backend server)
⎿ Running PreToolUse hook…
My similar workflow within Claude Code when it gets stuck is to have it consult Gemini. Works either through Gemini CLI or the API. Surprisingly powerful pattern because I've just found that Gemini is still ahead of Opus in architectural reasoning and figuring out difficult bugs. <a href="https://github.com/raine/consult-llm-mcp" rel="nofollow">https://github.com/raine/consult-llm-mcp</a>
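For the Gemini CLI path, the consult can be as simple as a one-shot non-interactive call (a sketch; crash.log stands in for whatever context you want to share):<p><pre><code> # Non-interactive Gemini CLI call with pasted context
gemini -p "Here is a stack trace and the relevant function. What are the most likely root causes? $(cat crash.log)"</code></pre>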
<a href="https://github.com/just-every/code" rel="nofollow">https://github.com/just-every/code</a> "Every Code - push frontier AI to it limits. A fork of the Codex CLI with validation, automation, browser integration, multi-agents, theming, and much more. Orchestrate agents from OpenAI, Claude, Gemini or any provider." Apache 2.0 ; Community fork;
> Note: If another tool already provides a code command (e.g. VS Code), our CLI is also installed as coder. Use coder to avoid conflicts.<p>“If”, oh, idk, just the tool 90% of potential users will have installed.
When you say orchestrate agents, what would it do? Would it allow the same context across agents, and can I make agents brainstorm?
<p><pre><code> # Plan code changes (Claude, Gemini and GPT-5 consensus)
# All agents review task and create a consolidated plan
/plan "Stop the AI from ordering pizza at 3AM"
# Solve complex problems (Claude, Gemini and GPT-5 race)
# Fastest preferred (see https://arxiv.org/abs/2505.17813)
/solve "Why does deleting one user drop the whole database?"
# Write code! (Claude, Gemini and GPT-5 consensus)
# Creates multiple worktrees then implements the optimal solution
/code "Show dark mode when I feel cranky"
# Hand off a multi-step task; Auto Drive will coordinate agents and approvals
/auto "Refactor the auth flow and add device login"</code></pre>
Here's a portable binary you drop in a directory to allow agentic cli to cross communicate with other agents, store and read state, or act as the driver of arbitrary tmux sessions in parallel: <a href="https://github.com/tikimcfee/gomuxai" rel="nofollow">https://github.com/tikimcfee/gomuxai</a>
<a href="http://opencode.ai/" rel="nofollow">http://opencode.ai/</a>
I can make it for the terminal if that would be helpful. What do you think?
Pal MCP (formerly Zen) is pretty awesome.<p><a href="https://github.com/BeehiveInnovations/pal-mcp-server" rel="nofollow">https://github.com/BeehiveInnovations/pal-mcp-server</a>
Great idea. Whether brainstorm mode is actually useful is hard to say without trying it out, but it sounds like an interesting approach. Maybe it would be a good idea to try running a SWE benchmark with it.<p>Personally, I wouldn't use the personas. Some people like to try out different modes and slash commands and whatnot - but I am quite happy using the defaults and would rather (let it) write more code than tinker with settings or personas.
How is it different from PAL MCP (ex-Zen MCP)?
> Is multi-agent collaboration actually useful or am I just solving my own niche problem?<p>I often write with Claude, and at work we have Gemini code reviews on GitHub; definitely these two catch different things. I'd be excited to have them working together in parallel in a nice interface.<p>If our ops team gives this a thumbs-up security wise I'll be excited to try it out when back at work.
How do we measure whether this is any better than just using one good model?
Anecdotal experience, but when bugfixing I personally find if a model introduces a bug, it has a hard time spotting and fixing it, but when you give the code to another model it can instantly spot it (even if it's a weaker model overall).<p>So I can well imagine that this sort of approach could work very well, although agree with your sentiment that measurement would be good.
One day someone will actually build something with an LLM and do a write-up of it, but until then we'll just keep reading about tooling.
Multi agent collaboration is quite likely the future. All agents have blind spots, collaboration is how they are offset.<p>You may want to study [1] - this is the latest thinking on agent collaboration from Google.<p>[1] <a href="https://www.linkedin.com/posts/shubhamsaboo_we-just-ran-the-biggest-ai-agent-course-ever-activity-7395301281117929472-vRRl" rel="nofollow">https://www.linkedin.com/posts/shubhamsaboo_we-just-ran-the-...</a>
> Multi agent collaboration is quite likely the future<p>Autogen from ms was an early attempt at this, and it was fun to play with it, but too early (the models themselves kinda crapped out after a few convos). This would work much better today with how long agents can stay on track.<p>There was also a finding earlier this year, I believe from the swe-bench guys (or hf?), where they saw better scores with alternating between gpt5/sonnet4 after each call during an execution flow. The scores of alternating between them were higher than any of them individually. Found that interesting at the time.
Thank you so much for sharing, Denis! I definitely believe in that as the world starts switching from single agents to agentic teams where each agent has specific capabilities. Do you know of any benchmarks that cover collaborative agents?
For me when it’s front end I usually work with Claude and have codex review. Otherwise I just work with codex… Claude also if I’m being lazy and want a thing quickly
This is very useful! I frequently copy the response of one model and ask another to review it, and I have seen really good results with that approach.<p>Can you also include Cursor CLI for the brainstorming? This would allow someone to unlock brainstorming with just one CLI, since it allows using multiple models.
Licensing with BSL is not a smart decision when the AI world is changing basically every month.
The project is now MIT!
I’m thinking of switching to MIT. What do you think? Is there any other license you would recommend?
> licensing with BSL when basically every month the AI world is changing is not a smart decision<p>This turned me off as well. Especially with no published pricing and a link to a site that is not about this product.<p>At minimum, publish pricing.
Regarding DeepMyst: in the future it will optionally offer the ability to use smart context, where the context is automatically optimized so that you won’t hit the context window limit (basically no need for compaction) and you get much higher usage limits, because the number of tokens needed is reduced by up to 80%. So you would be able to achieve with a 20 USD Claude plan the same as the Pro plan.
It is free and open source. Will make it MIT
> Would love feedback on the brainstorm mode. Is multi-agent collaboration actually useful or am I just solving my own niche problem?<p>If it's solving even your own niche problem, it is actually useful though right? Kind of a "yes or yes" question.
I have been using it for some time and it’s getting better and better. In many cases it gives better output than other tools; the comparison is a great feature too. Keep up the good work!
Have you tried executing multiple agents on a single model with modified prompts and have them try to reach consensus?<p>That may solve the original problem of paying for three different models.
I think you would still pay for three times the tokens with a single model rather than three models, but it would consolidate payment.<p>I was thinking of making the model choice more dynamic per agent, such that you can use any model with any agent and have one single payment for all, so you won’t pay for three or more different tools. Is that in line with what you are saying?
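For concreteness, the single-model consensus idea from the question above could be sketched with headless calls (the prompt wording, personas, and file name are made up; only the pattern matters):<p><pre><code> # Same model, three "personas" expressed purely in the prompt, then a merge pass
TASK="Design a rate limiter for our public API"

for persona in "cautious reviewer" "performance-minded engineer" "API ergonomics advocate"; do
  claude -p "Acting as a $persona, outline your approach to: $TASK" >> proposals.md
done

claude -p "Here are three proposals for the same task. Merge them into one consolidated plan and note any disagreements: $(cat proposals.md)"</code></pre>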
Yeah, having Codex eval its own commits, for example, is highly effective.
For only 3x the cost
That sounds like it could get expensive?
Not if you optimize the tokens used. This is what DeepMyst actually does: one of the things we offer is token optimization, where we can reduce the context by up to 80%, so even if you use twice the optimized context you end up with 60% fewer tokens than before.<p>Note that this functionality is not yet integrated with Mysti, but we are planning to add it in the near future and happy to accelerate.<p>I think token optimization will help with larger projects, longer context, and avoiding compaction.
Any benchmarks? For example vs a single model?
This reminds me a lot of eye2.ai, but outside of coding
Why limit to 2 agents? I typically use all 3.
Sounds very similar to LLM council<p><a href="https://github.com/karpathy/llm-council" rel="nofollow">https://github.com/karpathy/llm-council</a>
how would using multiple services that are incapable of performing the work correctly result in better work?