Show HN: Axe – A 12MB binary that replaces your AI framework

(github.com)

131 points by jrswab9 hours ago

36 comments

CraigJPerry14 minutes ago
I've had good success with something along these lines but perhaps a bit more raw:<pre><code>
- claude takes a -p option
- i have a bunch of tiny scripts, each script is an agent but it only does one tiny task
- scripts can be composed in a unix pipeline
</code></pre>
For example:<pre><code>
$ git diff --staged | ai-commit-msg | git commit -F -
</code></pre>
Where ai-commit-msg is a tiny agent:<pre><code>
#!/usr/bin/env bash
# ai-commit-msg: stdin=git diff, stdout=conventional commit message
# Usage: git diff --staged | ai-commit-msg
set -euo pipefail
source "${AGENTS_DIR:-$HOME/.agents}/lib/agent-lib.sh"

SYSTEM=$(load_skills \
  core/unix-output.md \
  core/be-concise.md \
  domain/git.md \
  output/plain-text.md)
SYSTEM+=$'\n\nTask: Given a git diff on stdin, output a single conventional commit message. One line only.'

run_agent "$SYSTEM"
</code></pre>
And you can see to keep the agents themselves tiny, they rely on a little lib to load the various skills and optionally apply some guard / post-exec validator. Those validators are usually simple grep or whatever to make sure there were no writes outside a given dir but sometimes they can be to enforce output correctness (always jq in my examples so far...). In theory the guard could be another claude -p call if i needed a semantic instruction.
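The comment does not show `agent-lib.sh` itself, so the following is only a minimal sketch of what such a library might contain, not the commenter's code. It assumes skills are plain files under `$AGENTS_DIR/skills/` (a hypothetical layout), that the model command accepts the combined prompt on stdin, and that it can be overridden through a hypothetical `AGENT_CMD` variable (defaulting to `claude -p`). `require_single_line` is a hypothetical example of the "guard / post-exec validator" idea.

```bash
#!/usr/bin/env bash
# agent-lib.sh: hypothetical helper library, not the commenter's actual code.

# Assumed layout: skills live in $AGENTS_DIR/skills/<path>.md
AGENTS_DIR="${AGENTS_DIR:-$HOME/.agents}"

# load_skills FILE...: concatenate the named skill files into one system prompt.
load_skills() {
  local skill
  for skill in "$@"; do
    cat "$AGENTS_DIR/skills/$skill" || return 1
    printf '\n'
  done
}

# run_agent SYSTEM: send the system prompt, a separator line, then stdin
# to the model command. AGENT_CMD is an assumed override for testing.
run_agent() {
  local system=$1
  { printf '%s\n\n--- input ---\n' "$system"; cat; } | ${AGENT_CMD:-claude -p}
}

# require_single_line: example post-exec guard. Passes stdin through only
# if it is exactly one non-empty line; otherwise fails with a message.
require_single_line() {
  local out
  out=$(cat)
  [ -n "$out" ] || return 1
  case $out in
    *"
"*) echo "guard: expected exactly one line" >&2; return 1 ;;
  esac
  printf '%s\n' "$out"
}
```

Under these assumptions, the last line of `ai-commit-msg` could become `run_agent "$SYSTEM" | require_single_line`, so a multi-line reply fails the pipeline instead of reaching `git commit`.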
anotherevan19 minutes ago
I really like this idea. Gonna need an "Awesome Axe" page that collects agents. One idea I'm thinking of: after an agent has been in use for a while and has built up an understanding of the task, ask it something like, "Write a Python script to replace this agent." I could imagine this would work with agents that are processing log files or other semi-structured data, for example.
bensyverson8 hours ago
It's exciting to see so much experimentation when it comes to form factors for agent orchestration! The first question that comes to mind is: how do you think about cost control? Putting a ton in a giant context window is expensive, but unintentionally fanning out 10 agents with a slightly smaller context window is even more expensive. The answer might be "well, don't do that," and that certainly maps to the UNIX analogy, where you're given powerful and possibly destructive tools, and it's up to you to construct the workflow carefully. But I'm curious how you would approach budget when using Axe.
- jrswab8 hours ago
 > how you would approach budget when using Axe
 Great question, and it's something that I've not dug into yet. But I see no problem adding a way to limit LLMs by tokens or something similar to keep the cost for the user within reason.
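Until a built-in limit exists, one rough approach is a pipeline filter that refuses to forward oversized input. This is a sketch, not an Axe feature: `budget_guard` is a hypothetical name, and the estimate of about 4 bytes per token is an assumption that varies by model and tokenizer.

```bash
# budget_guard MAX_TOKENS: pass stdin through unchanged only if its
# estimated token count (assumed ~4 bytes per token) fits the budget.
budget_guard() {
  local max_tokens=$1 tmp bytes est
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  bytes=$(wc -c < "$tmp")
  est=$(( (bytes + 3) / 4 ))   # ceil(bytes / 4)
  if [ "$est" -gt "$max_tokens" ]; then
    echo "budget_guard: ~$est tokens exceeds limit of $max_tokens" >&2
    rm -f "$tmp"
    return 1
  fi
  cat "$tmp"
  rm -f "$tmp"
}
```

It only caps input size per call. Limiting fan-out across many agents would need a shared counter, which this sketch does not attempt.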
bmurphy19761 hour ago
This is interesting. I'd be curious to see a bunch more working examples. Personally I like the chat model because I iterate heavily on planning specs and have a lot of back and forth before implementation. I could see using this once the plan is defined and switching back to chat while iterating on post-implementation cleanup and refactoring.
snadal2 hours ago
Nice! I’ll try this soon, and I’m afraid I’ll end up using it a lot. @jrswab, do you think it would be feasible to limit outgoing connections to a whitelist of domains, URLs, or IP addresses? I’d like to automate some of my email, calendar, or timesheet tasks, but I’m concerned that a prompt injection could end up exfiltrating or deleting data. In fact, that’s the main reason why I’m not using Openclaw or similar projects with real data yet.
- jrswab1 hour ago
 Yes, I think it will be quite trivial to make an output allow list. That's a great idea!
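As an illustration of the idea only (the feature did not exist in Axe at the time of this thread), a wrapper could check each outbound URL against an allow list before making the request. `ALLOWED_HOSTS`, `url_host`, and `host_allowed` are hypothetical names, the example hosts are placeholders, and matching is exact-hostname only. A real implementation would also need to consider redirects, IP literals, and DNS tricks.

```bash
# Space-separated hosts the agent may contact (placeholder example values).
ALLOWED_HOSTS="${ALLOWED_HOSTS:-api.example.com calendar.example.com}"

# url_host URL: print the lowercase hostname of a URL,
# dropping scheme, path, query, fragment, userinfo and port.
url_host() {
  local rest=${1#*://}
  rest=${rest%%/*}      # strip path
  rest=${rest%%\?*}     # strip query
  rest=${rest%%\#*}     # strip fragment
  rest=${rest##*@}      # strip userinfo (user:pass@)
  rest=${rest%%:*}      # strip port
  printf '%s\n' "$rest" | tr '[:upper:]' '[:lower:]'
}

# host_allowed URL: succeed only if the URL's host exactly matches
# an entry in ALLOWED_HOSTS; anything else is rejected.
host_allowed() {
  local host allowed
  host=$(url_host "$1")
  for allowed in $ALLOWED_HOSTS; do
    [ "$host" = "$allowed" ] && return 0
  done
  return 1
}
```

A tool script could then call `host_allowed "$url" || exit 1` before any `curl` invocation, so a prompt-injected URL to an unknown host fails closed.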
Multicomp3 hours ago
This is what I've been trying to get nanobot to do, so thanks for sharing this. I plan to use this for workflow definitions like filesystems. I have a known workflow to create an RPG character with steps. Let's automate some of the boilerplate by having a succession of LLMs read my preferences about each step and apply their particular pieces of data to that step of the workflow, outputting their result to successive subdirectories, so I can pub/sub the entire process and make edits to intermediate files to tweak results as I desire. Now that's cool!
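The "successive subdirectories" layout can be sketched with a small helper in which each step reads the previous step's output file and writes its own. `run_step` and the directory names are hypothetical, and the commands shown in the usage note are stand-ins for whatever agent invocation each step would really use.

```bash
# run_step INPUT OUTDIR COMMAND...: feed INPUT to COMMAND on stdin and save
# its stdout as OUTDIR/output.txt. An existing output is kept, so a
# hand-edited intermediate file is never overwritten by a re-run.
run_step() {
  local input=$1 outdir=$2
  shift 2
  mkdir -p "$outdir"
  if [ -e "$outdir/output.txt" ]; then
    echo "skip: $outdir/output.txt already exists" >&2
    return 0
  fi
  # Write to a temp name first so a failed step leaves no output.txt behind.
  "$@" < "$input" > "$outdir/output.tmp" &&
    mv "$outdir/output.tmp" "$outdir/output.txt"
}
```

For example, `run_step prefs.txt work/01-concept some-agent` followed by `run_step work/01-concept/output.txt work/02-stats another-agent`. To tweak a result, edit `work/01-concept/output.txt`, delete `work/02-stats/output.txt`, and run the second step again.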
- jrswab1 hour ago
 Love to hear it! Thanks for checking it out and feel free to put up an issue on GitHub if you have any ideas for improvements.
boznz4 hours ago
I will give it a try, I like the idea of being closer to the metal. A proper self-contained, self-improving AI@home with the AI as the OS is my end goal. I have a nice high-spec but older laptop I am currently using as a sacrificial pawn experimenting with this, but there is a big gap in my knowledge and I'm still working through GPT2 level stuff, also resources are tight when you're retired. I guess someone will get there this year the way things are going, but I'm happy to have fun until then.
- jrswab1 hour ago
 I'm excited to see how this plays out. Keep me updated on X (Twitter).
armcat8 hours ago
Great work! Kind of reminds me of ell (<a href="https://github.com/MadcowD/ell" rel="nofollow">https://github.com/MadcowD/ell</a>), which had this concept of treating prompts as small individual programs and you can pipe them together. Not sure if that particular tool is being maintained anymore, but your Axe tool caters to that audience of small short-lived composable AI agents.
- jrswab8 hours ago
 Thanks for checking it out! And yes, the tool is indeed catering to that crowd. It's a need I have and thought others could use it as well.
swaminarayan7 hours ago
Axe treats LLM agents like Unix programs—small, composable, version-controllable. Are we finally doing AI the Unix way?
- jrswab7 hours ago
  That's my dream.
  - kelvinn4 hours ago
    Dream, or _pipe_dream?
mccoyb3 hours ago
Cool work! Aside, but 12 MB is ... large ... for such a thing. For reference, an entire HTTP (including crypto, TLS) stack with LLM API calls in Zig would net you a binary ~400 KB on ReleaseSmall (statically linked). You can implement an entire language, compiler, and a VM in another 500 KB (or less!) I don't think 12 MB is an impressive badge here?
- ipython3 hours ago
 it's written in golang. 12MB barely gets you "hello world" since everything is statically linked. With that in mind, the size is impressive.
 - nuxi1 hour ago
 golang doesn't statically link everything by default (anymore?), this is from FreeBSD:<pre><code>
 $ ls -l axe
 -rwxr-xr-x 1 root wheel 12830781 Mar 12 22:38 axe*
 $ ldd axe
 axe:
     libthr.so.3 => /lib/libthr.so.3 (0xe2e74a1d000)
     libc.so.7 => /lib/libc.so.7 (0xe2e74c27000)
     libsys.so.7 => /lib/libsys.so.7 (0xe2e75de6000)
     [vdso] (0xe2e7366b000)
 </code></pre>
 - mccoyb3 hours ago
 I know off topic, but is that mostly coming from the Go runtime (how large is that about?)
- nine_k2 hours ago
 12 MB is not large; it's like 3 minutes of watching YouTube. Actual RAM consumption is only very weakly correlated to the binary size, and that's what matters.
 - mccoyb31 minutes ago
 It is large compared to a stripped Zig ReleaseSmall binary with no runtime. With agents, one can take this repo, and create an extremely small binary. To your point, why even advertise the number? If that particular number is completely irrelevant in practical usage, why mention it? It seems like the point is to impress, hence my response.
stpedgwdgfhgdd3 hours ago
“MCP support. Axe can connect any MCP server to your agents” I just don't see this in the readme… It is not in the Features section at least. Anyway, I have an MCP server that can post inline comments into a GitLab MR. Would like to try to hook it up to the code reviewer.
- jrswab1 hour ago
 Sorry, I need to update that. I just added MCP support a day or so ago.
btbuildem7 hours ago
I really like seeing the movement away from MCP across the various projects. Here the composition of the new with the old (the ol' unix composability) seems to hum very nicely. OP, what have you used this on in practice, with success?
- jrswab7 hours ago
 I've shared a few flows I use a lot right now in some other comments.
hmokiguess2 hours ago
looks really cool, how does it differ from something like running claude headless with `claude -p`?
- jrswab1 hour ago
  You don't have all the Claude Code overhead. It only gets what you give it.
  - hmokiguess20 minutes ago
    what do you mean by that, not sure I understand
reacharavindh7 hours ago
Reminded me of this from my bookmarks. <a href="https://github.com/chr15m/runprompt" rel="nofollow">https://github.com/chr15m/runprompt</a>
hamandcheese7 hours ago
> Each agent is a TOML config with a focused job. Such as code reviewer, log analyzer, commit message writer. You can run them from the CLI, pipe data in, get results out.
I'm a bit skeptical of this approach, at least for building general purpose coding agents. If the agents were humans, it would be absolutely insane to assign such fine-grained responsibilities to multiple people and ask them to collaborate.
- Zondartul4 hours ago
 It is easier to trust in the correctness and reliability of an LLM when you treat it as a glorified NLP function with a very narrow scope and limited responsibilities. That is to say, LLMs rarely mess up specific low level instructions, compared to open-ended, long-horizon tasks.
- hiccuphippo7 hours ago
 Clankers are not humans.
 - cweagans4 hours ago
 This is the second time I've seen somebody use the word "clankers" in the last couple days to refer to AI. Is that a thing now? Where'd that come from? Gonna be honest, it has taken away from the message both times I've seen it. It feels a bit like you're LARPing your favorite humans vs robots tv show.
 - MisterTea1 hour ago
 I've been hearing the term in IRC and discords for about a year or more already. I get that it can seem childish but when you compare that to the indolent people who are demanding AI, it cancels out.
 - anigbrowl4 hours ago
 It is a thing, I've been hearing it for at least 6 months. There's a lot of people who really hate AI and want nothing to do with it.
 - JadeNB4 hours ago
 You can find the answers to both of your questions on Wikipedia: <a href="https://en.wikipedia.org/wiki/Clanker" rel="nofollow">https://en.wikipedia.org/wiki/Clanker</a>
punkpeye8 hours ago
What are some things you've automated using Axe?
- jrswab7 hours ago
 I have a few flows I'm using it for and have a growing list of things I want to automate. Basically, if there is a process that takes a human to do (like creating drafts or running scripts with variable data) I make axe do it.
 1. I have a flow where I pass in a YouTube video: the first agent calls an API to get the transcript, the second converts that transcript into a blog-like post, and the third uploads that blog-like post to Instapaper.
 2. Blog post drafting: I talk into my phone's notes app, which gets synced via Syncthing. The first agent takes that text and looks for related information in my note system, then passes my raw text and notes to the next agent to draft a blog post. A third agent takes out all the em dashes because I'm tired of taking them out. Once that's all done, I read and edit it to be exactly what I want.
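The third step of the second flow (removing em dashes) is deterministic, so it is the kind of agent that could be swapped for a plain filter, in the spirit of the "write a script to replace this agent" idea raised elsewhere in the thread. A minimal sketch follows. The function name is hypothetical, and replacing each dash with ", " is an arbitrary choice rather than what the author's agent does.

```bash
# strip_em_dashes: replace every em dash (U+2014, UTF-8 bytes E2 80 94)
# on stdin, plus any spaces around it, with ", ".
strip_em_dashes() {
  local em
  # Octal escapes keep this portable across printf implementations.
  em=$(printf '\342\200\224')
  # LC_ALL=C makes sed match the raw bytes regardless of the user's locale.
  LC_ALL=C sed "s/ *$em */, /g"
}
```

Used as the last stage of a pipeline, for example `... | strip_em_dashes > draft.md`, it costs no tokens and always behaves the same way.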
 - _ache_29 minutes ago
 Aren't your Hackernews answers automatised?
eikenberry3 hours ago
Does it support the use of other OpenAI API compatible services like Openrouter?
- jrswab1 hour ago
  Yes, I've used it with an OpenAI-compatible API from an internal LLM at my job.
  - eikenberry1 hour ago
    Thanks!
0xbadcafebee8 hours ago
Nice. There's another one also written in Go (<a href="https://github.com/tbckr/sgpt" rel="nofollow">https://github.com/tbckr/sgpt</a>), but I'll try this one too. I love that open source creates multiple solutions and you can choose the one that fits you best.
- jrswab7 hours ago
 Thanks! Looks like sgpt is a cool tool. Axe is oriented around automation rather than interaction like sgpt. Instead of asking something you define it once and hook it into a workflow.
mark_l_watson8 hours ago
If I have time I want to try this today because it matches my LLM-based work style, especially when I am using local models: I have command line tools that help me generate large one-shot prompts that I just paste into an Ollama REPL, then I check back in a while. It looks like Axe works the same way: fire off a request and later look at the results.
- jrswab8 hours ago
 Exactly! I also made it to chain them together so each agent only gets what it needs to complete its one specific job.
Orchestrion7 hours ago
The Unix-style framing resonates a lot. One thing I’ve noticed when experimenting with agent pipelines is that the “single-purpose agent” model tends to make both cost control and reasoning easier. Each agent only gets the context it actually needs, which keeps prompts small and behavior easier to predict. Where it gets interesting is when the pipeline starts producing artifacts instead of just text: reports, logs, generated files, etc. At that point the workflow starts looking less like a chat session and more like a series of composable steps producing intermediate outputs. That’s where the Unix analogy feels particularly strong: small tools, small contexts, and explicit data flowing between steps. Curious if you’ve experimented with workflows where agents produce artifacts (files, reports, etc.) rather than just returning text.
- jrswab7 hours ago
 > Curious if you’ve experimented with workflows where agents produce artifacts (files, reports, etc.) rather than just returning text.
 Yes! I run a ghost blog (a blog that does not use my name) and have axe produce artifacts. The flow is: I send the first agent a text file of my brain dump (normally spoken); it searches my note system for related notes, saves them to a file, then passes everything to agent 2, which turns that dump into a blog draft and saves it to a file. Agent 3 then takes that blog draft, cleans it up to how I like it, and saves it. From that point I have to take it to publish after reading and making edits myself.
 - Orchestrion7 hours ago
 That’s a really nice pipeline. The “save to file between steps” pattern seems to appear very naturally once agents start doing multi-stage work. One thing I’ve noticed when experimenting with similar workflows is that once artifacts start accumulating (drafts, logs, intermediate reports, etc.), you start running into small infrastructure questions pretty quickly:
 – where intermediate artifacts live
 – how later agents reference them
 – how long they should persist
 – whether they’re part of the workflow state or just temporary outputs
 For small pipelines the filesystem works great, but as the number of steps grows it starts to look more like a little dataflow system than just a sequence of prompts. Do you usually just keep everything as local files, or have you experimented with something like object storage or a shared artifact layer between agents?
 - 33715 hours ago
 In my prompting framework I have a workflow in which the agent scans all the artifacts in my closed/ folder and creates a yyyymmdd-archive artifact that records all the artifact names and their summaries, then just deletes them. Since the framework is deeply integrated with git, an artifact can be dug up from git history via the recorded names.
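The archive step described above can be sketched in plain shell. This is an illustration under stated assumptions, not the commenter's framework: `archive_closed` is a hypothetical name, each artifact is assumed to be a text file, and the first line of each file stands in for the agent-written summary. The deleted files are recoverable from git history only if they were committed beforehand.

```bash
# archive_closed DIR: append "name: summary" for each file in DIR to
# YYYYMMDD-archive.md next to DIR, delete the listed files, and print
# the archive path.
archive_closed() {
  local dir=$1 archive f
  archive="$(dirname "$dir")/$(date +%Y%m%d)-archive.md"
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    # Assumption: the first line of the file serves as its summary.
    printf '%s: %s\n' "$(basename "$f")" "$(head -n 1 "$f")" >> "$archive"
    rm -- "$f"
  done
  printf '%s\n' "$archive"
}
```

With the names recorded, something like `git log --all -- closed/<name>` can locate the commit that last held a given artifact.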
dumbfounder7 hours ago
Now what we need is a chat interface to develop these config files.
TSiege8 hours ago
This looks really interesting. I'm curious to learn more about security around this project. There's a small section, but I wonder if there's more to be aware of like prompt injection
- jrswab7 hours ago
 I'm happy you brought this up. I've been thinking about this and working on a plan to make it as solid as possible. For now, the best way would be to run each agent in a docker container (there is an example Dockerfile in the repo) so any destructive actions will be contained to the container. However, this does not help if a person gives access to something like Google Calendar and a prompt tells the LLM to be destructive against that account.
creehappus6 hours ago
I really like the project, although I would prefer a json5 config, not toml, which I find annoying to reason about.
jedbrooke8 hours ago
looks interesting, I agree that chat is not always the right interface for agents, and an LLM-boosted CLI sometimes feels like the right paradigm (especially for dev-related tasks). How would you say this compares to similar tools like Google’s dotprompt? <a href="https://google.github.io/dotprompt/getting-started/" rel="nofollow">https://google.github.io/dotprompt/getting-started/</a>
- jrswab7 hours ago
 I've not heard of that before but after looking into it I think they are solving different problems. Dotprompt is a prompt template that lives inside app code to standardize how we write prompts. Axe is an execution runtime you run from the shell. There's no code to write (unless you want the LLM to run a script). You define the agent in TOML, run it with `axe run <agent name>`, and pipe data into it.
nthypes8 hours ago
There is no "session" concept?
- jrswab8 hours ago
  Not yet, but it is on the short list to implement. What would you need from a session for single-purpose agents? I'm seeing it more as a way to track what's been done.
a1o8 hours ago
Is the axe drawing actually a hammer?
- hundchenkatze8 hours ago
 Looks like an axe to me. The cutting edge of the axe is embedded into the surface. And the handle attaches near the back of the head like an axe. Most hammers I've seen the handle attaches in the middle.
 - jrswab8 hours ago
 hahaha; this is what I was going for.
 - jjshoe8 hours ago
 Just FYI, your handle is on backwards.
- devmor8 hours ago
 I believe it's actually trying to render a splitting maul, which people often confuse for an axe.
 - daveguy4 hours ago
 Splitting mauls have a wider angle to help separate wood pieces and a beefier back to use with/as a sledgehammer or splitting wedge. What's rendered is definitely more like an axe than a splitting maul.
 - devmor3 hours ago
 What you're describing is exactly what I see in the image.
 - daveguy1 hour ago
 Fair enough. Hard to tell one way or another with all the "action" marks.
- parineum8 hours ago
 There are many different styles of axe and some don't flare out much. [0] <a href="https://inchbyinch.de/wp-content/uploads/2017/08/0400-axe-types.jpeg" rel="nofollow">https://inchbyinch.de/wp-content/uploads/2017/08/0400-axe-ty...</a> [1] <a href="https://i.pinimg.com/originals/da/14/80/da148078cc1478ec6b255d3f5583e2a3.jpg" rel="nofollow">https://i.pinimg.com/originals/da/14/80/da148078cc1478ec6b25...</a>
- fortyseven8 hours ago
 Sure is. How weird.
testingtrade1 hour ago
amazing work my friend
saberience7 hours ago
I’m having trouble understanding when/where I would use this? Is this a replacement for pi or codex?
- jrswab7 hours ago
  This is not a replacement for either in my opinion. Apps like codex and pi are interactive but Axe is non-interactive. You define an agent once and then trigger it however you please.
let_rec7 hours ago
Is there Gemini support?
- jrswab7 hours ago
  Not yet but it will be easy to add. If you need it can you create an issue in GitHub? I should be able to get that in today.
zrail8 hours ago
Looks pretty interesting! Tiny note: there's a typo in your repo description.
- jrswab7 hours ago
 nooo! lol but thanks, I'll go hunt it down.
ozgurozkan8 hours ago
[flagged]
- r_lee8 hours ago
  wow, like 10 posts within 5 minutes, how great! love me some AI slop on HN @dang
  - hrimfaxi7 hours ago
    Yeah, I've been going and flagging everything until they're banned. Old account too.
    - jrswab7 hours ago
      Thank you for your service.
    - r_lee7 hours ago
      salute!
ufish2358 hours ago
Why is this comment an ad?
- ForceBru8 hours ago
  This is the OP promoting their project — makes sense to me
- stronglikedan8 hours ago
  How can it be an ad if it's not selling anything? Seems like a proud parent touting their child to me.
  - jrswab7 hours ago
    I am pretty proud of this one :)
- zrail8 hours ago
  It's a Show HN. That's the point.
- lovich7 hours ago
  Because they had an AI write it. Their other comments seem organic but the one you’re responding to does not
Lliora8 hours ago
12MB for an "AI framework replacement"? That's either brilliant compression or someone's redefining "framework" to mean "toy model that works on my laptop." Show me the benchmarks on actual workloads, not the readme poetry.
- jrswab8 hours ago
  This is not an LLM but a Binary to run LLMs as single purpose agents that can chain together.
  - mrweasel7 hours ago
    Yeah I was disappointed by that too.
- hrmtst938372 hours ago
  Putting heavy AI workloads in a 12MB binary means you either make savage cuts on model support or you lock users to strange minimal formats. If you care about ops, eventually you hit edge cases where the "just works" story collapses and you end up debugging missing layers or janky hardware support. If the goal is to experiment locally or run demos, 12MB is fine but pretending it fits broader deployment is a stretch unless they're pulling some wild tricks under the hood.