7 comments

  • armanj 45 minutes ago
    I did a quick benchmark & compared it with Qwen3.5: https://github.com/ArmanJR/PrismML-Bonsai-vs-Qwen3.5-Benchmark

    In my results, accuracy-wise Ternary-Bonsai-8B is on par with Qwen3.5-4B. But in accuracy-per-byte, Bonsai is the clear winner:

    => Ternary-Bonsai-1.7B achieved 65.1% from 462 MiB, beating Qwen3.5-0.8B by 12 points while being ~5% smaller on disk.
    => Ternary-Bonsai-4B is the accuracy-per-byte winner above 1 GiB: 83.0% from only 1.1 GiB, within 2 points of Qwen3.5-4B at 40% of the weight size.

    These models show strong promise on edge devices and anywhere disk space is limited. I think this lab is worth watching.
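    A back-of-the-envelope way to read those numbers is accuracy per GiB. A minimal sketch using the figures quoted above; the Qwen3.5-4B entries are back-derived from the relative figures ("within 2 points", "40% of the weight size"), so they are approximations rather than measured values:

        # Rough accuracy-per-size arithmetic from the figures in the comment above.
        models = {
            "Ternary-Bonsai-1.7B": (65.1, 462 / 1024),  # accuracy %, size in GiB
            "Ternary-Bonsai-4B":   (83.0, 1.1),
            "Qwen3.5-4B (approx)": (85.0, 1.1 / 0.40),  # ~2.75 GiB implied
        }

        for name, (acc, gib) in models.items():
            print(f"{name:22s} {acc:5.1f}%  {gib:4.2f} GiB  ->  {acc / gib:5.1f} %/GiB")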
  • WatchDog 32 minutes ago
    All of their benchmarks are against 16-bit models, right?

    Why aren't they comparing to 2/3/4-bit quants?
    • himata4113 11 minutes ago
      I looked at quantized versions of these models and they all outperform it, so I guess it just doesn't look as good in that comparison.
  • Animats 1 hour ago
    This makes sense. A 1-bit model implies needing roughly 2x as many neurons, because you need an extra unit to invert. The ternary model still has a sign, just at really low resolution.

    (I've been reading the MMLU-Redux questions for electrical engineering [1]. They're very funny. Fifty years ago they might have been relevant. The references to the Intel 8085 date this to the mid-1970s. Moving-coil meters were still a big thing back then. Ward-Leonard drives still drove some elevators and naval guns. This is supposed to be the hand-curated version of the questions. Where do they get this stuff? Old exams?)

    [1] https://github.com/aryopg/mmlu-redux/blob/main/outputs/multi_expert_helm/electrical_engineering.csv
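    One way to read the sign point, assuming the 1-bit weights are unsigned {0, 1} (a toy illustration, not the lab's actual construction): a ternary weight can excite or inhibit on its own, while an unsigned binary weight needs a second, negated copy of the input to express the same function, hence roughly twice the units.

        import numpy as np

        x = np.array([0.8, -0.3, 1.2])

        # Ternary weights carry a sign directly: one connection per input.
        w_ternary = np.array([1, 0, -1])
        y_ternary = w_ternary @ x

        # Unsigned binary weights {0, 1} cannot inhibit, so the same function
        # needs a doubled input [x, -x], i.e. roughly 2x as many units.
        x_doubled = np.concatenate([x, -x])
        w_binary = np.array([1, 0, 0, 0, 0, 1])  # selects x[0] and -x[2]
        y_binary = w_binary @ x_doubled

        assert np.isclose(y_ternary, y_binary)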
  • yodon 1 hour ago
    So excited to see this - the big advantage of 1.58 bits is that there are no multiplications at inference time, so you can run these models on radically simpler and cheaper hardware.
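    A minimal numpy sketch of why that is: with weights restricted to {-1, 0, +1}, every dot product collapses into additions and subtractions of activations, so no multiplier is needed (the function here is illustrative, not from any actual kernel):

        import numpy as np

        def ternary_matvec(W, x):
            # With weights in {-1, 0, +1}, each output element is just the sum of
            # activations whose weight is +1 minus those whose weight is -1.
            out = np.zeros(W.shape[0], dtype=x.dtype)
            for i in range(W.shape[0]):
                out[i] = x[W[i] == 1].sum() - x[W[i] == -1].sum()
            return out

        rng = np.random.default_rng(0)
        W = rng.integers(-1, 2, size=(4, 8))             # ternary weight matrix
        x = rng.standard_normal(8).astype(np.float32)
        assert np.allclose(ternary_matvec(W, x), W @ x)  # matches ordinary matmul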
    • Animats 1 hour ago
      At 4 bits, you could just have a hard-wired table lookup: two 4-bit values in, one 256-entry table. You can have saturating arithmetic and a post-processing function for free. Somebody must be building hardware like that.
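      In software terms, something like this sketch (the 7-bit saturation cap is an arbitrary choice for illustration, not from any real design):

          # Multiply two 4-bit values via a precomputed 256-entry table instead of
          # a multiplier circuit; saturation (or any post-processing) is baked in.
          SAT_MAX = 127  # example saturation limit

          MUL_LUT = [min(a * b, SAT_MAX) for a in range(16) for b in range(16)]

          def lut_mul4(a: int, b: int) -> int:
              # Index the table with the two 4-bit operands packed into one byte.
              return MUL_LUT[((a & 0xF) << 4) | (b & 0xF)]

          assert lut_mul4(7, 9) == 63
          assert lut_mul4(15, 15) == 127  # 225 saturates to the 7-bit cap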
  • mchusma 2 hours ago
    Ever since I saw the first one of these one-bit models made by Microsoft, I thought this was a fascinating route. I assume that in practice, this is less helpful than it seems, just because there's every economic incentive in the world for the big AI labs to produce small, powerful, fast models. None of them seem to be using this technique, so it's interesting, but I suspect it's not quite working.

    I also have yet to see any of these at a larger scale. For example, can you try one of these at 100 billion parameters?
  • ericb 1 hour ago
    This is pretty cool! I would love to see even larger models shrunk down.

    If you got that into a couple of gigs, what could you stuff into 20 gigs?
  • wmf 2 hours ago
    Yet again they're comparing against unquantized versions of other models. They would probably still win, but by a much smaller size margin.
    • Dumbledumb 1 hour ago
      Wouldn't the margin be higher? Moving all the other models from unquantized to quantized would lower their performance, while Bonsai stays the same. I get what you mean if it's in regard to score per model size, but not for absolute performance.