7 comments

  • armanj 45 minutes ago
    I did a quick benchmark & compared it with Qwen3.5: https://github.com/ArmanJR/PrismML-Bonsai-vs-Qwen3.5-Benchmark

    In my results, accuracy-wise Ternary-Bonsai-8B is on par with Qwen3.5-4B. But in accuracy-per-byte, Bonsai is the clear winner:

    => Ternary-Bonsai-1.7B achieved 65.1% from 462 MiB, beating Qwen3.5-0.8B by 12 points while being ~5% smaller on disk.
    => Ternary-Bonsai-4B is the accuracy-per-byte winner above 1 GiB: 83.0% from only 1.1 GiB, within 2 points of Qwen3.5-4B at 40% of the weight size.

    These models show strong promise on edge devices and anywhere disk space is limited. I think this lab is worth watching.
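    A back-of-the-envelope way to read those numbers is accuracy per GiB. A minimal sketch using the figures quoted above; the Qwen3.5-4B entries are back-derived from the relative figures ("within 2 points", "40% of the weight size"), so they are approximations rather than measured values:

        # Rough accuracy-per-size arithmetic from the figures in the comment above.
        models = {
            "Ternary-Bonsai-1.7B": (65.1, 462 / 1024),  # accuracy %, size in GiB
            "Ternary-Bonsai-4B":   (83.0, 1.1),
            "Qwen3.5-4B (approx)": (85.0, 1.1 / 0.40),  # ~2.75 GiB implied
        }

        for name, (acc, gib) in models.items():
            print(f"{name:22s} {acc:5.1f}%  {gib:4.2f} GiB  ->  {acc / gib:5.1f} %/GiB")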
  • WatchDog 32 minutes ago
    All of their benchmarks are against 16-bit models, right?

    Why aren't they comparing to 2/3/4-bit quants?
    • himata4113 11 minutes ago
      I looked at quantized versions of these models and they all outperform it, so I guess it just doesn't look as good in that comparison.
  • Animats 1 hour ago
    This makes sense. A 1-bit model implies needing roughly 2x as many neurons, because you need an extra unit to invert. The ternary model still has a sign, just at really low resolution.

    (I've been reading the MMLU-Redux questions for electrical engineering [1]. They're very funny. Fifty years ago they might have been relevant. The references to the Intel 8085 date this to the mid-1970s. Moving-coil meters were still a big thing back then. Ward-Leonard drives still drove some elevators and naval guns. This is supposed to be the hand-curated version of the questions. Where do they get this stuff? Old exams?)

    [1] https://github.com/aryopg/mmlu-redux/blob/main/outputs/multi_expert_helm/electrical_engineering.csv
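    One way to read the sign point, assuming the 1-bit weights are unsigned {0, 1} (a toy illustration, not the lab's actual construction): a ternary weight can excite or inhibit on its own, while an unsigned binary weight needs a second, negated copy of the input to express the same function, hence roughly twice the units.

        import numpy as np

        x = np.array([0.8, -0.3, 1.2])

        # Ternary weights carry a sign directly: one connection per input.
        w_ternary = np.array([1, 0, -1])
        y_ternary = w_ternary @ x

        # Unsigned binary weights {0, 1} cannot inhibit, so the same function
        # needs a doubled input [x, -x], i.e. roughly 2x as many units.
        x_doubled = np.concatenate([x, -x])
        w_binary = np.array([1, 0, 0, 0, 0, 1])  # selects x[0] and -x[2]
        y_binary = w_binary @ x_doubled

        assert np.isclose(y_ternary, y_binary)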
  • yodon 1 hour ago
    So excited to see this - the big advantage of 1.58 bits is that there are no multiplications at inference time, so you can run these models on radically simpler and cheaper hardware.
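    A minimal numpy sketch of why that is: with weights restricted to {-1, 0, +1}, every dot product collapses into additions and subtractions of activations, so no multiplier is needed (the function here is illustrative, not from any actual kernel):

        import numpy as np

        def ternary_matvec(W, x):
            # With weights in {-1, 0, +1}, each output element is just the sum of
            # activations whose weight is +1 minus those whose weight is -1.
            out = np.zeros(W.shape[0], dtype=x.dtype)
            for i in range(W.shape[0]):
                out[i] = x[W[i] == 1].sum() - x[W[i] == -1].sum()
            return out

        rng = np.random.default_rng(0)
        W = rng.integers(-1, 2, size=(4, 8))             # ternary weight matrix
        x = rng.standard_normal(8).astype(np.float32)
        assert np.allclose(ternary_matvec(W, x), W @ x)  # matches ordinary matmul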
    • Animats 1 hour ago
      At 4 bits, you could just have a hard-wired table lookup: two 4-bit values in, one 256-entry table. You can have saturating arithmetic and a post-processing function for free. Somebody must be building hardware like that.
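      In software terms, something like this sketch (the 7-bit saturation cap is an arbitrary choice for illustration, not from any real design):

          # Multiply two 4-bit values via a precomputed 256-entry table instead of
          # a multiplier circuit; saturation (or any post-processing) is baked in.
          SAT_MAX = 127  # example saturation limit

          MUL_LUT = [min(a * b, SAT_MAX) for a in range(16) for b in range(16)]

          def lut_mul4(a: int, b: int) -> int:
              # Index the table with the two 4-bit operands packed into one byte.
              return MUL_LUT[((a & 0xF) << 4) | (b & 0xF)]

          assert lut_mul4(7, 9) == 63
          assert lut_mul4(15, 15) == 127  # 225 saturates to the 7-bit cap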
  • mchusma 2 hours ago
    Ever since I saw the first one of these one-bit models made by Microsoft, I thought this was a fascinating route. I assume that in practice, this is less helpful than it seems, just because there's every economic incentive in the world for the big AI labs to produce small, powerful, fast models. None of them seem to be using this technique, so it's interesting, but I suspect it's not quite working.

    I also have yet to see any of these at a larger scale. For example, can you try one of these at 100 billion parameters?
  • ericb 1 hour ago
    This is pretty cool! I would love to see even larger models shrunk down.

    If you got that into a couple of gigs, what could you stuff into 20 gigs?
  • wmf 2 hours ago
    Yet again they're comparing against unquantized versions of other models. They would probably still win, but by a much smaller size margin.
    • Dumbledumb 1 hour ago
      Wouldn't the margin be higher? Moving all the other models from unquantized to quantized would lower their performance, while Bonsai stays the same. I get what you mean if it's in regard to score per model size, but not for absolute performance.