6 comments

  • meatmanek 5 hours ago
    This model is pretty cool if you don't have a GPU - I was able to get I think 20 or 30 tokens per second on CPU (DDR4 RAM) alone. (I don't remember if that was with q4 or q8.)

    Otherwise, if you have a GPU with more than like 4GB of VRAM, there are better models. Gemma4 and Qwen3.6 (or Qwen3.5 if you need the smaller dense models that haven't yet been released for 3.6) are a good place to start.
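    For anyone who wants to measure this themselves, here is a minimal sketch using llama-cpp-python for CPU-only decoding; the GGUF filename is a placeholder, and the thread count should match your machine:

      # Rough CPU decode-speed check with llama-cpp-python.
      # The model filename is a placeholder; any q4/q8 GGUF works.
      import time
      from llama_cpp import Llama

      llm = Llama(
          model_path="model-q4_k_m.gguf",  # placeholder path
          n_ctx=4096,
          n_threads=8,       # set to your physical core count
          n_gpu_layers=0,    # force CPU-only inference
      )

      start = time.perf_counter()
      out = llm("Write a haiku about DDR4 memory.", max_tokens=128)
      elapsed = time.perf_counter() - start

      n = out["usage"]["completion_tokens"]
      print(f"{n / elapsed:.1f} tokens/s (timing includes prompt processing)")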
  • BoredomIsFun 3 hours ago
    All the LFM models I've tried seemed to suffer from serious coherence issues. I found the Gemmas the best at tasks requiring rock-solid coherent output; even Qwen isn't comparable.
    • 1dom 2 hours ago
      I think context length is important to consider here.

      I find Gemmas really good for a short conversation with maybe 3 or 4 exchanges of a few paragraphs each, which covers a surprisingly large share of interactions.

      For anything longer form though, particularly with larger code contexts, Qwen is far more useful for me personally.

      I'm not an expert in this field, but my understanding is that Qwen uses a hybrid gated attention mechanism, whereas Gemma is a hybrid that includes a sliding-window attention mechanism, which makes it look like it favours the most recent tokens a little too much at times. There's a toy sketch of the difference below.

      This is all in the context of local quantized models; I'm aware both have larger cloud variants that wouldn't suffer as much.
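      To illustrate the sliding-window point (a toy NumPy sketch, not either model's real implementation; the window size is made up), each token only attends to the last few positions, so older context can only reach later tokens indirectly through intermediate layers:

        # Compare a full causal mask to a sliding-window mask (toy sizes).
        import numpy as np

        seq_len, window = 8, 3  # illustrative values only

        # Full causal attention: every token sees all earlier tokens.
        full_causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))

        # Sliding window: each token sees only the last `window` positions.
        sliding = np.zeros((seq_len, seq_len), dtype=bool)
        for q in range(seq_len):
            sliding[q, max(0, q - window + 1):q + 1] = True

        print(full_causal.astype(int))
        print(sliding.astype(int))

      In the sliding mask, the last row is zero everywhere except the final three columns: token 0 can influence token 7 only via intermediate layers, which is one intuition for why recent tokens dominate.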
  • alyxya 5 hours ago
    The blog post was published a couple months ago, and it looks like there hasn't been a follow-up release with the fully trained model. I'm not sure if there's much to take away from an early checkpoint besides the unique architectural choices they made in their model for faster inference.
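    If I remember Liquid's earlier LFM2 write-up correctly, the headline architectural choice is replacing attention with short, double-gated causal convolutions in most layers, which keeps per-token compute and cache size small. A rough PyTorch paraphrase of that kind of block (my sketch under that assumption, not their released code; dimensions are invented):

      import torch
      import torch.nn as nn

      class GatedShortConvBlock(nn.Module):
          """Short depthwise causal conv with input and output gating."""
          def __init__(self, d_model: int, kernel_size: int = 3):
              super().__init__()
              self.in_proj = nn.Linear(d_model, 3 * d_model)  # two gates + value
              self.conv = nn.Conv1d(
                  d_model, d_model, kernel_size,
                  padding=kernel_size - 1, groups=d_model,  # depthwise, causal pad
              )
              self.out_proj = nn.Linear(d_model, d_model)

          def forward(self, x: torch.Tensor) -> torch.Tensor:  # (batch, seq, d)
              b_gate, c_gate, h = self.in_proj(x).chunk(3, dim=-1)
              h = b_gate * h                             # input gate
              h = self.conv(h.transpose(1, 2))           # (batch, d, seq + pad)
              h = h[..., : x.shape[1]].transpose(1, 2)   # trim pad to stay causal
              h = c_gate * h                             # output gate
              return self.out_proj(h)

      x = torch.randn(1, 16, 64)
      print(GatedShortConvBlock(64)(x).shape)  # torch.Size([1, 16, 64])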
    • adrian_b 2 hours ago
      Some smaller models from the LFM2.5 family were published on Hugging Face at the end of March, a month ago.

      It can be assumed that this larger model needs more time to complete post-training, but that it will follow those smaller LFM2.5 models in the near future.
  • trilogic 4 hours ago
    Liquid AI has made some awesome models (especially the smaller ones, they are lightning fast). I wish they made a fast, small-size coder. I did a fine-tune distill of a 0.8B model myself and it does in fact work properly, coding like a 30B model, so I know it is possible. Anyway, here you have the 24B-parameter model with 2B active: https://hugston.com/models/lfm2-24b-a2b-q4-k-m
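    To make the "24B total, 2B active" framing concrete, here is a back-of-the-envelope MoE routing sketch; the expert count, per-expert size, and shared-parameter figure are assumptions picked to roughly reproduce those headline numbers, not the model's actual configuration:

      import numpy as np

      n_experts, top_k = 80, 4         # assumed, not from the model card
      params_per_expert = 0.29e9       # assumed
      shared_params = 0.8e9            # attention, embeddings, etc. (assumed)

      router_logits = np.random.randn(n_experts)   # per-token router scores
      chosen = np.argsort(router_logits)[-top_k:]  # only the top-k experts fire

      active = shared_params + top_k * params_per_expert
      total = shared_params + n_experts * params_per_expert
      print(f"experts used this token: {sorted(chosen.tolist())}")
      print(f"~{active / 1e9:.1f}B active of ~{total / 1e9:.1f}B total")

    Only the chosen experts' weights are read for each token, which is why decode speed on memory-bandwidth-bound hardware tracks the 2B figure rather than the 24B one.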
  • potatobanana 2 hours ago
    I liked LFM2-8B-A1B for its speed on CPU or integrated GPU; this one is slower. I used it recently while coding offline in a stressful situation, and it was good enough to propose an OK suggestion for how to solve a simple-to-intermediate problem in a somewhat obscure language. Multi-turn iteration did not work well, though: the code ran, but it didn't exactly fulfill all the expectations from the following turns. Still, it was good enough to help and to take things further.
  • alfiedotwtf 5 hours ago
    Tokens per second is nice, but I would also like to see quality benchmarks, especially against other models. Eventually someone's going to write a blog post comparing models, so why not just do it yourself? That way your marketing department at least gets to control the narrative rather than a random blogger.
    • mirekrusin 4 hours ago
      It's a checkpoint from the middle of training, so it makes sense to report speed, which will stay the same, and to report quality the way they did.