4 comments

  • badsectoracula 1 hour ago
    > not to be confused with the somewhat baffling llama_chat_apply_template exposed in the libllama API, which hardcodes a handful of chat formats directly in C++

    As someone who is tinkering with a desktop-based inference app in FLTK[0], i wish this used the actual Jinja2 template parser llama.cpp uses (or that there was another C function that did, since AFAICT for "proper" parsing you need to be able to pass a bunch of data to the template so it knows if you, e.g., do tool calling). Currently i'm using this adhocky function, but i guess i'll either write a Jinja2 interpreter or copy/paste the one from llama.cpp's code (depending on how i feel at the time :-P).

    But yeah, GGUF's "all-in-one" approach is very convenient. And i agree that it feels odd to have the projection models as separate files - i remember when i first downloaded a vision-capable model, i just grabbed whatever GGUF looked appropriate, then llama.cpp told me it couldn't handle the model, and it took me a bit to realize that i had to download an extra file. Literally my thought once i did was "wasn't GGUF supposed to contain everything?" :-P

    [0] https://i.imgur.com/GiTBE1j.png
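    For reference, roughly what calling that hardcoded-template function looks like - a sketch assuming a recent llama.h, where llama_chat_apply_template takes the template string directly (older builds also took a llama_model pointer as the first argument). Passing the shorthand name "chatml" keeps it self-contained, no loaded model needed:

        // Sketch: format a conversation with one of libllama's built-in templates.
        #include <stdio.h>
        #include "llama.h"

        int main(void) {
            llama_chat_message msgs[] = {
                { "system", "You are a helpful assistant." },
                { "user",   "Hello!" },
            };

            // "chatml" picks one of the formats hardcoded in C++;
            // add_ass=true appends the assistant prefix for generation.
            char buf[4096];
            int32_t len = llama_chat_apply_template("chatml", msgs, 2, true, buf, (int32_t) sizeof(buf));
            if (len < 0 || len >= (int32_t) sizeof(buf)) {
                fprintf(stderr, "unsupported template or buffer too small\n");
                return 1;
            }
            buf[len] = '\0';  // the call doesn't null-terminate

            printf("%s\n", buf);  // <|im_start|>system\n...<|im_end|>...
            return 0;
        }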
    • bitwize 28 minutes ago
      Oh my God I freaking love your app. The 90s Linux desktop vibes hit like a hammer. FLTK FTW!
  • ge96 1 hour ago
    Nice, I recently pulled down TheBloke's Mistral 7B to try out. I have a 4070.
    • bashbjorn 1 hour ago
      I love Mistral, but that model is... not the best. Maybe try out Gemma 4 E4B; it's a similar size to Mistral 7B and should run great on your 4070 ("E4B" is slightly misleading naming).
      • ge96 1 hour ago
        Thanks for the tip - what do you use Gemma 4 E4B for?
        • redanddead 1 hour ago
          Some say it's a miniaturized Gemini model.

          It's good at writing and coding, and decently intelligent.

          You can try it on NVIDIA NIM.
    • mixtureoftakes 53 minutes ago
      Mistral 7B is quite outdated. On a 12GB 4070 you can run Qwen 3.5 9B Q4_K_M or Qwen 3.6 35B; the latter will be a lot smarter but also a lot slower due to RAM offload.

      Try both in LM Studio - they really are surprisingly capable.
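      (For anyone wondering what "RAM offload" means mechanically: a hedged sketch at the libllama level - the model filename and layer count here are made-up examples, and the function names assume a recent llama.h.)

          // Sketch: partial GPU offload, the mechanism behind "RAM offload".
          #include <stdio.h>
          #include "llama.h"

          int main(void) {
              llama_backend_init();

              struct llama_model_params mparams = llama_model_default_params();
              // Offload only as many transformer layers as fit in 12GB of VRAM;
              // the rest run from system RAM on the CPU, which is why the
              // bigger model gets a lot slower once it no longer fits.
              mparams.n_gpu_layers = 28;

              struct llama_model * model = llama_model_load_from_file("qwen-35b-q4_k_m.gguf", mparams);
              if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

              // ... create a context and run inference ...

              llama_model_free(model);
              llama_backend_free();
              return 0;
          }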
      • ge96 35 minutes ago
        I have 80GB of RAM, but it's slow - capped at only 2400MHz despite being DDR4. I think it's either the i9 CPU or this specific ASUS mobo that sucks.

        Tried all the stuff in the BIOS, voltage tweaks...
    • ganelonhb 1 hour ago
      I have a 2070 and can confirm it works amazingly fast.

      I love TheBloke - I wish he still made stuff.
      • bashbjorn 1 hour ago
        Yeah, the TheBloke era of local LLMs was good times. TBF, Unsloth are doing a fantastic job of publishing quants of the major models quickly - they just don't have nearly the volume of "weird" models that TheBloke did.
      • ge96 1 hour ago
        What do you use it for? I'm still trying to use agents; I barely use Copilot, and only at work when I have to.

        I didn't want to get personal with an LLM unless it was local, so that's why I was setting this up. So far, research is mainly what I was looking at.
  • kenreidwilson 1 hour ago
    > Published May 18, 2026

    hmmm...
    • bashbjorn 1 hour ago
      whoops, my bad. Just a typo in the markdown. Fixed :)