10 comments

  • yu3zhou4 7 hours ago
    The README is, in my opinion (author here), the most interesting part. I wrote it to help others build a useful mental model so they can recreate the project themselves, without even needing to read my code.
    • janalsncm 1 hour ago
      Really practical teaching approach. I clicked in to see how safetensors are loaded and just kept reading. Thanks for sharing.
  • GoldenJade49 minutes ago
    Thanks for sharing this. As someone currently researching LLMs, I'm sure I'll be referencing this quite a bit going forward.
  • xuanlin314 1 hour ago
    The lesson-style README is a great approach. Breaking down LLM inference into digestible steps makes the codebase approachable even for people who haven't touched CUDA before.
  • juancn 6 hours ago
    Looks interesting, it reminds me of the first llama.cpp, but better documented.
  • nazgulsenpai 7 hours ago
    I love the documentation formatted in lessons. I can't wait to read through it.
  • dwa3592 5 hours ago
    Very nice job on the README.

    >> Physically, LLM is a file which contains a lot of float numbers.

    aka atoms of the LLM.
    • cyanydeez 5 hours ago
      the universe is just atomic if statements
  • cookiengineer 5 hours ago
    Wanted to add that the author has an amazing blog with lots of interesting papers: https://jedrzej.maczan.pl/
  • einpoklum 5 hours ago
    It seems the author believes checking the return values of CUDA API calls is not "tiny" enough :-(