2 comments

  • ericb 1 hour ago
    How long was it trained for? How many tokens?
    • vforno 1 hour ago
      Hi, a couple of hours, not too much! Including SFT!
  • Chu4eeno 3 hours ago
    Very weird coding style, did you run astyle --style=python on C code?
    Also, your LLM left a comment in the CUDA source that it is untested, does the CUDA stuff work?
    • bArray 1 hour ago
      Not sure, but the code is quite dense and lacking in comments. Are `nanoeuler` and `nanoeuler_check` themselves binaries checked straight into git along with the `.log` file? All of the commit messages are "Add files via upload" and happened in quick succession.
      I suspect this is LLM generated, which is cool, but then it shouldn't carry the claim "forward and backward passes are written and verified by hand" unless that is true.
      Regarding the data, old texts from Gutenberg probably lower the performance, especially as many texts are whimsical on purpose. Shakespeare, for example, made up words to be theatrical. You have a mix of different old English styles in the corpus, which is a terrible way to learn modern English. I had some success using .ZIM data archives from Kiwix as a source; you should get a more stable output using that data.
      • vforno 1 hour ago
        Hi, the uploads are one after the other because it was a long, step-by-step research project where I tested the code on another machine. I admit that I'm slowly catching up on the commits for all the projects. As for Gutenberg and Shakespeare, I admit that they were the best tests I could do, but I'll keep improving!
    • dang 1 hour ago
      > Very weird coding style, did you run astyle --style=python on C code?
      I'm sure you mean it in a more curious way but this type of comment on a Show HN often comes across as too harsh/snarky/dismissive for what we want here (see https://news.ycombinator.com/showhn.html).
    • vforno 2 hours ago
      Yes, tested on a 4070 Ti 16GB; everything worked without problems!