15 comments

  • wk_end 9 hours ago
    > 25K parameters is about 70 million times smaller than GPT-4. It will produce broken sentences. That's the point - the architecture works at this scale.

    Since it seems to just produce broken and nonsensical sentences (at least based on the one example given), I'm not sure that it *does* work at this scale.

    Anyway, as written this passage doesn't really make a whole lot of sense (the *point* is that it produces broken sentences?), and given that it was almost certainly written by an AI, it demonstrates that the architecture doesn't work especially well at *any* scale (I kid, I kid).
    • forinti 8 hours ago
      How does it compare to a Markov chain generator I wonder.
      • jll29 7 hours ago
        A Transformer is a more powerful model than a Markov chain, but on a machine as weak as the C64, an MC could output text faster. It would surely sound "psychedelic", though: memory limits an MC to a first-order or second-order model, so to predict one word, only the one or two words before it would be taken into account as context (and no attention).

        On a plain vanilla C64, the Transformer cannot really show what it's capable of. An implementation using 2 bits per weight (vectorized) could be slightly better, perhaps.
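        As a toy illustration of the second-order Markov model described above, here is a minimal Python sketch (the training corpus and function names are invented for the example; nothing here is from the C64 project):

```python
import random
from collections import defaultdict

def train(words, order=2):
    """Map each `order`-word context to the words observed after it."""
    model = defaultdict(list)
    for i in range(len(words) - order):
        model[tuple(words[i:i + order])].append(words[i + order])
    return model

def generate(model, seed, n=10, rng=None):
    """Extend `seed` by repeatedly sampling a follower of the last two words."""
    rng = rng or random.Random(0)
    out = list(seed)
    for _ in range(n):
        followers = model.get(tuple(out[-2:]))
        if not followers:  # unseen context: a tiny model simply stops
            break
        out.append(rng.choice(followers))
    return " ".join(out)

# Train on a toy corpus and generate from a two-word seed.
corpus = "the cat sat on the mat and the cat ran to the mat".split()
print(generate(train(corpus), ("the", "cat")))
```

        With only the previous word or two as context, the output drifts quickly, which is the "psychedelic" quality mentioned above.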
    • pizza234 7 hours ago
      [dead]
  • arketyp 26 minutes ago
    I love these counterfactual creations on old hardware. It highlights the magical creative freedom of software.
  • mixmastamyk 7 hours ago
    Just reminded me of the random sentence generator program on my Vic-20. I had changed most of the words to all the bad words a preteen could think up. So many laughs with the neighborhood kids.
  • daemonologist 7 hours ago
    You can chat with the model on the project page: https://indiepixel.de/meful/index.html

    It (v3) mostly only says hello and bye, but I guess for 25K parameters you can't complain. (I think the rather exuberant copy is probably the product of Claude et al.)
  • anyfoo 8 hours ago
    This would have blown me away back in the late 80s/early 90s.

    (Or maybe not, if it doesn't perform better than random; I haven't actually tried it out yet. Some more examples would have been nice!)

    I wonder how far you could push this while still staying period-correct, e.g. by adding a REU (RAM Expansion Unit), or even a GeoRAM (basically a REU on steroids).

    A SuperCPU would also be an option, but for me it's always blurring the line of "what is a C64" a bit too much, and it likely just makes it faster anyway.
    • LeFantome 5 hours ago
      How fast is the "new" Commodore 64?

      Have not heard much about it since launch. Although, now that I look, it seems they are just shipping now.

      https://www.commodore.net/product-page/commodore-64-ultimate-basic-beige-batch2
      • steve_taylor 3 hours ago
        RAM can be increased to 16 MB and CPU speed to 48 GHz.
        • wk_end 2 hours ago
          I'm sorry, *how many Hz*???
  • borsch_not_soup 5 hours ago
    Interesting, I've always thought neural network progress was primarily bottlenecked by compute.

    If it turns out that LLM-like models can produce genuinely useful outputs on something as constrained as a Commodore 64 (or, even more convincingly, if someone manages to train a capable model within the limits of hardware from that era), it would suggest we may have left a lot of progress on the table. Not just in terms of efficiency, but in how we framed the problem space for decades.
    • dpe82 5 hours ago
        YOU> hey
        C64> HELLO! RE SOUNDS ME. MEFUL!

      60s per token for that doesn't strike me as genuinely useful.

      Very, very cool project though!
      • chillingeffect 4 hours ago
        Not useful in a disaster scenario:

          YOU> HELP I'M DROWNING
          C64> YOU' HERE!
          YOU> OH NO I'M ON FIRE
          C64> IGLAY!
          YOU> I'M BEING SWALLOWED BY A SNAKE
          C64>
          YOU> BIRDS ARE NIPPING ON ME
          C64> YOU
        • Razengan 4 hours ago
          Reminds me of Terry Davis' random word generator :')

          Maybe there is deeper wisdom in there that we have yet to unearth.
    • numpad0 1 hour ago
      Next-word prediction features always existed for flip phones...
  • classichasclass 8 hours ago
    If you're running this in VICE, run it under the SuperCPU with warp mode on.
    • bartread 8 hours ago
      That's a good idea because, although I love this, 1 minute per token is absolutely savage. Whereas if you can juice the performance, you're into semi-credible Jar Jar Binks simulator territory.

      It also makes me wonder what you could do with somewhat more powerful retro hardware. I'd love to see what a transformer running on a PSX or an N64 could do.
  • djmips 3 hours ago
    Disappointed - there was no 6502 code in the GitHub repo.
  • brcmthrowaway 8 hours ago
    How does this compare to ELIZA?
    • Geee 6 hours ago
      ELIZA is better, because this doesn't seem to generate anything coherent. You can try the original ELIZA with the DOCTOR script here: https://anthay.github.io/eliza.html
    • jll29 7 hours ago
      Joseph Weizenbaum's ELIZA was rule-based and ran on even slower (1960s) hardware, but because it relied on simple pattern matching instead of neural nets, it would easily have been more responsive. (The Emacs editor/operating system includes an implementation; start it with: M-x doctor RETURN.)

      ELIZA was not written in assembler, but (in different versions) in COMIT, FORTRAN, and LISP.

      https://dl.acm.org/doi/pdf/10.1145/365153.365168
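      To make "simple pattern matching" concrete, here is a minimal Python sketch in the spirit of ELIZA's keyword rules (the rules and responses below are invented for illustration, not taken from Weizenbaum's actual DOCTOR script):

```python
import re

# Hypothetical rules: a regex keyword pattern paired with a response
# template that may reuse the matched text.
RULES = [
    (re.compile(r"\bI am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bI feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"\bmother\b", re.I), "Tell me more about your family."),
]

def respond(text):
    """Return the response for the first rule whose pattern matches."""
    for pattern, template in RULES:
        m = pattern.search(text)
        if m:
            return template.format(*m.groups())
    return "Please go on."  # fallback when no keyword matches

print(respond("I am sad about this C64"))
# → How long have you been sad about this C64?
```

      No weights, no inference: each reply costs one pass over a short rule list, which is why it would feel snappy even on 1960s hardware.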
  • harel 9 hours ago
    Eliza called, and asked if we saw her grand kids...
    • tclancy 8 hours ago
      What makes you say that? This is about you, not me.

      (Came here to say an update to Eliza could really mess with the last person still talking to her.)
  • Vaslo 5 hours ago
    LOAD"*",8,1

    Brings back memories.
  • Lerc 8 hours ago
    OK, now we need 1541 flash attention.

    I'm not sure what the Venn diagram of the knowledge needed to understand that sentence looks like; it's probably more crowded in the intersection than one might think.
    • dnnddidiej 57 minutes ago
      How many 40+ AI-pilled devs are there? Assume 10M devs in the world: 10% have heard of flash attention, 1% of those have heard of the 1541, so about 10,000.
      • Lerc 10 minutes ago
        Ahh, but you also have to know the significance of the 1541 to make the flash attention reference work.
  • bighead1 9 hours ago
    i hate ai, and i love the c64, but i&#x27;ll allow it.
  • ghstinda 8 hours ago
    but can you make mac keyboards feel like a c64c?