8 comments

  • i000 · 6 minutes ago
    Would it make sense to embed such single-purpose network with fixed weights within a LLM before pre-training?
  • E-Reverance · 50 minutes ago
    Not sure how much this fits into the rules, but I saw someone on Twitter claim 28 params: https://gist.github.com/SeuperHakkerJa/da3050739bea97aabd86ee0d7d5ef689
  • amelius · 1 hour ago
    > In short: if you can swap in a different set of weights and use the exact same inference code for a different task, your setup is legitimate. If the inference code is inseparable from the algorithm, it's not.

    I wonder why they don't just write the code themselves, so by design the focus can be on the model.
  • munro · 8 minutes ago
    >=99% accuracy, wtf?!?

    I was initially excited until I saw that, because it would reveal some sort of required local minimum capacity; the further revelation that this was all vibe coded with no arXiv paper makes me feel I should save my attention for another article.
  • ks2048 · 48 minutes ago
    So, hand-coded weights can do it with 36 params versus 311 for trained weights - did anyone try the former architecture, but starting from random weights and learning?
  • medi8r · 1 hour ago
    You can do that in a single matmul, of course.
    • hyperhello · 1 hour ago
      So can you take an arbitrary transformer and somehow turn it into a compact set of low-power fast gates by some algorithm?
      • measurablefunc · 1 hour ago
        I think you're misunderstanding the joke.
        • medi8r · 30 minutes ago
          Yes, the joke is:

              [A B]

          times

              [1]
              [1]

          is

              [A+B]
          • hyperhello · 20 minutes ago
            From context, then, I infer that a transformer is not just matrix multiplications, because otherwise it would simply be one that adds two 10-digit numbers.
            • medi8r · 15 minutes ago
              A transformer tokenizes input, then does a bunch of matmuls and ReLUs set up in a certain way. It doesn't get to see the raw number (just like you don't when you look at 1+1: you need the visual cortex etc. first).
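              The single-matmul trick described above can be sketched in a few lines of NumPy; the values here are arbitrary 10-digit examples, not anything from the original challenge:

```python
import numpy as np

# "Addition as a single matmul": the 1x2 row vector [A B] times the
# 2x1 column vector of ones collapses to the 1x1 matrix [A+B].
A, B = 1234567890, 9876543210  # arbitrary 10-digit examples
row = np.array([[A, B]])
ones = np.array([[1], [1]])
result = row @ ones
print(int(result[0, 0]))  # prints 11111111100, i.e. A + B
```

              This is exactly why the setup is a joke: the matmul only "adds" if the inputs are already handed to it as raw integers, which is the part a real transformer never sees.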
  • 1over137 · 13 minutes ago
    Now wrap it all in an Electron app!