11 comments

  • al-yak 18 minutes ago
    Eternal question about pre-training: on what medium has this pre-training, or pre-pre-training, been performed? If we assume the platform is a CMOS-based piece of hardware, then the most primitive trainable element would be a two-state learning automaton, which can be assembled out of a couple of dozen transistors. This is effectively a kind of single-cell bacterium. You then need to organize these automata into collectives that can start to exchange tokens and reinforcements among themselves and with the training environment; these collectives do the same at a higher level, and so on. That is what can be seen as NCAs.

    A good source of ideas on this approach is M.L. Tsetlin's book Automaton Theory and Modeling of Biological Systems (1973): https://shop.elsevier.com/books/automation-theory-and-modeling-of-biological-systems/tsetlin/978-0-12-701650-4
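    [A minimal sketch of such an element, assuming Tsetlin's simplest two-state design: one memory state per action, a reward keeps the current state, a penalty flips to the other. The environment and its penalty probabilities are illustrative, not from the book:]

    ```python
    import random

    class TwoStateAutomaton:
        """Tsetlin-style two-state learning automaton: two states, two actions."""

        def __init__(self):
            self.state = random.choice([0, 1])  # state index doubles as the action

        @property
        def action(self):
            return self.state

        def update(self, penalty: bool):
            # A reward keeps the current state; a penalty flips to the other one.
            if penalty:
                self.state = 1 - self.state

    # Stationary random environment: action 1 is penalized less often.
    random.seed(0)
    penalty_prob = [0.8, 0.2]
    automaton = TwoStateAutomaton()
    counts = [0, 0]
    for _ in range(10_000):
        a = automaton.action
        counts[a] += 1
        automaton.update(random.random() < penalty_prob[a])

    print(counts)  # the less-penalized action should dominate
    ```

    [Collectives, as described above, would wire the penalty signal of one automaton to the actions of others.]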
  • benob 3 hours ago
    Reminds me of "Universal pre-training by iterated random computation" (https://arxiv.org/pdf/2506.20057), which takes a slightly less formal approach.

    I wonder if there is a closed-form solution for these kinds of initialization methods (call them pre-training if you wish): one that would allow attention heads to detect a variety of diverse patterns, yet be more structured than random init.
  • stanfordkid 3 hours ago
    I did a similar project, but feeding 3D fractals I found on Shadertoy into ViTs. They are extremely simple iterative functions that produce a ton of scene-like complexity.

    I have a pet theory that the visual cortex, as it develops, is linked to some mechanism like this. You just need proteins that create some sort of resonating signal feeding into the neurons as they grow (obviously this is hand-wavy), but similar feedback loops guide nervous-system growth in zebrafish, for example.
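    [A 2D stand-in for those shader fractals: a few lines of escape-time iteration already produce the kind of structured complexity being described. The constant, resolution, and iteration count here are arbitrary choices, not from the project above:]

    ```python
    import numpy as np

    def julia_image(c: complex, size: int = 64, max_iter: int = 50) -> np.ndarray:
        """Render escape times of z -> z**2 + c on a grid; a simple
        iterative function yielding structured synthetic imagery."""
        xs = np.linspace(-1.5, 1.5, size)
        z = xs[None, :] + 1j * xs[:, None]
        img = np.zeros(z.shape, dtype=np.int32)
        for i in range(max_iter):
            mask = np.abs(z) <= 2.0
            img[mask] = i             # record the last iteration before escape
            z[mask] = z[mask] ** 2 + c
        return img

    img = julia_image(-0.8 + 0.156j)
    print(img.shape, img.min(), img.max())
    ```

    [Batches of such images, with randomized constants, could serve as the synthetic ViT inputs described above.]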
    • heyitsguay 2 hours ago
      What were the results of the 3D fractal shader pretraining?
    • andai 3 hours ago
      I like your funny words, magic man!
  • gavinray 1 hour ago
    Can someone ELI5 how this hypothesis could ever be true?

    > "The core hypothesis: what makes language useful for pre-training is its structure, not its semantics."

    As a layman, I've always held the intuition that semantics are the only meaningful thing. "Structure without semantics" = form without function, symmetric/regular noise, right?

    My naive bet is on compressing semantics into mediums more expressive/information-dense than text, like how some languages have single words/symbols representing entire sentence-long concepts.
    • andy12_ 40 minutes ago
      I think what they mean by this is that, for example, in "If it's raining, the outside is wet. It's raining, so the outside is wet", it's more important for the model to learn "If A then B. A, therefore B" than to learn what "raining", "outside", and "wet" mean.
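      [One way to make that concrete: keep the logical skeleton fixed and fill it with meaningless tokens, so only the structure carries signal. The template and vocabulary are my own illustration, not from the article:]

      ```python
      import random

      TEMPLATE = "If {A} then {B}. {A}, therefore {B}."

      def sample_instance(vocab, rng):
          """Instantiate the modus-ponens skeleton with arbitrary symbols:
          the structure is fixed, the 'meanings' are random."""
          a, b = rng.sample(vocab, 2)
          return TEMPLATE.format(A=a, B=b)

      rng = random.Random(0)
      vocab = [f"tok{i}" for i in range(100)]
      for _ in range(3):
          print(sample_instance(vocab, rng))
      ```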
  • andai 3 hours ago
    > The key: since every sequence has a unique latent rule, the model must infer that rule in-context to predict what comes next. This in-context learning ability underpins many of the key reasoning capabilities observed in language models.

    This is a remarkable paper. It's the first time I've heard of someone training on the actual thing we're trying to get this stuff to do!

    ---

    > This raises a radical question: Is natural language the only path to intelligence?

    Of course not! We have octopuses, ravens, etc., which in many domains display higher intelligence than frontier AIs.

    "Embodied reasoning" (genetic-algorithm brute force solving physical tasks for a billion years, to name one solution) is definitely one very practical form of intelligence, although we're taking some shortcuts in replicating it.

    I wonder if simplified analog tasks like Box2D puzzles would help too (or perhaps something even simpler: Hanoi? Blocks world?). I know many companies are using simulations of 3D worlds for this.

    What I don't understand is how that can integrate with the LLM (physical intelligence would seem to require specialized circuitry, if only for the latency). But maybe once we have good specialized models, LLMs can be trained on their synthetic data?
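    [The setup the quoted passage describes can be sketched as follows. The "unique latent rule" here is a random token-substitution map, which is my stand-in for whatever rule family the paper actually uses:]

    ```python
    import random

    def make_sequence(rng, length=12, vocab=8):
        """Generate one sequence from its own hidden rule: a random bijection
        over the vocabulary. Predicting the next token requires inferring
        the rule from the tokens seen so far, i.e. in-context."""
        tokens = list(range(vocab))
        rule = dict(zip(tokens, rng.sample(tokens, vocab)))  # the latent rule
        seq = [rng.randrange(vocab)]
        for _ in range(length - 1):
            seq.append(rule[seq[-1]])
        return seq, rule

    rng = random.Random(42)
    seq, rule = make_sequence(rng)
    # Every next-token target is determined by the latent rule:
    assert all(rule[a] == b for a, b in zip(seq, seq[1:]))
    print(seq)
    ```

    [Because each training sequence draws a fresh rule, memorizing any one rule is useless; only the skill of inferring rules transfers.]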
  • voxleone 5 hours ago
    Neural cellular automata are interesting because they shift learning from "predict tokens" to "model state evolution". That feels much closer to a transition-based view of systems, where structure emerges from repeated local updates (transitions) rather than being encoded explicitly.

    I'm working on a theoretical/computational framework, the Functional Universe, intended for modeling physical reality as functional state evolution. I'd say it could be used to replicate your CA process. I won't link it here, to signal my good faith in discussing this issue; it's on my GH.
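    [For readers unfamiliar with NCAs, here is a toy version of "structure from repeated local updates". Real NCAs learn the update rule with a small neural network; this sketch uses a fixed random kernel just to show the mechanics of a shared local transition applied everywhere:]

    ```python
    import numpy as np

    def nca_step(state: np.ndarray, kernel: np.ndarray) -> np.ndarray:
        """One synchronous update: each cell perceives its 3x3 neighborhood
        through a shared kernel, then applies a nonlinearity. Any global
        structure emerges purely from this repeated local rule."""
        h, w = state.shape
        padded = np.pad(state, 1, mode="wrap")  # toroidal grid
        new = np.empty_like(state)
        for i in range(h):
            for j in range(w):
                patch = padded[i:i + 3, j:j + 3]
                new[i, j] = np.tanh(np.sum(patch * kernel))
        return new

    rng = np.random.default_rng(0)
    state = rng.standard_normal((16, 16))
    kernel = rng.standard_normal((3, 3)) * 0.5
    for _ in range(10):
        state = nca_step(state, kernel)
    print(state.shape)
    ```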
    • troelsSteegin 3 hours ago
      From https://voxleone.github.io/FunctionalUniverse/pages/executive-summary.html: "The Functional Universe models reality as a history built from irreversible transitions, with time emerging from the accumulation of causal commitments rather than flowing as a primitive parameter."

      Is it fair to say that time is simply a way of organizing a log file on a dynamic reality? I interpreted "composition of transitions" as a system of processes. I think the hard modeling problem is interpreting interactions between processes: transitions don't simply compose, and observed transitions may be confounded views of more complex transitions. I gather NCA would be granular enough to overcome that.
      • voxleone 1 hour ago
        That's a very good objection, and it's pointing at a real pressure point in our framework.

        Short answer: it's close, but incomplete. It's not that time organizes a log of reality; rather, reality *is* the accumulation of committed transitions. What you're calling a "log" is the ontological structure itself.

        I gather you're basically saying: what we see as a transition ≠ what's actually happening at the fundamental level. This is a legitimate and deep problem.

        You're right that observed transitions may not compose cleanly. In the Functional Universe, composition is a property of fundamental transitions. What we observe are often coarse-grained projections of many underlying transitions, which can obscure the compositional structure.
  • dzink 4 hours ago
    “The long-term vision is: foundation models that acquire reasoning from fully synthetic data, then learn semantics from a small, curated corpus of natural language. This would help us build models that reason without inheriting human biases from inception.”
    • qsera 4 hours ago
      I think this is a bit risky, because it assumes that all the knowledge a human possesses about nature is acquired after birth.

      But is that correct? I think organisms also come with a partial built-in understanding of nature at birth.
      • throw-qqqqq 3 hours ago
        > I think organisms also come with a partial built in understanding of nature at birth

        I agree. Most organisms are quite pre-trained: they have "instincts" and natural behaviors.

        E.g. newly hatched turtles know to crawl toward the ocean immediately when they hatch; they don't learn that on the way.

        It seems to me that most lifeforms come into this world pre-trained.
      • jamilton 3 hours ago
        I don't think that assumption is being made; why do you think it is? In terms of the metaphor, training a model could correspond both to knowledge acquired after birth and to evolution. But I don't think it's particularly useful to keep thinking in metaphors.
  • dmos62 3 hours ago
    Honestly, I never thought about reasoning this way, but it seems kind of obvious now that someone has done it. Very interesting.