Has anyone tried implementing something like System M's meta-control switching in practice? Curious how you'd handle the reward signal for deciding when to switch between observation and active exploration without it collapsing into one mode.
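For concreteness, here's the rough shape of what I've been sketching; every name is hypothetical (the `learning_progress` input is whatever prediction-error-delta proxy you pick), and the usage penalty is just one guess at an anti-collapse mechanism:
<pre><code>
# Hypothetical meta-controller: pick the mode with the best recent
# learning progress, minus a penalty on over-used modes so neither
# "observe" nor "act" can permanently starve the other.
class MetaController:
    def __init__(self, modes=("observe", "act"), beta=0.1, ema=0.99):
        self.modes = modes
        self.value = {m: 0.0 for m in modes}  # EMA of learning progress per mode
        self.usage = {m: 0.5 for m in modes}  # EMA of how often each mode runs
        self.beta, self.ema = beta, ema

    def pick(self):
        score = {m: self.value[m] - self.beta * self.usage[m] for m in self.modes}
        return max(score, key=score.get)

    def update(self, mode, learning_progress):
        self.value[mode] = self.ema * self.value[mode] + (1 - self.ema) * learning_progress
        for m in self.modes:
            hit = 1.0 if m == mode else 0.0
            self.usage[m] = self.ema * self.usage[m] + (1 - self.ema) * hit
</code></pre>
The failure mode I keep hitting is that learning-progress estimates are noisy enough that the controller thrashes between modes; the EMA helps, but I'd be curious whether anyone has a more principled fix.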
> Curious how you'd handle the reward signal for deciding when to switch between observation and active exploration without it collapsing into one mode.<p>If you like biomimetic approaches to computer science, there's evidence that we want something besides neural networks. Whether we call such secondary systems emotions, hormones, or whatnot doesn't really matter much if the dynamics are useful. It seems at least possible that studying alignment-related topics is going to get us closer than any perspective that that's focused on learning. Coincidentally quanta is on some related topics today: <a href="https://www.quantamagazine.org/once-thought-to-support-neurons-astrocytes-turn-out-to-be-in-charge-20260130/" rel="nofollow">https://www.quantamagazine.org/once-thought-to-support-neuro...</a>
by Emmanuel Dupoux, Yann LeCun, Jitendra Malik<p>"The proposed framework integrates learning from observation (System A) and learning from active behavior (System B) while flexibly switching between these learning modes as a function of internally generated meta-control signals (System M). We discuss how this could be built by taking inspiration from how organisms adapt to real-world, dynamic environments across evolutionary and developmental timescales."
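Rendered as a control loop, the split looks something like this (my interpretation only; the paper proposes a framework, not code, so everything below is invented for illustration):
<pre><code>
import random

class SystemA:                                    # learning from observation
    def update(self, observation): pass           # e.g., a self-supervised predictive update

class SystemB:                                    # learning from active behavior
    def act(self): return random.choice([0, 1])
    def update(self, observation, reward): pass   # e.g., an RL update

class SystemM:                                    # meta-control: picks the mode
    def choose(self):                             # stand-in for the paper's
        return random.choice(["observe", "act"])  # internally generated signals

def agent_loop(a, b, m, env, steps=1000):
    for _ in range(steps):
        if m.choose() == "observe":
            a.update(env.observe())               # passive mode: no actions taken
        else:
            obs, reward = env.step(b.act())
            b.update(obs, reward)                 # active mode: learn from consequences
</code></pre>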
<a href="https://github.com/plastic-labs/honcho" rel="nofollow">https://github.com/plastic-labs/honcho</a> has the idea of one sided observations for RAG.
If this were done well in a way that was productive for corporate work, I suspect the AI would engage in Machiavellian maneuvering and deception that would make typical sociopathic CEOs look like Mister Rogers in comparison. And I'm not sure our legal and social structures have the capacity to absorb that without very, very bad things happening.
LeCun has been talking about his JEPA models for a while.<p><a href="https://ai.meta.com/blog/yann-lecun-ai-model-i-jepa/" rel="nofollow">https://ai.meta.com/blog/yann-lecun-ai-model-i-jepa/</a>
The paper's critique of the 'data wall' and language-centrism is spot on. We’ve been treating AI training like an assembly line where the machine is passive, and then we wonder why it fails in non-stationary environments. It’s the ultimate 'padded room' architecture: the model is isolated from reality and relies on human-curated data to even function.<p>The proposed System M (Meta-control) is a nice theoretical fix, but the implementation is where the wheels usually come off. Integrating observation (A) and action (B) sounds great until the agent starts hallucinating its own feedback loops. Unless we can move away from this 'outsourced learning' where humans have to fix every domain mismatch, we're just building increasingly expensive parrots. I’m skeptical that 'bilevel optimization' is enough to bridge that gap; we may just be adding another layer of complexity to a fundamentally limited transformer architecture.
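To make the skepticism concrete, here's a toy, self-contained version of what I take 'bilevel optimization' over the mode mix to mean (my guess, not the paper's algorithm); 1-D quadratics stand in for the real objectives:
<pre><code>
# Inner loop: fit w on a weighted mix of an "observation" loss (w-1)^2 and
# an "action" loss (w+1)^2. Outer loop: adjust the mix alpha to reduce a
# held-out loss, via a finite-difference meta-gradient.
def loss_val(w): return (w - 0.5) ** 2            # held-out objective

def inner(w, alpha, steps=50, lr=0.1):
    for _ in range(steps):
        grad = alpha * 2 * (w - 1.0) + (1 - alpha) * 2 * (w + 1.0)
        w -= lr * grad                            # fixed point: w = 2*alpha - 1
    return w

alpha, w = 0.5, 0.0
for _ in range(100):
    w = inner(w, alpha)
    eps = 1e-3
    g = (loss_val(inner(w, alpha + eps)) - loss_val(inner(w, alpha - eps))) / (2 * eps)
    alpha = min(1.0, max(0.0, alpha - 0.05 * g))
print(alpha, w)                                   # converges near alpha=0.75, w=0.5
</code></pre>
Note that the outer loop only behaves here because loss_val is independent of the agent's own behavior; the moment the held-out signal is produced by the agent's actions, that meta-gradient can chase its own tail, which is exactly the hallucinated-feedback-loop worry above.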