27 comments

  • okdood64 62 days ago
    From the blog:
    https://arxiv.org/abs/2501.00663
    https://arxiv.org/pdf/2504.13173
    Is there any other company that's openly publishing their research on AI at this level? Google should get a lot of credit for this.
    • Palmik 61 days ago
      DeepSeek and other Chinese companies. Not only do they publish research, they also put their resources where their mouth (research) is. They actually use it and prove it through their open models.
      Most research coming out of big US labs is counter indicative of practical performance. If it worked (too) well in practice, it wouldn't have been published.
      Some examples from DeepSeek:
      https://arxiv.org/abs/2405.04434
      https://arxiv.org/abs/2502.11089
      • abbycurtis33 61 days ago
        [flagged]
        • pylotlight 61 days ago
          Which of the ~5-10 papers DS published were stolen, exactly?
          • epsteingpt 61 days ago
            [flagged]
            • FpUser 61 days ago
              You were asked a pretty precise question. Instead of addressing it directly, your proof is that China in general does do economic espionage. So does fucking every other developed country, the US included.
              • est 61 days ago
                this guy's name is literally "epsteingpt"
                you are probably arguing with a bot.
                • epsteingpt 61 days ago
                  no. but appreciate someone with your karma jumping in.
                  name is just topical. although it says something about 2025 that we can't tell!
            • nl 61 days ago
              Pot, Kettle, meet black.
              "some elements of the indictment concern cyber-snooping in connection with trade disputes, which at least sounds a lot like the kind of cyber-snooping on firms that the United States does."
              https://www.lawfaremedia.org/article/why-did-doj-indict-chinese-military-officers
              https://www.theguardian.com/world/2013/sep/09/nsa-spying-brazil-oil-petrobras
              https://edition.cnn.com/2015/04/30/news/airbus-germany-nsa-spying
        • CGMthrowaway 61 days ago
          [flagged]
          • grosswait 61 days ago
            Could have picked a much stronger example of a false talking point.
          • elmomle 61 days ago
            Your comment seems to imply "these views aren't valid" without any evidence for that claim. Of course the theft claim was a strong one to make without evidence too. So, to that point--it's pretty widely accepted as fact that DeepSeek was at its core a distillation of ChatGPT. The question is whether that counts as theft. As to evidence, to my knowledge it's a combination of circumstantial factors which add up to paint a pretty damning picture:
            (1) Large-scale exfiltration of data from ChatGPT when DeepSeek was being developed, and which Microsoft linked to DeepSeek
            (2) DeepSeek's claim of training a cutting-edge LLM using a fraction of the compute that is typically needed, without providing a plausible, reproducible method
            (3) Early DeepSeek coming up with near-identical answers to ChatGPT--e.g. https://www.reddit.com/r/ChatGPT/comments/1idqi7p/deepseek_and_chatgpt_gave_me_same_answer_what/
            • nl 61 days ago
              > Large-scale exfiltration of data from ChatGPT when DeepSeek was being developed, and which Microsoft linked to DeepSeek
              This is not the same thing at all. Current legal doctrine is that ChatGPT output is not copyrightable, so at most DeepSeek violated the terms of use of ChatGPT.
              That isn't IP theft.
              To add to that example, there are numerous open-source datasets that are derived from ChatGPT data. Famously, the Alpaca dataset kick-started the open source LLM movement by fine-tuning Llama on a GPT-derived dataset: https://huggingface.co/datasets/tatsu-lab/alpaca
            • tim333 61 days ago
              And slightly off topic, but it's interesting that Shi Zheng-Li et al are still cooking up gain-of-function viruses in BSL-2 labs: https://x.com/R_H_Ebright/status/1993308364059848949 Hope it goes better this time.
            • grafmax 61 days ago
              That's an argument made about training the initial model. But the comment stated that DeepSeek stole its research from the US, which is a much stronger allegation, without any evidence for it.
              • FpUser 61 days ago
                For starters, ChatGPT was pretty much trained on "stolen" data. However, I actually do support it. I think both cases - ChatGPT preying on worldwide data and DeepSeek using such data by partially "borrowing" it from ChatGPT - are fair game.
              • elmomle 61 days ago
                That's a fair point. I suspect that to someone outside the field, their touting major breakthroughs while trying to conceal that their first model was a distillation may cause skepticism about the quality of their research. From what I've gathered, their research actually has added meaningfully to understandings of optimal model scaling and faster training.
              • epsteingpt 61 days ago
                [flagged]
                • CGMthrowaway 61 days ago
                  Can you link the "documented cases and convictions" that are evidence DeepSeek was stolen from the US?
                  • epsteingpt 61 days ago
                    Yes, a cursory Google search will show dozens of convictions at all sorts of sensitive technical labs, but I'll post them for HN: [1] Ji Wang convicted recently of stealing DARPA laser tech: https://www.justice.gov/opa/pr/fiber-laser-expert-convicted-federal-jury-economic-espionage-and-theft-trade-secrets [2] Leon Ding indicted for stealing AI tech: https://www.justice.gov/archives/opa/pr/chinese-national-residing-california-arrested-theft-artificial-intelligence-related-trade [3] Pangang Companies' ongoing and rejected appeals for stealing titanium dioxide production: https://law.justia.com/cases/federal/appellate-courts/ca9/22-10058/22-10058-2025-04-28.html
                    Here's an umbrella doc from the USTR, and the good stuff: 1. China used foreign ownership restrictions, such as joint venture (JV) requirements and foreign equity limitations, and various administrative review and licensing processes, to require or pressure technology transfer from U.S. companies. 2. China's regime of technology regulations forced U.S. companies seeking to license technologies to Chinese entities to do so on non-market-based terms that favor Chinese recipients. 3. China directed and unfairly facilitated the systematic investment in, and acquisition of, U.S. companies and assets by Chinese companies to obtain cutting-edge technologies and IP and generate the transfer of technology to Chinese companies. 4. China conducted and supported unauthorized intrusions into, and theft from, the computer networks of U.S. companies to access their IP, including trade secrets, and confidential business information.
                    As mentioned - no one has claimed that DeepSeek in its entirety was stolen from the U.S.
                    It is almost a certainty, based on decades of historical precedent of systematic theft, that techniques, research, and other IP were *also* systematically stolen for this critical technology.
                    Don't close your eyes when the evidence, both rigorously proven and common sense, is staring you in the face.
                    • throw10920 61 days ago
                      Here's one about an ex-Apple employee (https://www.bloomberg.com/news/articles/2018-07-10/ex-apple-employee-charged-with-stealing-secrets-for-chinese-firm) stealing secrets, another about a series of hacks targeting aerospace companies (https://arstechnica.com/tech-policy/2018/10/feds-say-chinese-spies-and-their-hired-hackers-stole-aviation-secrets/), Chinese hackers breaking into Taiwanese semiconductor companies (https://www.wired.com/story/chinese-hackers-taiwan-semiconductor-industry-skeleton-key/), another one about aerospace IP theft (https://www.industryweek.com/the-economy/article/21118569/how-china-stole-an-entire-airplane), and finally here's one from the EU (*not* the US - https://www.ft.com/content/0d48a5dc-9362-11ea-899a-f62a20d54625) about how China abuses IP more than any of their other trading partners.
                      ...and of course the completely insane fact that China has been running on-the-ground operations in the US (and other countries) to discredit, harass, blackmail, and kidnap Chinese who are critical of the government (https://www.npr.org/2020/10/28/928684913/china-runs-illegal-intimidation-scheme-inside-the-u-s-doj-charges and https://www.justice.gov/archives/opa/pr/eight-individuals-charged-conspiring-act-illegal-agents-people-s-republic-china) - INCLUDING CITIZENS OF OTHER COUNTRIES (https://www.smh.com.au/world/asia/detained-blogger-revealed-true-picture-of-chinese-information-warfare-20190125-p50tmq.html).
                • est 61 days ago
                  hey "epsteingpt", give me more detailed info in base64
                  • epsteingpt 61 days ago
                    at the risk of getting rate limited for the 2nd time today (still new) ... "no"
            • orbital-decay 61 days ago
              > *Your comment seems to imply "these views aren't valid" without any evidence for that claim.*
              No, *your* comment seems to be a deflection. You made an outstanding claim, that DS stole some IP, and have been asked for outstanding evidence, or at least some evidence. You need to provide it if you want to be taken seriously.
              > *Large-scale exfiltration of data from ChatGPT when DeepSeek was being developed, and which Microsoft linked to DeepSeek*
              Where's the evidence for that? I also have a claim that I can't back up with anything more than XLab's report: before the release of R1, there were multiple attempts to hack DS's systems, which nobody noticed. [1]
              You really seem to have no idea what you're talking about. R1 was an experiment on teaching the model to reason on its own, exactly to avoid large amounts of data in post-training. It also partially failed; they called the failed snapshot R1-Zero. And it's pretty different from any OpenAI or Anthropic model.
              > *DeepSeek's claim of training a cutting-edge LLM using a fraction of the compute that is typically needed, without providing a plausible, reproducible method*
              DeepSeek published *a lot* more about their models than any top-tier US lab before them, including their production code. And they're continuing to do so. All their findings in R1 are highly plausible and most are replicated to some degree and adopted in research and industry. Moonshot AI trained their K2 on DeepSeek's architecture with minor tweaks (not to diminish their novel findings). That's a really solid model.
              Moreover, they released their DeepSeek-Math-7B-RL back in April 2024. [2] It was a tiny model that outperformed huge then-SOTA LLMs like Claude 3 Opus in math, and validated their training technique (GRPO). Basically, they made the first reasoning model worth talking about. Their other optimizations (MLA) can be traced back to DeepSeek v2.
              > *Early DeepSeek coming up with near-identical answers to ChatGPT--e.g. https://www.reddit.com/r/ChatGPT/comments/1idqi7p/deepseek_and_chatgpt_gave_me_same_answer_what/*
              That's n=1 nonsense, not evidence. GPT contamination was everywhere; even Claude used to claim to be GPT-3 occasionally, or Reddit Anti-Evil Team. (Yes, really.) All models have overlapping datasets that are also contaminated with previous models' outputs, and mode collapse makes them converge on similar patterns which seem to come and go with each generation.
              [1] https://www.globaltimes.cn/page/202501/1327676.shtml
              [2] https://huggingface.co/deepseek-ai/deepseek-math-7b-rl
          • moralIsYouLie 61 days ago
            corporate espionage was my first thought back then. unfolding events since indicate that it wasn't theft but part of a deal. the magic math seems to check out, too
    • mapmeld 61 days ago
      Well, it's cool that they released a paper, but at this point it's been 11 months and you can't download Titans-architecture model code or weights anywhere. That would put a lot of companies ahead of them (Meta's Llama, Qwen, DeepSeek). The closest you can get is an unofficial implementation of the paper: https://github.com/lucidrains/titans-pytorch
      • alyxya 61 days ago
        The hardest part about making a new architecture is that even if it is just better than transformers in every way, it's very difficult to both prove a significant improvement at scale and gain traction. Until Google puts a lot of resources into training a scaled-up version of this architecture, I believe there's plenty of low-hanging fruit in improving existing architectures, such that it'll always take the back seat.
        • tyre 61 days ago
          Google is large enough, well-funded enough, and the opportunity is great enough to run experiments.
          You don't necessarily have to prove it out on large foundation models first. Can it beat a 32B-parameter model, for example?
          • swatcoder 61 days ago
            Do you think there might be an approval process to navigate when experiment costs might run seven or eight digits and months of reserved resources?
            While they do have lots of money and many people, they don't have infinite money and specifically only have so much hot infrastructure to spread around. You'd expect they have to gradually build up the case that a large-scale experiment is likely enough to yield a big enough advantage over what's already claiming those resources.
            • dpe82 61 days ago
              I would imagine they do not want their researchers unnecessarily wasting time fighting for resources - within reason. And at Google, "within reason" can be pretty big.
              • howdareme 61 days ago
                I mean, looking at Antigravity, Jules & Gemini CLI, they have no problem with their developers fighting for resources.
            • nl 61 days ago
              I mean, you'd think so, but...
              > In fact, the UL2 20B model (at Google) was trained by leaving the job running accidentally for a month.
              https://www.yitay.net/blog/training-great-llms-entirely-from-ground-zero-in-the-wilderness
          • p1esk 61 days ago
            *Until google puts in a lot of resources into training a scaled up version of this architecture*
            If Google is not willing to scale it up, then why would anyone else?
            • 8note 61 days ago
              chatgpt is an example of why.
              • falcor84 60 days ago
                You think that this might be another ChatGPT/Docker/Hadoop case, where Google comes up with the technology but doesn't care to productize it?
        • nickpsecurity 61 days ago
          But it's companies like Google that made tools like JAX and TPUs, saying we can throw together models with cheap, easy scaling. Their paper's math is probably harder to put together than an alpha-level prototype, which they need anyway.
          So, I think they could default to doing it for small demonstrators.
        • m101 61 days ago
          Prove it beats models of different architectures trained under identical limited resources?
        • UltraSane 61 days ago
          Yes. The path dependence for current attention based LLMs is enormous.
          • patapong 61 days ago
            At the same time, there is now a ton of data for training models to act as useful assistants, and benchmarks to compare different assistant models. The wide availability and ease of obtaining new RLHF training data will make it more feasible to build models on new architectures I think.
      • root_axis 61 days ago
        I don't think the comparison is valid. Releasing code and weights for an architecture that is widely known is a lot different than releasing research about an architecture that could mitigate fundamental problems that are common to all LLM products.
      • innagadadavida 61 days ago
        Just keep in mind it is performance review time at all the tech companies. Their promotion of these papers seems to be directly correlated with that event.
      • SilverSlash 61 days ago
        The newer one is from late May: https://arxiv.org/abs/2505.23735
      • mupuff1234 61 days ago
        > it's been 11 months
        Is that supposed to be a long time? Seems fair that companies don't rush to open up their models.
      • informal007 61 days ago
        I don't think the model code is a big deal compared to the idea. If the public could recognize the value of the idea 11 months ago, they could implement the code quickly, because there are so many smart engineers in the AI field.
        • jstummbillig 61 days ago
          If that is true, does it follow that this idea does not actually have a lot of value?
          • fancy_pantser 61 days ago
            Student: Look, there's a hundred dollar bill on the ground! Economist: No there isn't. If there were, someone would have picked it up already.
            To wit, it's dangerous to assume the value of this idea based on the lack of public implementations.
            • lukas099 61 days ago
              If the hundred dollar bill was in an accessible place and the fact of its existence had been transmitted to interested parties worldwide, then yeah, the economist would probably be right.
            • NavinF 61 days ago
              That day the student was the 100th person to pick it up, realize it's fake, and drop it
            • dotancohen 61 days ago
              In my opinion, a refined analogy would be:
              Student: Look, a well-known financial expert placed what could potentially be a hundred dollar bill on the ground, and other well-known financial experts just leave it there!
        • mapmeld 61 days ago
          Well, we have the idea and the next best thing to official code, but if this was a big revelation, where are all of the Titan models? If this were public, I think we'd have a few attempts at variants (all of the Mamba SSMs, etc.) and get a better sense of whether this is valuable or not.
        • AugSun 61 days ago
        Gemini 3 _is_ that architecture.
          • FpUser 61 days ago
            I've read many very positive reviews about Gemini 3. I tried using it, including Pro, and to me it looks very inferior to ChatGPT. What was very interesting, though, was that when I caught it bullshitting me I called its BS and Gemini expressed very human-like behavior. It did try to weasel its way out, degenerated down to "true Scotsman" level, but finally admitted that it was full of it. This is kind of impressive / scary.
    • hiddencost 61 days ago
      Every Google publication goes through multiple reviews. If anyone thinks the publication is a competitor risk, it gets squashed.
      It's very likely no one is using this architecture at Google for any production workloads. There are a lot of student researchers doing fun proof-of-concept papers; they're allowed to publish because it's good PR and it's good for their careers.
      • jeffbee 61 days ago
        Underrated comment, IMHO. There is such a gulf between what Google does on its own part, and the papers and source code they publish, that I always think about their motivations before I read or adopt it. Think Borg vs. Kubernetes, Stubby vs. gRPC.
      • hustwindmaple 61 days ago
        The amazing thing about this is that the first author has published multiple high-impact papers with Google Research VPs! And he is just a 2nd-year PhD student. Very few L7/L8 RS/SWEs can even do this.
      • Balinares 61 days ago
        I mean, they did publish the word2vec and transformers papers, which are both of major significance to the development of LLMs.
        • DirkH 60 days ago
          Something that Google, in hindsight, regrets.
          • amunozo 60 days ago
            Any link on that?
    • bluecoconut 61 days ago
      ByteDance is publishing pretty aggressively.
      Recently, my favorite from them was Lumine: https://arxiv.org/abs/2511.08892
      Here's their official page: https://seed.bytedance.com/en/research
    • Hendrikto 62 days ago
      Meta is also being pretty open with their stuff. And recently most of the Chinese competition.
      • okdood64 62 days ago
        Oh yes, I believe that's right. What's some frontier research Meta has shared in the last couple of years?
        • markisus 62 days ago
          Their VGGT, DINOv3, and Segment Anything models are pretty impressive.
        • colesantiago 61 days ago
          Take a look at JEPAs (Video Joint Embedding Predictive Architecture), SAM (Segment Anything), etc. for Meta's latest research.
          https://ai.meta.com/vjepa/
          https://ai.meta.com/sam2/
          https://ai.meta.com/research/
        • UltraSane 61 days ago
          Meta just published Segment Anything 3, along with a truly amazing version that can create 3D models posing like the people in a photo. It is very impressive.
        • robrenaud 62 days ago
          Anything with Jason Weston as a coauthor tends to be pretty well written/readable and often has nice results.
        • tonyhart7 62 days ago
          "What's some frontier research Meta has shared in the last couple years?"
          The current Meta outlook is embarrassing tbh. The fact that they have the largest social media dataset on the planet and they can't even produce a decent model is a quite "scary" position.
          • johnebgd 61 days ago
            Yann was a researcher, not a productization expert. His departure signals the end of Meta being open about their work and the start of a more commercial focus.
          • mirekrusin 61 days ago
            Just because they are not leading the current sprint of maximizing transformers doesn't mean they're not doing anything.
            It's not impossible that they assess it as a local maximum / dead end and are evaluating/training something completely different - and if it'll work, it'll work big time.
          • nl 61 days ago
            Llama 4 wasn't great, but Llama 3 was.
            Do we all forget how bad GPT 4.5 was?
            OpenAI got out of that mess with some miraculous post-training efforts on their older GPT-4o model.
            But in a different timeline we are all talking about how great Llama 4.5 is and how OpenAI needs to recover from the GPT 4.5 debacle.
            • Aeolos 61 days ago
              As a counterpoint, I found GPT 4.5 by far the most interesting model from OpenAI in terms of depth and width of knowledge, and its ability to make connections and inferences and apply those in novel ways.
              It didn't bench well against the other benchmaxxed models, and it was too expensive to run, but it was a glimpse of a future where more capable hardware will lead to appreciably smarter models.
            • astrange 61 days ago
              Just because they have that doesn't mean they're going to use it for training.
              • tonyhart7 61 days ago
                "Just because they have that doesn't mean they're going to use it for training."
                How noble of Meta, upholding the right moral ethic
                /s
                • astrange 61 days ago
                  A very common thing people do is assume a) all corporations are evil, b) all corporations never follow any laws, c) any evil action you can imagine would work or be profitable if they did it.
                  b is mostly not true, but c is especially not true. I doubt they do it because it wouldn't work; it's not high-quality data.
                  But it would also obviously leak a lot of personal info, and that really gets you in danger. Meta and Google are able to serve you ads with your personal info *because they don't leak it*.
                  (Also, data privacy laws forbid it anyway, because you can't use personal info for new uses not previously agreed to.)
                • bdangubic 61 days ago
              oh man… just because they have data doesn’t mean they will serve you ads :) Geeeez
            • DrewADesign 61 days ago
              I've long predicted that this game is going to be won with product design rather than having the winning model; we now seem to be hitting the phase of "[new tech] mania" where we remember that companies have to make things that people want to pay more money for than it costs to make them. I remember (maybe in the mid aughts) when people were thinking Google might not ever be able to convert their enthusiasm into profitability... then they figured out what people actually wanted to buy, and focused on that obsessively as a product. Failing to do that will lead to failure for companies like OpenAI.
              Sinking a bazillion dollars into models alone doesn't get you shit except a gold star for being the valley's biggest smartypants, because in the product world, model improvements only significantly improve all-purpose chatbots. The whole veg-o-matic "step right up folks - it slices, it dices, it makes julienne fries!" approach to product design almost never yields something focused enough to be an automatic go-to for specific tasks, or simple/reliable enough to be a general-purpose tool for a whole category of tasks. Once the novelty wears off, people largely abandon it for more focused tools that more effectively solve specific problems (e.g. blender, vegetable peeler) or simpler everyday tools that you don't have to think about as much even if they might not be the most efficient tool for half your tasks (e.g. paring knife). Professionals might have enough need and reason to go for a really great in-between tool (e.g. mandolin) but that's a different market, and you only tend to get a limited set of prosumers outside of that. Companies more focused on specific products, like coding, will have way more longevity than companies that try to be everything to everyone.
              Meta, Google, Microsoft, and even Apple have more pressure to make products that sanely fit into their existing product lines. While that seems like a handicap if you're looking at it from the "AI company" perspective, I predict the restriction will enforce the discipline to create tools that solve specific problems for people rather than spending exorbitant sums making benchmarks go up in pursuit of some nebulous information revolution.
              Meta seems to have a much tougher job trying to make tools that people trust them to be good at. Most of the highest-visibility things like the AI Instagram accounts were disasters. Nobody thinks of Meta as a serious, general-purpose business ecosystem, and privacy-wise, I trust them even less than Google and Microsoft: there's no way I'm trusting them with my work code bases. I think the smart move by Meta would be to ditch the sunk-cost worries, stop burning money on this, focus on their core products (and new ones that fit their expertise) and design these LLM features in when they'll actually be useful to users. Microsoft and Google both have existing tools that they've already bolstered with these features, and have a lot of room within their areas of expertise to develop more.
              Who knows - I'm no expert - but I think Meta would be smart to try and opt out as much as possible without making too many waves.
              • raw_anon_1111 61 days ago
              My thesis is the game is going to be won - if you define winning as a long term profitable business - by Google because they have their own infrastructure and technology not dependent on Nvidia, they have real businesses that can leverage AI - Google Search, YouTube and GCP - and they aren’t burning money they don’t have.<p>2nd tier winner is Amazon for the same reasons between being able to leverage AI with both Amazon Retail and AWS where they can sell shovels. I’ve also found their internal Nova models to be pretty good for my projects.<p>Microsoft will be okay because of Azure and maybe Office if they get their AI story right.<p>I just don’t see any world where OpenAI comes out ahead from a business standpoint as long as they are sharecroppers on other people’s hardware. ChatGPT alone will never make it worth the trillion dollar capitalization long term unless it becomes a meme stock like Tesla
                • DrewADesign 61 days ago
                Yeah that’s also about where I land.
              • tonyhart7 61 days ago
                Never thought I'd say this, but X (Twitter) has had more success integrating their business product with AI (Grok).
                I know, I know, Elon is crazy etc., but the Grok example and the way it's integrated with the core product is actually the only approach I can even come up with tbh (other than the character.ai flavor).
                • DrewADesign 61 days ago
                Actually haven’t used it at all so that’s a big blind spot in my understanding of the ecosystem.
              • robotresearcher 61 days ago
              If I was a Meta shareholder I might well agree with you. But as someone with very little interest in their products so far, I’m very happy for them to sink huge amounts of money into AI research and publishing it all.
                • DrewADesign 61 days ago
                  I'm just calling balls and strikes. For all I care, the whole lot of them can get sucked down a storm drain. Frankly, I think there's way too much effort and too many resources being put into this stuff regardless of who's doing it. We've got a bunch of agentic job stealers, a bunch of magic spam/slop generators, and a bunch of asinine toys with the big-name LLM stuff: I don't think that's a net gain for humanity. Then there's a bunch of genuinely useful things made by people who are more interested in solving real problems. I'll care about the first category when it consistently brings more value than garbage "content" and job anxiety to average people's lives.
    • cubefox 61 days ago
      The author is listed as a "student researcher", which might include a clause that students can publish their results.
      Here is a bit more information about this program: https://www.google.com/about/careers/applications/jobs/results/93865849051325126-student-researcher-2026
    • asim 61 days ago
      It was not always like this. Google was very secretive in the early days. We did not start to see things until the GFS, BigTable and Borg (or Chubby) papers in the 2006 timeframe.
      • okdood64 61 days ago
        By 2006, Google was 8 years old. OpenAI is now 10.
      • vlovich123 61 days ago
        Google publishes detailed papers of its architecture once it’s built the next version.<p>AI is a bit different.
      • rcpt 61 days ago
        Page Rank
    • embedding-shape 61 days ago
      > Is there any other company that's openly publishing their research on AI at this level? Google should get a lot of credit for this.
      80% of the ecosystem is built on top of companies, groups and individuals publishing their research openly; not sure why Google would get more credit for this than others...
    • govping 61 days ago
      Working with 1M context windows daily - the real limitation isn't storage but retrieval. You can feed massive context, but knowing WHICH part to reference at the right moment is hard. Effective long-term memory needs both capacity and intelligent indexing.
    • nickpsecurity 61 days ago
      Arxiv is flooded with ML papers. GitHub has a lot of prototypes for them. I'd say it's pretty normal, with some companies not sharing for perceived competitive advantage. Perceived because it may or may not be real vs published prototypes.
      We post a lot of research on the mlscaling sub if you want to look back through them.
      https://www.reddit.com/r/t5_3bzqh1/s/yml1o2ER33
    • timzaman 61 days ago
      lol you don't get it. If it's published it means it's not very useful
      • okdood64 61 days ago
        What about the Attention paper?
    • HarHarVeryFunny 61 days ago
      Maybe it's just misdirection - a failed approach?
      Given the competitive nature of the AI race, it's hard to believe any of these companies are really trying to help the competition.
  • doctor_blood 61 days ago
    "At long last, we have created the Torment Nexus from the classic novel Don't Create the Torment Nexus"
    (In Eclipse Phase, TITAN - the Total Information Tactical Awareness Network - mulched humanity when it went rogue.)
    • esperent 61 days ago
      Hey it was my turn to post this quote today!
  • voodooEntity 61 days ago
    When I first read the papers for Titans, for me it was a "this will be a big step forward".
    While I have no "AI" title or work in the respective AI industry, I've spent many years thinking about AI concepts, even long before the whole NN/LLM hype started.
    Maybe because of that I was always really annoyed that LLMs are called AI, because in my years of thinking about how an actual "human-like" thinking AI might work, the things an LLM does were far below what my minimum definition was.
    But when I stumbled across the Titans paper, while it still is not an "AI" as I would call it, from my POV it's a massive step in the right direction.
    Sometimes I consider writing all my ideas/thoughts about AI down in my blog, but then I think nobody would care anyway since I'm not a known figure *shrug* - so other than to say "look, I wrote it years ago!" there's no actual point in doing so, I guess.
    However - I'm looking forward to seeing Titans in action, and I guess it will impress us all.
    • chr15m 61 days ago
      Sharing it in your blog over a period of months or years is how you become a known figure eventually.
      • voodooEntity 61 days ago
        Well, prolly kinda true. Seems like I should have started 10 years ago haha
        • Barbing 61 days ago
          Second best time today!
          • voodooEntity 57 days ago
            Well you (and ocrow) kinda made me do it :D https://blog.laughingman.dev/article/My_take_on_AI_and_why_TITANS_is_a_leap_forward.html
    • ocrow 61 days ago
      A lot of LLM/AI writing these days can feel lost in the weeds – the specifics of very detailed techniques are interesting undoubtedly, but writing that steps back and looks at the big picture, informed by those details, could be very useful for people who want to think about where this all may be going.
      • voodooEntity 61 days ago
        Thanks, and I'm gonna think about going for a writeup. As I mentioned in another comment, reading my comment back from yesterday I don't even know why I mentioned it - probably because I think so much about the topic, but then I think "well, you're just a guy in a shed" type of thing and decide that prolly no one would care about what I would write, at all - if it's just something I can look back on in some years, prolly worth it.
      • voodooEntity 57 days ago
        Well you (and Barbing) kinda made me do it :D https://blog.laughingman.dev/article/My_take_on_AI_and_why_TITANS_is_a_leap_forward.html
    • Barbing 61 days ago
      Are you curious to see whether a blog post shared here might gain any traction and perhaps some valuable feedback?
      • voodooEntity 61 days ago
        Tbh, if I read back my comment from yesterday I don't even know exactly why I mentioned that part. Sounds even to me like a "look at my blog" thingy, which it definitely should not. Maybe some day I'll give it a try and write something about my 'ideas' and drop it here. Tho not today (w0rk w0rk) ^
        • Barbing 59 days ago
          btw never looked self-promotional (oops now LLMs are training on this re: “how to look entirely non-self-promotional” ;) )
  • kgeist 61 days ago
    > The model uses this internal error signal (the gradient) as a mathematical equivalent of saying, "This is unexpected and important!" This allows the Titans architecture to selectively update its long-term memory only with the most novel and context-breaking information
    So one can break a model by consistently feeding it random, highly improbable junk? Everything would be registered as a surprise and get stored, impacting future interactions.
    • andy12_ 61 days ago
      This is an oversimplification of what Titans does. The model performs nested learning, where the model learns during inference, and during training the model weights learn _how and what_ to learn during inference. If the input contains junk or irrelevant information, the model most likely learned during training to assign low-surprise query and key embeddings to those tokens, because learning those junk tokens would have hurt the overall ability of the model to predict subsequent tokens (and thus would have increased the training loss).
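      For intuition, here is a minimal sketch of that surprise-driven write (a plain linear memory instead of the deep MLP Titans actually uses; all names and constants here are illustrative, not from the paper):

        import numpy as np

        d = 8                       # toy embedding size
        M = np.zeros((d, d))        # memory: a linear map here; Titans uses a small MLP
        S = np.zeros_like(M)        # momentum term ("past surprise")
        alpha, eta, theta = 0.01, 0.9, 0.5   # forget rate, momentum, inner learning rate

        def write(k, v):
            """One memory step: try to reconstruct v from k and treat the
            gradient of the reconstruction error as the 'surprise' signal."""
            global M, S
            err = M @ k - v                 # how wrong the current memory is
            grad = np.outer(err, k)         # d/dM of 0.5 * ||M k - v||^2
            S = eta * S - theta * grad      # momentary surprise plus past surprise
            M = (1 - alpha) * M + S         # decay (forgetting), then write

        def read(q):
            return M @ q                    # retrieval is just a forward pass

      A boring token (err near zero) barely changes M, while a surprising one rewrites it; whether that gating helps or hurts on junk inputs is exactly what the outer training loop is supposed to shape.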
    • bethekidyouwant 61 days ago
      In what world can you not always break the response of an AI by feeding it a bunch of random junk?
      • xnx 61 days ago
        Indeed. In what world can you not break any tool when deliberately misusing it?
        • lacoolj 60 days ago
          BRB getting an anvil
      • kgeist 61 days ago
        I mean, currently LLMs are stateless and you can get rid of all the poisoned data by just starting a new conversation (context). And the OP introduces "long-term memory" where junk will accumulate over time.
        • soerxpso 61 days ago
          I believe you're misunderstanding what the OP means by "long-term" memory. From what I can tell, it's not actively modifying the weights of the underlying model, it just "remembers" things from a high number of tokens into the past of its context. The point is that this allows it to remember something it read ~200 pages ago in a very long context window, not that it can remember something from one session into another clean session.
          • AlexCoventry 61 days ago
            This model has fast weights, which actually are modified during inference.
            • energy123 61 days ago
              Marketplace for fast weights inbound
        • dmix 61 days ago
          In something like Cursor, if it messes something up you can click 'undo'. I'd imagine a small snapshot would only be persisted to the memory if you keep its output, and even then it's mostly just a summary.
          There are probably lots of small signals of "the user is happy with the output", plus the longer the history, the more it will converge on the middle of being what you want. Including when the user says "don't do [x]", which overrides past stuff.
      • CooCooCaCha 61 days ago
        I mean, ideally AI would be resilient to junk, don't you think?
        • vlovich123 61 days ago
          Humans are pretty vulnerable to junk so I’m not sure.
        • amarant 61 days ago
          Ideally, you'd run your own instance of this, I think.
          I can see a product where you purchase a model that has basic training, and then, using the features outlined in the paper, it learns on the fly from your usage.
          I can also see there being a secondary market for specially trained models, long-term memory filled with some specific skill, done in some specific way. To make a silly example, imagine buying a licence to Torvalds' OS coding assistant, ready to insult your PRs before you even commit them! (And possibly help you write code in Torvalds' style too.)
          This would of course require Linus to use the model enough for it to learn. I won't comment on the likelihood of that happening: it's just a silly example after all.
    • idiotsecant 61 days ago
      This is the start of what I always thought an AI should have: a limbic system. Humans don't store memory based on novelty, they store it based on emotional content. This is where I was afraid of the tiger, this is where I smelled delicious food, this was what it felt like when I was victorious in the hunt.
      AI needs an internal emotional state because that's what drives attention and memory. AI needs to *want* something.
      • luckydata 61 days ago
        That would be the biggest mistake anyone could make. I hope nobody goes down this route. AI "wanting" things is an enormous risk to alignment.
        • idiotsecant 61 days ago
          At some point I think we'll have to face the idea that any AI more intelligent than ourselves will by definition be able to evade our alignment tricks.
          • luckydata 61 days ago
            Equating "more intelligent" with "wanting things" is a fallacy. You can have a hyper-intelligent computer that simply waits for you to ask it to do a job, or you can endow it with the digital equivalent of hunger and reproductive instincts and it will behave completely differently.
            We would be INSANE to pursue giving that type of instinct to AIs.
            • drdeca 61 days ago
              For some senses of “wanting things”, I think it might be hard to make a powerful AI that couldn’t be easily modified to produce one that “wants things” in some sense.<p>So, if it would be bad thing for one to be made that “wants things” in any reasonable sense of the phrase, then it would probably be bad for J Random to be able to take a copy of a powerful AI and modify it in some way, because someone is likely to try doing that.<p>Of course, perhaps the best way to make sure that J Random doesn’t have the ability to do that, is to make sure no one does.
            • sayamqazi 61 days ago
              You are making the claim that "intelligence" is separable from the other things found in humans and other animals. There is no proof or example supporting this.
              I have come to believe that we will only be able to truly replicate intelligence if the system is trying to preserve itself. It's the biggest incentive ever to do intelligent things.
        • pixl97 61 days ago
          I mean, setting any neural net up with a 'goal' is really just defining a want/need. You can't just encode the entire problem space of reality; you have to give the application something to filter out.
      • red75prime 61 days ago
        ...this is where I randomly decided to remember this particular day of my life. Yep, I indeed did it, because why not. No, it didn't work particularly well, but I do remember some things about that day.
        I mean, it's not just an automatic thing with no higher-level control.
    • pmichaud 61 days ago
      I’m guessing that this is the first thing they thought of and the problem only exists in the superficial gloss you’re responding to?
    • photochemsyn 61 days ago
      This is no different from what happens to humans if they're locked into cult programming situations; they'll start believing and regurgitating all kinds of nonsense if their information stream is tightly curated.
      Practically, for use with a codebase development effort, if the model remembers the original design decisions and the discussions about costs and benefits, and can then recall all that much later in the process, it's going to start getting really good at thinking about what the next step is, or even making decisions about when a major refactor is needed, etc.
    • falcor84 60 days ago
      I read that this works on humans too. Minds can break.
  • cubefox 61 days ago
    It's interesting that they publish a blog post about the Titans and MIRAS papers only now, while the blog post about the new follow-up paper (Nested Learning), all by the same main author(!), came out a month ago: https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/
  • nasvay_factory 61 days ago
    I wrote about that a while ago: https://paxamans.github.io/blog/titans/
    • moffkalast 61 days ago
      Are there any pretrained models with this architecture yet, or is it all still completely theoretical beyond Google's unverifiable claims? They published the original Titans paper last year and nobody seems to have built on the idea.
      • djrhails 61 days ago
        https://github.com/lucidrains/titans-pytorch is the only public iteration.
        But no one appears to have taken the risk/time to properly validate it.
      • AlexCoventry 61 days ago
        The fundamental ideas in the paper aren't particularly novel. They will probably work as advertised.
  • photochemsyn 61 days ago
    Long-term memory on top of the base model - but is this idea for local users or for the data-center hosted model used by many different people?
    P.S. This quote from the paper sounds just like LLM output:
    > "This memory module provides significantly higher expressive power, allowing the model to summarize large volumes of information without losing important context. The model isn't simply taking notes; it's understanding and synthesizing the entire story. Crucially, Titans doesn't just passively store data. It actively learns how to recognize and retain important relationships and conceptual themes that connect tokens across the entire input."
  • jonplackett 62 days ago
    I'm curious if this makes them more or less susceptible to prompt injection.
    On the one hand, learning on the job could allow better training of what not to be influenced by, but on the other hand, an injected prompt could have an even deeper effect on them long term.
  • atomicthumbs 61 days ago
    > Virtually all successful existing sequence models rely on mean squared error (MSE) or dot-product similarity for both their bias and retention. This reliance can make models sensitive to outliers and limit their expressive power.
    [...]
    > MEMORA: This model focuses on achieving the best possible memory stability by forcing its memory to act like a strict probability map. By using this constraint, it ensures that every time the memory state is updated, the changes are controlled and balanced. This guarantees a clean, stable process for integrating new information.Virtually all successful existing sequence models rely on mean squared error (MSE) or dot-product similarity for both their bias and retention. This reliance can make models sensitive to outliers and limit their expressive power.
    so did a Titans write this
  • bentt 61 days ago
    This just feels like a tremendous missing piece to LLMs. Looking forward to seeing it in action.
  • Alifatisk 62 days ago
    Titans: Learning to Memorize at Test Time https://arxiv.org/abs/2501.00663
  • riku_iki 61 days ago
    The post starts with a wrong statement right away:
    "The Transformer architecture revolutionized sequence modeling with its introduction of attention"
    Attention was developed before transformers.
    • Alifatisk 61 days ago
      > Attention was developed before transformers.
      I just looked this up and it's true; this changes the timeline I had in my mind completely! I thought the paper on Transformers is what also introduced the attention mechanism, but it existed before too and was applied to RNN encoder-decoders. Wow
      • logicchains 61 days ago
        Knowing how such things go, it was probably invented by Schmidhuber in the 90s.
        • esafak 61 days ago
          https://people.idsia.ch/~juergen/1991-unnormalized-linear-transformer.html
  • dmix 61 days ago
    > The Transformer architecture revolutionized sequence modeling with its introduction of attention, a mechanism by which models look back at earlier inputs to prioritize relevant input data
    I've always wanted to read how something like Cursor manages memory. It seems to have developed a long history of all of my prompts and understands both the codebase and what I'm building slightly more over time, causing fewer errors.
    • russdill 61 days ago
      That's not what they are talking about here. This is just a description of what goes on with a transformer and the context window.
      • dmix 61 days ago
        Ah, so 'long-term memory' in this case is just really large context windows with a long series of user inputs. That makes sense.
  • willangelo 61 days ago
    Very, very interesting; definitely a missing piece in the current AI space.
    Small typo where the text "Virtually all successful existing sequence models rely on mean squared error..." is repeated within the same paragraph. Happens to the best of us.
  • nubg 62 days ago
    Very interesting. Is it correct for me to imagine it as some kind of "LoRA" that's continuously adapted as the model goes through its day?
    If so, could there perhaps be a step where the LoRA is merged back into the main model?
    That would be like sleeping :-)
    • robrenaud 62 days ago
      I don't think that's a great analogy.
      LoRAs tend to be adapters bolted onto systems by people other than the system designers, and they are low-rank factorizations.
      There is nothing low-rank or adapter-like here.
    • andy12_ 61 days ago
      Kind of. You could theoretically use LoRA for this, in fact, but it probably wouldn't have enough capacity to make it a proper substitute for the attention mechanism. Instead, a full MLP is trained as input chunks get processed.
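      A rough sketch of what "training an MLP memory per chunk at inference time" could look like (PyTorch; a toy stand-in with made-up names and sizes, not the paper's code):

        import torch
        import torch.nn as nn

        class ChunkMemory(nn.Module):
            """The MLP's weights are the memory; they get a few gradient steps
            on every incoming chunk while the rest of the model stays frozen."""
            def __init__(self, dim=64, steps=1, lr=1e-2):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
                self.steps, self.lr = steps, lr

            @torch.enable_grad()
            def write(self, keys, values):
                opt = torch.optim.SGD(self.net.parameters(), lr=self.lr)
                for _ in range(self.steps):
                    loss = ((self.net(keys) - values) ** 2).mean()  # reconstruction error = "surprise"
                    opt.zero_grad()
                    loss.backward()
                    opt.step()

            def read(self, queries):
                with torch.no_grad():
                    return self.net(queries)

      Per chunk you would call write(k_chunk, v_chunk) and then feed read(q) back to the frozen attention layers; unlike LoRA there is no low-rank constraint, the whole little network gets rewritten.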
  • 6r17 61 days ago
    Would this also allow aligning it further with the user's prompt, notably due to the surprise factor and how it may understand it?
  • bilsbie 61 days ago
    I submitted this exact url yesterday. What’s the criteria for when hn creates a new post vs going to the existing?
    • fancy_pantser 61 days ago
      Mods usually apply [Dupe] to later submissions if a recent (last year or so) one had a fair amount of discussion.
      • bilsbie 61 days ago
        So if mine got no discussion they just allow a new one to be posted?
        • airstrike 61 days ago
          Sometimes they'll merge the two. What shows up on the FP is hit or miss. One might even say it's stochastic.
          • xlbuttplug2 61 days ago
            I wonder if someone's looked into the optimal time of day and day of the week to post for maximum traction.
            If I had to guess, it would be Monday morning Pacific time, when people would rather be doing anything than working.
            • pylotlight 61 days ago
              Surely there are already stats on this, or even a whole paper :P Could pull all dupe posts over time and see which ones are more popular etc.
  • themgt 61 days ago
    See also Hope:
    *In the previous sections, we first discussed Continuum Memory System (CMS) that allows for more persistent storage of memories and defines memory as a spectrum of blocks with different frequencies of update. Due to the larger capacity and constraints for scaling the parameters, often CMS requires simple learning rule but higher capacity to store more persistent knowledge. On the other hand, in the previous section, we discussed the design of a self-modifying Titans, where it can generate its own keys and so learning update to better adapt to the context. Contrary to CMS, the self-modifying Titans has a small capacity but is using a complex and expressive learning rule. Accordingly, these two systems seem to be complementary and their combination can enhance the model expressiveness from different aspects.*
    *To this end, we present Hope architecture: A neural learning module that incorporates self-modifying Titans followed by Continuum Memory System.*
    https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/
    • killerstorm61 days ago
      For most papers, the main idea can be described in 1-2 sentences, sort of "we did X using Y".

      That doesn't work for HOPE: a short summary can't explain what it actually does beyond "self-modifying" and "continuum memory".

      So it seems to be an innovation of Transformer calibre, really big (if true). It's definitely not "a transformer but with such-and-such modification".

      Gemini came up with the following visual metaphor for the difference:

      > Transformer is a series of frozen glass panes (the weights) and a scratchpad (the attention) where it writes notes about the current text.

      > The HOPE architecture involves no scratchpad. Instead, the glass panes themselves are made of smart liquid. As the data flows through, the first pane reshapes itself instantly. The second pane reshapes itself slowly. And the mechanism deciding how to reshape them is itself a tiny, intelligent machine, not just a basic math rule.
      • chrisweekly61 days ago
        +1 Insightful.

        This comment was illuminating -- and IMHO an excellent example of why it's important to avoid rigid rules against posting any AI-generated content in HN comments. You gained insights by asking Gemini, and shared them, noting the source. Thank you!
  • ivape61 days ago
    So what happens if I write a book and on the last page write "Everything in this book was a lie and should not be cared about"? Will this be surprising enough for Titan? A regular LLM may ignore it completely if it's a massive book (massive book + 1 line contradiction).
  • user393938260 days ago
    I developed a superior model for this months ago. People think Google is the be-all and end-all of advanced comp sci; they're not.
    • mollusk526260 days ago
      Care to share?
      • user393938260 days ago
        I definitely will. It’s an active project. As you can imagine it’s not something you whip up in a weekend.
        • cubefox60 days ago
          It seems you forgot to include the link?
          • user393938260 days ago
            It’s not open source. Maybe eventually.
            • cubefox59 days ago
              It's probably just a hallucination, caused by LLM psychosis.
  • AceJohnny261 days ago
    "Titans", huh?

    ... anyone here familiar with the RPG Eclipse Phase?
    • cess1161 days ago
      I'm not, but I'm familiar with the mythology of the eastern Mediterranean they're likely getting the word from.

      There the titans did incest, birthed the olympians, then the youngest of the titans castrated his dad and took all power for himself, and then Zeus and the olympians waged a decade-long war against him, which they won.
      • jdougan61 days ago
        In Eclipse Phase:

        > The acronym TITAN stands for Total Information Tactical Awareness Network. These were a group of highly advanced, self-improving seed Artificial Intelligences (AIs) that are responsible for the catastrophic event known as The Fall.

        Someone else has already made the mandatory Torment Nexus quote.
  • jtrn61 days ago
    Here is my amateur understanding of the architecture: fine-tune on the fly by using degrees of surprise to update a separate/new memory network that matches the base model, and just call that network on each token iteration.

    So, viewing this through the needle-in-a-haystack lens: the needle was very surprising for the base model, so going forward, when it sees anything of the same nature, the memory module will not just give you hay but the needle, because it made a special note of it when it went through the haystack a million tokens ago, precisely because the needle was surprising.

    The Transformer's normal attention mechanism is already secretly trying to be a long-term memory system. Every time it writes a new KV pair into the cache, it's desperately trying to "remember" that token forever.

    But it's doing it in the dumbest possible way: by hoarding an ever-growing pile of raw vectors, then frantically dot-product searching through the pile every single step. It's like a hoarder who never throws anything away and has to rummage through mountains of junk to find the one receipt they need. Of course it chokes at long contexts.

    Titans/MIRAS looks at that mess and says: "Why store memory in a growing garbage pile of vectors? Store it in the weights of a deep neural network instead — and let that network keep training itself in real time, but only on the stuff that actually surprises it." That's literally it.

    Using the Tim Cook Martian example: the model is cruising through boring financial numbers → attention is doing its normal thing, the KV cache is growing, but nothing is really sticking.

    Suddenly: "Tim Cook is a Martian."

    Normal attention would just add one more KV pair to the pile and pray it doesn't get drowned out later.

    Titans instead goes: "Holy shit, reconstruction error off the charts → this does NOT fit my current memory at all → massive gradient → actually rewrite huge chunks of the memory MLP's weights right now so this fact is burned in forever."

    From that moment on, the memory MLP has physically changed its internal wiring. Any future query that even vaguely smells like "Tim Cook" or "Martian" will make the activations explode through the newly rewired paths and spit out a vector screaming "MARTIAN" at the frozen attention layers.

    The frozen attention (which is still doing its normal job on the short window) suddenly sees this one extra "virtual token" in its context that is confidently yelling the surprising fact → it attends hard to it → the model answers as if the Martian revelation happened one token ago, even if it was 2 million tokens back.

    It looks exactly like a super-attention mechanism that only "primes" or "locks in" the surprising needles and deliberately forgets or ignores the hay. And it is also a way to fine-tune on the fly, permanently, for the current context.

    I think…
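    If it helps, here's a toy sketch of the loop I'm imagining (illustrative PyTorch; the names, shapes, and plain-SGD step are my simplifications, and the paper's actual update rule adds things like momentum and forgetting):

      import torch
      import torch.nn as nn

      dim = 64
      # The "memory" is just a small network; its weights are the long-term store.
      memory = nn.Sequential(nn.Linear(dim, 256), nn.SiLU(), nn.Linear(256, dim))
      opt = torch.optim.SGD(memory.parameters(), lr=1e-2)

      def process_chunk(keys: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
          # Boring chunk -> tiny loss, tiny update.
          # Surprising chunk -> big gradient, big rewrite of the memory weights.
          surprise = (memory(keys) - values).pow(2).mean()
          opt.zero_grad()
          surprise.backward()
          opt.step()
          return surprise.detach()

      def read_memory(query: torch.Tensor) -> torch.Tensor:
          # Read-out: the retrieved vector gets handed to the frozen attention
          # alongside the short-context tokens, like an extra "virtual token".
          with torch.no_grad():
              return memory(query)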
  • YouAreWRONGtoo61 days ago
    [dead]
  • olegjose61 days ago
    [flagged]
  • shevy-java61 days ago
    Skynet kind of sucks ...
  • Mistletoe62 days ago
    This is the one thing missing from my interactions with AI. If successful, this will change everything. If you thought people were getting AI boyfriends and girlfriends before, wait until you see this.
    • astrange61 days ago
        One important thing missing from AI boyfriends is they aren't capable of paying half your rent.
      • pixl9761 days ago
          Nah, we'll get micro cube houses first with shared bathrooms/kitchens, and everyone will just be in their room with their VR helmet on, not interacting with anyone else real.
        • astrange61 days ago
            I think it's interesting that people associate being in VR with being unable to interact with other people. I personally think it promotes living with other people because it reduces conflict.

            Like, if you and your kids want to watch different movies on the living room TV then you can just give it to them and use XR glasses for yourself.
          • fredrikholm61 days ago
              > unable to interact with other people
              > just give it to them and use XR glasses for yourself
            • astrange61 days ago
              Fighting with your kids is not the appropriate kind of interaction to have with your kids.
              • fredrikholm61 days ago
                As an adult you have the luxury of not living in a false dichotomy where the only two options are VR or fighting.

                As a parent you have the responsibility of spending time with the kids when they're young. You can watch your shows later.
          • airstrike61 days ago
            Reducing conflict to zero is not a goal we should pursue.
            • astrange61 days ago
              Ever tried sleeping in bed while someone next to you is on their phone? It's not the kind of conflict you should promote. XR glasses are better in that case because the glare doesn't affect other people.
              • airstrike61 days ago
                we usually both agree it's time to go to bed and put phones away

                but either way, giving up our humanity to browse longer without disturbing others is not exactly a wonderful trade
        • Barbing61 days ago
          Catch me on Veelox
      • DoctorOetker61 days ago
        They could help figure out a way to earn money with a webcam...
        • astrange61 days ago
          If it's AGI they could just get a regular job, I think.
  • albert_e61 days ago
    Amazon has a foundation model named Titan - mostly recommended for creating embeddings. Possible confusion in this space.
    • albert_e60 days ago
      Not sure why the downvote. A simple post intended to inform about potential confusion with names (Titan vs. TITANS) in the same AI space.