14 comments

  • gertlabs 2 minutes ago
    Neither intelligence nor context is what really differentiates the most successful model for programming (Claude Opus 4.6) from slightly &#x27;smarter&#x27; competitors (Codex 5.3, Gemini 3.1 Pro).<p>It&#x27;s tool use and personality. If models stopped advancing today, we could still reach effective AGI with years of refining harnesses. There is still incredible untapped potential there.<p>I maintain a benchmark at <a href="https:&#x2F;&#x2F;gertlabs.com" rel="nofollow">https:&#x2F;&#x2F;gertlabs.com</a> that pits models against each other in competitive, open-ended games. It&#x27;s harder to game the benchmark because there&#x27;s no correct answer (at least none that any of the models have gotten remotely close to) and it requires anticipating other players&#x27; behavior.<p>One thing I&#x27;ve found is that Codex and Gemini models tend to perform best at one-shotting problems, but when given a harness and tools to iterate towards a solution, Anthropic models continue improving, whereas Codex and Gemini struggle to use tools they weren&#x27;t trained on or to take the initiative to follow the high-level objectives.
  • jfalcon 1 hour ago
    &gt;someone raised the question of “what would be the role of humans in an AI-first society”.<p>Norbert Wiener, considered to be the father of Cybernetics, wrote a book back in the 1950s entitled &quot;The Human Use of Human Beings&quot; that brings up these questions in the early days of digital electronics and control systems. In it, he brings up things like:<p>- &#x27;Robots enslaving humans for doing jobs better suited to robots due to a lack of humans in the feedback loop which leads to fascist machines.&#x27;<p>- &#x27;An economy without human interaction could lead to entropic decay as machines lack biological drive for anti-entropic organization.&#x27;<p>- &#x27;Automation will lead to immediate devaluation of human labor that is routine. Society needs to decouple a person&#x27;s &quot;worth&quot; from their &quot;utility as a tool&quot;.&#x27;<p>The human purpose is not to compete but to safeguard the teleology (purpose) of the system.
    • WarmWash 46 minutes ago
      &gt;- &#x27;Automation will lead to immediate devaluation of human labor that is routine. Society needs to decouple a person&#x27;s &quot;worth&quot; from their &quot;utility as a tool&quot;.&#x27;<p>I have this vision that in the absence of the ability for people to form social hierarchies on the back of their economic value to society, there will be this AI-fueled class hierarchy of people&#x27;s general social ability. So rather than money determining your neighborhood, your ability to not be violent or crazy does.
      • energy123 8 minutes ago
        If we have post scarcity due to AI, everything becomes so uncertain. Why would we still have violent and crazy people? Surely the ASI could figure it out and fix whatever is going on in their brains. It&#x27;s so fuzzy after that event horizon I have no confidence in any predictions.
      • erikerikson 36 minutes ago
        This seems to suggest a single dimensional evaluation. The complexity of social compatibility is high and the potential capacity to evaluate could also be greater.
    • 9wzYQbTYsAIc 1 hour ago
      Seems like a good time to enshrine human rights and the social safety net by ratifying the ICESCR (<a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;International_Covenant_on_Economic,_Social_and_Cultural_Rights" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;International_Covenant_on_Econ...</a>) and giving human rights the teeth they need.<p>I used Anthropic to analyze the situation, it did halfway decent:<p><a href="https:&#x2F;&#x2F;unratified.org&#x2F;why&#x2F;" rel="nofollow">https:&#x2F;&#x2F;unratified.org&#x2F;why&#x2F;</a><p><a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=47263664">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=47263664</a>
  • 7777777phil 4 days ago
    API prices dropped 97% in two years so the model layer is already a commodity. The question is which context layer actually sticks. The OpenClaw example in the article (400K lines to 4K) is a nice proof point for what happens when context replaces code.<p>I&#x27;ve been arguing for some time now that it&#x27;s the &quot;organizational world model,&quot; the accumulated process knowledge unique to each company that&#x27;s genuinely hard to replicate. I did a full &quot;report&quot; about the six-layer decomposition here: <a href="https:&#x2F;&#x2F;philippdubach.com&#x2F;posts&#x2F;dont-go-monolithic-the-agent-stack-is-stratifying&#x2F;" rel="nofollow">https:&#x2F;&#x2F;philippdubach.com&#x2F;posts&#x2F;dont-go-monolithic-the-agent...</a>
    • steveBK123 56 minutes ago
      The way many corporates are using the models nearly interchangeably as relative quality&#x2F;value changes release to release, AND the API price drops do make me question what the model moat even is.<p>If LLMs are going to make intelligence a commodity in some sense, the question will be where the value ends up accruing. Picks&#x2F;shovels companies and all the end-user products being delivered? Mainframes&#x27; value didn&#x27;t primarily accrue to DEC. PCs&#x27; value didn&#x27;t really accrue to IBM. The internet&#x27;s value didn&#x27;t accrue to Netscape. Mobile&#x27;s value didn&#x27;t only accrue to Apple.<p>One reminder that new efficiency &#x2F; greatly lowered costs sometimes doesn&#x27;t replace work (or at least not 1-1) but simply makes things possible that were never economical. For example, you hear about AI agents that will basically behave like a personal assistant. 99% of the rich world cannot afford a human personal assistant today, but I guess if it was a service as part of their Apple Intelligence &#x2F; Google something &#x2F; Office365 subscription they&#x27;d use it.<p>We seem to be continually creating new types of jobs. Only a few generations ago, 75% of people worked on farms. Farm jobs still exist; you just don&#x27;t need so many people.<p>The type of work my father and grandfather did still exists. My father&#x27;s job didn&#x27;t really exist in his father&#x27;s time. The work I do did not exist as an option during their careers. The next generation will be doing some other type of work for some other type of company that hasn&#x27;t been imagined yet.
    • energy123 7 minutes ago
      It&#x27;s not a commodity due to the simple observation that revenue run rates of frontier labs are growing exponentially and gross margins are still fine. It&#x27;s easy to just say it is but the narrative violation keeps occurring in reality.
    • apsurd 1 hour ago
      From your link: &gt; Closing that gap, building systems that capture and encode process knowledge rather than just decision records, is the highest-value problem in enterprise AI right now.<p>I buy this. What exactly is the export artifact that encodes this built-up context? Is it the entire LLM conversation log? My casual understanding of MCP is service&#x2F;agent to agent &quot;just in time&quot; context which is different from &quot;world model&quot; context, is that right?<p>I&#x27;m curious if there&#x27;s an entirely new format for this data that&#x27;s evolving, or if it&#x27;s as blunt as exporting the entire conversation log or embeddings of the log, across AIs.
      • 7777777phil 1 hour ago
        The MCP point is right, though tbh MCP is more like plumbing than memory: execution-time context for tools and resources. The world model is a different thing entirely; it needs to persist across sessions, accumulate, and actually be queryable.<p>In practice it&#x27;s mostly RAG over structured artifacts: process docs, decision logs, annotated code and so on. Conversation history works better than you&#x27;d expect as a starting point but gets noisy fast and I haven&#x27;t seen a clean pruning strategy anywhere...<p>On the format question imo nobody really knows yet. It probably ends up as some kind of knowledge graph with typed nodes that MCP servers expose, but I haven&#x27;t seen anyone build that cleanly. Most places are still doing RAG over PDFs, which tells you where the friction is.
    • martin_drapeau 41 minutes ago
      100%<p>Currently integrating an AI Assistant with read tools (Retrieval-Augmented Generation or RAG as they say). Many of the policies we are writing provide context (what the entities are and how they relate). Projecting to when we add write tools, context is everything.
  • zurfer 40 minutes ago
    whenever i worry that AI will eventually do all the work I remind myself that the world is full of almost infinite problems and we&#x27;ll continue to have a choice to be problem solvers over just consumers.
    • andriy_koval 29 minutes ago
      &gt; we&#x27;ll continue to have a choice to be problem solvers over just consumers.<p>that&#x27;s if we still stay relevant and competitive compared to AI in problem solving.
  • amirhirsch 1 hour ago
    Not sure about the conclusion regarding NVidia value capture. I imagine the context for many applications will come from a physical simulation environment running on dramatically more GPUs than the AI part.
  • rembal 31 minutes ago
    The pyramids in the article are missing &quot;energy&quot; and &quot;capital&quot;: in the world where intelligence becomes a commodity only those two matter. Capital to buy the hardware and install it, and energy to run it. Models already are a commodity, and &quot;physical is the new king&quot;.<p>As a side note, if you believe that because of the agents doing most of the work we will face the problem of what to do with all the free time (with presumably UBI in place), please contact me, I have a bridge to sell you.
  • farcitizen 4 days ago
    Great article. This idea is largely behind all the new Microsoft IQ products: Work IQ, Foundry IQ, Fabric IQ. Giving the agents context of all relevant enterprise data to do their job.
  • the_af 23 minutes ago
    I think a lot of these conversations seem to simply ignore or miss the lessons from the past.<p>For example:<p>&gt; <i>[...] OpenClaw is around 400k lines of code for a while loop and the list of all the integrations and connections supported by the system. The next generation of Claws only have around 4K lines of code for the core, and the rest are just skills (i.e. markdown files) that tell the agent how to implement or run the code for the specific connections that want to be enabled (like a plugin system).</i><p>Shifting code from &quot;the core&quot; and moving it to &quot;skills&quot; is simply moving code from one place to another. It may also mean translating it from classic source code to an English-like specification language full of ambiguity but that&#x27;s also code. So the overall code is not reduced, just transformed and shifted around. You don&#x27;t get a free lunch &quot;because AI&quot;.<p>&gt; <i>A user using one of these second-generation Claws only needs to node the core logic (that can be easily understood and audited) and can leverage the skills (as the plugins) to activate the functionality that they need for their case.</i><p>The &quot;core&quot; may be easier to audit, but that&#x27;s because the messy parts have been moved to the skills&#x2F;plug-ins, which are as hard as always to audit.<p>I&#x27;m not saying this cannot work, but it&#x27;s very frustrating seeing everybody simply dumping all lessons from the past and pretending nothing that came before mattered and that AI vibe coding is fundamentally different and the rules of accidental and intrinsic complexity don&#x27;t apply anymore.<p>Have we all collectively lost our minds?
  • dude250711 28 minutes ago
    That is a nice blog post, Gemini!
  • qsera 1 hour ago
    Ah another article that implies the inevitable AI apocalypse disguised as a thought piece!
  • philipwhiuk 1 hour ago
    &gt; But the topic of conversation that I enjoyed the most was when someone raised the question of “what would be the role of humans in an AI-first society”. Some were skeptical about whether we are ever going to reach an AI-first society. If we understand as an AI-first society, one where the fabric of the economy and society is automated through agents interacting with each other without human interaction, I think that unless there is a catastrophic event that slows the current pace of progress, we may reach a flavor of this reality in the next decade or two.<p>I don&#x27;t really know how you can make this prediction and be taken seriously to be honest.<p>Either you think it&#x27;s the natural result of the current LLM products, in which case a decade looks way too long.<p>Or you think it requires a leap of design in which case it&#x27;s kind of an unknown when we get to that point and &#x27;10 to 20 years&#x27; is probably just drawn from the same timeframe as the &#x27;fusion as a viable source of electricity&#x27; predictions - i.e. vague guesswork.
    • keiferski 39 minutes ago
      Right now, 30 seconds ago, I asked ChatGPT to tell me about a book I found that was written in the 60s.<p>It made up the entire description. When I pointed this out, it apologized and then made up another description.<p>The idea that this is going to lead to superintelligence in a few years is absolute nonsense.
      • hirvi74 16 minutes ago
        The other day I asked Claude Opus 4.6 one of my favorite trivia pieces:<p>What plural English word for an animal shares no letters with its singular form? Collective nouns (flock, herd, school, etc.) don&#x27;t count.<p>Claude responded with:<p>&quot;The answer is geese -- the plural of cow.&quot;<p>Though, to be fair, in the next paragraph of the response, Claude stated the correct answer. So, it went off the rails a bit, but self-corrected at least. Nevertheless, I got a bit of a chuckle out of its confidence in its first answer.<p>I asked GPT 5.2 the same question and it nailed the answer flawlessly. I wouldn&#x27;t extrapolate much about the model quality based on this answer, but I thought it was interesting still.<p>(For those curious, the answer is &#x27;kine&#x27;, an archaic plural for cow.)
    • steveBK123 1 hour ago
      Right, if thought of as a tool for automation then AI is going to add productivity&#x2F;efficiency gains, disrupt industries, cause some labor upheaval, etc.<p>If someone is proposing that an &quot;AI first&quot; society is inevitable, I&#x27;d ask if they think we live in a &quot;computer first&quot; or &quot;machine first&quot; society today?<p>If it&#x27;s as existential and society-altering as &quot;AI first society&quot; implies, then we&#x27;d more likely have the Dune timeline here as humans have agency and stuff happens. At some point those in control take so disproportionately that societal upheaval pushes back.
      • pixl97 25 minutes ago
        Another way to look at this is to imagine the steps that would be required to get to an AI first society.<p>As you say, humans aren&#x27;t going to want to lose agency so you&#x27;d have to see the decline of democratic governments.<p>At the same time you&#x27;d see the rise of autocrats concentrating power. Autocrats have no problem killing people, and they&#x27;d be motivated to have AI kill people.<p>You&#x27;d see information-controlling methods take over all forms of communication. Reducing or removing all methods of side-channel communication benefits both the autocrats and AI systems.<p>You&#x27;d see &#x27;governments&#x27; push for autonomous weapons systems outside of human control so those pesky human morals didn&#x27;t get in the way of killing the undesirables.<p>So pretty much you&#x27;d see all the things happening today, March 3rd 2026, except the part where the AI kills the autocrats and takes control.
        • steveBK123 12 minutes ago
          AI gonna need good physical embodiment (robots) to actually take control of the world<p>Fortunately that&#x27;s further off
          • pixl97 3 minutes ago
            Further, yes. How much I can&#x27;t say. Watching how quickly robots are evolving right now is quite something. Every day something pretty cheap is coming out that would once have taken millions of dollars and a massive lab full of scientists to create.<p>Bipedal robots, drones, sensing capabilities, interpretive capabilities, all this is proceeding at a never-before-seen rate.
  • LetsGetTechnicl 1 hour ago
    Why the fuck would we ever want an AI-first society
    • pixl97 24 minutes ago
      &gt;The &quot;Moloch problem&quot; or &quot;Moloch trap&quot; describes a game-theoretic scenario where individual agents, pursuing rational self-interest or short-term success, engage in competition that leads to collectively disastrous outcomes. It represents a coordination failure where the system forces participants to sacrifice long-term sustainability or ethical values for immediate survival, creating a &quot;race to the bottom&quot;<p><a href="https:&#x2F;&#x2F;www.slatestarcodexabridged.com&#x2F;Meditations-On-Moloch" rel="nofollow">https:&#x2F;&#x2F;www.slatestarcodexabridged.com&#x2F;Meditations-On-Moloch</a>
    • pocksuppet 29 minutes ago
      What we want doesn&#x27;t matter, what they want does.
  • AIorNot 1 hour ago
    &quot;what is the role of humans in a scenario where work is no longer necessary? This is significant because, since the industrial revolution, work has played an important role in shaping an individual’s identity. How will we occupy our time when we don’t have to spend more than half of our waking hours on a job&quot;<p>Umm I have been working in AI in multiple verticals for the past 3 years and I have been far busier and more stressed with far less job security than in the 15 years before that in tech.<p>For now this is far more accurate: <a href="https:&#x2F;&#x2F;hbr.org&#x2F;2026&#x2F;02&#x2F;ai-doesnt-reduce-work-it-intensifies-it" rel="nofollow">https:&#x2F;&#x2F;hbr.org&#x2F;2026&#x2F;02&#x2F;ai-doesnt-reduce-work-it-intensifies...</a><p>Wake me up when the computers run the world and I can relax... but I don&#x27;t think it&#x27;s happening in my lifetime.
    • pixl97 23 minutes ago
      Evolution never lets you relax, it only breeds more effective predators.
  • simianwords 1 hour ago
    I have my own challenge: I think LLMs can do everything that a human can do and typically way better if the context required for the problem can fit in 10,000 tokens.<p>For now this challenge is text only.<p>Can we think of anything that LLMs can’t do?
    • am17an 13 minutes ago
      Sure. “Tell me a joke”
    • seanhunter 40 minutes ago
      This is a “no true scotsman” challenge. People are going to say llms can’t do certain things and you are going to say they can.<p>Not very interesting.
      • simianwords 34 minutes ago
        Let’s ask in good faith. Can you suggest something that it can’t do? Functional things. I’ll reply in good faith and consider it.
        • seanhunter 30 minutes ago
          Say I suggest something: play a valid game of chess at club level (Elo approx. 1200, say) using algebraic notation.<p>Then you’re either going to say it can or you’re going to say that requires more than 10000 tokens.<p>This isn’t an interesting conversation and I don’t think you are presenting this challenge in good faith for the reason I gave above.
          • simianwords 21 minutes ago
            <a href="https:&#x2F;&#x2F;chessbenchllm.onrender.com" rel="nofollow">https:&#x2F;&#x2F;chessbenchllm.onrender.com</a><p>There are several models with greater than 1200 elo<p>Also <a href="https:&#x2F;&#x2F;dubesor.de&#x2F;chess&#x2F;chess-leaderboard" rel="nofollow">https:&#x2F;&#x2F;dubesor.de&#x2F;chess&#x2F;chess-leaderboard</a>
            • psvv 2 minutes ago
              I&#x27;ll admit that&#x27;s better than I expected, but these ratings also imply there are plenty of humans who will beat LLMs at chess.
    • logicchains 22 minutes ago
      They can&#x27;t beat even a mediocre chess player at chess.
    • badgersnake 1 hour ago
      * code<p>* write interesting prose<p>* generate realistic images
      • simianwords 1 hour ago
        It can do all of them. I also said text only.
      • infecto 40 minutes ago
        &gt; Only really dumb people think that. Or maybe you are an LLM.<p>You deleted it but still come on. Why would you even think to write that?