68 comments

  • I&#x27;m not sure you need a &quot;DeepSeek native coding agent&quot; to take advantage of DeepSeeks cache, yesterday as the Codex quota usage issue still wasn&#x27;t solved for me, I wrote a tiny little bridge so I could use DeepSeek V4 Pro via Codex, and seems most of everything I did was basically cached as far as I can tell: <a href="https:&#x2F;&#x2F;i.imgur.com&#x2F;7eKn6wN.png" rel="nofollow">https:&#x2F;&#x2F;i.imgur.com&#x2F;7eKn6wN.png</a> (2026-05-23 Input (Cache hit): 39,123,200 tokens, Input (Cache miss) 1,692,286), and the bridge is doing not special, just massage the DeepSeek API shape into what Codex expects, nothing particular about caching at all.<p>Besides being even better at the caching, I&#x27;m not sure what benefits you&#x27;d get compared to just firing up OpenCode with the DeepSeek API yourself, it&#x27;ll similarly do caching for sure and also &quot;talks directly to api.deepseek.com&quot; if that matters, and you&#x27;ll get a much more mature harness.
    • kiproping21 hours ago
      This would be a better page to link to <a href="https:&#x2F;&#x2F;github.com&#x2F;esengine&#x2F;DeepSeek-Reasonix&#x2F;blob&#x2F;main&#x2F;docs&#x2F;ARCHITECTURE.md#pillar-1--cache-first-loop" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;esengine&#x2F;DeepSeek-Reasonix&#x2F;blob&#x2F;main&#x2F;docs...</a><p>They explain some of the the reasons why they have a better solution and why they are very opinionated<p>&gt;Automatic prefix caching activates only when the exact byte prefix of the previous request matches. Most agent loops reorder, rewrite, or inject fresh timestamps each turn — cache hit rate in practice: &lt;20%.<p>So they optimize on this plus other techniques to improve cache hits, making it cheaper.
      • sparkleMing6 hours ago
        The last time I heard about something like this, it was Claude Code intentionally injecting random strings to break caching when you&#x27;re not using a Claude model. Aside from that kind of intentional sabotage, I don&#x27;t think any coding agent would just ignore prefix caching.
      • krackers13 hours ago
        &gt;Most agent loops reorder, rewrite, or inject fresh timestamps each turn<p>That&#x27;s really surprising, since it&#x27;d defeat the whole point of KV caching. I mean I buy it considering how sloppily coded the harnesses seem to be, but this like obvious low hanging fruit.<p>I&#x27;ve also often wondered why LLMs aren&#x27;t trained with a format of having a dedicated contextual system-instruction role at the _end_, which you could use to put context like current time or other misc stuff.
        • jeremyjh16 minutes ago
          Its not surprising, that doc is full of AI slop.
      • embedding-shape4 hours ago
        &gt; Most agent loops reorder, rewrite, or inject fresh timestamps each turn<p>I haven&#x27;t seen that, it&#x27;d be crazy slow if they did this. What &quot;agent loops&quot; are they talking about here specifically? The vagueness makes it sound potentially made up.
      • vidarh7 hours ago
        I&#x27;ve never seen an agent loop &quot;reorder, rewrite, or inject fresh timestamps&quot; each turn other than mostly towards the <i>end</i> of the messages. Messing with a large part of the context every turn would be a fairly crazy thing to do.
        • nawitus2 hours ago
          Yeah. Those claims are just some random AI slop from claude..
          • vidarh1 hour ago
            It&#x27;s a really lazy one too - there are so many open source harnesses, including e.g. Codex and Kimi-CLI, and of course the leaked Claude Code source, so it&#x27;s trivial to verify if someone even just bothered to ask an agent to check actual source code examples.
    • 3uler23 hours ago
      Opencode has really bad cache stability issues that they seem uninterested in fixing at the moment.
      • dathery22 hours ago
        The OpenCode devs talk about this on Twitter a lot, e.g. <a href="https:&#x2F;&#x2F;xcancel.com&#x2F;thdxr&#x2F;status&#x2F;2048268697790300343" rel="nofollow">https:&#x2F;&#x2F;xcancel.com&#x2F;thdxr&#x2F;status&#x2F;2048268697790300343</a><p>&gt; tool call pruning breaks cache and people will tell you this is horrible and expensive<p>&gt; except i looked at some anthropic data and real user behavior ends up with better cache hits and 30% less spend<p>&gt; even this is needs to be analyzed further, it&#x27;s just not simple<p>&gt; for openai data it&#x27;s inverted! cache hit ratio is actually better [sic: I think he meant worse based on the screenshot] with tool call pruning turned on<p>&gt; but the net $ saved is only 5%<p>&gt; kimi is a funny one - it has better cache hits with pruning on...but is also more expensive!<p>There was also another thread recently where he discussed that pruning improves user experience (models are smarter with less context) but I can&#x27;t find it.<p>This can also be disabled in the config: <a href="https:&#x2F;&#x2F;opencode.ai&#x2F;docs&#x2F;config&#x2F;#compaction" rel="nofollow">https:&#x2F;&#x2F;opencode.ai&#x2F;docs&#x2F;config&#x2F;#compaction</a>
        • soerxpso19 hours ago
          My understanding of caching with most models&#x2F;providers is that a prefix substring of the context has to be reused for a cache hit, but not necessarily the whole entire context window. So if you prune tool calls from the history, you&#x27;re going to get one cache miss on the newly-pruned history, and then you&#x27;re going to be getting cache hits on every subsequent turn, with a lower number of input tokens. If you prune subsequent tool calls after that, you would still get a cache hit for the already-pruned portion of the context, just not the full context.
          • __natty__19 hours ago
            So it makes sense to first send stable prompt, reasoning and files content, tool calls summary and actual tool calls at the very end?
            • leemoore15 hours ago
              The way you do this (and the way opencode does it) is you do most of your pruning in more recent history. Last I looked at opencode, they start pruning tool call results after 2 full agentic turns. So you probably dont get quite as good hits on cache for the most recent 1-5% of your turns, but after that everything else caches fine and those tool calls that likely aren&#x27;t relavent to your session anymore are gone.
        • awoimbee7 hours ago
          You didn&#x27;t quote the interesting part:<p>&gt; our implementation is it only prunes calls from &gt; 3 user messages ago, if context is &gt; 40K, and only if there&#x27;s at least 20K tokens to be removed<p>Seems reasonable to me and explains why I can have long sessions (way longer than with zed agents) while still hitting cache. Opencode is just missing per-provider TTL.
          • arthurcolle6 hours ago
            I found that keeping current context utilization at 18% of total context length was best for minimizing spend, across all models with 400k context length or more
        • hirako200019 hours ago
          They are. Empirical evidence on my side. Because attention is sparse across the context. It&#x27;s not truly treating a million token the way it treats a fraction of that count. For performance.
      • huqedato22 hours ago
        I can&#x27;t confirm this. Having utilized Opencode for a large project over the past 10 months, with multiple models and agents, we&#x27;ve never run into such &#x27;cache stability issues&#x27;.&quot;
      • embedding-shape23 hours ago
        That&#x27;d be really easy to spot and also fix, most likely. Any open issue you could point us to, must surely been reported already?
        • nolok22 hours ago
          &gt; That&#x27;d be really easy to spot and also fix, most likely<p>Ah, reminds me of good old &quot;There are only 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.&quot;
          • criemen22 hours ago
            &gt; Ah, reminds me of good old &quot;There are only 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.&quot;<p>You quip, but LLM KV caching (from the harness side) is quite easy: You get a cache hit on stable prompt prefixes, period. That means you want to keep the prefix stable, and only append at the end of the conversation. Made up example: Don&#x27;t put the git branch name into the system prompt part (that comes first), as whenever the branch name changes, that&#x27;d trigger a cache invalidation of the entire prompt.<p>Getting this right requires some care to not by accident modify the prefix, basically, and some design on communicating the things that can change (user configuration, working dir, git information, ...).
            • franknord2321 hours ago
              That sounds like the experience of writing Containerfiles; since steps are cached you want to pull the thing you are iterating on as far down as possible.
              • gopher_space19 hours ago
                All of this work has been done before in different contexts. Memory management with bigger blocks and weaker definitions that change whenever some grad student gets a bright idea.
                • vidarh7 hours ago
                  100%. Since you mention memory management: Generational GC is pretty much the same idea: Keep the stuff that&#x27;s least likely to change an important property (liveness) together.<p>Conceptually the underlying general idea is to sort things based on stability if you can avoid recomputing properties of the stable part.
            • xcjsam22 hours ago
              [dead]
        • 3uler4 hours ago
          <a href="https:&#x2F;&#x2F;github.com&#x2F;anomalyco&#x2F;opencode&#x2F;pull&#x2F;14743" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;anomalyco&#x2F;opencode&#x2F;pull&#x2F;14743</a>
        • krzyk22 hours ago
          Opencode (and other coding agents) have hundreds of open issues reported. It is quite discouraging when they are not being closed&#x2F;fixed.
      • magicalhippo9 hours ago
        What I noticed when using OpenCode with llama.cpp, was that the default host RAM prompt cache size in llama.cpp was way too small for say 128k Qwen3.6 27B.<p>The default is just 8GB and a full 128k context for the dense model can take most of that. So then comes an agent and causes eviction and subsequent cache miss.<p>Bumped the cache size (--cram IIRC) up to 48GB and had much better results.
      • estebarb15 hours ago
        I&#x27;m not sure that is really the case, or relevant in practice. I have been using OpenCode with DeepSeek lately (regular coding). For instance, today I got 120 million input tokens hitting cache, vs just 2.59million missing cache.
        • ctxc11 hours ago
          Reads like a LOT of tokens to me. What does your usage &#x2F;workflow look like? I&#x27;m v curious because although I do use Claude code, my token counts aren&#x27;t nearly as much<p>I want to know if I&#x27;m missing something cool!
          • mordae6 hours ago
            Not OP, but I routinely load 150k tokens into context. A full sub-package to work on, select other files in the monorepo, e.g. front-end visualization and back-end data loader. Then work some 150k tokens, then start again.<p>At the end, cache hit rate is like 99.5% if Novita is not having issues.<p>For official DeepSeek API, 99.9% or something.<p>Custom harness that never compacts or otherwise doctors the history.
            • ctxc2 hours ago
              Those numbers make sense to me...120 million input tokens is like 120 sessions of hitting the full context limit, which seems like a lot to me though
      • metalspot20 hours ago
        I am getting 98.6% cache hit ratio on deepseek-v4-flash with opencode
        • bobkb19 hours ago
          That’s impressive!<p>On the sheer performance it’s comparable to Opus ?
          • stavros16 hours ago
            Here are my stats (from DeepSeek directly, with a script I wrote). The prices are what equivalent Sonnet usage would have cost, the actual amount I paid was $10. On performance, DeepSeek V4 Pro is comparable to Sonnet for me.<p><pre><code> .&#x2F;cost.py amount-2026-5.csv 0.3 3.75 15 input_cache_hit_tokens: 472,971,520 tokens -&gt; $141.8915 input_cache_miss_tokens: 13,299,013 tokens -&gt; $49.8713 output_tokens: 3,334,962 tokens -&gt; $50.0244 cache hit rate: 97.27% (472,971,520&#x2F;486,270,533) cache miss rate: 2.73% (13,299,013&#x2F;486,270,533) total: $241.7872 </code></pre> All of this usage was with an OpenCode subagent exclusively.
          • c0rruptbytes12 hours ago
            [flagged]
        • upcoming-sesame19 hours ago
          out of curiosity, how do you measure cache hit rate in opencode ?
          • malikNF18 hours ago
            opencode stats
            • lugu16 hours ago
              So the calculation is:<p>Total input token = input + cache read + cache write Cache hit rate = cache read &#x2F; total input token.<p>That is 71% in my very limited use of opencode.
            • hackernows_test16 hours ago
              The first
      • Bombthecat23 hours ago
        [flagged]
    • tontinton20 hours ago
      Yep exactly my thoughts, went and looked at the code for the deepseek provider in my coding agent. and basically all of what the author wrote there is implemented... <a href="http:&#x2F;&#x2F;github.com&#x2F;tontinton&#x2F;maki" rel="nofollow">http:&#x2F;&#x2F;github.com&#x2F;tontinton&#x2F;maki</a> for the curios
    • bwfan1231 day ago
      &gt; I wrote a tiny little bridge so I could use DeepSeek V4 Pro via Codex<p>Can you share the bridge. DeepSeek v4 is awesome paired with claude-code or opencode. I found that claude code costs me less than opencode and I am presuming this is due to a better engineered harness.
      • Sure, keep in mind it&#x27;s a steaming pile of hacked together hacks, probably won&#x27;t work in every case, doesn&#x27;t support every feature that should be supported (like parallel tool calling, both Codex + DeepSeek API support it), and it might make your computer catch on fire: <a href="https:&#x2F;&#x2F;gist.github.com&#x2F;embedding-shapes&#x2F;eab3e63e5a95d3d78a270472e1be0c9e" rel="nofollow">https:&#x2F;&#x2F;gist.github.com&#x2F;embedding-shapes&#x2F;eab3e63e5a95d3d78a2...</a><p>I only used it for a few hours to play around with stuff before the quota issue was fixed and I could resume using GPT models, and the bridge was coded by DeepSeek-V4-Flash-IQ2XXS + DwarfStar4 locally, I take no responsibility for what might happen with your computer or you, during usage or just reading the code.<p>Edit: heh, like don&#x27;t look at line 117 for example where seemingly it likes to handle misspellings in the .env file which totally wasn&#x27;t my fault for typo&#x27;ing the API key in that file... I&#x27;m sure there are tons of sharp edges and dumb stuff in there.
      • spacedcowboy4 hours ago
        I don&#x27;t think DeepSeek v4 Flash is as good as Claude for relatively complex tasks. I ran with DeepSeek for a week, giving it the same sort of tasks that Claude normally does, and then ran Claude and asked it to continue. It found a whole bunch of things that had been &quot;overlooked&quot; by DeepSeek, and spent some time fixing them before wanting to move on.<p>DeepSeek is good, Claude is better, at least IMHO. Deepseek is a lot cheaper though :)
      • bayesianbot21 hours ago
        LiteLLM can serve OpenAI API endpoint IIRC and proxy that to other providers like DeepSeek, should work with Codex
      • Den_VR23 hours ago
        I’m feeling more a novice every day, but how isn’t this just handing over your code to team deepseek for whatever they might want
        • embedding-shape23 hours ago
          Not everyone is working with state secrets or user personal data (or even more closely guarded, company secrets) on a daily basis, most of what I hack on is either FOSS already, or will be, not much to keep secret here.<p>Obviously, if you do deal with any sort of secrets, then using local LLMs over OpenAI, Anthropic, DeepSeek or whoever is obviously preferred, and in the case of personal data of users, probably a requirement.
          • jack_pp22 hours ago
            either this or you work on software that even if copied won&#x27;t get you far since the business relies on network effects or pure networking.<p>Getting the source code of facebook or instagram doesn&#x27;t mean you could compete with them.<p>I work for a company that has built relationship with event organizers over the past 10 years. The code I maintain could be written from scratch in maybe 2-3 months even though it was built over the past 10 years but besides that you have frontend &#x2F; DB &#x2F; hardware &#x2F; logistics etc
            • Demiurge20 hours ago
              I actually agree with you, for the most part. The code I work with actually does contain some valuable algorithms, but Im pretty sure the effort of integrating them into a larger system is pointless without the data. It’s almost like stealing half-life 2 source code without any assets.<p>Still, “Getting the source code of facebook or instagram doesn&#x27;t mean you could compete with them.” I think to giants like that, having access to their source code could open up some very interesting loop holes for manipulating the ranking algorithms, or even security vulnerabilities.
              • jack_pp20 hours ago
                True, haven&#x27;t thought of that. However very few actual projects &#x2F; companies are in a situation where the chinese GOVT would be interested to spend resources to hack your platform. For the ones that are afraid of that there&#x27;s always self hosting of course
                • Den_VR2 hours ago
                  I used to work with HVAC companies, and I noticed that many of their customers mistakenly believed they were purchasing air conditioners. They didn’t consider these devices, which they connected to the internet, as computers. Despite being systems that required user names, passwords, updates, monitoring, and other maintenance, the prevailing attitude among these customers was, “This is an appliance, and why would anyone care about my air conditioner?”<p>All this to say, not even subject matter experts necessarily appreciate the risk involved in their work
        • oldmanhorton23 hours ago
          You’re not a novice, there are a lot of us who know exactly what we are doing and see this as a huge downside. We are just being told to go faster, faster, faster lest we miss out on… something?
        • spacedcowboy4 hours ago
          Somehow I don&#x27;t think DeepSeek will be <i>that</i> interested in a 6502 compiler [1]...<p>1: <a href="https:&#x2F;&#x2F;atari-xt.com&#x2F;" rel="nofollow">https:&#x2F;&#x2F;atari-xt.com&#x2F;</a>
        • dudisubekti12 hours ago
          Yeah, but it&#x27;s miles better than giving Anthropic and OpenAI your data. At least Deepseek is releasing open-weight models and a lot of open-source libraries.<p>If you&#x27;re concerned about espionage then the only solution is host the models yourself, which again, only open-weight models like Deepseek enable you to do this.
        • jijji22 hours ago
          there&#x27;s laws on the books in China that says that every company operating in China must aid and abet the Chinese government in espionage against the rest of the world. given those facts, I find it deeply troubling to be using anything coming out of China, especially a program that runs in the context of a Linux terminal on a machine that might have something important on it. I&#x27;d argue it&#x27;s a back door waiting to happen, if not sooner than obviously later.
          • goobatrooba20 hours ago
            As a European I have to admit I am these days more worried about the US than China. See yesterday&#x27;s article about the US government forcing Microsoft to give them lists of Dutch government officials. Utter madness. At least the Chinese mainly care about the money and power levers, the US about strange worlds of revenge and manipulation, trying to change or influence your government. E.g. which of the two countries has put crippling personal sanctions on staff of the international criminal court?<p>Honestly I&#x27;d love to love the US again, but basically after Obama things have just gone down and down and no soul will trust the US again in the next generation or two.
            • c1sc011 hours ago
              Besides the language barrier it’s actually also just simpler to do business with the Chinese. There are issues like censorship but they are known &amp; can be routed around. It’s best to just ignore the US and move your business elsewhere.
            • monch196213 hours ago
              As an Australian, I completely agree with every point in your response
            • gizajob8 hours ago
              That particular rot actually turned cancerous with Bush and Cheney, not Obama, IMO.
            • OtomotO17 hours ago
              Exactly this.<p>I don&#x27;t care about the US more than about Russia or China these days.<p>They are definitely not our allies anymore.
              • dominotw16 hours ago
                you dont need to be allies to do business. walmart is not my ally.
                • ajuc2 hours ago
                  Not enough trust to do business.
                • OtomotO8 hours ago
                  True, but then I expect them to betray me at any junction and I&#x27;ll gladly do the same.
                • schubidubiduba4 hours ago
                  The difference is that Walmart is a stable, reliable trade partner that honors contracts and is not trying to use propaganda to make you a fascist
            • jijji13 hours ago
              The situation you reference is related to a specific investigation by US congress requesting documents about potentially illegal censorship actions by EU officials from a specific company (microsoft). The difference is that the laws in china are broadly defined to include giving all intellectual property of anyone back to the government with no oversight, for the purposes of espionage.<p>The former relates to a specific investigation about potential criminal activity, the latter relates to broad illegal activity committed by the government itself unrelated to any specific case.<p>The US has no laws on the books forcing companies to wantonly give intellectual property and other espionage level material back to the government. If they did, no one would use cloud providers.<p>To avoid this, you can run your own hosted machine in a colocation facility, because in the US, people do have reduced rights when their data is controlled by a third party versus being controlled by themselves. Its the same as if the data was in your house, they would need a search warrant to obtain it, but when its at a Azure or AWS datacenter not controlled by you, your privacy rights are reduced by doing this.
              • MASNeo6 hours ago
                &gt; no one would use cloud providers<p>I think many are trying to move away from US providers actually. FISA section 702 and the current administrations liberties taken towards international law are not helping. The trust problem is real.<p>Not sure I’d trust China with anything onshore. But offshore, it does seem they play by the rules, because it pragmatically serves the stability of the people. China has not started wars in the past 50 years or so. By that logic one may assume they’d not abuse the arguably broad powers over Chinese firms abroad to risk one now.<p>In a world where rules are increasingly less important how states use power matters more to me than how they claim to be monitored.
              • watwut6 hours ago
                &gt; If they did, no one would use cloud providers.<p>EU has literal directive about location of data which has to be located in the EU and not in the USA, because the data are in danger otherwise.
                • miroljub6 hours ago
                  Yep, and then they let US companies handle that data. One more proof EU regime is run by ... no, I won&#x27;t tell, don&#x27;t wanna get arrested.
              • cicko9 hours ago
                &gt; The US has no laws on the books<p>Correct. They come up on Twitter daily. Pardon, this other truth bullshit.
            • dominotw16 hours ago
              so govt forcing a private coroporation being a big deal that a its on the worldwide news is more scary to you than an implicit mandate that china forces on its companies?
          • subscribed4 hours ago
            I forbidden from working on the company code with DS, but if I have a private something that looks pretty much like one of the thousands repositories put there, it doesn&#x27;t matter that much.
          • zaphirplane3 hours ago
            Like which country allows companies to not follow a legal directive. How weird
          • tim-projects20 hours ago
            Is it not better to have a country far away spying on you than your own country?
            • azinman217 hours ago
              Not if it’s industrial espionage.
          • Danox17 hours ago
            The four biggest (obvious) backdoor countries in the world in no particular order the United States, Israel, Russia, China. Honorable mentions, North Korea, Ukraine…
          • cicko9 hours ago
            Yes, and the Russians are still coming!
          • _3u1020 hours ago
            FISA section 702 &#x2F; Five eyes &#x2F; Room 641A.
      • NamlchakKhandro16 hours ago
        Claude code and open code are streaming piles of shit
    • himata41131 day ago
      this appears to be native to the terminal, as in, there&#x27;s no special application that runs or wraps an agent inside a tui. So basically instead of commands you type plain english?
      • &gt; this appears to be native to the terminal, as in, there&#x27;s no special application that runs or wraps an agent inside a tui<p>Same with codex? codex-rs at least, is a TUI as well, it does run a &quot;app-server&quot; in the background, that the TUI actually interacts with, but that&#x27;s just an implementation detail. Also makes it easy to hook in your own programs to fire of codex &quot;headless&quot; sessions even without the TUI.
  • agrippanux20 hours ago
    This website seems to have been generated by Codex - I asked Codex to create an HTML overview of a feature for my team and it made an overly produced monstrosity - complete with the same large stat boxes that were for the most part devoid of meaningful information - using the same font, colors, layout, hero section, etc. It was also terrible on mobile just like this is.<p>In the end I had Claude produce a one-page html file that was 95% of the way there and it took minor editing to clearly explain the intent of the feature.
    • port1118 hours ago
      A lot of LLM-driven design now looks like this. I don’t understand how people don’t find ugly the pairings with an heavily italicised serif. You also can’t read much of the page on mobile, because the code example keeps shifting the content around.<p>Now, that is overly critical, I’m sure their heart is in the right place. But a simpler website would do :)
      • gizajob8 hours ago
        Yeah such amazing tech used to produce a tediously unreadable website with great flair.
      • krm0118 hours ago
        It’s sad to see companies not spending a bit more on design. Sure, ai will help you get something decent out fast. But there’s a threshold where design becomes an indicator of trust. Especially for b2b software that tailor to large corps. Good design, character, adds directly to the bottom line.
        • schaefer2 hours ago
          &gt; It’s sad to see companies…<p>The article is about an open source agent harness, Reasonix, that is built to leverage the DeepSeek native api.<p>There’s no company here. No design budget. These people are graciously sharing a project they made in their free time.
    • easygenes8 hours ago
      Claude Opus 4.7 defaults to exactly this design language for a lot of &quot;just make me a rich html presentation page&quot; requests without further specification.
    • ritonlajoie14 hours ago
      strange, I got the same design with claude design, same fonts, same title designs with the strange character etc...
    • locknitpicker19 hours ago
      &gt; In the end I had Claude produce a one-page html file that was 95% of the way there and it took minor editing to clearly explain the intent of the feature.<p>That doesn&#x27;t say much about any model though. For starters, any software engineer can tell you that leaving out features can drastically simplify any project.
  • perseusai20 minutes ago
    This is a nice companion to the token saving context app I made. Even has the same Claude Design site, which I think looks awesome! Even though something is cheap, the concepts that make using Deepseek more efficiently can surely be applied elsewhere. Cool stuff!
  • jbellis18 hours ago
    As someone who has been writing harnesses for a year: the people at opencode etc aren&#x27;t stupid, when they decide to break the prefix cache [usually partially] it&#x27;s always because they&#x27;ve tested it and it gives better results overall.<p>If you think that dsv4 behaves differently enough from the aggregate of other models, submit a PR with a patch to special case that to your harness of choice with evidence. Just blindly assuming &quot;append only all the time because cache&quot; is a waste of everyone&#x27;s time.
    • schaefer2 hours ago
      &gt; As someone who has been writing harnesses for a year…<p>Your agent harness, brokk, looks great. I’m going to try it this morning.
    • phrotoma4 hours ago
      Is &quot;harness&quot; in this context ~= &quot;agent&quot;?
      • abustamam2 hours ago
        I&#x27;ve understood harness to be the software that runs the agent (open code, pi, Claude code)
    • anon37383910 hours ago
      Are there any learning resources you&#x27;d recommend on writing harnesses? I&#x27;m interested in doing a non-coding one, but not really sure where to start.
      • jbellis7 hours ago
        Generically, I would say, just start building it and ask your favorite coding agent for advice when you get stuck. This is the first technology that can teach you how to use it! (But do ask a model with a recent knowledge cutoff, i.e. not gemini.)
    • d-fault8 hours ago
      [flagged]
    • prakashsunil13 hours ago
      [flagged]
  • skeledrew1 day ago
    Not a fan of that page. The animated typing and resulting continuous resize of the example keeps moving the content beneath it down and up. Such bad UX.
    • Agents or no agents, people still need to test their websites on different resolutions or at least window width, but seems this is becoming a lost art.
      • mirekrusin23 hours ago
        Yeah, doesn’t look designed for people who want to read it beyond animated typing animation.
    • m4rkuskk22 hours ago
      Claude design AI slob.
  • stiray19 hours ago
    If only author would understand, that some people want single, self sustained binary that doesnt take half of computer memory and would rather write it in rust or golang.
    • Defenestresque15 hours ago
      github.com&#x2F;charmracelet&#x2F;crush<p>The company that had that acrimonious split from OpenCode. Still, fully written in Go and compared to node-based harnesses, uses 1&#x2F;5th the RAM. (At least for me.)<p>Works with any provider (including OpenRouter free ones).<p>No conflict of interest here, just a happy &quot;customer&quot; of this excellent resource.
    • Xeoncross18 hours ago
      I&#x27;m really happy to see a lot of new software come out in Rust, Go, or Zig.<p>The value and ease of development that slow interpreted languages used to offer is disappearing. New languages have all the nice things built in, or rather, our 1am pager alarms are starting to make us mad.
    • wg018 hours ago
      Can someone explain that was use of AI (and all the claims) that a coding agent cannot be written in plain go for example? Given there are tons of good terminal libraries for golang?
      • xlii18 hours ago
        It can be written in Golang but interaction libraries are very limited and with sharp edges.<p>There&#x27;s Google&#x27;s genkit, charmbracelet&#x27;s fantasy and LangChainGo. Each has ugly hacks and omissions. Then handling slice streaming of data into Elm architecture (bubbletea) is also complex.<p>So in theory nothing stand against but in practice one has to get quite low to the ground to get anything done.<p>Also: Golang agent exist! It&#x27;s called crush and is developed by charmbracelet people. It&#x27;s so-so though I prefer Pi myself.
    • zozbot23418 hours ago
      If you want to try a single self-contained binary that does take half of your computer memory or more, there&#x27;s always ds4-agent.
    • crystal_revenge18 hours ago
      If this is what you want, especially in the age of coding agents, why not just build it yourself?
    • pancsta19 hours ago
      Having a coding bot but skimming on coding? That should tell us something.
  • unshavedyak23 hours ago
    It&#x27;s pretty funny, i&#x27;m a $200&#x2F;m Claude subscriber and i&#x27;ve had little need to use anything else. However the more Claude has been restricting my workflow <i>(notably around the recent IDE&#x2F;-p usage change)</i> the more i&#x27;ve been wanting to go elsehwere.<p>I&#x27;m concerned since i really want SOTA reasoning, but DeepSeek still has me interested.
    • Alifatisk22 hours ago
      &gt; I&#x27;m concerned since i really want SOTA reasoning<p>I think you should give other models a try and see how much they differ from SOTA models. I did this and realized, even Qwen-2.5-Max was enough. I am sure even Claude Sonnet 3.5 is enough for things I play around with. I am not really striving for fields medal in Mathematics.
      • unshavedyak20 hours ago
        That&#x27;s fair, neither am i - i do tend to work in large, complex, full of legacy decision based codebases. Eg i have access to Sonnet (of course), but i choose to solely work in Opus because i find its output reads better, analyzes better, etc.<p>The &quot;cost&quot; is dumb models is just so high for me. Eg every bad decision they make increases my frustration quite a bit. Despite putting a lot of effort into my workflow to help reduce the number of decisions they make, they always will. So my hedge is always against that.. trying to reduce how insane they can be heh.
    • gck120 hours ago
      I gave a fairly complex reverse engineering task to DS-4 xhigh and GPT-5.5 xhigh today.<p>After about 6 hours, both ultimately failed to fully RE, however, there were some drastic differences:<p>DS stopped every 30 minutes or so, saying it did full RE and it should all work now, while in fact, it didn&#x27;t complete even 1% of it. It also looked for shortcuts again and again, despite me prompting heavily that the specific shortcut may not be used. It was a complete and utter failure.<p>GPT-5.5, on the other hand, blew me away. It just did the right things, didn&#x27;t jump to next steps until it was sure it completed the initial layers and had a full understanding of what&#x27;s required. The only time I prompted it during the 6 hours was when I saw it going in the right direction and I could nudge it slightly towards an even better way. I never felt I was fighting it. Okay, maybe a little bit - after compaction, it sometimes would go on a &quot;no I&#x27;m not helping you with reverse engineering&quot; tangent, but it would resolve in a clean session.<p>I cancelled my Claude subscription a month ago, so I haven&#x27;t tested that, but DeepSeek has reminded me a lot of how I worked with Opus 4.6&#x2F;4.7. Which perhaps could be a positive sign to some, but GPT-5.5 showed me that the way claude&#x2F;ds work is just way too annoying.
      • Aurornis17 hours ago
        &gt; DS stopped every 30 minutes or so, saying it did full RE and it should all work now, while in fact, it didn&#x27;t complete even 1% of it. It also looked for shortcuts again and again, despite me prompting heavily that the specific shortcut may not be used. It was a complete and utter failure.<p>This is my experience with non-SOTA models across the board. When you try them on little tasks and they work it feels amazing, but then you go deeper and you&#x27;re back to going in loops and fighting the model for hours.<p>Switching back to a SOTA model immediately yields progress again.<p>When I read all of the comments from people saying they can&#x27;t tell a difference between Opus and &lt;insert open weight model here&gt; I don&#x27;t know if they haven&#x27;t really used it much yet, or if they&#x27;re just not doing anything complicated.
        • am17an4 hours ago
          Did you read the OP when he&#x27;s exactly chiding the model you&#x27;re glazing?
          • Aurornis1 hour ago
            Did you intentionally miss the point of my comment? Substitute Opus for GPT-5.5 if you will. I use both as well as locally hosted models using some of your branches, even.
      • ttul20 hours ago
        What you’re experiencing is the difference in model intelligence. Most models can seem pretty good at simple stuff over short time horizons. Complex work requires that more intelligence be stuffed into those trillion-dimensional spaces.
      • cmrdporcupine18 hours ago
        The GPT models are heavily biased to a more incremental, empirical, evidence based approach. Sometimes to a fault. I prefer them for this reason, but it requires coaxing or strategic use of &#x2F;goal to break it out if its highly staged, one piece at a time, approach.. if you don&#x27;t like it.<p>I suspect for people doing more... website ... type development, the more &quot;yeet this into existence&quot; style of Opus feels preferable.<p>With Claude I was constantly jamming my finger on the escape key &quot;wait, you did what?! based on what proof?!&quot;
        • beering18 hours ago
          You make it sound as if Codex is for people who know what they want and Claude Code is for people who don’t know what they’re doing.
          • cmrdporcupine17 hours ago
            I was trying to not sound that biased, but ok ;-)
    • KronisLV21 hours ago
      &gt; i&#x27;ve been wanting to go elsehwere.<p>There&#x27;s always the option of using Anthropic&#x27;s models for some tasks like planning and then just hand over the implementation task to something like DeepSeek. Across different tools, a Markdown plan works pretty okay. That&#x27;s what I&#x27;m planning to do if I go from the 5x Max subscription down to the Pro.<p>I am also writing a launcher that makes using 3rd party providers with Claude Code easy (<a href="https:&#x2F;&#x2F;ccode.kronis.dev" rel="nofollow">https:&#x2F;&#x2F;ccode.kronis.dev</a>) and I already have a local proxy up and running, just not dynamic model switching yet. Though it shouldn&#x27;t be too hard to add, will probably be there within a week or two, depending on my schedule.<p>I don&#x27;t think it&#x27;s wise to leave Anthropic altogether because their models are great (and a subscription gives you features like Remote Control which I like), but switching tiers and maybe saving a bit of money seems viable! On the other hand, you do need a quality baseline, because I remember using Cerebras with GLM 4.6 way back and there was a bit too much slop.
    • logicchains22 hours ago
      If you want SOTA reasoning you should be using GPT 5.5 Pro.
      • unshavedyak20 hours ago
        This is fair, but i&#x27;ve found the different models to have different moods and require different interactions to get them to stick to just the specific edits i ask for, etc.<p>I used to surf the three big players frequently and got really tired of the effort needed to steer some models. In the end i ended up sticking with Claude because it required less steering effort. While not strictly reasoning, a models ability to follow clear directions consistently is something i&#x27;d consider part of its SOTA capabilities.<p>Eventually i just tired of exploring. I just want stability.<p>Which ironically is why i&#x27;m thinking about moving from Claude. The very basic IDE&#x2F;-p usage getting removed from my plan is a UX stability issue. I&#x27;m trying to progressively improve my workflows and efficiency, not have to establish a new foundation anytime something shifts. Quite frustrating.
      • auggierose21 hours ago
        Codex has only GPT 5.5
    • 0xbadcafebee21 hours ago
      You should definitely stick to the $200 plan, and <i>not</i> try the $10 coding plans with open weight models and higher limits. Anthropic needs your money to stay solvent, and you&#x27;ll sleep better knowing you&#x27;re using SOTA.
      • port1118 hours ago
        (Zero reason to defend Anthropic.)<p>I’ve gone that route. I really wanted to stop using Claude, but Deepseek v4 Pro and Kimi 2.6 didn’t do the job. For a lot of coding tasks or well-specced plans, maybe… but then that’s a plan made by Opus anyway.<p>Even Sonnet is sometimes not worth the trouble. Opus is very thorough and reviews its own mistakes quite well. Catches a lot of edge cases.<p>I’m not saying we shouldn’t try other things — I did! —, but it’s more or less okay that people just like Claude Code subscriptions? The back and forth I had with Kimi on a small feature came out to ~1.8€, which is 10% of my Claude subscription each month. And that was a single session. CC with Serena uses tokens fairly well.
        • bazhand17 hours ago
          &#x2F;advisor is like the old &#x2F;opusplan mode but for running tasks not just pre-planning. It can work nicely with Sonnet as the main agent and escalates to Opus as needed.
      • constantius17 hours ago
        The world would be better long-term if we chise tonfund open models instead however.<p>If you think short-term and only about yourself, paying for SOTA regardless of how many military contracts the lab has is the best thing, but paying for open models is both better ethically, and for a future where AI belongs to everyone and not just to Altman et al.
  • I love the focus on cache hit efficiency. Hats off to the deekseek team for creating a great product that maximizes cost efficiency for the user.
    • bwfan1231 day ago
      &gt; Hats off to the deekseek team for creating a great product<p>I have been using it for a while, and I wholeheartedly agree. imo, it is as good as codex or claude which I also use. It is a winner in the cost-sensitive tier, and if some startup could put it together with data-retention in mind, it could be a great product sold to the enterprise, as data-retention and privacy are the main issues for the coding-assistant usecase.
      • chillfox23 hours ago
        Deepseek v4 pro is definitely my preferred cheap model, it&#x27;s very good, and I use it all the time for my personal projects (opencode go plan), but I also use Claude Opus all the time at work and Deepseek is not as good as that, but it does compete with Sonnet for capability, and beats it on price.
        • pjerem18 hours ago
          I have unlimited Claude Opus at work and it’s wonderful. Not allozwed to use it for personal use though.<p>So I use Deepseek Pro on the $20 Ollama Cloud plan and it’s really not that far behind and I never triggered the plan’s limits.<p>It’s like 10-15% less powerful but costs 10 times less.<p>Totally worth it. I prefer Opus because my employer pays for it but I would personally never pay 10 times more for it.
          • chillfox14 hours ago
            Nice,<p>I have got unlimited Claude Opus at work as well.<p>I was really having a hard time deciding between the Ollama and OpenCode plans for personal use, I couldn&#x27;t really understand how much usage I would get with the Ollama plan, so in the end I went with OpenCode and I have never hit the limits despite using it most evenings and weekends for several hours.
            • abustamam2 hours ago
              What models do you use in open code? I too have unlimited opus at work and I tried using my same workflow from work using Kimi 2.6 in open code and... It&#x27;s just not it, even for relatively simple stuff.<p>Maybe I should try DS4p?
        • spaceman_20209 hours ago
          I genuinely don’t think you need Opus 4.7&#x2F;GPT-5.5 tier models for 95% of tasks in a normal workplace<p>People are out there using frontier intelligence to make responsive headers and weekly work reports. Absolutely don’t need the latest and greatest models for this stuff
        • HDBaseT15 hours ago
          Deepseek V4 Pro is an amazing model, even without the unreal cost factored in.<p>It is my default model at the moment. I&#x27;m not doing anything too complex though. I honestly found more expensive models like Qwen 3.6 to fail in tasks Deepseek nails.<p>I&#x27;m interested in knowing what people are using for tasks which require a bit more thinking. Kimi 2.6? Qwen 3.7? GLM 5.1?
          • Akamant4 hours ago
            17 GoLang microservices for a serious project were written perfectly using the latest version of QWEN(3.6). The only areas where we really had to work hard were documentation and a very serious task breakdown. All of this was tested, and yes, a review was required, but everything was within reason. The deadline was 10 days of 24&#x2F;7 work, including the review. When attempting to submit the same task, Opus 4.7&#x2F;4.6 had to be stopped after three hours. If you have significant resources for experimentation, you can certainly try. For us, the choice is absolutely clear at this point.
          • chillfox14 hours ago
            I don&#x27;t think there&#x27;s any open models at the moment that can handle the more challenging stuff.<p>The things that I use Opus for at work is finding bugs in about ~200k lines of microservices and libraries in a niche language. So, we will get these bug reports that are missing context, can&#x27;t easily be reproduced on our dev server, and are usually the result of something deep in multiple services&#x2F;libraries combining with very custom configs. I can ask Opus (max thinking) to find what could cause the bug, and it usually nails it in a few hours (would take me 1-2 weeks to trace it myself). The end result will be like less than 10 lines of code to fix it, some tests to reproduce the bug and a nice report explaining it, so it can be checked in an hour or two.
    • nicce22 hours ago
      Just in case, note that this project is someone&#x27;s side project<p>&gt; Independent open-source project · not affiliated with DeepSeek
    • Bombthecat23 hours ago
      Adding already cheap API cost and you probably could let it run for days and the same task..
    • stavros1 day ago
      How can you have cache hit efficiency? Isn&#x27;t it just a matter of not changing the previous context? I don&#x27;t understand what knobs there are to tweak on this.
      • everforward23 hours ago
        &gt; Isn&#x27;t it just a matter of not changing the previous context?<p>Yes, but a lot of harnesses change previous context. E.g. the system prompt injects the current time&#x2F;date, working directory, files in the working directory, etc. Compaction also changes the whole previous context. I _think_ changing the list of tools also invalidates cache, so invoking a subagent with different tools would invalidate the cache.<p>My vague impression is that it&#x27;s in a similar vein to functional programming languages. It generally disallows doing things that lead to bugs (cache misses in this case), and presumably allows you to do those things in a way that makes it much clearer that this is likely to cause cache misses. I would guess that in this paradigm, you don&#x27;t mutate your existing session, you derive a new session by mutating the prior context into a new context.
        • chillfox23 hours ago
          changing between plan&#x2F;build mode in some agents will change the tools list, which breaks the cache.
          • brookst23 hours ago
            Cache is always there, it’s just that it only caches up to the point where an input token changes. So if the tools list is early in the prompt, changing it would limit cache for most of the prompt. If the tools list is the last thing, you could still get 99% cache hits even if it changes every turn.
            • RevEng21 hours ago
              After a couple of turns the system prompt is a small part of the context. Not changing the system prompt at all is key so that the rest of the history is itself part of the prefix.
            • chillfox14 hours ago
              Depends upon the service and how the harness is built, Some of the services allow for very few cache keys, so you won&#x27;t necessarily get any cache if you edit recent messages as the cache is not per message, but big blocks of everything up to a cache key.<p>This was actually surprising to me when I learned about it as I have never worked with (or built) any cache working like that before.
  • edg500010 hours ago
    Side note: In DeepSeek API docs they mention that coding clients automatically are assigned the highest thinking effort, despite any settings. This is what I suspected when using OpenCode with V4; it keeps reasoning in very long cycles, this felt like a flaw in the model. May just be a weird API thing.<p>Overall I find their API design and docs so messy. It&#x27;s a shame, since it&#x27;s the main entrypoint to using their service.
  • JSR_FDED9 hours ago
    Maybe the first problem this tool can tackle is creating a better web page? Content continually shifting, super annoying.
  • mkrd7 hours ago
    God, I whish there was a code harness I don’t have to install a JavaScript runtime for
  • mmaunder23 hours ago
    Unusable thanks to the top animation pushing the rest of the site down repeatedly as you’re trying to read.
    • busymom018 hours ago
      The layout of the entire page is horrendous on mobile too. Looks like a huge wide site where content is only in a tiny column on left side.
  • schaefer23 hours ago
    Okay, I&#x27;m curious.<p>From the FAQ, I see:<p>&gt;Can I point it at a self-hosted &#x2F; private DeepSeek endpoint?<p>&gt;Yes. Since 0.30 we accept non-standard key prefixes for self-hosted DeepSeek endpoints. Just point `baseUrl` at your internal address — the loop, cache strategy, and tool protocol are unchanged.<p>But my question is: If I use Reasonix to talk to a deepseek endpoint through openrouter, am I still getting the cache-hit benifits of this agent harness?
    • csunoser23 hours ago
      Yes*. At least from my limited usage of deepseek-flash for a few billion tokens on openrouter, the cache-hit rate is &gt;95%. And I simply used the claude code harness pointed at the openrouter anthropic compatible endpoint with no fluff.
      • port1118 hours ago
        Did you get proper tool use? Some CC-driven models seem to get a bit off when it comes to MCP usage. For example: I really struggled to get Kimi to use Serena, which I think ended up costing too many tokens.
      • schaefer23 hours ago
        thank you!
    • thomasfromcdnjs14 hours ago
      I would wonder that too, I&#x27;m only a novice openrouter user, but I do notice it reroutes my same-model requests to different providers.<p>Maybe users reporting otherwise are just looking at their client reports which wouldn&#x27;t be able to tell the difference.
      • Lapel274211 hours ago
        Look into Openrouter&#x27;s provider routing.
  • danborn264 hours ago
    The caching strategy here looks really solid for keeping API costs down. Curious how it handles state invalidation when the agent context gets too large though.
  • danborn2622 hours ago
    High caching rates for coding agents can drastically reduce latency and API costs. I am curious to see how the caching strategy handles context invalidation across multiple files.
  • naaqq8 hours ago
    I don&#x27;t think it&#x27;s helpful, you can already get a 99+% cache hit on claude code, just change the api settings to deepseek. I would like to use a agent built by deepseek itself using deepseek models. Deepseek should make their own agent based on their model, just like OpenAI and Anthropic.
    • m00dy8 hours ago
      same here, using claude code on deepseekv4. just burnt 24.1M input hit and 170k cache miss.
  • singiamtel23 hours ago
    I would&#x27;ve liked benchmarks against other harnesses showing the caching performance
    • Havoc17 hours ago
      Just checked the stats on my opencode&#x2F;DS usage...looks like 70%ish hit rate.<p>Pretty shaky datapoint though...don&#x27;t use it as primary model
    • Alifatisk22 hours ago
      Is there benchmarks and measurements that offers comparisons between different harnesses?
  • mark_l_watson18 hours ago
    I tried it and the text input area was black with a dark font. I checked the documentation, and asked DeepSeek v4, Claude, and Gemini for help with the fonts&#x2F;style and nothing works except to run in a terminal with a dark theme. Crazy. None of the devs on the project use a light theme?
    • miav18 hours ago
      I agree that this is an issue, but.. no, they probably don’t. Light themes <i>are</i> very rarely used.
      • jofzar13 hours ago
        I understand why, but I didn&#x27;t even think of light themed terminals till now.. .
  • yanhangyhy11 hours ago
    In the open-source contributors section, when you see a lot of anime or cartoon avatars, you know most of the devs are Chinese.
  • storus21 hours ago
    Can it instruct DeepSeek during an LLM call to start removing old tool calls from the context instead of waiting for the LLM call to finish if the context size approaches DeepSeek&#x27;s dumb zone? Claude Code can&#x27;t do that, &#x2F;compact can only happen after the LLM call; it&#x27;s often preferable to start cleaning up context during an LLM call, especially when tool calls are huge like reading markdown files; implementation-wise all that is needed is to start removing earliest &lt;tool call start&gt; ... &lt;tool call end&gt; and replacing them just with some log entry stating this tool call was already performed, then re-running KV cache prefill (so the &quot;online&quot; compaction would get 0.5s latency hit every time it&#x27;s performed). That way one can read 1000 files in one LLM call.
  • pkulak22 hours ago
    Doesn&#x27;t Pi Agent do exactly this? Assuming &quot;append only&quot; means they do some kind of compaction as well.
  • hirako20001 day ago
    Good timing given the cost spike across other frontier models.
    • notjes1 day ago
      Good thing DS just made their discount permanent. <a href="https:&#x2F;&#x2F;x.com&#x2F;deepseek_ai&#x2F;status&#x2F;2057854261699195173" rel="nofollow">https:&#x2F;&#x2F;x.com&#x2F;deepseek_ai&#x2F;status&#x2F;2057854261699195173</a>
  • wg020 hours ago
    Performance is horrible when you type but caching is magical.<p>Extremely pro consumer tool. I have been hammering it hard with 97% cache utilization and barely $0.03 dollar spent for me constantly exploring a codebase.
    • snqb19 hours ago
      Deepseek API caches very efficiently itself. I use it heavily via pi agent, and a lot of times I get 99%+ caching for longer sessions.<p>Have you tried using Deepseek API via other agents? This project tbh looks like a S-tier slop
      • wg018 hours ago
        I have used it with OpenCode and was good enough.
  • Isn&#x27;t caching a server-side thing? How does the agent affect it, significantly at least?
    • Say you put the current time down to the second in the system prompt, which is the message that goes in front of the entire conversation, then basically nothing will be cached, every agent turn needs to ingest the entire session over and over. Contrast to not doing that, and the backend can leverage caching all the way up to the latest message, as nothing until then changed.
      • nawitus9 hours ago
        That&#x27;s not necessarily true, you can have multiple cache points, see e.g. <a href="https:&#x2F;&#x2F;platform.claude.com&#x2F;docs&#x2F;en&#x2F;build-with-claude&#x2F;prompt-caching#when-to-use-multiple-breakpoints" rel="nofollow">https:&#x2F;&#x2F;platform.claude.com&#x2F;docs&#x2F;en&#x2F;build-with-claude&#x2F;prompt...</a>
      • esperent23 hours ago
        Surely other agent CLIs are not dumb enough to invalidate cache on every turn over something so obvious?
        • chillfox23 hours ago
          I don&#x27;t think any the agents breaks caching on every turn, but they might do things like current list of files, or available tools depending upon plan&#x2F;build mode... or lots of other things that breaks caching multiple times during a session.
        • brookst23 hours ago
          Probably not that exactly, but there is a tradeoff between effectiveness of the prompt and cache hit rate. If putting the user’s datetime in the middle of the prompt scores higher on evals but worsens cache hits, versus at the end of the prompt where it’s cache friendly but may not be as effective, what do you do?<p>This is still art as much as science and the different harnesses take different approaches.
        • embedding-shape23 hours ago
          Obviously not, most agents properly keep previous messages unchanged, at least the major ones I&#x27;ve been digging into the source off. Also, everything would get so much slower, that even developers creating their own agents would notice quickly how much slower theirs is, if they fuck this up.
      • theanonymousone22 hours ago
        Yes, of course you can destroy it. But how far can you &quot;improve&quot;, beyond decent &quot;common sense&quot; behaviour.
  • tylerdurden918 hours ago
    Given the number of supply chain attacks via npm, maybe the recommended approach to use should be pnpm instead of npx.
  • hebetude23 hours ago
    Wow the UI looks exactly what I vibe coded yesterday. What a coincidence
    • huqedato22 hours ago
      It&#x27;s obvious why...
  • ElenaDaibunny5 hours ago
    The caching strategy is doing most of the heavy lifting here cost-wise.
  • trollbridge16 hours ago
    Well folks here we have it: DeepSeek’s brand is now strong enough people want to jump on their brand recognition.
  • nextaccountic20 hours ago
    &gt; Tool-call repair<p>&gt; Tool arguments the model produces occasionally have JSON typos, unclosed quotes, or shape mismatches. Reasonix runs a schema-aware repair pass before dispatch so malformed args still execute.<p>So Deepseek API doesn&#x27;t have a structured output option where you give a grammar and the model promises the output will follow this grammar?<p>Or it does, but it&#x27;s buggy?
  • ricardobeat22 hours ago
    &gt; The loop is append-only, engineered around DeepSeek&#x27;s byte-stable prefix cache — long sessions hold 90%+ cache hit and input-token cost collapses to ~1&#x2F;5. Terminal-first, leave it running.<p>AI marketing slop. This is how all models and coding harnesses work, isn&#x27;t it?<p>The author claims (in another AI-written post):<p>&gt; LangChain — along with every generic agent framework I checked — rebuilds the prompt every turn. Timestamps get injected. History gets reordered. Tool schemas re-serialize with different whitespace.<p>I haven&#x27;t touched LangChain in a long, long time, but don&#x27;t think any of the current harnesses, Claude Code, Pi, Crush, OpenCode etc do that except if you change configuration? Keeping the context stable for caching is a very basic principle and not a wild innovation.<p>This posing as DeepSeek-specific is also a mystery.
  • imagetic22 hours ago
    <a href="https:&#x2F;&#x2F;shittycodingagent.ai" rel="nofollow">https:&#x2F;&#x2F;shittycodingagent.ai</a>
    • mi_lk21 hours ago
      Not sure about the story but it would be funny if pi folks actually own this domain.
      • chuckadams21 hours ago
        They do. That&#x27;s Pi&#x27;s old name.
    • peheje19 hours ago
      having issues with truncated output from deepseek v4 pro through openrouter via pi-harness on ptyxis-terminal using ubuntu<p>trying reasonix with direct api..
      • peheje19 hours ago
        first impression: the tui flickers a lot, unpleasent. very laggy to write in.
    • chabes22 hours ago
      Aka pi.dev
  • carterschonwald20 hours ago
    i cant find anything substantiated in the code that actually differentiates it from any other harness.<p>my fork of oh my pi that i have a lot of experiments in, is lterally designed to only work well with models that have decent reasoning levels, like deep seek models. check it out!<p><a href="https:&#x2F;&#x2F;github.com&#x2F;cartazio&#x2F;oh-punkin-pi&#x2F;blob&#x2F;main&#x2F;scripts&#x2F;build-binary.sh" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;cartazio&#x2F;oh-punkin-pi&#x2F;blob&#x2F;main&#x2F;scripts&#x2F;b...</a> — thats the install script for after clone<p>fair warning: tis my dog food test bed as i build even fancier stuff
  • nikolay16 hours ago
    This is not an agent by DeepSeek, so the title is misleading.
  • yalogin22 hours ago
    Can someone give me a eli5 version of what this is? It really sounds useful to Claude subscribers.<p>Is this improving the cache hit and hence overall efficiency of coding workflows?<p>Does it also let me host a local llm (deepseek)? What are model min requirements for this?
    • timcobb22 hours ago
      You can also ask Claude and get an immediate answer, the power is yours
      • Salgat20 hours ago
        Certainly you realize that these comments exist for more than a single person right? You expect potentially hundreds of viewers to each burn through AI tokens instead of just getting a direct and relevant answer here? This has the same vibe as the old forum posts where the only response was a &quot;google it&quot;.
  • m10121 hours ago
    For those of you that use deepseek v4 occasionally, what harness do you use it with? I’m only familiar with claude code and codex.<p>Any comments on what you can or cannot rely on it for relative to cc and codex would be appreciated too!
    • eikenberry20 hours ago
      Maybe check out Goose. It is the standard agent harness being developed by The Linux Foundation under the AAIF. Under active development and the implementation seems to have a good leg up on the other popular agents.<p><a href="https:&#x2F;&#x2F;github.com&#x2F;aaif-goose&#x2F;goose" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;aaif-goose&#x2F;goose</a><p><a href="https:&#x2F;&#x2F;goose-docs.ai&#x2F;" rel="nofollow">https:&#x2F;&#x2F;goose-docs.ai&#x2F;</a>
      • nsonha16 hours ago
        I see their name mentiod everywere along with Aider, presumably for being among the first agents, but I&#x27;ve never met anyone that actually uses them.
    • droidjj21 hours ago
      Check out pi.dev. OpenCode is a nice batteries-included Claude Code replacement, but I’m in love with the extensibility of Pi.
      • chuckadams20 hours ago
        Any Pi extensions you&#x27;d specifically recommend? I&#x27;m just starting out with Pi, but I&#x27;ve had mixed results with extensions. I&#x27;m using Pi with gemma4 26b locally, so anything that&#x27;s friendly to small local models would be appreciated. I think the only extension I&#x27;m using right now is pi-total-recall.
        • gck120 hours ago
          I think pi wants you to write your own extensions, adapted to your meeds.<p>I haven&#x27;t had a need for any extensions though. Maybe subagents, but I solved that with tmux. For all the rest, I just use &quot;skills&quot;.
  • cloudengineer9416 hours ago
    Quite interesting being Terminal based and the AI skills staying within a file of it&#x27;s own.<p>Will give a go and see how cache behaves
  • fouric22 hours ago
    I don&#x27;t think it&#x27;s particularly effective to create a new coding agent when there&#x27;s existing open-source agents (especially extremely extensible ones like Pi) that already optimize for cache hits, have far larger communities, and work for providers other than Deepseek.<p>I specifically use multiple different models and providers, so this wouldn&#x27;t be useful for me.<p>And it contributes to the problem of each person vibe-coding their own, incompatible, half-baked tool in a space, instead of contributing to a small set of tools and expanding them.<p>It&#x27;d be better to just extend an existing tool.
  • mmarcant20 hours ago
    &quot;byte-stable prefix cache&quot; -- give us your codebase in a way that&#x27;s even EASIER for us to train on.
  • hmokiguess22 hours ago
    Click on the download page, it&#x27;s hilarious. It has a lot of information about the &quot;smart probe&quot; on the download and it&#x27;s a realtime probe you can rerun.<p>That&#x27;s the pinnacle of AI slop over engineered garbage in my opinion. All of that information is noise.
  • singingtoday20 hours ago
    That site does not render correctly on my android. Lots of text on the right breaking the reactive layout.
  • arikrahman19 hours ago
    Saw nix suffixed and was excited a new dotfiles was about to hit the market.
  • quotemstr23 hours ago
    &gt; no reordering, no marker-based compaction<p>Is this <i>really</i> the behavior you want? Yes, doing tool-result clearing and such will blow your cache, but if you do it only occasionally, it&#x27;s still likely a win. Yes, cache hits are good, but not <i>so</i> good that it&#x27;s okay to be profligate with context to preserve those precious, precious KVs.
  • canadiantim1 day ago
    So what&#x27;s best low cost coding agent these days? Kimi 2.6? Qwen&#x27;s latest closed model? Composer 2.5? DeepSeek?
    • bwfan1231 day ago
      In my experience, it is claude-code paired with deepseek-v4. For penny-pinchers like me, I can have long coding sessions with it with no anxiety about the cost. Also, prompting it to what you want and verifying the outputs is more important than the quality of the model. So, I am better off with a cheaper model and taking the responsibility for prompting it and verifying the results.
      • raybb11 hours ago
        How to do connect deepseek to Claude code?
        • qaz_plm3 hours ago
          <a href="https:&#x2F;&#x2F;api-docs.deepseek.com&#x2F;quick_start&#x2F;agent_integrations&#x2F;claude_code" rel="nofollow">https:&#x2F;&#x2F;api-docs.deepseek.com&#x2F;quick_start&#x2F;agent_integrations...</a>
      • esperent23 hours ago
        It&#x27;s obviously much cheaper paying by the token but how does it compare to a codex subscription on cost?
      • epolanski1 day ago
        Can you quantify the actual costs in a week and the use you make?
        • wongarsu23 hours ago
          Not GP, but for my use I&#x27;d estimate $0.10-0.30 per hour of use per agent with DeepSeek v4 Pro
    • passive1 day ago
      I&#x27;ve gone through ~600m tokens in Xiaomi Mimo though Claude, and it&#x27;s been the most effective use of an agent I&#x27;ve had yet. It&#x27;s very capable, but generally not ambitious, picking simple but effective solutions to most problems I give it. Going to write something longer about the experience when I get to a billion tokens.
      • Alifatisk23 hours ago
        I do have my eyes on the coding plan, which is quite generous.<p><a href="https:&#x2F;&#x2F;mimo.mi.com" rel="nofollow">https:&#x2F;&#x2F;mimo.mi.com</a>
      • gandreani1 day ago
        Are you using Mimo 2.5 pro?
        • passive23 hours ago
          Yes. I tried a couple of weeks with non-Pro, and it was pretty good, but I had too many spare tokens, so I switched back to Pro. :)
    • ac291 day ago
      Kimi 2.6 is great. Qwen3.7-max benchmarks similarly but I havent used it yet
    • skeledrew1 day ago
      Seems to be DeepSeek.<p><a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=48237663">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=48237663</a>
    • abalashov23 hours ago
      Although I have little interest in agentic coding, when I do use it, I have found Kimi K2.6 to give Opus-quality output, and have switched entirely to it for pretty much everything.
      • throw1092022 hours ago
        I&#x27;ve used Opus extensively and tried K2.6 on a few projects, and the gap is <i>huge</i>. K2.6 is nowhere <i>near</i> the performance of Opus. That&#x27;s fine because it&#x27;s also far cheaper, but public benchmarks line up with my own personal experience that they aren&#x27;t comparable in terms of intelligence.<p>(that is, different places on the Pareto efficiency graph)
        • abalashov19 hours ago
          No two uses are alike, I suppose. For me, whatever difference is a wash. However, I probably tend to shy away from throwing high-complexity&#x2F;long-horizon tasks at the model.
    • throw1092020 hours ago
      Cursor with Composer 2.5 seems to be competitive with frontier models (Opus and GPT-5.5) for a significant price discount. Benchmarks are gamed, as always, but $0.55&#x2F;task vs $11.02 a task definitely indicates that there&#x27;s <i>some</i> cost advantage.<p><a href="https:&#x2F;&#x2F;cursor.com&#x2F;evals" rel="nofollow">https:&#x2F;&#x2F;cursor.com&#x2F;evals</a>
    • stavros1 day ago
      For me, it&#x27;s by far Deepseek. It&#x27;s many times cheaper than competitors, and about as good as Sonnet 4.6.
      • fouric22 hours ago
        I&#x27;d generally agree about Deepseek being as good as Sonnet - but I have extreme trouble with prompt compliance with V4 Pro in a way that I&#x27;ve never had with Sonnet. I&#x27;ll tell it &quot;find the bug, but don&#x27;t fix it&quot; or &quot;please use this tool I just developed&quot; and it&#x27;ll ignore me a high fraction of the time.<p>It&#x27;s bad enough that I&#x27;m working on guardrails at the harness level because prompting appears to be useless.<p>Do you have the same issue?
        • stavros22 hours ago
          I have Opus make a fairly detailed plan, then Deepseek implements, and GPT reviews. With that setup, I have zero issues, probably because what you mention is handled (the plan keeps it on track and the reviewer catches any issues).<p>Now that you mention it, though, I <i>have</i> seen it do a few things that weren&#x27;t in the plan. The reviewer caught them, though, so they didn&#x27;t cause a problem, and it&#x27;s so cheap that overall it&#x27;s a massive improvement.
          • e2e416 hours ago
            Which CLIs are you using for each of the steps?
            • stavros16 hours ago
              OpenCode for everything: <a href="https:&#x2F;&#x2F;www.stavros.io&#x2F;posts&#x2F;how-i-write-software-with-llms&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.stavros.io&#x2F;posts&#x2F;how-i-write-software-with-llms&#x2F;</a>
              • e2e414 hours ago
                thank you; will read your post
    • lostmsu1 day ago
      Just use codex with 5.5 on low reasoning levels
  • adi_kurian19 hours ago
    There is an uncanny valley effect to websites where FE is created in full via an AI.<p>These sites have the immediate scent of &#x27;high design&#x27;, with errors that no &#x27;high designer&#x27; would dare make.<p>The italics give me nausea. Text promoted with orange fill is seemingly random. There is no thought behind the combination of art and copy. Random smattering of Title Case and Sentence case and lower case. A lack of commitment to a full stop Widowed H1s. H1s with random spaces .<p>At the same time, if I hammer CMD - to 25%, it looks fancy. Perhaps nobody gives a fuck.<p>That said, I&#x27;m excited to try this tool!
  • am17an22 hours ago
    This Claude front end skill is now soon to be slop.
    • auggierose21 hours ago
      Oh, I was wondering why all new websites look shitty in the same way.
      • aratahikaru526 minutes ago
        Not a maintainer, but I&#x27;ve fixed some of the really jarring issues on desktop (mobile needs a complete overhaul though). IMO It&#x27;s not that bad, and it gets the job done.<p>Any feedback on how to make it less &quot;shitty&quot;? I feel like doing some vibe coding tonight.
    • ricardobeat22 hours ago
      Already is. Every new website looks exactly the same.
  • jedisct117 hours ago
    It&#x27;s probably good, and the best for Deepseek models, but do we really need one harness per model?
  • Hfuffzehn22 hours ago
    This is really tickling the conspiracy theorist part of my brain.<p>&quot;Independent open-source project · not affiliated with DeepSeek&quot; &quot;Reasonix only targets DeepSeek because...&quot; &quot;Why DeepSeek only? Can I swap to Claude &#x2F; GPT? It&#x27;s a design choice, not a limitation&quot;<p>The lady doth protest too much, methinks?<p>Nicely timed shortly after the making the rebate permanent anouncement.<p>Could just be Chinese devs trying to help western devs with some software and a western facing marketing campaign to raise awareness. Could be DeepSeek astroturfing. Could be &quot;someone&quot; in China trying to get more access to western data.<p>Who knows?
  • andai22 hours ago
    But Claude made the website?
    • Alifatisk21 hours ago
      What conclusion are you drawing from that?
      • andai19 hours ago
        If Deepseek can&#x27;t even make a static site, why would I want to use it for anything else? (Not saying it can&#x27;t, just that it&#x27;s a weird choice to present your Deepseek-oriented product.)
        • Alifatisk17 hours ago
          I see your point, but as we know, devs from Google and OpenAI regularly use Claude Code because of its edge in frontend. I think using another model to build your own thing is a pragmatic engineering decision, not a sign of failure.
  • sergiotapia1 day ago
    What AI model did you use for the website design? This is the second one I see with the exact same font and color scheme. Just curious because Claude models lean towards purples for example. Thank you!
    • pcwelder23 hours ago
      Opus 4.7 selects such palette and motifs by default. Might even be first iteration of claude design.
    • franga20001 day ago
      This design still screams Claude to me, but a newer version than what you&#x27;re thinking of. At some point they added a markdown file that tells it to use obviously AI designs like lots of blue&#x2F;purple and gradients. Since then, this is its new style.
    • sheepscreek23 hours ago
      DeepSeek v4 perhaps?
    • FergusArgyll23 hours ago
      Frontend design skill by Anthropic specifically says not to use purple. I&#x27;d be surprised if it still uses purple. Have you seen that recently?
  • tw198410 hours ago
    deepseek is building an official coding harness, why would anyone waste time on such 3rd party toy when official one is coming probably in weeks?
  • sunaookami19 hours ago
    Another day, another vibeslopped &quot;product&quot; on the front page of hacker news with over 200 points. When will you guys learn?
  • ankitwarbhe21 hours ago
    you created it yourself ?
  • treexs15 hours ago
    codex generated sites are so easy to spot lmao
  • WhereIsTheTruth20 hours ago
    Y&#x27;all should not be writing js&#x2F;ts&#x2F;slop&#x2F;npm based clis anymore<p>It&#x27;s the agentic era, pick a better option<p>Just stop
    • Alifatisk20 hours ago
      Whats that option?
    • fHr18 hours ago
      yep codex opensource rust cli clears this night and day long
  • MultiAgt46 minutes ago
    [flagged]
  • claud_ia5 hours ago
    [flagged]
  • mevinbuilds6 hours ago
    [flagged]
  • antegugga1 hour ago
    [dead]
  • dahuangf8 hours ago
    [flagged]
  • codepack11 hours ago
    [flagged]
  • aplomb102621 hours ago
    [flagged]
  • embirdating5 hours ago
    [dead]
  • the_mitsuhiko1 day ago
    [dead]
  • benjiro300021 hours ago
    [dead]
  • grekares8 hours ago
    [dead]