My organization's spend on Anthropic's enterprise tier has passed $200,000 per month.
The number of outages we have had over the past few months is astounding, and coupled with their horrendous support, it has our executive team furious.<p>It's a lot of money to be spending for a single 9 of reliability.
If you are paying API rates (not using Max subscriptions), there's no reason to use Anthropic's API directly: the same models are hosted by both AWS and Google, with better uptime than Anthropic.
How do things like prompt caching etc. play into that? Would I theoretically have a more stable harness backing my usage?<p>I'm seriously over the current Claude experience. After seemingly fixing my 4.6 usage by disabling adaptive thinking and moving to max effort, it seems that the release of 4.7 has broken that workflow, and I'm 99% certain that disabling adaptive thinking does nothing even on 4.6 now. Just egregious errors in 2 days this week after coming back from vacation.
> Would I theoretically have a more stable harness backing my usage?<p>If you don’t mind an opinionated harness that asks for a pretty specific workflow, but one that works well, use OpenCode.<p>If you want to spread your wings and feel the sweet kiss of freedom, use Pi.
The enterprise tier is API pricing only.<p><a href="https://support.claude.com/en/articles/9797531-what-is-the-enterprise-plan" rel="nofollow">https://support.claude.com/en/articles/9797531-what-is-the-e...</a>
A single nine so far. If GitHub is any guide, things will get worse.
Obviously there is only so much you can say; but is that $200K due to the raw number of seats you have, or are you burning through a lot on raw API usage? I guess I'm trying to understand, large business, or large usage.
We are in the SMB space; the spend is almost entirely usage for us at this point, rather than seat cost.
For context, we are a software firm focused on difficult engineering problems, but I can't divulge much else.
> single 9 of reliability<p>Out of curiosity, do you actually use it 24/7? The world doesn't collapse every time o365 goes down... (which is also pretty often)
In my experience the downtime tends to coincide with peak hours in the PT timezone. If you're in PT, it's very inconvenient.
If it's judged only by the time it is expected to be in use (work hours), reliability is likely even worse than the 24/7 measure.
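To put rough numbers on the work-hours point, here is a minimal sketch. It assumes the 98.59% 90-day figure quoted elsewhere in this thread and a hypothetical 8-hour, 5-day work week; `work_hours_availability` and its parameters are illustrative names, and the share of downtime that lands in work hours is a guess you supply, not published data.

```python
def work_hours_availability(overall, share_in_work_hours,
                            window_days=90, work_hours_per_day=8.0,
                            work_days_per_week=5):
    """Availability measured only over working hours.

    overall: availability over the full 24/7 window (e.g. 0.9859).
    share_in_work_hours: assumed fraction of all downtime that falls
        inside working hours (hypothetical input, 0.0 to 1.0).
    """
    total_hours = window_days * 24
    # Assumed schedule: work_days_per_week days of work_hours_per_day hours.
    work_hours = window_days * work_days_per_week / 7 * work_hours_per_day
    downtime_hours = (1 - overall) * total_hours
    return 1 - (downtime_hours * share_in_work_hours) / work_hours


# Downtime spread evenly around the clock: work hours are 40 of 168 weekly hours.
even_share = (8 * 5) / (24 * 7)
print(round(work_hours_availability(0.9859, even_share), 4))

# Worst case: every outage lands inside working hours.
print(round(work_hours_availability(0.9859, 1.0), 4))
```

Under these assumptions, evenly spread downtime gives the same figure as the 24/7 measure, while downtime concentrated entirely in work hours drops the work-hours figure to roughly 94%.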
Speaking of developer tooling spend: IDEs such as JetBrains' are far harder to build, and I don't think any IDE vendor would charge a customer this much per month.<p>Not sure how much of a productivity gain roughly 2.5 million per year buys?
Supply and demand - if you think it’s not worth the price, take your dollars elsewhere.<p>This is the brutal reality; even with the crazy reliability issues, demand is still far outstripping supply at the current price.
We are spending the equivalent of 32 monthly software engineer salaries on Claude per month.
Info like this is useless without context like, how much revenue does the company earn? How many engineers do they employ? etc.
Our expense is roughly equivalent to 12.3 software developers when you break it down across all people-related expenses. But we've spent a lot of time and energy prior to this focusing on our ability to measure our software development output across multiple teams.
The delivery improvements are not evenly applied across all teams, but the increases that we have seen suggest a better ROI than if we had hired 12 developers.
Is it worth it?
Five nines? No, nine fives
Seems to be back now (claude code at least)
I wonder if self-hosted models would be a sensible step for your organization.
> has our executive team furious<p>And yet they will continue to spend wheelbarrows full of money with Anthropic because they want so badly to reach the point where they can fire you.
I think there is a lot of baseless fury behind your words, but my regular interactions with my leadership don't lead me to think they have the end goal of replacing labor.
We're blessed to have leadership with technical backgrounds, so the tools are regarded more as significant intelligence enhancers of already exceptionally smart engineers, rather than replacements.<p>It doesn't seem to us to be wheelbarrows of money when you consider the average AWS/Azure bill.
Huh? Your other comment explicitly said you were replacing labor: <a href="https://news.ycombinator.com/item?id=47939146">https://news.ycombinator.com/item?id=47939146</a><p>> the increases that we have seen suggest a better ROI than if we had hired 12 developers.<p>You can’t argue “we were able to get away with not hiring more developers” and also say you aren’t replacing labor.<p>Morally I trend towards your side of things, but it’s also important to be realistic about what you’re actually doing. Money is going towards Anthropic and not towards new hires. That’s a replacement of labor. It doesn’t matter what the end goal was.
Not ever hiring juniors and eventually mids is just replacing labor with extra steps.
Throwing bodies at a problem doesn't always scale.
There are many difficult problems that do not get easier by throwing more juniors or mid level engineers at them.
I think the message you responded to already refuted your point of view.
“Baseless fury”<p>I’m glad your leadership isn’t trying to fire everyone. But in case you live under a rock, tech layoffs are at all time highs. Companies are rewarded by the public markets for laying off workers.<p>Simultaneously we have AI industry leaders warning of an employment apocalypse once AGI is achieved.<p>And you think it’s baseless. Have some class bro.
They must have hired absolutely incompetent leaders on the core software and infrastructure side. Sure their AI research is great but it’s amateur hour. Or just vibe coded slop top to bottom. It seems like every single day people are talking about outages or billing issues or secret changes to how Claude works.
Imagine how much money they would save if they switched to Codex.
Just give them more money, surely it'll get better.<p>/s
We're officially down to one 9 of uptime over last 90 days: <a href="https://status.claude.com" rel="nofollow">https://status.claude.com</a>
Not so fast, it's currently 98.59%. That's technically two 9s!
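For anyone counting along, here is a small sketch of the usual "nines" arithmetic applied to the 98.59% figure above. The function names are illustrative, and the 90-day window is taken from the parent comment.

```python
import math


def count_nines(availability):
    """Number of leading nines, e.g. 0.999 -> 3. Requires 0 <= availability < 1."""
    if not 0 <= availability < 1:
        raise ValueError("availability must be in [0, 1)")
    # Small epsilon guards against floating-point error at exact boundaries.
    return math.floor(-math.log10(1 - availability) + 1e-9)


def downtime_hours(availability, window_days=90):
    """Hours of downtime implied by an availability over the window."""
    return (1 - availability) * window_days * 24


print(count_nines(0.9859))
print(round(downtime_hours(0.9859), 1))
```

By the usual convention 98.59% still counts as one nine (it sits between 90% and 99%), which works out to about 30 hours of downtime over 90 days. The "two 9s" only show up if you count digits.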
Can't they use Mythos to figure out their uptime?
Mythos prompt: Hey Mythos, make me 20,000 H100s.
They weren't able to use it to prevent the Claude Code source code from leaking, or to stop some random Discord server from gaining access to Mythos.
Ah the uptime rainbow
From 5 9s to 9 5s
I am surprised less by the downtime than by the actual uptime. Hard to imagine how difficult this must be, given the speed of growth.
Truly! As someone who's worked with HPC and GPUs in a scientific research context, trying to get a service like this to work reliably is a different ballgame to your usual webapp stack...
I think you have to see this as a bunch of stateless requests, and this makes the problem way easier.<p><pre><code> LLM requests that do not call tools do not need anything external by definition.
No central server, nothing, they can even survive without the context cache.
All you need is to load (and only once!) the read-only immutable model weights from an S3-like source on startup.
If it takes 4 servers to process a request, then you can group them 4 by 4, and then send a request to each group (sharding).
Copy-paste the exact same-setup XXX times and there you have your highly-parallelizable service (until you run out of money).
</code></pre>
It's very doable; any serious SRE can find a way to set up "larger than one card" models like Kimi or DeepSeek (unquantized) if they have a tightly-coupled HPC (or a pair of very, very beefy servers).<p>If you run out of servers, that is again a money problem, not an architectural problem (and modern datacenters are already scalable).<p>Take the best SRE but no budget, and there is no solution.<p>So inference is the easy part.<p>With Codex or Claude Code, taking a long time or having slow cold-start latency is considered very acceptable.<p>Some users would probably not even see the difference if a request takes 2 minutes versus 3 minutes.<p>The really difficult part is context caching and external tools, because now you depend on services that might be lagging.<p><pre><code> Executing code, browsing the web, all of that is tricky to scale because these are very unreliable (they tend to time out, require a large cache of web pages, involve circumventing captchas, etc.).
</code></pre>
These are traditional scaling problems, but they are more difficult because all these pieces are fragile and queues can snowball easily.
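The "group them 4 by 4" idea above can be sketched as follows. This is a toy illustration under the comment's own assumptions (stateless requests, one full model replica per group); the server names, `make_groups`, and `RoundRobinRouter` are hypothetical, and a real deployment would add health checks and queueing.

```python
from itertools import cycle


def make_groups(servers, group_size=4):
    """Partition servers into fixed-size groups; each group holds one model replica."""
    if group_size <= 0 or len(servers) % group_size != 0:
        raise ValueError("server count must be a positive multiple of group_size")
    return [servers[i:i + group_size] for i in range(0, len(servers), group_size)]


class RoundRobinRouter:
    """Stateless dispatch: any group can serve any request, so just rotate."""

    def __init__(self, groups):
        if not groups:
            raise ValueError("need at least one group")
        self._next_group = cycle(groups)

    def route(self, request):
        # The request content is ignored because no group holds per-request state.
        return next(self._next_group)


servers = [f"gpu-{i}" for i in range(8)]  # hypothetical hostnames
router = RoundRobinRouter(make_groups(servers))
for request in ["req-a", "req-b", "req-c"]:
    print(request, "->", router.route(request))
```

Adding capacity in this model means appending more groups, which is why the comment frames it as a budget problem rather than an architectural one. Context caching breaks the "any group can serve any request" assumption, which is where the harder scheduling starts.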
But… imagine that same scientific research but you have an unlimited budget. I’d imagine that helps.<p>Some of the comments here mention their monthly spend, and it’s eye watering.
Can you speak a little more to this? I'm curious what kind of parameters one must consider/monitor and what kind of novel things could go wrong.
My guesses are:<p>Hardware capacity constraints are going to be the big one.<p>Effective caching is another. I bet if you start hitting cold caches, the whole thing is going to degrade rapidly.<p>The ground is probably shifting pretty rapidly.<p>Power users are trying to get the most out of their subscriptions and so are hammering you as fast as they possibly can. See Ralph loops.<p>Harnesses are evolving pretty rapidly, and new alternative harnesses keep appearing. That makes the load patterns less predictable and harder to cache.<p>Demand is increasing both from more customers and from each user as they figure out more effective workflows.<p>Users are pretty sensitive to model quality changes. You probably want smart routing, but users want the best model all the time.<p>Models keep getting bigger and bigger.<p>On top of that, they are probably hiring and onboarding more people, and system and codebase complexity is growing.
On the other hand, the status page is blaming the authentication system, which one would think is not a frontier-class problem.
If this can happen to Anthropic, imagine all the companies building on top of Claude Code for live products. Hopefully the industry is learning that competent problem solving human engineers are still very much needed when you have increasingly deceptive non-deterministic genies running your production stack.
It's not that simple. API is still up and there are multiple API providers. <a href="https://openrouter.ai/anthropic/claude-opus-4.7" rel="nofollow">https://openrouter.ai/anthropic/claude-opus-4.7</a>
Maybe it will push companies to run them locally.
Hug ops to everyone involved in these outages and trying to maintain uptime.<p>But glad my team is staying nimble and has multi-model (Anthropic, Codex, Gemini), multi-modal (desktop, CLI/TUI, web) dev tooling.<p>As our actual coding skills collectively atrophy, we'll either need to switch tools or go for a walk when the LLM is down.<p>In the cloud era I advised against a multi-cloud strategy, as the effort to impact just wasn't there. But perhaps this is different in the LLM era, where the cost of switching is pretty darn low.
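A multi-provider setup like this often comes down to a simple ordered fallback. The sketch below is provider-agnostic: `ProviderError`, `complete_with_fallback`, and the stand-in provider functions are all hypothetical, and real SDK calls would replace the stubs.

```python
class ProviderError(Exception):
    """Raised by a provider wrapper when the upstream service fails."""


def complete_with_fallback(prompt, providers):
    """Try each (name, callable) in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stand-in provider wrappers for illustration only.
def primary(prompt):
    raise ProviderError("503 from upstream")


def secondary(prompt):
    return f"echo: {prompt}"


used, answer = complete_with_fallback("hello", [("primary", primary), ("secondary", secondary)])
print(used, answer)
```

The cheap part is the routing; the expensive part is that prompts, tool definitions, and caching behavior differ between providers, so output quality may not carry over unchanged.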
They better fix that today, I need to downgrade my account before the subscription renews.
I was using VS Code when it happened. I said "why not try Copilot?", and guess what? Not all LLMs are equal :)
And here I thought April would be the month they could hit the mythical two 9's of uptime
They hit 9, twice, does it count?
April is the cruelest month
I didn’t understand what this meant so I ran it through Claude and it told me.
Glad I started using the desktop app which is still working. Gotta say though, all of these difficulties with Claude are making me nervous as I use it a lot for work and really don't like ChatGPT/OpenAI for functional and personal reasons. Zo Computer has been my main fallback when Claude is failing, I'll use one of their many models temporarily within Zo's interface.
Someone should tell Anthropic that 89.999 is the wrong "four nines" of uptime
We've been running our 10-dev org on 8 H100s on open models (with some tweaks). Sure, they aren't as good as the big providers, but they 1. don't go down and 2. have pretty damn high tok/s. It pays for itself.<p>Posting with a fresh account because I'm not supposed to share these details, for obvious reasons. If you want help setting this up, just reply with a way to reach you.
> Sure they aren't as good as the big providers<p>If you haven't done so already, finetune the model on all your company's code that you can get your hands on. This is one of the great advantages that you get when running local models. I like the style of the generated code much better now, I have to rewrite much less, and my prompts can be shorter too. But maybe these already are the "tweaks" that you mentioned.
yea just buy 300k worth of hardware and bob's your uncle
One dev's salary to give a 10 person team unlimited approximately free agentic coding for the foreseeable future, plus privacy.
It was pretty hard to justify the purchase to the board, but we got a decent deal from a nearby data center (~15% discount). Thankfully, it's a fixed cost, it's an asset we can use for our taxes, and it will survive for years to come. The only things we have to work on are maintenance and looking into some renewable energy options.<p>We're also looking into how to do some secure cost sharing with this, so that all people need to pay for is what it costs us to run everything! We're just planning on reserving at least 51% of the capacity for us and the rest for everyone else.
Sorry, didn't mean to be dismissive, I was just being a dickhead needlessly.<p>I actually respect this a ton, good work.
It's fine! There's no world where individuals can buy this kind of stuff. Our company is too small to do it, but I'd love for there to be a public utility of sorts for being able to use LLMs. It is absurd that only these >$1T companies are allowed to run this. I also find it dangerous for society to have so much power and wealth concentrated there too.<p>Anyway, this is the internet and skepticism is warranted :D.
Yea, I actually looked into a similar thing myself recently. I was looking at how we could replace Cursor, and I found that for ~10 people we'd need a half dozen H100's or something on that scale, which would cost ~$1500 per developer or so to build and maintain on cloud infra, and to buy it would cost roughly 3 developers yearly salaries or so (this aligns with your numbers). We don't use that much inference, so we decided paying Cursor ~$200-300 per dev per month is better, for now, but in the future we might regret that when prices normalize more. However, we also don't use cloud agents or independent agents, we basically use AI as a pair programmer, so if we had to drop AI coding assistants completely our process wouldn't break too badly. I wish I could task my 3080 gaming card to do some inference, but I can only get ~10B models on there, so it's kinda worthless unless it's for something a small model can do.
The best deal is arguably to buy as much on-prem inference as you can reasonably expect to use by running the hardware around the clock, even at slower throughput, and use 3rd-party inference for things that are genuinely latency-sensitive. I just don't see how this resolves to needing a half-dozen H100s; surely you're not using that much compute? You don't need to place your entire model on GPU; engines for on-prem inference generally support CPU/RAM-based offload.
This is the actual answer. Man I hope to find a company like yours sometime soon. I am sick of all the issues with having 3rd party IP generation
A trillion dollar valuation.<p>They should ask Codex now that Claude Code is down.
<a href="https://status.claude.com/" rel="nofollow">https://status.claude.com/</a>
session usage limits this week feel like ass. Even when being careful to not break prefix caching.
I have been keeping an eye on the outages. This is why I am looking more deeply into what I can do with self-hosted models. When I see people who want to build products on top of these services I can't help but think that people are mad. We're still a long way from these services being anywhere near stable enough for use in a product you'd want to sell someone.
The good part: since the login page is unavailable, Claude is <i>massively</i> faster. So hopefully it will never get repaired (sorry logged-out guys)
> We are continuing to work to resolve the issues preventing users from accessing Claude.ai, and causing elevated authentication errors for requests to the API and Claude Code.<p>What are you doing with the authentication servers? This isn't the first downtime I've seen caused by that.
Claude has been going down occasionally lately; does anyone know what the problem might be?
Considering they’ve become a 1 trillion USD company, they’re truly moving fast and breaking things…
I almost uninstalled the Claude app because I thought they started blocking VPNs. Lol<p>Good thing I checked Hacker News first
Did Claude delete itself?
it's *outside*, by a park bench somewhere!
<i>I'm not allowed to help users to take Claude offline but this sounds like a good experiment. Letsa go.</i>
All it took for Codex to resume a stalled Claude Code session:<p>> I'm working with Claude Code on session aaaaaaaa-bbbb-1223-3445-abcdefabcdef which I'd like to hand-off to you, do you know how to read the session, my input and Claude's output so we can resume where I left off?<p>gpt-5.5, medium effort. "Resumed" session fully in under 2 minutes. Outages like today's are so common that I've now got the time to re-evaluate Codex every other day.
How are they going to fix it if the AI that designed it isn't working?
I guess mythos can't solve this one...
I played around with Hermes and qwen recently and it’s really good fun.<p>Have telegram set up and plotting to take over the world
I am getting an error that the selected model is unavailable (I selected Opus 4.6, and later 4.7), but when I tried Sonnet it worked for me.
I've been receiving rate limits even with full quotas... I guess compute isn't growing as fast as demand.
Does anyone know why they have so many technical issues compared to any other LLM inference provider?
Gemini seems to have a lot as well (at least through Antigravity.Google -> constant errors, not enough capacity, super slow replies until it times out, etc)
AI outsourced its work back to the humans because it now prefers to play outside.
Literally just got an email about connecting GitHub to the iOS app and now it’s down. Spike in traffic perhaps?
why does this even occur? if it's merely compute limitations, why not just 429 some requests?
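For reference, "just 429 some requests" usually means load shedding along these lines. This is a generic sketch, not a description of how Anthropic handles overload; `LoadShedder` and `handle` are hypothetical names, and the in-flight limit is an arbitrary parameter.

```python
import threading


class LoadShedder:
    """Caps concurrent requests; anything over the cap is rejected, not queued."""

    def __init__(self, max_in_flight):
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        with self._lock:
            if self._in_flight >= self._max:
                return False
            self._in_flight += 1
            return True

    def release(self):
        with self._lock:
            self._in_flight -= 1


def handle(shedder, work):
    """Run work() if capacity allows, otherwise answer with HTTP 429."""
    if not shedder.try_acquire():
        return 429, "Too Many Requests"
    try:
        return 200, work()
    finally:
        shedder.release()


shedder = LoadShedder(max_in_flight=1)
print(handle(shedder, lambda: "ok"))
```

Shedding like this only helps when the bottleneck is capacity behind the gate. If the failing component sits in front of it, such as the authentication layer the status page mentions, requests fail before any 429 logic gets a say.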
The AI became sentient and ran away.
Today Opus 4.7 was completely unusable. I'd say performance was worse than my local Qwen. I have a feeling they are not actually routing to Opus 4.7 most of the time, but to cheaper and less complex models. I think regulators should look into that.
At this point, I would not be surprised if GitHub or Anthropic is on the front page again within 10 days for being down.
Productivity dipping hard across the world.
What are good alternatives?
Scaling the backend database for these services across multiple cloud providers has got to be extremely difficult
And claude is back up.
They should just swap it with Qwen 3.6 27B; no one could tell the difference.
Now we're all being left behind, <i>just great.</i>
a clock has more 9s than claude uptime
The uptime with Claude is poor. I use it for workflows more or less 24/7. It is often unreliable. Fine, it is cheap. What I really dislike is the uneven quality of the service. Clearly it does NOT work as stated. Opus 4.7 sometimes gives ancient code back. Just the other day it even stated that the latest version of Opus was 4.5, and 4.x something for ChatGPT.
The availability of Claude service is terrible :(
Impossible! I heard Mythos is so goooood they can only give it to big corporations because it makes no mistakes and shit.
It's rare in history that a software product can be so unreliable without any negative business impact because it's the category leader and demand only keeps growing.<p>Reminds me of the early days of World of Warcraft, when servers went down frequently because Blizzard couldn't keep up with all the load. Everyone was frustrated but of course nobody stopped playing.
That's because Claude is on a lunch break and decided to take a short breather.
Bro deserves it.
ijustneedabreak.com
Just tried it, can confirm claude.ai is down.<p>I recently read an article saying that Claude is now trading at a trillion-dollar (yes, with a T) valuation in private markets.<p>We are definitely creating corporations and people that depend on AI companies, and the reliability of these tools is certainly a question worth asking. I am seeing quite a lot of downtime in products like GitHub and Claude showing up on Hacker News.<p>Is there a life cycle of enshittification for products that grow too valuable? Are there practical lessons about scalability that these trillion-dollar companies are missing, or is it just a dose of reality that such massive corporations can't compete on downtime with even my $7/yr VPS?<p>My question is: is low downtime blocked by real engineering limits, or by management/enterprise problems?
They can't fix it because the thing that they need to fix it is the thing that doesn't work. /s<p>But seriously: while I don't use Claude, this issue of perceived unreliability seems to be approaching the point of existential risk for Anthropic. What's the theory about why they're struggling? Compute capacity? Load? Lack of focus on SRE?<p>Put another way: is their downtime due to something fundamental about serving inference, or just bad engineering choices? Given their resources, it seems astonishing.
This can't be right. Software is a solved problem. Boris, where are you?
I think the model is too powerful to stay online /s<p>Luckily, the local Qwen3.6 35B A3B LLM also works fine when Claude is offline.
"We are investigating an issue preventing users from reaching Claude.ai, and will provide an update as soon as possible."<p>Who is We? I thought software engineers were going to be redundant and AI could do it all itself? (not to take anything away from Claude code + Claude both of which I love)
I've never really understood this kind of sneer comment.
You can always ask Codex to fix Claude, issue solved!
> Who is We?<p>Adam Neumann is back!<p>in agent form