Incident March 30th, 2026 – Accidental CDN Caching

(blog.railway.com)

45 points by cebert 5 hours ago

9 comments

varun_chopra 4 hours ago
The status page [1] has the actual root cause (enabling "Surrogate Keys" silently bypassed their CDN-off logic). The blog post doesn't. That's backwards. "0.05% of domains" is a vanity metric -- what matters is how many requests were mis-served cross-user. "Cache-Control was respected where provided" is technically true but misleading when most apps don't set it because CDN was off. The status page is more honest here too: they confirmed content without cache-control was cached. They call it a "trust boundary violation" in the last line but the rest of the post reads like a press release. No accounting of what data was actually exposed. [1] https://status.railway.com/incident/X0Q39H56
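To illustrate the point about apps that never set Cache-Control: one defensive option on the application side is to send an explicit header on every response, so the app does not depend on the platform's CDN setting. The following is a minimal sketch, assuming a WSGI app; `default_no_store`, `demo_app`, and `call` are hypothetical names for illustration and are not part of Railway's platform.

```python
def default_no_store(app, default="private, no-store"):
    """WSGI middleware: add Cache-Control only when the app set none."""
    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            names = {name.lower() for name, _ in headers}
            if "cache-control" not in names:
                # The app was silent about caching, so say "do not share this".
                headers = list(headers) + [("Cache-Control", default)]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start_response)
    return wrapped


def demo_app(environ, start_response):
    # Hypothetical app that sets no caching headers at all.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def call(app):
    """Invoke a WSGI app once and return its response headers as a dict."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    list(app({"REQUEST_METHOD": "GET", "PATH_INFO": "/"}, start_response))
    return captured


headers = call(default_no_store(demo_app))
print(headers["Cache-Control"])
```

A header the app sets itself is left untouched, so routes that are deliberately public can still opt in to caching.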
stingraycharles 4 hours ago
This write up doesn’t make sense. Authenticated users are the ones without a Set-Cookie? Surely the ones with the cookie set are the authenticated ones? There are dozens of contradictions, like first they say: “this may have resulted in potentially authenticated data being served to unauthenticated users” and then just a few sentences later say “potentially unauthenticated data is served to authenticated users” which is the opposite. Which one is it? Am I missing something, or is this article poorly reviewed?
- justjake 4 hours ago
 Fixed the typo in that second paragraph and aligned the section on the Set-Cookie stuff. Anything else that can be made more clear?
 - DrewADesign 3 hours ago
 It appears that your company experienced an incident during which a blog entry was made available in which readers became informed about certain information about a server condition that resulted in certain users receiving a barrage of indirect clauses etc. etc. etc. Be more direct. Be concise. This blog post sounds like a cagey customer service CYA response. It defeats the purpose of publishing a blog post showing that you’re mature, aware, accountable, and transparent.
 - codechicago277 3 hours ago
 The problem is that these visible errors make us wonder what other errors in the post are less visible. Fixing them doesn’t fix the process that led to them.
 - slopinthebag 3 hours ago
 I'm pretty sure it's AI. https://x.com/JustJake/status/2007730898192744751 I wouldn't be surprised if most of Railway's infra is running on Claude at this point.
 - antics 3 hours ago
 The CEO says it's not: https://x.com/JustJake/status/2038799619640250864 A lot of people are confident enough in their ability to spot AI infra that they are willing to dismiss a firsthand source on this, and I admit I have no idea why. There isn't any upside to making this claim, and anyway, I assure you that people need no help at all from AI to make these kinds of mistakes.
 slopinthebag 1 hour ago
 Their reply doesn't make much sense, they're supposedly soc2 compliant. How are they compliant but letting a single engineer push out a change like that? I'm sure Claude didn't literally ship the feature itself with no oversight, but I also find it hard to believe that their approach to adopting AI didn't factor in at all. Even just like, the mental overhead of moving faster and adopting AI code with less stringent review leading to an increase in codebase complexity could cause it. Couple that with an AI hallucinating an answer to the engineer who shipped this change, I'm not sure why people are so quick to discount this as a potential source of the issue. Surely none of us want our infra to become less secure and reliable, and so part of preventing that from happening is being honest about the challenges of integrating AI into our development processes.
 antics 29 minutes ago
 > I'm not sure why people are so quick to discount [AI] as a potential source of the issue.
 Because (per the link above) the CEO said that (1) it was their fault, and (2) it had nothing to do with AI. I understand that on this forum statements like this are inevitably greeted with some amount of skepticism, but right now I'm seeing no particular reason to disbelieve Jake, and the reason that "if they did use AI they'd deny it" should frankly not be considered good enough to fly around here. Like probably everyone in this comment section I'm open to evidence that they used AI to slop-incident themselves, but until we can reach that standard let's please calm down and focus on what we actually know to be true.
 - stingraycharles 2 hours ago
 It's fine they use AI, it's not fine they don't proofread things.
heyethan 1 hour ago
Caching is one of those systems that works perfectly — until it amplifies the wrong thing. Feels like you have to define what’s safe to cache before optimizing for speed.
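One way to read "define what's safe to cache" is as an allowlist: a shared cache stores a response only when the origin explicitly opted in, and refuses everything else. The following is a minimal sketch under that assumption. `parse_cache_control` and `is_safe_to_cache` are hypothetical helpers, the policy is deliberately stricter than the HTTP caching rules require, and it is not a description of Railway's actual logic.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip()] = arg.strip().strip('"')
    return directives


def is_safe_to_cache(request_headers, response_headers):
    """Return True only when the origin explicitly allowed shared caching."""
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}

    # Anything tied to a credential or a session is never shared.
    if "authorization" in req or "set-cookie" in resp:
        return False

    cc = parse_cache_control(resp.get("cache-control", ""))
    if not cc:
        # No Cache-Control at all: treat silence as "do not cache".
        return False
    if cc.keys() & {"private", "no-store", "no-cache"}:
        return False

    # Require an explicit positive lifetime; s-maxage wins for shared caches.
    ttl = cc.get("s-maxage") or cc.get("max-age")
    return ttl is not None and ttl.isdigit() and int(ttl) > 0


print(is_safe_to_cache({}, {"Content-Type": "text/html"}))
print(is_safe_to_cache({}, {"Cache-Control": "public, max-age=60"}))
```

The key design choice is the default: a response with no caching headers is rejected, which is the opposite of the behavior the status page describes.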
rileymichael 3 hours ago
pretty hard to find this on their blog, looks like incidents are tucked away at the bottom. an issue of this size deserves a higher spot. (also looks like two versions of the 'postmortem' are published at https://blog.railway.com/engineering)
sebmellen 4 hours ago
Almost three years ago now, Railway poached one of our smartest engineers. They were smart to do so. I have a lot of respect for the Railway team and I’m impressed with their execution. I think this is their first major security incident. Good that they are transparent about it. If possible (@justjake) it would be helpful to understand if there was a QA/test process before the release was pushed. I presume there was, so the question is why this was not caught. Was this just an untested part of the codebase?
muragekibicho 3 hours ago
Does Stripe use Railway? The dashboard was down today and this is the only incident report I've encountered and the timeline matches Stripe's downtime.
sublinear 4 hours ago
I'm curious if having unique URLs per user session would mitigate this. I think that's already best practice in most API designs anyway?
- kay_o 2 hours ago
 Probably. No, it isn't. I've not seen this in an API ever, only in webapps with ?phpsessid= back in childhood
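A per-session URL works, to the extent it does, because the session ends up in the cache key. The same effect can be sketched without changing URLs by scoping the key to the session. This is only an illustration: `cache_key` is a hypothetical helper, and nothing in the thread confirms how the CDN involved builds its keys.

```python
import hashlib


def cache_key(method, url, session_id=None):
    """Build a cache key scoped to one session, or to 'anon' when there is none."""
    if session_id is None:
        scope = "anon"
    else:
        # Hash the session id so the raw token never appears in cache metadata.
        scope = hashlib.sha256(session_id.encode()).hexdigest()[:16]
    return f"{method.upper()} {url} scope={scope}"


cache = {}
cache[cache_key("GET", "/account", "session-alice")] = "alice's page"

# A different session asking for the same URL misses instead of reading Alice's entry.
print(cache.get(cache_key("GET", "/account", "session-bob")))
print(cache.get(cache_key("GET", "/account", "session-alice")))
```

The cost is that entries are no longer shared between users, so this mostly removes the benefit of a shared cache for those routes.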