Automatic version detection, revalidation, prewarming... caching seems so complicated these days. Forgive me for starting a sentence with "why don't we just"... but why don't we just use the hash of the object as the cache key and be done with it? You get integrity validation as a bonus.<p><pre><code> <link rel="stylesheet" href="main.css?hash=sha384-5rcfZgbOPW7..." integrity="sha384-5rcfZgbOPW7..."/>
ETag: "sha384-5rcfZgbOPW7..."
Cache-Control: max-age=31536000, immutable</code></pre>
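For what it's worth, wiring that up is only a few lines at build time. A minimal sketch, assuming a Node build script; the file name and output shape are made up:<p><pre><code>import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hypothetical build step: hash the asset once, then reuse the digest as the
// cache-busting query param, the SRI integrity value, and the ETag.
const css = readFileSync("main.css");
const digest = "sha384-" + createHash("sha384").update(css).digest("base64");

const linkTag = `<link rel="stylesheet" href="main.css?hash=${digest}" integrity="${digest}"/>`;
const headers = {
  ETag: `"${digest}"`,
  "Cache-Control": "max-age=31536000, immutable",
};</code></pre>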
Sure, but where's the fun in that? Then you wouldn't be able to write "we <i>architected</i> a caching layer"! To their credit, at least this isn't the actual title of the article, but it still left me wondering whether an actual architect (you know, the kind who designs buildings) would ever say "I architected this".
Because you want the ability to invalidate the cache for an entire site at the same time. So you would still need some map between domain and hash.
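Something like a per-site manifest you can swap atomically, in other words. A rough sketch (the type and placeholder values are made up):<p><pre><code>// Hypothetical map from domain to manifest: objects stay addressed by hash,
// but swapping a site's manifest invalidates that whole site in one step.
type SiteManifest = {
  version: string;                 // bump to invalidate the entire site at once
  assets: Record<string, string>;  // path -> "sha384-..." content hash
};

const manifests: Record<string, SiteManifest> = {
  "example.com": {
    version: "2024-06-01",
    assets: { "/main.css": "sha384-5rcfZgbOPW7..." },
  },
};</code></pre>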
I just don’t get it. Their last paragraph describes how they changed their dynamic site to be static. So then why do you need workers at all? Just deploy to a CDN.<p>How do you do version updates? Add a content hash to all files except the root index.html.<p>Cache everything forever, except index.html.<p>To deploy a new version, upload all files, making sure index.html goes last.<p>Since all file paths are unique, the old version continues to be served.<p>No cache invalidation required, since all files have unique paths, except index.html, which was never cached.<p>You just have to ensure you have proper content hashes for absolutely everything: images, CSS, JS. Everything.
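The whole scheme boils down to two cache policies plus upload order. A rough sketch, assuming a generic static host (the helper names are made up):<p><pre><code>// Hypothetical helper: pick cache headers per file for the scheme above.
// Hashed assets are immutable; index.html must always be revalidated.
function cacheControlFor(path: string): string {
  return path.endsWith("index.html")
    ? "no-cache"                       // always revalidate the entry point
    : "max-age=31536000, immutable";   // content-hashed, safe to cache forever
}

// Deploy order matters: upload hashed assets first and index.html last,
// so the old index.html keeps pointing at files that still exist.
const uploadOrder = (files: string[]): string[] =>
  [...files].sort(
    (a, b) => Number(a.endsWith("index.html")) - Number(b.endsWith("index.html"))
  );</code></pre>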
The invalidation queue is interesting, but building a custom cache key manually? Even Cloudflare now supports Cache-Tags.
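For reference: the origin tags responses with a Cache-Tag header and you then purge by tag through the API (purge by tag is an Enterprise feature, last I checked). A rough sketch; the zone ID, token, and tag names are placeholders:<p><pre><code>// Origin response carries the tags, e.g.:  Cache-Tag: site-123,post-456
// Purging by tag is then one API call (Enterprise-only, as far as I know).
async function purgeByTag(zoneId: string, apiToken: string, tags: string[]) {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${zoneId}/purge_cache`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ tags }),
    }
  );
  if (!res.ok) throw new Error(`purge failed: ${res.status}`);
}</code></pre>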
Sometimes I feel like work and needless infra complexity grow perfectly to match headcount and nominally available resources.
I feel the same. 72 million monthly page views is about 83 pages per second even if squeezed into a single timezone's 8-hour day (72e6 / (30 * 8 * 3600) ≈ 83); even with today's heavyweight pages we are talking well under 1000 req/s. Assuming they are not super image/asset heavy I would expect this to comfortably be served by a couple of reasonable old-school nginx servers[1]. If each page were a full megabyte of uncached content we would be under 10 Gbit/s, probably under 1.<p>The build logic to decide which things to rebuild is probably the interesting bit, but we don't need all these services... </grey-beard-rant><p>[1] <a href="https://openbenchmarking.org/test/pts/nginx&eval=c18b8feaeca6235b318667a0c1159c7eb54ce634#metrics" rel="nofollow">https://openbenchmarking.org/test/pts/nginx&eval=c18b8feaeca...</a><p>edit: to be less ranty, they are more or less building static sites out of their Next.js codebase, but updated on demand etc., which is indeed interesting; none of this needs Cloudflare/hyperscaler tech though.<p>Not sure how many customers/sites they have. Perhaps they don't want to spend CPU regenerating all sites on every deployment? They do describe a content-driven pre-warmer, but I'm still unclear why this couldn't be a content-driven static site generator running on some build machine.
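Spelling the back-of-the-envelope math out, with the same assumptions as above (all traffic in an 8-hour daily window, 1 MB per uncached page):<p><pre><code>// Back-of-the-envelope check of the numbers above.
const monthlyViews = 72e6;
const secondsServed = 30 * 8 * 3600;                 // 30 days, 8-hour window each
const pagesPerSecond = monthlyViews / secondsServed; // ~83

const bytesPerPage = 1e6;                            // assume 1 MB of uncached content
const gbitPerSecond = (pagesPerSecond * bytesPerPage * 8) / 1e9; // ~0.67 Gbit/s

console.log(pagesPerSecond.toFixed(0), gbitPerSecond.toFixed(2));</code></pre>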
The thing is you can still stick a CDN in front of your old-school servers and just use a 'stale-while-revalidate' Cache-Control directive to get exactly the effect described here.
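The origin only has to declare the policy; the CDN then serves the stale copy while it refetches in the background. A minimal sketch with a plain Node origin (the max-age numbers are arbitrary):<p><pre><code>import { createServer } from "node:http";

// Minimal origin behind a CDN: fresh for 60s, then serve stale for up to
// 10 minutes while the CDN revalidates in the background.
createServer((req, res) => {
  res.setHeader(
    "Cache-Control",
    "public, max-age=60, stale-while-revalidate=600"
  );
  res.end("<html>...server-rendered page...</html>");
}).listen(8080);</code></pre>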
Stale-while-revalidate as implemented in the post was easier for us and required fewer resources than migrating from our dynamic site architecture to static. Ideally we would have migrated to fully static sites, but the engineering effort required to make that happen wasn't in scope.
Something I noticed a long time ago is that Vercel makes everything they touch 10 times harder than it needs to be.<p>I have come to conclude it is that way because they focus on optimizing for a demo case that presents well to non-technical stakeholders. Doing one particular thing that looks good at a glance gets the buy-in, and then those who bought in never have to deal with the consequences of the decision once it is time to build something other than the demo.
I'm no fan of Vercel, but it's kind of a symptom of a wider pattern, right? I see crazy architecture-astronaut setups in so many places. It's true that non-technical stakeholders can cause problems, but I often see it pushed from inside the tech org too. I'm thinking it's some combination of resume-driven development, misunderstanding of 'scalability' and when it's needed, and intra-org working-together problems where it's easier to just make a new service and assert your dominion over it.
I blame this more on Next.js than Vercel, but agree in spirit. Their architecture creates a pit of failure: you're encouraged to fall into a fully dynamic pattern, and that's a huge trap.<p>However, it's probably more inexperience than anything. Nobody senior was around to tell our founders that they should go for an SSG architecture when they started /shrug. It's mostly worked out anyways though haha.
2025, the world rediscovers simple static caching. You could do the same with varnish/nginx or wp-cache with 10% of the complexity. Or a CDN.<p>“Incremental Static Regeneration” is also one of the funniest things to come out of this tech cycle.
A lot of people are criticizing this for unnecessary complexity, but it's a little more complicated than that. I actually think it makes sense given where they are right now. The complexity stems from Vercel and Next.js: had they used different tech, say Cloudflare directly, and architected their own system designed to handle rapidly changing static content, none of this would have been necessary. So I guess it depends on your definition of unnecessary complexity. It's definitely unnecessary for the problem space, but probably necessary for their existing stack.