4 comments

  • dastbe 42 minutes ago
    kind of right, kind of wrong

    * for client-side load balancing, it's entirely possible to move active healthchecking into a dedicated service and have its results be vended along with discovery. In fact, more managed server-side load balancers are also moving healthchecking out of band so they can scale the forwarding plane independently of probes.

    * for server-side load balancing, it's entirely possible to shard forwarders to avoid SPOFs, typically by creating isolated increments and then using shuffle sharding by caller/callee to minimize overlap between workloads. I think Alibaba's canalmesh whitepaper covers such an approach.

    As for scale, I think for almost everybody it's completely overblown to go with a p2p model. I think a reasonable estimate for a centralized proxy fleet is about 1% of infrastructure costs. If you want to save that, you need to have a team that can build/maintain your centralized proxy's capabilities in all the languages/frameworks your company uses, and you likely need to build the proxy anyway for the long tail. Whereas you can fund a much smaller team to focus on e2e ownership of your forwarding plane.

    Add on top that you need a safe deployment strategy for updating the critical logic in all of these combinations, and continuous deployment to ensure your fixes roll out to the fleet in a timely fashion. This is itself a hard scaling problem.
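A rough sketch of the shuffle-sharding idea mentioned above (hypothetical names, Python for illustration; this is not from the canalmesh paper): each caller deterministically ranks the forwarder fleet by hashing (caller, forwarder) pairs and takes the top-k, so two callers rarely share their entire shard and one poison workload can't take out every forwarder.

```python
import hashlib


def shuffle_shard(caller: str, forwarders: list[str], shard_size: int) -> list[str]:
    """Deterministically pick `shard_size` forwarders for a caller.

    Ranking by a hash of (caller, forwarder) gives each caller its own
    pseudo-random ordering of the fleet, which is what keeps shard
    overlap between two callers small on average.
    """
    ranked = sorted(
        forwarders,
        key=lambda f: hashlib.sha256(f"{caller}:{f}".encode()).hexdigest(),
    )
    return ranked[:shard_size]
```

Because the assignment is a pure function of the caller name and fleet, every discovery client computes the same shard with no coordination.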
  • dotwaffle 1 hour ago
    I've never quite understood why there couldn't be a standardised "reverse" HTTP connection, from server to load balancer, over which connections are balanced. Standardised so that some kind of health signalling could be present for easy/safe draining of connections.
    • snowhale 0 minutes ago
      gRPC's health checking protocol (https://github.com/grpc/grpc/blob/master/doc/health-checking.md) is roughly this -- servers expose a standard health streaming endpoint the LB can subscribe to, and servers can send SERVING/NOT_SERVING signals proactively when they want to drain. Not universally adopted, and HTTP/1 has nothing equivalent, but the spec exists for the server-initiated signaling direction.
    • bastawhiz 39 minutes ago
      Whether the load balancer connects to the server or the reverse, nothing changes. A modern H2 connection is pretty much just that: one persistent connection between the load balancer and server; who initiates it doesn't change much.

      The connection being active doesn't tell you that the server is healthy (it could hang, for instance, and you wouldn't know until the connection times out or a health check fails). Either way, you still have to send health checks, and either way you can't know *between* health checks that the server hasn't failed. Ultimately this has to work for *every failure mode* where the server can't respond to requests, and in any given state, you don't know what capabilities the server has.
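The point that a live connection proves nothing can be shown with a minimal deadline-based probe (a hypothetical sketch, not any particular LB's implementation): a wedged handler keeps its connection open indefinitely, so the only way to notice is to bound the wait.

```python
import concurrent.futures
import time


def probe(call, timeout_s: float = 0.2) -> bool:
    """Active probe: healthy only if the backend answers within the deadline.

    A hung server keeps its TCP connection open, so connection liveness
    alone tells you nothing; only a timed request does.
    """
    ex = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    fut = ex.submit(call)
    try:
        return fut.result(timeout=timeout_s) == "ok"
    except concurrent.futures.TimeoutError:
        return False  # still "connected", but not serving
    finally:
        ex.shutdown(wait=False)


def healthy_handler():
    return "ok"


def hung_handler():
    time.sleep(1.0)  # simulates a deadlocked or wedged server
    return "ok"
```

Note that even a passing probe only tells you the server was healthy at probe time; between probes you are back to not knowing, which is the gap the comment describes.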
  • AuthAuth 2 hours ago
    It seems like passive is the best option here, but can someone explain why one real request must fail? So the load balancer is monitoring for failed requests. If it receives one, can it not forward the initial request again?
    • jayd16 2 hours ago
      Not every request is idempotent, and it's not known when or why a request has failed. GETs are OK (in theory), but you can't retry a POST without risk of side effects.
      • bdangubic 8 minutes ago
        I am a contractor and have been fixing shit for a large part of my career. Non-idempotent POSTs are just about always at the top of the list of shit to fix immediately. To this day (30 years in) I do not understand how someone can design a system where POSTs are not idempotent… I mean, I know why: the vast majority of people in our industry are just not good at what they do. But still…
    • cormacrelf 2 hours ago
      For GET /, sure, and some mature load balancers can do this. For POST /upload_video, no. You'd have to store all in-flight requests, either in memory or on disk, in case you need to replay the entire thing with a different backend. Not a very good tradeoff.
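The retry policy described in this subthread can be sketched as follows (a hypothetical illustration; `send` stands in for whatever actually dispatches the request to a backend): a failed attempt is replayed only when the method is idempotent, and a failed POST is surfaced to the caller because the backend may already have applied its side effects.

```python
# Methods defined as idempotent by HTTP semantics (RFC 9110) -- "in theory",
# as the comment above says, since applications can still violate this.
SAFE_TO_RETRY = {"GET", "HEAD", "PUT", "DELETE"}


def send_with_retry(method: str, send, retries: int = 2):
    """Retry on connection failure, but only for idempotent methods.

    `send` is a hypothetical zero-argument callable that performs one
    attempt (possibly against a different backend each time).
    """
    last_exc = None
    for _ in range(retries + 1):
        try:
            return send()
        except ConnectionError as exc:
            if method not in SAFE_TO_RETRY:
                raise  # don't risk duplicate side effects on e.g. POST
            last_exc = exc
    raise last_exc
```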
  • singhsanjay125 hours ago
    I wrote this after seeing cases where instances were technically “up” but clearly not serving traffic correctly.

    The article explores how client-side and server-side load balancing differ in failure detection speed, consistency, and operational complexity.

    I’d love input from people who’ve operated service meshes, Envoy/HAProxy setups, or large distributed fleets — particularly around edge cases and scaling tradeoffs.
    • owenthejumper 40 minutes ago
      Modern LBs, like HAProxy, support both active & passive health checks (and others, like agent checks, where the app itself can adjust the load balancing behavior). This means that your "client scenario" covering passive checks can be done server-side too.

      Also, in HAProxy (that's the one I know), server-side health checks can run at millisecond intervals. I can't remember the minimum (I think it's 100ms), so theoretically you could fail a server within 200-300ms, instead of the 15 seconds in your post.
      • bastawhiz 34 minutes ago
        > theoretically you could fail a server within 200-300ms, instead of 15 seconds in your post.

        You need to be careful here, though, because the server might just be a little sluggish. If it's doing something like garbage collection, your responses might take a couple hundred milliseconds temporarily. A blip of latency could take your server out of rotation. That increases load on your other servers and could cause a cascading failure.

        If you don't need sub-second reactions to failures, don't worry too much about it.
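The usual guard against this kind of flapping is to eject a backend only after several consecutive failures, so a single GC-pause-length blip doesn't pull it out of rotation. A minimal sketch of that idea (hypothetical class, not HAProxy's actual implementation, though its `fall`/`rise` counters work similarly):

```python
class PassiveEjector:
    """Mark a backend unhealthy only after `threshold` consecutive failures.

    A single slow or failed response resets nothing by itself; one
    success resets the streak, which absorbs brief latency blips.
    """

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures: dict[str, int] = {}

    def record(self, backend: str, ok: bool) -> None:
        if ok:
            self.failures[backend] = 0  # any success resets the streak
        else:
            self.failures[backend] = self.failures.get(backend, 0) + 1

    def healthy(self, backend: str) -> bool:
        return self.failures.get(backend, 0) < self.threshold
```

The tradeoff is exactly the one in the comment above: a higher threshold tolerates blips but lengthens the window during which a genuinely dead server keeps receiving traffic.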
    • firefoxd 1 hour ago
      Hi author, a tangent:

          <meta name="viewport" content="width=device-width, initial-scale=1" />

      For those of us who need to zoom in on mobile devices.
    • Noumenon72 1 hour ago
      Thanks for writing something that's accessible to someone who's only used Nginx server-side load balancing and didn't know client-side load balancing existed at higher scale.
    • jeffbee 37 minutes ago
      I don't think you really need sub-millisecond detection to get sub-millisecond service latency. You mainly need to send backup requests, where appropriate, to backup channels when the main request didn't respond promptly, and your program needs to be ready for the high probability that the original request wins this race anyway.

      It's more than fine that Client A and Client B have differing opinions about the health of the channel to Server C at a given time, because there really isn't any such thing as the atomic health of Server C anyway. The health of the *channel* consists of the client, the server, and the network, and the health of AC may or may not impact the channel BC. It's risky to let clients advertise their opinions about backend health to other clients, because that leads to the event where a bad *client* shoots down a server, or many servers, for every client.
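The backup-request pattern described above can be sketched like this (a hypothetical illustration; `primary` and `backup` stand in for calls on two channels): fire the main request, and only if it hasn't answered within the hedge delay, launch a backup and take whichever finishes first.

```python
import concurrent.futures
import time


def hedged_call(primary, backup, hedge_after_s: float = 0.05):
    """Send to the primary; hedge with a backup if it is slow.

    The caller must tolerate either request winning the race: as the
    comment notes, there is a high probability the original wins anyway.
    """
    ex = concurrent.futures.ThreadPoolExecutor(max_workers=2)
    futures = [ex.submit(primary)]
    try:
        # Give the primary a head start before spending a backup request.
        done, _ = concurrent.futures.wait(futures, timeout=hedge_after_s)
        if not done:
            futures.append(ex.submit(backup))
            done, _ = concurrent.futures.wait(
                futures, return_when=concurrent.futures.FIRST_COMPLETED
            )
        return next(iter(done)).result()
    finally:
        ex.shutdown(wait=False)  # don't block on the losing request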