7 comments

  • duskwuff 39 days ago
    Before you get too excited, keep two things in mind:

    1) Using a single compression context for the whole stream means you have to keep that context active on both the client and the server for as long as the connection is open. This may have a nontrivial memory cost, especially at high compression levels. (Don't set the compression window any larger than it needs to be!)

    2) Using a single context also means that you can't decompress one frame without having read the whole stream that led up to it. This rules out some useful optimizations when you're "fanning out" messages to many recipients - if you compress each message individually, you can compress it once and send the same compressed bytes to every recipient.
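    To make the fan-out point concrete, here is a minimal sketch using Python's zlib (raw deflate, the same format permessage-deflate uses); the messages are invented placeholders:

      import zlib

      messages = [b'{"type":"tick","price":101.5}'] * 3

      # Shared streaming context: one compressor per connection, whose
      # sliding window stays resident for the connection's lifetime.
      streaming = zlib.compressobj(wbits=-15)  # raw deflate, 32 KiB window
      for msg in messages:
          frame = streaming.compress(msg) + streaming.flush(zlib.Z_SYNC_FLUSH)
          # `frame` depends on every message before it; each recipient
          # needs its own compressor state.

      # Independent compression: compress once, fan the identical
      # bytes out to every recipient, keep no per-connection state.
      shared_frame = zlib.compress(messages[0])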
    • adzm 39 days ago
      The analogy to H.264 in the original post is very relevant. You can fix some of the downsides by using the equivalent of keyframes: the context is still longer than a single message, but it can be broken up for recovery and the like.
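      A sketch of that idea with Python's zlib: discard the compressor every N messages, so a receiver can resynchronize at any "keyframe" boundary. The interval and message source here are made up:

        import zlib

        KEYFRAME_INTERVAL = 100  # hypothetical cadence; tune per workload
        messages = [b"example payload"] * 1000  # placeholder source

        comp = None
        for i, msg in enumerate(messages):
            if i % KEYFRAME_INTERVAL == 0:
                comp = zlib.compressobj(wbits=-15)  # "keyframe": fresh context
            frame = comp.compress(msg) + comp.flush(zlib.Z_SYNC_FLUSH)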
    • yellow_lead 39 days ago
      > This may have a nontrivial memory cost, especially at high compression levels. (Don't set the compression window any larger than it needs to be!)

      It sounds like these contexts should be cleared when they reach a certain memory limit, or maybe reset periodically, e.g. every N messages. Is there another way to manage the memory cost?
      • michaelt 39 days ago
        LZ77 compression (a key part of gzip and zip compression) uses a 'sliding window': the compressor can tell the decompressor "repeat the n bytes that appeared in the output stream m bytes ago". The most widely used implementation uses a 15-bit integer for m, so the decompressor never needs to look more than 32,768 bytes back in its output stream.

        Many compression standards include memory limits to guarantee compatibility, and the older the standard, the lower that limit is likely to be. If the standards didn't dictate this stuff, DVD sellers could release a DVD that needed a 4 MB decompression window, and it'd fail to play on players that only had 2 MB of memory - setting a standard and following it avoids this happening.
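        You can see that window limit directly in zlib's wbits parameter, which sets the history to 2**wbits bytes (15, the maximum, gives the 32,768-byte lookback). A small sketch with arbitrary sample data:

          import zlib

          data = b"the quick brown fox jumps over the lazy dog " * 2048

          # Smaller windows need less memory on both ends; larger ones
          # can find matches further back and usually compress better.
          for wbits in (9, 12, 15):
              comp = zlib.compressobj(level=9, wbits=wbits)
              out = comp.compress(data) + comp.flush()
              print(f"window {2**wbits:>6} B -> {len(out):>6} bytes out")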
      • treyd 39 days ago
        That's a misunderstanding. Compression algorithms are typically designed with a tunable state-size parameter. The issue is that if you have a large transfer where one side might crash and resume, you need some way to persist the state to be able to pick up where you left off.
  • lambdaloop 39 days ago
    Does streaming compression work if some packets are lost or arrive in a different order? It seems like the compression context could end up different on the encoding/decoding sides... or is that handled somehow?
    • gkbrk 39 days ago
      WebSockets [1] run over TCP, and the messages are ordered.

      There is RFC 9220 [2], which makes WebSockets run over QUIC (which is UDP-based). But that is still expected to expose a stream of bytes to the WebSocket layer, which keeps the ordering guarantee.

      [1]: https://datatracker.ietf.org/doc/html/rfc6455

      [2]: https://datatracker.ietf.org/doc/rfc9220/
    • dgoldstein0 39 days ago
      I think the underlying protocol would have to guarantee in-order delivery - either via TCP (for HTTP/1, HTTP/2, or SPDY), or, in HTTP/3, within a single stream.
    • duskwuff 39 days ago
      It sounds as though the data is being transferred over HTTP, so packet loss/reordering is all handled by TCP.
      • dgoldstein0 39 days ago
        Yes, or by HTTP/3's in-order guarantees on individual streams (as HTTP/3 runs over UDP).
  • efitz 39 days ago
    When I worked at Microsoft years ago, me and my team (a developer and a tester) built a high-volume log collector.

    We used a streaming compression format that was originally designed for IBM tape drives.

    It was fast as hell, worked really well, was gentle on the CPU, and made it easy to control memory usage.

    In the early 2000s, on a modest 2-proc AMD64 machine, we ran out of Fast Ethernet way before we felt CPU pressure.

    We got hit by the SOAP mafia during Longhorn; we couldn't convince the web services folks to adopt it. Instead they made us enshittify our "2 bytes length, 2 bytes msgtype, structs-on-the-wire" speed demon with their XML crap.
  • vlovich123 39 days ago
    Using zstd with a custom dictionary tuned for small messages probably gets you most of the benefit without giving up independently compressed messages.
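    As an illustration with the third-party zstandard package for Python (the sample messages, dictionary size, and training set are all invented for the sketch):

      import zstandard

      # Train a dictionary on representative past messages.
      samples = [b'{"user":%d,"event":"click"}' % i for i in range(10000)]
      dictionary = zstandard.train_dictionary(4096, samples)

      comp = zstandard.ZstdCompressor(dict_data=dictionary)
      decomp = zstandard.ZstdDecompressor(dict_data=dictionary)

      # Every frame is self-contained: compress once, fan out to any
      # number of recipients, decode with no shared stream state.
      frame = comp.compress(b'{"user":42,"event":"click"}')
      assert decomp.decompress(frame) == b'{"user":42,"event":"click"}'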
  • bob1029 38 days ago
    There is a proposal out there for serving & using custom compression dictionaries over HTTP:

    https://www.ietf.org/archive/id/draft-ietf-httpbis-compression-dictionary-05.html
  • almaight 38 days ago
    mwss: https://github.com/go-gost/x/blob/master/dialer/mws/dialer.go
  • masklinn 39 days ago
    Surely that is obvious to anyone who has compared zip and tgz?
    • skulk 38 days ago
      MUD clients and servers use MCCP, which is essentially keeping a zlib stream open, adding text to it, and flushing it whenever something is sent. I think this has been around since 2000.

      https://tintin.mudhalla.net/protocols/mccp/