this is cool. i built something similar a while back using wavelets and matching pursuit in a similar manner but with a different goal; i wanted to make an image compressor that had different visual effects when the file format was glitched. here are some examples of it moving variance from the original above to the compressed image below: <a href="https://youtube.com/shorts/f2pZyZNXY0Q?si=HXf14pOs9DaAk7MZ" rel="nofollow">https://youtube.com/shorts/f2pZyZNXY0Q?si=HXf14pOs9DaAk7MZ</a> <a href="https://youtube.com/shorts/-LIALRpU63o?si=p_MiFnT8MMX0C0b4" rel="nofollow">https://youtube.com/shorts/-LIALRpU63o?si=p_MiFnT8MMX0C0b4</a>
One other thing to compare it against is actual tiny JPEGs.<p>When you save a series of images as 16x16 JPEGs at the same JPEG quality level without optimization, you notice that a large share of the bytes is identical across those files. The common data includes the file header (the FF D8 and FF E0 blocks), the quantization tables, and the Huffman tables. If you cut away all the common data, the actual image data is extremely tiny, usually under 64 bytes, though not a fixed size.<p>Here are the sizes of the four example images (just the unique image data) when resized to 16x16, then saved at quality 20:<p>First image: 48 bytes<p>Second image: 42 bytes<p>Third image: 31 bytes<p>Fourth image: 35 bytes<p>After putting back the 625 bytes of common data, you end up with a regular JPEG that can be decoded and displayed using fast native code from the browser.<p>The ThumbHash page includes a comparison against "Potato WebP", which is probably a similar idea.
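A minimal sketch of the stripping step in Python, in case it helps. It assumes every file came from the same encoder at the same size and quality with Huffman optimization off, so the shared bytes should sit in one leading run (markers and tables) and one trailing run (the end-of-image marker). The names `common_prefix_len`, `split_shared` and `rebuild` are made up for illustration, and the byte strings at the bottom are stand-ins, not real JPEG data.

```python
from typing import List, Tuple


def common_prefix_len(blobs: List[bytes]) -> int:
    """Length of the longest prefix shared by every blob."""
    first = blobs[0]
    limit = min(len(b) for b in blobs)
    for i in range(limit):
        if any(b[i] != first[i] for b in blobs):
            return i
    return limit


def split_shared(files: List[bytes]) -> Tuple[bytes, List[bytes], bytes]:
    """Split files into (shared prefix, per-file payloads, shared suffix)."""
    n = common_prefix_len(files)
    rests = [f[n:] for f in files]
    # Measure the shared suffix on the remainders so it cannot overlap the prefix.
    m = common_prefix_len([r[::-1] for r in rests])
    prefix = files[0][:n]
    suffix = rests[0][len(rests[0]) - m:]
    payloads = [r[:len(r) - m] for r in rests]
    return prefix, payloads, suffix


def rebuild(prefix: bytes, payload: bytes, suffix: bytes) -> bytes:
    """Glue the shared parts back around one payload."""
    return prefix + payload + suffix


# Stand-in data (assumption: NOT real JPEGs, just the same byte layout).
fake_header = b"\xff\xd8" + b"TABLES" * 4
files = [fake_header + b"abc\xff\xd9", fake_header + b"xy\xff\xd9"]
prefix, payloads, suffix = split_shared(files)
```

With real files you would read each JPEG with `open(path, "rb").read()` and store only the payloads per image. Payloads can share a few leading or trailing bytes by coincidence; those get absorbed into the shared parts, and the round trip still reproduces every file exactly.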
Interesting, but my testing suggests that SplatHash is very weak at preserving global features, at least for synthetic images [1]. Both BlurHash and ThumbHash were able to preserve most of them, at the expense of worse (but still non-zero) local feature reproduction, but SplatHash simply discarded <i>all</i> global features! I guess you need to store both local features (Gaussian splats) and global features (cosine bases) for the best result. The currently unused padding bit might be useful for that...<p>[1] I used my own avatars and icons as a test set. For example, <a href="https://avatars.githubusercontent.com/u/323836?s=400&v=4" rel="nofollow">https://avatars.githubusercontent.com/u/323836?s=400&v=4</a>
Very cool. To my eye, the splats sometimes have too much contrast -- implying more "stark" visual features that don't actually manifest in the real image. Presumably the radius and the opacity curve of the gradients can be tuned to taste at the decoding phase, to make the splats softer?
The 6 blobs of color look very weird after testing a few images. I feel like ThumbHash is much more natural, and its downsides are minimal compared to SplatHash.
Thanks for sharing. I didn’t even know this type of thing had multiple algorithms.<p>Can you share the reasons someone might want to compress an image to 16 bytes?
For image placeholders while the real image is loading. At 16 bytes, that can easily be just another attribute on an HTML img tag.
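To make the size concrete, here is a rough Python sketch of packing a 16-byte hash into an attribute. The `data-hash` attribute name and both function names are assumptions for illustration, not something any of the libraries discussed here define. With URL-safe base64 and the padding dropped, 16 bytes come out as 22 characters.

```python
import base64
import html


def img_tag_with_hash(src: str, hash_bytes: bytes) -> str:
    """Build an img tag carrying the hash in a (hypothetical) data-hash attribute."""
    # URL-safe base64 without '=' padding: 16 bytes become 22 characters.
    encoded = base64.urlsafe_b64encode(hash_bytes).rstrip(b"=").decode("ascii")
    return '<img src="{}" data-hash="{}">'.format(
        html.escape(src, quote=True), encoded
    )


def hash_from_attribute(encoded: str) -> bytes:
    """Restore the padding that was stripped and decode back to raw bytes."""
    padding = "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(encoded + padding)


# Stand-in hash (assumption: any 16 bytes will do for the demonstration).
example_hash = bytes(range(16))
tag = img_tag_with_hash("photo.jpg", example_hash)
```

On the page, a small script would read the attribute, decode it, and paint the placeholder until the real image arrives.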
I've seen the alternative where you make a tiny JPEG file (discarding the Huffman and quantization tables), and use that as the placeholder. Just glue the header and tables back on, and let the browser handle JPEG decoding and stretching. It's not as small as 16 bytes, but the code for handling it is fast and simple.<p>The trick of using common Huffman and quantization tables for multiple images has been done for a long time. Notably, Flash used it to make embedded JPEGs smaller (for when they were saved at the same quality level).
These things are called Low-Quality Image Placeholders (LQIP) and are frequently used in front-end performance engineering.
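The "glue it back on" step could look something like this in Python, as a sketch only. It assumes the shared header and tables were saved once as `shared_head`, that each image stores just its own entropy-coded bytes, and that the only shared trailer is the end-of-image marker FF D9. The function names are hypothetical and the bytes in the usage lines are stand-ins rather than a decodable JPEG.

```python
import base64


def placeholder_data_uri(shared_head: bytes, payload: bytes,
                         shared_tail: bytes = b"\xff\xd9") -> str:
    """Reassemble a full JPEG and wrap it in a data URI for an img src."""
    if not shared_head.startswith(b"\xff\xd8"):
        raise ValueError("shared_head should begin with the JPEG SOI marker FF D8")
    jpeg = shared_head + payload + shared_tail
    return "data:image/jpeg;base64," + base64.b64encode(jpeg).decode("ascii")


def bytes_from_data_uri(uri: str) -> bytes:
    """Inverse of the above, handy for checking the round trip."""
    return base64.b64decode(uri.split(",", 1)[1])


# Stand-in data (assumption: placeholder bytes, not real tables or image data).
shared_head = b"\xff\xd8" + b"TABLES" * 4
uri = placeholder_data_uri(shared_head, b"per-image-bytes")
```

The resulting string can go straight into an `src` attribute, so the browser's native JPEG decoder does the decoding and upscaling.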
so you turn images into colored bubbles? Why do people use this?
As a game engine dev, if I have an asset management app, it’s pretty reasonable that it might load the list of asset names and hashes before doing the significant work of decoding/generating thumbnails. This could give the app instant low quality thumbnails from loading the tiny array of data that’s already necessary just to get started.
Inline it into a website's HTML to provide a low-res preview of an image as opposed to a blank placeholder or layout shift.
For privacy preservation and progressive revelation.