Asymmetric Quantization: Near-Lossless Retrieval with 97% Storage Reduction

(mixedbread.com)

70 points by breadislove 2 days ago

13 comments

Zagreus214214 minutes ago
``` We evaluated several precision pairings across our internal retrieval benchmark suite. Scores are NDCG@10 averaged across the suite, scaled to 0–100. NDCG@10 (Normalized Discounted Cumulative Gain at rank 10) measures how well the top 10 results are ordered against the ideal ranking, rewarding relevant documents more when they appear higher, with 100 being a perfect ranking. The full-precision baseline averages 90.26. Int8 query against binary documents averages 89.65, a 0.61 point drop, while reducing document-vector storage by 32x ```

Saying "near lossless" to mean 90% accurate retrieval of saved vectors is simply a lie. Lossiness is binary, not something you can paper over by getting close enough. And 90% is not close. Sure, LLMs are all about gradient descent on noisy data sets, so I guess this is acceptable in this field, but that terminology usage still bothered me.
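For readers following the numbers in the quoted passage, here is a minimal sketch of how an NDCG@10 score and the quoted storage figure can be computed. It assumes the linear-gain form of DCG (some evaluations use an exponential gain instead), assumes the ideal ranking is built from the same list of judged results, and assumes "full precision" means 32 bits per dimension. The function names are illustrative and are not taken from the article.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k=10):
    """NDCG@k on a 0-100 scale; `relevances` is given in ranked order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return 100.0 * (dcg_at_k(relevances, k) / ideal)

# Figures quoted from the article.
baseline, int8_binary = 90.26, 89.65
absolute_drop = baseline - int8_binary            # about 0.61 points
relative_drop = absolute_drop / baseline          # under 1% of the baseline score

# Assumption: "full precision" means 32 bits per dimension.
bits_full, bits_binary = 32, 1
storage_ratio = bits_full / bits_binary           # 32x smaller
storage_reduction = 1 - bits_binary / bits_full   # 0.96875, i.e. about 97%
```

Under these assumptions, the 90.26 baseline is a ranking-quality score for the full-precision embeddings, not a measure of how faithfully stored vectors are reproduced, and the 0.61 point drop is measured relative to that baseline.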
kaizenite 7 minutes ago
To people smarter than me, how impressive and/or revolutionary is this?
elil17 3 hours ago
I would love to see real examples of what reduced quality means in practice. Are you able to recover a document from the vector in a human readable format? If so, what sort of changes come up?

I could imagine a scenario where differences tend to be more substantive than you'd expect because of how less frequent words with fine distinctions in meaning - the very words that make the document special - may be embedded in the vector space.
- yorwba 2 hours ago
 Most of the fine distinctions are already lost when a document is processed through a pile of linear algebra to turn it into a fixed-size list of floating-point numbers, as you can see from the NDCG@10. Vector search is not a tool for fine distinctions. It's a tool for reducing a large pile of documents to a smaller selection of candidates, which you can then check individually with some more expensive method.
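The two-stage pattern described in the reply above can be sketched roughly as follows: score every document cheaply with an int8 query against 1-bit document codes, keep a few candidates, then re-check only those with a more expensive method. The quantization choices here (sign binarization, max-abs int8 scaling) and all function names are assumptions for illustration, since the thread does not spell out the article's exact scheme, and a full-precision dot product stands in for the "more expensive method".

```python
def binarize(vec):
    """1 bit per dimension: keep only the sign (assumed scheme)."""
    return [1 if x > 0 else 0 for x in vec]

def quantize_int8(vec):
    """Scale into [-127, 127] by the largest absolute value (assumed scheme)."""
    scale = max(abs(x) for x in vec) or 1.0
    return [round(127 * x / scale) for x in vec]

def asymmetric_score(query_i8, doc_bits):
    """Dot product of an int8 query with document bits read as +1 / -1."""
    return sum(q if b else -q for q, b in zip(query_i8, doc_bits))

def dot(a, b):
    """Full-precision dot product, used here as the stand-in reranker."""
    return sum(x * y for x, y in zip(a, b))

def search(query, docs, n_candidates=2, top_k=1):
    """Coarse pass on quantized vectors, then rerank the survivors."""
    q8 = quantize_int8(query)
    codes = [binarize(d) for d in docs]  # would be precomputed and stored
    coarse = sorted(range(len(docs)),
                    key=lambda i: asymmetric_score(q8, codes[i]),
                    reverse=True)[:n_candidates]
    reranked = sorted(coarse, key=lambda i: dot(query, docs[i]), reverse=True)
    return reranked[:top_k]

docs = [[0.9, 0.1, -0.2], [-0.8, 0.3, 0.5], [0.7, -0.4, 0.1]]
query = [1.0, 0.0, -0.1]
print(search(query, docs))
```

In this sketch the binary codes only have to get the right documents into the candidate set; the final ordering comes from the second pass, which is why a small loss in the coarse stage can be tolerable.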
purple-leafy 3 hours ago
Hey breadislove; amazing article, I’ll be sending mixedbread an email in the morning that may interest you (email will be <5-characters>@pm.me).

I have also been working in compression and performance engineering, and managed to get a 99+% compression unlock versus conventional approaches (100+KB down to 1KB) in the scenario of 30 minute massive multiplayer game replays for a “game+engine” I’m developing.

I think there’s a synergy between these 2 concepts. I’d love to chat some more.
- palinnilap19 minutes ago
 Any way I can read about this or the use case? I have a hobby interest
alfiedotwtf44 minutes ago
If you squint hard enough, it sounds like their storage layer is a bloom filter
Ameo 5 hours ago
I can't wait until we get to 100% storage/cost/compute reduction for LLMs. Every thought you could have thought pre-conceived in high-fidelity super-resolution. Every action you could have taken predicted and simulated in advance courtesy of Openthropic and the USA Sovereign Wealth Fund.
- throwaway2027 2 hours ago
  You would obviously be trading storage for compute and time to retrieve the storage.
- throwaw12 4 hours ago
  100% reduction is impossible for something which should work, because -100% means it is now 0
  - neonstatic 4 hours ago
    They were clearly being sarcastic
- peheje 4 hours ago
  Reminds me of 'Learning to be me' by Greg Egan
functionmouse 2 hours ago
there is no such thing as "near lossless"
- ttoinou 2 hours ago
  There is, after you define what you’re ready to lose and understand the lossy space. That’s how we came up with mobile cellphones, audio and video codecs etc. Literally powering all modern devices we use.
  - greenleafone7 5 minutes ago
    So then ... "lossy"
  - functionmouse 51 minutes ago
    Actually, all of those things are considered "lossy".
rq13 hours ago
The Pi compression algorithm is better.
- luma 1 hour ago
  Doubtful. The problem with the pi idea is that you need to include the offset, which will likely be as long as or longer than your data.
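The offset objection above can be made concrete with a short counting argument. It assumes the data is a string of n decimal digits (n ≥ 2), that offsets are written in decimal starting from 0, and, generously, that every such string appears somewhere in pi, which is not proven. An offset plus the length n picks out exactly one string, so distinct strings need distinct offsets, and only a limited number of offsets are shorter than n digits:

```latex
\#\{\text{offsets with fewer than } n \text{ digits}\} = 10^{\,n-1},
\qquad
\frac{10^{\,n-1}}{10^{\,n}} = \frac{1}{10}
```

So at most 10% of all n-digit strings can be given an offset shorter than the string itself, and the remaining 90% or more need an offset at least n digits long, before even counting the space needed to store the length.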
nathan_compton 1 hour ago
"A single document produces more then one embedding, depending on the complexity of the document it can produce hundreds or thousands of vectors."

That typo up there is kind of endearing in the AI slop era.
m_m_carvalho 1 hour ago
[dead]
mv_d5339e315 hours ago
[dead]
johnathan1016 hours ago
[flagged]
TradingReality 1 day ago
[flagged]