Zvec: A lightweight, fast, in-process vector database

32 points by dvrp 1 day ago

2 comments

clemlesne 1 hour ago
Has anyone compared this with USearch (<a href="https://github.com/unum-cloud/USearch" rel="nofollow">https://github.com/unum-cloud/USearch</a>)?
skybrian 36 minutes ago
Are these sorts of similarity searches useful for classifying text?
- CuriouslyC 6 minutes ago
  Embeddings are good at partitioning document stores at a coarse-grained level, and they can be very useful for documents where there's a lot of keyword overlap and the semantic differentiation is distributed. They're definitely not a good primary recall mechanism, and they often don't even pull their weight for their cost in hybrid setups, so it's worth doing evals for your specific use case.
- OutOfHere 18 minutes ago
  It depends entirely on the quality and suitability of the embedding vectors you provide. Even with a long embedding vector from a recent model, my estimate is that the classification will be better than random but not very accurate. You would typically do better by asking a large model directly for a classification. The good thing is that it is often easy to create a small human-labeled dataset and estimate the confusion matrix for each approach.
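
For anyone who wants to try the comparison suggested above, here is a rough sketch of what it might look like. It assumes a nearest-centroid classifier over cosine similarity as the embedding approach, which is only one of several ways to classify with embeddings. The 2-D vectors, the labels, and the `llm_pred` list are all made-up placeholders, not real embeddings or real model outputs, and none of this uses Zvec's or USearch's API.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroids(vectors, labels):
    """Average the vectors of each label into one centroid per label."""
    grouped = defaultdict(list)
    for vec, label in zip(vectors, labels):
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(vector, cents):
    """Return the label whose centroid is most similar to the vector."""
    return max(cents, key=lambda label: cosine(vector, cents[label]))


def confusion_matrix(true, pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(true, pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of examples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


LABELS = ["sports", "politics"]

# Toy stand-ins for embeddings of a small human-labeled dataset.
train_vecs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
train_labels = ["sports", "sports", "politics", "politics"]
test_vecs = [[0.8, 0.3], [0.3, 0.8], [0.6, 0.5], [0.45, 0.55]]
test_labels = ["sports", "politics", "politics", "politics"]

# Approach 1: nearest centroid over the toy embeddings.
cents = centroids(train_vecs, train_labels)
embed_pred = [predict(v, cents) for v in test_vecs]
embed_matrix = confusion_matrix(test_labels, embed_pred, LABELS)

# Approach 2: labels you would collect by asking a large model directly.
# These are hand-written placeholders, not output from any real model.
llm_pred = ["sports", "politics", "politics", "politics"]
llm_matrix = confusion_matrix(test_labels, llm_pred, LABELS)

print("embedding approach:", embed_matrix, accuracy(embed_matrix))
print("direct-model approach:", llm_matrix, accuracy(llm_matrix))
```

With real data you would replace the toy vectors with embeddings from your model and `llm_pred` with the labels the large model actually returns, then compare the two matrices cell by cell rather than only by accuracy.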