
  • clemlesne 1 hour ago
    Has anyone compared this with USearch (https://github.com/unum-cloud/USearch)?
  • skybrian 36 minutes ago
    Are these sorts of similarity searches useful for classifying text?
    • CuriouslyC 6 minutes ago
      Embeddings are good at partitioning document stores at a coarse-grained level, and they can be very useful for documents where there's a lot of keyword overlap and the semantic differentiation is distributed across the text. They're definitely not a good primary recall mechanism, and they often don't even fully pull their weight relative to their cost in hybrid setups, so it's worth running evals for your specific use case.
    • OutOfHere 18 minutes ago
      It depends entirely on the quality and suitability of the embedding vectors you provide. Even with a long embedding vector from a recent model, my estimate is that the classification will be better than random but not very accurate. You would typically do better by asking a large model directly for a classification. The good news is that it is often easy to create a small human-labeled dataset and estimate the error confusion matrix for each approach.
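As a rough illustration of classifying text by embedding similarity and then checking it against a small labeled set, here is a minimal nearest-centroid sketch in plain Python. It assumes you already have one embedding vector per text from some model; the 2-D vectors, the "sports"/"politics" labels, and the function names are hypothetical toy values, not any library's API. Nearest-centroid is just one simple choice of similarity-based classifier.

```python
import math
from collections import Counter


def normalize(v):
    # Scale a vector to unit length so a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def fit_centroids(vectors, labels):
    # Sum the normalized embeddings of each class, then renormalize:
    # the result points in the direction of the class mean.
    sums = {}
    for v, y in zip(vectors, labels):
        v = normalize(v)
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
    return {y: normalize(acc) for y, acc in sums.items()}


def predict(centroids, v):
    # Pick the label whose centroid has the highest cosine similarity to v.
    v = normalize(v)
    return max(centroids, key=lambda y: sum(a * b for a, b in zip(centroids[y], v)))


def confusion_matrix(true_labels, predicted_labels):
    # Count (true, predicted) pairs; off-diagonal cells are errors.
    return Counter(zip(true_labels, predicted_labels))


# Hypothetical toy embeddings standing in for real model output.
train_vecs = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.8]]
train_labels = ["sports", "sports", "politics", "politics"]
centroids = fit_centroids(train_vecs, train_labels)

# A small hand-labeled held-out set.
test_vecs = [[0.8, 0.2], [0.2, 0.9], [0.6, 0.5]]
test_labels = ["sports", "politics", "politics"]
preds = [predict(centroids, v) for v in test_vecs]
cm = confusion_matrix(test_labels, preds)
print(preds)
print(dict(cm))
```

The third test item lands in the off-diagonal ("politics", "sports") cell, which is the kind of error the confusion matrix is meant to surface. Labels returned by a large model for the same held-out texts could be passed to `confusion_matrix` in place of `preds` to compare the two approaches on equal footing.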