3 comments

  • centamiv 6 hours ago
    OP here. I wrote this implementation to deeply understand the mechanics behind HNSW (layers, entry points, neighbor selection) without relying on external libraries. While PHP isn't the typical choice for vector search engines, I found it surprisingly capable for this use case, especially with JIT enabled on PHP 8.x. It serves as a drop-in solution for PHP monoliths that need semantic search features without adding the complexity of a separate service like Qdrant or Pinecone. If you want to jump straight to the code, the open-source repo is here: https://github.com/centamiv/vektor Happy to answer any questions about the implementation details!
    • lukan 4 hours ago
      Thanks a lot, I liked the fantasy-based examples used to explain the concept.
      Programming *is* chanting magic incantations and spells, after all. (And fighting against evil spirits and demons.)
      • centamiv 3 hours ago
        I'm really glad you liked the article! Thanks so much for reading the previous one too, I really appreciate it.
    • hu3 5 hours ago
      Great writeup. Thanks for taking the time to organise and share.
      It's tempting to use this in projects that use PHP.
      Is it usable with a corpus of, say, 1,000 Markdown files of 3 KB each? And 10,000 files?
      Can I also index PHP files so that searches include function and class names? Perhaps comments?
      How much RAM and disk space would we be talking about?
      And the speed?
      My first goal would be to index a PHP project and its documentation so that an LLM agent could perform semantic search using my MCP tool.
      • centamiv 4 hours ago
        I tested it myself with 1k documents (about 1.5M vectors) and performance is solid (a few milliseconds per search). I haven't run more aggressive benchmarks yet.
        Since it only stores the vectors, the actual size of the Markdown document is irrelevant; you just need to handle the embedding and chunking phases carefully (you can use a parser to extract code snippets).
        RAM isn't an issue because I aim for random data access as much as possible. This avoids saturating PHP, since it wasn't exactly built for this kind of workload.
        I'm glad you found the article and repo useful! If you use it and run into any problems, feel free to open an issue on GitHub.
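A minimal sketch of the chunking step mentioned above, assuming Markdown input with fenced code blocks. It is written in Python for brevity and is not part of the repo; the function name `chunk_markdown` and the `max_chars` cap are hypothetical. Code blocks become their own chunks, and prose paragraphs are packed together up to the cap, so each chunk can be embedded separately.

```python
import re

# Matches a fenced code block: three backticks, anything, three backticks.
CODE_BLOCK = re.compile(r"(`{3}.*?`{3})", re.DOTALL)


def chunk_markdown(text, max_chars=500):
    """Return a list of (kind, chunk) tuples, kind being "prose" or "code"."""
    chunks = []
    # re.split with a capturing group keeps the code blocks in the result
    # at the odd indexes; the even indexes hold the prose in between.
    for i, segment in enumerate(CODE_BLOCK.split(text)):
        if i % 2 == 1:
            chunks.append(("code", segment.strip()))
            continue
        buffer = ""
        for paragraph in re.split(r"\n\s*\n", segment):
            paragraph = paragraph.strip()
            if not paragraph:
                continue
            # Start a new chunk when adding this paragraph would exceed the cap.
            # A single paragraph longer than max_chars is kept whole.
            if buffer and len(buffer) + len(paragraph) + 2 > max_chars:
                chunks.append(("prose", buffer))
                buffer = paragraph
            else:
                buffer = f"{buffer}\n\n{paragraph}" if buffer else paragraph
        if buffer:
            chunks.append(("prose", buffer))
    return chunks
```

Each resulting chunk would then be passed to whatever embedding model is in use, and the returned vector stored alongside a reference to the source file.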
    • hilti 4 hours ago
      Great article! I also read your other post and love it! This is exactly my thinking: Locality of Behavior (LoB)
      Never heard this term before, but I like it.
      https://centamori.com/index.php?slug=basics-of-web-development
      • centamiv 3 hours ago
        Thanks for checking out the other posts too! I wasn't familiar with the term 'Locality of Behavior' until recently, but it perfectly captures what I strive for: readability and simplicity.
    • Random09 4 hours ago
      The only small thing you forgot to mention: it requires the use of AI, OpenAI to be specific. I got baited.
      • centamiv 4 hours ago
        Apologies if it felt that way! I used OpenAI in the examples just because it's the quickest 'Hello World' for embeddings right now, but the library itself is completely agnostic.
        HNSW is just the indexing algorithm. It doesn't care where the vectors come from. You can generate them using Ollama (locally), HuggingFace, Gemini...
        As long as you feed it an array of floats, it will index it. The dependency on OpenAI is purely in the example code, not in the engine logic.
        • devmor 1 hour ago
          I think you'd get a lot more people interested in trying your project out if you documented the steps for generating vectors for the search.
          I love PHP, but I will realistically admit that most people interested in using PHP probably don't have the experience to know how to do such a thing offhand.
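To illustrate the point made in this sub-thread that the index only needs an array of floats, here is a rough sketch in Python. Everything in it is an assumption for illustration: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model (Ollama, HuggingFace, Gemini, OpenAI), and `brute_force_search` is the exact linear scan that an HNSW index approximates. Neither reflects the library's actual API.

```python
import hashlib
import math


def toy_embed(text, dims=64):
    """Hypothetical placeholder for a real model: hashed bag of words."""
    vector = [0.0] * dims
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        # Map each word to one of the dims buckets and count it.
        vector[int.from_bytes(digest[:4], "big") % dims] += 1.0
    return vector


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(query, vectors, k=3):
    """Exact top-k by scanning every stored vector (no index involved)."""
    scored = [(cosine_similarity(query, v), doc_id) for doc_id, v in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]
```

Swapping `toy_embed` for a call to a real model changes nothing downstream: the search side only ever sees lists of floats of a fixed length.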
  • rvnx 4 hours ago
    Cool blog post, smart guy, very thoughtful and not a copy-paste of Python code like 99% of folks. Nice to see.
    • centamiv 4 hours ago
      Thank you, really appreciate that.
  • fithisux 5 hours ago
    It makes perfect sense to implement it in a high-level language that aids understandability.
    Very good contribution.
    • centamiv 5 hours ago
      Thank you! That was exactly the goal. Modern PHP turned out to be surprisingly expressive for this kind of 'executable pseudocode'. Glad you appreciated it!