6 comments

  • handfuloflight 2 hours ago
    One moment you're speaking about context but talking in kilobytes; can you confirm the token savings data?

    And when you say it only returns summaries, does this mean there are LLM model calls happening in the sandbox?
    • mksglu 1 hour ago
      For your second question: no LLM calls. Context Mode uses algorithmic processing — FTS5 indexing with BM25 ranking and Porter stemming. Raw output gets chunked and indexed in a SQLite database inside the sandbox, and only the relevant snippets matching your intent are returned to context. It's purely deterministic text processing, no model inference involved.
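      The general shape of that pipeline — not the actual Context Mode source; the table name and sample chunks are illustrative — can be sketched with Python's built-in sqlite3 module, assuming its SQLite build ships with FTS5:

      ```python
      import sqlite3

      # Sketch of the described technique: chunked output indexed into an
      # FTS5 virtual table with Porter stemming, queried with BM25 ranking.
      db = sqlite3.connect(":memory:")
      db.execute("CREATE VIRTUAL TABLE chunks USING fts5(content, tokenize='porter')")

      # Raw tool output, chunked and indexed (chunks are illustrative).
      for chunk in [
          "error: connection refused on port 5432",
          "build completed successfully in 4.2s",
          "postgres service failed to start",
      ]:
          db.execute("INSERT INTO chunks(content) VALUES (?)", (chunk,))

      # FTS5 exposes BM25 ranking via ORDER BY rank (best match first).
      rows = db.execute(
          "SELECT content FROM chunks WHERE chunks MATCH ? ORDER BY rank",
          ("connection refused",),
      ).fetchall()
      print(rows[0][0])  # the best-matching chunk
      ```

      Only the ranked snippets would then be returned to the conversation; the full table stays in the sandbox.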
      • handfuloflight 1 hour ago
        Excellent, thank you for your responses. Will be putting it through a test drive.
        • mksglu 1 hour ago
          Sure, thank you for your comment!
    • mksglu 2 hours ago
      Hey! Thank you for your comment! There are test examples in the README. Could you please try them? Your feedback is valuable.
  • robbomacrae 28 minutes ago
    Really cool. A tangential task that seems to be coming up more and more is masking sensitive data in these calls for security and privacy. Is that something you considered as a feature?
    • mksglu 6 minutes ago
      Good question.

      The SQLite database is ephemeral — stored in the OS temp directory (/tmp/context-mode-{pid}.db) and scoped to the session process. Nothing persists after the session ends. For sensitive data masking specifically: right now the raw data never leaves the sandbox (it stays in the subprocess or the temp SQLite store), and only stdout summaries enter the conversation. But a dedicated redaction layer (regex-based PII stripping before indexing) is an interesting idea worth exploring. It would be a clean addition to the execute pipeline.
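      A minimal sketch of what such a redaction pass could look like — entirely hypothetical, with illustrative patterns; real PII stripping would need a much broader, vetted pattern set:

      ```python
      import re

      # Hypothetical pre-indexing redaction: replace matched PII-like
      # substrings with a labeled placeholder before text is chunked.
      PATTERNS = {
          "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
          "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
      }

      def redact(text: str) -> str:
          for label, pattern in PATTERNS.items():
              text = pattern.sub(f"[{label.upper()}]", text)
          return text

      print(redact("user alice@example.com from 10.0.0.1 logged in"))
      # -> "user [EMAIL] from [IPV4] logged in"
      ```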
  • vicchenai 1 hour ago
    The BM25+FTS5 approach without LLM calls is the right call - deterministic, no added latency, no extra token spend on compression itself.

    The tradeoff I want to understand better: how does it handle cases where the relevant signal is in the "low-ranked" 310 KB, but you just haven't formed the query that would surface it yet? The compression is necessarily lossy - is there a raw-mode fallback for when the summarized context produces unexpected downstream results?

    Also curious about the token count methodology - are you measuring with Claude's tokenizer specifically, or a proxy?
    • mksglu 1 hour ago
      Great questions.

      On lossy compression and the "unsurfaced signal" problem:

      Nothing is thrown away. The full output is indexed into a persistent SQLite FTS5 store — the 310 KB stays in the knowledge base; only the search results enter context. If the first query misses something, you (or the model) can call search(queries: ["different angle", "another term"]) as many times as needed against the same indexed data. The vocabulary of distinctive terms is returned with every intent-search result specifically to help form better follow-up queries.

      The fallback chain: if intent-scoped search returns nothing, it splits the intent into individual words and ranks by match count. If that still misses, batch_execute has a three-tier fallback — source-scoped search → boosted search with section titles → global search across all indexed content.

      There's no explicit "raw mode" toggle, but if you omit the intent parameter, execute returns the full stdout directly (smart-truncated at 60% head / 40% tail if it exceeds the buffer). So the escape hatch is: don't pass intent, get raw output.

      On token counting:

      It's a bytes/4 estimate using Buffer.byteLength() (UTF-8), not an actual tokenizer, and it's marked as "estimated (~)" in stats output. It's a rough proxy — Claude's tokenizer would give slightly different numbers — but directionally accurate for measuring relative savings. The percentage reduction (e.g., "98%") is measured in bytes, not tokens, comparing raw output size vs. what actually enters the conversation context.
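      Going by the description above (not the project source — the function names and the buffer size are illustrative), the token estimate and the head/tail truncation look roughly like this:

      ```python
      def estimate_tokens(text: str) -> int:
          # Rough proxy: UTF-8 byte length / 4, analogous to
          # Buffer.byteLength(text) / 4 in Node.
          return len(text.encode("utf-8")) // 4

      def smart_truncate(text: str, max_bytes: int) -> str:
          # Keep 60% of the byte budget from the head and 40% from the
          # tail; max_bytes stands in for the buffer limit.
          data = text.encode("utf-8")
          if len(data) <= max_bytes:
              return text
          head = int(max_bytes * 0.6)
          tail = max_bytes - head
          # errors="ignore" drops a multi-byte character split at a cut point
          return (data[:head].decode("utf-8", errors="ignore")
                  + "\n...[truncated]...\n"
                  + data[-tail:].decode("utf-8", errors="ignore"))

      print(estimate_tokens("hello world"))  # 11 bytes -> 2
      ```

      Since both sides of a before/after comparison use the same bytes/4 proxy, the relative savings percentage holds even where the absolute token counts drift from a real tokenizer.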
  • rcarmo 50 minutes ago
    Nice trick. I’m going to see how I can apply it to tool calls in pi.dev as well
    • mksglu 42 minutes ago
      That means a lot, thank you! Would love to hear your feedback once you try it — and an upvote would be much appreciated if you find it useful.
  • sim04ful 2 hours ago
    Looks pretty interesting. How could I use this on other MCP clients, e.g. OpenCode?
    • mksglu 2 hours ago
      Hey! Thank you for your comment! You can actually use it as an MCP server with other clients, but I haven't tested that yet. I'll look into it as soon as possible. Your feedback is valuable.
      • nightmunnas 1 hour ago
        Nice, I'd love to see it for Codex and opencode.
        • mksglu 1 hour ago
          Thanks! Context Mode is a standard MCP server, so it works with any client that supports MCP — including Codex and opencode.

          Codex CLI:

              codex mcp add context-mode -- npx -y context-mode

          Or in ~/.codex/config.toml:

              [mcp_servers.context-mode]
              command = "npx"
              args = ["-y", "context-mode"]

          opencode:

          In opencode.json:

              {
                "mcp": {
                  "context-mode": {
                    "type": "local",
                    "command": ["npx", "-y", "context-mode"],
                    "enabled": true
                  }
                }
              }

          We haven't tested this yet — would love to hear if anyone tries it!