source-available plugin · ELv2 · part of Context Mode
The other half of the context problem. An MCP plugin that sandboxes tool output and indexes it into a local FTS5 store. Your AI coding agent searches that store instead of re-sending raw bytes every turn. 15 platforms. 287,000+ developers. Runs entirely on your machine.
§ 01 · the problem
Here's the thing about Claude Code, Cursor, Copilot, Codex, Gemini CLI. The LLM has no memory. Every turn, the full conversation gets re-sent — every file read, every command output, every grep result. It piles up.
Run gh issue list once. 59 KB of JSON comes back. About 15,000 tokens. Next turn, those 15,000 tokens get sent again. Turn after that, again. Fifty turns in, that one command has cost you 750,000 input tokens.
Now do that 20 times in a session. Average output per call: 30 KB. That's 600 KB sitting in your conversation, re-sent 50 times over. 30 MB of input, spent on tool output alone.
That's where your limit goes.
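The arithmetic is easy to check. A minimal sketch, assuming a tool output enters the conversation once and then rides along unchanged on every later turn; `resent_total` is an illustrative name, and the figures are the ones quoted above.

```python
def resent_total(size_per_turn: int, turns: int) -> int:
    """Total volume sent when the same payload is re-sent on every turn."""
    return size_per_turn * turns


# One `gh issue list` call: about 15,000 tokens, carried for 50 turns.
single_command_tokens = resent_total(15_000, 50)

# A session of 20 tool calls averaging 30 KB each, carried for 50 turns.
session_kb_in_context = 20 * 30
session_kb_resent = resent_total(session_kb_in_context, 50)

print(single_command_tokens)      # 750000
print(session_kb_in_context)      # 600
print(session_kb_resent / 1000)   # 30.0 (MB, taking 1 MB = 1000 KB)
```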
Eventually the context window fills up and the agent compacts. Summarizes the conversation. Throws away the originals. You start over.
drop at 32K tokens. 11 of 12 models tested. Your agent makes worse decisions as context fills up.
NoLiMa, 2025
spent re-establishing context after each reset. Developers re-explain their codebase from scratch, every time.
Internal telemetry
per year for a 50-seat team on an agent that forgets every 20 minutes. $100/seat/month, burned on re-sent tool output.
50-seat reference org
§ 02 · the solution
context-mode is an MCP server that sits between your agent and its tools. When a tool call would dump large output into the conversation, context-mode intercepts it and runs it in a sandboxed subprocess. Raw data never touches context. It goes into a local FTS5 database with BM25 ranking. The agent searches it when it needs to.
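To make the index-then-search idea concrete, here is a minimal sketch using SQLite's FTS5 extension from Python. It is not context-mode's actual schema or code: the table name `tool_output`, the column names, and the line-based chunking are all assumptions chosen for illustration, and it needs a Python build whose SQLite includes FTS5.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local full-text store. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS tool_output USING fts5(source, body)"
    )
    return conn


def index_output(conn: sqlite3.Connection, source: str, raw_output: str,
                 chunk_lines: int = 20) -> None:
    """Split raw tool output into chunks of lines and index each chunk."""
    lines = raw_output.splitlines()
    for start in range(0, len(lines), chunk_lines):
        chunk = "\n".join(lines[start:start + chunk_lines])
        conn.execute(
            "INSERT INTO tool_output (source, body) VALUES (?, ?)",
            (source, chunk),
        )
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 3):
    """Return (source, snippet) pairs, best BM25 match first.

    bm25() yields lower scores for better matches, so ascending order
    puts the most relevant chunk at the top.
    """
    rows = conn.execute(
        "SELECT source, snippet(tool_output, 1, '[', ']', '...', 12) "
        "FROM tool_output WHERE tool_output MATCH ? "
        "ORDER BY bm25(tool_output) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

The point of the design is the size of what comes back: the agent asks a question and receives a few short snippets, while the full output stays on disk.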
The plugin hooks into five points in the session lifecycle:
curl / wget, redirect large output to sandbox.
§ 03 · results
Real numbers from real sessions. Not projections.
Without context-mode, those 20 tool calls put 600 KB into the conversation. Over 50 turns, 30 MB re-sent. With it, the same calls produce 20 KB. Over 50 turns, 1 MB. Same work. Same answers. 30× fewer tokens.
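One detail worth noticing: the 30× figure does not depend on session length. A minimal sketch under the same simplifying assumption as the numbers above (whatever sits in context is re-sent in full each turn); the function names are illustrative.

```python
def resent_kb(in_context_kb: float, turns: int) -> float:
    """KB sent over a session when the context is re-sent every turn."""
    return in_context_kb * turns


def reduction_factor(raw_kb: float, indexed_kb: float, turns: int = 50) -> float:
    """Ratio of re-sent volume without and with sandboxed output."""
    return resent_kb(raw_kb, turns) / resent_kb(indexed_kb, turns)


print(reduction_factor(600, 20))           # 30.0
print(reduction_factor(600, 20, turns=5))  # 30.0, the turn count cancels out
```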
§ 04 · paradigm
Sandboxing tool output handles the result side. But there's still a problem on the input side. When the agent wants to analyze 47 files, it calls Read() on each. Every file enters context. Every file stays.
v1.0.64 made this a rule, not a suggestion. We call it Think in Code. The idea fits in one sentence: if you need to analyze, count, filter, or process data, write code that does it. Stop pulling raw data into context to process mentally.
The LLM writes a script. The script runs in a sandbox. Only stdout enters context. Your CPU does the work for free. Tokens cost money.
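As an illustration of the pattern, suppose the agent wants to know where the TODO markers are in a codebase. Instead of reading every file into context, it could write something like the sketch below and read only the summary. The function name, the file pattern, and the choice of "TODO" are hypothetical; this is an example of the idea, not code shipped with the plugin.

```python
from pathlib import Path


def summarize_todos(root: str, pattern: str = "*.py", marker: str = "TODO") -> str:
    """Scan files under root and return a short summary instead of raw contents."""
    counts = {}
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = text.count(marker)
        if hits:
            counts[str(path.relative_to(root))] = hits

    total = sum(counts.values())
    # Keep only the five busiest files; ties are broken by file name.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:5]

    lines = [f"{total} {marker} markers in {len(counts)} files"]
    lines += [f"{n:>4}  {name}" for name, n in top]
    return "\n".join(lines)
```

Running `print(summarize_todos("src"))` in the sandbox puts at most six lines into context, however many files were scanned.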
§ 05 · continuity
Most tools lose everything when context compacts. context-mode does not. Every file operation, every decision, every error and its fix gets written to a local SQLite database. Next session, it loads back in.
26 events carry over through each reset. Nothing lost. You do not explain your codebase twice.
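The mechanism can be pictured as an append-only event log that outlives the conversation. A minimal sketch, assuming a single `events` table with a JSON payload; the schema, the event kinds, and the `limit` default are invented for illustration and are not context-mode's real format.

```python
import json
import sqlite3
import time


def open_log(path: str) -> sqlite3.Connection:
    """Open (or create) the on-disk event log. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY, ts REAL, kind TEXT, payload TEXT)"
    )
    return conn


def record(conn: sqlite3.Connection, kind: str, payload: dict) -> None:
    """Append one session event, e.g. a file edit or an error and its fix."""
    conn.execute(
        "INSERT INTO events (ts, kind, payload) VALUES (?, ?, ?)",
        (time.time(), kind, json.dumps(payload)),
    )
    conn.commit()


def restore(conn: sqlite3.Connection, limit: int = 100) -> list:
    """Load the most recent events, oldest first, for a fresh session."""
    rows = conn.execute(
        "SELECT kind, payload FROM events ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return [(kind, json.loads(payload)) for kind, payload in reversed(rows)]
```

Because the log lives in a file rather than in the conversation, compaction has nothing to throw away: a new session opens the same path and calls `restore`.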
§ 06 · for engineering orgs
Org observability for AI-assisted engineering. Same plugin your team already runs — opt-in event forwarding into a private workspace. $20 / seat / month. Built on the structural events context-mode already captures locally.
platform.context-mode.com →
§ 07 · platforms
Works on Claude Code, Cursor, Copilot, Codex, Gemini CLI, and ten more. Platforms with hook support get automatic routing. The rest use instruction-based rules. Same plugin everywhere.
§ 08 · security
context-mode runs entirely on your local machine. No cloud. No telemetry. No accounts. No tracking. Your code, prompts, file paths, session data stay on your disk. Source-available under Elastic License 2.0.
curl, wget, inline HTTP, rm -rf blocked at PreToolUse. Redirected to sandbox or denied.
60+ env vars stripped from sandbox execution. API keys, tokens, secrets never exposed to subprocess.
No data sent anywhere. No insight. No heartbeat. No phone-home. Fully air-gapped operation.
Elastic License 2.0. Full source on GitHub. Audit the code yourself. No compiled blobs.
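The environment-stripping step can be sketched in a few lines. This is an illustration only: it filters by name pattern, whereas the plugin's actual list of 60+ variables is not reproduced here, and `SENSITIVE`, `scrubbed_env`, and `run_sandboxed` are hypothetical names.

```python
import os
import re
import subprocess

# Assumption: any variable whose name contains one of these words is a secret.
SENSITIVE = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)", re.IGNORECASE)


def scrubbed_env(env=None) -> dict:
    """Copy the environment, dropping variables that look like secrets."""
    source = os.environ if env is None else env
    return {name: value for name, value in source.items()
            if not SENSITIVE.search(name)}


def run_sandboxed(argv: list, timeout: float = 30.0) -> str:
    """Run a command with the scrubbed environment and return its stdout."""
    result = subprocess.run(
        argv,
        env=scrubbed_env(),   # the child process never sees the stripped names
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

A name-pattern filter errs on the side of removing too much, which is usually the right trade-off for code an agent wrote a moment ago.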
§ 09 · install
No configuration. No accounts. No tokens. Install the plugin, restart your editor, and the routing engine takes over on the next tool call.
/plugin marketplace add mksglu/context-mode
npm install -g context-mode
Source-available · Elastic License 2.0 · Node.js 18+ · Works with Bun