source-available plugin · ELv2 · by B. Mert Köseoğlu

context-mode — the other half of the context problem.


An MCP plugin that sandboxes tool output and indexes it into a local FTS5 store. Your AI coding agent searches that store instead of re-sending raw bytes every turn. 15 platforms. 66,000+ developers. Runs entirely on your machine.


66.3k+ users

56.1k+ npm

10.2k+ marketplace

15 platforms

§ 01 · the problem

The context window problem.

Here's the thing about Claude Code, Cursor, Copilot, Codex, Gemini CLI. Under the hood, they all work the same way. The LLM has no memory. Every turn, the full conversation gets re-sent to the API — every file read, every command output, every grep result. It piles up.

Run gh issue list once. 59 KB of JSON comes back. About 15,000 tokens. Next turn, those 15,000 tokens get sent again. Turn after that, again. Fifty turns in, that one command has cost you 750,000 input tokens.

Now do that 20 times in a session. Average output per call: 30 KB. That's 600 KB sitting in your conversation, re-sent 50 times over: 30 MB of input spent on tool output alone.

That's where your limit goes.
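The arithmetic above is easy to reproduce. A minimal sketch, assuming the same simplification the text uses (every payload is re-sent on each of the 50 turns); the function name is made up for illustration:

```javascript
// Back-of-the-envelope model of re-sent tool output.
// Assumption (taken from the text): each payload is re-sent on every turn.
function resentTotal(sizePerCall, calls, turns) {
  return sizePerCall * calls * turns;
}

// One `gh issue list` call: about 15,000 tokens, re-sent over 50 turns.
const singleCommandTokens = resentTotal(15000, 1, 50);

// Twenty calls averaging 30 KB each, re-sent over 50 turns.
const sessionKB = resentTotal(30, 20, 50);

console.log(singleCommandTokens); // 750000 (tokens)
console.log(sessionKB / 1000);    // 30 (MB, taking 1 MB = 1000 KB)
```

In a real session a payload only starts costing you from the turn it arrives, so treat these as upper bounds for the scenario described.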

Eventually the context window fills up and the agent compacts. It summarizes the conversation. Throws away the originals. The agent forgets what it read, what it decided, what it was in the middle of building. You start over.

Accuracy: 50% accuracy drop at 32K tokens. 11 of 12 models tested. Your agent makes worse decisions as context fills up. (NoLiMa, 2025)

Latency: 15 min spent re-establishing context after each reset. Developers re-explain their codebase from scratch, every time. (Internal telemetry)

Cost: $60K per year for a 50-seat team on an agent that forgets every 20 minutes. $100 / seat / month, burned on re-sent tool output. (50-seat reference org)

Playwright: 56 KB per snapshot

GitHub Issues: 59 KB per query

Access Logs: 45 KB per analysis

Total cost: 30 MB re-sent / session

§ 02 · the solution

Intercept, sandbox, index.

context-mode is an MCP server that sits between your agent and its tools. When a tool call would dump large output into the conversation, context-mode intercepts it and runs it in a sandboxed subprocess instead. Raw data never touches context. It goes into a local FTS5 database with BM25 ranking. The agent searches it when it needs to.
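FTS5 and BM25 do the real work inside SQLite. To see what BM25-ranked retrieval looks like, here is a tiny in-memory stand-in. It is not FTS5 and not the plugin's code: `buildIndex`, `search`, the constants, and the sample issues are all assumptions made for illustration.

```javascript
// In-memory BM25 ranking sketch. This is NOT SQLite FTS5; it only illustrates
// the "index once, search on demand" idea. All names here are hypothetical.
const K1 = 1.2;  // term-frequency saturation (a common default)
const B = 0.75;  // document-length normalization (a common default)

function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Index each chunk: term frequencies per chunk, document frequencies overall.
function buildIndex(chunks) {
  const docs = chunks.map((text) => {
    const tokens = tokenize(text);
    const tf = new Map();
    for (const t of tokens) tf.set(t, (tf.get(t) || 0) + 1);
    return { text, length: tokens.length, tf };
  });
  const df = new Map();
  for (const d of docs) {
    for (const t of d.tf.keys()) df.set(t, (df.get(t) || 0) + 1);
  }
  const avgLength = docs.reduce((sum, d) => sum + d.length, 0) / (docs.length || 1);
  return { docs, df, avgLength };
}

// Score every chunk against the query and return the best matches.
function search(index, query, limit = 3) {
  const terms = tokenize(query);
  const N = index.docs.length;
  return index.docs
    .map((d) => {
      let score = 0;
      for (const t of terms) {
        const f = d.tf.get(t) || 0;
        if (f === 0) continue;
        const n = index.df.get(t);
        const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
        const norm = 1 - B + B * (d.length / index.avgLength);
        score += idf * (f * (K1 + 1)) / (f + K1 * norm);
      }
      return { text: d.text, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const index = buildIndex([
  'issue 101: login page crashes on empty password',
  'issue 102: update README badges',
  'issue 103: crash when uploading large files',
]);
const hits = search(index, 'login crash');
console.log(hits.map((h) => h.text));
```

The point of the design: the agent gets back a couple of short, ranked matches instead of the whole payload. The unrelated issue never enters context.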

The plugin hooks into five points in the session lifecycle:

01 PreToolUse: Route tool calls. Block curl / wget, redirect large output to sandbox.

02 PostToolUse: Capture events. File ops, git, errors, decisions → SessionDB.

03 SessionStart: Restore state. Inject resume snapshot, reload indexed knowledge.

04 PreCompact: Preserve state. Build snapshot right before context wipe.

05 UserPromptSubmit: Capture intent. Track decisions, corrections, session mode.

Every tool call passes through a routing engine. Calls to curl, wget, or inline HTTP get blocked and rerouted to the sandbox. Build tools, large file reads, web fetches — same treatment. Small commands that won't cause problems pass through untouched.
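A routing decision like the one described above could be sketched as follows. This is an illustration under stated assumptions, not the plugin's engine: the rule set, the 5 KB cutoff, the `expectedBytes` field, and the return shape are all hypothetical.

```javascript
// Hypothetical PreToolUse routing sketch. The rules, threshold, and return
// shape are assumptions for illustration, not context-mode's actual engine.
const NETWORK_COMMANDS = /(^|[\s;&|])(curl|wget)(\s|$)/;
const LARGE_OUTPUT_BYTES = 5 * 1024; // assumed cutoff

function routeToolCall(call) {
  // Shell commands that hit the network go to the sandbox.
  if (call.tool === 'Bash' && NETWORK_COMMANDS.test(call.command)) {
    return { action: 'sandbox', reason: 'network command' };
  }
  // Web fetches get the same treatment.
  if (call.tool === 'WebFetch') {
    return { action: 'sandbox', reason: 'web fetch' };
  }
  // Anything expected to produce large output is redirected too.
  if ((call.expectedBytes || 0) > LARGE_OUTPUT_BYTES) {
    return { action: 'sandbox', reason: 'large output' };
  }
  // Small, harmless commands pass through untouched.
  return { action: 'pass', reason: 'small output' };
}

console.log(routeToolCall({ tool: 'Bash', command: 'curl https://example.com' }).action);
console.log(routeToolCall({ tool: 'Bash', command: 'git status' }).action);
```

Rules are checked most specific first, and pass-through is the fallback, so cheap commands pay no overhead.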

§ 03 · results

Measured, not estimated.

Real numbers from real sessions. Not projections.

Playwright: 56.2 KB → 299 B (99.5% reduction)

GitHub Issues: 58.9 KB → 1.1 KB (98% reduction)

Access Logs: 45.1 KB → 155 B (99.7% reduction)

Full Session: 315 KB → 5.4 KB (98% reduction)

Without context-mode, those 20 tool calls put 600 KB into the conversation. Over 50 turns, 30 MB re-sent. With it, the same calls produce 20 KB. Over 50 turns, 1 MB. Same work. Same answers. 30 times fewer tokens.
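You can check the percentages yourself from the before and after sizes quoted in this section. A small sketch, assuming 1 KB = 1024 bytes; the helper name is made up:

```javascript
// Reduction percentages recomputed from the before/after sizes quoted above.
// Sizes are in bytes, with 1 KB taken as 1024 bytes (an assumption).
const KB = 1024;
const measurements = [
  { name: 'Playwright', before: 56.2 * KB, after: 299 },
  { name: 'GitHub Issues', before: 58.9 * KB, after: 1.1 * KB },
  { name: 'Access Logs', before: 45.1 * KB, after: 155 },
  { name: 'Full Session', before: 315 * KB, after: 5.4 * KB },
];

function reductionPercent(before, after) {
  return (1 - after / before) * 100;
}

for (const m of measurements) {
  console.log(`${m.name}: ${reductionPercent(m.before, m.after).toFixed(1)}%`);
}
```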


§ 04 · paradigm

Think in Code.

Sandboxing tool output handles the result side. But there is still a problem on the input side. When the agent wants to analyze 47 files, it calls Read() on each. Every file enters context. Every file stays there.

v1.0.64 made this a rule, not a suggestion. We call it Think in Code. It is mandatory across all 15 platform configs. The idea fits in one sentence: if you need to analyze, count, filter, or process data, write code that does it. Stop pulling raw data into context to process mentally.

// Before: 47 files × Read() = 700 KB in context
// After: 1 ctx_execute() = 3.6 KB in context

ctx_execute("javascript", `
  const fs = require('fs');
  const files = fs.readdirSync('src').filter(f => f.endsWith('.ts'));
  files.forEach(f => {
    const lines = fs.readFileSync('src/'+f,'utf8').split('\n').length;
    console.log(f + ': ' + lines + ' lines');
  });
`);

// 47 files analyzed. 15,314 LoC processed.
// Context used: 3.6 KB. Reduction: 200×.

The LLM writes a script. The script runs in a sandbox. Only stdout enters context. Your CPU does the work for free. Tokens cost money.
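The "only stdout enters context" flow can be sketched with a plain child process. This is an assumption about the shape of the idea, not how `ctx_execute` is implemented: `runAndCaptureStdout` and its limits are hypothetical, and a bare subprocess is not a security sandbox.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run a script in a child Node.js process and return only
// what it printed. A plain subprocess is NOT a security sandbox; this shows
// the data flow only.
function runAndCaptureStdout(script, maxChars = 4096) {
  const result = spawnSync(process.execPath, ['-e', script], {
    encoding: 'utf8',
    timeout: 10000, // give up after 10 seconds
  });
  if (result.status !== 0) {
    return `error: ${(result.stderr || '').slice(0, maxChars)}`;
  }
  // Cap what flows back so a chatty script cannot flood the conversation.
  return result.stdout.slice(0, maxChars);
}

// The child builds and filters 10,000 rows; the parent sees one short line.
const summary = runAndCaptureStdout(`
  const rows = Array.from({ length: 10000 }, (_, i) => i);
  const even = rows.filter((n) => n % 2 === 0).length;
  console.log('even rows: ' + even);
`);
console.log(summary.trim()); // even rows: 5000
```

The intermediate data lives and dies in the child process. What the caller keeps is the summary line.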

§ 05 · continuity

Your AI never starts from scratch.

Most tools lose everything when context compacts. context-mode does not. Every file operation, every decision, every error and its fix gets written to a local SQLite database. Next session, it loads back in.

PostToolUse captures events across 15 categories as they happen. PreCompact builds a reference-based snapshot right before the wipe. SessionStart restores it on the next turn. The agent picks up mid-thought.
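The capture, snapshot, and restore steps above can be sketched in memory. The plugin itself writes to SQLite; the `SessionLog` class, its fields, and the category names here are hypothetical and only show the flow.

```javascript
// In-memory sketch of capture -> snapshot -> restore. The real plugin writes
// to SQLite; the class, fields, and category names here are hypothetical.
class SessionLog {
  constructor() {
    this.events = [];
  }

  // PostToolUse: record a small event that points at data, not the data itself.
  capture(category, ref, note) {
    this.events.push({ id: this.events.length + 1, category, ref, note });
  }

  // PreCompact: group event references by category.
  buildSnapshot() {
    const snapshot = {};
    for (const e of this.events) {
      (snapshot[e.category] ||= []).push({ id: e.id, ref: e.ref, note: e.note });
    }
    return snapshot;
  }
}

// SessionStart: turn the snapshot into a short resume text for the agent.
function renderResume(snapshot) {
  return Object.entries(snapshot)
    .map(([category, items]) =>
      `${category}: ${items.map((i) => `${i.ref} (${i.note})`).join('; ')}`)
    .join('\n');
}

const log = new SessionLog();
log.capture('file', 'src/auth.ts', 'edited');
log.capture('error', 'src/auth.ts:42', 'fixed null check');
log.capture('file', 'src/db.ts', 'read');
const resume = renderResume(log.buildSnapshot());
console.log(resume);
```

Storing references (paths, line numbers, short notes) instead of file contents is what keeps the snapshot small enough to inject after a reset.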

✓ Your prompts

✓ Files tracked

✓ Project rules

✓ Your decisions

✓ Git operations

✓ Errors & fixes

✓ Environment

✓ Session mode

✓ Tool patterns

26 events carry over through each reset. Nothing lost. You do not explain your codebase twice.

§ 06 · analytics

See how you actually use AI.

/ctx-insight opens a local analytics dashboard in your browser. 15+ metrics built from your session data. No cloud, no telemetry — everything stays on your machine.

✓ Tool usage breakdown

✓ Session activity timeline

✓ Error rate tracking

✓ Parallel work patterns

✓ Explore / execute ratio

✓ Mastery curve

Available as an MCP tool (ctx_insight), CLI command (context-mode insight), or slash command (/ctx-insight). Built with TanStack Router and shadcn/ui. Runs on 127.0.0.1, works with both Node.js and Bun.

§ 07 · platforms

15 platforms, one plugin.

Works on Claude Code, Cursor, Copilot, Codex, Gemini CLI, and ten more. Platforms with hook support get automatic routing. The rest use instruction-based rules. Same plugin everywhere.

Claude Code · Cursor · Codex CLI · VS Code Copilot · JetBrains Copilot · Gemini CLI · Qwen Code · Kiro · OpenCode · KiloCode · Zed · OpenClaw · Pi · Oh My Pi · Antigravity

§ 08 · security

Privacy-first. Your code never leaves your machine.

context-mode runs entirely on your local machine. No cloud. No telemetry. No accounts. No tracking. Your code, your prompts, your file paths, your session data stay on your disk. The plugin is source-available under the Elastic License 2.0.

✓ Dangerous command blocking

curl, wget, inline HTTP, rm -rf blocked at PreToolUse. Redirected to sandbox or denied.

✓ Environment variable denylist

60+ env vars stripped from sandbox execution. API keys, tokens, secrets never exposed to subprocess.

✓ Zero network access

No data sent anywhere. No analytics. No heartbeat. No phone-home. Fully air-gapped operation.

✓ Source-available

Elastic License 2.0. Full source on GitHub. Audit the code yourself. No compiled blobs.
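The environment variable denylist is the easiest of these to picture in code. A minimal sketch, assuming pattern-based matching: the patterns below are illustrative and are not the plugin's actual list of 60+ variables.

```javascript
// Hypothetical denylist sketch. The patterns below are illustrative and are
// not the plugin's actual list of 60+ variables.
const DENY_PATTERNS = [/TOKEN/i, /SECRET/i, /PASSWORD/i, /API_?KEY/i, /^AWS_/i];

// Return a copy of the environment with every matching variable removed.
function sanitizeEnv(env) {
  const clean = {};
  for (const [name, value] of Object.entries(env)) {
    if (!DENY_PATTERNS.some((p) => p.test(name))) clean[name] = value;
  }
  return clean;
}

const safeEnv = sanitizeEnv({
  PATH: '/usr/bin',
  HOME: '/home/dev',
  GITHUB_TOKEN: 'example-token',
  OPENAI_API_KEY: 'example-key',
  AWS_REGION: 'us-east-1',
});
console.log(Object.keys(safeEnv)); // [ 'PATH', 'HOME' ]
```

The sanitized object can be passed as the `env` option when spawning a subprocess with Node's `child_process` module, so the child never sees the stripped variables.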

§ 09 · install

Two commands, two minutes.

No configuration. No accounts. No tokens. Install the plugin, restart your editor, the routing engine takes over on the next tool call.

Claude Code (recommended)

/plugin marketplace add mksglu/context-mode

All platforms

npm install -g context-mode

Source-available · Elastic License 2.0 · Node.js 18+ · Works with Bun