SaltPepper classifies every message with a local Gemma2 2B model, then routes to the cheapest Claude tier that handles it without quality loss. Runs fully native on Windows.
Every message is classified by Gemma2 2B locally before a single API call is made. Expensive models only activate when the task genuinely requires them.
| Tier | Model | Cost | Best for |
|---|---|---|---|
| LOCAL | Gemma2 2B via Ollama | Free | Greetings, simple Q&A, definitions, quick lookups |
| FAST | Claude Haiku 4.5 | $1 / MTok | Quick code snippets, syntax help, boilerplate generation |
| MED | Claude Sonnet 4.6 | $3 / MTok | Debugging, refactoring, multi-step reasoning, explanations |
| HIGH | Claude Opus 4.6 | $5 / MTok | System architecture, security review, deep research, novel problems |
The router intercepts every Claude Code message before it hits Anthropic's API. Classification runs locally in milliseconds.
Gemma2 2B runs via ollama on your machine. It reads the message
and assigns a complexity class: LOCAL, FAST, MED,
or HIGH. No API call, no latency, no cost.
The router maps the class to the cheapest model that can handle it well. Routine messages stay local. Complex tasks escalate automatically. You never configure this manually.
Claude Code receives the response as normal. A subtle header shows which tier was used and estimated cost, so you always know where your tokens went.
The original SaltPepper used LiteRT for on-device inference, which requires Linux or Mac. This port replaces LiteRT with Ollama, which ships a native Windows binary and supports GPU acceleration out of the box.
Ollama's Windows build handles model download, VRAM management, and serving via a local REST endpoint.
CUDA and ROCm are supported automatically. Gemma2 2B fits in 4 GB VRAM with room to spare.
No compiled extensions. A single setup_win.py registers the router as a Claude Code hook.
Same routing logic and tier definitions as the original. Switch from Linux/Mac with no prompt changes.
Requires Python 3.10+, an Anthropic API key in your environment, and Ollama installed on Windows.
# Install the Ollama Python client PS> pip install ollama # Pull Gemma2 2B — ~1.6 GB download, runs on CPU or GPU PS> ollama pull gemma2:2b
Ollama for Windows: download the installer from ollama.com/download/windows. The Ollama service starts automatically on login.
PS> git clone https://github.com/akbknight/saltpepper-win.git PS> cd saltpepper-win PS> python setup_win.py
The setup script writes a Claude Code hook configuration and verifies Ollama is reachable at localhost:11434 before finishing.
PS> claude # SaltPepper is now intercepting every message. # Tier and estimated cost appear in the response header.
Set ANTHROPIC_API_KEY in your environment before launching. SALTPEPPER_DEFAULT_TIER can override automatic routing if needed.