Windows Native via Ollama

Smart routing for
Claude Code on Windows

SaltPepper classifies every message with a local Gemma2 2B model, then routes to the cheapest Claude tier that handles it without quality loss. Runs fully native on Windows.

40-70%
Token cost saved
4
Routing tiers
$0
Classifier cost
2B
Local model params

Four tiers. One smart router.

Every message is classified by Gemma2 2B locally before a single API call is made. Expensive models only activate when the task genuinely requires them.

Tier Model Cost Best for
LOCAL Gemma2 2B via Ollama Free Greetings, simple Q&A, definitions, quick lookups
FAST Claude Haiku 4.5 $1 / MTok Quick code snippets, syntax help, boilerplate generation
MED Claude Sonnet 4.6 $3 / MTok Debugging, refactoring, multi-step reasoning, explanations
HIGH Claude Opus 4.6 $5 / MTok System architecture, security review, deep research, novel problems

Three steps, zero friction.

The router intercepts every Claude Code message before it hits Anthropic's API. Classification runs locally in milliseconds.

01 — Classify

Local inference

Gemma2 2B runs via ollama on your machine. It reads the message and assigns a complexity class: LOCAL, FAST, MED, or HIGH. No API call, no latency, no cost.

02 — Route

Tier selection

The router maps the class to the cheapest model that can handle it well. Routine messages stay local. Complex tasks escalate automatically. You never configure this manually.

03 — Respond

Transparent output

Claude Code receives the response as normal. A subtle header shows which tier was used and estimated cost, so you always know where your tokens went.

Built for Windows, natively.

The original SaltPepper used LiteRT for on-device inference, which requires Linux or Mac. This port replaces LiteRT with Ollama, which ships a native Windows binary and supports GPU acceleration out of the box.

[OL]

Ollama backend

Ollama's Windows build handles model download, VRAM management, and serving via a local REST endpoint.

[GPU]

GPU acceleration

CUDA and ROCm are supported automatically. Gemma2 2B fits in 4 GB VRAM with room to spare.

[PY]

Pure Python setup

No compiled extensions. A single setup_win.py registers the router as a Claude Code hook.

[RX]

Drop-in replacement

Same routing logic and tier definitions as the original. Switch from Linux/Mac with no prompt changes.

Up in three commands.

Requires Python 3.10+, an Anthropic API key in your environment, and Ollama installed on Windows.

1
Install Ollama and pull the classifier model
PowerShell
# Install the Ollama Python client
PS> pip install ollama

# Pull Gemma2 2B — ~1.6 GB download, runs on CPU or GPU
PS> ollama pull gemma2:2b

Ollama for Windows: download the installer from ollama.com/download/windows. The Ollama service starts automatically on login.

2
Clone the repo and run the setup script
PowerShell
PS> git clone https://github.com/akbknight/saltpepper-win.git
PS> cd saltpepper-win
PS> python setup_win.py

The setup script writes a Claude Code hook configuration and verifies Ollama is reachable at localhost:11434 before finishing.

3
Start Claude Code — routing is active
PowerShell
PS> claude

# SaltPepper is now intercepting every message.
# Tier and estimated cost appear in the response header.

Set ANTHROPIC_API_KEY in your environment before launching. SALTPEPPER_DEFAULT_TIER can override automatic routing if needed.