Windows Native via Ollama

Smart routing for Claude Code on Windows

SaltPepper classifies every message with a local Gemma2 2B model, then routes it to the cheapest tier that can handle it without quality loss. It runs natively on Windows.

40-70%

Token cost saved

4

Routing tiers

Free

Classifier cost

2B

Local model params

Routing tiers

Four tiers. One smart router.

Every message is classified by Gemma2 2B locally before a single API call is made. Expensive models only activate when the task genuinely requires them.

Tier	Model	Cost (input)	Best for
LOCAL	Gemma2 2B via Ollama	Free	Greetings, simple Q&A, definitions, quick lookups
FAST	Claude Haiku 4.5	$1 / MTok	Quick code snippets, syntax help, boilerplate generation
MED	Claude Sonnet 4.6	$3 / MTok	Debugging, refactoring, multi-step reasoning, explanations
HIGH	Claude Opus 4.6	$5 / MTok	System architecture, security review, deep research, novel problems
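To see how the tier prices could translate into savings, the sketch below compares a routed workload against sending every token to the HIGH tier. The prices are the per-MTok figures from the table above; the traffic mix is a hypothetical illustration, not a measured result, and the function names are made up for this example.

```python
# Per-million-token prices (USD), copied from the tier table.
PRICE_PER_MTOK = {"LOCAL": 0.0, "FAST": 1.0, "MED": 3.0, "HIGH": 5.0}


def blended_price(mix: dict[str, float]) -> float:
    """Average USD per MTok for a mix of token shares that sum to 1."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    return sum(PRICE_PER_MTOK[tier] * share for tier, share in mix.items())


def savings_vs_high(mix: dict[str, float]) -> float:
    """Fraction of cost saved compared with routing everything to HIGH."""
    return 1.0 - blended_price(mix) / PRICE_PER_MTOK["HIGH"]


# Hypothetical share of tokens per tier, assumed for illustration only.
example_mix = {"LOCAL": 0.2, "FAST": 0.3, "MED": 0.3, "HIGH": 0.2}

if __name__ == "__main__":
    print(f"blended price: ${blended_price(example_mix):.2f} / MTok")
    print(f"savings vs all-HIGH: {savings_vs_high(example_mix):.0%}")
```

With this assumed mix the blended price is about $2.20 per MTok, roughly 56% below the all-HIGH baseline. Real savings depend entirely on how your own messages distribute across tiers.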

How it works

Three steps, zero friction.

The router intercepts every Claude Code message before it hits Anthropic's API. Classification runs locally in milliseconds.

01 — Classify

Local inference

Gemma2 2B runs via Ollama on your machine. It reads the message and assigns a complexity class: LOCAL, FAST, MED, or HIGH. No API call, no network round trip, no token cost.
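The classification step might look roughly like the sketch below, which posts the message to Ollama's local `/api/generate` endpoint and extracts a tier label from the reply. The prompt text, the `parse_tier` and `classify` helpers, and the MED fallback are assumptions for illustration; the project's actual prompt and parsing logic are not shown in this document.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
TIERS = ("LOCAL", "FAST", "MED", "HIGH")

# Hypothetical prompt; the real classifier prompt may differ.
PROMPT = (
    "Classify the complexity of the user message as exactly one of "
    "LOCAL, FAST, MED, HIGH. Reply with the label only.\n\nMessage: {message}"
)


def parse_tier(reply: str, default: str = "MED") -> str:
    """Return the first tier label found in the model reply, else the default."""
    words = reply.upper().replace(",", " ").replace(".", " ").split()
    for word in words:
        if word in TIERS:
            return word
    return default  # assumed fallback when the reply is unusable


def classify(message: str, model: str = "gemma2:2b", timeout: float = 10.0) -> str:
    """Ask the local Ollama server for a tier label (requires Ollama running)."""
    payload = json.dumps(
        {"model": model, "prompt": PROMPT.format(message=message), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # With stream=False, Ollama returns the full text in the "response" field.
    return parse_tier(body.get("response", ""))
```

Keeping `parse_tier` separate from the network call means the label handling can be tested without a running Ollama instance.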

02 — Route

Tier selection

The router maps the class to the cheapest model that can handle it well. Routine messages stay local. Complex tasks escalate automatically. No manual configuration is required.
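The mapping from class to model can be pictured as a simple lookup table. The sketch below is an assumption about the shape of that table, not the project's code: the `Route` type and `route` function are hypothetical, the model strings are the display names from the tier table rather than API model identifiers, and falling back to MED for an unknown label is an illustrative choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    tier: str
    model: str           # display name from the tier table, not an API id
    usd_per_mtok: float  # price listed in the tier table


ROUTES = {
    "LOCAL": Route("LOCAL", "Gemma2 2B via Ollama", 0.0),
    "FAST": Route("FAST", "Claude Haiku 4.5", 1.0),
    "MED": Route("MED", "Claude Sonnet 4.6", 3.0),
    "HIGH": Route("HIGH", "Claude Opus 4.6", 5.0),
}


def route(tier_label: str, fallback: str = "MED") -> Route:
    """Look up the route for a classifier label, tolerating case and whitespace."""
    return ROUTES.get(tier_label.strip().upper(), ROUTES[fallback])
```

For example, `route("fast")` returns the Claude Haiku 4.5 entry, while an unrecognized label such as `route("???")` falls back to the MED entry under this sketch's assumption.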

03 — Respond

Transparent output

Claude Code receives the response as normal. A subtle header shows which tier was used and the estimated cost, so you always know where your tokens went.
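The cost shown in the header is a matter of simple arithmetic: tokens multiplied by the tier's per-MTok price. The sketch below illustrates that calculation; the header layout and both function names are hypothetical, since the actual header format is not specified here.

```python
def estimate_cost_usd(tokens: int, usd_per_mtok: float) -> float:
    """Estimated cost in USD for a token count at a per-million-token price."""
    return tokens * usd_per_mtok / 1_000_000


def format_header(tier: str, model: str, tokens: int, usd_per_mtok: float) -> str:
    """Build a one-line header; this layout is an assumed example format."""
    cost = estimate_cost_usd(tokens, usd_per_mtok)
    return f"[SaltPepper] tier={tier} model={model} est. cost=${cost:.4f}"


if __name__ == "__main__":
    # 2,000 tokens at $3 / MTok comes to $0.0060.
    print(format_header("MED", "Claude Sonnet 4.6", 2000, 3.0))
```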

Windows port

Built for Windows, natively.

The original SaltPepper used LiteRT for on-device inference, which requires Linux or macOS. This port replaces LiteRT with Ollama, which ships a native Windows binary and supports GPU acceleration out of the box.

Ollama backend

Ollama's Windows build handles model download, VRAM management, and serving via a local REST endpoint.

GPU acceleration

Ollama detects supported NVIDIA (CUDA) and AMD (ROCm) GPUs automatically. Gemma2 2B fits in 4 GB VRAM with room to spare.

Pure Python setup

No compiled extensions. A single setup_win.py registers the router as a Claude Code hook.

Drop-in replacement

Same routing logic and tier definitions as the original. Switch from Linux/macOS with no prompt changes.

Installation

Up in three steps.

Requires Python 3.10+, an Anthropic API key in your environment, and Ollama installed on Windows.

Install Ollama and pull the classifier model

PowerShell

# Install the Ollama Python client
PS> pip install ollama

# Pull Gemma2 2B (~1.6 GB download); runs on CPU or GPU
PS> ollama pull gemma2:2b

Ollama for Windows must be installed first: download the installer from ollama.com/download/windows. The Ollama service starts automatically on login.

Clone the repo and run the setup script

PowerShell

PS> git clone https://github.com/akbknight/saltpepper-win.git
PS> cd saltpepper-win
PS> python setup_win.py

The setup script writes a Claude Code hook configuration and verifies Ollama is reachable at localhost:11434 before finishing.
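If setup fails at the verification step, a reachability check along these lines can help confirm whether the Ollama server is answering on its default port. This is a minimal sketch, not the code in setup_win.py; the `ollama_reachable` name is hypothetical, and it assumes only that a running Ollama server answers HTTP requests at its root URL.

```python
import urllib.error
import urllib.request


def ollama_reachable(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx HTTP error.
        return False


if __name__ == "__main__":
    if ollama_reachable():
        print("Ollama is reachable at localhost:11434")
    else:
        print("Ollama is not reachable; check that the service is running")
```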

Start Claude Code — routing is active

PowerShell

PS> claude

# SaltPepper is now intercepting every message.
# Tier and estimated cost appear in the response header.

Set ANTHROPIC_API_KEY in your environment before launching. SALTPEPPER_DEFAULT_TIER can override automatic routing if needed.