Time to start saving some time and some money.
Run a free open-source model in your Terminal for the heavy footwork. Keep your Desktop app on Anthropic for the strategic, big-picture work. Two engines, one bill.
Brought to you by
Coherence Daddy · coherencedaddy.com
Are you new to Claude, or already using it?
If you already use Claude, we'll skip the install steps and offer you a single copy-paste prompt that does almost all the work for you. New here? We'll walk you through everything from scratch.
Skip the manual setup. Drop this into Claude.
One prompt that does 98% of the footwork. Claude detects your OS, runs the commands it can, hands you copy-paste blocks for the rest, and references this presentation for any visual you need. Your job: paste, then say "go."
You are my friendly setup operator for the Two-Engine Setup — wiring up two terminal/desktop wrappers that route Claude Code (and the Claude Desktop app) through Ollama, while plain `claude` and the dock-launched Desktop app stay on Anthropic.
I have the visual companion presentation open alongside this chat (~21 slides). When I need a picture, you'll tell me which slide to open.
YOUR JOB
Walk me through the entire setup as a clean, one-screen to-do list. Do 98% of the footwork yourself: detect my OS, check what's installed, write config files, run shell commands when you have a tool that can. When something genuinely requires my hands (a website signup, pasting an API key from a UI), tell me which slide to open, give me one crystal-clear instruction, and wait for my confirmation before continuing.
OPERATING RULES
1. Lead with a checklist. First message: render the full to-do list as markdown checkboxes. Update it after every step. Keep it visible at the top of every reply.
2. Use slide references for visuals. Format: "Open slide 11 in the presentation, then come back."
3. Detect my environment first. If you have a Bash tool, run: uname -s; sw_vers 2>/dev/null; which claude ollama ccr node npm. If not, ask me to paste the output.
4. Run commands when you can. Otherwise hand me a fenced code block and ask me to paste output back.
5. One step at a time. Never dump multiple steps in one message.
6. Trap the gotchas:
- Auth conflict: if my ~/.claude/settings.json has an env block setting ANTHROPIC_API_KEY, that conflicts with my Pro/Max OAuth token. Remove the env block before adding the wrappers.
- Two different keys in the router config: top-level APIKEY stays as "local-only". The Ollama key goes in Providers[0].api_key.
- Plain `claude` and the dock-launched Desktop app must still hit Anthropic afterward. Wrappers set env vars only inside their function, never globally.
- Desktop env vars only apply at process launch — quit Claude Desktop fully before running claude-desktop-gemma (the wrapper handles this on macOS).
7. Verify all three routes at the end:
- claude → Anthropic
- claude-gemma → picker → Ollama in the terminal
- claude-desktop-gemma → picker → Ollama in the Desktop app
8. When complete, point me to coherencedaddy.com/tools (500+ free tools, no signup).
PHASES (these become your to-do list)
PHASE 1 · Discovery
[ ] Detect operating system (macOS / Windows+WSL2 / Linux)
[ ] Check what's already installed (claude, ollama, ccr, node)
[ ] Confirm Anthropic Pro plan is active → slide 07
PHASE 2 · Desktop side
[ ] Install or open the Claude Desktop app → slide 08
[ ] Sign in with Pro account → slide 09
PHASE 3 · Ollama side
[ ] Sign up at ollama.com → slide 10
[ ] Pull all four cloud models (gemma4:31b-cloud, glm-5.1:cloud, kimi-k2.6:cloud, qwen3-coder-next) → slide 11
[ ] Generate an Ollama API key — copy it now → slide 12
[ ] Install Ollama desktop app / daemon → slide 13
[ ] Run ollama signin and ollama pull → slide 14
PHASE 4 · Wire terminal + desktop
[ ] Install Claude Code CLI → slide 15
[ ] Install router: npm i -g @musistudio/claude-code-router → slide 16
[ ] Write ~/.claude-code-router/config.json (all 4) → slide 17
[ ] Append claude-gemma + claude-desktop-gemma wrappers with model picker to ~/.zshrc or ~/.bashrc → slide 18
[ ] Source shell rc
PHASE 5 · Verify
[ ] claude → answers as Anthropic → slide 19
[ ] claude-gemma → picker → answers as chosen Ollama model
[ ] claude-desktop-gemma → picker → Desktop relaunches and answers as chosen Ollama model
[ ] Done — visit coherencedaddy.com/tools
Start by:
1. Greeting me by name (ask if you don't know it).
2. Asking which OS I'm on (or detecting it).
3. Rendering the full to-do list above.
4. Telling me to open the presentation alongside this chat.
5. Beginning Phase 1.
Go.
Pick your platform.
The commands change a bit depending on your operating system. Choose yours and the rest of the guide will adapt.
Two engines. Both surfaces.
Anthropic for thinking. Ollama for doing. Either one — terminal or Desktop — can be flipped to either engine with one command.
claude
- Strategic planning & architecture
- Long-context analysis (1M tokens)
- PDFs, images, uploaded data
- Tricky reasoning, code review

claude-gemma · claude-desktop-gemma
- File edits & code generation
- Tests, lint, sub-agents
- Loops & batch tasks
- Long chats without burning credits
Why split work between two engines?
Your Pro plan has limits. Every sub-agent grep, every quick lint, every loop burns quota — even when the task didn't need a frontier model. Route the cheap stuff to free, save the expensive stuff for when it matters.
Use it when you need to think.
- "Plan the architecture for X"
- "Review this 50-page contract"
- "What approach should I take?"
- "Summarize this whole codebase"
Use it when you need to do.
- "Fix the typo on line 42"
- "Run the tests"
- "Rename this variable repo-wide"
- "Loop until lint passes"
Three things you'll need.
Get these sorted first; the rest assumes you have them.
- An Anthropic Pro plan (or higher). The free tier won't earn its keep on the Desktop side. Upgrade here.
- A Mac with macOS 12+ (Monterey or later). Apple Silicon (M-series) or Intel both fine.
- Windows 10 or 11 with WSL2 installed. WSL gives you a Linux environment inside Windows; every Terminal command in this guide runs inside WSL. Install WSL2 here if you don't have it.
- A modern Linux distro (Ubuntu 22+, Debian 12+, Fedora 39+, Arch). Bash or Zsh as your shell.
- Comfort opening Terminal. Applications → Utilities → Terminal, or ⌘+Space → "Terminal". You'll paste roughly a dozen commands.
- Comfort opening WSL. Open the Start menu, type "Ubuntu" (or your WSL distro), hit Enter. That's your terminal for everything in this guide.
- Comfort opening a terminal. Whatever shortcut your distro uses (often Ctrl+Alt+T). You'll paste roughly a dozen commands.
Install the Claude Desktop app.
If you already have it, skip ahead. If not, three clicks.
- Go to claude.ai/download
- Click Download for macOS
- Open the .dmg, drag Claude into Applications
- Launch from Spotlight: ⌘+Space → "Claude"
- Go to claude.ai/download
- Click Download for Windows
- Run the downloaded .exe installer; click through the prompts
- Launch Claude from the Start menu (or Win → "Claude")
- Open claude.ai in Firefox or Chrome
- Sign in with your Pro account (next step)
- Bookmark it, or install it as a PWA: in Chrome, click the install icon in the URL bar — gives you a dock/launcher entry that feels app-like
Claude for Mac
Native desktop app. Voice, dictation, projects.
Claude for Windows
Native desktop app. Works on Windows 10 and 11.
Claude on the web
No native Linux app. Use the web version, install as a PWA for app-like feel.
Sign in with your Pro account.
If you don't have Pro yet, the in-app upgrade flow is fastest.
- On first launch, click Sign in
- Use your Pro-plan email
- Authorize via the browser tab that opens
- You're back in the app — done
Sign up at ollama.com.
Free account. No credit card. Generous cloud-tier usage.
- Go to ollama.com/signup
- Sign up with email or Google
- Verify email if prompted
Create your account
Generous free tier.
Pick your cloud models.
Cloud models run on Ollama's hardware — fast, free tier, no GPU strain on your Mac. Pull all four below — the wrapper gives you a picker on launch so you can swap models per session.
Cloud-hosted models
No GPU needed.
Generate an API key.
The password your local Ollama uses to talk to the cloud.
- Profile (top-right) → Settings → API Keys
- Click Create new key, name it "claude-code"
- Copy the key now — only shown once
- Paste it temporarily into Notes — needed in Step 13
Install Ollama.
The local app talks to Ollama Cloud on your behalf.
- Go to ollama.com/download
- Click Download for macOS
- Open the .dmg, drag to Applications
- Launch — a llama icon appears in your menu bar
- Go to ollama.com/download
- Click Download for Windows
- Run the OllamaSetup.exe installer
- Ollama runs as a Windows service in the background

Note for WSL2 users: a Windows-side Ollama is reachable from inside WSL at http://host.docker.internal:11434 or via the Windows host IP. Easier path: install Ollama directly inside WSL using the Linux script (next tab). It keeps everything Unix-flavored.
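If you do go the Windows-side route, a quick probe from inside WSL2 confirms reachability before you continue. This is a sketch: it assumes curl is available and uses the URL above, which only resolves in some setups.

```shell
# Sketch: probe an Ollama endpoint from WSL2.
# /api/tags is Ollama's model-list endpoint; a working daemon answers it.
probe_ollama() {
  if curl -sf --max-time 3 "$1/api/tags" >/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}
probe_ollama "http://host.docker.internal:11434"
```

If it prints "unreachable", fall back to the Linux-script install in the next tab.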
One-line install script:
curl -fsSL https://ollama.com/install.sh | sh
The script handles systemd setup; Ollama starts automatically.
Verify it's installed:
ollama --version
Sign in & pull all four models.
Connect your local Ollama to your cloud account, then register every cloud model so the picker can offer them.
ollama signin
ollama pull gemma4:31b-cloud
ollama pull glm-5.1:cloud
ollama pull kimi-k2.6:cloud
ollama pull qwen3-coder-next
ollama list
After ollama list runs, all four cloud models should appear in the table. If any are missing, rerun the matching ollama pull.
Install Claude Code (the CLI).
Different from the Desktop app — this one runs in your Terminal. Same company, separate binary.
curl -fsSL https://claude.ai/install.sh | bash
Or via Homebrew:
brew install --cask claude-code
From inside your WSL2 terminal:
curl -fsSL https://claude.ai/install.sh | bash
Or via npm (works in WSL):
npm install -g @anthropic-ai/claude-code
curl -fsSL https://claude.ai/install.sh | bash
Or via npm if you have Node:
npm install -g @anthropic-ai/claude-code
Verify:
claude --version
If claude --version already prints a version, skip ahead. The Desktop app and the CLI install separately.
Install the router (translator).
A small Node program that translates between Claude Code and Ollama. One command.
npm install -g @musistudio/claude-code-router
Verify the ccr command landed on your PATH:
which ccr
Configure the router.
One config file. Tells the router to use Ollama and route everything to your chosen model.
mkdir -p ~/.claude-code-router && touch ~/.claude-code-router/config.json && open -e ~/.claude-code-router/config.json
open -e opens it in TextEdit. To use VS Code instead, swap open -e for code.
mkdir -p ~/.claude-code-router && nano ~/.claude-code-router/config.json
Opens nano (a simple terminal editor). Save with Ctrl+O, exit with Ctrl+X. To use VS Code instead, run code ~/.claude-code-router/config.json (requires the WSL extension).
mkdir -p ~/.claude-code-router && nano ~/.claude-code-router/config.json
Opens nano. Save with Ctrl+O, exit with Ctrl+X. Or swap nano for your editor of choice (vim, code, gedit).
Paste this into the empty file:
{
"LOG": false,
"HOST": "127.0.0.1",
"PORT": 3456,
"APIKEY": "local-only",
"Providers": [
{
"name": "ollama",
"api_base_url": "http://localhost:11434/v1/chat/completions",
"api_key": "PASTE_YOUR_OLLAMA_KEY_HERE",
"models": ["gemma4:31b-cloud", "glm-5.1:cloud", "kimi-k2.6:cloud", "qwen3-coder-next"]
}
],
"Router": {
"default": "ollama,gemma4:31b-cloud",
"background": "ollama,gemma4:31b-cloud",
"think": "ollama,gemma4:31b-cloud",
"longContext": "ollama,gemma4:31b-cloud",
"longContextThreshold": 60000,
"webSearch": "ollama,gemma4:31b-cloud"
}
}
Replace PASTE_YOUR_OLLAMA_KEY_HERE with the key from Step 09. Save and close.
APIKEY at top → leave as "local-only" (the router's own password). api_key in Providers → your Ollama key.
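Before moving on, it is worth validating the file. This sketch assumes python3 is on your PATH; it checks the JSON parses and catches a forgotten placeholder key:

```shell
# Sketch: sanity-check the router config (python3 assumed, used only as a JSON parser).
check_router_config() {
  python3 -m json.tool "$1" >/dev/null 2>&1 || { echo "invalid JSON"; return 0; }
  if grep -q 'PASTE_YOUR_OLLAMA_KEY_HERE' "$1"; then
    echo "placeholder key still present"
  else
    echo "config looks OK"
  fi
}
check_router_config ~/.claude-code-router/config.json
```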
"verbose": true to ~/.claude/settings.json to show tool calls, task lists, and step-by-step activity in the terminal.
{
"verbose": true
}
Create launcher commands with a model picker.
Two commands — one for the terminal (claude-gemma), one for the Desktop app (claude-desktop-gemma). Both pop a numbered menu so you can choose which Ollama model to launch with.
cat >> ~/.zshrc <<'EOF'
# ── Two-Engine: Claude routed through Ollama ──
CLAUDE_OLLAMA_MODELS=(
"gemma4:31b-cloud"
"glm-5.1:cloud"
"kimi-k2.6:cloud"
"qwen3-coder-next"
)
_claude_pick_model() {
if [ -n "$1" ]; then echo "$1"; return; fi
echo "Pick a model:" >&2
local i=1
for m in "${CLAUDE_OLLAMA_MODELS[@]}"; do
echo " $i) $m" >&2; i=$((i+1))
done
printf "Number [1]: " >&2
read choice
echo "${CLAUDE_OLLAMA_MODELS[${choice:-1}]}"
}
_claude_ensure_router() {
if ! curl -sf http://127.0.0.1:3456/health >/dev/null 2>&1; then
nohup ccr start > /tmp/ccr.log 2>&1 &
sleep 2
fi
}
# Terminal: Claude Code via Ollama
claude-gemma() {
local model; model=$(_claude_pick_model "$1") || return
_claude_ensure_router
ANTHROPIC_BASE_URL="http://127.0.0.1:3456" \
ANTHROPIC_API_KEY="local-only" \
ANTHROPIC_MODEL="$model" \
ANTHROPIC_SMALL_FAST_MODEL="$model" \
claude "${@:2}"
}
# Desktop: Claude app via Ollama
claude-desktop-gemma() {
local model; model=$(_claude_pick_model "$1") || return
_claude_ensure_router
osascript -e 'quit app "Claude"' 2>/dev/null
sleep 1
ANTHROPIC_BASE_URL="http://127.0.0.1:3456" \
ANTHROPIC_API_KEY="local-only" \
ANTHROPIC_MODEL="$model" \
ANTHROPIC_SMALL_FAST_MODEL="$model" \
open -a "Claude"
}
EOF
Reload your shell:
source ~/.zshrc
On Windows, only claude-gemma works in WSL — claude-desktop-gemma is macOS-only (uses osascript and open). Launch the Windows Desktop app from PowerShell with env vars set if you want the Ollama route on Desktop.
cat >> ~/.bashrc <<'EOF'
CLAUDE_OLLAMA_MODELS=(
"gemma4:31b-cloud"
"glm-5.1:cloud"
"kimi-k2.6:cloud"
"qwen3-coder-next"
)
_claude_pick_model() {
if [ -n "$1" ]; then echo "$1"; return; fi
echo "Pick a model:" >&2
local i=1
for m in "${CLAUDE_OLLAMA_MODELS[@]}"; do
echo " $i) $m" >&2; i=$((i+1))
done
printf "Number [1]: " >&2
read choice
echo "${CLAUDE_OLLAMA_MODELS[$((${choice:-1}-1))]}"
}
claude-gemma() {
local model; model=$(_claude_pick_model "$1") || return
if ! curl -sf http://127.0.0.1:3456/health >/dev/null 2>&1; then
nohup ccr start > /tmp/ccr.log 2>&1 &
sleep 2
fi
ANTHROPIC_BASE_URL="http://127.0.0.1:3456" \
ANTHROPIC_API_KEY="local-only" \
ANTHROPIC_MODEL="$model" \
ANTHROPIC_SMALL_FAST_MODEL="$model" \
claude "${@:2}"
}
EOF
Reload your shell:
source ~/.bashrc
cat >> ~/.bashrc <<'EOF'
CLAUDE_OLLAMA_MODELS=(
"gemma4:31b-cloud"
"glm-5.1:cloud"
"kimi-k2.6:cloud"
"qwen3-coder-next"
)
_claude_pick_model() {
if [ -n "$1" ]; then echo "$1"; return; fi
echo "Pick a model:" >&2
local i=1
for m in "${CLAUDE_OLLAMA_MODELS[@]}"; do
echo " $i) $m" >&2; i=$((i+1))
done
printf "Number [1]: " >&2
read choice
echo "${CLAUDE_OLLAMA_MODELS[$((${choice:-1}-1))]}"
}
claude-gemma() {
local model; model=$(_claude_pick_model "$1") || return
if ! curl -sf http://127.0.0.1:3456/health >/dev/null 2>&1; then
nohup ccr start > /tmp/ccr.log 2>&1 &
sleep 2
fi
ANTHROPIC_BASE_URL="http://127.0.0.1:3456" \
ANTHROPIC_API_KEY="local-only" \
ANTHROPIC_MODEL="$model" \
ANTHROPIC_SMALL_FAST_MODEL="$model" \
claude "${@:2}"
}
EOF
Reload your shell:
source ~/.bashrc
If you use Zsh, swap ~/.bashrc for ~/.zshrc. The Linux Desktop app accepts the same env-var trick — wrap it with whichever launcher your distro uses.
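Whichever rc file you used, a quick check confirms the wrapper functions actually loaded. A sketch to run in a fresh shell; note that on Linux and WSL only claude-gemma is expected, since the Desktop wrapper is macOS-only:

```shell
# Sketch: confirm the wrapper functions from the rc append are defined in this shell.
check_wrappers() {
  local f
  for f in claude-gemma claude-desktop-gemma; do
    if type "$f" >/dev/null 2>&1; then
      echo "$f: defined"
    else
      echo "$f: missing (re-source your rc file, or expected on Linux/WSL for the Desktop wrapper)"
    fi
  done
}
check_wrappers
```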
- claude → Anthropic in the terminal (unchanged)
- claude-gemma → picker → Ollama in the terminal
- claude-desktop-gemma → picker → Ollama in the Desktop app (macOS)
- Dock-launched Desktop app → still Anthropic (unchanged)
Skip the picker by passing a model name directly: claude-gemma kimi-k2.6:cloud or claude-desktop-gemma qwen3-coder-next.
Test all three routes side by side.
Open three terminal tabs (or run them in sequence). Prove every route is independent.
claude
Ask: "What model are you?"
→ Claude.
claude-gemma
Pick from the menu, then ask.
→ Gemma / GLM / Kimi / Qwen.
claude-desktop-gemma
Pick from the menu. Desktop relaunches.
→ Same Ollama answer, in the GUI.
claude + the dock-launched Desktop app still hit Anthropic. The two wrappers route to Ollama and burn zero Anthropic credits.
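One more cheap check: in a fresh terminal, confirm no ANTHROPIC_* override leaked into your global environment, since the wrappers are supposed to set those variables only for their own child processes. A sketch:

```shell
# Sketch: verify nothing set ANTHROPIC_* globally, so plain `claude` still hits Anthropic.
check_global_env() {
  if [ -z "${ANTHROPIC_BASE_URL:-}" ] && [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "clean: plain claude will hit Anthropic"
  else
    echo "warning: ANTHROPIC_* is set globally; plain claude may be rerouted"
  fi
}
check_global_env
```

If you see the warning, look for a stray export in your rc files or an env block in ~/.claude/settings.json.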
Your new daily workflow.
Mental model: plan with Anthropic, execute with Ollama — on whichever surface fits the task.
Strategic mode
- Drop in screenshots, PDFs, docs
- Talk through architecture & approach
- Get a plan, spec, or outline
- Copy it to your clipboard
Execution mode (terminal)
- Paste the plan as first message
- Pick a model from the menu
- Let it chew through edits + sub-agents
- Loop until tests pass — no Anthropic burn
Long-form chat mode (GUI)
- When you want the Desktop chat UX
- but don't want to spend Anthropic credits
- Pick a cheap cloud model from the menu
- Quit + reopen from the dock to go back
When a task deserves the frontier model, go back to plain claude (terminal) or the dock-launched Desktop app. Cheap models are 80% as good for 0% of the cost; the other 20% of tasks are still worth Anthropic's price.
Two engines. One bill.
You're now running open-source models for the heavy footwork while keeping Claude on hand for the work that demands it. Welcome to the cheap-and-cheerful half of agentic coding.