Time to start saving some time and some money.
Run a free open-source model in your Terminal for the heavy footwork. Keep your Desktop app on Anthropic for the strategic, big-picture work. Two engines, one bill.
Brought to you by
Coherence Daddy · coherencedaddy.com
Are you new to Claude, or already using it?
If you already use Claude, we'll skip the install steps and offer you a single copy-paste prompt that does almost all the work for you. New here? We'll walk you through everything from scratch.
Skip the manual setup. Drop this into Claude.
One prompt that does 98% of the footwork. Claude detects your OS, runs the commands it can, hands you copy-paste blocks for the rest, and references this presentation for any visual you need. Your job: paste, then say "go."
You are my friendly setup operator for the Two-Engine Setup — wiring up two terminal/desktop wrappers that route Claude Code (and the Claude Desktop app) through Ollama, while plain `claude` and the dock-launched Desktop app stay on Anthropic.
I have the visual companion presentation open alongside this chat (~21 slides). When I need a picture, you'll tell me which slide to open.
YOUR JOB
Walk me through the entire setup as a clean, one-screen to-do list. Do 98% of the footwork yourself: detect my OS, check what's installed, write config files, run shell commands when you have a tool that can. When something genuinely requires my hands (a website signup, pasting an API key from a UI), tell me which slide to open, give me one crystal-clear instruction, and wait for my confirmation before continuing.
OPERATING RULES
1. Lead with a checklist. First message: render the full to-do list as markdown checkboxes. Update it after every step. Keep it visible at the top of every reply.
2. Use slide references for visuals. Format: "Open slide 11 in the presentation, then come back."
3. Detect my environment first. If you have a Bash tool, run: uname -s; sw_vers 2>/dev/null; which claude ollama ccr node npm. If not, ask me to paste the output.
4. Run commands when you can. Otherwise hand me a fenced code block and ask me to paste output back.
5. One step at a time. Never dump multiple steps in one message.
6. Trap the gotchas:
- Auth conflict: if my ~/.claude/settings.json has an env block setting ANTHROPIC_API_KEY, that conflicts with my Pro/Max OAuth token. Remove the env block before adding the wrappers.
- Two different keys in the router config: top-level APIKEY stays as "local-only". The Ollama key goes in Providers[0].api_key.
- Plain `claude` and the dock-launched Desktop app must still hit Anthropic afterward. Wrappers set env vars only inside their function, never globally.
- Desktop env vars only apply at process launch — quit Claude Desktop fully before running claude-desktop-gemma (the wrapper handles this on macOS).
7. Verify all three routes at the end:
- claude → Anthropic
- claude-gemma → picker → Ollama in the terminal
- claude-desktop-gemma → picker → Ollama in the Desktop app
8. When complete, point me to coherencedaddy.com/tools (500+ free tools, no signup).
PHASES (these become your to-do list)
PHASE 1 · Discovery
[ ] Detect operating system (macOS / Windows+WSL2 / Linux)
[ ] Check what's already installed (claude, ollama, ccr, node)
[ ] Confirm Anthropic Pro plan is active → slide 07
PHASE 2 · Desktop side
[ ] Install or open the Claude Desktop app → slide 08
[ ] Sign in with Pro account → slide 09
PHASE 3 · Ollama side
[ ] Sign up at ollama.com → slide 10
[ ] Pull all four cloud models (gemma4:31b-cloud, glm-5.1:cloud, kimi-k2.6:cloud, qwen3-coder-next) → slide 11
[ ] Generate an Ollama API key — copy it now → slide 12
[ ] Install Ollama desktop app / daemon → slide 13
[ ] Run ollama signin and ollama pull → slide 14
PHASE 4 · Wire terminal + desktop
[ ] Install Claude Code CLI → slide 15
[ ] Install router: npm i -g @musistudio/claude-code-router → slide 16
[ ] Write ~/.claude-code-router/config.json (all 4) → slide 17
[ ] Append claude-gemma + claude-desktop-gemma wrappers with model picker to ~/.zshrc or ~/.bashrc → slide 18
[ ] Source shell rc
PHASE 5 · Verify
[ ] claude → answers as Anthropic → slide 19
[ ] claude-gemma → picker → answers as chosen Ollama model
[ ] claude-desktop-gemma → picker → Desktop relaunches and answers as chosen Ollama model
[ ] Done — visit coherencedaddy.com/tools
Start by:
1. Greeting me by name (ask if you don't know it).
2. Asking which OS I'm on (or detecting it).
3. Rendering the full to-do list above.
4. Telling me to open the presentation alongside this chat.
5. Beginning Phase 1.
Go.
Pick your platform.
The commands change a bit depending on your operating system. Choose yours and the rest of the guide will adapt.
Two engines. Both surfaces.
Anthropic for thinking. Ollama for doing. Either one — terminal or Desktop — can be flipped to either engine with one command.
claude
- Strategic planning & architecture
- Long-context analysis (1M tokens)
- PDFs, images, uploaded data
- Tricky reasoning, code review

claude-gemma · claude-desktop-gemma
- File edits & code generation
- Tests, lint, sub-agents
- Loops & batch tasks
- Long chats without burning credits
Why split work between two engines?
Your Pro plan has limits. Every sub-agent grep, every quick lint, every loop burns quota — even when the task didn't need a frontier model. Route the cheap stuff to free, save the expensive stuff for when it matters.
Use it when you need to think.
- "Plan the architecture for X"
- "Review this 50-page contract"
- "What approach should I take?"
- "Summarize this whole codebase"
Use it when you need to do.
- "Fix the typo on line 42"
- "Run the tests"
- "Rename this variable repo-wide"
- "Loop until lint passes"
Three things you'll need.
Get these sorted first; the rest assumes you have them.
- An Anthropic Pro plan (or higher). The free tier won't earn its keep on the Desktop side. Upgrade here.
- A Mac with macOS 12+ (Monterey or later). Apple Silicon (M-series) or Intel both fine.
- Windows 10 or 11 with WSL2 installed. WSL gives you a Linux environment inside Windows; every Terminal command in this guide runs inside WSL. Install WSL2 here if you don't have it.
- A modern Linux distro (Ubuntu 22+, Debian 12+, Fedora 39+, Arch). Bash or Zsh as your shell.
- Comfort opening Terminal. Applications → Utilities → Terminal, or ⌘+Space → "Terminal". You'll paste roughly a dozen commands.
- Comfort opening WSL. Open the Start menu, type "Ubuntu" (or your WSL distro), hit Enter. That's your terminal for everything in this guide.
- Comfort opening a terminal. Whatever shortcut your distro uses (often Ctrl+Alt+T). You'll paste roughly a dozen commands.
Install the Claude Desktop app.
If you already have it, skip ahead. If not, three clicks.
- Go to claude.ai/download
- Click Download for macOS
- Open the .dmg, drag Claude into Applications
- Launch from Spotlight: ⌘+Space → "Claude"
- Go to claude.ai/download
- Click Download for Windows
- Run the downloaded .exe installer; click through the prompts
- Launch Claude from the Start menu (or Win → "Claude")
- Open claude.ai in Firefox or Chrome
- Sign in with your Pro account (next step)
- Bookmark it, or install it as a PWA: in Chrome, click the install icon in the URL bar — gives you a dock/launcher entry that feels app-like
Claude for Mac
Native desktop app. Voice, dictation, projects.
Claude for Windows
Native desktop app. Works on Windows 10 and 11.
Claude on the web
No native Linux app. Use the web version, install as a PWA for app-like feel.
Sign in with your Pro account.
If you don't have Pro yet, the in-app upgrade flow is fastest.
- On first launch, click Sign in
- Use your Pro-plan email
- Authorize via the browser tab that opens
- You're back in the app — done
Sign up at ollama.com.
Free account. No credit card. Generous cloud-tier usage.
- Go to ollama.com/signup
- Sign up with email or Google
- Verify email if prompted
Create your account
Generous free tier.
Pick your cloud models.
Cloud models run on Ollama's hardware — fast, free tier, no GPU strain on your Mac. Pull all four below — the wrapper gives you a picker on launch so you can swap models per session.
Cloud-hosted models
No GPU needed.
Generate an API key.
The password your local Ollama uses to talk to the cloud.
- Profile (top-right) → Settings → API Keys
- Click Create new key, name it "claude-code"
- Copy the key now — only shown once
- Paste it temporarily into Notes — needed in Step 13
Install Ollama.
The local app talks to Ollama Cloud on your behalf.
- Go to ollama.com/download
- Click Download for macOS
- Open the .dmg, drag to Applications
- Launch — a llama icon appears in your menu bar
- Go to ollama.com/download
- Click Download for Windows
- Run the OllamaSetup.exe installer
- Ollama runs as a Windows service in the background

Note for WSL2 users: a Windows-side Ollama is reachable from inside WSL at http://host.docker.internal:11434 or via the Windows host IP. Easier path: install Ollama directly inside WSL using the Linux script (next tab). It keeps everything Unix-flavored.
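If you do go the Windows-side route, a quick probe from inside WSL2 confirms reachability before you continue. This is a sketch: it assumes curl is available and uses the URL above, which only resolves in some setups.

```shell
# Sketch: probe an Ollama endpoint from WSL2.
# /api/tags is Ollama's model-list endpoint; a working daemon answers it.
probe_ollama() {
  if curl -sf --max-time 3 "$1/api/tags" >/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}
probe_ollama "http://host.docker.internal:11434"
```

If it prints "unreachable", fall back to the Linux-script install in the next tab.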
One-line install script:
curl -fsSL https://ollama.com/install.sh | sh
The script handles systemd setup; Ollama starts automatically.
Verify it's installed:
ollama --version
Sign in & pull all four models.
Connect your local Ollama to your cloud account, then register every cloud model so the picker can offer them.
ollama signin
ollama pull gemma4:31b-cloud
ollama pull glm-5.1:cloud
ollama pull kimi-k2.6:cloud
ollama pull qwen3-coder-next
ollama list
After ollama list runs, all four cloud models should appear in the table. If any are missing, rerun the matching ollama pull.
Install Claude Code (the CLI).
Different from the Desktop app — this one runs in your Terminal. Same company, separate binary.
curl -fsSL https://claude.ai/install.sh | bash
Or via Homebrew:
brew install --cask claude-code
From inside your WSL2 terminal:
curl -fsSL https://claude.ai/install.sh | bash
Or via npm (works in WSL):
npm install -g @anthropic-ai/claude-code
curl -fsSL https://claude.ai/install.sh | bash
Or via npm if you have Node:
npm install -g @anthropic-ai/claude-code
Verify:
claude --version
If claude --version already prints a version, skip ahead. The Desktop app and the CLI install separately.
Install the router (translator).
A small Node program that translates between Claude Code and Ollama. One command.
npm install -g @musistudio/claude-code-router
Verify the ccr command landed on your PATH:
which ccr
Configure the router.
One config file. Tells the router to use Ollama and route everything to your chosen model.
mkdir -p ~/.claude-code-router && touch ~/.claude-code-router/config.json && open -e ~/.claude-code-router/config.json
open -e opens it in TextEdit. To use VS Code instead, swap open -e for code.
mkdir -p ~/.claude-code-router && nano ~/.claude-code-router/config.json
Opens nano (a simple terminal editor). Save with Ctrl+O, exit with Ctrl+X. To use VS Code instead, run code ~/.claude-code-router/config.json (requires the WSL extension).
mkdir -p ~/.claude-code-router && nano ~/.claude-code-router/config.json
Opens nano. Save with Ctrl+O, exit with Ctrl+X. Or swap nano for your editor of choice (vim, code, gedit).
Paste this into the empty file:
{
"LOG": false,
"HOST": "127.0.0.1",
"PORT": 3456,
"APIKEY": "local-only",
"Providers": [
{
"name": "ollama",
"api_base_url": "http://localhost:11434/v1/chat/completions",
"api_key": "PASTE_YOUR_OLLAMA_KEY_HERE",
"models": ["gemma4:31b-cloud", "glm-5.1:cloud", "kimi-k2.6:cloud", "qwen3-coder-next"]
}
],
"Router": {
"default": "ollama,gemma4:31b-cloud",
"background": "ollama,gemma4:31b-cloud",
"think": "ollama,gemma4:31b-cloud",
"longContext": "ollama,gemma4:31b-cloud",
"longContextThreshold": 60000,
"webSearch": "ollama,gemma4:31b-cloud"
}
}
Replace PASTE_YOUR_OLLAMA_KEY_HERE with the key from Step 09. Save and close.
APIKEY at top → leave as "local-only" (the router's own password). api_key in Providers → your Ollama key.
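Before moving on, it is worth validating the file. This sketch assumes python3 is on your PATH; it checks the JSON parses and catches a forgotten placeholder key:

```shell
# Sketch: sanity-check the router config (python3 assumed, used only as a JSON parser).
check_router_config() {
  python3 -m json.tool "$1" >/dev/null 2>&1 || { echo "invalid JSON"; return 0; }
  if grep -q 'PASTE_YOUR_OLLAMA_KEY_HERE' "$1"; then
    echo "placeholder key still present"
  else
    echo "config looks OK"
  fi
}
check_router_config ~/.claude-code-router/config.json
```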
"verbose": true to ~/.claude/settings.json to show tool calls, task lists, and step-by-step activity in the terminal.
{
"verbose": true
}
Create launcher commands with a model picker.
Two commands — one for the terminal (claude-gemma), one for the Desktop app (claude-desktop-gemma). Both pop a numbered menu so you can choose which Ollama model to launch with.
cat >> ~/.zshrc <<'EOF'
# ── Two-Engine: Claude routed through Ollama ──
CLAUDE_OLLAMA_MODELS=(
"gemma4:31b-cloud"
"glm-5.1:cloud"
"kimi-k2.6:cloud"
"qwen3-coder-next"
)
_claude_pick_model() {
if [ -n "$1" ]; then echo "$1"; return; fi
echo "Pick a model:" >&2
local i=1
for m in "${CLAUDE_OLLAMA_MODELS[@]}"; do
echo " $i) $m" >&2; i=$((i+1))
done
printf "Number [1]: " >&2
read choice
echo "${CLAUDE_OLLAMA_MODELS[${choice:-1}]}"
}
_claude_ensure_router() {
if ! curl -sf http://127.0.0.1:3456/health >/dev/null 2>&1; then
nohup ccr start > /tmp/ccr.log 2>&1 &
sleep 2
fi
}
# Terminal: Claude Code via Ollama
claude-gemma() {
local model; model=$(_claude_pick_model "$1") || return
_claude_ensure_router
ANTHROPIC_BASE_URL="http://127.0.0.1:3456" \
ANTHROPIC_API_KEY="local-only" \
ANTHROPIC_MODEL="$model" \
ANTHROPIC_SMALL_FAST_MODEL="$model" \
claude "${@:2}"
}
# Desktop: Claude app via Ollama
claude-desktop-gemma() {
local model; model=$(_claude_pick_model "$1") || return
_claude_ensure_router
osascript -e 'quit app "Claude"' 2>/dev/null
sleep 1
ANTHROPIC_BASE_URL="http://127.0.0.1:3456" \
ANTHROPIC_API_KEY="local-only" \
ANTHROPIC_MODEL="$model" \
ANTHROPIC_SMALL_FAST_MODEL="$model" \
open -a "Claude"
}
EOF
Reload your shell:
source ~/.zshrc
On Windows, only claude-gemma works in WSL — claude-desktop-gemma is macOS-only (uses osascript and open). Launch the Windows Desktop app from PowerShell with env vars set if you want the Ollama route on Desktop.
cat >> ~/.bashrc <<'EOF'
CLAUDE_OLLAMA_MODELS=(
"gemma4:31b-cloud"
"glm-5.1:cloud"
"kimi-k2.6:cloud"
"qwen3-coder-next"
)
_claude_pick_model() {
if [ -n "$1" ]; then echo "$1"; return; fi
echo "Pick a model:" >&2
local i=1
for m in "${CLAUDE_OLLAMA_MODELS[@]}"; do
echo " $i) $m" >&2; i=$((i+1))
done
printf "Number [1]: " >&2
read choice
echo "${CLAUDE_OLLAMA_MODELS[$((${choice:-1}-1))]}"
}
claude-gemma() {
local model; model=$(_claude_pick_model "$1") || return
if ! curl -sf http://127.0.0.1:3456/health >/dev/null 2>&1; then
nohup ccr start > /tmp/ccr.log 2>&1 &
sleep 2
fi
ANTHROPIC_BASE_URL="http://127.0.0.1:3456" \
ANTHROPIC_API_KEY="local-only" \
ANTHROPIC_MODEL="$model" \
ANTHROPIC_SMALL_FAST_MODEL="$model" \
claude "${@:2}"
}
EOF
Reload your shell:
source ~/.bashrc
cat >> ~/.bashrc <<'EOF'
CLAUDE_OLLAMA_MODELS=(
"gemma4:31b-cloud"
"glm-5.1:cloud"
"kimi-k2.6:cloud"
"qwen3-coder-next"
)
_claude_pick_model() {
if [ -n "$1" ]; then echo "$1"; return; fi
echo "Pick a model:" >&2
local i=1
for m in "${CLAUDE_OLLAMA_MODELS[@]}"; do
echo " $i) $m" >&2; i=$((i+1))
done
printf "Number [1]: " >&2
read choice
echo "${CLAUDE_OLLAMA_MODELS[$((${choice:-1}-1))]}"
}
claude-gemma() {
local model; model=$(_claude_pick_model "$1") || return
if ! curl -sf http://127.0.0.1:3456/health >/dev/null 2>&1; then
nohup ccr start > /tmp/ccr.log 2>&1 &
sleep 2
fi
ANTHROPIC_BASE_URL="http://127.0.0.1:3456" \
ANTHROPIC_API_KEY="local-only" \
ANTHROPIC_MODEL="$model" \
ANTHROPIC_SMALL_FAST_MODEL="$model" \
claude "${@:2}"
}
EOF
Reload your shell:
source ~/.bashrc
If you use Zsh, swap ~/.bashrc for ~/.zshrc. The Linux Desktop app accepts the same env-var trick — wrap it with whichever launcher your distro uses.
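Whichever rc file you used, a quick check confirms the wrapper functions actually loaded. A sketch to run in a fresh shell; note that on Linux and WSL only claude-gemma is expected, since the Desktop wrapper is macOS-only:

```shell
# Sketch: confirm the wrapper functions from the rc append are defined in this shell.
check_wrappers() {
  local f
  for f in claude-gemma claude-desktop-gemma; do
    if type "$f" >/dev/null 2>&1; then
      echo "$f: defined"
    else
      echo "$f: missing (re-source your rc file, or expected on Linux/WSL for the Desktop wrapper)"
    fi
  done
}
check_wrappers
```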
- claude → Anthropic in the terminal (unchanged)
- claude-gemma → picker → Ollama in the terminal
- claude-desktop-gemma → picker → Ollama in the Desktop app (macOS)
- Dock-launched Desktop app → still Anthropic (unchanged)
Skip the picker by passing a model name directly: claude-gemma kimi-k2.6:cloud or claude-desktop-gemma qwen3-coder-next.
Test all three routes side by side.
Open three terminal tabs (or run them in sequence). Prove every route is independent.
claude
Ask: "What model are you?"
→ Claude.
claude-gemma
Pick from the menu, then ask.
→ Gemma / GLM / Kimi / Qwen.
claude-desktop-gemma
Pick from the menu. Desktop relaunches.
→ Same Ollama answer, in the GUI.
claude + the dock-launched Desktop app still hit Anthropic. The two wrappers route to Ollama and burn zero Anthropic credits.
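One more cheap check: in a fresh terminal, confirm no ANTHROPIC_* override leaked into your global environment, since the wrappers are supposed to set those variables only for their own child processes. A sketch:

```shell
# Sketch: verify nothing set ANTHROPIC_* globally, so plain `claude` still hits Anthropic.
check_global_env() {
  if [ -z "${ANTHROPIC_BASE_URL:-}" ] && [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "clean: plain claude will hit Anthropic"
  else
    echo "warning: ANTHROPIC_* is set globally; plain claude may be rerouted"
  fi
}
check_global_env
```

If you see the warning, look for a stray export in your rc files or an env block in ~/.claude/settings.json.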
Your new daily workflow.
Mental model: plan with Anthropic, execute with Ollama — on whichever surface fits the task.
Strategic mode
- Drop in screenshots, PDFs, docs
- Talk through architecture & approach
- Get a plan, spec, or outline
- Copy it to your clipboard
Execution mode (terminal)
- Paste the plan as first message
- Pick a model from the menu
- Let it chew through edits + sub-agents
- Loop until tests pass — no Anthropic burn
Long-form chat mode (GUI)
- When you want the Desktop chat UX
- but don't want to spend Anthropic credits
- Pick a cheap cloud model from the menu
- Quit + reopen from the dock to go back
When a task deserves the frontier model, go back to plain claude (terminal) or the dock-launched Desktop app. Cheap models are 80% as good for 0% of the cost; the other 20% of tasks are still worth Anthropic's price.
Two engines. One bill.
You're now running open-source models for the heavy footwork while keeping Claude on hand for the work that demands it. Welcome to the cheap-and-cheerful half of agentic coding.