Why one prompt isn't enough

A single check tells you one thing: did the engine name you on this exact question, this exact minute.

Useful. Not enough.

These engines aren't deterministic. Same prompt twice, you'll get two slightly different answers. Same prompt on a different day, sometimes a different brand entirely. Profound admits this in their own marketing. So does Anthropic. The output drifts with model temperature, retrieval context, and whatever just got indexed.

Quick example. We ran "best AI brand monitoring tool under $50" against ChatGPT five times on a single Tuesday morning. Got Otterly 3 times, Watchtower 2 times, Peec once, Profound 0 times. Same prompt, same engine, same morning. Five different stories. A single sample tells you almost nothing about your standing.

So a "no mention" result on a one-shot check could be a real gap. Or it could be a Tuesday. You can't tell which without a baseline.
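You can put a number on "almost nothing." Below is a minimal Python sketch that applies a standard binomial confidence interval to the five-run counts from the example above. We chose the Wilson score interval for illustration; we're not claiming the digest reports it. The function name is ours.

```python
import math

def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval (about 95% for z=1.96) for a mention rate of hits/n."""
    if n <= 0:
        raise ValueError("need at least one sample")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, centre - half), min(1.0, centre + half)

# Counts from the five-run ChatGPT example above (5 runs of one prompt).
RUNS = 5
counts = {"Otterly": 3, "Watchtower": 2, "Peec": 1, "Profound": 0}
for brand, hits in counts.items():
    lo, hi = wilson_interval(hits, RUNS)
    print(f"{brand:<11} {hits}/{RUNS}  plausible mention rate {lo:.0%} to {hi:.0%}")

# A single run with no mention: the upper bound is still near 0.79.
print(wilson_interval(0, 1))
```

Even Otterly's 3 out of 5 is compatible with anything from about a quarter to nearly nine in ten. One run with no mention rules out almost nothing.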

The five engines work very differently

Matters for strategy, not just for picking which monitoring tool to buy.

ChatGPT is mostly training data with optional Bing search on top. If you're not in the training cut and Bing doesn't surface you for the query, you don't appear. Tools that only check ChatGPT (Profound, $99/mo) measure one closed system with a slow refresh cadence.
Claude also draws from training data, but it won't name a specific product unless the prompt asks for one. So even a neutral mention in Claude is rare, and it's usually a stronger quality signal than a positive mention in ChatGPT.
Perplexity cites web sources live. Mentions here track whatever's currently ranking on the web. Shipped a hit blog post last week? Perplexity sees it before ChatGPT does.
Gemini is training data plus live Google. Faster than ChatGPT or Claude to pick up fresh content, though usually behind Perplexity. Most sensitive to schema markup, structured data, and your Google Business profile.
Grok (xAI) leans heavily on live X/Twitter plus its own training cut. Brands with active founder presence or community traction on X tend to surface here even when they're invisible to ChatGPT. The blind spot most competitor tools have.

The refresh cadence matters too. ChatGPT's training cut updates a few times a year. Getting into it takes either organic momentum or a Bing-surfaced source the model picks up. Claude updates on a similar timeline. Perplexity is real-time: your blog post from this morning could be cited by lunch once it's crawled and indexed. Gemini sits in between, leaning on Google's freshness signals. Grok skews toward whatever's trending on X right now; a viral thread can move you in hours.

So if you ship content tomorrow, the engines react in this order: Perplexity (hours), Grok (hours, if it lands on X), Gemini (days), Claude (next training cycle), ChatGPT (next training cycle). Plan accordingly. Don't measure Perplexity wins against ChatGPT timelines.
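One way to apply that ordering when you judge a content push: a minimal Python sketch that turns it into rough reaction windows and lists the engines where it's fair to expect movement by a given day. The day counts are our own assumptions: 1 day for "hours", 7 for "days", and 90 for "next training cycle". Tune them to what you actually observe.

```python
# Rough reaction windows in days, read off the ordering above.
# Illustrative assumptions only, not measured values.
REACTION_WINDOW_DAYS = {
    "Perplexity": 1,
    "Grok": 1,       # only if the content actually lands on X
    "Gemini": 7,
    "Claude": 90,
    "ChatGPT": 90,
}

def engines_due(days_since_publish: int) -> list[str]:
    """Engines whose assumed window has passed, so no movement yet is worth a look."""
    return [engine for engine, window in REACTION_WINDOW_DAYS.items()
            if days_since_publish >= window]

print(engines_due(3))   # ['Perplexity', 'Grok']
print(engines_due(10))  # ['Perplexity', 'Grok', 'Gemini']
```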

Strong in Perplexity, weak in ChatGPT? Your content's distribution is the bottleneck. Strong in ChatGPT, weak in Gemini? Your structured data is. You don't see any of that pattern from a single check on a single engine.
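Those two cross-engine patterns can be written down as rules. Below is a minimal Python sketch. The thresholds for "strong" and "weak" are hypothetical cut-offs we picked for illustration, and so is the example week.

```python
# Illustrative thresholds on mention rate (share of tracked prompts where you
# appeared). These cut-offs are assumptions; tune them to your own baseline.
STRONG, WEAK = 0.5, 0.2

def diagnose(rates: dict[str, float]) -> list[str]:
    """Apply the two cross-engine patterns described above.

    Expects mention rates for "Perplexity", "ChatGPT" and "Gemini".
    """
    findings = []
    if rates["Perplexity"] >= STRONG and rates["ChatGPT"] <= WEAK:
        findings.append("content distribution is the bottleneck")
    if rates["ChatGPT"] >= STRONG and rates["Gemini"] <= WEAK:
        findings.append("structured data is the bottleneck")
    return findings

# Hypothetical week: 14/25 on Perplexity, 3/25 on ChatGPT, 9/25 on Gemini.
print(diagnose({"Perplexity": 14 / 25, "ChatGPT": 3 / 25, "Gemini": 9 / 25}))
```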

What "mentioned" actually means

Case-insensitive substring match. If your brand name shows up anywhere in the engine's response, we count it. That's the whole algorithm.
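In Python the whole check fits in one line. Below is a minimal sketch of the technique. The function name is ours, and this is an illustration, not Watchtower's source. It also shows the two failure modes of any substring match: a hit inside an unrelated word, and a miss on a paraphrase.

```python
def is_mentioned(response: str, brand: str) -> bool:
    """Case-insensitive substring match: does `brand` appear anywhere in `response`?"""
    # casefold() is a more aggressive lower() that also handles some non-ASCII cases.
    return brand.casefold() in response.casefold()

print(is_mentioned("For small teams, WATCHTOWER is the cheapest pick.", "Watchtower"))  # True
# False positive: any occurrence counts, even inside another word ("cleverly").
print(is_mentioned("Pull the lever, or more cleverly, automate it.", "Lever"))  # True
# Miss: a paraphrase never contains the name.
print(is_mentioned("Try the project management one with the shortcuts.", "Linear"))  # False
```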

We tell you that in every digest. Other tools in this category claim "AI-classified" detection and won't publish the methodology. Read between the lines.

Substring matching misses paraphrases ("the project management one"). It false-positives on common-word names. If you're called "Lever," it'll register every literal lever the engine mentions. We don't pretend the classifier is smarter than it is. We give you the raw response and let you check. Got a common-word brand name? Drop your domain in the optional field. Domain matches are a stronger signal and we'll flag the false-positive risk in the result.
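Below is a minimal Python sketch of how a domain check and a common-word flag could sit next to the name match. Both the word-boundary rule for domains and the tiny COMMON_WORDS list are our assumptions for illustration. This is not Watchtower's published method.

```python
import re
from typing import Optional

# Hypothetical stand-in for a real dictionary of common English words.
COMMON_WORDS = {"lever", "linear", "notion", "slack", "monday"}

def normalize_domain(domain: str) -> str:
    """'https://www.Lever.co/pricing' -> 'lever.co' (drop scheme, path, leading www.)."""
    d = domain.strip().casefold()
    d = re.sub(r"^[a-z][a-z0-9+.-]*://", "", d)
    d = d.split("/", 1)[0]
    return d[4:] if d.startswith("www.") else d

def check_mention(response: str, brand: str, domain: Optional[str] = None) -> dict:
    text = response.casefold()
    domain_match = False
    if domain:
        # The domain must not be glued to other word characters or hyphens, so
        # "lever.co" matches "www.lever.co" but not "clever.co" or "lever.com".
        pattern = r"(?<![\w-])" + re.escape(normalize_domain(domain)) + r"(?![\w-])"
        domain_match = re.search(pattern, text) is not None
    return {
        "brand_match": brand.casefold() in text,
        "domain_match": domain_match,
        "false_positive_risk": brand.casefold() in COMMON_WORDS,
    }

print(check_mention("Pull the lever, or more cleverly, automate it.", "Lever", "lever.co"))
```

In that last call the name matches, the domain doesn't, and the risk flag is set. That combination is the signal to go read the excerpt.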

Sentiment is best-effort. Lightweight classifier. Three buckets: positive, neutral, negative. Wrong sometimes. Useful as a rough sort.

What we don't track yet, on purpose:

Share of voice. Counting your mentions versus competitor mentions per prompt is on the roadmap. v1 just tells you whether you appeared.
Citation links. Perplexity lists source URLs. We surface the full text response but don't yet break out which URLs got cited. v2.
Paraphrase detection. If an engine says "the issue tracker with all the keyboard shortcuts" instead of "Linear," we miss it. Hard problem, working on it.
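Share of voice isn't in v1, but the idea is simple enough to sketch. The Python sketch below assumes share of voice is your substring-match count divided by the count for every tracked brand in the same response. That definition is our assumption, and the shipped version may differ. The answer text is hypothetical.

```python
def share_of_voice(response: str, you: str, competitors: list[str]) -> float:
    """Your substring-match count divided by the count for every tracked brand."""
    text = response.casefold()
    counts = {brand: text.count(brand.casefold()) for brand in [you, *competitors]}
    total = sum(counts.values())
    return counts[you] / total if total else 0.0

# Hypothetical engine answer for one prompt.
answer = ("Watchtower covers five engines; Otterly and Peec cover fewer. "
          "Watchtower also includes Grok.")
print(share_of_voice(answer, "Watchtower", ["Otterly", "Peec", "Profound"]))
```

It inherits every weakness of substring matching, including foil mentions counting as yours.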

We'll add these. The reason we shipped v1 with substring matching is that the alternative is silently using an unpublished AI classifier, which is what every other tool in this category does. We think that's gross.

When a single check is the right tool

Three reasons to run a one-shot check:

Confirming an engine can find you at all.
Checking how the engines describe you vs. one specific competitor.
Sanity-checking a high-stakes prompt before a launch or a sales call.

Everything else needs a trend line. Tracking drift, catching regressions, justifying a content investment to anyone holding a budget. The cheapest version of a trend line is doing this check yourself, manually, every Monday, across 25 prompts and five engines (125 checks), in a spreadsheet. Try it for three weeks. You'll stop. Everyone stops.

That's why Watchtower exists. $49/mo for all five engines, Grok included, with no per-engine add-ons. The LLM bill still comes in around $0.06 per customer per week, and we don't run an enterprise sales org.
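For scale, here is the back-of-envelope arithmetic behind those figures in Python. Two assumptions are ours: each weekly run is one query per prompt per engine, and a month is 52/12 weeks. The per-query figure is implied by the quoted weekly cost, not a measured API price.

```python
PROMPTS = 25
ENGINES = 5
WEEKLY_LLM_COST = 0.06    # dollars per customer per week, as quoted above
PRICE_PER_MONTH = 49.00   # dollars

calls_per_week = PROMPTS * ENGINES                  # one query per prompt per engine (assumed)
cost_per_call = WEEKLY_LLM_COST / calls_per_week    # implied average, not a measured price
monthly_llm_cost = WEEKLY_LLM_COST * 52 / 12        # about 4.33 weeks per month

print(calls_per_week)                                # 125
print(f"${cost_per_call:.5f} per query")
print(f"${monthly_llm_cost:.2f} per month, {monthly_llm_cost / PRICE_PER_MONTH:.1%} of the price")
```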

How to actually read the weekly digest

Each Monday's digest shows the diff from the previous week. The thing to look for is movement, not the absolute numbers. Going from 6/25 to 9/25 on Perplexity in one week usually means a piece of content landed and is getting cited. Going from 12/25 to 4/25 usually means something changed on the engine side (a model update, or different retrieval sources), or someone is pushing a new competitor harder.
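The diff itself is simple. Below is a minimal Python sketch of a per-engine week-over-week comparison in the "6/25 to 9/25" style. The function name and output format are ours. The Perplexity numbers come from the text, but putting the 12 to 4 drop on ChatGPT is hypothetical.

```python
def weekly_diff(last: dict[str, int], this: dict[str, int], prompts: int = 25) -> dict[str, str]:
    """Per-engine change in how many of `prompts` mentioned you this week vs. last."""
    out = {}
    for engine, after in this.items():
        before = last.get(engine, 0)  # an engine missing last week is treated as 0
        out[engine] = f"{before}/{prompts} -> {after}/{prompts} ({after - before:+d})"
    return out

# The Perplexity move is from the text; pinning the 12 -> 4 drop on ChatGPT is hypothetical.
for engine, line in weekly_diff({"Perplexity": 6, "ChatGPT": 12},
                                {"Perplexity": 9, "ChatGPT": 4}).items():
    print(engine, line)
```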

Three actions to take, in rough order of frequency:

Movement up. Figure out what content drove it. Do more of that.
Movement down. Check the excerpts. Sometimes the engines started citing a competitor instead. Sometimes the mentions you kept are foils ("better than X"), which the substring match still counts as hits. Different fixes.
Flat for three weeks across all five engines. Your content distribution is the bottleneck, not the engines. Ship something. Pitch a podcast. Get cited somewhere new.
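The three cases above map onto a small rule. Below is a minimal Python sketch. We read "flat for three weeks" as three digests in a row with no change on every engine, which means four equal weekly counts. That reading is our interpretation. Treating any non-zero net change as movement is a simplification.

```python
ENGINES = ["ChatGPT", "Claude", "Perplexity", "Gemini", "Grok"]

def triage(history: dict[str, list[int]], flat_weeks: int = 3) -> str:
    """Pick one of the three actions above from weekly mention counts.

    `history` maps engine -> counts per week, oldest first. "Flat" means the
    last flat_weeks + 1 counts are equal on every engine (our interpretation).
    """
    if not history:
        raise ValueError("no data")

    def is_flat(counts: list[int]) -> bool:
        tail = counts[-(flat_weeks + 1):]
        return len(tail) == flat_weeks + 1 and len(set(tail)) == 1

    if all(is_flat(c) for c in history.values()):
        return "flat: distribution is the bottleneck; ship something new"
    # Net week-over-week change summed across engines. Small moves can be noise.
    delta = sum(c[-1] - c[-2] for c in history.values() if len(c) >= 2)
    if delta > 0:
        return "up: find the content that drove it and do more of that"
    if delta < 0:
        return "down: read the excerpts (competitor cited instead? foil mentions?)"
    return "no net change this week"

print(triage({e: [5, 5, 5, 5] for e in ENGINES}))
print(triage({"Perplexity": [6, 9], "ChatGPT": [4, 4]}))
```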

Most weeks you'll spend two minutes glancing at the email. The point of the digest is to flag the weeks where you should spend more.

How AI answer monitoring differs from SEO

Google Analytics and Search Console measure traffic that already happened. Someone searched, clicked, landed on your site. You see them.

AI answer monitoring measures the upstream layer. Whether your brand shows up in the answer that decides if the user clicks anywhere at all. ChatGPT, Claude, Perplexity, Gemini, and Grok increasingly answer the question directly. The traffic that used to land on your site never gets there. If you only watch downstream analytics, you miss the entire shift.

It's not a replacement for SEO. It's a parallel surface. Both matter. AI-answer mentions roughly correlate with traditional SERP rank for now, but the correlation is loosening as RAG-based engines (Perplexity, Gemini with live search) start citing sources that Google ranks lower but that match the user's intent more precisely.

Who built this and why

We were running our own brands. Paid Profound. Got priced out. Tried Peec. Got priced out again. Most of the value in those tools was running 25 prompts against five engines every week and emailing a diff. So we wrote the script, scheduled it as a cron job, and sold the result for what the API bill costs plus a sane margin.

The roadmap is whatever our own brands need next. Share-of-voice, citation breakdowns, paraphrase detection. Real changes shipped to real customers. No demo gate. No SDR. Cancel link in every email.

Tool	Entry price ($/mo)	Engines at entry tier	Self-serve?
Watchtower	$49	All 5 (incl. Grok, no add-ons)	Yes, about 60 seconds
Otterly Lite	$29	3 (Gemini is a paid add-on)	Yes
Profound	$99	1 (ChatGPT only)	No, demo required
Peec.ai	~$95	2 (Claude and Gemini cost extra)	Yes
Athena	$295	8 (credits-based)	Yes
Brandlight / Bluefish	Contact sales	5+	No, Fortune 500 focus

Use case	One-prompt check	Weekly Watchtower
Curiosity check: am I in there at all?	✓	-
Pre-investor diligence: do I show up under buyer queries?	-	✓
Pre-launch sanity check before a campaign	✓	-
Tracking the impact of new content / schema changes	-	✓
Competitor watch (are they pulling ahead?)	-	✓
Catching a sudden drop after a model update	-	✓
Validating a single hypothesis quickly	✓	-

Is your brand mentioned in ChatGPT, Claude, Perplexity, Gemini, and Grok?


How AI engines actually decide which brands to mention.

Parametric memory

Retrieval-augmented context

Recency + freshness

Comparative grounding

Decision matrix: free check vs. weekly Watchtower.

Watchtower runs this same check across 25 of your prompts, every Monday.

Common questions about AI brand monitoring.