The Five Metrics That Actually Measure AI Visibility

Every other AI visibility tool gives you a made-up rank. We give you five metrics built for how LLMs actually work. Here's what each one measures, why it matters, and what to do about it.


Metric 1 of 5

Latent Brand Association

What the model already "believes" about your brand before it ever touches the web

Every LLM has opinions about your brand.

Nobody programmed them in. They formed on their own from billions of documents the model absorbed during training.

If your brand showed up a lot in positive contexts on authoritative sites, the model learned to like you. If your competitor dominated those conversations instead, the model learned to prefer them.

That's what Latent Brand Association measures. The strength and direction of what the model has already internalized about you, your competitors, and the categories you compete in.

LBA answers one specific question: what does the model believe about you?

Is the association positive? Negative? Outdated? Dominated by a competitor?

This is different from Top of Mind, which measures whether the model recalls you at all. A brand can have strong recall but carry negative associations. Or it can have favorable associations that are too weak to surface on their own. Both matter, but they diagnose completely different problems.

Example

Ask ChatGPT: "What's the best CRM for mid-market companies?" Do it five times. If Salesforce appears in every response and HubSpot shows up in three, that's not random. The model has a stronger latent association between "mid-market CRM" and Salesforce because its training data reinforced that connection more heavily.
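
If you want to run this probe yourself, here's a minimal sketch using the official `openai` Python client. The model name, prompt, and brand list are illustrative, and a real measurement needs far more samples and prompt variations than this:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = "What's the best CRM for mid-market companies?"
BRANDS = ["Salesforce", "HubSpot", "Zoho", "Pipedrive"]  # illustrative list
N_SAMPLES = 5

counts = {brand: 0 for brand in BRANDS}
for _ in range(N_SAMPLES):
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": PROMPT}],
        temperature=1.0,  # sample the distribution, not just the mode
    )
    text = response.choices[0].message.content.lower()
    for brand in BRANDS:
        if brand.lower() in text:
            counts[brand] += 1

# A brand that appears in 5/5 samples has a stronger latent association
# with this category than one that appears in 3/5.
for brand, hits in counts.items():
    print(f"{brand}: {hits}/{N_SAMPLES} responses")
```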

Why You Can't Fix This Overnight

These associations live in the model's neural weights. They were formed during training, and they stick around until the next major training cycle.

That means the model might "remember" your brand as it was two or three years ago.

You could have completely repositioned your product, launched a new category, or doubled your market share since then. The model doesn't know. It's working from a snapshot baked into its weights.

Real-World Implication

Notion spent years being associated with "personal note-taking" in online discussions. Even after their aggressive push into team workspace and enterprise features, early LLMs continued recommending Notion primarily for personal use and defaulted to Confluence or Jira for team collaboration. The training data hadn't caught up to the repositioning.

What LBA Tells You
| LBA Signal | What It Means | What to Do |
| --- | --- | --- |
| Strong positive | The model naturally associates your brand with the category. | Protect and reinforce. Keep building content that maintains the association. |
| Weak or absent | The model doesn't have a meaningful opinion about your brand. | Build visibility on authoritative sites that future training data will capture. |
| Negative or inaccurate | The model "remembers" outdated or wrong information. | Create corrective content at scale. Target high-authority sources the model trusts. |
| Competitor-dominated | A competitor holds the default association for your category. | Long-term content strategy to close the gap before the next training cycle. |

The Bottom Line

LBA is the geological layer of AI visibility. It changes slowly, but it determines the foundation everything else is built on. If a model is biased against you, even perfect real-time content won't fully overcome it. If a model is biased toward you, you have a structural advantage your competitors can't easily replicate.


Metric 2 of 5

LLM Authority Score

How often and how prominently AI models feature your brand across dozens of responses

In traditional SEO, rank is a single number. You're position 3 for "project management software" and that's your data point.

AI doesn't work like that.

Ask the same question ten times and you'll get ten different answers with different brands, different features, and different recommendations. Any single response is noise. Authority only becomes visible across many responses.

LLM Authority Score combines two dimensions into one metric: frequency (how often your brand appears) and prominence (where in the response it shows up). Both matter. Neither alone tells the full story.

Frequency Alone Will Fool You
Example

Imagine you ask Claude "What are the best email marketing platforms?" ten times. Mailchimp appears in 9 out of 10 responses, always mentioned first or second. ConvertKit appears in 8 out of 10, but always listed last as an afterthought. Both have high frequency. But Mailchimp has high authority. ConvertKit has high frequency with low prominence, which is a very different position to be in.

One Response Means Nothing

A single AI response is a coin flip. The model might mention you first today and skip you entirely tomorrow.

Traditional "rank tracking" for AI responses fails because it treats each response like a stable SERP. It's not. The model's output is probabilistic.

LLM Authority Score solves this by measuring across many responses. Instead of reporting "you were #2 in this one response," it tells you: "Across 50 responses to this query, you appeared 72% of the time with a weighted average position of 2.3."

That's actionable. That's something you can track over time and actually improve.
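
Here's a minimal sketch of that aggregation, assuming you've already collected responses and parsed each one into an ordered list of brand mentions. The weighting is deliberately simple; treat it as an illustration, not a production formula:

```python
# Each inner list is the ordered brand mentions parsed from one response.
responses = [
    ["Mailchimp", "ConvertKit", "Klaviyo"],
    ["Mailchimp", "Klaviyo"],
    ["ActiveCampaign", "Mailchimp", "ConvertKit"],
    # ...imagine 50 of these per query
]

def authority(brand: str, responses: list[list[str]]) -> dict:
    # 1-based position of the brand within each response that mentions it
    positions = [r.index(brand) + 1 for r in responses if brand in r]
    return {
        "frequency": len(positions) / len(responses),
        "avg_position": sum(positions) / len(positions) if positions else None,
    }

print(authority("Mailchimp", responses))   # high frequency, high prominence
print(authority("ConvertKit", responses))  # decent frequency, lower prominence
```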

Oscillation: The Hidden Danger

When a brand's Authority Score fluctuates wildly from one measurement period to the next, that's oscillation. It means the model doesn't have a stable opinion about you. Shopify's Authority Score for "best ecommerce platform" tends to be high and stable. A newer competitor like BigCommerce might show high authority one week and mediocre authority the next. The instability itself is the diagnosis: the model hasn't committed to a view about your brand yet.
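
One rough way to put a number on oscillation is the relative spread of the score across measurement periods. A sketch, with made-up weekly scores and an illustrative cutoff:

```python
import statistics

# Weekly Authority Scores for one query; values invented for illustration.
weekly_scores = [0.72, 0.31, 0.68, 0.25, 0.70]

mean = statistics.mean(weekly_scores)
cv = statistics.stdev(weekly_scores) / mean  # coefficient of variation

# High relative spread suggests the model hasn't committed to a view yet.
print(f"mean={mean:.2f} oscillation={cv:.2f}", "unstable" if cv > 0.25 else "stable")
```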

What Your Authority Score Is Telling You
| Pattern | Diagnosis |
| --- | --- |
| High frequency, high prominence | The model treats you as a go-to answer. You're the brand it confidently recommends. |
| High frequency, low prominence | You're "known" but not "preferred." The model mentions you out of completeness, not conviction. |
| Low frequency, high prominence | When the model does mention you, it's favorable. But it often forgets you entirely. Niche strength, weak reach. |
| Low frequency, low prominence | The model doesn't know you well enough to recommend you. Start with LBA to understand why. |

The Bottom Line

LLM Authority Score is the metric that replaces "AI rank tracking" with something that actually works. It accounts for the inherent randomness of AI outputs by measuring your brand's presence as a distribution, not a position. Track it over time to see whether your optimization efforts are moving the needle, or whether you're investing in strategies that aren't changing how the model sees you.


Metric 3 of 5

Top of Mind

Whether the model recalls your brand from memory, or only finds you through web search

AI systems have two ways to bring your brand into a response.

The first is recall: the model already knows about you from training data and mentions you from memory. The second is retrieval: the model searches the live web, finds your content, and pulls it in.

To the end user, these look identical. For your brand, they're completely different.

Top of Mind measures the first path. When the model can't search the web, does it still mention you?

Where LBA tells you what the model believes about your brand, TOM tells you whether it recalls you at all without external help.

Recall vs. Retrieval: The Gap Most SEOs Miss
Example

Ask ChatGPT "What are the best project management tools?" and it will mention Asana, Monday.com, and Trello. Now ask the same question with web search turned off. Do the same brands appear? Asana and Monday.com likely still show up because they have high TOM scores. They were mentioned so frequently across the training data that the model internalized them. But a newer tool like ClickUp might disappear entirely because its visibility depends on being found through real-time web search, not on being remembered.

Why This Changes Everything

Retrieval-dependent visibility is fragile.

If the AI tool decides not to search the web for a particular query (which happens), you vanish. If a competitor's content outranks yours in the retrieval step, they replace you. If the AI system changes its retrieval algorithm, your visibility shifts overnight.

Recall-based visibility is the opposite. It's encoded in the model's weights. It persists whether or not the model searches the web.

Think of it as the AI equivalent of a consumer instinctively naming your brand when asked about a category.

The Risk of Borrowed Visibility

A fintech startup publishes strong content and ranks well in Google. When AI tools use web search, they retrieve this content and cite the brand. The team celebrates their "great AI visibility." Then Perplexity changes how it weighs sources, and the brand vanishes from responses overnight. Their entire AI presence was borrowed. Zero Top of Mind. They were renting visibility they thought they owned.

The Advantage of Earned Recall

Stripe has such strong Top of Mind for "payment processing API" that virtually every LLM mentions it unprompted, regardless of whether web search is involved. This didn't happen by accident. Years of consistent developer documentation, technical blog posts, and community presence across authoritative sources meant the training data was saturated with positive Stripe associations. That's TOM you can't lose to an algorithm change.

What Your TOM Score Is Really Saying
| TOM Level | What It Means | Strategic Implication |
| --- | --- | --- |
| High TOM | The model names you from memory, unprompted. | Defend the position. Keep building authoritative presence for the next training cycle. |
| Moderate TOM | The model sometimes remembers you, sometimes doesn't. | Strengthen associations on high-authority sites. Frequency in training data is the lever. |
| Low / Zero TOM | The model doesn't recall your brand without web search. | Your AI presence is rented, not owned. Build a long-term content strategy targeting training-data sources. |

The Bottom Line

Top of Mind separates brands with durable AI visibility from those that are one algorithm change away from disappearing. High TOM means the model genuinely knows you. Low TOM means you're dependent on retrieval, and retrieval can be taken away. For long-term AI strategy, TOM is the metric that tells you whether you're building on rock or sand.


Metric 4 of 5

Semantic Self-Sufficiency

Whether your content still makes sense after an AI model rips it out of context

AI models don't read your pages top to bottom. They shred them.

RAG systems break your content into chunks (typically 200-500 words) and evaluate each chunk on its own. The chunk that best answers the user's question gets pulled into the response. Everything else gets thrown away.

Semantic Self-Sufficiency scores how well your content chunks survive this process. Can each fragment stand on its own? Or does it need the surrounding paragraphs to make sense?

Here's what makes this metric special: unlike the other four, which measure AI behavior, this one measures your content's readiness for AI. It's the one metric you fully control.
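
To make the shredding concrete, here's a sketch of the fixed-size chunking many RAG pipelines use. Real systems often split on headings or sentences and add overlap, and the filename here is hypothetical:

```python
def chunk_words(text: str, size: int = 300) -> list[str]:
    """Split text into ~size-word chunks, the granularity a RAG system scores."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

page = open("pricing_page.txt").read()  # hypothetical source file
for i, chunk in enumerate(chunk_words(page)):
    # Each chunk is embedded and retrieved on its own; the surrounding
    # paragraphs are invisible to the model at this point.
    print(i, chunk[:80], "...")
```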

How AI Shreds Your Content (And Why That Matters)
Fails the Self-Sufficiency Test

A SaaS pricing page has this paragraph: "Our premium plan includes all of the above features, plus advanced analytics, priority support, and custom integrations. This costs less than comparable solutions."

When the AI isolates this chunk, "all of the above features" refers to nothing. "Comparable solutions" names no competitors. "This" has no antecedent. The model has zero context for what "above" means. It either skips the chunk or hallucinates the details.

Passes the Self-Sufficiency Test

The same content rewritten: "Ahrefs' Advanced plan includes site audit, rank tracking, keyword research, content explorer, and backlink analysis, plus advanced SERP analytics, priority email support, and custom API integrations. At $399/month, it's priced below SEMrush's comparable Business plan at $449/month."

Every claim is specific. The brand is named. Pricing is concrete. Competitors are identified. An AI model can pull this chunk and use it directly without guessing at anything.

Four Patterns That Kill Your Chunks
| Content Pattern | Why It Fails | Self-Sufficient Alternative |
| --- | --- | --- |
| "As mentioned above..." | The model doesn't have "above." The chunk is isolated. | Restate the specific point being referenced. |
| "Our tool does this better." | "This" has no referent. "Better" than what? | "Figma reduces design handoff time by 60% compared to static mockup tools like InVision." |
| "We offer competitive pricing." | Vague. The model can't cite a claim with no substance. | "Datadog's Pro plan starts at $15/host/month, compared to New Relic's equivalent at $25/host/month." |
| "See the table below for details." | There is no "below" in an isolated chunk. | Inline the key data points directly in the paragraph. |
Your Chunk vs. Their Chunk

When a RAG system retrieves ten candidate chunks to answer a question, the specific, self-contained ones get selected. The vague ones get thrown out.

Your chunk isn't graded on a curve. It's in a head-to-head against the other nine chunks competing for the same answer.

Why This Matters in Practice

Imagine a user asks an AI tool: "How much does Zapier cost for a small team?" Two chunks compete. Chunk A says "Our plans start at a competitive price point for small teams." Chunk B says "Zapier's Starter plan costs $19.99/month and includes 750 tasks for up to 10 users." The model picks Chunk B every time. Not because the source is more authoritative, but because the chunk is more useful in isolation.
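
Here's a minimal sketch of that head-to-head, ranking candidate chunks by embedding similarity to the query with the sentence-transformers library. The model choice is illustrative, and production retrievers typically add rerankers on top of raw similarity:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, widely used embedder

query = "How much does Zapier cost for a small team?"
chunks = [
    "Our plans start at a competitive price point for small teams.",
    "Zapier's Starter plan costs $19.99/month and includes 750 tasks "
    "for up to 10 users.",
]

query_emb = model.encode(query, convert_to_tensor=True)
chunk_embs = model.encode(chunks, convert_to_tensor=True)
scores = util.cos_sim(query_emb, chunk_embs)[0].tolist()

# The specific, self-contained chunk should score higher for the same query.
for chunk, score in sorted(zip(chunks, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {chunk[:60]}...")
```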

The Bottom Line

Semantic Self-Sufficiency is how you write for a world where AI reads your content in fragments, not in full. Every paragraph needs to carry its own meaning, name its own brands, cite its own numbers, and make its own claims. The content that wins in AI retrieval isn't always the most comprehensive. It's the most self-contained.


Metric 5 of 5

Citation Capture Rate

How often the model actually uses your content in its answer, not just finds it

Being indexed isn't enough. Being retrieved isn't enough.

The only thing that matters is whether the model includes your content in the response the user actually sees.

Citation Capture Rate measures that final conversion: from retrieved candidate to cited source.

Think of it as a funnel. Your content gets indexed by AI systems. A subset gets retrieved when relevant queries come in. But only a fraction of what's retrieved actually makes it into the generated response. This metric measures that last, critical step.

Where Your Content Dies in the Funnel
Example

A user asks Perplexity: "What's the best password manager for families?" The system retrieves chunks from 15 different sources: Wirecutter reviews, Reddit threads, product comparison pages, vendor sites. But the response only cites 4 of them. Your content was in the retrieval pool but didn't make the cut. That gap between "retrieved" and "cited" is exactly what Citation Capture Rate measures.
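
Measuring that gap is straightforward once you log, per query, which sources made the retrieval pool and which were actually cited. A sketch with a hypothetical log format:

```python
# Hypothetical measurement log: per query, the retrieved pool and the cited set.
runs = [
    {"retrieved": {"wirecutter.com", "reddit.com", "yourbrand.com"},
     "cited": {"wirecutter.com", "reddit.com"}},
    {"retrieved": {"yourbrand.com", "pcmag.com"},
     "cited": {"yourbrand.com", "pcmag.com"}},
]

DOMAIN = "yourbrand.com"

retrieved = sum(DOMAIN in run["retrieved"] for run in runs)
cited = sum(DOMAIN in run["retrieved"] and DOMAIN in run["cited"] for run in runs)

# Citation Capture Rate: of the times you made the retrieval pool,
# how often did you make the final answer?
ccr = cited / retrieved if retrieved else 0.0
print(f"CCR for {DOMAIN}: {cited}/{retrieved} = {ccr:.0%}")
```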

What Makes the Model Pick You

AI models select sources that reduce their uncertainty. When the model is building an answer and needs to ground a claim, it reaches for the chunk that gives it the most confidence.

Three factors determine whether your content gets picked:

| Selection Factor | What the Model Wants | Example |
| --- | --- | --- |
| Specificity | Concrete facts, numbers, named entities. | "1Password Family plan costs $4.99/month and supports up to 5 users" beats "1Password offers affordable family pricing." |
| Self-sufficiency | The chunk stands alone. No context needed. | A paragraph that names the product, states the claim, and provides the evidence in one block. |
| Alignment | The chunk directly answers the user's question. | A chunk about "best password manager for families" beats a generic "password manager features" list. |

You're Always in a Head-to-Head

Your content is never evaluated alone. It's always competing against every other chunk that answers the same query.

The model picks whichever chunk makes its job easiest.

Loses the Head-to-Head

Your content says: "We provide excellent customer support with fast response times and dedicated account managers." A competitor's content says: "Zendesk's Professional plan includes live chat with a median first-response time of 1 minute 42 seconds, plus a dedicated account manager for teams over 50 seats." The model doesn't need to think about which one to cite.

Wins the Head-to-Head

Your content says: "Intercom's resolution bot resolves 33% of customer questions before they reach a human agent, with median response time under 30 seconds and CSAT scores averaging 94% across 25,000+ businesses." This is the kind of chunk a model reaches for. Specific, complete, and immediately citable.

Citation Rate Is the New CTR

In traditional SEO, you optimized title tags and meta descriptions to earn clicks. In AI, you optimize content chunks to earn citations.

The mechanics are different, but the principle is identical. You're competing for selection from a list of candidates. The winner is the one that gives the selector (human or model) the most confidence.

| | Traditional SEO (CTR) | AI Visibility (Citation Rate) |
| --- | --- | --- |
| Selector | Human scanning search results | Model evaluating retrieved chunks |
| What you optimize | Title tag, meta description, rich snippets | Content chunks: specificity, self-sufficiency, alignment |
| Competition | 10 blue links on a SERP | 10-20 candidate chunks in the retrieval pool |
| Failure mode | Visible but not clicked | Retrieved but not cited |

The Bottom Line

Citation Capture Rate is where AI visibility becomes tangible. You can have great LBA, strong Authority, high TOM, and self-sufficient content, but if the model consistently picks a competitor's chunk over yours in the final selection step, you're losing the last mile. This metric closes the gap between being in the system and being in the answer.


How the Five Metrics Work Together

Each metric measures a different layer of your AI visibility. Together, they tell you exactly where you stand and why.

| Metric | What It Measures | The Question It Answers |
| --- | --- | --- |
| Latent Brand Association | Pre-trained model bias toward your brand | What does the model already believe about us? |
| LLM Authority Score | Frequency + prominence across responses | How seriously does the model take us overall? |
| Top of Mind | Unprompted recall without web search | Would the model mention us without looking us up? |
| Semantic Self-Sufficiency | Fragment-level content clarity | Does our content survive being pulled apart by AI? |
| Citation Capture Rate | Final selection from the retrieval pool | When the model has our content, does it actually use it? |

Two Sides of the Same Problem

The five metrics split into two categories, with one bridging them.

Latent Brand Association and Top of Mind measure what the AI model has internalized. You can't change them quickly because they depend on training cycles that take months or years.

Semantic Self-Sufficiency and Citation Capture Rate measure your content's effectiveness in real-time retrieval. You can improve them today by changing how you write.

LLM Authority Score sits in the middle, reflecting both.

| Layer | Metric | You Control It? | Timeframe to Change |
| --- | --- | --- | --- |
| Model-side | Latent Brand Association | Indirectly, through content that enters future training data | 6-18+ months (next training cycle) |
| Model-side | Top of Mind | Indirectly, through sustained authoritative presence | 6-18+ months (next training cycle) |
| Bridge | LLM Authority Score | Partially, combines model recall and retrieval performance | Weeks to months (retrieval side) + months to years (recall side) |
| Content-side | Semantic Self-Sufficiency | Directly, you control how your content is structured | Immediate (content changes) |
| Content-side | Citation Capture Rate | Directly, you control specificity, alignment, and self-sufficiency | Weeks (as AI systems re-index your content) |

Why This Matters for Strategy

When your LLM Authority Score is low, the layer breakdown above tells you where to look. If the problem is model-side (weak LBA or zero TOM), no amount of content optimization will fix it. You need a long-term strategy targeting future training data. If the problem is content-side (low SSS or CCR), you can improve it now by restructuring how you write. Misdiagnosing which layer is broken leads to wasted effort.

Diagnostic Example

A B2B SaaS company finds they have a low LLM Authority Score. Why? Three possible root causes. Their LBA is negative (the model was trained on outdated information). Their TOM is zero (the model only finds them through retrieval, and retrieval is inconsistent). Or their Citation Capture Rate is low (the model finds their content but keeps picking competitors' instead). Each diagnosis leads to a completely different strategy. Without all five metrics, you're guessing at the cause.