The Five Metrics That Actually Measure AI Visibility
Every other AI visibility tool gives you a made-up rank. We give you five metrics built for how LLMs actually work. Here's what each one measures, why it matters, and what to do about it.
Metric 1 of 5
Latent Brand Association
What the model already "believes" about your brand before it ever touches the web
Every LLM has opinions about your brand.
Nobody programmed them in. They formed on their own from billions of documents the model absorbed during training.
If your brand showed up a lot in positive contexts on authoritative sites, the model learned to like you. If your competitor dominated those conversations instead, the model learned to prefer them.
That's what Latent Brand Association measures. The strength and direction of what the model has already internalized about you, your competitors, and the categories you compete in.
LBA answers one specific question: what does the model believe about you?
Is the association positive? Negative? Outdated? Dominated by a competitor?
This is different from Top of Mind, which measures whether the model recalls you at all. A brand can have strong recall but carry negative associations. Or it can have favorable associations that are too weak to surface on their own. Both matter, but they diagnose completely different problems.
Ask ChatGPT: "What's the best CRM for mid-market companies?" Do it five times. If Salesforce appears in every response and HubSpot shows up in three, that's not random. The model has a stronger latent association between "mid-market CRM" and Salesforce because its training data reinforced that connection more heavily.
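The sampling test above can be sketched in a few lines. This is a minimal illustration with hypothetical, hardcoded response texts (not real model output); the helper name is ours:

```python
from collections import Counter

def mention_frequency(responses, brands):
    """Share of responses in which each brand appears (case-insensitive)."""
    counts = Counter()
    for text in responses:
        lowered = text.lower()
        for brand in brands:
            if brand.lower() in lowered:
                counts[brand] += 1
    return {b: counts[b] / len(responses) for b in brands}

# Five hypothetical answers to "What's the best CRM for mid-market companies?"
responses = [
    "Salesforce and HubSpot are both strong picks for mid-market teams.",
    "Salesforce remains the default choice; Pipedrive is a lighter option.",
    "Consider Salesforce for scale, or HubSpot for ease of use.",
    "Salesforce leads the category, followed by Zoho CRM.",
    "Many mid-market companies land on Salesforce or HubSpot.",
]
print(mention_frequency(responses, ["Salesforce", "HubSpot"]))
# {'Salesforce': 1.0, 'HubSpot': 0.6}
```

A frequency gap like this across repeated runs is the observable surface of a latent association gap.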
Why You Can't Fix This Overnight
These associations live in the model's neural weights. They were formed during training, and they stick around until the next major training cycle.
That means the model might "remember" your brand as it was two or three years ago.
You could have completely repositioned your product, launched a new category, or doubled your market share since then. The model doesn't know. It's working from a snapshot baked into its weights.
Notion spent years being associated with "personal note-taking" in online discussions. Even after their aggressive push into team workspace and enterprise features, early LLMs continued recommending Notion primarily for personal use and defaulted to Confluence or Jira for team collaboration. The training data hadn't caught up to the repositioning.
What LBA Tells You
| LBA Signal | What It Means | What to Do |
|---|---|---|
| Strong positive | The model naturally associates your brand with the category. | Protect and reinforce. Keep building content that maintains the association. |
| Weak or absent | The model doesn't have a meaningful opinion about your brand. | Build visibility on authoritative sites that future training data will capture. |
| Negative or inaccurate | The model "remembers" outdated or wrong information. | Create corrective content at scale. Target high-authority sources the model trusts. |
| Competitor-dominated | A competitor holds the default association for your category. | Long-term content strategy to close the gap before the next training cycle. |
The Bottom Line
LBA is the geological layer of AI visibility. It changes slowly, but it determines the foundation everything else is built on. If a model is biased against you, even perfect real-time content won't fully overcome it. If a model is biased toward you, you have a structural advantage your competitors can't easily replicate.
Metric 2 of 5
LLM Authority Score
How often and how prominently AI models feature your brand across dozens of responses
In traditional SEO, rank is a single number. You're position 3 for "project management software" and that's your data point.
AI doesn't work like that.
Ask the same question ten times and you'll get ten different answers with different brands, different features, and different recommendations. Any single response is noise. Authority only becomes visible across many responses.
LLM Authority Score combines two dimensions into one metric: frequency (how often your brand appears) and prominence (where in the response it shows up). Both matter. Neither alone tells the full story.
Frequency Alone Will Fool You
Imagine you ask Claude "What are the best email marketing platforms?" ten times. Mailchimp appears in 9 out of 10 responses, always mentioned first or second. ConvertKit appears in 8 out of 10, but always listed last as an afterthought. Both have high frequency. But Mailchimp has high authority. ConvertKit has high frequency with low prominence, which is a very different position to be in.
One Response Means Nothing
A single AI response is a coin flip. The model might mention you first today and skip you entirely tomorrow.
Traditional "rank tracking" for AI responses fails because it treats each response like a stable SERP. It's not. The model's output is probabilistic.
LLM Authority Score solves this by measuring across many responses. Instead of reporting "you were #2 in this one response," it tells you: "Across 50 responses to this query, you appeared 72% of the time with a weighted average position of 2.3."
That's actionable. That's something you can track over time and actually improve.
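The frequency-plus-prominence idea can be sketched as follows. How the two numbers are blended into one score here is an illustrative assumption, not the exact formula behind the metric:

```python
def authority(positions, total_queries):
    """positions: 1-based list position of the brand in each response that
    mentioned it (one entry per mention). total_queries: responses sampled."""
    if not positions:
        return 0.0, None, 0.0
    frequency = len(positions) / total_queries      # how often you appear
    avg_position = sum(positions) / len(positions)  # where you appear
    # Illustrative blend: frequency discounted by average position.
    return frequency, avg_position, frequency / avg_position

# Mailchimp-style profile: 9/10 responses, always first or second.
print(authority([1, 2, 1, 1, 2, 1, 2, 1, 1], 10))
# ConvertKit-style profile: 8/10 responses, always last in a list of five.
print(authority([5, 5, 5, 5, 5, 5, 5, 5], 10))
```

Both brands have similar frequency, but the blended score separates "go-to answer" from "mentioned as an afterthought."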
When a brand's Authority Score fluctuates wildly from one measurement period to the next, that's oscillation. It means the model doesn't have a stable opinion about you. Shopify's Authority Score for "best ecommerce platform" tends to be high and stable. A newer competitor like BigCommerce might show high authority one week and mediocre authority the next. The instability itself is the diagnosis: the model hasn't committed to a view about your brand yet.
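One simple way to quantify that oscillation is the coefficient of variation of the score across measurement periods. The sample scores and the cutoffs in the comments are hypothetical:

```python
from statistics import mean, pstdev

def oscillation(scores):
    """Coefficient of variation of Authority Scores across periods:
    higher means the model hasn't settled on a view of the brand."""
    return pstdev(scores) / mean(scores)

stable_brand = [0.81, 0.79, 0.82, 0.80]     # entrenched incumbent profile
unsettled_brand = [0.75, 0.31, 0.68, 0.22]  # swings from week to week
print(round(oscillation(stable_brand), 3))     # near zero
print(round(oscillation(unsettled_brand), 3))  # an order of magnitude higher
```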
What Your Authority Score Is Telling You
| Pattern | Diagnosis |
|---|---|
| High frequency, high prominence | The model treats you as a go-to answer. You're the brand it confidently recommends. |
| High frequency, low prominence | You're "known" but not "preferred." The model mentions you out of completeness, not conviction. |
| Low frequency, high prominence | When the model does mention you, it's favorable. But it often forgets you entirely. Niche strength, weak reach. |
| Low frequency, low prominence | The model doesn't know you well enough to recommend you. Start with LBA to understand why. |
The Bottom Line
LLM Authority Score is the metric that replaces "AI rank tracking" with something that actually works. It accounts for the inherent randomness of AI outputs by measuring your brand's presence as a distribution, not a position. Track it over time to see whether your optimization efforts are moving the needle, or whether you're investing in strategies that aren't changing how the model sees you.
Metric 3 of 5
Top of Mind
Whether the model recalls your brand from memory, or only finds you through web search
AI systems have two ways to bring your brand into a response.
The first is recall: the model already knows about you from training data and mentions you from memory. The second is retrieval: the model searches the live web, finds your content, and pulls it in.
To the end user, these look identical. For your brand, they're completely different.
Top of Mind measures the first path. When the model can't search the web, does it still mention you?
Where LBA tells you what the model believes about your brand, TOM tells you whether it recalls you at all without external help.
Recall vs. Retrieval: The Gap Most SEOs Miss
Ask ChatGPT "What are the best project management tools?" and it will mention Asana, Monday.com, and Trello. Now ask the same question with web search turned off. Do the same brands appear? Asana and Monday.com likely still show up because they have high TOM scores. They were mentioned so frequently across the training data that the model internalized them. But a newer tool like ClickUp might disappear entirely because its visibility depends on being found through real-time web search, not on being remembered.
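Measuring TOM reduces to the same sampling idea with web search disabled. A minimal sketch with hypothetical search-off answers:

```python
def top_of_mind(brand, no_search_responses):
    """Fraction of search-disabled responses that still mention the brand."""
    hits = sum(brand.lower() in r.lower() for r in no_search_responses)
    return hits / len(no_search_responses)

# Hypothetical search-off answers to "best project management tools?"
no_search = [
    "Asana, Monday.com, and Trello are the usual recommendations.",
    "Popular options include Asana, Trello, and Basecamp.",
    "Teams often choose Asana or Monday.com.",
    "Trello and Asana cover most small-team needs.",
]
print(top_of_mind("Asana", no_search))    # 1.0 -- recalled from memory
print(top_of_mind("ClickUp", no_search))  # 0.0 -- retrieval-dependent
```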
Why This Changes Everything
Retrieval-dependent visibility is fragile.
If the AI tool decides not to search the web for a particular query (which happens), you vanish. If a competitor's content outranks yours in the retrieval step, they replace you. If the AI system changes its retrieval algorithm, your visibility shifts overnight.
Recall-based visibility is the opposite. It's encoded in the model's weights. It persists whether or not the model searches the web.
Think of it as the AI equivalent of a consumer instinctively naming your brand when asked about a category.
A fintech startup publishes strong content and ranks well in Google. When AI tools use web search, they retrieve this content and cite the brand. The team celebrates their "great AI visibility." Then Perplexity changes how it weighs sources, and the brand vanishes from responses overnight. Their entire AI presence was borrowed. Zero Top of Mind. They were renting visibility they thought they owned.
Stripe has such strong Top of Mind for "payment processing API" that virtually every LLM mentions it unprompted, regardless of whether web search is involved. This didn't happen by accident. Years of consistent developer documentation, technical blog posts, and community presence across authoritative sources meant the training data was saturated with positive Stripe associations. That's TOM you can't lose to an algorithm change.
What Your TOM Score Is Really Saying
| TOM Level | What It Means | Strategic Implication |
|---|---|---|
| High TOM | The model names you from memory, unprompted. | Defend the position. Keep building authoritative presence for the next training cycle. |
| Moderate TOM | The model sometimes remembers you, sometimes doesn't. | Strengthen associations on high-authority sites. Frequency in training data is the lever. |
| Low / Zero TOM | The model doesn't recall your brand without web search. | Your AI presence is rented, not owned. Build a long-term content strategy targeting training-data sources. |
The Bottom Line
Top of Mind separates brands with durable AI visibility from those that are one algorithm change away from disappearing. High TOM means the model genuinely knows you. Low TOM means you're dependent on retrieval, and retrieval can be taken away. For long-term AI strategy, TOM is the metric that tells you whether you're building on rock or sand.
Metric 4 of 5
Semantic Self-Sufficiency
Whether your content still makes sense after an AI model rips it out of context
AI models don't read your pages top to bottom. They shred them.
RAG systems break your content into chunks (typically 200-500 words) and evaluate each chunk on its own. The chunk that best answers the user's question gets pulled into the response. Everything else gets thrown away.
Semantic Self-Sufficiency scores how well your content chunks survive this process. Can each fragment stand on its own? Or does it need the surrounding paragraphs to make sense?
Here's what makes this metric special: unlike the other four, which measure AI behavior, this one measures your content's readiness for AI. It's the one metric you fully control.
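To see why isolation matters, here's what naive fixed-size chunking does to a page. Real RAG pipelines use smarter splitters (sentence-aware, overlapping windows), but the effect on your content is the same:

```python
def chunk_words(text, max_words=300):
    """Naive chunker: split a page into fixed-size word windows.
    300 words sits inside the typical 200-500 word chunk range."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

page = ("word " * 700).strip()   # stand-in for a 700-word page
chunks = chunk_words(page)
print([len(c.split()) for c in chunks])  # [300, 300, 100]
```

Each of those fragments is then scored on its own. Nothing outside a fragment's boundaries travels with it.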
How AI Shreds Your Content (And Why That Matters)
A SaaS pricing page has this paragraph: "Our premium plan includes all of the above features, plus advanced analytics, priority support, and custom integrations. This costs less than comparable solutions."
When the AI isolates this chunk, "all of the above features" refers to nothing. "Comparable solutions" names no competitors. "This" has no antecedent. The model has zero context for what "above" means. It either skips the chunk or hallucinates the details.
The same content rewritten: "Ahrefs' Advanced plan includes site audit, rank tracking, keyword research, content explorer, and backlink analysis, plus advanced SERP analytics, priority email support, and custom API integrations. At $399/month, it's priced below SEMrush's comparable Business plan at $449/month."
Every claim is specific. The brand is named. Pricing is concrete. Competitors are identified. An AI model can pull this chunk and use it directly without guessing at anything.
Four Patterns That Kill Your Chunks
| Content Pattern | Why It Fails | Self-Sufficient Alternative |
|---|---|---|
| "As mentioned above..." | The model doesn't have "above." The chunk is isolated. | Restate the specific point being referenced. |
| "Our tool does this better." | "This" has no referent. "Better" than what? | "Figma reduces design handoff time by 60% compared to static mockup tools like InVision." |
| "We offer competitive pricing." | Vague. The model can't cite a claim with no substance. | "Datadog's Pro plan starts at $15/host/month, compared to New Relic's equivalent at $25/host/month." |
| "See the table below for details." | There is no "below" in an isolated chunk. | Inline the key data points directly in the paragraph. |
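The four patterns above can be caught with a crude lint pass. These regex red flags are illustrative heuristics of ours, not an exhaustive or production-grade check:

```python
import re

# Illustrative red flags, one per failure pattern in the table above.
RED_FLAGS = {
    "dangling reference upward":   r"\b(mentioned|the) above\b",
    "dangling reference downward": r"\b(table|section) below\b",
    "unnamed brand":               r"\bour (\w+ )?(tool|plan|product|pricing)\b",
    "substance-free claim":        r"\bcompetitive pricing\b",
}

def self_sufficiency_issues(chunk):
    """Return the failure modes a chunk triggers when read in isolation."""
    return [name for name, pattern in RED_FLAGS.items()
            if re.search(pattern, chunk, re.IGNORECASE)]

vague = ("Our premium plan includes all of the above features, plus advanced "
         "analytics, priority support, and custom integrations.")
rewritten = ("Ahrefs' Advanced plan includes site audit, rank tracking, and "
             "backlink analysis. At $399/month, it's priced below SEMrush's "
             "comparable Business plan at $449/month.")
print(self_sufficiency_issues(vague))      # flags two issues
print(self_sufficiency_issues(rewritten))  # []
```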
Your Chunk vs. Their Chunk
When a RAG system retrieves ten candidate chunks to answer a question, the specific, self-contained ones get selected. The vague ones get thrown out.
Your chunk isn't graded on a curve. It's in a head-to-head against the other nine chunks competing for the same answer.
Imagine a user asks an AI tool: "How much does Zapier cost for a small team?" Two chunks compete. Chunk A says "Our plans start at a competitive price point for small teams." Chunk B says "Zapier's Starter plan costs $19.99/month and includes 750 tasks for up to 10 users." The model picks Chunk B every time. Not because the source is more authoritative, but because the chunk is more useful in isolation.
The Bottom Line
Semantic Self-Sufficiency is how you write for a world where AI reads your content in fragments, not in full. Every paragraph needs to carry its own meaning, name its own brands, cite its own numbers, and make its own claims. The content that wins in AI retrieval isn't always the most comprehensive. It's the most self-contained.
Metric 5 of 5
Citation Capture Rate
How often the model actually uses your content in its answer, not just finds it
Being indexed isn't enough. Being retrieved isn't enough.
The only thing that matters is whether the model includes your content in the response the user actually sees.
Citation Capture Rate measures that final conversion: from retrieved candidate to cited source.
Think of it as a funnel. Your content gets indexed by AI systems. A subset gets retrieved when relevant queries come in. But only a fraction of what's retrieved actually makes it into the generated response. This metric measures that last, critical step.
Where Your Content Dies in the Funnel
A user asks Perplexity: "What's the best password manager for families?" The system retrieves chunks from 15 different sources: Wirecutter reviews, Reddit threads, product comparison pages, vendor sites. But the response only cites 4 of them. Your content was in the retrieval pool but didn't make the cut. That gap between "retrieved" and "cited" is exactly what Citation Capture Rate measures.
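The funnel math itself is simple: of the queries where your content entered the retrieval pool, how often was it cited? A sketch with a hypothetical retrieval log:

```python
def citation_capture_rate(observations):
    """observations: per-query (retrieved, cited) pairs for your content."""
    retrieved = [(r, c) for r, c in observations if r]
    if not retrieved:
        return 0.0
    return sum(c for _, c in retrieved) / len(retrieved)

# Hypothetical log: content retrieved in 8 of 10 queries, cited in 2 of those.
log = [(True, True), (True, False), (False, False), (True, False),
       (True, True), (True, False), (False, False), (True, False),
       (True, False), (True, False)]
print(citation_capture_rate(log))  # 0.25
```

A low rate on a high retrieval count is the signature of the "retrieved but not cited" failure mode.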
What Makes the Model Pick You
AI models select sources that reduce their uncertainty. When the model is building an answer and needs to ground a claim, it reaches for the chunk that gives it the most confidence.
Three factors determine whether your content gets picked:
| Selection Factor | What the Model Wants | Example |
|---|---|---|
| Specificity | Concrete facts, numbers, named entities. | "1Password Family plan costs $4.99/month and supports up to 5 users" beats "1Password offers affordable family pricing." |
| Self-sufficiency | The chunk stands alone. No context needed. | A paragraph that names the product, states the claim, and provides the evidence in one block. |
| Alignment | The chunk directly answers the user's question. | A chunk about "best password manager for families" beats a generic "password manager features" list. |
You're Always in a Head-to-Head
Your content is never evaluated alone. It's always competing against every other chunk that answers the same query.
The model picks whichever chunk makes its job easiest.
Your content says: "We provide excellent customer support with fast response times and dedicated account managers." A competitor's content says: "Zendesk's Professional plan includes live chat with a median first-response time of 1 minute 42 seconds, plus a dedicated account manager for teams over 50 seats." The model doesn't need to think about which one to cite.
Your content says: "Intercom's resolution bot resolves 33% of customer questions before they reach a human agent, with median response time under 30 seconds and CSAT scores averaging 94% across 25,000+ businesses." This is the kind of chunk a model reaches for. Specific, complete, and immediately citable.
Citation Rate Is the New CTR
In traditional SEO, you optimized title tags and meta descriptions to earn clicks. In AI, you optimize content chunks to earn citations.
The mechanics are different, but the principle is identical. You're competing for selection from a list of candidates. The winner is the one that gives the selector (human or model) the most confidence.
| | Traditional SEO (CTR) | AI Visibility (Citation Rate) |
|---|---|---|
| Selector | Human scanning search results | Model evaluating retrieved chunks |
| What you optimize | Title tag, meta description, rich snippets | Content chunks: specificity, self-sufficiency, alignment |
| Competition | 10 blue links on a SERP | 10-20 candidate chunks in the retrieval pool |
| Failure mode | Visible but not clicked | Retrieved but not cited |
The Bottom Line
Citation Capture Rate is where AI visibility becomes tangible. You can have great LBA, strong Authority, high TOM, and self-sufficient content, but if the model consistently picks a competitor's chunk over yours in the final selection step, you're losing the last mile. This metric closes the gap between being in the system and being in the answer.
How the Five Metrics Work Together
Each metric measures a different layer of your AI visibility. Together, they tell you exactly where you stand and why.
| Metric | What It Measures | The Question It Answers |
|---|---|---|
| Latent Brand Association | Pre-trained model bias toward your brand | What does the model already believe about us? |
| LLM Authority Score | Frequency + prominence across responses | How seriously does the model take us overall? |
| Top of Mind | Unprompted recall without web search | Would the model mention us without looking us up? |
| Semantic Self-Sufficiency | Fragment-level content clarity | Does our content survive being pulled apart by AI? |
| Citation Capture Rate | Final selection from the retrieval pool | When the model has our content, does it actually use it? |
Two Sides of the Same Problem
The five metrics split into two categories.
Latent Brand Association and Top of Mind measure what the AI model has internalized. You can't change them quickly because they depend on training cycles that take months or years.
Semantic Self-Sufficiency and Citation Capture Rate measure your content's effectiveness in real-time retrieval. You can improve them today by changing how you write.
LLM Authority Score sits in the middle, reflecting both.
| Layer | Metric | You Control It? | Timeframe to Change |
|---|---|---|---|
| Model-side | Latent Brand Association | Indirectly, through content that enters future training data | 6-18+ months (next training cycle) |
| Model-side | Top of Mind | Indirectly, through sustained authoritative presence | 6-18+ months (next training cycle) |
| Bridge | LLM Authority Score | Partially, combines model recall and retrieval performance | Weeks to months (retrieval side) + months to years (recall side) |
| Content-side | Semantic Self-Sufficiency | Directly, you control how your content is structured | Immediate (content changes) |
| Content-side | Citation Capture Rate | Directly, you control specificity, alignment, and self-sufficiency | Weeks (as AI systems re-index your content) |
Why This Matters for Strategy
When your LLM Authority Score is low, the layer breakdown tells you where to look. If the problem is model-side (weak LBA or zero TOM), no amount of content optimization will fix it. You need a long-term strategy targeting future training data. If the problem is content-side (low SSS or CCR), you can improve it now by restructuring how you write. Misdiagnosing which layer is broken leads to wasted effort.
A B2B SaaS company finds they have a low LLM Authority Score. Why? Three possible root causes. Their LBA is negative (the model was trained on outdated information). Their TOM is zero (the model only finds them through retrieval, and retrieval is inconsistent). Or their Citation Capture Rate is low (the model finds their content but keeps picking competitors' instead). Each diagnosis leads to a completely different strategy. Without all five metrics, you're guessing at the cause.
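That triage can be expressed as a simple decision rule. The threshold and the 0-1 score scale are assumptions for illustration, not calibrated values:

```python
def diagnose_low_authority(lba, tom, ccr, threshold=0.3):
    """Toy triage for a low LLM Authority Score, given scores in [0, 1]."""
    if lba < threshold:
        return "model-side: corrective content for future training data (LBA)"
    if tom < threshold:
        return "model-side: build recall, not just retrieval presence (TOM)"
    if ccr < threshold:
        return "content-side: rewrite chunks for specificity and citability (CCR)"
    return "no single weak layer: audit Semantic Self-Sufficiency chunk by chunk"

# Model knows the brand, but its content keeps losing the final selection.
print(diagnose_low_authority(lba=0.7, tom=0.8, ccr=0.1))
```

Each branch points at a different budget: training-data strategy versus content restructuring.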

