Glossary

LLM-Score

LLM-Score is a 0–100 metric showing how often, and how correctly, large language models mention your brand in answers to your topic prompts. It combines mention rate, factual correctness, and sentiment.

Definition

LLM-Score is a composite 0–100 metric produced by Getllmspy that shows how correctly and how often major language models (ChatGPT, Gemini, Perplexity, Claude, YandexGPT, Alice, GigaChat, DeepSeek) mention your brand when users ask questions in your category. Unlike classic SEO rankings, LLM-Score measures presence inside generated answers, not inside SERP lists. It rolls up three signals: mention rate across a prompt pack, factual correctness (absence of hallucinations), and sentiment.

Scale benchmarks

Range	How to read it
0–20	Brand is effectively invisible in LLM answers
21–45	Occasional mentions, often with factual errors
46–65	Steady presence in a subset of models
66–80	Strong visibility with rare hallucinations
81–100	Category-leading presence
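The table can be applied mechanically. A minimal sketch, assuming integer scores as in the table (a fractional score such as 20.5 falls into the next band up); `read_score` and `BANDS` are hypothetical names, not part of any Getllmspy API.

```python
# Upper bound of each band (inclusive) paired with its reading, from the table above.
BANDS = [
    (20, "Brand is effectively invisible in LLM answers"),
    (45, "Occasional mentions, often with factual errors"),
    (65, "Steady presence in a subset of models"),
    (80, "Strong visibility with rare hallucinations"),
    (100, "Category-leading presence"),
]


def read_score(score: float) -> str:
    """Return the reading for a 0-100 LLM-Score (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError("LLM-Score must be between 0 and 100")
    for upper, reading in BANDS:
        if score <= upper:
            return reading
    raise AssertionError("unreachable: any score <= 100 matches the last band")
```

For example, the dataset median of 38 reads as "Occasional mentions, often with factual errors".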

Median LLM-Score across the Getllmspy dataset is 38 (April 2026 snapshot).

How it's computed

End-to-end, the pipeline looks like this:

Prompt pack: a fixed set of category questions; your brand name is not inserted into the prompt text itself (see Prompt pack).
Model fan-out: every model in coverage runs the same scripted steps, and answers are stored as dated snapshots.
Per-answer signals: whether the brand was mentioned appropriately, how factually consistent the narrative is, what the sentiment is, and whether a penalty applies for clear contradictions.
Normalization: signals are scaled to 0–1 per model and prompt, weighted by traffic and regional relevance, then aggregated into one headline number for the brand and topic.
Reporting: the UI shows the headline score, a per-model table, and quotes; always read the number alongside the underlying answers.
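The fan-out, normalization, and roll-up steps can be sketched as a two-stage aggregation. This is a minimal sketch, not the production pipeline: it assumes each answer has already been reduced to a single 0–1 score and that per-model weights (standing in for traffic and regional relevance, whose exact combination is not specified here) are supplied by the caller. `roll_up` and its input shape are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def roll_up(answer_scores, model_weights):
    """Aggregate per-answer scores into a per-model table and a weighted headline.

    answer_scores: iterable of (model, prompt_id, score) with score in [0, 1].
    model_weights: hypothetical per-model weights; models with no positive
        weight are left out of the headline but still appear in the table.
    """
    by_model_prompt = defaultdict(list)
    for model, prompt_id, score in answer_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        by_model_prompt[(model, prompt_id)].append(score)

    # Average repeated runs of the same prompt, then average prompts per model.
    by_model = defaultdict(list)
    for (model, _prompt_id), scores in by_model_prompt.items():
        by_model[model].append(mean(scores))
    model_table = {model: 100 * mean(means) for model, means in by_model.items()}

    # Weighted mean across models gives the headline number on a 0-100 scale.
    weighted = [
        (model_weights[model], score)
        for model, score in model_table.items()
        if model_weights.get(model, 0) > 0
    ]
    total_weight = sum(weight for weight, _ in weighted)
    if total_weight == 0:
        raise ValueError("no model has a positive weight")
    headline = sum(weight * score for weight, score in weighted) / total_weight
    return headline, model_table
```

With ChatGPT weighted 3 and Gemini 1, a ChatGPT table value of 75 and a Gemini value of 50 roll up to a headline of 68.75. Keeping the per-model table next to the headline mirrors the reporting step: the scalar alone hides which model moved.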

The formula and charts below are didactic; production weights can differ while keeping the same structure.

A simplified aggregation model

The core idea: three weighted signals minus a hallucination penalty.

Schematic formula

LLM-Score ≈ 100 × ( α·M + β·C + γ·S − δ·H )

M ∈ [0,1] is the share of answers with a correct brand mention; C ∈ [0,1] is factual consistency; S ∈ [0,1] is normalized sentiment; H ∈ [0,1] is a hallucination penalty; the weights satisfy α+β+γ+δ = 1.
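The schematic formula translates directly into code. A sketch with hypothetical weights (α = 0.4, β = 0.3, γ = 0.2, δ = 0.1); clipping the result to 0–100 is an assumption of this sketch, since the formula itself can go negative when H dominates.

```python
def llm_score(M, C, S, H, alpha, beta, gamma, delta):
    """Schematic LLM-Score: 100 * (alpha*M + beta*C + gamma*S - delta*H).

    M, C, S, H are expected in [0, 1]; the four weights must sum to 1.
    Clipping the result to [0, 100] is an assumption of this sketch.
    """
    for name, value in (("M", M), ("C", C), ("S", S), ("H", H)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    if abs(alpha + beta + gamma + delta - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma + delta must equal 1")
    raw = 100 * (alpha * M + beta * C + gamma * S - delta * H)
    return min(100.0, max(0.0, raw))


# Hypothetical weights, chosen only for illustration.
WEIGHTS = dict(alpha=0.4, beta=0.3, gamma=0.2, delta=0.1)
example = llm_score(M=0.8, C=0.9, S=0.7, H=0.1, **WEIGHTS)  # roughly 72
```

Because the four weights sum to 1, a perfect answer set (M = C = S = 1, H = 0) scores 100·(1 − δ), so with δ = 0.1 the ceiling is 90 rather than 100.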

Headline score (example; illustrative, not your live score)

Presence: share of answers that mention the brand appropriately
Factual consistency: alignment with known facts
Normalized sentiment: positive / neutral / negative blend

What you see in a typical report

Model roll-up

LLM-Score (snapshot): 67

Week-over-week change: +5

Largest lift: YandexGPT (+12 quote hits)

Risk: Perplexity — wrong site URL in 2/10 answers

Answer quote

“In this category, brands X and Y are named most often; your brand appears in the context of …”

The verbatim text explains why the score moved.

From a dated snapshot: model label + timestamp

How it works in practice

What to open first in a report

Per-model roll-up — where the score moved after a GEO launch or an AEO content refresh.
Quotes — one or two sentences from the model often explain jumps better than the scalar: wrong HQ address, confused sibling brand, etc.
Prompt-level breakdown — a high LLM-Score can still hide failures on a handful of “money” prompts; fix those scenarios before polishing the average.
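The prompt-level check can be sketched as a filter over per-prompt scores. This assumes a per-prompt 0–100 score is available and that business-critical ("money") prompts are tagged by hand; the threshold of 50, the prompt IDs, and `failing_money_prompts` are all hypothetical.

```python
def failing_money_prompts(prompt_scores, money_prompts, threshold=50.0):
    """Return the average prompt score and the money prompts below `threshold`.

    prompt_scores: {prompt_id: 0-100 score for that prompt}.
    money_prompts: prompt ids tagged as business-critical (hypothetical tagging).
    A money prompt missing from prompt_scores is treated as scoring 0.
    """
    if not prompt_scores:
        raise ValueError("no prompt scores")
    average = sum(prompt_scores.values()) / len(prompt_scores)
    failing = sorted(
        (p for p in money_prompts if prompt_scores.get(p, 0.0) < threshold),
        key=lambda p: (prompt_scores.get(p, 0.0), p),  # worst first, ties by id
    )
    return average, failing


scores = {"best-crm": 90, "crm-pricing": 85, "crm-telephony": 20, "crm-vs-sheets": 95}
average, failing = failing_money_prompts(scores, {"crm-pricing", "crm-telephony"})
```

Here the average is a healthy 72.5, yet "crm-telephony" still comes back as a failing money prompt: exactly the case where fixing one scenario matters more than nudging the mean.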

Mini walkthrough

Suppose LLM-Score is 67, up from 62 last week. The model table shows YandexGPT adding 12 correct-name quotes, while Perplexity still swaps your domain: prioritize org-card and source fixes before writing more blog posts.
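The walkthrough amounts to diffing two dated snapshots per model. A minimal sketch, assuming each snapshot has been reduced to hypothetical per-model counts; only the +12 lift and the 2-of-10 wrong-URL figure come from the example above, and the other numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelSnapshot:
    """Hypothetical per-model counts taken from one dated snapshot."""
    correct_name_quotes: int   # answers that quote the brand name correctly
    wrong_domain_answers: int  # answers that cite the wrong site URL
    total_answers: int


def compare_weeks(prev, curr):
    """Compare two {model: ModelSnapshot} dicts.

    Returns the model with the largest lift in correct-name quotes as
    (model, delta), or None if no model appears in both weeks, plus a list of
    (model, wrong, total) for models that still cite a wrong domain this week.
    """
    lifts = {
        model: curr[model].correct_name_quotes - prev[model].correct_name_quotes
        for model in curr.keys() & prev.keys()
    }
    largest = max(lifts.items(), key=lambda kv: (kv[1], kv[0])) if lifts else None
    risks = sorted(
        (model, snap.wrong_domain_answers, snap.total_answers)
        for model, snap in curr.items()
        if snap.wrong_domain_answers > 0
    )
    return largest, risks


prev = {"YandexGPT": ModelSnapshot(10, 0, 40), "Perplexity": ModelSnapshot(6, 2, 10)}
curr = {"YandexGPT": ModelSnapshot(22, 0, 40), "Perplexity": ModelSnapshot(7, 2, 10)}
largest, risks = compare_weeks(prev, curr)
```

This yields YandexGPT with a lift of 12 and flags Perplexity at 2 wrong URLs out of 10 answers, which is the signal to fix the org card and sources first.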

Related lenses: GPI for overall visibility pressure, Share of Voice when competitive mention share matters most.

How to read it

Use the scale above as a rule of thumb. Always pair the score with qualitative quotes from the report: a high number with toxic context still needs a content response.

LLM-Score vs Share of Voice

LLM-Score blends correctness and sentiment with presence. Share of Voice only measures how often your brand appears compared to competitors. A brand can win SoV and still have a low LLM-Score if the models misquote it.
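The difference between the two metrics shows up even on a handful of answers. A minimal sketch, assuming each answer is tagged with the brands it names and whether your brand was described correctly; the answer shape and the brand names "Acme", "Beta", and "Gamma" are hypothetical. The correct-mention rate here corresponds to M in the schematic formula, one input to LLM-Score rather than the full score.

```python
from collections import Counter


def sov_and_correct_rate(answers, brand):
    """Return (Share of Voice, correct-mention rate) for `brand`.

    answers: list of dicts with "brands" (brands named in the answer) and
        "brand_correct" (whether `brand` was described correctly);
        this shape is hypothetical.
    Share of Voice = brand mentions / all brand mentions across answers.
    Correct-mention rate = answers with a correct mention of `brand` / all answers.
    """
    mentions = Counter(b for answer in answers for b in answer["brands"])
    total_mentions = sum(mentions.values())
    sov = mentions[brand] / total_mentions if total_mentions else 0.0
    correct = sum(
        1 for answer in answers if brand in answer["brands"] and answer["brand_correct"]
    )
    correct_rate = correct / len(answers) if answers else 0.0
    return sov, correct_rate


answers = [
    {"brands": ["Acme", "Beta"], "brand_correct": False},
    {"brands": ["Acme"], "brand_correct": False},
    {"brands": ["Acme", "Gamma"], "brand_correct": True},
    {"brands": ["Beta"], "brand_correct": False},
]
sov, correct_rate = sov_and_correct_rate(answers, "Acme")
```

Acme leads Share of Voice with 3 of 6 mentions (0.5), yet only 1 of 4 answers mentions it correctly (0.25): it wins SoV while the correctness input to LLM-Score stays low.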

When to use

Tracking weekly after an llms.txt or schema change.
Comparing how different models see you (ChatGPT vs YandexGPT).
Reporting to leadership: one number per brand per month.

Related terms: Share of Voice (SoV), GPI™ (Generative Presence Index), Model coverage, Hallucination
