Glossary
Prompt Win Rate
Binary per prompt: did we satisfy the rule (win) or not?
Complements LLM-Score by highlighting which intents still fail.
Definition
Prompt Win Rate counts how many prompts in a fixed prompt pack pass a clear success criterion after models answer. The criterion can be “brand mentioned correctly”, “no hallucinated price”, “included in top-3 alternatives”, etc. Unlike a blended 0–100 score, win rate is easy to explain to stakeholders: “we won 37 of 50 category prompts this week.”
Win-rate diagnostic table
| Intent cluster | Win rate | Common fail pattern |
|---|---|---|
| Commercial | 58% | Missing brand in shortlist |
| Comparison | 47% | Wrong competitor mapping |
| Support/troubleshooting | 73% | Outdated setup steps |
Focus first on low-win, high-revenue clusters.
How it's computed
For each prompt × model pair, run the answer through automated checks (regex + NER + classifier), with optional human review for edge cases. Win rate = wins ÷ eligible prompts. Eligibility rules exclude prompts that are out of scope or blocked by safety filters, so the denominator stays meaningful.
Example math
Eligible prompts = 48
Wins = 31
Prompt Win Rate = 31 / 48 = 64.6%
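The same calculation can be sketched in Python. This is a minimal illustration, not a reference implementation: the `PromptResult` record, its field names, and the choice of two ineligible prompts are assumptions made only to reproduce the 31 / 48 figure above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptResult:
    # Hypothetical record shape; the field names are assumptions for this sketch.
    prompt_id: str
    eligible: bool  # False if out of scope or blocked by a safety filter
    win: bool       # True if the answer passed the success criterion


def prompt_win_rate(results):
    """Return wins / eligible prompts, or None when nothing is eligible."""
    eligible = [r for r in results if r.eligible]
    if not eligible:
        return None  # avoid dividing by zero on an empty or fully excluded pack
    wins = sum(1 for r in eligible if r.win)
    return wins / len(eligible)


# Rebuild the worked example: 48 eligible prompts, 31 of them wins.
# The 2 ineligible prompts are an assumption to show that they are ignored.
results = (
    [PromptResult(f"p{i}", True, True) for i in range(31)]
    + [PromptResult(f"p{i}", True, False) for i in range(31, 48)]
    + [PromptResult(f"p{i}", False, False) for i in range(48, 50)]
)
rate = prompt_win_rate(results)
print(f"Prompt Win Rate = {rate:.1%}")  # prints "Prompt Win Rate = 64.6%"
```

Returning `None` for an empty denominator, rather than 0%, keeps a pack with no eligible prompts from being read as a total failure.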
How it works in practice
How teams use it
- Sprint retros — compare win rate before/after publishing new FAQ blocks.
- Model triage — if ChatGPT wins but YandexGPT fails, invest in regional sources.
- Pair with fanout — compute win rate across fanout queries to ensure wins are not fragile wording luck.
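The model triage and fanout checks above could look like the sketch below. It assumes each scored answer is a dict with hypothetical `model`, `intent`, `variant`, and `win` keys, and that ineligible prompts were filtered out earlier; the sample records are invented for illustration.

```python
from collections import defaultdict


def win_rate_by(records, key):
    """Group already-eligible records by `key` and return the win rate per group."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        totals[rec[key]] += 1
        wins[rec[key]] += int(rec["win"])
    return {group: wins[group] / totals[group] for group in totals}


def fragile_groups(records):
    """Return (model, intent) pairs whose fanout variants gave mixed results."""
    outcomes = defaultdict(set)
    for rec in records:
        outcomes[(rec["model"], rec["intent"])].add(rec["win"])
    # A pair is fragile when it both won and lost across wording variants.
    return sorted(pair for pair, seen in outcomes.items() if len(seen) > 1)


# Hypothetical sample: one intent, two fanout wordings, two models.
records = [
    {"model": "ChatGPT", "intent": "pricing", "variant": "v1", "win": True},
    {"model": "ChatGPT", "intent": "pricing", "variant": "v2", "win": True},
    {"model": "YandexGPT", "intent": "pricing", "variant": "v1", "win": True},
    {"model": "YandexGPT", "intent": "pricing", "variant": "v2", "win": False},
]
by_model = win_rate_by(records, "model")
fragile = fragile_groups(records)
print(by_model)  # {'ChatGPT': 1.0, 'YandexGPT': 0.5}
print(fragile)   # [('YandexGPT', 'pricing')]
```

A pair flagged as fragile is a win that depends on wording, which is a reason to treat it as unresolved even if the headline prompt passes.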
Rule design tips
- Keep each win rule binary and auditable.
- Avoid vague criteria like “good answer”.
- Tie rule sets to business outcomes (e.g., accurate pricing, clear recommendation).
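As an illustration of binary, auditable rules, the sketch below encodes two of the criteria mentioned in the definition ("brand mentioned correctly" and "no hallucinated price") as plain functions. The brand name, the price list, the dollar-only price pattern, and the decision that a win requires every rule to pass are all assumptions for the example.

```python
import re

BRAND = "Acme"                 # hypothetical brand name
KNOWN_PRICES = {"$49", "$99"}  # hypothetical published prices

PRICE_RE = re.compile(r"\$\d+(?:\.\d{2})?")


def brand_mentioned(answer: str) -> bool:
    """Pass if the brand appears as a whole word, ignoring case."""
    pattern = rf"\b{re.escape(BRAND)}\b"
    return re.search(pattern, answer, flags=re.IGNORECASE) is not None


def no_hallucinated_price(answer: str) -> bool:
    """Pass if every dollar amount in the answer is a known published price."""
    return all(price in KNOWN_PRICES for price in PRICE_RE.findall(answer))


RULES = {
    "brand_mentioned": brand_mentioned,
    "no_hallucinated_price": no_hallucinated_price,
}


def evaluate(answer: str) -> dict:
    """Run every rule; a win here means all rules passed (an assumption)."""
    checks = {name: rule(answer) for name, rule in RULES.items()}
    return {"win": all(checks.values()), "checks": checks}


good = evaluate("Acme starts at $49 per month.")
bad = evaluate("Acme costs $59 per month.")
print(good["win"], bad["checks"])
```

Keeping the per-rule results next to the overall verdict is what makes a failure auditable: a reviewer can see which rule failed without rereading the answer.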
How to read it
A high win rate with toxic sentiment still needs content fixes — always read the underlying quotes next to the percentage.
2-week improvement sprint
- Define 3 explicit win rules per intent cluster.
- Review top 20 failed prompts manually.
- Group failures by root cause.
- Ship fixes to source content.
- Re-run and track cluster-level movement.
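Tracking cluster-level movement after the re-run can be as simple as a per-cluster difference in percentage points. In the sketch below, the "before" values come from the diagnostic table above, while the "after" values are invented purely to show the calculation.

```python
def cluster_movement(before, after):
    """Return the change in percentage points for clusters present in both runs."""
    return {
        cluster: round((after[cluster] - before[cluster]) * 100, 1)
        for cluster in before
        if cluster in after
    }


before = {"Commercial": 0.58, "Comparison": 0.47, "Support/troubleshooting": 0.73}
# Hypothetical re-run results, not real data.
after = {"Commercial": 0.66, "Comparison": 0.47, "Support/troubleshooting": 0.70}

movement = cluster_movement(before, after)
for cluster, delta in movement.items():
    print(f"{cluster}: {delta:+.1f} pp")
```

Reporting in percentage points rather than relative percent avoids overstating gains on clusters that started from a low base.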
Practical reference bands
| Prompt Win Rate | Reading |
|---|---|
| <45% | weak positioning or factual instability |
| 45–65% | mixed quality, clear optimization path |
| 65–80% | strong operational performance |
| 80%+ | category-leading consistency |
Interpret these bands alongside model coverage and prompt-pack stability; raw percentages alone can mislead.
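For dashboards, the bands can be turned into a small lookup. The table's ranges share their edges at 65% and 80%, so this sketch assumes a boundary value falls into the higher band; that tie-break is an assumption, not something the table states.

```python
def band_reading(win_rate_pct: float) -> str:
    """Map a win rate in percent (0-100) to the reading from the reference bands."""
    if not 0 <= win_rate_pct <= 100:
        raise ValueError("win rate must be between 0 and 100")
    if win_rate_pct < 45:
        return "weak positioning or factual instability"
    if win_rate_pct < 65:  # assumption: exactly 65 counts as the next band up
        return "mixed quality, clear optimization path"
    if win_rate_pct < 80:  # assumption: exactly 80 counts as the top band
        return "strong operational performance"
    return "category-leading consistency"


print(band_reading(64.6))  # prints "mixed quality, clear optimization path"
```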
When reporting to executives, pair win rate with two representative failed prompts to keep the KPI grounded.
When to use
- When leadership wants a KPI simpler than LLM-Score.
- When you track compliance-style prompts (claims, medical, finance).
- When agencies report weekly progress on a fixed test deck.