

Prompt Win Rate

Prompt Win Rate is the share of scripted prompts on which your brand satisfies the evaluation rule, for example a correct mention, a positive stance, or a safe factual answer.
  • Binary per prompt: did we satisfy the rule (win) or not?

  • Complements LLM-Score by highlighting which intents still fail.

Definition

Prompt Win Rate counts how many prompts in a fixed prompt pack pass a clear success criterion after models answer. The criterion can be “brand mentioned correctly”, “no hallucinated price”, “included in top-3 alternatives”, etc. Unlike a blended 0–100 score, win rate is easy to explain to stakeholders: “we won 37 of 50 category prompts this week.”
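The three example criteria above can each be written as a binary predicate over one model answer. The sketch below is a minimal illustration under stated assumptions, not a production checker. The helper names (`brand_mentioned`, `no_wrong_price`, `in_top_n`), the regex-based matching, the dollar price format, and the assumption that shortlists appear as numbered lists are all hypothetical.

```python
import re
from typing import Callable

# A win rule maps one model answer to True (win) or False (fail).
WinRule = Callable[[str], bool]


def brand_mentioned(brand: str) -> WinRule:
    """Win if the brand appears as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
    return lambda answer: pattern.search(answer) is not None


def no_wrong_price(true_price: str) -> WinRule:
    """Win if every dollar amount quoted in the answer equals the true price.

    An answer that quotes no price at all also wins (nothing was hallucinated).
    Assumes prices are written like "$49" or "$49.00".
    """
    price_re = re.compile(r"\$\d+(?:\.\d{2})?")
    return lambda answer: all(p == true_price for p in price_re.findall(answer))


def in_top_n(brand: str, n: int = 3) -> WinRule:
    """Win if the brand appears in one of the first n items of a numbered list."""
    item_re = re.compile(r"^\s*\d+[.)]\s*(.+)$", re.MULTILINE)

    def rule(answer: str) -> bool:
        top_items = item_re.findall(answer)[:n]
        return any(brand.lower() in item.lower() for item in top_items)

    return rule
```

Because each rule returns a plain boolean, a rule set stays auditable: a reviewer can rerun any single predicate on the stored answer and get the same verdict.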

Win-rate diagnostic table

Intent cluster | Win rate | Common fail pattern
Commercial | 58% | Missing brand in shortlist
Comparison | 47% | Wrong competitor mapping
Support/troubleshooting | 73% | Outdated setup steps

Focus first on low-win, high-revenue clusters.
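One possible way to turn "low-win, high-revenue" into a ranking is to weight each cluster's failure share (1 - win rate) by a revenue weight. This scoring formula is a hypothetical choice for illustration, not a method defined in this entry. The win rates below come from the diagnostic table; the revenue weights are invented placeholders.

```python
def prioritize(clusters: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Rank clusters by (1 - win_rate) * revenue_weight, highest priority first.

    clusters maps cluster name -> (win_rate in [0, 1], revenue_weight).
    """
    scored = [(name, (1 - win) * weight) for name, (win, weight) in clusters.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Win rates from the diagnostic table; revenue weights are HYPOTHETICAL placeholders.
clusters = {
    "Commercial": (0.58, 0.5),
    "Comparison": (0.47, 0.3),
    "Support/troubleshooting": (0.73, 0.2),
}
for name, score in prioritize(clusters):
    print(f"{name}: {score:.3f}")
```

With these assumed weights, Commercial ranks ahead of Comparison despite its higher win rate, because its revenue weight is larger. Substitute real revenue shares before drawing conclusions.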

How it's computed

For each prompt × model pair, run the answer through automated checks (regex matching, named-entity recognition, and a classifier), with optional human review for edge cases. Win rate = wins ÷ eligible prompts. Eligibility rules exclude prompts that are out of scope or blocked by safety filters, so the denominator stays meaningful.
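A minimal sketch of the denominator logic, assuming each prompt × model answer has already been reduced to an eligibility flag and a binary win flag by the checks described above. The `Result` record and its field names are hypothetical. The usage lines rebuild the 31 / 48 worked example that follows.

```python
from dataclasses import dataclass


@dataclass
class Result:
    prompt_id: str
    model: str
    eligible: bool  # False if out of scope or blocked by a safety filter
    win: bool       # outcome of the binary win rule


def win_rate(results: list[Result]) -> float:
    """Wins divided by eligible pairs; ineligible pairs never enter the denominator."""
    eligible = [r for r in results if r.eligible]
    if not eligible:
        raise ValueError("no eligible prompt-model pairs")
    return sum(r.win for r in eligible) / len(eligible)


# 48 eligible pairs with 31 wins, plus 2 excluded pairs (their win flag is ignored).
results = [Result(f"p{i}", "model-a", eligible=True, win=i < 31) for i in range(48)]
results += [Result(f"x{i}", "model-a", eligible=False, win=True) for i in range(2)]
print(f"{win_rate(results):.1%}")  # 64.6%
```

Raising on an empty denominator is a deliberate choice: a pack where every prompt was filtered out should surface as an error, not as a 0% or 100% score.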

Example math

Eligible prompts = 48
Wins = 31
Prompt Win Rate = 31 / 48 = 64.6%

How it works in practice

How teams use it

  • Sprint retros — compare win rate before/after publishing new FAQ blocks.
  • Model triage — if ChatGPT wins but YandexGPT fails, invest in regional sources.
  • Pair with fanout — compute win rate across fanout queries to ensure wins are not fragile wording luck.
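The model-triage and fanout uses reduce to two small aggregations. This is a sketch under assumptions: results arrive as `(model, prompt_id, win)` tuples for eligible answers, fanout results are grouped as win flags per base prompt, and the default threshold of 1.0 (every fanout query must win) is an illustrative choice rather than a standard.

```python
from collections import defaultdict
from typing import Iterable


def win_rate_by_model(results: Iterable[tuple[str, str, bool]]) -> dict[str, float]:
    """results: (model, prompt_id, win) tuples for eligible answers only."""
    wins: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for model, _prompt_id, win in results:
        wins[model] += int(win)
        totals[model] += 1
    return {model: wins[model] / totals[model] for model in totals}


def robust_wins(fanout: dict[str, list[bool]], threshold: float = 1.0) -> dict[str, bool]:
    """fanout maps a base prompt to the win flags of its fanout queries.

    A base prompt counts as a robust win only if its share of winning
    fanout queries is at least the threshold.
    """
    return {prompt: sum(flags) / len(flags) >= threshold for prompt, flags in fanout.items()}


results = [("ChatGPT", "p1", True), ("ChatGPT", "p2", True),
           ("YandexGPT", "p1", False), ("YandexGPT", "p2", True)]
print(win_rate_by_model(results))  # {'ChatGPT': 1.0, 'YandexGPT': 0.5}
```

Lowering `threshold` (for example to 0.6) tolerates occasional phrasing misses; keeping it at 1.0 flags any prompt whose win depends on one particular wording.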

Rule design tips

  • Keep each win rule binary and auditable.
  • Avoid vague criteria like “good answer”.
  • Tie rule sets to business outcomes (e.g., accurate pricing, clear recommendation).
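A rule set can stay binary and auditable if every rule has a stable ID and the evaluator reports which rules failed. The sketch assumes an AND policy (a prompt wins only when every rule in its set passes); the rule IDs, the brand "Acme", and the "$49" price are hypothetical.

```python
from typing import Callable

Rule = Callable[[str], bool]


def evaluate(answer: str, rules: dict[str, Rule]) -> tuple[bool, list[str]]:
    """Apply every named binary rule to one answer.

    The prompt wins only if all rules pass (an assumed AND policy). The IDs of
    failed rules are returned so a reviewer can audit each verdict.
    """
    failed = [rule_id for rule_id, rule in rules.items() if not rule(answer)]
    return (not failed, failed)


# Hypothetical rule set tied to a business outcome (accurate pricing).
pricing_rules: dict[str, Rule] = {
    "brand_named": lambda a: "Acme" in a,
    "price_accurate": lambda a: "$49" in a,
}
print(evaluate("Acme Pro costs $49 per month.", pricing_rules))  # (True, [])
print(evaluate("Acme Pro is affordable.", pricing_rules))  # (False, ['price_accurate'])
```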

How to read it

A high win rate with toxic sentiment still needs content fixes — always read the underlying quotes next to the percentage.
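One way to keep quotes next to the percentage is to surface winning answers whose sentiment is poor, together with the excerpt a reviewer should read. The sentiment scale (-1 to 1), the -0.2 cutoff, and the `Row` layout are assumptions; any sentiment scorer could supply the score.

```python
from typing import NamedTuple


class Row(NamedTuple):
    prompt_id: str
    win: bool
    sentiment: float  # assumed scale: -1 (negative) to 1 (positive)
    quote: str        # the answer excerpt a reviewer should read


def wins_needing_review(rows: list[Row], cutoff: float = -0.2) -> list[Row]:
    """Return winning rows whose sentiment is below the cutoff (assumed default)."""
    return [row for row in rows if row.win and row.sentiment < cutoff]


rows = [
    Row("p1", True, 0.6, "Acme is a solid pick."),
    Row("p2", True, -0.7, "Acme is listed, but users report billing problems."),
    Row("p3", False, -0.5, "Competitor X is better."),
]
for row in wins_needing_review(rows):
    print(row.prompt_id, row.quote)
```

Failed rows are left out on purpose: they already count against the win rate, whereas a negative win is invisible in the percentage.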

2-week improvement sprint

  1. Define 3 explicit win rules per intent cluster.
  2. Review top 20 failed prompts manually.
  3. Group failures by root cause.
  4. Ship fixes to source content.
  5. Re-run and track cluster-level movement.
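Steps 3 and 5 can be supported by two small helpers: one that counts root-cause tags assigned during the manual review, and one that reports percentage-point movement per cluster between runs. The tag strings and the before/after dictionaries are hypothetical inputs.

```python
from collections import Counter


def root_cause_counts(failures: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """failures: (prompt_id, root_cause) pairs tagged during manual review.

    Returns root causes ordered from most to least frequent.
    """
    return Counter(cause for _prompt_id, cause in failures).most_common()


def cluster_movement(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Percentage-point change in win rate for clusters present in both runs."""
    return {c: round((after[c] - before[c]) * 100, 1) for c in before if c in after}
```

Reporting movement in percentage points per cluster, rather than one pack-wide number, keeps a fix in one cluster from masking a regression in another.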

Practical reference bands

Prompt Win Rate | Reading
<45% | weak positioning or factual instability
45-65% | mixed quality, clear optimization path
65-80% | strong operational performance
80%+ | category-leading consistency
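The bands can be applied mechanically, but the table leaves the shared boundaries (65% and 80%) ambiguous. The sketch below assumes half-open bands, so a boundary value falls into the higher band; that reading is an assumption, not part of the table.

```python
def reading(win_rate_pct: float) -> str:
    """Map a Prompt Win Rate (in percent) to its reference band.

    Boundaries are treated as half-open (assumption): 45 falls in the 45-65% band,
    65 in the 65-80% band, and 80 in the 80%+ band.
    """
    if win_rate_pct < 45:
        return "weak positioning or factual instability"
    if win_rate_pct < 65:
        return "mixed quality, clear optimization path"
    if win_rate_pct < 80:
        return "strong operational performance"
    return "category-leading consistency"


print(reading(64.6))  # mixed quality, clear optimization path
```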

Interpret the band alongside model coverage and prompt-pack stability; a raw percentage on its own can mislead.

When reporting to executives, pair win rate with two representative failed prompts to keep the KPI grounded.

When to use

  • When leadership wants a KPI simpler than LLM-Score.
  • When you track compliance-style prompts (claims, medical, finance).
  • When agencies report weekly progress on a fixed test deck.