Prompt Win Rate

Prompt Win Rate is the share of scripted prompts where the model's answer satisfies your evaluation rule for the brand, for example a correct mention, a positive stance, or a safe factual answer.

Binary per prompt: did we satisfy the rule (win) or not?
Complements LLM-Score by highlighting which intents still fail.

Definition

Prompt Win Rate measures the share of prompts in a fixed prompt pack that pass a clear success criterion after models answer. The criterion can be “brand mentioned correctly”, “no hallucinated price”, “included in top-3 alternatives”, and so on. Unlike a blended 0–100 score, win rate is easy to explain to stakeholders: “we won 37 of 50 category prompts this week.”

Win-rate diagnostic table

Intent cluster	Win rate	Common fail pattern
Commercial	58%	Missing brand in shortlist
Comparison	47%	Wrong competitor mapping
Support/troubleshooting	73%	Outdated setup steps

Focus first on low-win, high-revenue clusters.
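
One way to make "low-win, high-revenue" concrete is to weight each cluster's missed share by its revenue share. The sketch below is only an illustration: the win rates come from the table above, while `revenue_share` and the priority formula are assumptions, not part of the metric's definition.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    win_rate: float       # fraction, taken from the diagnostic table
    revenue_share: float  # hypothetical: share of revenue tied to this intent


def prioritize(clusters):
    """Order clusters so that low-win, high-revenue ones come first."""
    # Assumed heuristic: missed share (1 - win rate) weighted by revenue share.
    return sorted(
        clusters,
        key=lambda c: (1 - c.win_rate) * c.revenue_share,
        reverse=True,
    )


clusters = [
    Cluster("Commercial", 0.58, 0.50),
    Cluster("Comparison", 0.47, 0.30),
    Cluster("Support/troubleshooting", 0.73, 0.20),
]

for c in prioritize(clusters):
    print(c.name)
```

With these made-up revenue shares, Commercial ranks ahead of Comparison even though its win rate is higher, because more revenue depends on it.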

How it's computed

For each prompt × model pair, run the answer through automated checks (regex + NER + classifier) and optional human review for edge cases. Win rate = wins ÷ eligible prompts. Eligibility rules exclude prompts that are out-of-scope or blocked by safety filters so the denominator stays meaningful.

Example math

Eligible prompts = 48
Wins = 31
Prompt Win Rate = 31 / 48 = 64.6%
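
The same arithmetic as a minimal sketch. The `PromptResult` record and its field names are hypothetical, and the two ineligible prompts are added only to show that they drop out of the denominator.

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    prompt_id: str
    eligible: bool  # False if out of scope or blocked by a safety filter
    won: bool       # True if the answer passed the win rule


def prompt_win_rate(results):
    """Return wins / eligible prompts, or None if nothing is eligible."""
    eligible = [r for r in results if r.eligible]
    if not eligible:
        return None
    wins = sum(r.won for r in eligible)
    return wins / len(eligible)


# 48 eligible prompts (31 wins, 17 losses) plus 2 assumed ineligible prompts.
results = (
    [PromptResult(f"p{i}", True, True) for i in range(31)]
    + [PromptResult(f"p{i}", True, False) for i in range(31, 48)]
    + [PromptResult(f"p{i}", False, False) for i in range(48, 50)]
)

rate = prompt_win_rate(results)
print(f"{rate:.1%}")  # 64.6%
```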

How it works in practice

How teams use it

Sprint retros — compare win rate before/after publishing new FAQ blocks.
Model triage — if ChatGPT wins but YandexGPT fails, invest in regional sources.
Pair with fanout — compute win rate across fanout queries to ensure wins are not fragile wording luck.
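
The fanout check can be sketched as a per-prompt win share across wording variants. The input shape, the prompt IDs, and the 0.8 threshold below are assumptions chosen for illustration.

```python
from collections import defaultdict


def fanout_win_rates(outcomes):
    """outcomes: (base_prompt_id, won) pairs, one per fanout variant."""
    by_prompt = defaultdict(list)
    for base_id, won in outcomes:
        by_prompt[base_id].append(won)
    return {base_id: sum(w) / len(w) for base_id, w in by_prompt.items()}


def fragile_wins(outcomes, threshold=0.8):
    """Base prompts that win at least once but on too few variants."""
    rates = fanout_win_rates(outcomes)
    return sorted(p for p, r in rates.items() if 0 < r < threshold)


# Hypothetical outcomes for three base prompts and their variants.
outcomes = [
    ("best-crm", True), ("best-crm", True), ("best-crm", True),
    ("crm-pricing", True), ("crm-pricing", False), ("crm-pricing", False),
    ("crm-setup", False), ("crm-setup", False),
]

print(fragile_wins(outcomes))  # ['crm-pricing']
```

A prompt that wins on only one of three rewordings is flagged as fragile, while a prompt that never wins is an ordinary failure and is left to the normal failure review.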

Rule design tips

Keep each win rule binary and auditable.
Avoid vague criteria like “good answer”.
Tie rule sets to business outcomes (e.g., accurate pricing, clear recommendation).
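
A binary, auditable rule can be as small as a function that returns a pass/fail flag plus a reason to log. The sketch below uses regex only; the brand name, the list price, and the function names are hypothetical, and a production check would likely add the NER and classifier steps mentioned earlier.

```python
import re
from typing import NamedTuple


class RuleResult(NamedTuple):
    won: bool
    reason: str  # kept so a reviewer can audit why the rule passed or failed


BRAND = "Acme"     # hypothetical brand name
LIST_PRICE = 49.0  # hypothetical known-correct price in USD


def brand_mentioned(answer: str) -> RuleResult:
    """Win if the brand appears as a whole word, case-insensitively."""
    pattern = rf"\b{re.escape(BRAND)}\b"
    found = re.search(pattern, answer, flags=re.IGNORECASE) is not None
    return RuleResult(found, "brand found" if found else "brand missing")


def no_hallucinated_price(answer: str) -> RuleResult:
    """Win if every dollar amount in the answer equals the list price."""
    prices = re.findall(r"\$(\d+(?:\.\d+)?)", answer)
    wrong = [p for p in prices if float(p) != LIST_PRICE]
    return RuleResult(not wrong, f"unexpected prices: {wrong}" if wrong else "prices ok")


print(brand_mentioned("Try Acme CRM for small teams."))
print(no_hallucinated_price("Acme costs $59 per month."))
```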

How to read it

A high win rate can coexist with negative sentiment when the win rule checks only mentions or facts, and such answers still need content fixes. Always read the underlying quotes next to the percentage.

2-week improvement sprint

Define 3 explicit win rules per intent cluster.
Review top 20 failed prompts manually.
Group failures by root cause.
Ship fixes to source content.
Re-run and track cluster-level movement.
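
The grouping and tracking steps can be sketched as below. The root-cause labels are assumed to come from the manual review, the "before" win rates reuse the diagnostic table, and the "after" values are invented for illustration.

```python
from collections import Counter


def group_by_root_cause(failures):
    """failures: (prompt_id, root_cause) labels from manual review."""
    return Counter(cause for _, cause in failures).most_common()


def cluster_movement(before, after):
    """Percentage-point change in win rate for clusters present in both runs."""
    return {
        cluster: round((after[cluster] - before[cluster]) * 100, 1)
        for cluster in before
        if cluster in after
    }


failures = [
    ("p1", "missing brand"),
    ("p2", "outdated steps"),
    ("p3", "missing brand"),
]
before = {"Commercial": 0.58, "Comparison": 0.47}
after = {"Commercial": 0.66, "Comparison": 0.47}  # hypothetical re-run

print(group_by_root_cause(failures))
print(cluster_movement(before, after))
```

Reporting movement in percentage points per cluster keeps a gain in one intent from hiding a flat or falling cluster elsewhere.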

Practical reference bands

Prompt Win Rate	Reading
<45%	weak positioning or factual instability
45-65%	mixed quality, clear optimization path
65-80%	strong operational performance
80%+	category-leading consistency

Interpret with model coverage and pack stability; raw percentages alone can mislead.
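
For dashboards, the bands can be turned into a small lookup. The table leaves the boundary values ambiguous (65% appears in two rows), so this sketch assumes each band includes its lower bound and excludes its upper bound.

```python
def reading(win_rate: float) -> str:
    """Map a win rate (as a fraction, 0.0 to 1.0) to its reference band."""
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be a fraction between 0 and 1")
    # Assumption: lower bounds are inclusive, upper bounds exclusive.
    if win_rate < 0.45:
        return "weak positioning or factual instability"
    if win_rate < 0.65:
        return "mixed quality, clear optimization path"
    if win_rate < 0.80:
        return "strong operational performance"
    return "category-leading consistency"


print(reading(31 / 48))  # mixed quality, clear optimization path
```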

When reporting to executives, pair win rate with two representative failed prompts to keep the KPI grounded.

When to use

When leadership wants a KPI simpler than LLM-Score.
When you track compliance-style prompts (claims, medical, finance).
When agencies report weekly progress on a fixed test deck.
