Review Standards

Reporting standards, external review, and benchmark cases

Use this page to decide whether a case is even ready to report, understand what an external review can responsibly infer, and compare real-world situations against structured benchmark examples.


Use it for

Reporting decisions, review discipline, and benchmark comparison.

What it avoids

Public accusation theater, crowd verdicts, and false certainty.

What it produces

Clearer standards, bounded conclusions, and reusable case structure.

Approach

Evidence-based external review

Last reviewed

March 25, 2026


Start from your role

The standards stay the same, but the first useful question changes depending on who is arriving here.

I think I should report a case

Start by checking whether the concern is evidence-based or just emotionally vivid.

Use the reporting readiness check before submitting anything.
Keep your report narrow: specific games, specific observations, no speculation.
Assume that some critical platform-only evidence is unavailable to you.

Reporting readiness check

Use this before you submit a report. It is designed to slow weak cases down and make stronger cases more disciplined.

Should this be reported?

Check only what is actually true about the case in front of you.

I am looking at multiple games or a broader pattern, not one surprising result.
I can point to specific games, moments, or timestamps instead of describing a vague feeling.
The concern is based on observable patterns such as timing anomalies or repeated engine-like alignment.
I am not basing this only on rating gain, title gap, or one strong tactical performance.
I intend to use an official reporting channel rather than public accusation or social pressure.
I can describe the concern in a short, factual, proportionate summary.
I understand that external review and public PGNs do not include internal platform telemetry.

Assessment

This does not produce a verdict. It tells you whether the case is ready for responsible escalation.

Current band (nothing checked yet)

Insufficient basis: 0% ready

The case is not ready for a responsible report. The evidence is too thin or too vague.

Next step: Collect a clearer pattern, or do not report yet.



Suggested report framing

I am not ready to submit a report yet because the current concern is still too thin or too vague to summarize responsibly.
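The check above amounts to a simple tally. The sketch below shows one way it could be scored, assuming each of the seven signals counts equally. The page only shows that zero checked signals lands in "Insufficient basis"; the other band names and cutoffs in `HYPOTHETICAL_BANDS`, and the `assess` function itself, are illustrative assumptions, not the tool's actual logic.

```python
from dataclasses import dataclass

# The seven readiness signals, paraphrased from the checklist above.
SIGNALS = (
    "Multiple games or a broader pattern",
    "Specific games, moments, or timestamps",
    "Observable patterns (timing anomalies, repeated engine-like alignment)",
    "Not based only on rating gain, title gap, or one strong performance",
    "Official reporting channel, not public pressure",
    "Short, factual, proportionate summary",
    "Aware that public PGNs exclude platform telemetry",
)

# HYPOTHETICAL cutoffs: the page only shows that 0 of 7 signals maps to
# "Insufficient basis". The other band names and thresholds are illustrative.
HYPOTHETICAL_BANDS = (
    (7, "Ready for escalation"),
    (4, "Partially ready"),
    (0, "Insufficient basis"),
)


@dataclass
class Readiness:
    score: int          # number of checked signals
    percent: int        # score as a whole-number percentage of all signals
    band: str           # readiness band (see HYPOTHETICAL_BANDS)
    missing: list[str]  # unchecked signals, in checklist order


def assess(checked: set[int]) -> Readiness:
    """Score a readiness check from the 0-based indices of checked signals."""
    valid = {i for i in checked if 0 <= i < len(SIGNALS)}
    score = len(valid)
    percent = round(100 * score / len(SIGNALS))
    # Bands are listed from highest to lowest minimum; take the first one met.
    band = next(name for minimum, name in HYPOTHETICAL_BANDS if score >= minimum)
    missing = [s for i, s in enumerate(SIGNALS) if i not in valid]
    return Readiness(score, percent, band, missing)
```

Called with nothing checked, `assess(set())` gives a score of 0, 0 percent, the band "Insufficient basis", and all seven signals as missing. Keeping the missing list in checklist order makes the gap easy to read back to the person preparing the report.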

Transparent review approach

External review is most useful when it is explicit about evidence, uncertainty, and what it cannot know.

Explainability

A review should state what evidence was considered, what was not available, and how the conclusion was reached.

Proportionality

Stronger claims require stronger evidence. Weak signals can justify caution, not public certainty.

Uncertainty

A serious review system must allow inconclusive outcomes. Not every suspicious game supports a verdict.

Right to respond

When possible, the reviewed party should be able to provide context, correction, or explanation.

Evidence layers

Keep evidence categories separate so readers can see what is actually carrying the conclusion.

Game-level signals

Move quality, complexity handling, and timing patterns across a meaningful sample.

Match or account pattern

Whether suspicious play repeats across events, time controls, openings, or rating bands.

Behavioral context

Known device setup, browser extensions, transmission conditions, or platform process notes when available.

Unavailable platform-only data

Internal telemetry, device fingerprints, and proprietary anti-abuse signals that external reviewers usually cannot see.
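One way to keep the layers separate in practice is to tag each observation with its layer before drawing any conclusion. This is a minimal sketch: the `Layer` enum and `layer_summary` helper are hypothetical names, and the rule that an external review cannot cite platform-only data simply follows the description above.

```python
from enum import Enum


class Layer(Enum):
    """Evidence layers from the list above, with whether an external reviewer can usually see them."""

    GAME_LEVEL = ("Game-level signals", True)
    PATTERN = ("Match or account pattern", True)
    BEHAVIORAL = ("Behavioral context", True)  # only when such context is available
    PLATFORM_ONLY = ("Unavailable platform-only data", False)

    def __init__(self, label: str, external: bool) -> None:
        self.label = label
        self.external = external


def layer_summary(
    observations: list[tuple[Layer, str]],
) -> tuple[dict[Layer, list[str]], list[str]]:
    """Group (layer, note) pairs by layer and list the layers that contributed nothing."""
    grouped: dict[Layer, list[str]] = {layer: [] for layer in Layer}
    for layer, note in observations:
        if not layer.external:
            # An external review cannot cite evidence it had no way to observe.
            raise ValueError(f"{layer.label} is not visible to an external reviewer")
        grouped[layer].append(note)
    empty_layers = [layer.label for layer, notes in grouped.items() if not notes]
    return grouped, empty_layers
```

The second return value always includes the platform-only layer, which gives a review a ready list of layers to name as absent or unavailable rather than leaving readers to infer it.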

Confidence bands

The output should be a bounded concern level, not theater.

Insufficient basis

The available evidence does not support a fair-play conclusion.

Limited concern

Some indicators warrant caution, but the case is weak or incomplete.

Material concern

Multiple indicators align and justify deeper review or formal escalation.

High concern

The evidence is strong and coherent, though still bounded by what an external reviewer can actually observe.
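Because the bands are ordered, they can be modeled as an ordered type rather than free-text labels. The sketch below is illustrative: `Band` and `justifies_escalation` are hypothetical names, and the only rule it encodes is the one the descriptions state, that "Material concern" is the first band that justifies deeper review or formal escalation.

```python
from enum import IntEnum


class Band(IntEnum):
    """Confidence bands from the list above, ordered from weakest to strongest."""

    INSUFFICIENT_BASIS = 0
    LIMITED_CONCERN = 1
    MATERIAL_CONCERN = 2
    HIGH_CONCERN = 3

    @property
    def label(self) -> str:
        # Render the page's wording, e.g. LIMITED_CONCERN -> "Limited concern".
        return self.name.replace("_", " ").capitalize()


def justifies_escalation(band: Band) -> bool:
    """True from "Material concern" upward, the first band described as justifying escalation."""
    return band >= Band.MATERIAL_CONCERN
```

Comparing bands numerically avoids ad hoc string checks, while the `label` property keeps the displayed wording identical to the band names on this page.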

Limits of external review

This is the trust-preserving part. The methodology should be clear about what it cannot settle.

External review usually cannot see internal platform telemetry, device fingerprinting, anti-abuse heuristics, or account-link data.

Public PGNs can suggest concern patterns, but they do not reliably prove innocence or guilt by themselves.

One brilliant result, one upset, or one high-accuracy game is rarely enough to support a cheating claim.

Any external conclusion should stay narrower than what a platform with internal evidence may be able to determine.

Benchmark walkthroughs

These examples are meant to train judgment. Compare the shape of the case, not just the headline impression.

Benchmark case: Insufficient basis

Single brilliant game

A player produces one very high-accuracy win against a stronger opponent.

Available evidence

One public game with unusually strong tactical accuracy.
A result that feels surprising given the rating gap.

Unavailable evidence

Broader game sample showing whether the pattern repeats.
Any platform-only timing or device context beyond the visible game record.

Alternative explanations considered

A well-prepared opening line or a genuinely sharp tactical day.
Noise from a tiny sample rather than a real pattern.

Assessment

The point is not to sound certain. The point is to show why the conclusion lands where it does.

Confidence band

Insufficient basis

Why it lands there

One striking result can justify curiosity, but not a cheating conclusion. The sample is too small and too easy to overread.

What would change the conclusion

The case would strengthen only if the same signals repeated across multiple games or events.

Reusable case template

Future benchmark cases should follow one schema so readers can audit the reasoning instead of inferring it.

Claim and scope

State exactly what is being assessed: one game, a match, a tournament stretch, or a broader account pattern.

Available data

List the evidence that was available and explicitly note what an external reviewer could not see.

Observed indicators

Group the actual observations by category such as move quality, timing, repetition, or environment context.

Alternative explanations

Show what innocent explanations were considered and why they were or were not persuasive.

Conclusion and confidence

Keep the final conclusion narrow and pair it with a confidence band rather than a dramatic binary label.
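The five-part template can be expressed as a record type so that empty sections are caught before a case is published. This sketch uses a Python dataclass with illustrative field names (splitting "Available data" into available and unavailable lists is an assumption), and fills it in with the single brilliant game walkthrough as an example.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkCase:
    """A benchmark case following the five-part template (field names are illustrative)."""

    claim_and_scope: str          # one game, a match, a tournament stretch, or an account pattern
    available_data: list[str]
    unavailable_data: list[str]   # what an external reviewer could not see
    observed_indicators: dict[str, list[str]]  # category -> observations
    alternative_explanations: list[str]
    conclusion: str
    confidence_band: str          # one of the four confidence bands

    def missing_sections(self) -> list[str]:
        """Return the names of fields left empty, so gaps show up before publication."""
        return [name for name, value in vars(self).items() if not value]


single_game = BenchmarkCase(
    claim_and_scope="One game: a very high-accuracy win against a stronger opponent",
    available_data=[
        "One public game with unusually strong tactical accuracy",
        "A result that feels surprising given the rating gap",
    ],
    unavailable_data=[
        "Broader game sample showing whether the pattern repeats",
        "Platform-only timing or device context",
    ],
    observed_indicators={"move quality": ["Unusually strong tactical accuracy"]},
    alternative_explanations=[
        "A well-prepared opening line or a genuinely sharp tactical day",
        "Noise from a tiny sample rather than a real pattern",
    ],
    conclusion="Justifies curiosity, but not a cheating conclusion",
    confidence_band="Insufficient basis",
)

print(single_game.missing_sections())  # []
```

A draft that leaves out, say, the alternative explanations would report that field as missing, which puts the audit step the template asks for ahead of publication instead of after it.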

Publication rules

Benchmark publication should build trust without turning the site into an accusation archive.

Remove names, usernames, event identifiers, and other traceable details.
Publish only the evidence needed to illustrate the reasoning, not a dossier of every available fact.
Frame each case as a methodology example, not a public accusation.
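For cases that include public PGNs, the first publication rule can be partly automated by blanking traceable header tags. This is a sketch under stated assumptions: which tags count as traceable (`TRACEABLE_TAGS`) is a hypothetical choice, and the placeholders follow the PGN convention of "?" for an unknown value and "????.??.??" for an unknown date.

```python
import re

# HYPOTHETICAL redaction list: the header tags this sketch treats as traceable.
TRACEABLE_TAGS = {
    "Event", "Site", "Date", "Round", "White", "Black",
    "EventDate", "UTCDate", "UTCTime", "WhiteFideId", "BlackFideId", "Annotator",
}

# Matches one PGN tag pair on its own line, e.g. [White "SomeUser"].
TAG_PAIR = re.compile(r'^\[(\w+)\s+"((?:[^"\\]|\\.)*)"\]\s*$')


def redact_pgn_headers(pgn: str) -> str:
    """Replace traceable tag values with PGN 'unknown' placeholders; leave movetext untouched."""
    out = []
    for line in pgn.splitlines():
        match = TAG_PAIR.match(line)
        if match and match.group(1) in TRACEABLE_TAGS:
            tag = match.group(1)
            # PGN uses "????.??.??" for an unknown date and "?" for other unknown values.
            placeholder = "????.??.??" if tag.endswith("Date") else "?"
            line = f'[{tag} "{placeholder}"]'
        out.append(line)
    return "\n".join(out)
```

This only touches header tag pairs. Names or links can also appear in movetext comments or in tags not on the list, so a manual pass is still needed before anything is published.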