A practical checklist for evaluating “quantum advantage” claims

A no-hype framework: what to ask, what numbers matter, and how to spot apples-to-oranges comparisons.

Tags: evaluation, claims, benchmarks

“Quantum advantage” is overloaded. Sometimes it means “we ran a circuit,” sometimes it means “we beat the best classical method.” If you want to evaluate a claim quickly (and fairly), use this checklist.

1) What is the problem, precisely?

  • Is it a real-world task (chemistry, optimization, simulation) or a synthetic benchmark?
  • Is the problem statement fully specified (input distribution, accuracy target, success probability)?
  • Is there a clear metric: runtime, cost, energy, accuracy, or a combination?

If the task is vague, you’re not evaluating performance—you’re evaluating storytelling.
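
One way to keep yourself honest is to write the claim down as a structured record before reading further. A minimal Python sketch, not tied to any particular paper; the field names and every value below are my own placeholders:

```python
from dataclasses import dataclass

@dataclass
class ProblemSpec:
    """Minimal record of what an 'advantage' claim is actually about."""
    task: str                   # e.g. "ground-state energy of a model Hamiltonian"
    instance: str               # which input(s), or how instances are drawn
    metric: str                 # runtime, cost, energy, accuracy, or a combination
    accuracy_target: float      # what counts as "solved"
    success_probability: float  # required probability of hitting the target

# Illustrative placeholders only.
spec = ProblemSpec(
    task="estimate the ground-state energy of a model Hamiltonian",
    instance="one fixed 20-site instance",
    metric="wall-clock time to reach the accuracy target",
    accuracy_target=1e-3,
    success_probability=0.99,
)
```

If any of these fields is hard to fill in from the paper or post, that is usually the first sign of trouble.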

2) What is being compared?

Ask for the exact comparison class:

  • Quantum device vs classical hardware (which CPU/GPU? which cluster?)
  • Quantum algorithm vs best-known classical algorithm (or a baseline?)
  • End-to-end runtime vs oracle query count vs circuit depth (these are not interchangeable)

If the classical baseline is “naive,” the result may still be interesting—but it’s not “advantage” in the common sense.

3) What resources are counted on each side?

For quantum, the common “hidden costs” are:

  • Shots (samples) required to estimate an answer
  • Compilation overhead (extra depth / SWAPs)
  • Error mitigation (extra circuits, rescaling, post-processing)

For classical, look for:

  • Hardware specs (and whether it’s CPU vs GPU vs cluster)
  • Parallelism assumptions
  • Precision / accuracy target
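
A back-of-the-envelope way to see whether these hidden costs matter is to write the end-to-end accounting out explicitly. A sketch with made-up numbers (every value below is an assumption, not a measurement):

```python
# Quantum side: end-to-end wall clock, not just circuit depth.
shots = 100_000                # samples needed to hit the accuracy target
compiled_depth = 800           # layers after routing / SWAP insertion
layer_time_s = 1e-6            # assumed time per layer on the device
readout_reset_s = 5e-4         # per-shot overhead for reset and readout
mitigation_factor = 10         # extra circuits for error mitigation

time_per_shot = compiled_depth * layer_time_s + readout_reset_s
quantum_wall_clock_s = shots * time_per_shot * mitigation_factor

# Classical side: be explicit about hardware and parallelism.
classical_core_hours = 50      # reported cost of the baseline
cores_used = 1_000             # cluster assumed in the comparison
classical_wall_clock_s = classical_core_hours * 3600 / cores_used

print(f"quantum:   {quantum_wall_clock_s:>10,.0f} s")
print(f"classical: {classical_wall_clock_s:>10,.0f} s")
```

The point is not the specific numbers but that the comparison changes dramatically depending on which of these terms each side is allowed to omit.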

4) Does the classical side get to improve?

The strongest results compare against state-of-the-art classical methods as of the time of publication. If the classical method is outdated, ask: “what happens if we update the baseline?”

5) Is the quantum result about scaling or a single point?

One impressive datapoint is not the same as a scalable advantage.

Ask:

  • What happens as the problem size grows?
  • Does noise grow with size in a way that breaks the method?
  • Is the claimed runtime dominated by a part that scales poorly (shots, mitigation, routing)?
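
If the paper reports cost at several sizes, even a crude fit can separate polynomial from exponential growth. A sketch with invented data points (both the sizes and the costs below are made up for illustration):

```python
import numpy as np

# Invented problem sizes and end-to-end costs (seconds).
n = np.array([10, 15, 20, 25, 30])
cost = np.array([2.0, 9.0, 40.0, 180.0, 800.0])

# Polynomial growth is a straight line on log-log axes;
# exponential growth is a straight line on log-linear axes.
poly_slope, _ = np.polyfit(np.log(n), np.log(cost), 1)
exp_rate, _ = np.polyfit(n, np.log(cost), 1)

print(f"if polynomial:  cost ~ n^{poly_slope:.1f}")
print(f"if exponential: cost ~ exp({exp_rate:.2f} * n)")
```

A handful of points will not settle the asymptotics, but it forces the question of which trend the data actually supports.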

6) What’s the success criterion and confidence?

Quantum experiments are statistical.

  • Is success defined as “beats a threshold” or “matches a target distribution”?
  • What is the confidence interval?
  • How sensitive is the result to calibration drift?
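
When success is estimated from a finite number of shots, it is worth checking whether the reported margin survives a standard interval. A sketch using the Wilson score interval, with placeholder counts:

```python
import math

# Placeholder numbers: successes out of shots, and the threshold to beat.
shots = 10_000
successes = 5_230
threshold = 0.50
z = 1.96  # ~95% confidence

# Wilson score interval for the underlying success probability.
p_hat = successes / shots
denom = 1 + z**2 / shots
center = (p_hat + z**2 / (2 * shots)) / denom
half_width = (z / denom) * math.sqrt(
    p_hat * (1 - p_hat) / shots + z**2 / (4 * shots**2)
)

low, high = center - half_width, center + half_width
print(f"estimate {p_hat:.4f}, 95% CI [{low:.4f}, {high:.4f}]")
print("clears the threshold" if low > threshold else "interval touches the threshold")
```

Calibration drift is harder to fold into a one-liner; at minimum, ask whether the interval was computed within a single calibration run or pooled across several.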

7) Is the “advantage” in the thing you care about?

Sometimes the claim is true but narrow:

  • “We beat a classical sampler on this distribution” (interesting, and possibly true)
  • is a different claim from “we solved a practical optimization problem faster” (which is usually what people actually care about)

Translate the headline into a single sentence you’d be willing to defend:

“Using device (X), we solved task (Y) to accuracy (A) at cost (C), compared to the best known classical approach (Z) at cost (C’).”

If you can’t write that sentence from the paper or blog post, the headline is doing more work than the evidence supports.

A simple “scorecard” (quick gut check)

  • Clear problem + metric: yes / no
  • Strong classical baseline: yes / no
  • End-to-end cost included: yes / no
  • Scaling evidence: yes / no
  • Robust statistics: yes / no

Even a result with a few “no”s can still be valuable research, but it’s not a reason to believe we’re suddenly “done” with the hard parts.
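
If you want the gut check in executable form, a trivial sketch (the questions mirror the checklist above; the tally-out-of-five framing is my own choice, not a standard):

```python
# Fill in honestly for the claim you are evaluating.
scorecard = {
    "clear problem + metric":    True,
    "strong classical baseline": False,
    "end-to-end cost included":  True,
    "scaling evidence":          False,
    "robust statistics":         True,
}

print(f"{sum(scorecard.values())}/5 boxes checked")
for item, ok in scorecard.items():
    print(f"  [{'x' if ok else ' '}] {item}")
```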