Engineering · Recall

100% recall without ANN. And where HNSW still wins.


Recall · Benchmarks · HNSW

"100% recall" is a marketing phrase, and a reader who sees it is entitled to ask what, specifically, it means. Here is the version we stand behind, followed by the cases where you should pick ANN instead.

What we mean by recall

Recall@k over a query workload is the fraction of ground-truth nearest neighbors (computed by brute-force on the full dataset) that appear in the top-k of your index for that query. When we say exact recall, we mean recall@k = 1.0 for every query in a fixed test set against brute-force on the same dataset.
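Concretely, the definition above can be measured in a few lines of NumPy. This is an illustrative sketch, not our evaluation harness: the function name `recall_at_k` and the brute-force L2 ground truth are assumptions for the example.

```python
import numpy as np

def recall_at_k(index_topk, data, queries, k):
    """Recall@k of an index's results against brute-force ground truth.

    index_topk: (n_queries, k) array of ids returned by the index under test.
    data:       (n, d) corpus embeddings; queries: (n_queries, d).
    """
    # Brute-force ground truth: exact top-k by L2 distance over the full corpus.
    dists = np.linalg.norm(data[None, :, :] - queries[:, None, :], axis=-1)
    truth = np.argsort(dists, axis=1)[:, :k]
    # Fraction of true neighbors that the index actually returned.
    hits = [len(set(t) & set(r)) for t, r in zip(truth, index_topk)]
    return sum(hits) / (k * len(queries))
```

"Exact recall" in the sense above is then just the statement that this function returns 1.0 for every query in the published test set.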

We publish the test set, the dimensionality (D=384), and the workload distribution along with every receipt. The 100M H100 receipt verifies exact recall@10 across a million-query evaluation set on the QTT-Native fp32 path.

Why exact recall matters for some workloads

  • Regulated lookups. Healthcare diagnostic similarity, pharma literature search, and legal e-discovery have a hard floor on "close-enough-is-not-enough." Missing the right neighbor 2% of the time is a failure mode you cannot just audit away later.
  • Deduplication and entity resolution. If you're deduplicating records against an embedding, a 98%-recall system will merge some records and miss others in ways that depend on the ANN index's random seed. An exact system collapses that variance to zero.
  • Retrieval-for-training. If you're mining hard negatives or nearest documents to train a downstream model, ANN bias shows up as a systematic distribution shift in the training set.

Where ANN is the right answer

If your retrieval is a first-stage filter before a re-ranker, ANN at 95% recall is usually fine. The re-ranker eats the error budget. If your query rate is in the 5–10k QPS range against a small index and you care about median latency more than anything, HNSW will beat us on p50. Use the right tool.
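The "re-ranker eats the error budget" pattern can be sketched as follows. Everything here is a toy: `two_stage_search` is a hypothetical name, the exact L2 re-scoring stands in for whatever re-ranker you actually run, and the first stage is any callable (HNSW, IVF, whatever) that may miss neighbors.

```python
import numpy as np

def two_stage_search(data, query, first_stage, k=10, pool=100):
    """First-stage filter (possibly approximate) + exact re-rank of the pool.

    first_stage(query, pool) -> candidate ids; it may miss some true
    neighbors. The final top-k is correct as long as the true top-k
    landed *somewhere* in the candidate pool.
    """
    cand = np.asarray(first_stage(query, pool))
    exact = np.linalg.norm(data[cand] - query, axis=1)  # exact re-scoring
    return cand[np.argsort(exact)[:k]]
```

The pool size is the knob: a first stage with imperfect recall@100 often still yields perfect recall@10 after exact re-ranking, which is exactly why ANN is fine in this role and less fine when there is no second stage.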

The decision we actually hand customers

We ask four questions during scoping, and the answers tell both of us whether HX-SDP or a well-tuned HNSW is the better fit:

  • How large is the corpus? Below ~10M, HNSW is extremely well engineered and the memory story is less relevant. At 100M+ on a single node, the compression story starts to dominate.
  • Is a re-ranker downstream? If yes, ANN recall errors are usually absorbed. If no, exact recall is materially more useful.
  • What does a recall miss cost? For ML platform workloads, often not much. For clinical or legal workloads, often a lot.
  • What's the operational model? If the team already runs Postgres and Redis, standing up pgvector or a managed vector service is cheap. Adopting HX-SDP makes sense when the memory footprint or cross-service drift is already hurting.

What the fp16 tier changes

The fp16 Scale tier trades exact recall for capacity: up to 800M entries on a single H100 node, with recall rising from 96% at 400M to 99% in the 600M–800M range. The full curve is on /benchmarks, plotted as an SVG on the same page rather than buried in prose. If you need scale more than you need exactness (bulk retrieval for training, coarse recall at very large N, economic pressure on VRAM), this is the tier.

What we are not claiming

We are not claiming exact recall at arbitrary scale, in arbitrary dimension, on arbitrary hardware, with arbitrary distributions. We publish the cases we've run, the hardware they ran on, and the signed manifest that proves the run happened the way we said it did. If your workload is outside those parameters, the next step is a scoped pilot, not a press release.