Benchmarks
1 billion vectors. Single GPU. 100% exact recall.
Measured on an H100 80GB. No approximation. Every number cryptographically signed.
- 1,000,000,000 vectors on a single GPU
- 38.5ms p50 latency
- 100% Recall@10
- 73 GB VRAM
- 46.5x reduction
Scale ceiling
fp16 | Scale Tier
QTT compression converges as the dataset grows. More data, better recall.
| Scale | Build | Serving | VRAM | p50 | p99 | R@10 |
|---|---|---|---|---|---|---|
| 400M | 485s | 26.4 GB | 29.7 GB | 15.8ms | 18.0ms | 96% |
| 500M | 699s | 33.0 GB | 36.9 GB | 20.5ms | 24.4ms | 98% |
| 600M | 813s | 39.6 GB | 44.1 GB | 23.4ms | 26.3ms | 99% |
| 800M | 1,122s | 52.8 GB | 58.5 GB | 31.0ms | 33.5ms | 99% |
| 1B | 1,418s | 66.0 GB | 72.9 GB | 38.5ms | 41.0ms | 100% |
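Read off the table, p50 tracks the vector count almost linearly. A quick check in Python with the table values hard-coded (nothing below is re-measured):

```python
# Hard-coded values from the fp16 table above; nothing here is measured.
scales = [400e6, 500e6, 600e6, 800e6, 1e9]
p50_ms = [15.8, 20.5, 23.4, 31.0, 38.5]

# If latency scales linearly with the vector count, ns-per-vector stays flat.
for n, ms in zip(scales, p50_ms):
    print(f"{n/1e6:6.0f}M  p50 = {ms:5.1f} ms  ->  {ms * 1e6 / n:.4f} ns/vector")
```

The per-vector figure stays in a narrow band of roughly 0.038–0.041 ns across 400M–1B, which is the linear-scaling behaviour the comparison section below refers to.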
Why recall improves
More vectors provide richer structure for the tensor train to exploit. At 1B entries the compression reaches full convergence — a mathematical property, not a tuning parameter.
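A minimal sketch of the tensor-train mechanics behind this claim, using plain TT-SVD in NumPy on a smooth toy dataset (the modes, ranks, and data are illustrative assumptions; this is not HX-SDP's QTT pipeline):

```python
# Toy TT-SVD sketch: reshape a dataset into a higher-order tensor and factor
# it into small tensor-train cores. Illustrative only; not the QTT pipeline
# used by HX-SDP, and the toy data below is deliberately very structured.
import numpy as np

def tt_svd(tensor, max_rank=64, tol=1e-8):
    """Factor a d-way tensor into a chain of 3-way TT cores via truncated SVDs."""
    cores, r = [], 1
    rest = np.asarray(tensor, dtype=np.float64)
    for nk in tensor.shape[:-1]:
        rest = rest.reshape(r * nk, -1)
        u, s, vt = np.linalg.svd(rest, full_matrices=False)
        keep = max(1, min(max_rank, int(np.sum(s > tol * s[0]))))
        cores.append(u[:, :keep].reshape(r, nk, keep))
        rest = s[:keep, None] * vt[:keep]
        r = keep
    cores.append(rest.reshape(r, tensor.shape[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract the TT cores back into the full tensor (for an error check)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape([c.shape[1] for c in cores])

# Toy "dataset": 4096 vectors of dim 256, generated from a smooth function so
# it carries exploitable low-rank structure (real embeddings are messier).
n, d = 4096, 256
i = np.arange(n)[:, None] / n
j = np.arange(d)[None, :] / d
X = np.sin(3.0 * i + 5.0 * j)

T = X.reshape(16, 16, 16, 16, 16)          # n = 16*16*16, d = 16*16
cores = tt_svd(T)
tt_params = sum(c.size for c in cores)
err = np.linalg.norm(tt_to_full(cores) - T) / np.linalg.norm(T)

print(f"raw values : {T.size:,}")
print(f"TT values  : {tt_params:,} ({T.size / tt_params:.0f}x smaller)")
print(f"rel. error : {err:.1e}")
```

On i.i.d. random data the same truncation would discard most of the signal; the argument above is that real embedding collections carry enough shared structure for the cores to stay exact, and that this structure becomes more complete as the collection grows.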
Production
fp32 | 100% recall at every tested scale
| Scale | Build | Serving | VRAM | p50 | p99 | R@10 |
|---|---|---|---|---|---|---|
| 200M | 354s | 25.8 GB | 28.1 GB | 34.9ms | 39.7ms | 100% |
| 300M | 347s | 39.6 GB | 44.2 GB | 46.3ms | 47.5ms | 100% |
| 400M | 495s | 52.8 GB | 58.6 GB | 61.5ms | 62.6ms | 100% |
| 500M | 622s | 66.0 GB | 73.0 GB | 76.5ms | 77.7ms | 100% |
Exact
fp64 | Full double-precision arithmetic
| Scale | Build | p50 | p99 | R@10 | VRAM | B/entry |
|---|---|---|---|---|---|---|
| 1K | 0.0s | 0.70ms | 1.97ms | 100% | 0 MB | 420 |
| 100K | 0.2s | 0.86ms | 1.45ms | 100% | 26 MB | 278 |
| 1M | 2.1s | 1.18ms | 5.10ms | 100% | 264 MB | 277 |
| 10M | 24.2s | 2.68ms | 4.66ms | 100% | 2.6 GB | 277 |
| 50M | 164s | 9.37ms | 10.88ms | 100% | 12.9 GB | 277 |
| 100M | 104s | 20.40ms | 21.50ms | 100% | 25.8 GB | 277 |
| 200M | 212s | 40.40ms | 41.80ms | 100% | 50.4 GB | 271 |
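The B/entry column can be re-derived from the VRAM and Scale columns if the VRAM figures are read as GiB (and MiB for the MB-denominated rows); the unit is an interpretation, not something the table states:

```python
# Re-derive B/entry as VRAM / scale, assuming binary units (GiB / MiB).
rows = [  # (vectors, vram_bytes, reported B/entry)
    (1_000_000,    264 * 2**20, 277),
    (10_000_000,   2.6 * 2**30, 277),
    (50_000_000,  12.9 * 2**30, 277),
    (100_000_000, 25.8 * 2**30, 277),
    (200_000_000, 50.4 * 2**30, 271),
]
for n, vram, reported in rows:
    print(f"{n:>12,}  derived = {vram / n:5.1f} B/entry  reported = {reported}")
```

The derived figures land on the reported column to within a couple of bytes, which supports the binary-unit reading.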
Comparison
vs HNSW at 10M vectors
HNSW baselines as shipped in Milvus / Qdrant / Weaviate. D=384.
[Comparison chart: HX-SDP (fp32) vs HNSW (fp32)]
HNSW is faster at p50 for approximate search. HX-SDP delivers exact results with 5–15x less memory. At 100M+ vectors the gap narrows as HNSW latency degrades while HX-SDP continues to scale linearly.
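As a rough check on the memory claim, a back-of-the-envelope footprint at 10M vectors, D=384, using a common hnswlib-style sizing rule for the HNSW side; M=16 and the GiB reading of the 10M row above are assumptions:

```python
# Rough memory comparison at 10M vectors, D=384. The HNSW estimate uses a
# common hnswlib-style rule of thumb (fp32 vector + level-0 links); M=16 is
# an assumed graph parameter, and the HX-SDP figure is the 10M row from the
# fp64 table above read as GiB.
N, D, M = 10_000_000, 384, 16
hnsw_bytes = N * (D * 4 + M * 2 * 4)     # ~16.6 GB
hx_sdp_bytes = 2.6 * 2**30               # ~2.8 GB (10M row above)

print(f"HNSW estimate : {hnsw_bytes / 1e9:5.1f} GB")
print(f"HX-SDP at 10M : {hx_sdp_bytes / 1e9:5.1f} GB")
print(f"ratio         : {hnsw_bytes / hx_sdp_bytes:.1f}x")
```

That lands at roughly 6x, inside the 5–15x range quoted above; the exact ratio depends on the HNSW graph parameters and on which HX-SDP precision tier is used.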
Durability
24-hour soak test
Continuous mixed workload. S-10M, H100 80GB.
- 8.6M inserts
- 2.9M queries
- 99.99992% success rate (9 errors)
- 86.3s cold restart (mean of 5)
VRAM drifted from 4.75 GB to 5.03 GB over 24 hours. Bounded, not zero. These are the real numbers.
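A sketch of how a mixed soak workload like this can be driven; the client API (`hxsdp.Client`, `insert`, `search`) is hypothetical, and the insert/query mix is inferred from the counts above:

```python
# Hypothetical soak-test driver. The hxsdp client API used here is assumed
# for illustration only; it is not a published HX-SDP interface.
import random
import time

import numpy as np
# import hxsdp                                  # hypothetical client library

D = 384                                         # assumed vector dimension
DURATION_S = 24 * 3600
INSERT_FRACTION = 8.6 / (8.6 + 2.9)             # mix inferred from the counts above

def run_soak(client, seed=0):
    rng = random.Random(seed)
    ok = errors = 0
    deadline = time.monotonic() + DURATION_S
    while time.monotonic() < deadline:
        vec = np.random.standard_normal(D).astype(np.float32)
        try:
            if rng.random() < INSERT_FRACTION:
                client.insert(vec)              # hypothetical call
            else:
                client.search(vec, k=10)        # hypothetical call
            ok += 1
        except Exception:
            errors += 1
    total = ok + errors
    print(f"ops = {total:,}  errors = {errors}  success rate = {ok / total:.7%}")

# run_soak(hxsdp.Client("localhost:9000"))      # hypothetical endpoint
```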
Transparency
Honest tradeoffs
Economics
Flat cost. No per-query billing.
One GPU per client. The cost is the GPU. No namespace multipliers. No usage-based scaling. No per-query metering.
Every access pattern (search, analytics, inference, streaming) runs on the same representation in VRAM. One bill, one runtime.
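To make the flat-cost model concrete, a back-of-the-envelope effective cost per query; the hourly GPU rate is a placeholder assumption, not a quoted price:

```python
# Flat, per-GPU cost model: the monthly bill is fixed, so only the effective
# per-query cost changes with volume. The hourly rate is a placeholder.
GPU_HOURLY_USD = 2.50        # assumption; substitute your own H100 rate
HOURS_PER_MONTH = 730

monthly = GPU_HOURLY_USD * HOURS_PER_MONTH
for q in (1_000_000, 10_000_000, 100_000_000):
    print(f"{q/1e6:5.0f}M queries/month: ${monthly:,.0f} flat "
          f"(${monthly / q:.6f} effective per query)")
```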