
Accuracy claims in DeFi are usually marketing. We publish three distinct accuracy surfaces because they answer different questions and because conflating them is how bad allocation decisions get made.

The three surfaces

Test set

The accuracy on the held-out evaluation slice from training time: predictions on data the model has never seen. Cross-validated test-set accuracy lands in the high 60s for the 1-day directional model, with per-mechanism specialists scoring in the low 80s on their own protocol family.

Live, full universe

The accuracy across every pool we monitor, post-deployment, on forward-resolved data. This is the most conservative number and the right one for “how does the model actually behave in the wild”. Live full-universe sign-based accuracy currently sits in the low 50s for the 1-day model, lifted by a calibration retrain that landed in the last week.
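For concreteness, "sign-based accuracy" here means the fraction of predictions whose direction matches the realized 1-day move. A minimal sketch, assuming the model emits a predicted return and we compare its sign to the realized one (function name and inputs are illustrative, not Path's actual API):

```python
import numpy as np

def sign_accuracy(predicted_returns, realized_returns):
    """Fraction of predictions whose sign matches the realized 1-day move."""
    return float(np.mean(np.sign(predicted_returns) == np.sign(realized_returns)))

# Illustrative only: 3 of 4 signs agree
print(sign_accuracy(np.array([0.4, -0.2, 0.1, -0.3]),
                    np.array([0.3, -0.1, -0.2, -0.4])))  # → 0.75
```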

Live, carve-out cohort, high confidence

The accuracy on the subset of pools where Path has the deepest coverage and the highest signal density, filtered to predictions where the model assigned ≥85% confidence. This is the headline number we cite to institutional partners because it is the operational range a Strategy Manager would actually trade. Carve-out high-confidence sign-based accuracy: ~59.5%, vs. the DeFiLlama random-walk baseline of 53.5%. The lift is statistically significant at p ≈ 0.0001.
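A significance claim like this can be sanity-checked with a one-sided binomial test against the baseline. A minimal sketch under the normal approximation, with an illustrative sample size (Path's actual prediction count is not published on this page):

```python
import math

def one_sided_binomial_z(hits: int, n: int, p0: float):
    """Normal-approximation z-test: is observed accuracy hits/n above baseline p0?"""
    p_hat = hits / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # one-sided upper tail
    return p_hat, z, p_value

# Illustrative numbers, not Path's actual sample: 595 correct out of 1,000
# high-confidence calls vs. the 53.5% random-walk baseline.
p_hat, z, p = one_sided_binomial_z(595, 1000, 0.535)
print(f"accuracy={p_hat:.3f}  z={z:.2f}  p≈{p:.4f}")
```

With a sample on the order of a thousand predictions, a ~6 percentage-point lift over the baseline lands around p ≈ 0.0001; smaller samples would need a larger lift for the same significance.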

Why the gap between test and live

Test accuracy and live accuracy diverge for three reasons:
  1. Distribution shift. Live data drifts from training data over time. We mitigate with a three-day retrain cadence and a live accuracy gate that flags drift.
  2. Per-pool heterogeneity. Some pools have richer signal coverage than others. The carve-out cohort exists to surface the operational range where coverage and signal density are strongest.
  3. Confidence-band stratification. Calibrated models concentrate accuracy at the high-confidence end. Reporting the un-stratified mean obscures that. The high-confidence subset is the relevant one for an allocator who will only act on high-confidence calls.
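The stratification point can be sketched as a simple band-by-band accuracy table; band edges and data below are illustrative, not Path's actual bands:

```python
import numpy as np

def accuracy_by_confidence(conf, correct, bands=(0.5, 0.7, 0.85, 1.01)):
    """Accuracy within each confidence band. A calibrated model should show
    accuracy rising with confidence, which the unstratified mean hides."""
    out = {}
    for lo, hi in zip(bands[:-1], bands[1:]):
        mask = (conf >= lo) & (conf < hi)
        if mask.any():
            out[f"[{lo:.2f}, {hi:.2f})"] = float(correct[mask].mean())
    return out

# Illustrative data: six predictions with confidences and hit/miss outcomes
conf = np.array([0.55, 0.60, 0.75, 0.80, 0.90, 0.95])
correct = np.array([0, 1, 1, 0, 1, 1])
print(accuracy_by_confidence(conf, correct))
# → {'[0.50, 0.70)': 0.5, '[0.70, 0.85)': 0.5, '[0.85, 1.01)': 1.0}
```

The ≥85% band is the one an allocator acting only on high-confidence calls would experience.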

Aspirational ceiling

Published research on stablecoin 1-day directional classifiers tops out around 71–72%. That is the ceiling we are targeting, not 90%. Direction prediction on chaotic systems has a real ceiling and overpromising is how products lose institutional credibility.

Integrity layer

A continuous-loop verifier daemon re-computes every accuracy number shown on Path surfaces (admin dashboards, public docs, partner-facing exports) from the canonical SQL every 30 minutes. Drift greater than 0.5 percentage points pages our on-call and auto-files a ticket. This is the operational hygiene behind every number on this site: numbers that drift silently are how trust gets destroyed in a single bad demo.
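The comparison step of that gate reduces to a threshold check per surface. A minimal sketch, assuming accuracies are stored in percent; keys and numbers are illustrative, and the real daemon recomputes from canonical SQL rather than taking a dict:

```python
DRIFT_THRESHOLD_PP = 0.5  # percentage points

def check_drift(displayed: dict, recomputed: dict):
    """Return the surfaces whose displayed accuracy differs from the
    recomputed value by more than the drift threshold."""
    return {
        key: (displayed[key], recomputed.get(key, displayed[key]))
        for key in displayed
        if abs(displayed[key] - recomputed.get(key, displayed[key])) > DRIFT_THRESHOLD_PP
    }

# Illustrative: the carve-out number drifted 0.7pp -> would page on-call
drifted = check_drift(
    {"live_full_universe": 52.1, "carveout_high_conf": 59.5},
    {"live_full_universe": 52.0, "carveout_high_conf": 58.8},
)
print(drifted)  # → {'carveout_high_conf': (59.5, 58.8)}
```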