Accuracy claims in DeFi are usually marketing. We publish three distinct accuracy surfaces because they answer different questions and because conflating them is how bad allocation decisions get made.
The three surfaces
Test set
The accuracy on the held-out evaluation slice from training time:
predictions made on data the model has never seen. Cross-validated
test-set accuracy lands in the high 60s for the 1-day directional
model, with per-mechanism specialists scoring in the low 80s on
their own protocol family.
Live, full universe
The accuracy across every pool we monitor, post-deployment, on
forward-resolved data. This is the most conservative number and
the right one for “how does the model actually behave in the
wild”. Live full-universe sign-based accuracy currently sits in
the low 50s for the 1-day model, lifted by a calibration retrain
that landed in the last week.
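A live accuracy gate of the kind described later in this page can be sketched as a rolling window over forward-resolved calls. The window size and the 0.50 floor are illustrative assumptions, not Path's production settings:

```python
# Sketch of a live accuracy gate: flag drift when rolling sign-based
# accuracy over forward-resolved predictions falls below a floor.
from collections import deque

class AccuracyGate:
    def __init__(self, window: int = 200, floor: float = 0.50):
        self.hits = deque(maxlen=window)  # rolling record of hit/miss
        self.floor = floor

    def record(self, predicted_sign: int, realized_sign: int) -> None:
        self.hits.append(predicted_sign == realized_sign)

    def drifting(self) -> bool:
        """True when the rolling accuracy has dropped below the floor."""
        if not self.hits:
            return False
        return sum(self.hits) / len(self.hits) < self.floor

gate = AccuracyGate(window=4, floor=0.50)
for p, r in [(1, 1), (1, -1), (-1, 1), (1, -1)]:
    gate.record(p, r)
print(gate.drifting())  # 1 of 4 recent calls correct -> True
```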
Live, carve-out cohort, high confidence
The accuracy on the subset of pools where Path has the deepest
coverage and the highest signal density, filtered to predictions
where the model assigned ≥85% confidence. This is the headline
number we cite to institutional partners because it is the
operational range a Strategy Manager would actually trade.
Carve-out high-confidence sign-based accuracy: ~59.5%, vs. the
DeFiLlama random-walk baseline of 53.5%. The lift is statistically
significant at p ≈ 0.0001.
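The significance claim can be reproduced with a one-sided one-proportion z-test of the 59.5% accuracy against the 53.5% baseline. The sample size n below is a hypothetical stand-in; the actual evaluation count is not stated here:

```python
# Sketch: one-sided one-proportion z-test against a null accuracy p0.
# n = 1000 is an assumed sample size for illustration only.
import math

def one_proportion_z_test(p_hat: float, p0: float, n: int) -> tuple[float, float]:
    """Return (z, one-sided p-value) under the null accuracy p0."""
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return z, p_value

z, p = one_proportion_z_test(p_hat=0.595, p0=0.535, n=1000)
print(f"z = {z:.2f}, p = {p:.5f}")
```

With a sample on the order of a thousand forward-resolved calls, a 6-point lift over the baseline lands near the p ≈ 0.0001 cited above.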
Why the gap between test and live
Test accuracy and live accuracy diverge for three reasons:
- Distribution shift. Live data drifts from training data over time. We mitigate with a three-day retrain cadence and a live accuracy gate that flags drift.
- Per-pool heterogeneity. Some pools have richer signal coverage than others. The carve-out cohort exists to surface the operational range where coverage and signal density are strongest.
- Confidence-band stratification. Calibrated models concentrate accuracy at the high-confidence end. Reporting the un-stratified mean obscures that. The high-confidence subset is the relevant one for an allocator who will only act on high-confidence calls.
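The stratification point above can be sketched as a per-band accuracy report. The band edges mirror the ≥85% threshold in the text; the records are illustrative:

```python
# Sketch: stratify (confidence, hit) records into confidence bands and
# report accuracy per band. Band edges are illustrative assumptions.
from collections import defaultdict

def accuracy_by_band(records, bands=(0.5, 0.7, 0.85, 1.01)):
    """records: (confidence, hit) pairs. Returns {band_floor: accuracy}."""
    tallies = defaultdict(lambda: [0, 0])  # band_floor -> [hits, total]
    for conf, hit in records:
        for lo, hi in zip(bands, bands[1:]):
            if lo <= conf < hi:
                tallies[lo][0] += hit
                tallies[lo][1] += 1
                break
    return {lo: hits / total for lo, (hits, total) in tallies.items()}

records = [(0.55, 0), (0.60, 1), (0.72, 1), (0.78, 0),
           (0.86, 1), (0.90, 1), (0.92, 1), (0.95, 0)]
print(accuracy_by_band(records))  # high-confidence band (>=0.85) scores 3/4
```

Reporting the per-band breakdown, rather than the unstratified mean, is what lets an allocator see the operational range they would actually trade.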