Anticipatory models — the operational moat
real NY SPARCS + TX PUDF · isotonic-calibrated · no protected attribute used as a feature · temporal holdout
Post-acute placement
0.78 AUROC
56% above chance
SNF · rehab · custodial · NY + TX
fair-ranking · monitor
Prolonged LOS ≥10 days
0.75 AUROC
50% above chance
max subgroup gap 0.04 after payer calibration
fair · pass
In-hospital mortality
0.83 AUROC
66% above chance
strongest model · max gap 0.006 across payer and clinical subgroups
fair · pass
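The "above chance" percentages on the three cards match a rescaling of AUROC that maps random ranking (0.5) to 0% and perfect ranking (1.0) to 100%: 0.78 → 56%, 0.75 → 50%, 0.83 → 66%. Below is a minimal pure-Python sketch of that arithmetic, plus a rank-based AUROC (the probability that a randomly chosen positive outranks a randomly chosen negative, with ties counted as half). The function names are hypothetical. The pipeline's own scoring code is not shown here and most likely relies on a library routine.

```python
from itertools import product


def auroc(scores, labels):
    """Rank-based AUROC: P(score of a random positive > score of a random negative).

    Ties count as 0.5. O(n_pos * n_neg), fine for illustration, not for 150k rows.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def lift_above_chance(auc):
    """Rescale AUROC so 0.5 (chance) maps to 0.0 and 1.0 (perfect) maps to 1.0."""
    return (auc - 0.5) / 0.5


for name, auc in [("post-acute", 0.78), ("LOS >= 10d", 0.75), ("mortality", 0.83)]:
    print(f"{name}: {lift_above_chance(auc):.0%} above chance")
```

An AUROC below 0.5 yields a negative lift, which flags a model that ranks worse than random.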
Fairness — anticipatory model calibration gaps
group = audit axis · never a model input · Section 1557 safe
Target · Subgroup finding · Verdict
Post-acute placement · Medicare/elderly underprediction of −0.07 (feature-limited, not group-driven) · monitor
Prolonged LOS ≥10 d · max gap 0.04 after payer calibration; all groups pass the threshold · pass
In-hospital mortality · max gap 0.006 across all clinical and payer groups; cleanest model · pass
reproduce: scripts/anticipatory_calibrate_payer.py · threshold |gap| ≤ 0.05
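The gate rule stated above (|gap| ≤ 0.05) can be checked directly against the reported maximum gaps. This is a minimal sketch: the dictionary keys are hypothetical, the gap values are copied from the table, and the logic inside scripts/anticipatory_calibrate_payer.py itself is not reproduced. The sketch shows only that post-acute placement (|−0.07| > 0.05) is the target that misses the threshold. How the script turns a miss into the "monitor" verdict is not shown here.

```python
GAP_THRESHOLD = 0.05  # threshold stated on the reproduce line


def passes_gate(gap: float, threshold: float = GAP_THRESHOLD) -> bool:
    """A target passes when its largest signed subgroup gap is within +/- threshold."""
    return abs(gap) <= threshold


# Max subgroup gaps as reported in the table; key names are hypothetical.
reported_max_gaps = {
    "post_acute_placement": -0.07,   # negative = underprediction
    "prolonged_los_10d": 0.04,
    "in_hospital_mortality": 0.006,
}

results = {target: passes_gate(gap) for target, gap in reported_max_gaps.items()}
for target, ok in results.items():
    print(f"{target}: {'within threshold' if ok else 'exceeds threshold'}")
```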
Baseline LOS model — excess-LOS target (AUROC 0.63)
real NY SPARCS · de-identified
Median lead-time gained
0.0 d
day-0 flag vs historical first-touch
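Lead time here compares the day the model flags an encounter with the day the encounter was historically first touched. Below is a minimal sketch of the median, assuming both are recorded as whole days since admission. The tuple layout and function name are hypothetical. Because every flag fires at day 0, the lead time for a given encounter equals its historical first-touch day, so a 0.0 d median means the typical encounter was already touched on day 0.

```python
from statistics import median


def median_lead_time(encounters):
    """encounters: iterable of (flag_day, first_touch_day), days since admission.

    Lead time = first_touch_day - flag_day. Negative values (the flag fired after
    the historical first touch) are kept so the median is not biased upward.
    """
    leads = [first_touch - flag for flag, first_touch in encounters]
    if not leads:
        raise ValueError("no encounters")
    return median(leads)


print(median_lead_time([(0, 0), (0, 0), (0, 2)]))
```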
Action completion
0%
0 closed · 0 dismissed / 152
Flag precision · recall
50% · 83%
151 flagged / 90 truly excess
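The precision and recall tiles follow from three counts. The true-positive count is not displayed, but exactly one integer is consistent with both rounded percentages: 75 (75/151 ≈ 49.7% → 50%, 75/90 ≈ 83.3% → 83%). Below is a small sketch that recovers this count from the displayed figures. It is derived arithmetic, not a value taken from the model's logs.

```python
def precision_recall(true_positives: int, flagged: int, actual_positives: int):
    """Precision = TP / flagged; recall = TP / truly excess encounters."""
    return true_positives / flagged, true_positives / actual_positives


FLAGGED, TRULY_EXCESS = 151, 90  # from the tile above

# Every TP count that reproduces both displayed (rounded) percentages.
implied_tp = [
    tp for tp in range(TRULY_EXCESS + 1)
    if round(100 * tp / FLAGGED) == 50 and round(100 * tp / TRULY_EXCESS) == 83
]

p, r = precision_recall(implied_tp[0], FLAGGED, TRULY_EXCESS)
print(implied_tp, f"precision={p:.1%}", f"recall={r:.1%}")
```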
Calibration error (ECE)
0.039
real SPARCS sample · lower is better
Calibration — predicted vs observed · 221 encounters
0–20% · n=9
20–40% · n=113
40–60% · n=92
60–80% · n=7
[chart: per-bin predicted rate (teal) vs observed rate (grey); matched bars = calibrated]
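The ECE tile and the reliability chart measure the same thing. Encounters are bucketed by predicted probability, each bucket's mean prediction is compared with its observed excess-LOS rate, and the absolute differences are averaged, weighted by bucket size. Below is a minimal sketch of standard binned ECE. Using five equal-width 20% bins to match the chart is an assumption: the dashboard does not say which binning produced 0.039.

```python
def expected_calibration_error(probs, labels, n_bins=5):
    """Binned ECE = sum over bins of (n_bin / N) * |mean predicted - observed rate|.

    n_bins=5 gives 20%-wide bins, matching the reliability chart (assumed, not confirmed).
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal length")
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]  # [sum of preds, sum of labels, count]
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[b][0] += p
        bins[b][1] += y
        bins[b][2] += 1
    n = len(probs)
    ece = 0.0
    for pred_sum, label_sum, count in bins:
        if count:
            ece += (count / n) * abs(pred_sum / count - label_sum / count)
    return ece


print(expected_calibration_error([0.1, 0.1, 0.3, 0.3], [0, 0, 1, 0]))
```

Bins with few encounters (n=9 and n=7 here) get small weights, so a large gap in a sparse bin barely moves ECE. That is why the per-bin chart is worth reading alongside the single number.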
Model card
Version
GBM + isotonic — factory pipeline, deployed
Training
real NY SPARCS sample (de-identified, no PHI) · refit on all rows for serving
Features
21 canonical admission-time features (flowos/ml/features.FEATURE_ORDER)
Abstention rate
2% of encounters withheld · abstention is a designed output state, never a fake date
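The "GBM + isotonic" entry describes a two-stage model. The gradient-boosted model produces a ranking score, and isotonic regression then maps that score to a probability through a non-decreasing step function fitted on labelled outcomes. Below is a pure-Python sketch of the pool-adjacent-violators algorithm behind isotonic regression. The factory pipeline most likely uses a library implementation (for example scikit-learn's IsotonicRegression), but that is an assumption. This sketch also uses step-wise prediction where a library may interpolate.

```python
import bisect


def fit_isotonic(scores, labels):
    """Pool Adjacent Violators: fit a non-decreasing step function score -> P(outcome).

    Returns (upper_edges, values): block i covers scores up to upper_edges[i].
    """
    blocks = []  # each block: [sum of labels, count, highest score in block]
    for x, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, x])
        # Merge backwards while an earlier block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c, x_max = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][2] = x_max
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def predict_isotonic(upper_edges, values, score):
    """Value of the first block whose upper edge is >= score (clamped at both ends)."""
    i = bisect.bisect_left(upper_edges, score)
    return values[min(i, len(values) - 1)]


edges, vals = fit_isotonic([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
print(edges, vals, predict_isotonic(edges, vals, 0.25))
```

Because the mapping is monotone, isotonic calibration changes probability values but leaves the ranking of untied scores unchanged. Separating calibration from ranking is a common reason to pair it with a GBM.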
Is the model fair across patient groups?
real NY SPARCS subgroup audit · F5 model · temporal holdout (154,492 rows)
Fairness gate: FAILED (disclosed; not deployment-ready)
The model underpredicts risk for certain demographic groups, which produces more false negatives: patients who needed action but weren't flagged. The Black/African American calibration gap (0.072) is 3.6× the White gap (0.020), and removing payer or facility priors makes it worse, so the available administrative features don't explain it. Group identity is a post-hoc audit axis only, never a model input.
Black/African American · n=28,507 · cal gap 0.072
Multi-racial · n=2,212 · cal gap 0.047
Other · n=40,334 · cal gap 0.035
White · n=83,439 · cal gap 0.020
cal gap = |mean predicted − observed excess-LOS rate| · reproduce: scripts/run_phase18b_review_packet.py · docs/modeling/OPERATIONAL_TOPK_FAIRNESS_AUDIT.md
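The cal gap definition above translates into a group-by over held-out predictions. The group column is read only after scoring, to slice the results. Below is a minimal sketch that keeps the sign (mean predicted minus observed), so underprediction shows up as a negative value, as in the −0.07 Medicare/elderly finding. The absolute value gives the figure reported per group. The row layout is hypothetical. The sketch is not the code in scripts/run_phase18b_review_packet.py.

```python
from collections import defaultdict


def calibration_gaps(rows):
    """rows: iterable of (group, predicted_probability, observed_outcome in {0, 1}).

    Returns {group: mean predicted - observed rate}. Negative = underprediction.
    The group label is used only here, for slicing, never as a model input.
    """
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # [sum of preds, sum of outcomes, count]
    for group, p, y in rows:
        a = acc[group]
        a[0] += p
        a[1] += y
        a[2] += 1
    return {g: (ps - ys) / n for g, (ps, ys, n) in acc.items()}


gaps = calibration_gaps([("A", 0.2, 1), ("A", 0.2, 0), ("B", 0.5, 1), ("B", 0.5, 0)])
print({g: round(abs(v), 3) for g, v in gaps.items()})

# Ratio quoted in the text, from the reported per-group gaps.
print(f"{0.072 / 0.020:.1f}x")
```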
Generalizability — new hospital or state?
NY SPARCS + TX PUDF · excess-LOS validation
Same state, after local calibration
0.63 AUROC
best case — local history available
New hospital, day one
0.58 AUROC
unseen site, no local history yet
Moved across states (NY → TX)
0.52–0.56 AUROC
cold transfer — near chance, needs local calibration
✓ Proven in this demo
Loop runs end-to-end: predict → route → act → measure
Trained backbone + isotonic calibration
Abstention withholds low-signal cases
Measurement updates from operator actions
Leakage prevented by construction (day-0 inputs)
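"Abstention withholds low-signal cases" implies that the model's output type has an explicit "no prediction" state rather than a placeholder value. Below is a minimal sketch of that design as a tagged union. The low-signal rule (fewer than 18 of the 21 admission-time features available), the 0.5 flag threshold, and all names are hypothetical. The dashboard does not state the actual abstention criterion.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Union

N_FEATURES = 21        # from the model card
MIN_AVAILABLE = 18     # hypothetical low-signal cutoff
FLAG_THRESHOLD = 0.5   # hypothetical decision threshold


@dataclass(frozen=True)
class Flag:
    probability: float
    flagged: bool


@dataclass(frozen=True)
class Abstain:
    reason: str


def score_or_abstain(features: Mapping[str, Optional[float]],
                     probability: float) -> Union[Flag, Abstain]:
    """Return an explicit Abstain instead of inventing a value for low-signal cases."""
    available = sum(v is not None for v in features.values())
    if available < MIN_AVAILABLE:
        return Abstain(reason=f"only {available} of {N_FEATURES} features available")
    return Flag(probability=probability, flagged=probability >= FLAG_THRESHOLD)


full = {f"f{i}": 1.0 for i in range(N_FEATURES)}
sparse = {f"f{i}": (None if i < 5 else 1.0) for i in range(N_FEATURES)}
print(score_or_abstain(full, 0.7), score_or_abstain(sparse, 0.7))
```

Downstream code must handle both variants explicitly, which keeps abstained encounters out of precision, recall, and lead-time counts unless they are deliberately included.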
● Not proven
Cross-validated real accuracy at scale
Conformal calibration intervals
Causal bed-day savings (needs stepped-wedge trial)
Economics / ROI
Fair operation — audit done; gate FAILS
Metric honesty
Encounters · 221
Flagged / truly excess · 151 / 90
Median resolution day · 0
Observed excess bed-days · 735.0
Precision and recall here compare the real-data model's flags against the real excess-LOS outcome on the current census, which is a small sample; see /technical-readiness and the model cards for cross-validated metrics at scale. Observed excess bed-days are OBSERVED, not causally saved: demonstrating a causal bed-day reduction requires a prospective stepped-wedge trial. Cross-state figures come from NY SPARCS + TX PUDF · no EHR or comorbidity inputs · no prospective claim