FlowOS · Proof

Anticipatory models — the operational moat

real NY SPARCS + TX PUDF · isotonic calibrated · no protected attribute as feature · temporal holdout

Post-acute placement

0.78 AUROC

56% above chance

SNF · rehab · custodial · NY + TX

fair-ranking · monitor

Prolonged LOS ≥10 days

0.75 AUROC

50% above chance

max subgroup gap 0.04 after payer calibration

fair · pass

In-hospital mortality

0.83 AUROC

66% above chance

strongest model · max gap 0.006 across payer and clinical subgroups

fair · pass
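The "% above chance" figures on the three cards match a simple rescaling of AUROC, where 0.5 is chance and 1.0 is perfect ranking. The formula below is an assumption inferred from the numbers on this page, not a definition the page states, and the function name is hypothetical.

```python
def pct_above_chance(auroc: float) -> float:
    """Share of the distance from chance (0.5) to perfect (1.0) that the model covers, in percent."""
    return (auroc - 0.5) / 0.5 * 100


# AUROC values as shown on the cards above.
cards = {
    "Post-acute placement": 0.78,
    "Prolonged LOS >=10 days": 0.75,
    "In-hospital mortality": 0.83,
}

for name, auroc in cards.items():
    print(f"{name}: {pct_above_chance(auroc):.0f}% above chance")
```

Under this reading, an AUROC of 0.5 maps to 0% and an AUROC of 1.0 maps to 100%.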

Fairness: anticipatory model calibration gaps

group = audit axis · never a model input · Section 1557 safe

Target | Subgroup finding | Verdict
Post-acute placement | Medicare/elderly underprediction −0.07 · feature-limited, not group-driven | monitor
Prolonged LOS ≥10d | Max gap 0.04 after payer calibration · all groups pass threshold | pass
In-hospital mortality | Max gap 0.006 across all clinical and payer groups · cleanest model | pass

reproduce: scripts/anticipatory_calibrate_payer.py · threshold |gap| ≤ 0.05
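The verdict column can be read as the |gap| ≤ 0.05 threshold applied to each model's worst subgroup gap. The sketch below is a minimal illustration of that rule, not the contents of scripts/anticipatory_calibrate_payer.py. The function name is hypothetical, and mapping a breach to "monitor" is an assumption read off the table.

```python
THRESHOLD = 0.05  # from the page: threshold |gap| <= 0.05


def verdict(gap: float, threshold: float = THRESHOLD) -> str:
    """Return 'pass' when the absolute calibration gap is within the threshold."""
    # Assumption: a breach is labelled 'monitor', as in the table on this page.
    return "pass" if abs(gap) <= threshold else "monitor"


# Worst subgroup gaps as reported in the fairness table.
worst_gaps = {
    "Post-acute placement": -0.07,
    "Prolonged LOS >=10d": 0.04,
    "In-hospital mortality": 0.006,
}

for target, gap in worst_gaps.items():
    print(f"{target}: gap {gap:+.3f} -> {verdict(gap)}")
```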

Baseline LOS model — excess-LOS target (AUROC 0.63)

real NY SPARCS · de-identified

Median lead-time gained

0.0d

day-0 flag vs historical first-touch

Action completion

0 closed · 0 dismissed / 152

Flag precision · recall

50% · 83%

151 flagged / 90 truly excess

Calibration error (ECE)

0.039

real SPARCS sample · lower is better

Calibration: predicted vs observed · 221 encounters

Bin | n | Predicted | Observed
0–20% | 9 | 15% | 11%
20–40% | 113 | 33% | 31%
40–60% | 92 | 47% | 53%
60–80% | 7 | 68% | 71%

teal = predicted · grey = observed · matched bars = calibrated
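The calibration error tile can be roughly cross-checked from the four calibration bins above. The sketch assumes the standard binned definition of ECE (bin-size-weighted mean of |predicted − observed|). It uses the rounded percentages shown on the page, so it lands near the reported 0.039 rather than exactly on it.

```python
# (n, mean predicted, observed rate) per bin, as displayed on this page.
bins = [
    (9, 0.15, 0.11),
    (113, 0.33, 0.31),
    (92, 0.47, 0.53),
    (7, 0.68, 0.71),
]


def expected_calibration_error(bins):
    """Bin-size-weighted mean absolute gap between predicted and observed rates."""
    total = sum(n for n, _, _ in bins)
    return sum(n * abs(pred - obs) for n, pred, obs in bins) / total


total_encounters = sum(n for n, _, _ in bins)
ece = expected_calibration_error(bins)
print(total_encounters, round(ece, 3))
```

The small difference from the tile is what rounding the displayed percentages would produce. The page does not state how its ECE was binned, so this is a consistency check, not a reproduction.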

Model card

Version

GBM + isotonic — factory pipeline, deployed

Training

real NY SPARCS sample (de-identified, no PHI) · refit on all rows for serving

Features

21 canonical admission-time features (flowos/ml/features.FEATURE_ORDER)

Abstention rate

2% of encounters withheld — a designed state, never a fake date

Is the model fair across patient groups?

real NY SPARCS subgroup audit · F5 model · temporal holdout (154,492 rows)

Fairness gate: FAILED · disclosed, not deployment-ready

The model underpredicts risk for certain demographic groups, which means more false negatives: patients who needed action but were not flagged. The calibration gap for Black/African American patients (0.072) is 3.6× the gap for White patients (0.02), and removing payer or facility priors widens it, so the available administrative features do not explain it. Group identity is a post-hoc audit axis only, never a model input.

Group | n | Calibration gap
Black/African American | 28,507 | 0.072
Multi-racial | 2,212 | 0.047
Other | 40,334 | 0.035
White | 83,439 | 0.02

cal gap = |mean predicted − observed excess-LOS rate| · reproduce: scripts/run_phase18b_review_packet.py · docs/modeling/OPERATIONAL_TOPK_FAIRNESS_AUDIT.md
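The gap definition above can be sketched in a few lines. This is an illustration of the formula only, not the code in scripts/run_phase18b_review_packet.py. The function name and the toy rows are hypothetical, while the group sizes and gaps in the dictionary are the values reported on this page.

```python
from statistics import mean


def calibration_gap(predicted, observed):
    """|mean predicted risk - observed event rate| for one audit group."""
    return abs(mean(predicted) - mean(observed))


# Hypothetical toy rows, used only to exercise the function: |0.3 - 0.5|.
toy_gap = calibration_gap([0.2, 0.4], [0, 1])

# (n, calibration gap) per group, as reported on this page.
reported = {
    "Black/African American": (28_507, 0.072),
    "Multi-racial": (2_212, 0.047),
    "Other": (40_334, 0.035),
    "White": (83_439, 0.02),
}

holdout_rows = sum(n for n, _ in reported.values())
ratio = reported["Black/African American"][1] / reported["White"][1]
print(holdout_rows, round(ratio, 1))
```

The group sizes sum to the 154,492 holdout rows quoted in the section header, and the ratio of the largest to the smallest gap matches the 3.6× figure in the disclosure.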

Generalizability — new hospital or state?

NY SPARCS + TX PUDF · excess-LOS validation

Same state, after local calibration

0.63 AUROC

best case — local history available

New hospital, day one

0.58 AUROC

unseen site, no local history yet

Moved across states (NY → TX)

0.52–0.56 AUROC

cold transfer — near chance, needs local calibration

✓ Proven in this demo

Loop runs end-to-end: predict → route → act → measure

Trained backbone + isotonic calibration

Abstention withholds low-signal cases

Measurement updates from operator actions

Leakage prevented by construction (day-0 inputs)

● Not proven

Cross-validated real accuracy at scale

Conformal calibration intervals

Causal bed-day savings (needs stepped-wedge trial)

Economics / ROI

Fair operation — audit done; gate FAILS

Metric honesty

Encounters: 221

Flagged / truly excess: 151 / 90

Median resolution day: 0

Observed excess bed-days: 735.0

Precision and recall here compare the real-data model's flags against the real excess-LOS outcome on the current census, which is a small sample; see /technical-readiness and the model cards for cross-validated metrics at scale. Excess bed-days are observed, not causally saved: showing a causal bed-day reduction requires a prospective stepped-wedge trial. Cross-state figures are from NY SPARCS + TX PUDF · no EHR, comorbidity, or prospective claim
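The reported precision and recall can be checked against the flagged and truly-excess counts. The true-positive count is not shown on this page, so the sketch below infers it from the reported recall. That inferred value is an assumption, not a published figure.

```python
flagged = 151          # encounters the model flagged (from this page)
truly_excess = 90      # encounters with real excess LOS (from this page)
reported_recall = 0.83

# Assumption: true positives implied by the reported recall, rounded to a whole encounter.
true_positives = round(reported_recall * truly_excess)

precision = true_positives / flagged
recall = true_positives / truly_excess
print(true_positives, f"{precision:.0%}", f"{recall:.0%}")
```

With that implied count, both ratios round to the 50% · 83% shown on the precision/recall tile, so the displayed counts and rates are mutually consistent.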
