Written by Joshua Belfer
Spoon Feed
Using EHR data from the first 4 hours of ED care, machine learning models accurately predicted which children would develop sepsis within 48 hours—before organ dysfunction was present.
Predicting sepsis before the crash
Early sepsis recognition remains one of the greatest challenges for pediatric emergency care. In this PECARN-led multicenter study utilizing EHR data from over 2.3 million pediatric ED visits, machine learning models that predict future sepsis were developed and validated.
Using routinely collected ED data (vital signs, ESI, age, markers of medical complexity), two different models (a logistic regression model and a gradient tree boosting model) performed exceptionally well. The area under the receiver operating characteristic curve (AUROC) was 0.92 (95% CI 0.92–0.93) for the logistic regression model and 0.94 (95% CI 0.93–0.94) for the gradient tree boosting model. The models had similarly high AUROCs for predicting septic shock. At a prespecified sensitivity of 90%, the gradient tree boosting model achieved a positive likelihood ratio of at least 4.7 for sepsis and 4.2 for septic shock.
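For readers who want to unpack the likelihood ratios, here is a minimal Python sketch of the arithmetic. It uses the standard definition LR+ = sensitivity / (1 - specificity) and plugs in the 90% sensitivity and the LR+ values quoted above. The function names are illustrative, and the 1% pre-test probability is a hypothetical value chosen only to show the calculation; it is not a figure from the study.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """Standard definition: LR+ = sensitivity / (1 - specificity)."""
    if not 0.0 <= sensitivity <= 1.0 or not 0.0 <= specificity < 1.0:
        raise ValueError("sensitivity must be in [0, 1] and specificity in [0, 1)")
    return sensitivity / (1.0 - specificity)


def implied_specificity(sensitivity, lr_positive):
    """Rearrange the LR+ definition to recover the specificity it implies."""
    if lr_positive <= 0.0:
        raise ValueError("lr_positive must be greater than 0")
    return 1.0 - sensitivity / lr_positive


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    if not 0.0 <= pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must be in [0, 1)")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Values quoted in the summary: sensitivity fixed at 0.90, LR+ of 4.7 and 4.2.
spec_sepsis = implied_specificity(0.90, 4.7)  # about 0.81
spec_shock = implied_specificity(0.90, 4.2)   # about 0.79

# HYPOTHETICAL pre-test probability of 1%, for illustration only.
example_post_test = post_test_probability(0.01, 4.7)  # about 0.045

print(f"Implied specificity (sepsis): {spec_sepsis:.3f}")
print(f"Implied specificity (septic shock): {spec_shock:.3f}")
print(f"Post-test probability at a 1% pre-test probability: {example_post_test:.3f}")
```

Read this way, an LR+ of 4.7 at 90% sensitivity corresponds to a specificity of roughly 81%. Because the summary reports the LR+ as "at least" 4.7, the implied specificity is a lower bound.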
Model performance was consistent across most demographic groups. However, the gradient tree boosting model, while the best-performing, is also complex, which could limit scalability.
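One way to read the AUROC values in this section: the AUROC is the probability that a randomly chosen visit that went on to sepsis receives a higher risk score than a randomly chosen visit that did not. The short Python sketch below computes it that way on made-up toy scores. The scores are invented for illustration and have nothing to do with the study data, and the pairwise loop is only practical for small examples.

```python
def auroc(scores_positive, scores_negative):
    """AUROC as a pairwise ranking probability.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    if not scores_positive or not scores_negative:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Toy risk scores (invented): three visits with later sepsis, four without.
sepsis_scores = [0.9, 0.8, 0.4]
no_sepsis_scores = [0.7, 0.3, 0.2, 0.1]

# 11 of the 12 pairs are ranked correctly, so the AUROC is 11/12.
print(f"AUROC: {auroc(sepsis_scores, no_sepsis_scores):.2f}")  # prints "AUROC: 0.92"
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why values of 0.92 to 0.94 are described as high.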
How will this change my practice?
If recognition of sepsis is the cornerstone of the pediatric ED, then prediction of this life-threatening condition would be a paradigm shift. By predicting future Phoenix Sepsis Criteria-defined sepsis, the models created in this paper point toward a future where risk stratification mirrors our current sepsis definitions. This sets the stage for a world in which AI models can assist in recognizing and treating impending sepsis before patients even show signs of organ dysfunction.
AI models can analyze large datasets and find patterns that humans may miss. The EHR is ripe for this type of study. This may help us not only with sepsis but also with other conditions that are time-sensitive, diagnostically noisy, and reliant on patterns across multiple (sometimes weak) signals. Appendicitis? Intussusception? DKA? I look forward to future work that pairs this kind of model with clinical gestalt to improve outcomes.
Source
Derivation and Validation of Predictive Models for Early Pediatric Sepsis. JAMA Pediatr. 2025 Dec 1;179(12):1318-1325. doi: 10.1001/jamapediatrics.2025.3892. PMID: 41082207; PMCID: PMC12519407.
Critical Appraisal
Study Identification
Background
Study Question
Study Design & Conduct
Prospective / Retrospective: Retrospective
Multicenter: Yes
Unit of Allocation: Not applicable
Unit of Analysis: ED visits
Randomization Method: Not applicable
Allocation Concealment: Not applicable
Blinding: Not applicable
Follow-up Duration: 48 hours
Population
Inclusion Criteria
- Children aged 2 months to less than 18 years
- ED visits
Exclusion Criteria
- ED disposition of death or transfer
- Trauma diagnosis
- Sepsis present during predictive features window
Number Enrolled: 2,323,720
Number Analyzed: 2,323,720
Key Baseline Characteristics
Sex: Not reported
Disease Severity: Not reported
Care Setting Distribution: Emergency departments
Additional Baseline Characteristics
- Emergency severity index
- Age-adjusted vital signs
- Medical complexity
Exposures / Interventions
Description: Machine learning models using patient and physiologic characteristics
Definition / Dose: Not applicable
Timing: Within the first 4 hours of ED care
Classification Method: EHR data
Protocolized / Discretionary: Not reported
Description: Not applicable
Definition: Not applicable
Outcomes & Results
Primary Outcomes
Definition: Suspected infection with a Phoenix Sepsis Criteria (PSC) score of 2 or more or death within 48 hours of ED arrival
Time Point: 48 hours
Measurement Method: Phoenix Sepsis Criteria score
Results: AUROC of 0.92 for logistic regression and 0.94 for gradient tree boosting
Secondary Outcomes
Definition: Prediction of septic shock using machine learning models
Time Point: 48 hours
Measurement Method: AUROC
Results: AUROC of 0.92 or greater
Definition: Assessment of model performance across demographic characteristics
Time Point: Not applicable
Measurement Method: AUROC
Results: AUROC for patients with Medicaid insurance was better than for those with commercial payers
Risk of Bias
Risk of Bias - ROBINS-I
- Confounding (Low): Comprehensive data collection and model validation reduce confounding risk.
- Selection of participants (Low): Large, multicenter cohort minimizes selection bias.
- Classification of interventions (Low): Clear definition and consistent application of predictive models.
- Deviations from intended interventions (Low): Not applicable as this is an observational study.
- Missing data (Low): Comprehensive data collection with minimal missing data.
- Measurement of outcomes (Low): Objective outcome measures with clear definitions.
- Selection of the reported result (Low): All relevant outcomes reported with transparency.
Transparency
COI Statement Present: TRUE
