Chinese Journal of Evidence-Based Pediatrics ›› 2024, Vol. 19 ›› Issue (1): 31-35. DOI: 10.3969/j.issn.1673-5501.2024.01.006

• Original Article •

基于多中心队列数据的机器学习预测重症感染患儿死亡风险和筛选临床特征的研究

朱雪梅1,4 陈申成2,4 章莹莹1 陆国平1 叶琪2 阮彤2 郑英杰3   

  1. 1 复旦大学附属儿科医院重症医学科 上海,201102;2 华东理工大学计算机科学与工程学院 上海,200237;3 复旦大学公共卫生学院流行病学教研室 上海,200032;4 共同第一作者
  • 收稿日期:2024-01-25 修回日期:2024-02-23 出版日期:2024-02-25 发布日期:2024-02-25
  • 通讯作者: 陆国平

Mortality risk predicting and clinical feature screening of children with severe infection by machine learning based on multicenter cohort data

ZHU Xuemei1,4, CHEN Shencheng2,4, ZHANG Yingying1, LU Guoping1, YE Qi2, RUAN Tong2, ZHENG Yingjie3   

  1 Department of Critical Care Medicine, Children's Hospital of Fudan University, Shanghai 201102, China; 2 School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; 3 Department of Epidemiology, School of Public Health, Fudan University, Shanghai 200032, China; 4 Co-first author
  • Received: 2024-01-25 Revised: 2024-02-23 Online: 2024-02-25 Published: 2024-02-25
  • Corresponding author: LU Guoping

摘要: 背景 科学、有效地预测重症感染患儿死亡关联因素对降低儿童病死率意义重大。既往重症患儿的病情与死亡关系多采用评分预测(如PCIS等),准确度欠佳。 目的 通过机器学习联合特征筛选的方法,挖掘对重症感染患儿死亡风险具有早期预警作用的敏感指标。 设计 队列研究。 方法 基于全国20个省级行政区域的54家PICU的儿童多中心感染性疾病协作网数据库,纳入年龄>28天至18岁、确诊感染和至少有1个器官发生功能障碍的患儿,统计122项临床特征信息,以出PICU时死亡/恶化或治愈/好转为结局,通过机器学习构建逻辑回归模型(LR)、随机森林模型(RF)、极端梯度提升树(XGB)和反向传播神经网络(BP),筛选重要的临床特征建立重症感染患儿死亡风险预测模型。 主要结局指标 模型接收者操作特征曲线下面积(AUROC)和模型筛选临床特征性能的优劣。 结果 2022年4月1日至2023年12月31日协作网数据库中入PICU时确诊重症感染且入PICU时、入PICU 24 h时和出PICU时临床特征记录均完整的(病例1 738例,经过数据预处理包括异常值处理、缺失值填充、强制值区间范围检验、归一化处理)1 738条信息进入机器学习构建模型。存活或好转患儿1 396例,死亡或恶化患儿342例(19.6%)。队列数据按4∶1分为训练集(1 390条)和验证集(348条),训练集中存活或好转1 116条,死亡或恶化274条;验证集中存活或好转280条,死亡或恶化68条。在训练集中,共输入模型122个临床特征,经过机器模型学习以及特征筛选后,在50轮的5折分层交叉验证下,验证集LR、RF和XGB的AUROC为0.74~0.78。LR、RF和XGB选择重要性大于均值的临床特征构建最优临床特征,尚无比较好的衡量BP特征重要性的方法,LR模型较RF和XGB构建的最优临床特征较为接近临床预期。 结论 机器学习预测儿童重症感染性疾病死亡/恶化结局表现一般,预测模型筛选的临床特征与临床预期尚有距离。

关键词: 机器学习, 儿童重症监护室, 感染, 随机森林模型, 极端梯度提升树

Abstract: Background Reliable prediction of the factors associated with death in children with severe infection is of great importance for reducing child mortality. Previously, the relationship between illness severity and death in critically ill children was mostly predicted with scoring systems such as the Pediatric Critical Illness Score (PCIS), whose accuracy is limited. Objective To identify sensitive indicators that provide early warning of mortality risk in children with severe infection by combining machine learning with feature screening. Design Cohort study. Methods The study was based on the pediatric Multi-center Infectious Diseases Collaboration Network database, covering 54 PICUs in 20 provincial administrative regions of China. Children aged >28 days to 18 years with confirmed infection and dysfunction of at least one organ were included, and 122 clinical features across 11 clinical dimensions were collected. With death/deterioration versus cure/improvement at PICU discharge as the outcome, logistic regression (LR), random forest (RF), extreme gradient boosting tree (XGB), and backpropagation neural network (BP) models were built, and important clinical features were screened to establish a mortality risk prediction model for children with severe infection. Main outcome measures Area under the receiver operating characteristic curve (AUROC) and how well each model screened clinical features. Results From April 1, 2022 to December 31, 2023, the database contained 1 738 cases with severe infection confirmed at PICU admission and complete clinical feature records at PICU admission, 24 h after PICU admission, and PICU discharge; 1 396 patients survived or improved, and 342 (19.6%) died or deteriorated. After data preprocessing (outlier handling, missing-value imputation, mandatory value-range checks, and normalization), all 1 738 records were used for model building. The cohort was split at a ratio of 4∶1 into a training set (1 390 records) and a validation set (348 records). In the training set, 1 116 patients survived or improved and 274 died or deteriorated; in the validation set, 280 survived or improved and 68 died or deteriorated. All 122 clinical features were input in the training set. After model training and feature screening, with 50 rounds of 5-fold stratified cross-validation, the AUROC of LR, RF, and XGB in the validation set ranged from 0.74 to 0.78. For LR, RF, and XGB, features with importance above the mean were selected to form the optimal clinical feature set; there is currently no good method for measuring feature importance in BP. The optimal feature set built by LR was closer to clinical expectations than those built by RF and XGB. Conclusion Machine learning showed only moderate performance in predicting death/deterioration in children with severe infectious diseases, and the clinical features screened by the prediction models still fall short of clinical expectations.
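
The 4∶1 stratified split and the AUROC metric described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `stratified_split` and `auroc` are hypothetical, only the class counts (1 396 and 342) come from the abstract, and the scores in the last line are made up for illustration. Because this sketch rounds within each class, its split sizes can differ by one record from the reported 1 390 and 348.

```python
import random


def stratified_split(labels, valid_fraction=0.2, seed=0):
    """Split indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Per-class rounding; the reported split may differ by one record.
        n_valid = round(len(idx) * valid_fraction)
        valid_idx.extend(idx[:n_valid])
        train_idx.extend(idx[n_valid:])
    return sorted(train_idx), sorted(valid_idx)


def auroc(y_true, scores):
    """AUROC as the chance that a random positive case scores higher than a
    random negative case, with ties counted as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Class counts taken from the abstract: 0 = survived/improved, 1 = died/deteriorated.
labels = [0] * 1396 + [1] * 342
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))

# Made-up scores, only to show how the metric is called.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise AUROC above is quadratic in the number of cases, which is acceptable for a validation set of a few hundred records; a rank-based formula would be the usual choice for larger data.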

Key words: Machine learning, Pediatric intensive care unit, Infection, Random forest model, Extreme gradient boosting tree
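
The feature-screening rule stated in the abstract (keep the features whose importance exceeds the mean importance) can be sketched as follows. This is an illustration under assumptions, not the authors' code: `select_above_mean` is a hypothetical helper, and the feature names and importance values are invented. How importance is defined for each model (for example, absolute coefficients for LR or impurity-based scores for RF) is not specified in the abstract.

```python
def select_above_mean(importances):
    """Return the names of features whose importance exceeds the mean
    importance, ordered from most to least important."""
    if not importances:
        return []
    mean_importance = sum(importances.values()) / len(importances)
    kept = [name for name, value in importances.items() if value > mean_importance]
    return sorted(kept, key=lambda name: importances[name], reverse=True)


# Hypothetical importances for five invented features (not from the study).
example = {
    "lactate": 0.32,
    "glucose": 0.28,
    "platelet_count": 0.25,
    "heart_rate": 0.10,
    "age_months": 0.05,
}
print(select_above_mean(example))
```

A mean threshold adapts to the scale of each model's importance scores, but it also means the number of retained features differs between models, which may partly explain why the LR, RF, and XGB feature sets diverge.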