ABSTRACT
-
Purpose
- Pulmonary complications, including pneumonia and respiratory failure, continue to be major contributors to morbidity and mortality in patients with chest trauma. Although several artificial intelligence (AI) models have been developed to predict trauma mortality, there remains a lack of AI-based prediction models specifically targeting pulmonary complications in chest trauma. To address this gap, we developed and validated an explainable AI model for predicting pulmonary complications.
-
Methods
- This retrospective analysis included 1,040 patients with blunt chest trauma who were treated at a single regional trauma center between January 2019 and March 2023. Pulmonary complications were defined as pneumonia, prolonged mechanical ventilation (>48 hours), or other major thoracic complications necessitating surgical intervention. Machine learning algorithms, including extreme gradient boosting (XGBoost), random forest, adaptive boosting (AdaBoost), light gradient boosting machine (LightGBM), and a deep neural network, were trained using hyperparameter tuning and threefold cross-validation. Model performance was evaluated by sensitivity, specificity, accuracy, balanced accuracy, F1 score, and the area under the receiver operating characteristic curve (AUC). Model interpretability was assessed using Shapley Additive Explanations (SHAP) values.
-
Results
- Among the total cohort, 188 patients (18.1%) developed pulmonary complications. In the independent testing dataset (n=208), XGBoost achieved the highest AUC (0.856), while AdaBoost demonstrated the highest balanced accuracy (0.779). All machine learning models outperformed conventional scoring systems. SHAP analysis identified key predictors of pulmonary complications, including age, Injury Severity Score, Glasgow Coma Scale score, Abbreviated Injury Scale of the extremity or head, initial PaO2 to fraction of inspired oxygen ratio, location of the primary rib fracture, and presence of flail motion.
-
Conclusions
- The developed AI model accurately predicts pulmonary complications in patients with chest trauma and outperforms traditional prognostic tools. The model's explainability offers actionable clinical insights, supporting early risk stratification and evidence-based decision-making in trauma care.
-
Keywords: Thoracic injuries; Rib fractures; Machine learning; Explainable artificial intelligence
INTRODUCTION
- Background
- Chest trauma remains a major cause of morbidity and mortality worldwide, with outcomes ranging from minor injuries to severe, life-threatening conditions [1]. Pulmonary complications, including pneumonia, acute respiratory distress syndrome, and respiratory failure, are particularly critical because they are associated with prolonged hospitalization, increased requirements for mechanical ventilation, and higher mortality rates [2,3]. Early identification of patients at risk for these complications remains a significant challenge in trauma care.
- In recent years, artificial intelligence (AI) has been increasingly applied in medical imaging and diagnostics, demonstrating promising performance in detecting rib fractures, chest contusions, and pneumonia from imaging data. Several studies have also developed and validated AI models for predicting mortality, often surpassing conventional assessment methods [4–7]. However, there is still a significant gap in the development of AI tools for prognostic prediction in chest trauma, particularly regarding the anticipation of pulmonary complications. Previously, we developed a nomogram based on logistic regression to predict pulmonary complications in patients with chest trauma [2,8,9].
- An AI-driven model capable of predicting pulmonary complications in patients with chest trauma could substantially assist clinical decision-making, especially in emergency settings. By enabling real-time risk stratification, such a model would allow clinicians to prioritize high-risk patients for intensive monitoring and early intervention. The integration of AI into trauma care may improve patient outcomes, optimize resource utilization, and facilitate timely, evidence-based care.
- Objectives
- This study aimed to develop and validate an AI model for predicting pulmonary complications in patients with chest trauma.
METHODS
- Ethics statement
- This study was approved by the Institutional Review Board of the Chungbuk National University Hospital (No. 2024-11-011). The requirement for informed consent was waived due to the use of deidentified patient data and the retrospective nature of the study.
- Study design and data source
- This single-center retrospective observational study was conducted at a regional trauma center in Korea. The study was performed in accordance with the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) + AI guidelines [10]. Data were collected for all patients with blunt chest trauma beginning at the time of hospital admission, including the Injury Severity Score (ISS) and Abbreviated Injury Scale (AIS). The clinical course during hospitalization, such as the development of paradoxical chest wall motion or pneumonia, was documented. Rib fracture patterns and the extent of pulmonary contusion [11] were evaluated once using the initial chest computed tomography (CT) scan. At the trauma center, a whole-body CT from head to pelvis was routinely performed, with additional extremity scans conducted when clinically indicated.
- Study population
- This study enrolled consecutive patients with blunt chest trauma (AIS chest ≥1) who presented to the emergency department (ED) between January 2019 and March 2023. We excluded patients meeting any of the following criteria: (1) conditions in which the degree of pulmonary contusion could not be assessed, such as a totally collapsed lung secondary to tension pneumothorax or a one-lung state following pneumonectomy; (2) discharge or death within 24 hours after presentation; and (3) transfer to another facility during the acute period.
- Definitions
- The severity of pulmonary contusion was assessed using the Blunt Pulmonary Contusion 18 (BPC 18) scoring system [11], which segments each lung into upper, middle, and lower zones, assigning a score from 0 to 3 for each zone based on the degree of parenchymal opacification. We also evaluated chest injury severity using established scoring systems, including the Thorax Trauma Severity Score (TTSS) [12], Rib Fracture Score (RFS) [13], Chest Trauma Score (CTS) [14], and RibScore [15].
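- The zone-based scoring described above can be illustrated with a short sketch. It assumes the BPC 18 total is the simple sum of the six zone scores (three zones per lung, each scored 0 to 3, for a maximum of 18); the function name and the dictionary layout are hypothetical and are not taken from the study's code.

```python
# Minimal sketch of a BPC 18 total, ASSUMING the total is the plain sum of
# six zone scores (left/right x upper/middle/lower), each between 0 and 3.
SIDES = ("left", "right")
ZONES = ("upper", "middle", "lower")


def bpc18_total(zone_scores):
    """Return the summed contusion score (0-18) for one patient.

    zone_scores maps (side, zone) tuples to an integer score of 0-3.
    """
    expected = {(side, zone) for side in SIDES for zone in ZONES}
    if set(zone_scores) != expected:
        raise ValueError("scores are required for exactly six lung zones")
    for key, score in zone_scores.items():
        if score not in (0, 1, 2, 3):
            raise ValueError(f"zone {key} has an invalid score: {score}")
    return sum(zone_scores.values())


# Hypothetical patient: contusion confined to the right middle and lower zones.
example_scores = {
    ("left", "upper"): 0, ("left", "middle"): 0, ("left", "lower"): 0,
    ("right", "upper"): 0, ("right", "middle"): 1, ("right", "lower"): 3,
}
example_total = bpc18_total(example_scores)
```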
- In accordance with the guidelines of the Chest Wall Injury Society and prior research, rib displacement was graded as follows: grade 0, no rib fracture; grade 1, displacement of less than 50% of the rib width on axial CT; grade 2, displacement greater than 50% but less than 100%; and grade 3, displacement of 100% or more [3,8,16]. Rib fracture locations were classified as anterior, lateral, or posterior regions [17]. For the 1st–3rd and 9th–12th ribs, imaginary reference lines aligned with anatomical landmarks of the 4th–7th ribs were used for segmentation. A segmental rib fracture was defined as a single rib fractured in two or more separate locations; in some cases, segmental fractures occurred at two lateral sites. The term “flail chest” is considered ambiguous and has been proposed to be subdivided into flail segment and flail motion [18]. Recent studies further highlight the need to distinguish flail segment from flail motion as separate entities [2,8]. Accordingly, in this study, flail chest was further categorized into two groups: anatomical flail segment group (patients with three or more consecutive segmental rib fractures) and flail motion group (patients exhibiting paradoxical chest wall motion during the index hospitalization) [19]. Flail motion was confirmed by experienced trauma surgeons through clinical observation during the hospital stay.
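- The displacement grades and the anatomical flail segment definition above can be expressed as two small functions. This is only a sketch under stated assumptions: the text does not say how a displacement of exactly 50% is graded (it is placed in grade 2 here), the rib numbers passed to the second function are assumed to come from one side of the chest, and all function and parameter names are hypothetical.

```python
def displacement_grade(fractured, displacement_pct=0.0):
    """Grade rib displacement on axial CT as described in the text.

    ASSUMPTION: exactly 50% is assigned to grade 2; the source only
    defines "<50%" and ">50% but <100%".
    """
    if not fractured:
        return 0            # grade 0: no rib fracture
    if displacement_pct < 50:
        return 1            # grade 1: less than 50% of the rib width
    if displacement_pct < 100:
        return 2            # grade 2: 50% up to, but not including, 100%
    return 3                # grade 3: 100% or more


def has_flail_segment(segmental_ribs, min_run=3):
    """True if at least min_run consecutively numbered ribs are segmentally
    fractured. ASSUMPTION: the rib numbers all refer to the same side."""
    run, previous = 0, None
    for rib in sorted(set(segmental_ribs)):
        run = run + 1 if previous is not None and rib == previous + 1 else 1
        if run >= min_run:
            return True
        previous = rib
    return False
```

- Flail motion, by contrast, is a clinical observation during the hospital stay and cannot be derived from the CT findings in this way.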
- The primary outcome of this study was overall pulmonary complications, defined as the occurrence of one or more of the following events: (1) pneumonia; (2) prolonged mechanical ventilation; and (3) complications requiring surgical treatment. Pneumonia was diagnosed in ventilated patients based on clinical signs and symptoms along with a positive quantitative culture (>10^5 colony-forming units per milliliter) from a bronchoscopy-obtained lower respiratory tract sample. In nonventilated patients, pneumonia was defined by the presence of at least two of the following: purulent sputum, body temperature >38.3 °C, leukocytosis (>11,000 cells/μL), or worsening findings on chest radiograph. Prolonged mechanical ventilation was defined as dependence on mechanical ventilation for more than 48 hours during the index hospitalization. Complications requiring surgical treatment included empyema, descending aortic injury secondary to rib fractures, or thoracotomy for delayed hemothorax or pneumothorax [2].
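- As an illustration of how the outcome definition combines its components, the sketch below encodes the "at least two of four" rule for nonventilated patients and the composite outcome. The function and parameter names are hypothetical, the leukocyte count is assumed to be supplied in the same unit as the threshold quoted in the text, and the culture-based diagnosis in ventilated patients is not modeled.

```python
def nonventilated_pneumonia(purulent_sputum, temperature_c, wbc_count,
                            worsening_radiograph):
    """Pneumonia in a nonventilated patient: at least two of four criteria.

    wbc_count is ASSUMED to use the same unit as the 11,000 threshold
    given in the text.
    """
    criteria = [
        bool(purulent_sputum),
        temperature_c > 38.3,
        wbc_count > 11_000,
        bool(worsening_radiograph),
    ]
    return sum(criteria) >= 2


def pulmonary_complication(pneumonia, ventilation_hours, surgical_complication):
    """Composite primary outcome: any one of the three events is sufficient."""
    prolonged_ventilation = ventilation_hours > 48
    return bool(pneumonia) or prolonged_ventilation or bool(surgical_complication)
```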
- Statistical analysis
- Continuous variables are presented as medians with interquartile ranges, and categorical variables as counts with percentages. Continuous variables were compared using either the Student t-test or the Mann-Whitney U-test, as appropriate. Categorical variables were compared using the chi-square test or Fisher exact test, as appropriate. Statistical significance was defined as a two-tailed P-value <0.05. All analyses were performed using R ver. 4.1.2 (R Foundation for Statistical Computing).
- Data split, processing, balancing, and cross-validation
- Data from 1,040 patients were randomly partitioned into training and testing sets in an 8:2 ratio using stratified sampling. The testing set was reserved exclusively for independent evaluation of the AI model and was not used in model training or internal validation. Features with a P-value <0.10 in univariate analysis were selected, yielding 20 input variables for the machine learning models. Standard scaling was applied to all numeric data so that features contributed on a comparable scale, and one-hot encoding was used to convert categorical data into a format suitable for the models. To assess generalizability, threefold cross-validation was performed within the training set. The training data were randomly shuffled and stratified into three equal subsets; two subsets were used for model training, and the remaining subset served as the internal validation set. This process was repeated three times, rotating the validation subset at each iteration. The finalized AI model, described below, was subsequently evaluated using the independent testing dataset. Given the pronounced imbalance between patients without pulmonary complications (81.9%) and those with complications (18.1%), the Synthetic Minority Over-sampling Technique (SMOTE) was applied during model development to up-sample the minority class and reduce bias [20].
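- The partitioning and oversampling steps can be sketched with the standard library alone. This is not the study's code: the helper names, the seeds, and the simplified SMOTE-style interpolation (one random minority point and one of its k nearest minority neighbours per synthetic sample) are assumptions made for illustration, and the sketch takes no position on where in the pipeline the authors applied SMOTE. Only the cohort counts (188 of 1,040 with complications) come from the text.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test while preserving class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def stratified_folds(indices, labels, k=3, seed=0):
    """Deal each class round-robin into k folds after shuffling."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted({labels[i] for i in indices}):
        idx = [i for i in indices if labels[i] == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds


def smote_sketch(minority, n_new, k=5, seed=0):
    """Simplified SMOTE-style oversampling: interpolate between a random
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Cohort counts from the text: 188 patients with complications, 852 without.
labels = [1] * 188 + [0] * 852
train_idx, test_idx = stratified_split(labels)
folds = stratified_folds(train_idx, labels)

# Toy minority points (hypothetical, already scaled) for the oversampling step.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_synthetic = smote_sketch(toy_minority, n_new=10, k=2)
```

- With these counts, a stratified 8:2 split gives 832 training and 208 testing patients, which agrees with the testing set size reported in the Results.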
- Machine learning model development
- Extreme gradient boosting (XGBoost) is a gradient-boosting framework optimized for both speed and accuracy, and is particularly effective in addressing class imbalance [21]. Hyperparameters, including learning rate, maximum tree depth, subsample ratio, early-stopping criteria, and regularization terms, were tuned to maximize predictive performance. Light gradient boosting machine (LightGBM) is a histogram-based gradient-boosting model designed for large-scale datasets, offering efficient handling of categorical variables and rapid training [22]. Its hyperparameters (learning rate, maximum depth, subsample ratio, number of leaves, and minimum sum of Hessians in a leaf) were also systematically optimized. Random forest is an ensemble method that constructs multiple decision trees on bootstrapped data subsets and aggregates their predictions to enhance generalization and reduce overfitting; relevant hyperparameters (maximum depth, minimum number of samples per leaf, minimum number of samples required to split an internal node, and regularization terms) were carefully tuned [23]. Adaptive boosting (AdaBoost) sequentially combines weak learners (typically decision stumps), reweighting misclassified instances to focus learning on more challenging cases; the number of estimators, maximum depth of the base estimator, and learning rate were optimized for best performance [24].
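- The Methods state that hyperparameters were tuned via grid search. A minimal sketch of that idea follows: every combination in a parameter grid is scored and the best one is kept. The grid values and the scoring function are placeholders invented for this example; in practice the score would be a cross-validated metric such as the mean AUC over the three folds, and the study's actual search ranges are not reported here.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per combination of the listed hyperparameter values."""
    names = list(param_grid)
    for combo in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, combo))


def grid_search(param_grid, cv_score):
    """Return the combination with the highest score and that score."""
    best_params, best_score = None, float("-inf")
    for params in expand_grid(param_grid):
        score = cv_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# HYPOTHETICAL grid: the names mirror hyperparameters listed in the text,
# but the candidate values are illustrative only.
example_grid = {
    "learning_rate": [0.01, 0.1],
    "max_depth": [3, 5, 7],
    "subsample": [0.8, 1.0],
}


def placeholder_cv_score(params):
    """Stand-in for a cross-validated metric; peaks at an arbitrary point."""
    return (-abs(params["learning_rate"] - 0.1)
            - abs(params["max_depth"] - 5)
            - abs(params["subsample"] - 0.8))


best_params, best_score = grid_search(example_grid, placeholder_cv_score)
```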
- A deep neural network (DNN) was implemented using Keras with a sequential architecture. The model comprised an input layer corresponding to the number of features, followed by two fully connected hidden layers with 32 and 16 neurons, respectively. Each hidden layer was followed by batch normalization and LeakyReLU activation to improve convergence and mitigate vanishing gradient effects. The output layer contained a single neuron with a sigmoid activation function to predict the binary outcome. Batch normalization was also applied to the output layer, with hyperparameters set to a momentum of 0.95 and an epsilon of 0.001. The beta and gamma parameters were initialized to zeros and ones, respectively. Models were trained with the Adam optimizer and binary cross-entropy loss function (learning rate, 0.001; batch size, 8). For explainable AI, Shapley Additive Explanations (SHAP) were implemented [25].
- All hyperparameters were tuned via grid search. Models were implemented in Python ver. 3.7.13 (Python Software Foundation) with TensorFlow ver. 2.8.0 (Google LLC), Keras ver. 2.8.0 (Google LLC), NumPy ver. 1.21.6 (Numerical Python), pandas ver. 1.3.5 (pandas Development Team), Matplotlib ver. 3.5.1 (Matplotlib Development Team), and Scikit-learn ver. 1.0.2 (scikit-learn developers). Model performance was evaluated using sensitivity, precision, specificity, F1 score, accuracy, balanced accuracy, and area under the receiver operating characteristic curve (AUC).
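- The evaluation metrics listed above all follow from a confusion matrix, apart from the AUC, which depends on the ranking of predicted scores. The sketch below shows the arithmetic. The confusion-matrix counts in the example (23 true positives, 15 false negatives, 147 true negatives, 23 false positives) were back-calculated from the random forest row of Table 3 and are an inference, not counts reported by the authors; the AUC scores are toy values, and the function names are hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Threshold-dependent metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)              # recall for the positive class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


def auc_from_scores(positive_scores, negative_scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as one half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# INFERRED counts (see the note above), 208 testing patients in total.
rf_metrics = classification_metrics(tp=23, fn=15, tn=147, fp=23)

# Toy predicted probabilities, not study data.
toy_auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
```

- Because accuracy is dominated by the majority class when only 18.1% of patients have the outcome, balanced accuracy and AUC are the more informative summaries in this setting.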
RESULTS
- Of the 1,040 patients included in the final analysis, 188 (18.1%) developed one or more pulmonary complications. Baseline characteristics for patients with and without pulmonary complications are presented in Table 1.
- K-fold cross-validation result
- Table 2 summarizes the results of threefold cross-validation for hyperparameter tuning. Among the models, random forest achieved the highest balanced accuracy (0.696) and AUC (0.786), while the DNN yielded the lowest AUC (0.731), and XGBoost exhibited the lowest balanced accuracy (0.648).
- Final performance of machine learning and conventional models on the testing dataset
- Using the testing dataset (n=208), the final performance of all machine learning models is summarized in Table 3, together with the performance metrics of conventional models previously used for patient prognosis. XGBoost achieved the highest AUC (0.856), while AdaBoost provided the highest balanced accuracy (0.779). Notably, all five machine learning models (XGBoost, random forest, AdaBoost, LightGBM, and DNN) achieved higher AUCs (0.809–0.856) than the conventional models (TTSS, RFS, CTS, RibScore, and ISS; 0.730–0.766). Fig. 1 illustrates the receiver operating characteristic curves for all models.
- Feature importance analysis for explainable AI
- Fig. 2 presents SHAP summary plots illustrating the 20 variables with the greatest impact on the four highest-performing machine learning models: XGBoost, random forest, AdaBoost, and LightGBM. Fig. 3 displays the mean absolute SHAP values for each feature, indicating overall feature importance across these models. Among the four AI models, the following features consistently appeared within the 10 most influential variables: ISS, age, Glasgow Coma Scale (GCS) score, AIS extremities, flail motion, PaO2 to fraction of inspired oxygen (FiO2) ratio, AIS head, and lateral primary fracture line. The precise ranking of these features varied slightly between models.
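- To make the idea behind SHAP values and their mean absolute summary concrete, the sketch below computes exact Shapley values by brute force for a three-feature toy model. It is a conceptual illustration only: the toy risk function, its coefficients, and the single baseline patient are invented for this example, whereas SHAP implementations for tree models use efficient algorithms and a background dataset rather than enumerating feature subsets.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def mean_abs_shap(model, samples, baseline):
    """Mean absolute Shapley value per feature across samples."""
    rows = [shapley_values(model, x, baseline) for x in samples]
    return [sum(abs(r[i]) for r in rows) / len(rows)
            for i in range(len(baseline))]


def toy_risk(z):
    """HYPOTHETICAL risk score; coefficients are illustrative only."""
    age, iss, flail_motion = z
    return 0.01 * age + 0.02 * iss + 0.3 * flail_motion + 0.005 * iss * flail_motion


baseline_patient = [57, 14, 0]   # illustrative reference values
patient = [70, 30, 1]
phi = shapley_values(toy_risk, patient, baseline_patient)
importance = mean_abs_shap(toy_risk, [patient, [45, 9, 0]], baseline_patient)
```

- The attributions sum to the difference between the prediction for the patient and the prediction for the baseline, which is the additivity property that makes per-patient explanations and global importance rankings consistent with each other.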
DISCUSSION
- Our novel AI model successfully predicted pulmonary adverse outcomes in patients with chest trauma. This approach offers several advantages for clinical implementation. First, it outperformed conventional scoring systems, including ISS, TTSS, RFS, CTS, and RibScore, with every machine learning algorithm outperforming traditional metrics. Second, as an explainable AI model, it provides valuable clinical insights, highlighting key predictors such as ISS, age, GCS score, AIS extremities, flail motion, PaO2 to FiO2 ratio, AIS head, and lateral primary fracture line. Third, the proposed model relies on clinical variables that are readily available during the early assessment phase, underscoring its potential utility for trauma surgeons in acute care settings. Our AI model may offer valuable clinical decision support, including guidance for closer monitoring or preemptive procedural decisions. In our study, all variables, including rib fracture patterns and ISS, were obtainable within a few days of admission. All CT scans were performed in the ED, suggesting that the model may be applicable at an early stage of care. However, scoring systems such as ISS and AIS are not always immediately accessible in the acute care setting, as their calculation may require additional time and input from trained trauma coordinators. Therefore, we plan to develop a prediction model based solely on variables available at the time of ED presentation.
- Previous studies on AI applications in chest trauma have primarily focused on detecting and diagnosing rib fractures [26,27]. Previous systematic reviews and meta-analyses reported good diagnostic accuracy of AI tools based on x-ray or CT [26,27]. A recent meta-analysis demonstrated that commercially available AI-based fracture detection solutions performed best when combined with human assessment [26]. In contrast, AI-based prognostic prediction in chest trauma remains limited. Choi et al. [28] developed a deep learning algorithm using the US National Readmissions Database to identify factors associated with readmission after rib fractures; their SHAP analysis highlighted chronic obstructive pulmonary disease, prolonged index hospitalization, longer length of stay, private primary payer status, and pneumothorax diagnosed during the index admission as significant predictors of 3-month readmission. In a multicenter retrospective study of 3,116 patients from 33 institutions in China, Liu et al. [29] designed an AI model to predict venous thromboembolism in thoracic trauma, achieving AUCs of 0.879 in the testing set and 0.83 in the external validation set; SHAP analysis identified age, body mass index, number of broken rib ends, rib fracture surgery, multiple trauma, lower-limb fracture, tracheal intubation, blood transfusion, and laboratory results as key predictors. Song et al. [30], in a single-center retrospective study of 169 patients with flail chest, built an XGBoost model for pneumonia prediction, achieving an AUC of 0.895; their SHAP analysis indicated systolic blood pressure, pH value, blood transfusion, ISS, hemoglobin, tracheotomy, number of rib fractures, intensive care unit admission via the ED, rib location, and AIS of the limbs or pelvis as the top predictors.
- Compared with previous research, our study makes a unique contribution through the analysis of rib fracture patterns, including the number of rib fractures, segmental rib fractures, and the location of the primary fracture line. Our earlier work demonstrated the clinical relevance of these patterns [2,8,9]. In the present study, the model identified several important features—ISS, age, GCS score, AIS extremities, flail motion, PaO2 to FiO2 ratio, AIS head, and lateral primary fracture line—that may offer valuable insights for clinicians and researchers.
- In our study, tree-based ensemble models such as XGBoost, random forest, AdaBoost, and LightGBM outperformed the DNN model. This result may be attributed to the nature of our dataset, which consisted of tabular data composed of numeric features [31]. Recent studies have similarly reported that DNNs often underperform compared to other machine learning algorithms when applied to tabular datasets, which are commonly used in healthcare research [31–33]. Tree-based models not only achieve superior predictive performance in such settings but also incur lower computational costs [31]. This performance advantage is thought to stem from the inherent characteristics of tabular data, such as irregular target function patterns, the presence of uninformative features, and a lack of rotational invariance, conditions under which linear combinations of features, as used in DNNs, may fail to capture relevant information effectively [31].
- Limitations
- This study had several limitations. First, the retrospective design may have introduced selection bias. Second, rib fracture patterns were assessed only once, on the initial chest CT, without follow-up imaging to monitor displacement over time due to cost and patient-safety concerns. Third, the number of patients who underwent surgical stabilization of rib fractures (SSRF) was small. Although SSRF can influence outcomes, our institution follows a conservative strategy, reserving SSRF for patients with flail motion and respiratory distress, or as an adjunct to thoracotomy performed for other indications. We do not perform SSRF prophylactically to prevent potential respiratory complications; therefore, its impact on the incidence of flail motion in this study is likely minimal. Fourth, this study did not include external validation; multicenter external validation of the proposed model is planned in future studies. Fifth, our AI model was developed exclusively with a tabular dataset comprising numerical variables. Incorporating multimodal data, such as medical imaging, electrocardiography, and textual information, may improve predictive performance. Sixth, the k-fold cross-validation performance was lower than that of the final test set, likely due to the limited training set size. Future investigations using larger datasets are warranted to address potential overfitting or underfitting. Seventh, the BPC 18, used as an input feature, is susceptible to variability in clinical interpretation, potentially affecting interobserver reliability. Further study is needed to address this issue.
- Conclusions
- We anticipate that our AI model could serve as a novel predictive tool for patients with chest trauma. Given the complexity and ambiguity inherent in multiple-trauma cases, our model may offer a valuable decision support tool for clinicians. Moreover, the model’s explainability is expected to enhance its clinical utility. Future prospective studies and external validation with larger datasets are warranted to improve generalizability and robustness.
ARTICLE INFORMATION
-
Author contributions
Conceptualization: all authors; Data curation: all authors; Formal analysis: all authors; Funding acquisition: WSK; Methodology: all authors; Project administration: WSK; Visualization: all authors; Writing–original draft: all authors; Writing–review & editing: all authors. All authors read and approved the final manuscript.
-
Conflicts of interest
Wu Seong Kang is an editorial board member of this journal, but was not involved in the peer reviewer selection, evaluation, or decision process of this article. The authors have no other conflicts of interest to declare.
-
Funding
This research was supported by the Research Program of Korean Association for Research, Procedures on Education on Trauma (No. KARPET-2023-01).
-
Data availability
Data analyzed in this study are available from the corresponding author upon reasonable request.
Fig. 1. Receiver operating characteristic curves for machine learning and conventional models. AUC, area under the receiver operating characteristic curve; AdaBoost, adaptive boosting; XGBoost, extreme gradient boosting; LightGBM, light gradient boosting machine; DNN, deep neural network; TTSS, Thorax Trauma Severity Score; RFS, Rib Fracture Score; CTS, Chest Trauma Score; ISS, Injury Severity Score.
Fig. 2. Feature importance interpreted by Shapley Additive Explanations (SHAP) values, illustrating the distribution of each feature’s impact on the model output: (A) extreme gradient boosting (XGBoost), (B) random forest, (C) adaptive boosting (AdaBoost), and (D) light gradient boosting machine (LightGBM). Color indicates the feature value, with red representing high values and blue representing low values. ISS, Injury Severity Score; GCS, Glasgow Coma Scale; AIS, Abbreviated Injury Scale; PFR, PaO2 to fraction of inspired oxygen ratio; RFX, rib fracture; BPC 18, Blunt Pulmonary Contusion 18.
Fig. 3. Mean absolute Shapley Additive Explanations (SHAP) values for each feature, indicating overall feature importance across the models: (A) extreme gradient boosting (XGBoost), (B) random forest, (C) adaptive boosting (AdaBoost), and (D) light gradient boosting machine (LightGBM). ISS, Injury Severity Score; GCS, Glasgow Coma Scale; AIS, Abbreviated Injury Scale; PFR, PaO2 to fraction of inspired oxygen ratio; RFX, rib fracture; BPC 18, Blunt Pulmonary Contusion 18.
Table 1. Comparison of characteristics between patients without and with pulmonary complications (n=1,040)
| Characteristic | Pulmonary complication: No (n=852, 81.9%) | Pulmonary complication: Yes (n=188, 18.1%) | P-value |
|---|---|---|---|
| Sex | | | |
| Female | 227 (26.6) | 41 (21.8) | 0.201 |
| Male | 625 (73.4) | 147 (78.2) | |
| Age (yr) | 57 (43–69) | 66 (52–75) | <0.001 |
| Body mass index (kg/m2) | 23.7 (21.6–26.2) | 23.3 (20.8–26.0) | 0.176 |
| PaO2 to FiO2 ratio | 327.4 (256.9–392.3) | 248.8 (171.5–329.3) | <0.001 |
| GCS score on admission | 15 (14–15) | 14 (8–15) | <0.001 |
| 15 (Normal) | 604 (70.9) | 74 (39.4) | <0.001 |
| 14 (Mild) | 107 (12.6) | 26 (13.8) | 0.725 |
| 10–13 (Moderate) | 80 (9.4) | 33 (17.6) | 0.002 |
| <10 (Severe) | 61 (7.2) | 55 (29.2) | <0.001 |
| Hospital length of stay (day) | 12.0 (6.0–24.8) | 35.0 (19.5–66.0) | <0.001 |
| ICU length of stay (day) | 0.8 (0–2.7) | 11.3 (4.9–22.3) | <0.001 |
| MV use (min) | 0 (0–0) | 5,642 (0–15,584) | <0.001 |
| Trauma scoring system | | | |
| Abbreviated Injury Scale | | | |
| Head | 0 (0–2) | 2 (0–3) | <0.001 |
| Face | 0 (0–0) | 0 (0–1) | <0.001 |
| Chest | 3 (3–3) | 3 (3–3) | <0.001 |
| Abdomen | 0 (0–2) | 0 (0–2) | 0.004 |
| Extremities | 0 (0–2) | 2 (0–3) | <0.001 |
| External | 1 (0–1) | 1 (0–1) | 0.023 |
| Injury Severity Score | 14 (10–22) | 26.5 (17–34) | <0.001 |
| Chest trauma scoring system | | | |
| Thoracic Trauma Severity Score | 8 (6–11) | 11 (9–15) | <0.001 |
| Rib Fracture Score | 5 (3–8) | 8 (6–14) | <0.001 |
| Chest Trauma Score | 5 (4–6) | 6 (5–8) | <0.001 |
| RibScore | 0 (0–2) | 2 (0–3) | <0.001 |
| Pneumothorax | 421 (49.4) | 109 (58.0) | 0.041 |
| Hemothorax | 409 (48.0) | 106 (56.4) | 0.046 |
| BPC 18 score | 1 (0–2) | 2 (0–5) | <0.001 |
| Rib fracture pattern | | | |
| Total no. of rib fractures | 4 (2–6) | 5 (3.5–9) | <0.001 |
| Grade 2 | 0 (0–1) | 0 (0–1) | 0.007 |
| Grade 3 | 0 (0–1) | 1 (0–3) | <0.001 |
| Total no. of segmented rib fractures | 0 (0–2) | 1 (0–4) | <0.001 |
| Grade 2 | 0 (0–0) | 0 (0–0) | 0.004 |
| Grade 3 | 0 (0–0) | 0 (0–1) | <0.001 |
| Bilateral | 112 (13.1) | 50 (26.6) | <0.001 |
| Flail segment | 176 (20.7) | 74 (39.4) | <0.001 |
| Flail motion | 27 (3.2) | 41 (21.8) | <0.001 |
| Anterior flail motion | 4 (0.5) | 11 (5.9) | <0.001 |
| Primary fracture line location | | | |
| Anterior | 146 (17.1) | 37 (19.7) | 0.469 |
| Lateral | 285 (33.5) | 85 (45.2) | 0.003 |
| Posterior | 273 (32.0) | 63 (33.5) | 0.761 |
| Surgical stabilization of rib fracture | 21 (2.5) | 21 (11.2) | <0.001 |
| Complication | | | |
| Pneumonia | 0 | 162 (86.2) | <0.001 |
| Other pulmonary complication | 0 | 41 (21.8) | <0.001 |
| MV use >48 hr | 54 (6.3) | 109 (58.0) | <0.001 |
| Overall complication | 72 (8.5) | 188 (100) | <0.001 |
Table 2. Threefold cross-validation results of machine learning models
| Model | Sensitivity | Precision | Specificity | F1 score | Accuracy | Balanced accuracy | AUC |
|---|---|---|---|---|---|---|---|
| XGBoost | 0.420 | 0.426 | 0.875 | 0.422 | 0.793 | 0.648 | 0.783 |
| Random forest | 0.526 | 0.463 | 0.865 | 0.491 | 0.804 | 0.696 | 0.786 |
| AdaBoost | 0.460 | 0.448 | 0.874 | 0.453 | 0.799 | 0.667 | 0.785 |
| LightGBM | 0.433 | 0.435 | 0.875 | 0.433 | 0.796 | 0.654 | 0.782 |
| DNN | 0.707 | 0.311 | 0.654 | 0.431 | 0.663 | 0.680 | 0.731 |
Table 3. Final performance of machine learning and conventional models on the testing dataset (n=208)
| Model | Threshold | Sensitivity | Precision | Specificity | F1 score | Accuracy | Balanced accuracy | AUC |
|---|---|---|---|---|---|---|---|---|
| Machine learning model | | | | | | | | |
| XGBoost | 0.500 | 0.526 | 0.513 | 0.888 | 0.520 | 0.822 | 0.707 | 0.856 |
| Random forest | 0.500 | 0.605 | 0.500 | 0.865 | 0.548 | 0.817 | 0.735 | 0.855 |
| AdaBoost | 0.500 | 0.763 | 0.453 | 0.794 | 0.569 | 0.789 | 0.779 | 0.834 |
| LightGBM | 0.500 | 0.579 | 0.537 | 0.888 | 0.557 | 0.832 | 0.734 | 0.846 |
| DNN | 0.500 | 0.816 | 0.341 | 0.647 | 0.481 | 0.678 | 0.731 | 0.809 |
| Conventional model | | | | | | | | |
| TTSS | 12 | 0.579 | 0.386 | 0.794 | 0.463 | 0.755 | 0.687 | 0.766 |
| RFS | 9 | 0.658 | 0.385 | 0.765 | 0.485 | 0.745 | 0.711 | 0.747 |
| CTS | 6 | 0.763 | 0.326 | 0.647 | 0.457 | 0.668 | 0.705 | 0.738 |
| RibScore | 2 | 0.658 | 0.325 | 0.694 | 0.435 | 0.688 | 0.676 | 0.730 |
| ISS | 22 | 0.684 | 0.347 | 0.712 | 0.460 | 0.707 | 0.698 | 0.739 |
REFERENCES
- 1. Brewer JM, Karsmarski OP, Fridling J, et al. Chest wall injury fracture patterns are associated with different mechanisms of injury: a retrospective review study in the United States. J Trauma Inj 2024;37:48–59.
- 2. Seok J, Yoon SY, Lee JY, Kim S, Cho H, Kang WS. Novel nomogram for predicting pulmonary complications in patients with blunt chest trauma with rib fractures: a retrospective cohort study. Sci Rep 2023;13:9448.
- 3. Chien CY, Chen YH, Han ST, Blaney GN, Huang TS, Chen KF. The number of displaced rib fractures is more predictive for complications in chest trauma patients. Scand J Trauma Resusc Emerg Med 2017;25:19.
- 4. Kang WS, Chung H, Ko H, et al. Artificial intelligence to predict in-hospital mortality using novel anatomical injury score. Sci Rep 2021;11:23534.
- 5. Lee S, Kang WS, Seo S, et al. Model for predicting in-hospital mortality of physical trauma patients using artificial intelligence techniques: nationwide population-based study in Korea. J Med Internet Res 2022;24:e43757.
- 6. Lee S, Kang WS, Kim DW, et al. An artificial intelligence model for predicting trauma mortality among emergency department patients in South Korea: retrospective cohort study. J Med Internet Res 2023;25:e49283.
- 7. Lee S, Kim DW, Oh NE, et al. External validation of an artificial intelligence model using clinical variables, including ICD-10 codes, for predicting in-hospital mortality among trauma patients: a multicenter retrospective cohort study. Sci Rep 2025;15:1100.
- 8. Seok J, Jeong ST, Yoon SY, et al. Novel nomogram for predicting paradoxical chest wall movement in patients with flail segment of traumatic rib fracture: a retrospective cohort study. Sci Rep 2023;13:20251.
- 9. Kim H, Yoon SY, Han J, Seok J, Kang WS. Non-completely displaced traumatic rib fractures: potentially less crucial for pulmonary adverse outcomes, regardless of classification. Medicina (Kaunas) 2025;61:81.
- 10. Collins GS, Moons KG, Dhiman P, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024;385:e078378.
- 11. de Moya MA, Manolakaki D, Chang Y, et al. Blunt pulmonary contusion: admission computed tomography scan predicts mechanical ventilation. J Trauma 2011;71:1543–7.
- 12. Pape HC, Remmers D, Rice J, Ebisch M, Krettek C, Tscherne H. Appraisal of early evaluation of blunt chest trauma: development of a standardized scoring system for initial clinical decision making. J Trauma 2000;49:496–504.
- 13. Maxwell CA, Mion LC, Dietrich MS. Hospitalized injured older adults: clinical utility of a rib fracture scoring system. J Trauma Nurs 2012;19:168–74.
- 14. Chen J, Jeremitsky E, Philp F, Fry W, Smith RS. A chest trauma scoring system to predict outcomes. Surgery 2014;156:988–93.
- 15. Chapman BC, Herbert B, Rodil M, et al. RibScore: a novel radiographic score based on fracture pattern that predicts pneumonia, respiratory failure, and tracheostomy. J Trauma Acute Care Surg 2016;80:95–101.
- 16. Seok J, Cho HM, Kim HH, et al. Chest trauma scoring systems for predicting respiratory complications in isolated rib fracture. J Surg Res 2019;244:84–90.
- 17. Edwards JG, Clarke P, Pieracci FM, et al. Taxonomy of multiple rib fractures: results of the chest wall injury society international consensus survey. J Trauma Acute Care Surg 2020;88:e40–5.
- 18. Pharaon KS, Marasco S, Mayberry J. Rib fractures, flail chest, and pulmonary contusion. Curr Trauma Rep 2015;1:237–42.
- 19. Baiu I, Spain D. Rib fractures. JAMA 2019;321:1836.
- 20. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321–57.
- 21. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: KDD 2016: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016 Aug 13–17; San Francisco, CA, USA. Association for Computing Machinery; 2016. p. 785–94.
- 22. Ke G, Meng Q, Finley T, et al. LightGBM: a highly efficient gradient boosting decision tree. In: Guyon I, Von Luxburg U, Bengio S, et al., editors. Advances in Neural Information Processing Systems 30 (NIPS 2017); 31st Annual Conference on Neural Information Processing Systems (NIPS); 2017 Dec 4–9; Long Beach, CA, USA.
- 23. Breiman L. Random forests. Mach Learn 2001;45:5–32.
- 24. Mathanker SK, Weckler PR, Bowser TJ, Wang N, Maness NO. AdaBoost classifiers for pecan defect classification. Comput Electron Agric 2011;77:60–8.
- 25. Lundberg SM, Erion G, Chen H, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell 2020;2:56–67.
- 26. Husarek J, Hess S, Razaeian S, et al. Artificial intelligence in commercial fracture detection products: a systematic review and meta-analysis of diagnostic test accuracy. Sci Rep 2024;14:23053.
- 27. Lopez-Melia M, Magnin V, Marchand-Maillet S, Grabherr S. Deep learning for acute rib fracture detection in CT data: a systematic review and meta-analysis. Br J Radiol 2024;97:535–43.
- 28. Choi J, Alawa J, Tennakoon L, Forrester JD. DeepBackRib: deep learning to understand factors associated with readmissions after rib fractures. J Trauma Acute Care Surg 2022;93:757–61.
- 29. Liu K, Qian D, Zhang D, et al. A risk prediction model for venous thromboembolism in hospitalized patients with thoracic trauma: a machine learning, national multicenter retrospective study. World J Emerg Surg 2025;20:14.
- 30. Song X, Li H, Chen Q, et al. Predicting pneumonia during hospitalization in flail chest patients using machine learning approaches. Front Surg 2023;9:1060691.
- 31. Grinsztajn L, Oyallon E, Varoquaux G. Why do tree-based models still outperform deep learning on typical tabular data? Adv Neural Inf Process Syst 2022;35:507–20.
- 32. Yıldız AY, Kalayci A. Gradient boosting decision trees on medical diagnosis over tabular data [Preprint]. Posted 2024 Sep 25. arXiv:2410.03705. https://doi.org/10.48550/arXiv.2410.03705
- 33. Shwartz-Ziv R, Armon A. Tabular data: deep learning is not all you need. Inf Fusion 2022;81:84–90.