Title Necrotizing Enterocolitis Classification Using Unsupervised Machine Learning: A Multinational Retrospective Study
Author Sung-Hoon Chung, Yong Sung Choi
Background Necrotizing enterocolitis (NEC) is a critical neonatal intestinal disease, yet its classification remains ambiguous. Previous research has primarily treated NEC as a single entity, and NEC is often conflated with other neonatal intestinal pathologies such as spontaneous intestinal perforation (SIP). The lack of a precise definition hampers both research and efforts to improve clinical outcomes. Recent advances in machine learning offer an opportunity to redefine and classify neonatal intestinal diseases more accurately.
Aim / Hypothesis This study aims to classify neonatal intestinal injuries, including NEC, using unsupervised machine learning applied to data from multiple countries, in order to improve the accuracy and generalizability of the findings. We hypothesize that unsupervised machine learning can effectively classify neonatal intestinal injuries into distinct categories, facilitating better diagnosis, management, and research.
Inclusion Criteria 1) Infants admitted to participating neonatal intensive care units (NICUs) between 2013 and 2022. 2) A diagnosis of NEC or SIP, identified by ICD code.
Exclusion Criteria 1) Infants with congenital anomalies or inborn errors of metabolism. 2) Infants with incomplete medical records.
Study Design / Statistical methods This is a multicenter retrospective study involving hospitals in South Korea, the United States, and Canada. Data from the Korean Neonatal Network (KNN) will be used for the Korean cohort. Data collection will include a comprehensive review of maternal and infant medical records, capturing variables related to prenatal, perinatal, and postnatal factors, nutritional status, medications, and surgical outcomes. Data will be standardized to ensure comparability across sites. Importantly, raw data will not be shared among the research teams in South Korea, the United States, and Canada. Instead, each team will analyze its own country's data independently. The results will then be compared and interpreted jointly; any pooled analysis will be performed on aggregated results rather than on exchanged raw data. Hierarchical clustering with complete linkage, using the hclust function in R (version 4.2.1), will be used to identify distinct clusters of intestinal injuries. Differences in each clustering variable between clusters will be tested for significance using Wilcoxon tests.
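To make the complete linkage rule concrete, a minimal pure-Python sketch follows. It is an illustration under stated assumptions, not the study's analysis code: it assumes Euclidean distances between standardized feature vectors (the default of R's dist()), and the function names and toy data are hypothetical. In R the corresponding steps would be hclust(dist(x), method = "complete") followed by cutree() to cut the tree into a chosen number of clusters. The sketch starts with each infant as its own cluster and repeatedly merges the pair of clusters whose farthest-apart members are closest.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points, n_clusters):
    """Agglomerative clustering with complete linkage (hypothetical helper).

    Starts with one cluster per point and repeatedly merges the two clusters
    whose farthest-apart members are closest, until n_clusters remain.
    Returns (labels, merges): labels[i] is the cluster of points[i], and
    merges lists (members, height) for each merge, i.e. the information a
    dendrogram draws.
    """
    n = len(points)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and len(points)")

    # Pairwise distances between individual points, computed once.
    dist = {(i, j): euclidean(points[i], points[j])
            for i, j in combinations(range(n), 2)}

    def cluster_distance(c1, c2):
        # Complete linkage: the largest distance between any two members.
        return max(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)

    clusters = [[i] for i in range(n)]
    merges = []
    while len(clusters) > n_clusters:
        # Closest pair of clusters; combinations() always yields a < b.
        height, a, b = min(
            (cluster_distance(clusters[a], clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        merged = clusters[a] + clusters[b]
        merges.append((sorted(merged), height))
        clusters[a] = merged
        del clusters[b]  # b > a, so index a is unaffected

    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels, merges


# Hypothetical standardized features for six infants (two obvious groups).
toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
       [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
labels, merges = complete_linkage(toy, n_clusters=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The search over all cluster pairs is written for readability rather than speed. The number of clusters at which to cut the tree is not specified in this protocol; the value 2 above applies only to the toy data.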
Primary Outcomes 1) Identification and characterization of distinct clusters of neonatal intestinal injuries, including NEC. 2) Determination of the clinical, nutritional, and other factors associated with each cluster.
Secondary Outcomes and Definitions 1) Comparison of clinical outcomes, including mortality and morbidity, across the identified clusters. 2) Evaluation of the impact of clinical factors and medications on the development and progression of neonatal intestinal injuries. NEC: Defined using ICD codes and the clinical criteria of the participating hospitals. SIP: Defined as an isolated intestinal perforation with no evidence of NEC on surgical or histopathological examination.
Protocols
1. Data Collection:
   1) Standardized data extraction from medical records at all sites.
   2) Use of a unified data collection template to ensure consistency.
2. Data Standardization and Preprocessing:
   1) Conversion of categorical variables to numeric form.
   2) Standardization of continuous variables to a mean of zero and a standard deviation of one.
3. Machine Learning Analysis:
   1) Application of hierarchical clustering using the complete linkage method.
   2) Visualization of clusters with dendrograms.
   3) Statistical testing (Wilcoxon tests) to identify variables that differ significantly between clusters.
4. Ethical Considerations:
   1) Approval from the Institutional Review Board of each participating institution.
   2) Protection of patient confidentiality and data security.
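The preprocessing in step 2 and the between-cluster testing in step 3 can be illustrated with the following minimal pure-Python sketch. Several details are assumptions rather than specifications from this protocol: categorical variables are converted to numbers by one-hot (indicator) encoding; continuous variables are z-scored with the sample standard deviation, as R's scale() does; and each variable is compared between two clusters at a time using the rank-sum (Mann-Whitney) form of the Wilcoxon test with a normal approximation and tie and continuity corrections, which corresponds to wilcox.test(x, y, exact = FALSE) in R. The function names and example values are hypothetical.

```python
import math
import statistics
from collections import Counter


def one_hot(values):
    """Encode a categorical column as one 0/1 indicator column per level."""
    return {level: [1.0 if v == level else 0.0 for v in values]
            for level in sorted(set(values))}


def standardize(column):
    """Rescale to mean 0 and SD 1, using the sample SD (n - 1) as R's scale() does."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    if sd == 0:
        return [0.0] * len(column)  # a constant column carries no information
    return [(x - mean) / sd for x in column]


def midranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test, normal approximation with tie and
    continuity corrections. Returns (W, p), where W is the rank sum of x
    minus nx(nx + 1)/2."""
    nx, ny = len(x), len(y)
    if nx == 0 or ny == 0:
        raise ValueError("both groups need at least one observation")
    n = nx + ny
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    w = sum(ranks[:nx]) - nx * (nx + 1) / 2
    diff = w - nx * ny / 2  # deviation from the expected W under H0
    # Variance of W, reduced for ties (sum of t^3 - t over tie groups).
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var = nx * ny / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return w, 1.0
    correction = 0.5 if diff > 0 else (-0.5 if diff < 0 else 0.0)
    z = (diff - correction) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, min(1.0, p)


# Hypothetical example: one standardized variable in two clusters.
w, p = rank_sum_test([1, 2, 3], [4, 5, 6])
print(w, round(p, 3))  # 0.0 0.081
```

When more than two clusters are compared, the test would be repeated for each pair of clusters (or replaced by a multi-group test), and the p-values would typically need adjustment for multiple comparisons; the protocol does not state which approach will be used.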
Funding None