
A new method for identifying the acute respiratory distress syndrome disease based on noninvasive physiological parameters

  • Pengcheng Yang,

    Roles Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Institute of Medical Support, Academy of Military Sciences, Tianjin, China

  • Taihu Wu,

    Roles Methodology, Project administration

    Affiliation Institute of Medical Support, Academy of Military Sciences, Tianjin, China

  • Ming Yu,

    Roles Methodology, Project administration, Resources

    Affiliation Institute of Medical Support, Academy of Military Sciences, Tianjin, China

  • Feng Chen,

    Roles Methodology, Project administration

    Affiliation Institute of Medical Support, Academy of Military Sciences, Tianjin, China

  • Chunchen Wang,

    Roles Software, Validation

    Affiliation Department of Aerospace Medicine, Air Force Military Medical University, Xi’an, China

  • Jing Yuan,

    Roles Methodology, Supervision, Validation, Visualization

    Affiliation Institute of Medical Support, Academy of Military Sciences, Tianjin, China

  • Jiameng Xu,

    Roles Software, Validation

    Affiliation Institute of Medical Support, Academy of Military Sciences, Tianjin, China

  • Guang Zhang

    Roles Conceptualization, Formal analysis, Supervision, Writing – review & editing

    zhangguang01@hotmail.com

    Affiliation Institute of Medical Support, Academy of Military Sciences, Tianjin, China

Abstract

Early diagnosis and prevention play a crucial role in the treatment of patients with acute respiratory distress syndrome (ARDS). The definition of ARDS requires an arterial blood gas to determine the ratio of partial pressure of arterial oxygen to fraction of inspired oxygen (PaO2/FiO2 ratio). However, many patients with ARDS do not have a blood gas measured, which may result in under-diagnosis of the condition. Using data from the MIMIC-III database, we propose an algorithm based on noninvasive physiological parameters to estimate P/F levels to aid in the diagnosis of ARDS. Machine learning algorithms were combined with a filter feature selection method to study the correlation of various noninvasive parameters with the identification of ARDS. Cross-validation techniques were used to verify the performance of the algorithms for different feature subsets. XGBoost using the optimal feature subset had the best identification performance, with a sensitivity of 84.03%, a specificity of 87.75%, and an AUC of 0.9128. For all four machine learning algorithms, the AUC remained above 0.8 even after a certain number of features were removed. Compared to the Rice linear model, this method offers higher reliability and allows continuous monitoring of the development of patients with ARDS.

1. Introduction

Acute respiratory distress syndrome (ARDS) is a disease that seriously threatens human health and life[1,2]. According to relevant epidemiological investigations, the in-hospital mortality rate of ARDS is as high as 40%[3,4]. Currently, the diagnosis of ARDS is mainly based on the Berlin definition[5], introduced in 2012, which allows a clear diagnosis of ARDS: when positive end-expiratory pressure (PEEP) ≥ 5 cmH2O, ARDS can be classified into three states of increasing severity according to the level of the oxygenation index (P/F), namely, mild (200 < arterial oxygen partial pressure (PaO2)/fraction of inspired oxygen (FiO2) (P/F) ≤ 300), moderate (100 < P/F ≤ 200), and severe (P/F ≤ 100). At present, blood gas analysis is mainly used to measure PaO2 and calculate the P/F value to evaluate the severity of ARDS. However, this method is limited by several defects[6]. Firstly, the calculation of the P/F value requires blood gas analyses; in the clinical use of arterial indwelling catheters, daily care is difficult, and the procedure is not easy to perform on some particular patients, such as newborns and elderly patients[7]. Secondly, arterial blood gas analyses cannot monitor the development of ARDS in real time, which prevents doctors from adopting appropriate respiratory therapy strategies and delays the diagnosis and treatment of patients[8].

In recent years, in response to the problems encountered in conducting blood gas analyses, researchers attempted to use the noninvasive parameter pulse oximetric saturation (SpO2)/FiO2 (S/F) to estimate P/F, thereby achieving noninvasive identification of the severity of ARDS[9–11]. At this stage, the single SpO2 parameter was mainly used, and its applicability was limited to a specific SpO2 range (SpO2 ≤ 97%). A traditional linear regression algorithm[11] was used to construct the prediction model, but the identification performance of the model was not ideal[10,12–16], and it was challenging to provide accurate guidance for medical staff in the clinic[12,17].

Based on a review of the literature[9], we found that when a patient's condition changes, the patient's physiological parameters (such as heart rate, blood pressure, and respiratory rate) change to varying degrees, which motivated the approach taken in this study.

In response to the problems listed above, we extracted a variety of noninvasive physiological parameters from ICU patients and explored the relevance of these parameters for the identification of the level of P/F ratio. An algorithmic model for identifying ARDS disease based on a variety of noninvasive parameters was established to provide medical staff with the reference basis for disease diagnosis. This model uses a feature selection algorithm and a cross-validation model to evaluate the recognition effects of four machine learning algorithms using different subsets of feature values.

Herein, we used a variety of evaluation indicators to assess the ability of different algorithms and feature subsets to identify ARDS. To further investigate the performance of the machine learning algorithms, we used the same data to classify ARDS with a traditional linear regression model, and we discuss directions for further development of the various methods.

2. Materials

2.1 Data sources

Medical Information Mart for Intensive Care III (MIMIC-III, V1.4) is a large, freely available database comprising de-identified health-related data associated with over forty thousand patients who stayed in the critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012[18]. The database includes information, such as demographics, vital sign measurements obtained at the bedside, laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality records.

2.2 Patients and data collection

The patient diagnostic information was recorded in the MIMIC–III database. In the patient screening process, we combined the diagnostic information provided by the database and the Berlin definition to determine whether the enrolled patient was suffering from ARDS, thus ensuring the accuracy of the disease diagnosis. In combination with the Berlin definition and the disease diagnosis, we propose the following conditions: 1) determine whether the patient has a P/F < 300 on the first day of entering the ICU, 2) determine whether the patient underwent chest imaging during his/her presence in the ICU and whether the imaging report was verified, 3) formulate a comprehensive judgment based on the patient's disease diagnosis information.

Combined with the above, we propose the corresponding patient selection criteria:

  1. Choose the patients who first entered the ICU (if the patients entered the ICU multiple times, it may be likely that the patient conditions were more complicated and may have affected the identification result. In this study, we only used the data from the patients who entered the ICU for the first time)
  2. The patient is older than 16 years old
  3. The patient stayed in the ICU for more than 48 h
  4. Mechanical ventilation was used during the presence of the patient in the ICU
  5. P/F < 300 on the first day

This study extracted a variety of noninvasive physiological parameters of patients: demographics (age, gender, height, weight, body mass index (BMI), ethnicity), ICU information (ICU type, length of stay in ICU, admission type, in-hospital mortality), clinical measures (SpO2, temperature, heart rate, blood pressure, Glasgow Coma Scale (GCS)), respiratory system (respiratory rate, tidal volume, minute ventilation volume, peak pressure, plateau pressure, mean air pressure, PEEP, FiO2), and oxygenation index (P/F, S/F, Oxygenation Index (OI), Oxygenation Saturation Index (OSI)).

This study paid particular attention to the noninvasive physiological parameters of patients. Based on an extensive review of the literature combined with the parameters actually recorded for patients in the database, the following noninvasive physiological parameters were finally used in the identification algorithm: SpO2, temperature, heart rate, blood pressure, GCS, respiratory rate, tidal volume, minute ventilation volume, peak pressure, plateau pressure, mean air pressure, PEEP, FiO2, S/F, OSI, and demographics (age, gender, BMI). These parameters were converted into 24 features for model training. The main purpose of this study was to identify ARDS by monitoring P/F values through a variety of noninvasive parameters. We used P/F as the outcome variable, with P/F ≤ 300 data points as positive samples and P/F > 300 data points as negative samples.

In the process of extracting the physiological parameters of patients from the database, we also needed to extract the blood gas analysis outcomes obtained at a specific test time to ensure the accuracy of the identified results. However, this caused considerable data losses. To mitigate this problem, we allowed data recorded within two hours of each blood gas analysis to substitute for measurements at the exact test time.
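The substitution rule above can be sketched as follows (an illustrative Python helper; the record layout, the function name, and the symmetric two-hour window are assumptions, not the study's actual extraction code):

```python
from datetime import datetime, timedelta

def nearest_within_window(target_time, records, window_hours=2):
    """Return the (time, value) record closest to target_time and within
    window_hours of it, else None. `records` is a hypothetical list of
    (datetime, value) pairs; this sketches the substitution rule only."""
    window = timedelta(hours=window_hours)
    candidates = [(t, v) for t, v in records if abs(target_time - t) <= window]
    if not candidates:
        return None
    return min(candidates, key=lambda tv: abs(target_time - tv[0]))

# pick the noninvasive reading closest to a blood-gas draw at 12:00
abg_time = datetime(2020, 1, 1, 12, 0)
bp = [(datetime(2020, 1, 1, 9, 30), 118),   # 2.5 h away: outside the window
      (datetime(2020, 1, 1, 11, 15), 121)]  # 45 min away: usable
print(nearest_within_window(abg_time, bp))
```

If no reading falls inside the window, the helper returns None and the data point would be dropped, mirroring the data-loss trade-off described above.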

3. Methods

This section provides an overview of the adopted methods, visually summarized in Fig 1. The dataset for this study was drawn from the MIMIC-III database. After preprocessing, the data were divided into a training set (75%) and a test set (25%). In the model training process, cross-validation on the training set was used to evaluate the identification performance of different feature subsets and algorithms; the test set was then used to verify the model and compare it with the traditional algorithm.

Fig 1. Overview of the model design.

Summary of the overall experimental process. The raw data in the MIMIC-III database were preprocessed and randomly split: 75% of the data were used for model training and the remaining 25% for model testing, with comparative experiments conducted to obtain the final experimental results.

https://doi.org/10.1371/journal.pone.0226962.g001

3.1 Preprocessing

3.1.1 Handling missing values.

In the process of collecting physiological parameters from patients, we found that some parameters, such as noninvasive blood pressure, were recorded at a lower frequency, resulting in the absence of physiological data recordings at the time of the blood gas analyses. Fortunately, the patients' invasive blood pressure was continuously monitored and could provide data when noninvasive blood pressure data were lacking[19]. Therefore, a random forest was used to impute the missing noninvasive blood pressure values, and the other physiological parameters were imputed using k-nearest neighbors (k-NN)[20].
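A minimal sketch of k-NN imputation in Python/NumPy (illustrative only, not the authors' implementation; in practice a library routine such as scikit-learn's KNNImputer would be used):

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill each NaN with the mean of that column over the k nearest rows,
    where distance is computed on the columns observed in both rows."""
    X = np.asarray(X, dtype=float).copy()
    for i in range(len(X)):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        dists = []
        for r in range(len(X)):
            if r == i or np.isnan(X[r][miss]).any():
                continue                       # donor must have the value
            shared = ~miss & ~np.isnan(X[r])   # features present in both rows
            if shared.any():
                d = np.sqrt(np.mean((X[i, shared] - X[r, shared]) ** 2))
                dists.append((d, r))
        nearest = [r for _, r in sorted(dists)[:k]]
        for j in np.where(miss)[0]:
            X[i, j] = X[nearest, j].mean()
    return X

data = np.array([[1.0, 2.0, 3.0],
                 [1.1, np.nan, 3.1],
                 [9.0, 8.0, 7.0]])
print(knn_impute(data, k=1)[1, 1])  # -> 2.0 (row 0 is the nearest donor)
```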

3.1.2 Oversampling and normalization.

In the data preprocessing, we found that using P/F ≤ 300 to divide the dataset into positive and negative samples resulted in class imbalance. Training a machine learning algorithm on unbalanced data biases it toward the larger class, weakening its generalization ability and affecting the overall performance of the model. For these reasons, we used oversampling to address the imbalance[21]. Current oversampling methods include random oversampling, the Synthetic Minority Oversampling Technique (SMOTE)[22], and the Adaptive Synthetic (ADASYN) sampling approach[23]. Random oversampling addresses imbalance by randomly resampling the under-represented classes. SMOTE uses the similarity of under-represented samples in the feature space to generate new samples. ADASYN generates different numbers of new samples for different under-represented samples based on the data distribution; it is an extension of SMOTE, but in practice it tends to focus on outliers. Based on the above analyses, we used SMOTE to address the sample imbalance.
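The core interpolation step of SMOTE can be sketched as follows (an illustrative NumPy implementation; a study like this would more likely rely on a library such as imbalanced-learn):

```python
import numpy as np

def smote(minority, n_new, k=2, rng=None):
    """Generate n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        d = np.linalg.norm(minority - minority[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                    # random position on the segment
        synth.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synth)

pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
new = smote(pos, n_new=4, rng=0)
print(new.shape)  # (4, 2)
```

Because each synthetic point is a convex combination of two real minority samples, it always lies inside the minority class's feature-space envelope.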

This study used a variety of physiological patient parameters, each of which was associated with a different range of values. For most machine learning algorithms (such as neural networks), this situation results in slow learning and makes the algorithm more likely to converge to a local optimum, thereby affecting the training outcome. Therefore, it was necessary to normalize feature values of different orders of magnitude to the same range [a, b]. We used feature scaling to standardize the data in accordance with Eq (1):

(1) X′ = a + (X − Xmin)(b − a) / (Xmax − Xmin)

Where Xmin represents the minimum and Xmax the maximum value of an attribute. The motivation to use feature scaling was its robustness to very small standard deviations of features and the preservation of zero entries in sparse data.
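Eq (1) translates directly into code (an illustrative NumPy sketch, here with the default range [0, 1]):

```python
import numpy as np

def min_max_scale(X, a=0.0, b=1.0):
    """Rescale each column of X onto the range [a, b] (Eq 1)."""
    X = np.asarray(X, dtype=float)
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    return a + (X - xmin) * (b - a) / (xmax - xmin)

# e.g. heart rate and systolic pressure columns on very different scales
X = np.array([[60.0, 90.0], [80.0, 110.0], [100.0, 130.0]])
print(min_max_scale(X))  # each column mapped onto [0, 1]
```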

3.2 Feature selection

This study extracted a variety of patient information and physiological parameters corresponding to 24 features, but it was unclear which features correlate strongly with the identification of ARDS. Moreover, the performance of a supervised learning algorithm depends on the number of input features and on the correlation between the features and the outcome variable. The purpose of feature selection is to identify a subset of features that optimizes algorithmic performance relative to the original feature set. There are three types of feature selection algorithms: filter, wrapper, and embedded[24].

The filter method first selects features from the dataset and then trains the classifier, so the feature selection process is independent of the subsequent classifier. In contrast, wrapper feature selection directly uses the performance of the classifier as the evaluation criterion for a feature subset; when many features exist, its computational overhead is usually much larger than the filter's. The embedded method combines the evaluation of feature importance with the model algorithm, increasing the correlation between the feature selection results and the evaluation algorithm; in most cases, the selected features are not transferable to other algorithms[25].

The filter method requires few calculations, and its feature selection result does not depend on the classification algorithm[24]. Filter methods generally evaluate the importance of features in three ways: distance, dependency, and information. On this basis, we selected one representative method for each aspect: Relief-F, chi-squared, and mutual information.

3.2.1 Relief-F.

The key idea of the Relief-F algorithm is to estimate the quality of attributes according to how well their values distinguish among instances that are close to each other[26]. For a sample x belonging to class k, Relief-F first searches for the nearest neighbors of x within class k (the near-hits H_i) and then for the nearest neighbors of x in each class c ≠ k (the near-misses M_i(c)). The quality estimate W_j of attribute j is updated in accordance with Eq (2):

(2) W_j ← W_j − Σ_{i=1..n} diff(j, x, H_i)/(m·n) + Σ_{c≠k} [P(c)/(1 − P(k))] Σ_{i=1..n} diff(j, x, M_i(c))/(m·n)

Where m is the number of sampled instances, n is the number of nearest neighbors, P(c) is the prior probability of class c, and diff(j, ·, ·) measures the difference between two instances on attribute j.

3.2.2 Chi-squared.

The chi-squared method is based on the χ2 statistic and consists of two phases. The first phase begins with a high significance level for the discretization of all numeric attributes, each attribute being sorted according to its values[27]. The following steps are then performed: 1) calculate the χ2 value in accordance with Eq (3) for every pair of adjacent intervals, and 2) merge the pair of adjacent intervals with the lowest χ2 value:

(3) χ2 = Σ_{i=1..2} Σ_{j=1..k} (A_ij − E_ij)2 / E_ij

Where k is the number of classes, A_ij denotes the number of patterns in the ith interval and jth class, and E_ij is the expected frequency of A_ij.

3.2.3 Mutual information.

Mutual information-based feature selection (MIFS) measures the amount of information shared between two variables[28]. The mutual information I(X;Y) between two variables X and Y is expressed as

(4) I(X;Y) = Σ_x Σ_y p(x, y) log [ p(x, y) / (p(x) p(y)) ]
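For discrete variables, Eq (4) can be computed directly from empirical frequencies (an illustrative pure-Python sketch):

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """I(X;Y) = sum over (x, y) of p(x,y) * log(p(x,y) / (p(x) p(y))),
    estimated from paired discrete observations (Eq 4), in nats."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))          # joint counts
    px, py = Counter(xs), Counter(ys)   # marginal counts
    return sum((c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# perfectly dependent binary variables share log(2) ≈ 0.693 nats
print(mutual_information([0, 0, 1, 1], [1, 1, 0, 0]))
```

Independent variables score zero, so higher values flag features that carry more information about the outcome variable.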

3.2.4 Rank aggregation.

To ensure the stability of feature selection, we used a combination of filter feature selection methods[25]. The results of the three algorithms did not fall in a uniform numerical range, which made it inconvenient to compare feature importance. In order to give the three methods equal weight, we normalized their results[29]. The rank aggregation method is formulated in Eq (5):

(5) R_j = Σ_i norm(F_i)_j

Where F_i is the score vector produced by the ith filter feature selection method, norm(·) denotes min-max normalization, and R_j is the final rank score for the jth feature.
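The aggregation in Eq (5) can be sketched as follows (the three score lists are illustrative values, not the study's actual results):

```python
import numpy as np

def aggregate_ranks(score_lists):
    """Min-max normalise each method's feature scores, then sum them per
    feature (Eq 5). A higher combined score means a more relevant feature."""
    combined = np.zeros(len(score_lists[0]))
    for scores in score_lists:
        s = np.asarray(scores, dtype=float)
        combined += (s - s.min()) / (s.max() - s.min())
    return combined

relief = [0.9, 0.2, 0.1]    # hypothetical scores for 3 features
chi2   = [30.0, 5.0, 1.0]
mi     = [0.5, 0.4, 0.05]
print(aggregate_ranks([relief, chi2, mi]))  # feature 0 ranks first
```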

3.3 Classification algorithms

This study designed an algorithm that combined feature selection with multiple classification algorithms, used a 10-fold cross-validation model to train classifiers for different feature subsets, and selected the optimal combination of feature subset and classifier to achieve the identification of ARDS. This section presents an abridged description of the four classifiers selected for this study.

3.3.1 L2 regularized logistic regression (L2–LR).

In order to prevent overfitting of the classification algorithm, a regularization term was added to the traditional logistic regression cost function J(w,b). Since feature selection used an external filter method, this study used L2 regularization to avoid the weight sparsity induced by L1 regularization[30]. Furthermore, λ is the regularization parameter used to control the weights w:

(6) J(w, b) = −(1/m) Σ_{i=1..m} [ y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ]

(7) J_L2(w, b) = J(w, b) + (λ/2m) Σ_{j=1..n} w_j2

Where ŷ_i is the predicted probability defined through the cost of Eq (6), m is the number of samples, and n is the number of features.

3.3.2 Artificial neural network.

This study used a single-hidden-layer feedforward neural network (SLP-FNN). According to the number of features and outcome variables, the following network structure was designed: 24 neurons in the input layer, 23 in the hidden layer, and two in the output layer. In order to iterate and train the network quickly, we used a stochastic gradient descent algorithm with an adaptive learning rate to optimize the network parameters. Selecting the rectified linear unit (ReLU) as the activation function effectively prevents vanishing gradients. To prevent overfitting during network training, we used an L2 regularization term, on the same principle as described in subsection 3.3.1.

3.3.3 AdaBoost.

The AdaBoost algorithm is a two-class learning method in which the model is an additive model, the loss function is an exponential function, and the learning algorithm is a forward stagewise algorithm. The specific idea of AdaBoost is to increase the weights of samples misclassified by the previous round of weak classifiers and to reduce the weights of correctly classified samples[31]. As a result, incorrectly classified data receive more attention from the weak classifiers in later rounds owing to their increased weight. Herein, G_m(x) is a weak classifier, and α_m indicates the importance of G_m(x) in the final classifier; Eq (8) is a mathematical description of the forward stagewise algorithm, and Eq (9) is the final classifier constructed from Eq (8):

(8) f_m(x) = f_{m−1}(x) + α_m G_m(x)

(9) G(x) = sign( Σ_{m=1..M} α_m G_m(x) )

3.3.4 XGBoost.

XGBoost is a scalable machine learning system for tree boosting whose impact has been widely recognized in a number of machine learning and data mining challenges[32]. Its regularized objective is

(10) L(φ) = Σ_i l(ŷ_i, y_i) + Σ_k Ω(f_k)

(11) Ω(f) = γT + (1/2) λ ||w||2

Herein, l is a differentiable convex loss function that measures the difference between the prediction ŷ_i and the target y_i. The second term Ω(f) penalizes the complexity of the model (T is the number of leaves and w the vector of leaf weights); the additional regularization term helps to smooth the final learned weights to avoid overfitting. Moreover, γ and λ are the regularization parameters used to control the regularization terms.

3.3.5 Traditional noninvasive classification method.

Previous studies on the use of noninvasive parameters to identify ARDS focused on using the single parameter S/F to fit the P/F value. This study used the linear regression model proposed by Rice et al[11]. The model used adult SpO2 values (SpO2 < 97%) to fit the P/F values, thus enabling continuous monitoring of the patient's P/F value using noninvasive parameters. The Rice linear model is shown in Eq (12):

(12) S/F = 64 + 0.84 × (P/F)

The noninvasive parameter S/F is used to obtain the predicted P/F value according to Eq (12), so as to classify the severity of ARDS and obtain the classification result of the traditional algorithm.
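Rearranging Eq (12) gives the noninvasive P/F estimate, which can then be mapped onto the Berlin severity bands (an illustrative sketch; the function names are assumptions):

```python
def pf_from_sf(spo2, fio2):
    """Estimate P/F from the noninvasive S/F ratio via the Rice linear
    model: S/F = 64 + 0.84 * (P/F)  =>  P/F = (S/F - 64) / 0.84 (Eq 12).
    spo2 is in percent (model fitted for SpO2 < 97), fio2 is a fraction."""
    sf = spo2 / fio2
    return (sf - 64.0) / 0.84

def classify_ards(pf):
    """Berlin definition severity bands from the estimated P/F."""
    if pf <= 100:
        return "severe"
    if pf <= 200:
        return "moderate"
    if pf <= 300:
        return "mild"
    return "no ARDS"

pf = pf_from_sf(spo2=95, fio2=0.5)      # S/F = 190
print(round(pf, 1), classify_ards(pf))  # 150.0 moderate
```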

3.4 Performance metrics

According to the diagnostic definition of ARDS, P/F ≤ 300 indicates ARDS; by this standard, the samples were divided into positive and negative. Table 1 describes the relationship between the real category and the identification category.

Table 1. The relationship between real categories and recognition results.

https://doi.org/10.1371/journal.pone.0226962.t001

We measured the classification performance based on the average AUC, together with the accuracy (ACC), sensitivity (SEN), specificity (SPE), and balanced error rate (BER), as defined by Eqs (13)–(16), respectively:

(13) ACC = (TP + TN) / (TP + TN + FP + FN)

(14) SEN = TP / (TP + FN)

(15) SPE = TN / (TN + FP)

(16) BER = 1 − (SEN + SPE)/2

BER is a balanced metric that weights errors in SEN and SPE equally. We used the BER index to select the optimal feature subset based on a 10-fold cross-validation model[33]. For each algorithm, among the different feature subsets, the subset with the smallest mean BER was chosen as the optimal feature subset (the minimum-BER subset) of that algorithm[29]. The search algorithm for optimal feature subsets is summarized in Algorithm 1 (Fig 2). Based on its results, the minimum feature subset of each algorithm was found within one BER standard deviation of the optimal feature subset. Both of these cases were then compared against the use of all features to select the optimal identification result.
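The BER metric (Eq 16) and the one-standard-deviation subset rule can be sketched as follows (the BER values are illustrative; this is not the authors' MIN-BER-FS code):

```python
def balanced_error_rate(tp, fn, fp, tn):
    """BER = 1 - (SEN + SPE) / 2, weighting errors in both classes equally."""
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    return 1.0 - (sen + spe) / 2.0

def minimal_subset(mean_bers, std_bers):
    """Pick the optimal subset (lowest mean cross-validated BER), then the
    smallest subset whose mean BER lies within one standard deviation of
    that optimum. Index i corresponds to using the first i+1 features."""
    best = min(range(len(mean_bers)), key=lambda i: mean_bers[i])
    threshold = mean_bers[best] + std_bers[best]
    smallest = next(i for i in range(len(mean_bers))
                    if mean_bers[i] <= threshold)
    return best, smallest

# hypothetical mean/std BER for subsets of 1..5 features
means = [0.30, 0.22, 0.18, 0.17, 0.175]
stds  = [0.02, 0.02, 0.015, 0.012, 0.012]
print(minimal_subset(means, stds))  # (3, 2): optimal 4 features, minimal 3
```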

4. Results

We identified 8702 patients who met our inclusion criteria from a total of 46476 patients enrolled in the MIMIC–III database. Fig 3 is a flowchart outlining the patient selection and detailing the number of patients and the data selection process. There were 6601 patients (148414 data points) in the training set and 2101 patients (47352 data points) in the test set.

Fig 3. Flow diagram for patient selection.

According to the ARDS diagnostic criteria, the enrolled population was selected from more than 40,000 patients in the MIMIC-III database; 8702 eligible patients were finally included, and the dataset was randomly divided into training and test sets.

https://doi.org/10.1371/journal.pone.0226962.g003

The demographic and utilization characteristics are summarized in Tables 2 and 3. Table 2 summarizes the demographic information of the patients; the training set has a patient distribution consistent with the test set. In the training set, the patients were hospitalized in different intensive care units: CSRU (2231, 33.8%), MICU (1851, 28.4%), SICU (927, 14.04%), TSICU (904, 13.09%), and CCU (688, 10.42%); the average age of the patients was 65.14 years. The majority of the patients were male (58.64%), and the in-hospital mortality rate was 16.34%. Table 3 summarizes the distribution of the physiological parameters of patients in the training and test sets. As observed, there is a large difference between the positive samples (P/F ≤ 300) and the negative samples (P/F > 300) within the same dataset. The training and test sets were randomly partitioned and share a common distribution, with no significant difference between them.

Table 2. Patient demographics in training and test sets (ICU: Intensive Care Unit, CSRU: Cardiac Surgery Recovery Unit, MICU: Medical Intensive Care Unit, CCU: Coronary care unit, SICU: Surgical intensive care unit, TSICU: Trauma Surgical Intensive Care Unit).

https://doi.org/10.1371/journal.pone.0226962.t002

Table 3. Patient characteristics in training and test sets (Nisbp: Noninvasive systolic blood pressure, Nidbp: Noninvasive diastolic blood pressure, Nimbp: Noninvasive mean blood pressure, OI: (FiO2×Mean air pressure)/PaO2, OSI: (FiO2×Mean air pressure)/SpO2).

https://doi.org/10.1371/journal.pone.0226962.t003

4.1 Feature selection result

Table 4 presents the normalized scores produced by the three filter methods under consideration. The importance of the features in this study is relative to the level of the oxygenation index: the closer a score is to one, the more relevant the feature. The MIFS criterion indicated that many parameters were relevant, while Relief-F and chi-squared were more conservative, indicating that SpO2 and S/F were the more important and relevant features. The final feature ranking is also listed in Table 4; the combined score was calculated from Eq (5). According to the combined score, SpO2 is clearly more relevant than the remaining parameters, and S/F, FiO2, and PEEP are also likely to be highly relevant features.

Table 4. Physiological parameter scores and rankings for different feature selection methods.

https://doi.org/10.1371/journal.pone.0226962.t004

4.2 Algorithmic evaluation

Using the training dataset, the 10-fold cross-validation methods were used to evaluate the performance of the four algorithms. According to the feature ranking results in Table 4, the features were substituted into the four algorithms in turn, the BER of each algorithm was used to select the feature subset, and the algorithm effect was compared based on AUC. As shown in Fig 4, the BER of the four algorithms change as a function of the number of features, and the average BER results of the four classification algorithms are listed for different feature subsets. The gray area represents the standard deviation of the BER. The red triangle and green dot marks and their corresponding numbers represent the minimum feature and the optimal feature subsets, respectively. We found that the BER of the four algorithms decreased considerably when the first five features were added to the model, but as the features were added gradually, the BER decreased slowly.

Fig 4. Feature selection based on the four methods discussed in this study.

The X-axis is the feature number, the y-axis is the BER average of the ten-fold cross-validation, and the gray shaded area is the BER standard deviation of the ten-fold cross-validation under a specific feature subset. The figure shows the trend of BER changes of the four algorithms in the process of adding features step by step. The position of the green circle is the optimal feature subset of the algorithm, and the red triangle is the smallest feature subset.

https://doi.org/10.1371/journal.pone.0226962.g004

For SLP–FNN, L2–LR, and AdaBoost, the minimum BER was achieved using almost all features. Compared to these three algorithms, XGBoost achieved its minimum BER at the 13th feature; as more features were added, the BER tended to increase. We selected the smallest number of features for which the mean BER was within one standard error of the minimum BER (subset selection threshold). By this standard, we found the optimal and smallest feature subsets of the four algorithms: L2–LR (24, 18), SLP–FNN (24, 20), AdaBoost (23, 21), XGBoost (12, 6).

4.3 Performance of classification algorithms

4.3.1 Training dataset.

Based on the selected features, we obtained the minimum, optimal feature subset for the training set. We used training data for the minimum, optimal, and all feature subsets. Four classification algorithms were trained using 10-fold cross-validation. The results are shown in Table 5.

Table 5. Identification results of the four algorithms on the training set for different feature subsets.

https://doi.org/10.1371/journal.pone.0226962.t005

By comparing the results of the optimal and minimum feature subsets, we found that the minimum feature subset, determined by the minimum BER plus one standard deviation, used fewer features (discarding some data information) yet showed no significant decline in AUC. Fig 5 shows the AUC results for each algorithm under the different feature subsets.

Fig 5. AUC of the four tested algorithms on the training set for different feature subsets.

Based on the feature selection experiment, the training dataset was used to study the recognition performance of the four machine learning algorithms under different feature subsets.

https://doi.org/10.1371/journal.pone.0226962.g005

4.3.2 Test dataset.

The test set was completely independent of the data used for feature selection and model training. When the test set was used to evaluate the performance of the algorithms, we added a traditional noninvasive identification algorithm to compare it with the machine learning algorithms. The final results are shown in Table 6.

Table 6. Identification results of four algorithms on test sets for different feature subsets.

https://doi.org/10.1371/journal.pone.0226962.t006

The ROC curves of the five algorithms are shown in Fig 6. Based on the results, the overall performance of the traditional algorithm lags noticeably behind the machine learning algorithms: the AUC of the Rice linear model (0.7354) is much lower than that of L2–LR under the minimum feature subset (0.8156).

Fig 6. ROC curves of the applications of the four algorithms studied herein on the test dataset.

According to the experimental results on the training set, the four machine learning algorithms and the Rice linear model were evaluated on the minimum feature subset using the test set data, and the ROC curve of each algorithm was drawn.

https://doi.org/10.1371/journal.pone.0226962.g006

In this study, we analyzed the classification ability of the features, used filter methods to rank feature importance, and used the MIN–BER–FS algorithm to find the optimal and minimum feature subsets. In the algorithmic evaluation, we compared the experimental results of the minimum, optimal, and full feature subsets, which reflect the algorithms' ability to mine information from different aspects. The results of the feature traversal were also consistent with the feature-ranking experiment. Overall, XGBoost achieved the best results under the optimal subset and also outperformed the other three algorithms on the minimum feature subset (only the first six features). Observing the variation of the standard deviation during training, AdaBoost exhibited the best stability.

5. Discussion

In this study, a novel identification algorithm was presented that combined multiple noninvasive physiological parameters with machine learning algorithms to estimate P/F ratio levels. First, we used the MIMIC–III database to extract the SpO2, PaO2, and FiO2 values that are commonly used to identify ARDS, together with a variety of other noninvasive physiological parameters relevant to the patient. For feature selection, the filter method was chosen because its feature selection is independent of the subsequent models. We used several feature selection algorithms (Relief, chi-squared, MIFS) to rank the features and combined them with a rank aggregation method to obtain the final feature ranking[25]. In designing the ARDS identification algorithm, we used cross-validation to evaluate the average BER of the four algorithms (L2–LR, SLP–FNN, AdaBoost, XGBoost) on the optimal and minimum feature subsets derived from the feature ranking, thereby jointly considering the number of features and the identification results and allowing the most suitable combination to be chosen[29,33,34]. On the one hand, selecting the minimum number of features eliminates features that are insensitive to identification accuracy, simplifies the identification algorithm in actual use, and saves computation time; on the other hand, the accuracy of the identification algorithm cannot be sacrificed.

Regarding noninvasive identification of the severity of ARDS, most current research focuses on the relationship between S/F and P/F[9,10,12,35]. S/F and P/F do exhibit a strong correlation, but using S/F alone in a regression analysis leads to large errors in classifying ARDS severity[6], and the relationship holds only over a restricted SpO2 range (SpO2 < 97%)[10,11]. Some researchers have found that P/F is affected by other parameters, such as connection to a ventilator or modification of ventilator settings (PEEP, FiO2, minute ventilation volume, etc.)[9]. At the same time, when P/F changes, some physiological parameters of the patient (such as heart rate and respiratory rate) also change[7]. Based on these analyses, this study considered a variety of noninvasive physiological parameters obtained from patients. In designing the feature selection method, we did not rely on a single algorithm but chose several algorithms and integrated their results, to avoid the inaccuracy that any single ranking algorithm might introduce. We selected three representative methods based on distance, dependency, and information, and normalized the three feature ranking results to calculate the final ranking. Compared to any single method, the aggregated ranking is more stable and reliable.
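The aggregation step can be sketched by averaging the normalized rank vectors into a consensus ordering. This is a simplified illustration of generic rank aggregation, not necessarily the exact scheme used in the study, and the toy feature lists below are illustrative stand-ins for the Relief, chi-squared, and MIFS outputs:

```python
def aggregate_rankings(rankings):
    """Combine several feature rankings (each ordered from most to
    least important) by averaging normalized ranks in [0, 1];
    the lowest mean normalized rank is the most important overall."""
    n = len(rankings[0])
    scores = {}
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0.0) + position / (n - 1)
    return sorted(scores, key=scores.get)

# Toy rankings standing in for the three filter methods' outputs.
relief = ["SpO2", "S/F", "FiO2", "PEEP"]
chi2   = ["S/F", "SpO2", "PEEP", "FiO2"]
mifs   = ["SpO2", "FiO2", "S/F", "PEEP"]
print(aggregate_rankings([relief, chi2, mifs]))  # ['SpO2', 'S/F', 'FiO2', 'PEEP']
```

Averaging normalized ranks makes methods with different score scales directly comparable, which is why a normalization step precedes the aggregation.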

Table 5 shows the results on the training set. L2–LR achieved its minimum BER when all features were used, yielding an AUC of 0.8268. The neural network also reached its minimum BER with all features, and its recognition performance was slightly better than L2–LR, with an AUC of 0.8464. AdaBoost and XGBoost are both boosting tree algorithms, and their identification results were better than those of logistic regression and the single-hidden-layer neural network. AdaBoost reached its smallest BER with 23 features (AUC = 0.8694), and XGBoost with 12 features (AUC = 0.9282). Using the average minimum BER to find the minimum feature subset, we found that reducing the number of features to a certain extent did not affect the recognition performance of the algorithm. In this respect, the advantage of XGBoost is obvious: with only six features, its accuracy dropped by just 0.42%. Combined with Fig 4, we found that the first six features (SpO2, S/F, FiO2, PEEP, mean airway pressure, respiratory rate) contributed considerably to the identification algorithm, with a distinct decrease in BER; after further features were added, the BER decreased only gradually.

On the test set, we introduced a traditional linear regression algorithm[11] to benchmark the recognition performance of the classification algorithms. The performance of the four algorithms on the test set was largely consistent with the training-set results, indicating good generalization ability on this single-center independent dataset. The Rice Linear Model yielded an AUC of 0.7738 and an ACC of 70.67%, far from the corresponding results of the machine learning algorithms. According to the literature published by Rice et al. in 2007, the Rice Linear Model was derived under the premise of SpO2 < 97%[11]. This study applied the model directly to the existing data (without restricting the SpO2 range), so some deviation was expected. In actual clinical practice, there are often ARDS patients with SpO2 > 97% but P/F ≤ 300; in such cases, the Rice Linear Model cannot provide doctors with auxiliary diagnostic decisions, whereas our algorithm model can overcome this shortcoming of the traditional method.
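For reference, the Rice Linear Model imputes P/F from S/F through the published linear relationship S/F = 64 + 0.84 × P/F, which was derived only for SpO2 < 97%. A minimal sketch (the guard clause reflects that validity restriction):

```python
def rice_pf_from_sf(spo2_pct, fio2):
    """Impute the PaO2/FiO2 ratio from SpO2/FiO2 via the Rice Linear
    Model:  S/F = 64 + 0.84 * P/F  =>  P/F = (S/F - 64) / 0.84.
    The relationship was derived only for SpO2 < 97%."""
    if spo2_pct >= 97:
        raise ValueError("Rice Linear Model is not validated for SpO2 >= 97%")
    sf = spo2_pct / fio2
    return (sf - 64.0) / 0.84

# SpO2 = 94% on FiO2 = 0.40 gives S/F = 235, which the model places
# close to the P/F = 200 threshold reported by Rice et al.
print(round(rice_pf_from_sf(94, 0.40), 1))  # 203.6
```

The `ValueError` branch is exactly the gap discussed above: for ARDS patients with SpO2 > 97% but P/F ≤ 300, this model can return no useful estimate, whereas the multi-parameter machine learning approach still can.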

The application scenario for this algorithm is shown in S1 Fig. The patient's oxygenation level (≤300 or >300) is identified by collecting mechanical ventilation parameters and physical signs, assisting the physician in diagnosing ARDS without blood gas analysis. For patients already diagnosed with ARDS, the algorithm monitors the patient's oxygenation index level in real time so that the doctor can adjust the ventilator treatment plan at any time. In addition, the MIN–BER–FS algorithm significantly reduces the amount of computation, making it easy to port the algorithm to the ordinary microprocessors of ventilators and monitors and thereby enabling more intelligent aided diagnosis.

There were also some limitations associated with this study. The MIMIC–III database used here is a single-center database; even though we separated the training set from the test set in the experimental design, external verification is still needed to ensure that the model works well in different hospitals. We found no data on bilateral pulmonary infiltrates of non-cardiogenic origin in the database, and there were very few patients with a clear diagnosis of ARDS. During patient screening, we therefore selected patients who met the ARDS diagnostic criteria as closely as possible, based on the Berlin definition and the data actually available in the database; this process may have introduced some confounding factors. The population of the clinical database was concentrated around the age of 55, with most patients being middle-aged or elderly, which may bias the trained model toward elderly patients and lead to deviations in younger populations.

Dataset imbalance is a ubiquitous problem in clinical research, and we adopted oversampling as a compromise to address it. Oversampling does not solve the problem fundamentally; it can only alleviate, to some extent, the bias in the results caused by data imbalance. The imbalance problem can be fundamentally solved only by expanding the dataset. Missing data is another important problem in all modeling efforts, especially in the healthcare domain. The MIMIC database also suffers from missing data, for example in the patient's noninvasive blood pressure, airway pressure, and other physiological parameters. If records with missing data are omitted, many samples are lost; if technical means are used to impute the missing data, most samples are retained. However, both approaches introduce bias into the model: the former through the loss of some patients' data, and the latter through the errors introduced by interpolation-based estimation of missing values[19].
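As a simplified illustration of the oversampling compromise described above, random oversampling duplicates minority-class samples until the classes are balanced. The SMOTE/ADASYN methods cited in [21–23] instead synthesize new samples by interpolating between minority-class neighbors; this sketch only shows the balancing idea:

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class (a simplified stand-in
    for the SMOTE/ADASYN-style oversampling used via imbalanced-learn)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Toy imbalanced set: 4 negatives, 2 positives -> balanced to 4 and 4.
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 0, 1, 1]
Xb, yb = random_oversample(X, y)
print(yb.count(0), yb.count(1))  # 4 4
```

Because the duplicated points carry no new information, the resampled training set mitigates the classifier's majority-class bias without adding real signal, which is why the text stresses that only expanding the dataset solves the problem fundamentally.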

This study was exploratory and mainly aimed to investigate whether noninvasive parameters could identify ARDS patients, and to use feature selection techniques to determine which noninvasive parameters correlate most strongly with the oxygenation level. In the future, we will include more patients with ARDS and develop multi-class methods to achieve continuous identification of ARDS disease severity. The outcomes of this study are expected to provide some ideas for future related research.

In the future, an early warning system for the severity of ARDS will be developed for monitors and ventilators using a multi-classification algorithm.

6. Conclusion

In conclusion, the overall classification performance of the machine learning algorithms was better than that of the traditional algorithm. Among the machine learning algorithms, XGBoost was significantly better than the other three. Feature ranking and feature selection algorithms can help us understand the characteristics of ARDS, identify which features correlate best, and help us design high-precision algorithms. The method can continually provide medical staff with auxiliary diagnostic suggestions.

Supporting information

S1 Fig. The application scenario for the present algorithm.

The algorithm continuously monitors the patient's oxygenation level using basic patient information, ventilator parameters, and monitoring parameters to help the doctor diagnose whether the patient has ARDS and adjust the treatment plan for ARDS patients.

https://doi.org/10.1371/journal.pone.0226962.s001

(TIF)

S1 Dataset. The training dataset.

The training dataset includes 6,601 patients (148,414 data points).

https://doi.org/10.1371/journal.pone.0226962.s002

(CSV)

S2 Dataset. The test dataset.

The test dataset includes 2,101 patients (47,352 data points).

https://doi.org/10.1371/journal.pone.0226962.s003

(CSV)

References

  1. Zimmerman JJ, Akhtar SR, Caldwell E, Rubenfeld GD. Incidence and outcomes of pediatric acute lung injury. Pediatrics. 2009;124: 87–95. pmid:19564287
  2. Rubenfeld GD, Caldwell E, Peabody E, Weaver J, Martin DP, Neff M, et al. Incidence and outcomes of acute lung injury. N Engl J Med. 2005;353: 1685–1693. pmid:16236739
  3. Rezoagli E, Fumagalli R, Bellani G. Definition and epidemiology of acute respiratory distress syndrome. Ann Transl Med. 2017;5: 282. pmid:28828357
  4. Cheifetz IM. Year in Review 2015: Pediatric ARDS. Respir Care. 2016;61: 980–985. pmid:27381701
  5. ARDS Definition Task Force, Ranieri VM, Rubenfeld GD, Thompson BT, Ferguson ND, Caldwell E, et al. Acute respiratory distress syndrome: The Berlin Definition. JAMA. 2012;307: 2526–2533. pmid:22797452
  6. Horhat FG, Gundogdu F, David LV, Boia ES, Pirtea L, Horhat R, et al. Early Evaluation and Monitoring of Critical Patients with Acute Respiratory Distress Syndrome (ARDS) Using Specific Genetic Polymorphisms. Biochem Genet. 2017;55: 204–211. pmid:28070694
  7. Ahmed A, Kojicic M, Herasevich V, Gajic O. Early identification of patients with or at risk of acute lung injury. Neth J Med. 2009;67: 268–71. pmid:19841483
  8. Dechert RE, Park PK, Bartlett RH. Evaluation of the oxygenation index in adult respiratory failure. J Trauma Acute Care Surg. 2014;76: 469–473. pmid:24458052
  9. Pisani L, Roozeman J-P, Simonis FD, Giangregorio A, van der Hoeven SM, Schouten LR, et al. Risk stratification using SpO2/FiO2 and PEEP at initial ARDS diagnosis and after 24 h in patients with moderate or severe ARDS. Ann Intensive Care. 2017;7: 108. pmid:29071429
  10. Brown SM, Grissom CK, Moss M, Rice TW, Schoenfeld D, Hou PC, et al. Nonlinear Imputation of Pao2/Fio2 From Spo2/Fio2 Among Patients With Acute Respiratory Distress Syndrome. Chest. 2016;150: 307–313. pmid:26836924
  11. Rice TW, Wheeler AP, Bernard GR, Hayden DL, Schoenfeld DA, Ware LB, et al. Comparison of the SpO2/FIO2 ratio and the PaO2/FIO2 ratio in patients with acute lung injury or ARDS. Chest. 2007;132: 410–417. pmid:17573487
  12. DesPrez K, McNeil JB, Wang C, Bastarache JA, Shaver CM, Ware LB. Oxygenation Saturation Index Predicts Clinical Outcomes in ARDS. Chest. 2017. pmid:28823812
  13. Hammond BG, Garcia-Filion P, Kang P, Rao MY, Willis BC, Dalton HJ. Identifying an Oxygenation Index Threshold for Increased Mortality in Acute Respiratory Failure. Respir Care. 2017;62: 1249–1254. pmid:28634172
  14. Festic E, Bansal V, Kor DJ, Gajic O, US Critical Illness and Injury Trials Group: Lung Injury Prevention Study Investigators (USCIITG–LIPS). SpO2/FiO2 ratio on hospital admission is an indicator of early acute respiratory distress syndrome development among patients at risk. J Intensive Care Med. 2015;30: 209–216. pmid:24362445
  15. Khemani RG, Thomas NJ, Venkatachalam V, Scimeme JP, Berutti T, Schneider JB, et al. Comparison of SpO2 to PaO2 based markers of lung disease severity for children with acute lung injury. Crit Care Med. 2012;40: 1309–1316. pmid:22202709
  16. Pandharipande PP, Shintani AK, Hagerman HE, St Jacques PJ, Rice TW, Sanders NW, et al. Derivation and validation of Spo2/Fio2 ratio to impute for Pao2/Fio2 ratio in the respiratory component of the Sequential Organ Failure Assessment score. Crit Care Med. 2009;37: 1317–1321. pmid:19242333
  17. Bos LD, Martin-Loeches I, Schultz MJ. ARDS: challenges in patient care and frontiers in research. Eur Respir Rev Off J Eur Respir Soc. 2018;27.
  18. Johnson AEW, Pollard TJ, Shen L, Lehman LH, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3: 160035. pmid:27219127
  19. Ishioka T. Imputation of missing values for semi-supervised data using the proximity in random forests. Int J of Business Intelligence and Data Mining. 2012. pp. 319–322.
  20. Jonsson P, Wohlin C. An evaluation of k-nearest neighbour imputation using Likert data. Proceedings—International Software Metrics Symposium. 2004. pp. 108–118. https://doi.org/10.1109/METRIC.2004.1357895
  21. Lemaître G, Nogueira F, Aridas C. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. 2016;18.
  22. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. J Artif Intell Res. 2002;16: 321–357.
  23. He H, Bai Y, Garcia EA, Li S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. IEEE International Joint Conference on Neural Networks. 2008. pp. 1322–1328. Available: http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=4633969
  24. Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, et al. Feature Selection: A Data Perspective. ACM Comput Surv. 2016;50.
  25. Bouaguel W, Limam M. A New Way for Combining Filter Feature Selection Methods. 2015.
  26. Robnik-Sikonja M, Kononenko I. Theoretical and Empirical Analysis of ReliefF and RReliefF. Mach Learn. 2003;53: 23–69.
  27. Setiono R, Liu H. Chi2: Feature Selection and Discretization of Numeric Attributes. Proceedings of 7th IEEE International Conference on Tools with Artificial Intelligence (ICTAI). 1995. p. 388. https://doi.org/10.1109/TAI.1995.479783
  28. Battiti R. Using Mutual Information for Selecting Features in Supervised Neural Net Learning. IEEE Trans Neural Netw. 1994;5: 537–550. pmid:18267827
  29. Alonso-Atienza F, Morgado E, Fernández-Martínez L, García-Alberola A, Rojo-Álvarez JL. Detection of Life-Threatening Arrhythmias Using Feature Selection and Support Vector Machines. IEEE Trans Biomed Eng. 2014;61: 832–840. pmid:24239968
  30. Ng AY. Feature selection, L1 vs. L2 regularization, and rotational invariance. 2004. p. 78.
  31. Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. In: Vitányi P, editor. Computational Learning Theory. Springer Berlin Heidelberg; 1995. pp. 23–37.
  32. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. 2016.
  33. Carlos F, Unai I, Eduardo M, Elisabete A, Unai A, Lars W, et al. Machine Learning Techniques for the Detection of Shockable Rhythms in Automated External Defibrillators. PLoS One. 2016;11: e0159654. pmid:27441719
  34. Simon RM. Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional data. Brief Bioinform. 2011;12: 203. pmid:21324971
  35. Ray S, Rogers L, Pagel C, Raman S, Peters MJ, Ramnarayan P. PaO2/FIO2 Ratio Derived From the SpO2/FIO2 Ratio to Improve Mortality Prediction Using the Pediatric Index of Mortality-3 Score in Transported Intensive Care Admissions. Pediatr Crit Care Med. 2017;18: e131–e136. pmid:28121834