Automatic clustering method to segment COVID-19 CT images

  • Mohamed Abd Elaziz ,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Project administration, Writing – original draft, Writing – review & editing

    abd_el_aziz_m@yahoo.com (MAE); lusongfeng@hust.edu.cn (SL)

    Affiliations Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, China, Department of Mathematics, Faculty of Science, Zagazig University, Zagazig, Egypt

  • Mohammed A. A. Al-qaness,

    Roles Methodology, Software, Validation, Writing – original draft

    Affiliation State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China

  • Esraa Osama Abo Zaid,

    Roles Methodology, Writing – original draft

    Affiliation Department of Mathematics, Faculty of Science, Suez University, Suez, Egypt

  • Songfeng Lu ,

    Roles Resources, Writing – review & editing

    Affiliation Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, China

  • Rehab Ali Ibrahim,

    Roles Validation, Writing – review & editing

    Affiliation Department of Mathematics, Faculty of Science, Zagazig University, Zagazig, Egypt

  • Ahmed A. Ewees

    Roles Data curation, Methodology, Supervision, Writing – review & editing

    Affiliation Department of Computer, Damietta University, Damietta, Egypt

Abstract

The coronavirus disease (COVID-19) pandemic has infected more than ten million people worldwide. Researchers are therefore addressing various aspects that may help in diagnosing this pneumonia. Image segmentation is a necessary pre-processing step in image analysis and classification applications. In this study, our goal is to present an efficient image segmentation method for COVID-19 Computed Tomography (CT) images. The proposed method improves density peaks clustering (DPC) using the generalized extreme value (GEV) distribution. DPC is faster than other clustering methods and provides more stable results. However, it is difficult to determine the optimal number of clustering centers automatically without visualization. The GEV distribution is therefore used to determine a suitable threshold value for finding the optimal number of clustering centers, which improves the segmentation process. The proposed model is applied to a set of twelve COVID-19 CT images. It is also compared with the traditional k-means and DPC algorithms, and it performs better according to several measures, such as PSNR, SSIM, and entropy.

Introduction

Coronavirus disease (COVID-19), which was first reported in December 2019 in Wuhan, China, has spread to more than 200 countries and regions. It can be transmitted through respiratory droplets and contact [1, 2]. Diagnosing COVID-19 is a critical challenge for health organizations, and it must be done accurately and efficiently so that the necessary plans can be made [3]. The real-time reverse-transcription polymerase chain reaction (RT-PCR) test can be used to diagnose COVID-19, but it is time-consuming and may produce false-negative results [4, 5]. Therefore, medical imaging, such as chest X-ray and chest Computed Tomography (CT), can be used efficiently for diagnosing COVID-19.

Image segmentation is a key step in analyzing medical images. Its main goal is to distinguish the region of interest (ROI) from the surrounding area. It also makes it possible to extract important features, for example, the texture and shape of tissues [6–8]. Recent advances in medical imaging mean that images are heavily used in many medical procedures, so huge numbers of medical images are generated every day. This massive volume makes analysis and diagnosis a big challenge, since manual segmentation is time-consuming and may not meet the demand of analyzing large image data sets.

To this end, creating automatic methods for medical image segmentation is an important and urgent issue. In recent decades, researchers have proposed many medical image segmentation methods based on various technologies, for example, region-based methods, clustering methods, threshold algorithms, machine learning, and deep learning techniques. The segmentation of Computed Tomography (CT) images is a critical step in Computer-Aided Diagnosis (CADx) systems, and many studies have addressed it. Dev et al. [9] proposed a lung cancer detection method for DICOM CT images using the support vector machine (SVM) algorithm, which classifies the tested images as cancerous or non-cancerous. Shakeel et al. [10] applied a profuse clustering technique (PCT) to segment lung CT images and then employed a deep learning model to detect lung cancers from the tested CT images. Medeiros et al. [11] presented a segmentation method based on the active contour method (ACM) with a fuzzy border detector to segment lung CT images. Wang et al. [12] presented a CT image segmentation method based on an adaptive fully dense (AFD) neural network. Their method was evaluated using CT images of liver cancer, and they showed that it could successfully segment CT images with complex boundaries. Sousa et al. [13] proposed an automatic CT image segmentation method for the lung and trachea. Their method, called ALTIS, showed good performance in detecting abnormal structures in CT images. Ye et al. [14] proposed a heart CT image segmentation method using a multi-depth fusion network. Sun et al. [15] proposed a convolutional neural network (CNN) model to classify CT images and to segment the eyes and the surrounding organs. Li et al. [16] utilized the computation power of blockchain technology for medical image segmentation. Paulraj et al. [17] proposed a possibilistic fuzzy C-means method for lung CT image segmentation. Han et al. [18] used generative adversarial networks (GANs) to augment CT images for lung nodule detection.

Chen et al. [19] proposed a dictionary-based method for automatically segmenting 3D CT images of pathological lungs. Shariaty et al. [20] used a thresholding algorithm to segment lung CT images. Dai et al. [21] proposed a lung segmentation approach to identify lung diseases using CT images, based on an enhanced graph cuts algorithm and a Gaussian mixture model (GMM). Swierczynski et al. [22] proposed a mathematical model for lung CT image segmentation. Their level-set formulation combines active dense displacement estimation with Chan–Vese segmentation.

Among the methods mentioned above, deep learning approaches have become widely popular because of their notable performance in image segmentation. However, they require extensive training on many images [23], which is a problem for applications where only a limited number of images is available. Therefore, unsupervised methods, such as clustering, are preferable since they do not require a large set of training images. There are several types of clustering segmentation methods used for medical images, for example, fuzzy C-means [24], density-based clustering [25], and K-means [26].

According to Ai et al. [27], chest CT is more sensitive than the initial reverse-transcription polymerase chain reaction (RT-PCR) test for diagnosing COVID-19. Therefore, in this paper, we propose a clustering method to segment chest CT images of people infected with COVID-19.

In this study, we apply density peaks clustering (DPC) [28, 29] based on the generalized extreme value distribution to chest CT scans of COVID-19. Following the visual selection rule of density peaks clustering and [30], a clustering center has a higher density than the other points and a relatively large distance from any other center. Moreover, the measure used to determine the clustering centers approximately follows the generalized extreme value (GEV) distribution [31], so the upper quantile of the GEV can be used as a threshold to detect the clustering centers. The main motivation for combining DPC and GEV is to benefit from the strength of DPC, which avoids the limitations of iterative clustering methods, while using the GEV to determine the optimal number of clusters automatically.

The contributions of this study are as follows:

  • We present an image segmentation model for COVID-19 CT images using density peaks clustering based on the generalized extreme value distribution.
  • We evaluate the proposed model on a set of twelve COVID-19 CT images collected from different datasets.
  • We compare the model with the density peaks clustering and k-means clustering methods, and it shows better performance.

Materials and methods

Density peaks clustering

This section introduces the basic concepts of the clustering by finding density peaks (DPC) algorithm [28]. The main hypothesis in DPC is that cluster centers have a higher density than their neighbors and that the distance between those centers is large.

Consider a dataset X = [x1, x2, …, xn] with n samples. The local density ρi of xi is computed as:

$$\rho_i = \sum_{j \neq i} \xi(d_{ij} - d_c) \tag{1}$$

where dij is the distance between xi and xj, and dc is the cut-off distance. ξ is the kernel function, defined as:

$$\xi(x) = \begin{cases} 1, & x < 0 \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

Moreover, the minimum distance between xi and any other point of higher ρ is denoted by δi and defined as:

$$\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij} \tag{3}$$

The points that have both a large δ and a high ρ are considered clustering centers, and each of the remaining points is assigned to the nearest center. Because of this, the DPC algorithm is faster than clustering methods that need many iterations to find the optimal cluster centers.
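
As an illustration, ρ and δ can be computed with a few lines of NumPy. The following is a minimal sketch, not the authors' implementation. It assumes Euclidean distance, the cut-off kernel of Rodriguez and Laio [28], and their convention that a point with no denser neighbor takes its largest distance as δ. The function name `density_peaks` and the toy data are hypothetical.

```python
import numpy as np

def density_peaks(X, d_c):
    """Return local density rho and distance delta for each row of X."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances d_ij as an n x n matrix.
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Cut-off kernel: count neighbours closer than d_c, excluding the point itself.
    rho = (d < d_c).sum(axis=1) - 1
    n = len(X)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        if higher.any():
            # Distance to the nearest point of strictly higher density.
            delta[i] = d[i, higher].min()
        else:
            # No denser point exists: use the largest distance (assumed convention).
            delta[i] = d[i].max()
    return rho, delta

# Two well-separated 1-D groups (toy data, not taken from the paper).
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
rho, delta = density_peaks(X, d_c=0.15)
```

In this toy example the middle point of each group has the highest density and a large δ, so both stand out as candidate centers. The full distance matrix is used only for clarity and is not practical for large images.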

In some cases, a class may contain two high-density points with a small distance between them. To avoid splitting the class into small sub-classes, another measure is used that considers ρ and δ together, defined as:

$$\theta_i = \rho_i^{*} \, \delta_i^{*} \tag{4}$$

where ρ* and δ* are the normalized values of ρ and δ, respectively, formulated as:

$$\rho_i^{*} = \frac{\rho_i - \rho_{\min}}{\rho_{\max} - \rho_{\min}}, \qquad \delta_i^{*} = \frac{\delta_i - \delta_{\min}}{\delta_{\max} - \delta_{\min}} \tag{5}$$

The clustering centers have a higher θ than the other points.
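
The combined measure θ can be sketched as follows. This is only an illustration under stated assumptions: the normalization is taken to be min-max scaling to [0, 1] and θ is taken to be the product of the normalized ρ and δ. The helper names `min_max` and `center_score` and the input values are hypothetical.

```python
import numpy as np

def min_max(v):
    """Scale a vector to [0, 1]; a constant vector maps to all zeros."""
    v = np.asarray(v, dtype=float)
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def center_score(rho, delta):
    """theta_i as the product of min-max normalised rho_i and delta_i (assumed form)."""
    return min_max(rho) * min_max(delta)

# Hypothetical values: points 1 and 4 are dense and far from any denser point.
rho = np.array([1, 2, 1, 1, 2, 1])
delta = np.array([0.1, 5.1, 0.1, 0.1, 5.1, 0.1])
theta = center_score(rho, delta)
```

Because θ is a product, a point scores highly only when both its density and its distance to denser points are large, which is the property the text relies on.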

Generalized extreme value distribution

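This section presents the mathematical notation of the generalized extreme value (GEV) distribution [31]. The GEV combines the Gumbel, Fréchet, and Weibull families, which are distinguished by a single shape parameter, and its distribution function is defined as:

$$F(x; \mu, \sigma, k) = \exp\left\{ -\left[ 1 + k \left( \frac{x - \mu}{\sigma} \right) \right]^{-1/k} \right\}, \qquad 1 + k \left( \frac{x - \mu}{\sigma} \right) > 0 \tag{6}$$

where μ, σ, and k are the location, scale, and shape parameters, respectively. These parameters are estimated by maximum likelihood. For a sample x1, …, xn, the log-likelihood is:

$$\ell(\mu, \sigma, k) = -n \ln \sigma - \left( 1 + \frac{1}{k} \right) \sum_{i=1}^{n} \ln y_i - \sum_{i=1}^{n} y_i^{-1/k} \tag{7}$$

For any given data set, the MLEs of (μ, σ, k) are obtained by maximizing Eq (7), which is straightforward with standard numerical optimization algorithms that solve the following equations:

$$\frac{\partial \ell}{\partial \mu} = 0 \tag{8}$$

$$\frac{\partial \ell}{\partial \sigma} = 0 \tag{9}$$

$$\frac{\partial \ell}{\partial k} = 0 \tag{10}$$

where $y_i = 1 + k (x_i - \mu)/\sigma$ in Eq (7).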

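The estimated parameters are then used to obtain the upper quantile, defined as:

$$x_p = \mu - \frac{\sigma}{k} \left[ 1 - \left\{ -\ln(1 - p) \right\}^{-k} \right] \tag{11}$$

where p is the upper-tail probability of the quantile, that is, F(x_p) = 1 − p. Therefore, xi is considered a clustering center when the following condition is satisfied:

$$\theta_i > x_p \tag{12}$$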
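
The fitting and thresholding step can be sketched with SciPy. This is a hedged example, not the authors' code: it assumes that p denotes an upper-tail probability, uses `scipy.stats.genextreme` for the maximum likelihood fit, and draws synthetic scores instead of real θ values. Note that SciPy's shape parameter `c` has the opposite sign to the shape parameter k in the 1 + k(x − μ)/σ parameterization. The function name `gev_threshold` and the value p = 0.01 are hypothetical.

```python
import numpy as np
from scipy.stats import genextreme

def gev_threshold(theta, p=0.01):
    """Fit a GEV to theta by maximum likelihood and return its upper p-quantile."""
    # SciPy returns (c, loc, scale); c corresponds to -k in the text's notation.
    c, loc, scale = genextreme.fit(theta)
    # Upper quantile: the value exceeded with probability p under the fitted GEV.
    return genextreme.ppf(1.0 - p, c, loc=loc, scale=scale)

# Synthetic scores drawn from a known GEV (illustrative data only).
theta = genextreme.rvs(c=-0.1, loc=0.1, scale=0.05, size=2000, random_state=0)
x_p = gev_threshold(theta, p=0.01)

# Points whose score exceeds the threshold are the candidate clustering centers.
centers = np.flatnonzero(theta > x_p)
```

Using `ppf(1 - p)` avoids hand-coding the quantile formula and therefore any sign-convention mistakes for the shape parameter.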

Proposed COVID-19 image segmentation model

This section introduces the proposed model, which segments COVID-19 images using density peaks clustering based on the generalized extreme value distribution. The model starts by reading the image and computing ρ and δ using Eqs (1) and (3), respectively. It then computes θ using Eq (4) and estimates the GEV parameters by maximum likelihood, with θ as the input. Next, Eq (12) is applied to determine the clustering centers, and each remaining point is assigned to a cluster: if the distance between a cluster center and the current point is less than δi, the current point is assigned to that cluster center. The steps of the proposed model are given in Fig 1.

Fig 1. Steps of proposed COVID-19 image segmentation method.

https://doi.org/10.1371/journal.pone.0244416.g001
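
The steps described above can be put together in a short end-to-end sketch. It is an illustration under several assumptions that the text does not confirm: each pixel is described only by its gray level, dc is taken as a low percentile of the pairwise distances, normalization is min-max scaling, ties in ρ are broken by sort order, every pixel is assigned to its nearest center, and the highest-θ point is used as a fallback when no point passes the threshold. The function name `dpc_gev_segment` and all parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import genextreme

def dpc_gev_segment(image, p=0.01, dc_percentile=2.0):
    """Cluster pixel intensities with DPC and pick centers by a GEV quantile."""
    x = np.asarray(image, dtype=float).ravel()
    n = x.size
    d = np.abs(x[:, None] - x[None, :])            # pairwise distances d_ij
    # Assumption: d_c is a low percentile of the off-diagonal distances.
    d_c = np.percentile(d[np.triu_indices(n, k=1)], dc_percentile)
    rho = (d < d_c).sum(axis=1) - 1                # local density, cut-off kernel

    # delta: distance to the nearest denser point; ties in rho follow sort order.
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    delta[order[0]] = d[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        delta[i] = d[i, order[:rank]].min()

    def min_max(v):
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v, dtype=float)

    theta = min_max(rho.astype(float)) * min_max(delta)

    # Fit the GEV to theta and keep points above its upper p-quantile.
    c, loc, scale = genextreme.fit(theta)
    x_p = genextreme.ppf(1.0 - p, c, loc=loc, scale=scale)
    centers = np.flatnonzero(theta > x_p)
    if centers.size == 0:                          # fallback added for this sketch
        centers = np.array([np.argmax(theta)])

    # Assign every pixel to its nearest center.
    labels = np.argmin(d[:, centers], axis=1)
    return labels.reshape(np.shape(image)), x[centers]

# Small synthetic "image" with a dark and a bright region (illustrative only).
rng = np.random.default_rng(1)
image = np.concatenate([rng.normal(40, 3, 60), rng.normal(200, 3, 60)]).reshape(10, 12)
labels, center_values = dpc_gev_segment(image)
```

The number of clusters is not passed in; it follows from how many points exceed the fitted quantile. The sketch stores the full distance matrix, so it is suitable only for small inputs.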

Experiment and results

Dataset

To assess the quality of the segmentation method for COVID-19 CT images, a set of twelve images from [32] is used. These images are collected from different datasets such as CheX aka CheXpert [33], OpenI [34], Google [35], PC aka PadChest [36], NIH aka Chest X-ray14 [37], and MIMIC-CXR [38]. The images are resized to 224 × 224 pixels [32]. Fig 2 depicts the twelve COVID-19 test images.

Performance measure of segmentation

Three measures are used to evaluate the quality of the segmentation produced by each algorithm: the peak signal-to-noise ratio (PSNR) [39] in Eq (13), the structural similarity index (SSIM) [40] in Eq (14), and entropy in Eq (15).

$$PSNR = 20 \log_{10}\left( \frac{255}{RMSE} \right), \qquad RMSE = \sqrt{ \frac{ \sum_{i=1}^{M} \sum_{j=1}^{Q} \left( I_{i,j} - I_{s\,i,j} \right)^2 }{ M \times Q } } \tag{13}$$

where I and Is are the image and its segmented version, respectively, each of size M × Q.

$$SSIM(I, I_s) = \frac{ (2 \mu_I \mu_{I_s} + c_1)(2 \sigma_{I, I_s} + c_2) }{ (\mu_I^2 + \mu_{I_s}^2 + c_1)(\sigma_I^2 + \sigma_{I_s}^2 + c_2) } \tag{14}$$

where μI and μIs are the average intensities of I and Is, respectively, σI and σIs are the standard deviations of I and Is, respectively, and σI,Is is the covariance of I and Is. c1 is set to 6.5025 and c2 is set to 58.52252.
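
For concreteness, PSNR and SSIM can be sketched in NumPy as follows. This is an illustration under assumptions rather than the evaluation code used in the study: PSNR is taken as 20 log10(255/RMSE) for 8-bit images, and SSIM is computed once over the whole image from the means, variances, and covariance, whereas common library implementations average it over local windows. The function names `psnr` and `global_ssim` are hypothetical, and the constants default to the values quoted in the text.

```python
import numpy as np

def psnr(image, segmented, peak=255.0):
    """PSNR in dB from the root-mean-square error between two images."""
    a = np.asarray(image, dtype=float)
    b = np.asarray(segmented, dtype=float)
    rmse = np.sqrt(np.mean((a - b) ** 2))
    # Identical images have zero error, so the ratio is unbounded.
    return np.inf if rmse == 0 else 20.0 * np.log10(peak / rmse)

def global_ssim(image, segmented, c1=6.5025, c2=58.52252):
    """SSIM computed once over the whole image (no sliding window)."""
    a = np.asarray(image, dtype=float)
    b = np.asarray(segmented, dtype=float)
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )

# Toy 4 x 4 image and a copy with a uniform error of 5 gray levels.
original = np.arange(16, dtype=float).reshape(4, 4) * 16
shifted = original + 5
psnr_value = psnr(original, shifted)
ssim_value = global_ssim(original, shifted)
```

Higher values of both measures indicate that the segmented image stays closer to the original.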

Moreover, the entropy of a discrete random variable is used to assess the quality of segmentation, and it is defined as:

$$H = -\sum_{i} Prob_i \log_2 (Prob_i) \tag{15}$$

where Prob is a probability mass function.
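
The entropy measure can be sketched in the same way. The example assumes that the probability mass function is the normalized gray-level histogram of an 8-bit image and that the logarithm is taken to base 2, neither of which is stated explicitly in the text. The function name `image_entropy` is hypothetical.

```python
import numpy as np

def image_entropy(image, levels=256):
    """Shannon entropy in bits of the gray-level histogram of an 8-bit image."""
    values = np.asarray(image).astype(np.int64).ravel()
    counts = np.bincount(values, minlength=levels)
    # Empty bins are dropped because they contribute nothing to the sum.
    prob = counts[counts > 0] / values.size
    return float(-(prob * np.log2(prob)).sum())

# Toy images: one with two equally frequent levels, one with four.
two_levels = np.array([[0, 0], [255, 255]])
four_levels = np.array([[0, 85], [170, 255]])
h2 = image_entropy(two_levels)
h4 = image_entropy(four_levels)
```

A segmentation with more clusters of comparable size has a flatter histogram and therefore a higher entropy under this definition.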

Results and discussion

This section compares the proposed method against the classical density peaks clustering (DPC) algorithm and the K-means algorithm, both of which are widely used for clustering and for processing medical images. The comparison uses three measures, PSNR, SSIM, and entropy, to evaluate the algorithms on the 12 images. Tables 1 and 2 and Fig 3 record these results. Table 1 gives the number of clusters obtained by each method, and the quality of the obtained cluster centers is assessed with PSNR, SSIM, and entropy.

As can be seen from Table 2, the proposed method obtained the best PSNR results in 10 out of 12 images. Although K-means obtained the best PSNR in two images, it is ranked last, after DPC, because DPC attained a better PSNR than K-means in 7 images, as shown in Fig 3.

In terms of the SSIM measure, the proposed method achieved the highest SSIM value in 11 out of 12 images, followed by DPC and K-means, respectively. This means that the segmentations produced by the proposed method are more similar to the original images than those of the other algorithms. As shown in Fig 3, the proposed method reached an SSIM of 89%, while DPC and K-means reached 82% and 76%, respectively.

Regarding the entropy measure, the proposed method yields higher image entropy than the DPC and K-means algorithms. It outperformed them in 10 out of 12 images, which corresponds to the best segmentation results. Of the remaining algorithms, DPC ranked second and K-means ranked last.

Figs 4 and 5 show the original images and the segmentation results of the proposed method, DPC, and K-means. The images are split across two figures so that all of them can be displayed. From these figures, we can see that the proposed method produced better segmentation results in most of the images. These results indicate that the proposed method can efficiently segment chest CT images of COVID-19.

Fig 4. The segmented images of Im1 to Im6 based on the results obtained by (v) the proposed method, (w) DPC, and (x) K-means.

https://doi.org/10.1371/journal.pone.0244416.g004

Fig 5. Original images Im7 to Im12 and the segmented results of the proposed method, DPC, and K-means.

https://doi.org/10.1371/journal.pone.0244416.g005

The previous analysis shows that the proposed model performs better than the other two models. However, it has some limitations. For example, the processing time may increase with the size of the image, because the pairwise distances between the pixels of the image must be computed.
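
The scale of this limitation can be estimated with simple arithmetic. The sketch below counts the entries of a full pairwise distance matrix for a 224 × 224 image, which has 50,176 pixels and therefore about 2.5 billion pixel pairs. It assumes one distance per ordered pair stored as an 8-byte float; the function name `pairwise_cost` is hypothetical, and an actual implementation may avoid storing the full matrix.

```python
def pairwise_cost(height, width, bytes_per_value=8):
    """Number of pixels, distance-matrix entries, and memory in GiB."""
    n = height * width
    entries = n * n                      # one distance per ordered pixel pair
    gib = entries * bytes_per_value / 2 ** 30
    return n, entries, gib

n, entries, gib = pairwise_cost(224, 224)
print(f"{n} pixels, {entries} distances, about {gib:.1f} GiB")
```

Since the number of entries grows with the square of the pixel count, doubling both image dimensions multiplies the cost by sixteen.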

Conclusion

Analyzing medical images is very important for diagnosing diseases, and some preliminary steps, such as image segmentation, need to be carried out in the image analysis process. The main task of segmentation methods in medical imaging is to find the region of interest (ROI) and to distinguish it from the outside regions. With the COVID-19 pandemic, it is necessary to find efficient segmentation methods that may help improve the diagnostic process. Therefore, this paper proposes an efficient segmentation method for COVID-19 CT images. The proposed method uses density peaks clustering based on the generalized extreme value distribution. To test its performance, a set of twelve COVID-19 CT images is used. The proposed method was compared with the DPC and K-means clustering methods, and it showed better performance in terms of PSNR, SSIM, and entropy.

Acknowledgments

We would like to thank the editor and reviewers for their constructive comments and suggestions, which improved the quality of the manuscript.

References

  1. Xia W, Shao J, Guo Y, Peng X, Li Z, Hu D. Clinical and CT features in pediatric patients with COVID-19 infection: Different points from adults. Pediatric pulmonology. 2020;. pmid:32134205
  2. Al-Qaness MA, Ewees AA, Fan H, Abd El Aziz M. Optimization method for forecasting confirmed cases of COVID-19 in China. Journal of Clinical Medicine. 2020;9(3):674.
  3. Al-Qaness MA, Ewees AA, Fan H, Abualigah L, Abd Elaziz M. Marine Predators Algorithm for Forecasting Confirmed Cases of COVID-19 in Italy, USA, Iran and Korea. International Journal of Environmental Research and Public Health. 2020;17(10):3520.
  4. Huang P, Liu T, Huang L, Liu H, Lei M, Xu W, et al. Use of chest CT in combination with negative RT-PCR assay for the 2019 novel coronavirus but high clinical suspicion. Radiology. 2020;295(1):22–23. pmid:32049600
  5. Abd Elaziz M, Ewees AA, Yousri D, Alwerfali HSN, Awad QA, Lu S, et al. An Improved Marine Predators Algorithm With Fuzzy Entropy for Multi-Level Thresholding: Real World Example of COVID-19 CT Image Segmentation. IEEE Access. 2020;8:125306–125330.
  6. Havaei M, Davy A, Warde-Farley D, Biard A, Courville A, Bengio Y, et al. Brain tumor segmentation with deep neural networks. Medical image analysis. 2017;35:18–31. pmid:27310171
  7. Zhu H, He H, Xu J, Fang Q, Wang W. Medical Image Segmentation Using Fruit Fly Optimization and Density Peaks Clustering. Computational and mathematical methods in medicine. 2018;2018.
  8. Abd Elaziz M, Ewees AA, Oliva D. Hyper-heuristic method for multilevel thresholding image segmentation. Expert Systems with Applications. 2020;146:113201.
  9. Dev C, Kumar K, Palathil A, Anjali T, Panicker V. Machine Learning Based Approach for Detection of Lung Cancer in DICOM CT Image. In: Ambient Communications and Computer Systems. Springer; 2019. p. 161–173.
  10. Shakeel PM, Burhanuddin M, Desa MI. Lung cancer detection from CT image using improved profuse clustering and deep learning instantaneously trained neural networks. Measurement. 2019;145:702–712.
  11. Medeiros AG, Guimarães MT, Peixoto SA, Santos LdO, da Silva Barros AC, Rebouças EdS, et al. A new fast morphological geodesic active contour method for lung CT image segmentation. Measurement. 2019;148:106687.
  12. Wang EK, Chen CM, Hassan MM, Almogren A. A deep learning based medical image segmentation technique in Internet-of-Medical-Things domain. Future Generation Computer Systems. 2020;108:135–144.
  13. Sousa AM, Martins SB, Falcão AX, Reis F, Bagatin E, Irion K. ALTIS: A fast and automatic lung and trachea CT-image segmentation method. Medical physics. 2019;46(11):4970–4982.
  14. Ye C, Wang W, Zhang S, Wang K. Multi-depth fusion network for whole-heart CT image segmentation. IEEE Access. 2019;7:23421–23429.
  15. Sun Y, Shi H, Zhang S, Wang P, Zhao W, Zhou X, et al. Accurate and rapid CT image segmentation of the eyes and surrounding organs for precise radiotherapy. Medical physics. 2019;46(5):2214–2222. pmid:30815885
  16. Li B, Chenli C, Xu X, Jung T, Shi Y. Exploiting computation power of blockchain for biomedical image segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops; 2019. p. 0-0.
  17. Paulraj T, Chelliah KSV, Chinnasamy S. Lung computed axial tomography image segmentation using possibilistic fuzzy C-means approach for computer aided diagnosis system. International Journal of Imaging Systems and Technology. 2019;29(3):374–381.
  18. Han C, Kitamura Y, Kudo A, Ichinose A, Rundo L, Furukawa Y, et al. Synthesizing diverse lung nodules wherever massively: 3D multi-conditional GAN-based CT image augmentation for object detection. In: 2019 International Conference on 3D Vision (3DV). IEEE; 2019. p. 729–737.
  19. Chen G, Xiang D, Zhang B, Tian H, Yang X, Shi F, et al. Automatic pathological lung segmentation in low-dose CT image using eigenspace sparse shape composition. IEEE transactions on medical imaging. 2019;38(7):1736–1749. pmid:30605097
  20. Shariaty F, Hosseinlou S, Rud VY. Automatic lung segmentation method in computed tomography scans. In: Journal of Physics: Conference Series. vol. 1236. IOP Publishing; 2019. p. 012028.
  21. Dai S, Lu K, Dong J, Zhang Y, Chen Y. A novel approach of lung segmentation on chest CT images using graph cuts. Neurocomputing. 2015;168:799–807.
  22. Swierczynski P, Papież BW, Schnabel JA, Macdonald C. A level-set approach to joint image segmentation and registration with application to CT lung imaging. Computerized Medical Imaging and Graphics. 2018;65:58–68.
  23. Valente IRS, Cortez PC, Neto EC, Soares JM, de Albuquerque VHC, Tavares JMR. Automatic 3D pulmonary nodule detection in CT images: a survey. Computer methods and programs in biomedicine. 2016;124:91–107.
  24. Vishnuvarthanan A, Rajasekaran MP, Govindaraj V, Zhang Y, Thiyagarajan A. An automated hybrid approach using clustering and nature inspired optimization technique for improved tumor and tissue segmentation in magnetic resonance brain images. Applied Soft Computing. 2017;57:399–426.
  25. Zhang W, Zhang X, Zhao J, Qiang Y, Tian Q, Tang X. A segmentation method for lung nodule image sequences based on superpixels and density-based spatial clustering of applications with noise. PloS one. 2017;12(9). pmid:28880916
  26. Wu J, Gensheimer MF, Dong X, Rubin DL, Napel S, Diehn M, et al. Robust intratumor partitioning to identify high-risk subregions in lung cancer: a pilot study. International Journal of Radiation Oncology* Biology* Physics. 2016;95(5):1504–1512. pmid:27212196
  27. Ai T, Yang Z, Hou H, Zhan C, Chen C, Lv W, et al. Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology. 2020; p. 200642. pmid:32101510
  28. Rodriguez A, Laio A. Clustering by fast search and find of density peaks. Science. 2014;344(6191):1492–1496.
  29. Mehmood R, Zhang G, Bie R, Dawood H, Ahmad H. Clustering by fast search and find of density peaks via heat diffusion. Neurocomputing. 2016;208:210–217.
  30. Ding J, He X, Yuan J, Jiang B. Automatic clustering based on density peak detection using generalized extreme value distribution. Soft Computing. 2018;22(9):2777–2796.
  31. Kotz S, Nadarajah S. Extreme value distributions: theory and applications. World Scientific; 2000.
  32. Cohen JP, Morrison P, Dao L. COVID-19 image data collection. arXiv preprint arXiv:2003.11597. 2020;.
  33. Irvin J, Rajpurkar P, Ko M, Yu Y, Ciurea-Ilcus S, Chute C, et al. Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 33; 2019. p. 590–597.
  34. Demner-Fushman D, Kohli MD, Rosenman MB, Shooshan SE, Rodriguez L, Antani S, et al. Preparing a collection of radiology examinations for distribution and retrieval. Journal of the American Medical Informatics Association. 2016;23(2):304–310. pmid:26133894
  35. Majkowska A, Mittal S, Steiner DF, Reicher JJ, McKinney SM, Duggan GE, et al. Chest radiograph interpretation with deep learning models: Assessment with radiologist-adjudicated reference standards and population-adjusted evaluation. Radiology. 2020;294(2):421–431. pmid:31793848
  36. Bustos A, Pertusa A, Salinas JM, de la Iglesia-Vayá M. Padchest: A large chest x-ray image dataset with multi-label annotated reports. arXiv preprint arXiv:1901.07441. 2019;.
  37. Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 2097–2106.
  38. Johnson AE, Pollard TJ, Berkowitz S, Greenbaum NR, Lungren MP, Deng Cy, et al. MIMIC-CXR: A large publicly available database of labeled chest radiographs. arXiv preprint arXiv:1901.07042. 2019;1(2).
  39. Yin PY. Multilevel minimum cross entropy threshold selection based on particle swarm optimization. Applied mathematics and computation. 2007;184(2):503–513.
  40. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error measurement to structural similarity. IEEE Transactions on Image Processing. 2004;13(4):600–612.