
A machine learning approach identifies distinct early-symptom cluster phenotypes which correlate with hospitalization, failure to return to activities, and prolonged COVID-19 symptoms

  • Nusrat J. Epsi,

    Roles Data curation, Investigation, Methodology, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Infectious Disease Clinical Research Program, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America, Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America

  • John H. Powers,

    Roles Supervision

    Affiliation Clinical Research Directorate, Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America

  • David A. Lindholm,

    Roles Supervision

    Affiliations Molecular Biology Laboratory, Brooke Army Medical Center, San Antonio, Texas, United States of America, Department of Medicine, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America

  • Katrin Mende,

    Roles Resources

    Affiliations Infectious Disease Clinical Research Program, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America, Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America, Molecular Biology Laboratory, Brooke Army Medical Center, San Antonio, Texas, United States of America

  • Allison Malloy,

    Roles Resources

    Affiliation Department of Pediatrics, Walter Reed National Military Medical Center, Bethesda, Maryland, United States of America

  • Anuradha Ganesan,

    Roles Resources

    Affiliations Infectious Disease Clinical Research Program, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America, Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America, Infectious Disease Clinic, Walter Reed National Military Medical Center, Bethesda, Maryland, United States of America

  • Nikhil Huprikar,

    Roles Resources

    Affiliation Infectious Disease Clinic, Walter Reed National Military Medical Center, Bethesda, Maryland, United States of America

  • Tahaniyat Lalani,

    Roles Resources

    Affiliations Infectious Disease Clinical Research Program, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America, Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America, Infectious Disease Clinical Research Program, Naval Medical Center Portsmouth, Portsmouth, Virginia, United States of America

  • Alfred Smith,

    Roles Resources

    Affiliation Infectious Disease Clinical Research Program, Naval Medical Center Portsmouth, Portsmouth, Virginia, United States of America

  • Rupal M. Mody,

    Roles Resources

    Affiliation Infectious Disease Clinic, William Beaumont Army Medical Center, El Paso, Texas, United States of America

  • Milissa U. Jones,

    Roles Resources

    Affiliation Pediatric Infectious Diseases, Tripler Army Medical Center, Honolulu, Hawaii, United States of America

  • Samantha E. Bazan,

    Roles Resources

    Affiliation Family Nurse Practitioner and Women’s Health Nurse Practitioner Program, Carl R. Darnall Army Medical Center, Fort Hood, Texas, United States of America

  • Rhonda E. Colombo,

    Roles Resources

    Affiliations Infectious Disease Clinical Research Program, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America, Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America, Department of Medicine, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America, Infectious Disease Clinic, Madigan Army Medical Center, Tacoma, Washington, United States of America

  • Christopher J. Colombo,

    Roles Resources

    Affiliation Infectious Disease Clinic, Madigan Army Medical Center, Tacoma, Washington, United States of America

  • Evan C. Ewers,

    Roles Resources

    Affiliation Internal Medicine, Fort Belvoir Community Hospital, Fort Belvoir, Virginia, United States of America

  • Derek T. Larson,

    Roles Resources

    Affiliations Department of Medicine, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America, Internal Medicine, Fort Belvoir Community Hospital, Fort Belvoir, Virginia, United States of America, Infectious Disease Clinic, Naval Medical Center San Diego, San Diego, California, United States of America

  • Catherine M. Berjohn,

    Roles Resources

    Affiliations Infectious Disease Clinical Research Program, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America, Department of Medicine, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America, Infectious Disease Clinic, Naval Medical Center San Diego, San Diego, California, United States of America

  • Carlos J. Maldonado,

    Roles Resources

    Affiliation Department of Research and Clinical Investigation, Womack Army Medical Center, Fort Bragg, North Carolina, United States of America

  • Paul W. Blair,

    Roles Resources

    Affiliations Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America, Department of Pathology, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America

  • Josh Chenoweth,

    Roles Resources

    Affiliation Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America

  • David L. Saunders,

    Roles Resources

    Affiliation Translational Medicine Unit, Department of Medicine, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America

  • Jeffrey Livezey,

    Roles Resources

    Affiliation Translational Medicine Unit, Department of Medicine, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America

  • Ryan C. Maves,

    Roles Resources

    Affiliation Infectious Diseases and Critical Care Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America

  • Margaret Sanchez Edwards,

    Roles Resources

    Affiliations Infectious Disease Clinical Research Program, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America, Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America

  • Julia S. Rozman,

    Roles Project administration

    Affiliations Infectious Disease Clinical Research Program, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America, Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America

  • Mark P. Simons,

    Roles Funding acquisition, Project administration, Supervision

    Affiliation Infectious Disease Clinical Research Program, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America

  • David R. Tribble,

    Roles Funding acquisition, Project administration

    Affiliation Infectious Disease Clinical Research Program, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America

  • Brian K. Agan,

    Roles Funding acquisition, Project administration

    Affiliations Infectious Disease Clinical Research Program, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America, Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America

  • Timothy H. Burgess,

    Roles Funding acquisition, Project administration

    Affiliation Infectious Disease Clinical Research Program, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America

  • Simon D. Pollett,

    Roles Investigation, Methodology, Project administration, Resources, Supervision, Writing – review & editing

    Affiliations Infectious Disease Clinical Research Program, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America, Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America

  •  [ ... ],
  • for the EPICC COVID-19 Cohort Study Group

    srichard@idcrp.org

    Membership of the EPICC COVID-19 Cohort Study Group is provided in the Acknowledgments.

    Affiliations Infectious Disease Clinical Research Program, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America, Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, Maryland, United States of America


Abstract

Background

Accurate COVID-19 prognosis is a critical aspect of acute and long-term clinical management. We identified discrete clusters of early-stage symptoms that may delineate groups with distinct disease severity phenotypes, including risk of developing long-term symptoms and associated inflammatory profiles.

Methods

1,273 SARS-CoV-2-positive U.S. Military Health System beneficiaries with quantitative symptom scores (FLU-PRO Plus) were included in this analysis. We employed machine learning approaches to identify symptom clusters and compared the risk of hospitalization and long-term symptoms, as well as peak CRP and IL-6 concentrations, across clusters.

Results

We identified three distinct clusters of participants based on their FLU-PRO Plus symptoms: cluster 1 (“Nasal cluster”) is highly correlated with reporting runny/stuffy nose and sneezing, cluster 2 (“Sensory cluster”) is highly correlated with loss of smell or taste, and cluster 3 (“Respiratory/Systemic cluster”) is highly correlated with the respiratory (cough, trouble breathing, among others) and systemic (body aches, chills, among others) domain symptoms. Participants in the Respiratory/Systemic cluster were twice as likely as those in the Nasal cluster to have been hospitalized, and 1.5 times as likely to report that they had not returned to activities, which remained significant after controlling for confounding covariates (P < 0.01). Participants in the Respiratory/Systemic and Sensory clusters were more likely to have symptoms at six months post-symptom onset (P = 0.03). We observed higher peak CRP and IL-6 in the Respiratory/Systemic cluster (P < 0.01).

Conclusions

We identified early symptom profiles potentially associated with hospitalization, return-to-activities, long-term symptoms, and inflammatory profiles. These findings may assist in patient prognosis, including prediction of long COVID risk.

Introduction

The SARS-CoV-2 pandemic continues to burden the healthcare system, and the clinical spectrum of Coronavirus disease 2019 (COVID-19) ranges from asymptomatic to severe or critical illness [1]. Older age and medical comorbidities have been associated with a higher risk for severe COVID-19 outcomes [1–3]. In addition to variability in acute illness severity, duration of illness can vary across individuals, with many recovering within several weeks and others reporting symptoms for months, a phenomenon termed Post-COVID conditions (PCC, or “Long COVID”) [4]. Individuals with Long COVID exhibit a wide variety of symptoms, including loss of sense of taste and smell, fatigue, dyspnea, arthralgia, chest pain, myalgia, and cough [5–11].

Predicting such long-term outcomes after an initial COVID-19 illness is a priority. Because of the broad case definition, inconsistent self-reporting, and the non-specific nature of frequently observed symptoms, prediction based on acute clinical presentation remains elusive [12]. Emerging research has focused on early biomarker signatures, including immune responses and acute imaging [13–15], but these are not widely accessible in routine care. Moreover, such studies often focus on populations requiring hospitalization for COVID-19, rather than patients treated in outpatient settings who nonetheless carry a risk of long-term sequelae, even after vaccination [16]. While some studies have identified acute symptom clusters [17–19], they have not fully explored how the symptoms within each cluster relate to the underlying biological disease phenotype, and/or have focused on severe acute COVID-19 rather than chronic outcomes [17, 20]. Further, acute symptom data used in prognostic studies are often not measured with validated patient symptom scoring systems, which are critical given the inherent subjectivity and variability of elicited symptoms [18–20].

In this work, we sought to group early COVID-19 symptoms using machine learning techniques and describe the relationships between those acute symptom groups and short- and long-term clinical outcomes of SARS-CoV-2 infection. We then compared inflammatory biomarkers among these clusters to further characterize their biological significance. We used the InFLUenza Patient-Reported Outcome (FLU-PRO) Plus [12, 21], a standardized instrument designed to characterize the frequency, intensity, and duration of symptoms in viral respiratory infections when administered serially over time. Specifically, we sought to (i) use machine learning methods to identify infected individuals who exhibit similar acute symptoms and precisely delineate symptom-based acute phenotypes, (ii) evaluate the relationship between these acute symptom clusters and acute COVID-19 hospitalization status, (iii) evaluate the relationship between acute symptom clusters and reported return to usual activities and health, and (iv) evaluate the relationship between acute symptom clusters and COVID-19 symptoms at six months post-symptom onset. Finally, we (v) explored whether patients in different acute symptom clusters have different host inflammatory responses.

Materials and methods

Study population and general study design

The Epidemiology, Immunology, and Clinical Characteristics of Emerging Infectious Diseases with Pandemic Potential (EPICC) study is a longitudinal cohort study of U.S. Military Health System (MHS) beneficiaries designed to examine the clinical severity and long-term outcomes of SARS-CoV-2 infection [22]. Briefly, MHS beneficiaries presenting to one of ten participating military treatment facilities (MTFs) with confirmed COVID-19, a COVID-19-like illness, or a high-risk exposure to someone with COVID-19 were eligible for enrollment in EPICC. Later in the study, eligibility was expanded to include an online component, through which individuals who had been tested for or vaccinated against SARS-CoV-2 could also enroll. The participants included in this analysis were adults who were enrolled between March 20, 2020, and March 31, 2022, tested positive for SARS-CoV-2, and completed at least one FLU-PRO Plus survey. We calculated the Charlson Comorbidity Index (CCI) [23] using documented health encounters in the MHS Data Repository (MDR). Body mass index (BMI) was calculated using height and weight values collected in the surveys and from the MDR, and categorized as normal/underweight (≤24.9 kg/m²), overweight (25–29.9 kg/m²), obese (30–34.9 kg/m²), and severely obese (≥35 kg/m²). Age, sex, and race/ethnicity were self-reported by participants. COVID-19 hospitalization was determined from survey responses reporting hospitalization due to COVID-19 and from case report forms completed by study staff.

FLU-PRO© Plus

The FLU-PRO© instrument was originally developed to assess influenza-like symptoms [24], and has since been evaluated for use in other respiratory infections [25, 26]. The original FLU-PRO instrument has been updated to include loss of sense of smell and taste (FLU-PRO Plus), and was found to have high reliability and construct validity for use in SARS-CoV-2 studies [12]. The FLU-PRO Plus instrument was implemented only in those enrolled at an MTF (not in online participants) and includes questions about 34 symptoms that provide a direct measure of the presence and severity of symptoms across seven body systems termed “domains” (Nose, Throat, Eyes, Chest/Respiratory, Gastrointestinal, Body/Systemic, and Senses) using a 5-point ordinal severity scale. The responses ranged from “Not at all” to “Very much” for most questions, and “Never” to “Always” for sneezing, coughing, and coughed up mucus or phlegm, and the number of times (0 to 4 or more) for vomiting and diarrhea. Domain-specific scores, as well as a total score, are calculated using the mean of all symptoms within the domain/total. Participants completed the FLU-PRO Plus survey daily for 14 days, and we used the earliest response reported by each participant for this analysis. The FLU-PRO Plus survey also includes questions about whether the participant has returned to their usual health or activities, and for these questions, we considered whether they reported that they had returned to their usual health or activities at the time of their last FLU-PRO Plus.
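The scoring rule described above (domain scores and the total score as means of item responses) can be sketched in Python. This is a minimal illustration, not the instrument's official scoring code: the item names and the abbreviated item-to-domain map below are hypothetical (the real FLU-PRO Plus has 34 items across seven domains), and responses are assumed to be coded 0 to 4 on the 5-point scale.

```python
from statistics import mean

# Hypothetical, abbreviated item-to-domain map for illustration only; the
# real FLU-PRO Plus has 34 items across seven domains.
DOMAINS = {
    "Nose": ["runny_nose", "stuffy_nose", "sneezing"],
    "Senses": ["loss_of_smell", "loss_of_taste"],
    "Body/Systemic": ["body_aches", "chills", "felt_hot"],
}


def flu_pro_scores(responses):
    """Return per-domain mean scores plus a total mean score.

    `responses` maps item name -> ordinal score (assumed coded 0-4).
    """
    scores = {
        domain: mean(responses[item] for item in items)
        for domain, items in DOMAINS.items()
    }
    # The total is the mean over all items, not the mean of domain scores.
    all_items = [item for items in DOMAINS.values() for item in items]
    scores["Total"] = mean(responses[item] for item in all_items)
    return scores
```

Because the total is averaged over items rather than over domain scores, following the text, domains with more items carry more weight in the total.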

Diagnosis of SARS-CoV-2 infection and genotype

In EPICC, SARS-CoV-2 infection was determined by one of the following criteria: a positive clinical PCR test, a PCR-positive swab collected as part of the EPICC study, or a participant report of a positive test. Study-collected swabs were tested with the SARS-CoV-2 (2019-nCoV) Centers for Disease Control and Prevention (CDC) quantitative PCR (qPCR) Probe Assay (IDT, Coralville, IA). The assay targets two regions of the nucleocapsid (N) gene (N1 and N2), with the human RNase P gene (Rp) as a specimen control; a cycle threshold (Ct) below 40 for both N1 and N2 was considered positive for SARS-CoV-2 infection. To determine genotype, whole-genome sequencing of SARS-CoV-2 using 1200 bp tiled amplicons was applied to upper respiratory swabs [27]. The Illumina Nextera® XT DNA Library Preparation Kit was used to prepare the amplified product, and the Pangolin lineage assignment tool was used to classify the genotype [28].
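Reading the positivity rule above as requiring both N-gene targets to amplify below the cutoff, it can be sketched as a small Python predicate. This is an illustrative assumption, not the assay's interpretation software: the function name is hypothetical, an undetected target is represented as `None`, and the Rp specimen-control check is deliberately left out because the text does not state how it was applied.

```python
from typing import Optional

CT_CUTOFF = 40.0  # from the text: Ct below 40 for N1 and N2


def n1_n2_positive(ct_n1: Optional[float], ct_n2: Optional[float],
                   cutoff: float = CT_CUTOFF) -> bool:
    """Return True when both N-gene targets amplify below the cutoff.

    None means the target was not detected (no Ct value). The Rp specimen
    control is not evaluated in this hypothetical sketch.
    """
    return all(ct is not None and ct < cutoff for ct in (ct_n1, ct_n2))
```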

Measurement of CRP and IL-6 levels

C-reactive protein (CRP) and interleukin-6 (IL-6) were measured in plasma samples using a high-dynamic-range enzyme-linked immunosorbent assay (ELISA) microfluidics analyzer (ProteinSimple, San Jose, California, USA). CRP and IL-6 values were log10-transformed, and an empirical Bayes framework was used to adjust the data for batch effects. Missing values were imputed using the k-nearest neighbour algorithm [29]. For this analysis, we included plasma samples collected within 21 days post-symptom onset.
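The log10 transformation and k-nearest neighbour imputation described above could look roughly like the following Python sketch. It is a simplified, hypothetical illustration: the default `k`, the distance (root-mean-square difference over commonly observed columns), and the layout (samples as rows, biomarkers as columns) are assumptions, and the empirical Bayes batch adjustment that the text applies between these two steps is not shown.

```python
import math


def log10_transform(values):
    """log10-transform concentrations, keeping None for missing values."""
    return [None if v is None else math.log10(v) for v in values]


def knn_impute(rows, k=3):
    """Fill each None with the mean of that column over the k nearest rows.

    Distance is the root-mean-square difference over the other columns
    observed in both rows; rows missing the target column are skipped.
    Entries with no usable neighbours are left as None.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for c, (a, b) in enumerate(zip(row, other))
                          if c != j and a is not None and b is not None]
                if not shared:
                    continue
                d = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((d, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed
```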

Ascertainment and definitions of COVID-19 vaccination group

We obtained vaccination details from MDR records, case report forms, and surveys [30]. We identified participants as fully vaccinated if 14 or more days had passed since their second dose of an mRNA vaccine series (Pfizer/BioNTech-BNT162b2, Moderna mRNA‐1273). Vaccine breakthroughs were identified if a participant tested positive for SARS-CoV-2 14 or more days after the final vaccine dose of the series.
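The two vaccination definitions above translate into simple date arithmetic. The sketch below assumes a two-dose mRNA series with known dose dates; the function names are hypothetical, and other vaccine products are outside its scope.

```python
from datetime import date
from typing import Optional, Sequence


def is_fully_vaccinated(dose_dates: Sequence[date], on: date) -> bool:
    """Fully vaccinated: 14 or more days since the second mRNA dose."""
    return len(dose_dates) >= 2 and (on - sorted(dose_dates)[1]).days >= 14


def is_breakthrough(final_dose: Optional[date], positive_test: date) -> bool:
    """Breakthrough: positive test 14 or more days after the final series dose."""
    return final_dose is not None and (positive_test - final_dose).days >= 14
```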

Identification of prolonged COVID-19 symptoms

Along with the FLU-PRO Plus survey, EPICC participants were also asked to complete online follow-up surveys that included questions about the presence and severity of ongoing symptoms at 1, 3, 6, 9, and 12 months. Participants who responded at approximately six months post-symptom onset (135–224 days) were included in the prolonged COVID-19 symptoms analysis (S1 Fig in S1 File). If the participant reported continuing symptoms on the follow-up survey, they were asked about specific symptoms (cough, dyspnea (difficulty breathing/shortness of breath), exercise intolerance, loss of sense of smell and/or taste, joint pain, fatigue, headache, etc.) and asked to rate the severity of those symptoms (none, mild, moderate, severe, or critical). Participants who reported any moderate to critical symptom at six months post-symptom onset were considered to have prolonged COVID-19 symptoms in this analysis.
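The six-month window and the case definition above can be expressed as two predicates. This is a hypothetical sketch: it assumes whole days since symptom onset and severity labels stored as the lowercase strings listed in the survey description.

```python
# 135-224 days post-symptom onset, inclusive (range excludes its stop value).
SIX_MONTH_WINDOW = range(135, 225)
MODERATE_OR_WORSE = {"moderate", "severe", "critical"}


def in_six_month_window(days_since_onset: int) -> bool:
    """True if a follow-up survey falls in the ~six-month window."""
    return days_since_onset in SIX_MONTH_WINDOW


def has_prolonged_symptoms(symptom_severity: dict) -> bool:
    """symptom_severity maps symptom -> 'none'|'mild'|'moderate'|'severe'|'critical'."""
    return any(s in MODERATE_OR_WORSE for s in symptom_severity.values())
```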

Cluster analysis

Rather than grouping symptoms based on a priori assumptions, we used machine learning clustering of FLU-PRO Plus responses to identify symptom patterns. Optimal clustering can be a subjective process that depends on the characteristics used to determine patterns of commonality and dissimilarity. We first applied principal component analysis (PCA) [31], which linearly maps the data to a lower-dimensional space such that the variance retained in the low-dimensional representation is maximized. It does so by computing the eigenvectors of the covariance matrix; the eigenvectors with the largest eigenvalues (the leading principal components) capture a substantial fraction of the variance of the original data. As an added benefit, the components created by PCA are mutually uncorrelated. To identify groups within the leading principal components, we then applied the unsupervised machine learning algorithm K-means [32]. First, we determined the number of clusters k using the gap statistic [33]. This method compares the within-cluster dispersion of the observed data with that expected under reference data drawn from a random uniform distribution, and selects the value of k that maximizes the gap between the two. A large gap indicates that the clustering structure is far from what would be expected from uniformly distributed data points. PCA followed by K-means thus identifies groups of participants with distinct symptom profiles, so that highly correlated symptoms fall together in the same cluster. We further characterized the clusters using the mean domain response [24] and annotated each cluster with its predominant domain.
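The PCA, gap statistic, and K-means pipeline described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: K-means uses Lloyd's algorithm with k-means++ seeding, the uniform reference sets for the gap statistic are drawn over the bounding box of the data, and the number of components, reference sets, and random seeds are arbitrary choices. Following the text, k is taken where the gap is largest (Tibshirani et al. also describe a one-standard-error rule, not shown).

```python
import numpy as np


def pca(X, n_components):
    """Project mean-centred X onto the covariance eigenvectors with the
    largest eigenvalues (the leading principal components)."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order]


def _sq_dists(X, centroids):
    # Squared Euclidean distance from every point to every centroid.
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)


def _kmeans_pp_init(X, k, rng):
    # k-means++ seeding: each new centroid is drawn with probability
    # proportional to its squared distance to the nearest chosen centroid.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = _sq_dists(X, np.array(centroids)).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = _kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        labels = _sq_dists(X, centroids).argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _sq_dists(X, centroids).argmin(axis=1), centroids


def log_within_dispersion(X, k, seed=0):
    # log W_k: log of the total within-cluster sum of squares.
    labels, centroids = kmeans(X, k, seed=seed)
    return np.log(((X - centroids[labels]) ** 2).sum())


def gap_statistic(X, k_max, n_refs=10, seed=0):
    """Gap(k) = mean log W_k over uniform reference sets minus log W_k of X."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps = []
    for k in range(1, k_max + 1):
        ref = [log_within_dispersion(rng.uniform(lo, hi, size=X.shape), k, seed)
               for _ in range(n_refs)]
        gaps.append(np.mean(ref) - log_within_dispersion(X, k, seed))
    return np.array(gaps)
```

For a participant-by-item matrix `X` of first-survey FLU-PRO Plus scores (a hypothetical variable here), usage would be along the lines of `scores = pca(X, 2)`, then `k = int(np.argmax(gap_statistic(scores, k_max=8))) + 1`, and finally `labels, _ = kmeans(scores, k)`.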

Adjusted comparisons of clusters and acute and long-term outcomes

Univariable Poisson regression was performed to evaluate whether the identified clusters and other independent variables were associated with COVID-19 outcomes, including hospitalization, return to usual activities, return to usual health, and prolonged COVID-19 symptoms. Multivariable Poisson regression was performed for each outcome, adjusting for age group, sex, race/ethnicity, obesity, CCI category, vaccine receipt, and days post-symptom onset of the first FLU-PRO Plus survey. Adjusted risk ratios (aRRs) and 95% confidence intervals (CIs) were calculated. Model fit was assessed using the Akaike information criterion (AIC) and Bayesian information criterion (BIC).
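A Poisson model with a log link applied to a binary outcome gives risk ratios directly as exponentiated coefficients. The text does not say how standard errors were computed; the sketch below assumes robust (sandwich) errors, as in the "modified Poisson" approach commonly used for binary outcomes, and fits the model with plain NumPy rather than any particular statistics package. The function names are hypothetical.

```python
import numpy as np


def poisson_risk_ratios(X, y, n_iter=50, tol=1e-10):
    """Fit log E[y] = X @ beta for a binary outcome y by Newton-Raphson (IRLS).

    X must contain an intercept in column 0. Returns (beta, robust_se, aic, bic);
    exp(beta[j]) is the risk ratio for covariate j, holding the others fixed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start at the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        step = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    # Sandwich (robust) variance; the naive Poisson variance is conservative
    # for binary outcomes because Var(y) = mu(1 - mu) < mu.
    bread = np.linalg.inv(X.T @ (mu[:, None] * X))
    meat = X.T @ (((y - mu) ** 2)[:, None] * X)
    robust_se = np.sqrt(np.diag(bread @ meat @ bread))
    loglik = np.sum(y * np.log(mu) - mu)  # log(y!) = 0 when y is 0 or 1
    k, n = X.shape[1], len(y)
    return beta, robust_se, 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik


def rr_with_ci(beta, se, z=1.959963984540054):
    """Risk ratios with Wald 95% confidence intervals on the log scale."""
    return np.exp(beta), np.exp(beta - z * se), np.exp(beta + z * se)
```

With a single binary exposure, the fitted risk ratio equals the ratio of the two observed proportions, which makes the sketch easy to sanity-check.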

We performed unadjusted and adjusted linear regression to quantify the relationship between the identified clusters and participants’ acute plasma levels of the inflammatory biomarkers CRP and IL-6. The adjusted models included other potential predictors of biomarker levels: specimen sampling time since symptom onset, sex, age group, race/ethnicity, obesity, CCI category, and vaccine receipt.
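For the biomarker models, a minimal least-squares sketch with NumPy is shown below. It assumes the outcome is the log10-transformed CRP or IL-6 value, the Nasal cluster is the reference level, and covariates are already numeric (categorical covariates would need their own indicator columns); the helper names are hypothetical. Because the outcome is on the log10 scale, `10 ** beta` for a cluster indicator is the adjusted fold difference in geometric mean concentration relative to the Nasal cluster.

```python
import numpy as np


def cluster_design(clusters, covariates, reference="Nasal"):
    """Intercept + 0/1 indicators for each non-reference cluster + covariates."""
    levels = sorted(set(clusters) - {reference})
    columns = [np.ones(len(clusters))]
    columns += [np.array([c == level for c in clusters], dtype=float)
                for level in levels]
    columns += [np.asarray(v, dtype=float) for v in covariates.values()]
    return np.column_stack(columns), ["intercept", *levels, *covariates]


def ols(X, y):
    """Least-squares coefficients for y = X @ beta."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```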

Statistical analysis

Descriptive statistics were calculated for demographic characteristics, CCI category, BMI category, vaccination status, variants, vaccine receipt, return to usual activities, return to usual health, and days post-symptom onset, with P values computed using Fisher’s exact test. Figures were generated and statistical analyses were performed in RStudio (R version 4.0.2) [34].

Ethics

This study was approved by the Uniformed Services University of the Health Sciences (USUHS) Institutional Review Board (IRB) under protocol IDCRP-085 [22]; all participants or their legally authorized representative provided informed consent to participate.

Results

Demographic and clinical characteristics

Among 2552 participants enrolled in EPICC at an MTF from March 2020 through March 2022, 2407 were SARS-CoV-2 positive, and 1273 responded to their first FLU-PRO Plus survey within 21 days post-symptom onset (S1 Fig and S1 Table in S1 File). The responder study sample was predominantly male (58.3%), 18–44 years of age (60.2%), and had no comorbidities at enrollment (62.3% had a CCI score of zero, Table 1). About one in five participants in this analysis were hospitalized due to COVID-19 (19.1%). The non-responder group (those who did not complete a FLU-PRO Plus survey, N = 433), which was excluded from the analysis, consisted of approximately 20% children and 29.3% hospitalized individuals, and 13.7% had tested negative for SARS-CoV-2. Only 2 non-responders reported moderate to severe symptoms at 6 months, and overall, the non-responder group included a higher proportion of dependents (42.0%) than the responder group (S1 Table in S1 File).

Table 1. Clinical and demographic characteristics of 1273 military health system beneficiaries by early FLU-PRO symptom clusters.

https://doi.org/10.1371/journal.pone.0281272.t001

Acute COVID-19 symptoms group together in three distinct clusters

To define distinct early-stage symptom profiles, we employed principal component analysis followed by k-means clustering (Fig 1A–1C). Cluster 1 exhibited a higher mean score for nasal symptoms (e.g., runny or stuffy nose) and was thus termed the ‘Nasal cluster’. Cluster 2 exhibited a higher mean score for sensory symptoms (e.g., loss of sense of smell or taste) and was thus termed the ‘Sensory cluster’. Cluster 3 exhibited higher mean scores for respiratory (e.g., upper and lower respiratory) and systemic (e.g., body ache, chills) symptoms and was annotated as the ‘Respiratory/Systemic cluster’ (Fig 1D).

Fig 1.

(A) Principal component analysis depicting FLU-PRO Plus responses, (B) optimal number of clusters using the gap statistic, (C) K-means clustering identified three distinct clusters of participants, (D) heatmap depicting the three distinct clusters (high values in red, low values in blue).

https://doi.org/10.1371/journal.pone.0281272.g001

The Respiratory/Systemic cluster had a higher proportion of participants who were hospitalized (36.3%) and had more comorbidities (46.6% with CCI > 0) than the Nasal (hospitalized: 11.9%; CCI > 0: 39.5%) or Sensory (hospitalized: 10.5%; CCI > 0: 28.1%) clusters (P < 0.01); participants in the Sensory cluster appeared younger (70.2% were 18–44 years old) than those in the Nasal (59.1%) and Respiratory/Systemic (50.1%) clusters (P < 0.01) (Table 1). The Nasal cluster had a higher proportion of Omicron variant (BA.1/BA.2) infections (13.3%) than the Sensory (2.5%) and Respiratory/Systemic (4.5%) clusters. Vaccine breakthrough cases were also more common in the Nasal cluster (37.4%) than in the Sensory (20.4%) and Respiratory/Systemic (19.3%) clusters. Self-reported return to usual activities and health at the last FLU-PRO Plus survey was more common in the Nasal cluster than in the other clusters (Table 1). In the prolonged COVID-19 subset, a lower proportion of the Nasal cluster reported symptoms at six months (3.8%) than of the other two clusters (Sensory: 12%; Respiratory/Systemic: 11.2%) (Table 1).

Acute COVID-19 symptom profiles defined by machine learning are independently associated with hospitalization and long-term symptom persistence risk

The Respiratory/Systemic cluster was associated with a more than two-fold increased risk of hospitalization compared with the Nasal cluster (aRR = 2.24 [95% CI: 1.61 to 3.12], P < 0.01), after controlling for sex, age group, race/ethnicity, CCI category, BMI category, vaccine receipt, and days post-symptom onset (Fig 2 and S2 Table in S1 File). Older age, CCI category, and BMI category were also independently associated with a higher risk of hospitalization after adjusting for all variables simultaneously. Compared with study participants in the Nasal cluster, those in the Respiratory/Systemic cluster were more likely to report that they had not yet returned to usual activities or health on their last day with FLU-PRO Plus data (Fig 2 and S3, S4 Tables in S1 File).

Fig 2. Multivariable Poisson regression model results from four distinct models: Disease severity (pink), failure to return to usual activities (blue), failure to return to usual health (yellow), and Long COVID (green).

Whiskers represent 95% confidence limits.

https://doi.org/10.1371/journal.pone.0281272.g002

Next, we evaluated the subset of 529 participants who filled out surveys at six-months post-symptom onset (S5 Table in S1 File). The most common symptoms reported at six months were fatigue (4.2%), loss of sense of smell or taste (4.0%), dyspnea (3.8%), and exercise intolerance (3.4%) (S6 Table in S1 File). We observed that those cases with acute symptom profiles belonging to the Sensory and Respiratory/Systemic clusters were more likely to report moderate to severe symptoms at six months than those belonging to the Nasal cluster (Sensory cluster: aRR = 2.86 [95% CI = 1.14 to 7.18], P = 0.03; Respiratory/Systemic cluster: aRR = 2.89 [95% CI = 1.12 to 7.44], P = 0.03) (S7 Table in S1 File).
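The reported intervals can be checked for internal consistency. Assuming each aRR has a Wald-type 95% CI that is symmetric on the log scale (an assumption; the text does not say how the intervals were constructed), the standard error and a two-sided P value can be recovered from the ratio and its limits. The function name is hypothetical.

```python
from math import erfc, log, sqrt


def wald_from_ci(rr, lower, upper, z_crit=1.959963984540054):
    """Recover the log-scale SE, z statistic, and two-sided P from a ratio
    and its 95% CI, assuming a Wald interval on the log scale."""
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(rr) / se
    p = erfc(abs(z) / sqrt(2))  # two-sided standard-normal tail probability
    return se, z, p
```

Applied to the six-month estimates above (aRR = 2.86 [1.14 to 7.18] and aRR = 2.89 [1.12 to 7.44]), the recovered P values are roughly 0.025 and 0.028, consistent with the reported P = 0.03; the hospitalization estimate (aRR = 2.24 [1.61 to 3.12]) gives a P value well below 0.01.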

Acute COVID-19 symptom profile clusters have distinct inflammatory profiles

CRP and IL-6 levels were higher in the Respiratory/Systemic cluster than in the Nasal cluster after adjusting for age, sex, race/ethnicity, CCI category, BMI category, vaccine receipt, and sampling time (P < 0.01). Participants who were obese or severely obese also had higher CRP and IL-6 levels than those with normal weight, after adjusting for sampling time (P < 0.01) (S8, S9 Tables in S1 File). Participants in the Respiratory/Systemic cluster had higher CRP and IL-6 levels than those in the Nasal and Sensory clusters (P < 0.01) (Fig 3).

Fig 3.

Comparison of inflammatory biomarker (A) CRP and (B) IL-6 in identified clusters. Statistical significance was determined by Wilcoxon rank sum test. Asterisks indicate statistical significance: ns: p > 0.05, *: p ≤ 0.05, **: p ≤ 0.01, ***: p ≤ 0.001, ****: p ≤ 0.0001.

https://doi.org/10.1371/journal.pone.0281272.g003
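The Fig 3 comparisons use the Wilcoxon rank-sum test. A standard-library Python sketch of the large-sample version is below: it assigns mid-ranks to ties, applies the usual tie correction to the variance of U, and uses the normal approximation without a continuity correction. Statistical software may instead report an exact P value for small samples or apply a continuity correction (R's `wilcox.test` does so by default for the normal approximation), so results can differ slightly; the function name is hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U for x, P). Fails with ZeroDivisionError if every value is tied.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for m in range(i, j + 1):
            ranks[m] = mid
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / sqrt(var_u)
    return u, erfc(abs(z) / sqrt(2))
```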

Discussion

Distinguishing and interpreting patterns of acute COVID-19 symptoms is inherently challenging. In this study, we used machine learning techniques to identify groups with distinct early symptom profile clusters. Our results indicate that participants with early COVID-19 symptoms belonging to the Respiratory/Systemic cluster were more likely to be hospitalized than those in the Sensory and Nasal early-symptom clusters. The Respiratory/Systemic cluster was also independently associated with older age, more comorbidities, and obesity, factors that have been associated with increased COVID-19 severity in the EPICC cohort and other studies [30, 35–39]. Participants in the Respiratory/Systemic cluster were also more likely to report that they had not yet returned to usual health and activities by their last FLU-PRO Plus survey (Fig 2).

With growing concern about long-term symptoms associated with SARS-CoV-2 infection, and uncertainty about how to predict long-term symptom persistence/PCC, we performed sub-analyses in participants who completed a (non-FLU-PRO) symptom survey at six months post-symptom onset. Cases in the early FLU-PRO Sensory and Respiratory/Systemic clusters reported more prolonged COVID-19 symptoms at six months than those in the Nasal cluster.

The mechanism underlying such longer-term COVID-19 symptoms, and how acute symptom profiles predict these late sequelae, is unclear [40–44]. One hypothesis is that late post-COVID-19 manifestations are associated with prolonged inflammation. Indeed, serum CRP and IL-6 levels were higher in participants in the Respiratory/Systemic cluster than in those in the Sensory and Nasal clusters after adjusting for age, race/ethnicity, sex and sampling time, suggesting greater systemic inflammation in these patients. This is consistent with the Respiratory/Systemic cluster also correlating with hospitalization risk, which is well known to be associated with higher IL-6 and CRP [45, 46]. Additionally, acute presentation with predominantly nasal (“cold-like”) symptoms may represent less invasive, less severe disease, and in turn a lower risk of both severe COVID-19 and Long COVID.

Our findings show that Nasal-cluster symptoms were more prominent among those infected with the Omicron (BA.1/BA.2) variant, consistent with recent studies suggesting that Omicron (BA.1) may have greater tropism for the upper respiratory tract and putatively lower virulence [47] than prior variants [48, 49]. Omicron has also been found to replicate 70 times faster than Delta in the large bronchi, but ten times more slowly than the ancestral variant in lung parenchyma [50, 51].

This analysis has several caveats and prompts further study. First, given the subjectivity of symptom measurement (even with the standardized FLU-PRO scoring system), and given that only a subset of our cohort completed six-month surveys (because long-term follow-up is ongoing for more recent enrollees) (S1, S5 Tables in S1 File), our findings should be validated in separate cohorts from other populations. Second, these findings are limited to statistical associations and limited inflammatory profiling; further mechanistic research (e.g., transcriptomic data on differential SARS-CoV-2 receptor expression among those in the Nasal cluster) may help describe the pathophysiology behind these distinct clusters and their association with short- and long-term outcomes.

The strengths of this study include the use of a standardized measurement tool (FLU-PRO Plus) that quantifies respiratory infection symptom severity as part of prospective data collection. In addition, we used unsupervised machine learning techniques to visualize patterns in symptom data, which allowed us to identify symptom clusters in a large cohort where such distinct patterns were not otherwise apparent.
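The clustering procedure itself is described in the Methods, which fall outside this excerpt; the reference list cites principal component analysis [31], k-means clustering [32] and the gap statistic for choosing the number of clusters [33]. As a minimal sketch, assuming k-means with the gap statistic, the pure-Python code below shows how a number of clusters can be selected from unlabelled multivariate scores such as per-participant symptom scores. It omits the PCA step, seeds Lloyd's algorithm with k-means++, draws uniform reference data over the data's bounding box, and runs on synthetic toy data. The function names (`kmeans_wss`, `gap_statistic`, `choose_k`) are hypothetical, and this is not the study's pipeline.

```python
import math
import random
from statistics import fmean, pstdev


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability proportional to D^2."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2)[0])
    return centroids


def kmeans_wss(points, k, rng, n_init=10, max_iter=100):
    """Lowest within-cluster sum of squares W_k over n_init runs of Lloyd's k-means."""
    best = math.inf
    for _ in range(n_init):
        centroids = kmeanspp_init(points, k, rng)
        for _ in range(max_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                clusters[nearest].append(p)
            # Move each centroid to its cluster mean; an empty cluster keeps its old centroid.
            updated = [
                tuple(fmean(coord) for coord in zip(*members)) if members else centroids[c]
                for c, members in enumerate(clusters)
            ]
            if updated == centroids:
                break
            centroids = updated
        wss = sum(sq_dist(p, centroids[c]) for c, members in enumerate(clusters) for p in members)
        best = min(best, wss)
    return best


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Gap(k) and s_k for k = 1..k_max, using uniform reference data over the bounding box."""
    rng = random.Random(seed)
    lows = [min(axis) for axis in zip(*points)]
    highs = [max(axis) for axis in zip(*points)]
    # The same n_refs reference datasets are reused for every k.
    refs = [
        [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]
        for _ in range(n_refs)
    ]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(kmeans_wss(points, k, rng))
        ref_log_w = [math.log(kmeans_wss(ref, k, rng)) for ref in refs]
        gaps.append(fmean(ref_log_w) - log_w)
        s.append(pstdev(ref_log_w) * math.sqrt(1 + 1 / n_refs))
    return gaps, s


def choose_k(gaps, s):
    """Smallest k with Gap(k) >= Gap(k+1) - s_(k+1); gaps[0] corresponds to k = 1."""
    for k in range(1, len(gaps)):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k
    return len(gaps)


# Synthetic toy data: three tight, well-separated groups of 10 points in two dimensions.
centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
offsets = [(dx * 0.25, dy * 0.25) for dx in (-1, 0, 1) for dy in (-1, 0, 1)] + [(0.1, -0.1)]
toy = [(cx + ox, cy + oy) for cx, cy in centres for ox, oy in offsets]
gaps, s = gap_statistic(toy, k_max=5)
print("Gap:", [round(g, 2) for g in gaps])
print("chosen k:", choose_k(gaps, s))
```

On these three well-separated synthetic groups the rule is expected to select k = 3. On real symptom data, the reference distribution, the number of reference datasets (`n_refs`) and the number of restarts (`n_init`) can all shift the estimate, which is one reason the caveats above call for validation in other cohorts.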

Taken together, these findings suggest that distinct COVID-19 symptom profiles are associated with differential short- and long-term outcomes and may help improve COVID-19 prognostication. Our delineation of the inflammatory profiles associated with these acute symptom clusters may also help clarify the mechanisms underlying long-term post-COVID complications and direct further study into Long COVID prevention and treatment.

Acknowledgments

We sincerely thank the members of the EPICC COVID-19 Cohort Study Group for their many contributions in conducting the study and ensuring effective protocol operations. The following members were all closely involved with the design, implementation, and oversight of the study and have met group authorship criteria for this manuscript:

Brooke Army Medical Center, Fort Sam Houston, TX: Jessica J. Cowden; Teresa M. Merritt

Fort Belvoir Community Hospital, Fort Belvoir, VA: Nora Elnahas; Christa Glinn; Donna Jennings; Chiquita West

ACESO, Henry M. Jackson Foundation, Inc., Bethesda, MD: Danielle Clark

Madigan Army Medical Center, Joint Base Lewis McChord, WA: Susan Chambers; Cristin A. Mount

Naval Medical Center San Diego, San Diego, CA: Nichol M. Kirkland

Tripler Army Medical Center, Honolulu, HI: Catherine Uyehara

Uniformed Services University of the Health Sciences, Bethesda, MD: Heidi Adams; Celia Byrne; Mark Fritschlanski; Edward Parmelee; Jennifer Rusiecki; Emily Samuels; Ann Scher; Melissa Wayman

United States Air Force School of Aerospace Medicine, Dayton, OH: Richard Chapleau; Monica Christian; Kelsey Lanter; Elizabeth Macias

United States Coast Guard, Washington, DC: John K. Iskander

Womack Army Medical Center, Fort Bragg, NC: Kathryn J. Lago

The authors wish to also acknowledge all who have contributed to the EPICC COVID-19 study:

Brooke Army Medical Center, Fort Sam Houston, TX: J. Cowden; M. Darling; S. DeLeon; D. Lindholm; A. Markelz; K. Mende; S. Merritt; T. Merritt; N. Turner; T. Wellington

Carl R. Darnall Army Medical Center, Fort Hood, TX: S. Bazan; D. Hrncir; P.K. Love

Fort Belvoir Community Hospital, Fort Belvoir, VA: N. Dimascio-Johnson; N. Elnahas; E. Ewers; K. Gallagher; C. Glinn; U. Jarral; D. Jennings; D. Larson; A. Mentzos; K. Reterstoff; A. Rutt; A. Silva; C. West

Henry M. Jackson Foundation, Inc., Bethesda, MD: P. Blair; J. Chenoweth; D. Clark

Madigan Army Medical Center, Joint Base Lewis McChord, WA: J. Bowman; S. Chambers; C. Colombo; R. Colombo; C. Conlon; K. Everson; P. Faestel; T. Ferguson; L. Gordon; S. Grogan; S. Lis; M. Martin; C. Mount; D. Musfeldt; D. Odineal; M. Perreault; W. Robb-McGrath; R. Sainato; C. Schofield; C. Skinner; M. Stein; M. Switzer; M. Timlin; S. Wood

Naval Medical Center Portsmouth, Portsmouth, VA: S. Banks; R. Carpenter; L. Kim; K. Kronmann; T. Lalani; T. Lee; A. Smith; R. Smith; R. Tant; T. Warkentien

Naval Medical Center San Diego, San Diego, CA: C. Berjohn; S. Cammarata; N. Kirkland; D. Libraty; R. Maves; G. Utz

Tripler Army Medical Center, Honolulu, HI: C. Bradley; S. Chi; R. Flanagan; A. Fuentes; M. Jones; N. Leslie; C. Lucas; C. Madar; K. Miyasato; C. Uyehara

Uniformed Services University of the Health Sciences, Bethesda, MD: H. Adams; B. Agan; L. Andronescu; A. Austin; C. Broder; T. Burgess; C. Byrne; K. Chung; J. Davies; C. English; N. Epsi; C. Fox; M. Fritschlanski; A. Hadley; P. Hickey; E. Laing; C. Lanteri; J. Livezey; A. Malloy; R. Mohammed; C. Morales; P. Nwachukwu; C. Olsen; E. Parmelee; S. Pollett; S. Richard; J. Rozman; J. Rusiecki; D. Saunders; E. Samuels; M. Sanchez; A. Scher; M. Simons; A. Snow; K. Telu; D. Tribble; M. Tso; L. Ulomi; M. Wayman

United States Air Force School of Aerospace Medicine, Dayton, OH: T. Chao; R. Chapleau; M. Christian; A. Fries; C. Harrington; V. Hogan; S. Huntsberger; K. Lanter; E. Macias; J. Meyer; S. Purves; K. Reynolds; J. Rodriguez; C. Starr

United States Coast Guard, Washington, DC: J. Iskander; I. Kamara

Womack Army Medical Center, Fort Bragg, NC: B. Barton; D. Hostler; J. Hostler; K. Lago; C. Maldonado; J. Mehrer

William Beaumont Army Medical Center, El Paso, TX: T. Hunter; J. Mejia; R. Mody; J. Montes; R. Resendez; P. Sandoval

Walter Reed National Military Medical Center, Bethesda, MD: I. Barahona; A. Baya; A. Ganesan; N. Huprikar; B. Johnson

Disclaimer: The contents of this publication are the sole responsibility of the author(s) and do not necessarily reflect the views, opinions, or policies of Uniformed Services University of the Health Sciences (USUHS); the Department of Defense (DoD); the Departments of the Army, Navy, or Air Force; the Defense Health Agency; Brooke Army Medical Center; Walter Reed National Military Medical Center; Naval Medical Center San Diego; Madigan Army Medical Center; United States Air Force School of Aerospace Medicine; Fort Belvoir Community Hospital; Carl R. Darnall Army Medical Center; Naval Medical Center Portsmouth, Portsmouth, VA; Tripler Army Medical Center, Honolulu, HI; United States Coast Guard, Washington, DC; Womack Army Medical Center, Fort Bragg, NC; William Beaumont Army Medical Center, El Paso, TX; the Henry M. Jackson Foundation for the Advancement of Military Medicine Inc; the National Institutes of Health. Mention of trade names, commercial products, or organizations does not imply endorsement by the U.S. Government. The investigators have adhered to the policies for protection of human subjects as prescribed in 45 CFR 46.

References

1. Stokes EK, Zambrano LD, Anderson KN, et al. Coronavirus Disease 2019 Case Surveillance—United States, January 22-May 30, 2020. MMWR Morbidity and mortality weekly report 2020; 69(24): 759–65. pmid:32555134
2. Sorci G, Faivre B, Morand S. Explaining among-country variation in COVID-19 case fatality rate. Scientific Reports 2020; 10(1): 18909. pmid:33144595
3. Pereira NL, Ahmad F, Byku M, et al. COVID-19: Understanding Inter-Individual Variability and Implications for Precision Medicine. Mayo Clinic proceedings 2021; 96(2): 446–63. pmid:33549263
4. Bull-Otterson L BS, Saydah S, et al. Post–COVID Conditions Among Adult COVID-19 Survivors Aged 18–64 and ≥65 Years—United States, March 2020–November 2021: CDC, 2022 May 27, 2022.
5. Spudich S, Nath A. Nervous system consequences of COVID-19. Science (New York, NY) 2022; 375(6578): 267–9. pmid:35050660
6. Arnold DT, Hamilton FW, Milne A, et al. Patient outcomes after hospitalisation with COVID-19 and implications for follow-up: results from a prospective UK cohort. Thorax 2021; 76(4): 399–401. pmid:33273026
7. Cirulli ET, Schiabor Barrett KM, Riffle S, et al. Long-term COVID-19 symptoms in a large unselected population. medRxiv: the preprint server for health sciences 2020: 2020.10.07.20208702.
8. Bellan M, Soddu D, Balbo PE, et al. Respiratory and Psychophysical Sequelae Among Patients With COVID-19 Four Months After Hospital Discharge. JAMA network open 2021; 4(1): e2036142. pmid:33502487
9. Dennis A, Wamil M, Alberts J, et al. Multiorgan impairment in low-risk individuals with post-COVID-19 syndrome: a prospective, community-based study. BMJ open 2021; 11(3): e048391. pmid:33785495
10. Davis HE, Assaf GS, McCorkell L, et al. Characterizing long COVID in an international cohort: 7 months of symptoms and their impact. EClinicalMedicine 2021; 38: 101019. pmid:34308300
11. Chen C, Haupert SR, Zimmermann L, Shi X, Fritsche LG, Mukherjee B. Global Prevalence of Post COVID-19 Condition or Long COVID: A Meta-Analysis and Systematic Review. J Infect Dis 2022.
12. Richard SA, Epsi NJ, Pollett S, et al. Performance of the inFLUenza Patient-Reported Outcome Plus (FLU-PRO Plus) Instrument in Patients With Coronavirus Disease 2019. Open Forum Infectious Diseases 2021; 8(12). pmid:34901299
13. Araiza A, Duran M, Patiño C, Marik PE, Varon J. The Ichikado CT score as a prognostic tool for coronavirus disease 2019 pneumonia: a retrospective cohort study. Journal of intensive care 2021; 9(1): 51. pmid:34419163
14. Meng L, Dong D, Li L, et al. A Deep Learning Prognosis Model Help Alert for COVID-19 Patients at High-Risk of Death: A Multi-Center Study. IEEE journal of biomedical and health informatics 2020; 24(12): 3576–84. pmid:33108303
15. Haimovich AD, Ravindra NG, Stoytchev S, et al. Development and Validation of the Quick COVID-19 Severity Index: A Prognostic Tool for Early Clinical Decompensation. Annals of emergency medicine 2020; 76(4): 442–53. pmid:33012378
16. Kenny G, McCann K, O’Brien C, et al. Identification of Distinct Long COVID Clinical Phenotypes Through Cluster Analysis of Self-Reported Symptoms. Open Forum Infect Dis 2022; 9(4): ofac060. pmid:35265728
17. Sudre CH, Lee KA, Lochlainn MN, et al. Symptom clusters in COVID-19: A potential clinical prediction tool from the COVID Symptom Study app. Science advances 2021; 7(12). pmid:33741586
18. Cheng X, Wan H, Yuan H, et al. Symptom Clustering Patterns and Population Characteristics of COVID-19 Based on Text Clustering Method. Frontiers in public health 2022; 10: 795734. pmid:35186839
19. Wong-Chew RM, Rodríguez Cabrera EX, Rodríguez Valdez CA, et al. Symptom cluster analysis of long COVID-19 in patients discharged from the Temporary COVID-19 Hospital in Mexico City. Therapeutic advances in infectious disease 2022; 9: 20499361211069264. pmid:35059196
20. Güemes A, Ray S, Aboumerhi K, et al. A syndromic surveillance tool to detect anomalous clusters of COVID-19 symptoms in the United States. Sci Rep 2021; 11(1): 4660. pmid:33633250
21. Richard SA, Epsi NJ, Lindholm DA, et al. COVID-19 patient reported symptoms using FLU-PRO Plus in a cohort study: associations with infecting genotype, vaccine history, and return-to-health. Open Forum Infectious Diseases 2022.
22. Richard SA, Pollett SD, Lanteri CA, et al. COVID-19 Outcomes Among US Military Health System Beneficiaries Include Complications Across Multiple Organ Systems and Substantial Functional Impairment. Open Forum Infectious Diseases 2021; 8(12). pmid:34909439
23. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. Journal of chronic diseases 1987; 40(5): 373–83. pmid:3558716
24. Powers JH, Guerrero ML, Leidy NK, et al. Development of the Flu-PRO: a patient-reported outcome (PRO) instrument to evaluate symptoms of influenza. BMC Infectious Diseases 2016; 16(1): 1. pmid:26729246
25. Han A, Poon J-L, Powers JH, Leidy NK, Yu R, Memoli MJ. Using the Influenza Patient-reported Outcome (FLU-PRO) diary to evaluate symptoms of influenza viral infection in a healthy human challenge model. BMC Infectious Diseases 2018; 18(1): 353. pmid:30055573
26. Powers JH 3rd, Bacci ED, Leidy NK, et al. Performance of the inFLUenza Patient-Reported Outcome (FLU-PRO) diary in patients with influenza-like illness (ILI). PloS one 2018; 13(3): e0194180. pmid:29566007
27. Freed NE, Vlková M, Faisal MB, Silander OK. Rapid and inexpensive whole-genome sequencing of SARS-CoV-2 using 1200 bp tiled amplicons and Oxford Nanopore Rapid Barcoding. Biology methods & protocols 2020; 5(1): bpaa014.
28. Rambaut A, Holmes EC, O’Toole Á, et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nature microbiology 2020; 5(11): 1403–7. pmid:32669681
29. Blair PW, Brandsma J, Chenoweth J, et al. Topological data analysis identifies distinct biomarker phenotypes during the ‘inflammatory’ phase of COVID-19. 2021: 2021.12.25.21268206.
30. Epsi NJ, Richard SA, Lindholm DA, et al. Understanding ‘hybrid immunity’: comparison and predictors of humoral immune responses to SARS-CoV-2 infection and COVID-19 vaccines. Clinical infectious diseases: an official publication of the Infectious Diseases Society of America 2022. pmid:35608504
31. Pearson K. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 1901; 2(11): 559–72.
32. Ding C, He X. K-means clustering via principal component analysis. In: Proceedings of the twenty-first international conference on Machine learning, 2004:29.
33. Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2001; 63(2): 411–23.
34. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing. 2020.
35. Epsi NJ, Richard SA, Laing ED, et al. Clinical, immunological and virological SARS-CoV-2 phenotypes in obese and non-obese military health system beneficiaries. The Journal of Infectious Diseases 2021.
36. Mueller AL, McNamara MS, Sinclair DA. Why does COVID-19 disproportionately affect older people? Aging 2020; 12(10): 9959–81. pmid:32470948
37. Borges do Nascimento IJ, Cacic N, Abdulazeem HM, et al. Novel Coronavirus Infection (COVID-19) in Humans: A Scoping Review and Meta-Analysis. Journal of clinical medicine 2020; 9(4). pmid:32235486
38. Yang J, Zheng Y, Gou X, et al. Prevalence of comorbidities and its effects in patients infected with SARS-CoV-2: a systematic review and meta-analysis. International journal of infectious diseases: IJID: official publication of the International Society for Infectious Diseases 2020; 94: 91–5. pmid:32173574
39. Wang D, Hu B, Hu C, et al. Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China. Jama 2020; 323(11): 1061–9. pmid:32031570
40. Dani M, Dirksen A, Taraborrelli P, et al. Autonomic dysfunction in ‘long COVID’: rationale, physiology and management strategies. Clinical medicine (London, England) 2021; 21(1): e63–e7. pmid:33243837
41. Eshak N, Abdelnabi M, Ball S, et al. Dysautonomia: An Overlooked Neurological Manifestation in a Critically ill COVID-19 Patient. The American journal of the medical sciences 2020; 360(4): 427–9. pmid:32739039
42. de Melo GD, Lazarini F, Levallois S, et al. COVID-19-related anosmia is associated with viral persistence and inflammation in human olfactory epithelium and brain infection in hamsters. Science translational medicine 2021; 13(596). pmid:33941622
43. Wang F, Kream RM, Stefano GB. Long-Term Respiratory and Neurological Sequelae of COVID-19. Medical science monitor: international medical journal of experimental and clinical research 2020; 26: e928996. pmid:33177481
44. Rass V, Tymoszuk P, Sahanic S, et al. Distinct smell and taste disorder phenotype of post-acute COVID-19 sequelae. 2022: 2022.06.02.22275932.
45. Liu F, Li L, Xu M, et al. Prognostic value of interleukin-6, C-reactive protein, and procalcitonin in patients with COVID-19. Journal of clinical virology: the official publication of the Pan American Society for Clinical Virology 2020; 127: 104370. pmid:32344321
46. Herold T, Jurinovic V, Arnreich C, et al. Elevated levels of IL-6 and CRP predict the need for mechanical ventilation in COVID-19. The Journal of allergy and clinical immunology 2020; 146(1): 128-36.e4. pmid:32425269
47. Bhattacharyya RP, Hanage WP. Challenges in Inferring Intrinsic Severity of the SARS-CoV-2 Omicron Variant. 2022; 386(7): e14.
48. Kozlov M. Omicron’s feeble attack on the lungs could make it less dangerous. Nature 2022; 601(7892): 177. pmid:34987210
49. Davies MA, Kassanjee R, Rousseau P, et al. Outcomes of laboratory-confirmed SARS-CoV-2 infection in the Omicron-driven fourth wave compared with previous waves in the Western Cape Province, South Africa. Tropical medicine & international health: TM & IH 2022.
50. Willett BJ, Grove J, MacLean OA, et al. The hyper-transmissible SARS-CoV-2 Omicron variant exhibits significant antigenic change, vaccine escape and a switch in cell entry mechanism. 2022: 2022.01.03.21268111.
51. Meng B, Abdullahi A, Ferreira IATM, et al. Altered TMPRSS2 usage by SARS-CoV-2 Omicron impacts infectivity and fusogenicity. Nature 2022; 603(7902): 706–14. pmid:35104837