Accuracy and interobserver-agreement of respiratory rate measurements by healthcare professionals, and its effect on the outcomes of clinical prediction/diagnostic rules

  • Gideon H. P. Latten ,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Project administration, Software, Writing – original draft, Writing – review & editing

    g.latten@zuyderland.nl

    Affiliations Emergency Department, Zuyderland Medical Centre, Heerlen, The Netherlands, Department of Family Medicine, Maastricht University, Care and Public Health Research Institute (CAPHRI), Maastricht, The Netherlands

  • Michelle Spek,

    Roles Conceptualization, Formal analysis

    Affiliation Department of Internal Medicine, division general medicine, section acute medicine, Maastricht University Medical Centre (MUMC+), Maastricht, The Netherlands

  • Jean W. M. Muris,

    Roles Writing – review & editing

    Affiliation Department of Family Medicine, Maastricht University, Care and Public Health Research Institute (CAPHRI), Maastricht, The Netherlands

  • Jochen W. L. Cals,

    Roles Writing – review & editing

    Affiliation Department of Family Medicine, Maastricht University, Care and Public Health Research Institute (CAPHRI), Maastricht, The Netherlands

  • Patricia M. Stassen

    Roles Conceptualization, Formal analysis, Supervision, Writing – review & editing

    Affiliation Department of Internal Medicine, division general medicine, section acute medicine, Maastricht University Medical Centre (MUMC+), Maastricht, The Netherlands

Abstract

Objective

In clinical prediction/diagnostic rules aimed at early detection of critically ill patients, the respiratory rate plays an important role. We investigated the accuracy and interobserver-agreement of respiratory rate measurements by healthcare professionals, and the potential effect of incorrect measurements on the scores of 4 common clinical prediction/diagnostic rules: Systemic Inflammatory Response Syndrome (SIRS) criteria, quick Sepsis-related Organ Failure Assessment (qSOFA), National Early Warning Score (NEWS), and Modified Early Warning Score (MEWS).

Methods

Using an online questionnaire, we showed 5 videos with a healthy volunteer, breathing at a fixed (true) rate (13–28 breaths/minute). Respondents measured the respiratory rate, and categorized it as low, normal, or high. We analysed how accurate the measurements were using descriptive statistics, and calculated interobserver-agreement using the intraclass correlation coefficient (ICC), and agreement between measurements and categorical judgments using Cohen’s Kappa. Finally, we analysed how often incorrect measurements led to under/overestimation in the selected clinical rules.

Results

In total, 448 healthcare professionals participated. Median measurements were slightly higher (1-3/min) than the true respiratory rate, and 78.2% of measurements were within 4/min of the true rate. ICC was moderate (0.64, 95% CI 0.39–0.94). When comparing the measured respiratory rates with the categorical judgments, 14.5% were inconsistent. Incorrect measurements influenced the 4 rules in 8.8% (SIRS) to 37.1% (NEWS). Both underestimation (4.5–7.1%) and overestimation (3.9–32.2%) occurred.

Conclusions

The accuracy and interobserver-agreement of respiratory rate measurements by healthcare professionals are suboptimal. This leads to both over- and underestimation of scores of four clinical prediction/diagnostic rules. The clinically most important effect could be a delay in diagnosis and treatment of (critically) ill patients.

Introduction

An abnormal respiratory rate is an important predictor of deterioration of a patient.[1,2] Consequently, the respiratory rate has a prominent place in many clinical prediction/diagnostic rules, which aim to identify critically ill patients early. Adequate and timely identification of these patients is important, as a delay in treatment increases morbidity and mortality disproportionately.[3–5] Commonly used prediction/diagnostic rules for critical illness are the Systemic Inflammatory Response Syndrome (SIRS) criteria, the quick Sepsis-related Organ Failure Assessment (qSOFA), the National Early Warning Score (NEWS), and the Modified Early Warning Score (MEWS) (Table 1).[6–9]

Table 1. Four common clinical prediction/diagnostic rules for critical illness.

https://doi.org/10.1371/journal.pone.0223155.t001

Considering the predictive potential of the respiratory rate, one would expect healthcare professionals to assess it as often and as accurately as possible. However, in daily practice, the respiratory rate turns out to be the least often recorded vital sign, both on wards and in emergency departments (EDs).[10–12] In contrast to body temperature, blood pressure, and heart rate, the respiratory rate is mostly measured manually, which could be one of the explanations for infrequent recording. In addition, counting the respiratory rate is believed to waste valuable time.[13] In order to improve documentation of the respiratory rate, some organizations use systems that force employees to record it. This may, however, lead to inaccurate estimations of the respiratory rate, causing a delay in the identification and treatment of patients with serious conditions, such as sepsis.[7,14]

Importantly, minor changes in the respiratory rate, just above or below normal, can have important effects on risk stratification for critically ill patients. Although the accuracy and interobserver-agreement of respiratory rate measurements by healthcare professionals have been reported to be fair to good, most of these studies used a wide range that included probably unnaturally low or high rates (5–60 breaths/minute), and the number of observers was small.[14,15] The impact of misclassification of respiratory rate measurements on important diagnostic/prognostic rules for critically ill patients has not yet been studied.

In this study, we investigated the accuracy and interobserver-agreement of respiratory rate measurements by different healthcare professionals, using 5 videos with different respiratory rates of one healthy volunteer. We hypothesized that a substantial proportion of measurements would deviate more than 4/min from the true respiratory rate, and that there would be inconsistencies when comparing continuous measurements with categorical judgments. Furthermore, we expected that deviations from the true respiratory rate would influence the outcome of 4 frequently used clinical prediction/diagnostic rules: SIRS, qSOFA, MEWS, and NEWS.[6–9]

Methods

Design and setting

For this questionnaire-based study, we made videos of a healthy volunteer, breathing with different respiratory rates. We shared these videos and a corresponding questionnaire with healthcare professionals through e-mail and social media. The research protocol was judged by the ethics committee METC Z and approval was not deemed necessary. Participants were aware of the study aims and the intention of publishing the results in a peer-reviewed journal. They were asked to participate when interested.

Videos

We created five videos, showing a healthy, male volunteer in supine position in a quiet setting. In each video, the volunteer breathed with a constant respiratory rate between 13 and 28 breaths per minute (28, 13, 22, 19 and 25 breaths/minute for videos 1 to 5, respectively). In order to breathe at a constant rate, our volunteer was guided by ECG-derived respiratory signals on a monitor. We selected stable video recordings, to make sure there was no variation in the respiratory rate throughout the videos. We defined the true respiratory rate as the rate displayed on the monitor, which the investigators confirmed by counting the breaths during the whole video and dividing by the duration of the video. Each video lasted approximately 60 seconds. See Fig 1 (and Videos 1–5, available online) for an example of one of the videos.

Fig 1. Still example of one of the videos used in the questionnaire.

https://doi.org/10.1371/journal.pone.0223155.g001

Questionnaire

In March 2018, an invitation to participate in this questionnaire was distributed among different healthcare professionals throughout the Netherlands. We sent invitations by e-mail to the professional network of the authors, and we encouraged recipients to pass the invitation on to relevant colleagues. Furthermore, we posted the link to the (Dutch) survey on social media (Twitter, LinkedIn) in order to reach as many potential respondents as possible. The questionnaire could be filled out during a period of 3 weeks. We asked respondents about their profession, the years of experience in the current profession, and their preferred method of respiratory rate assessment. Thereafter, videos 1 to 5 were shown. Respondents were asked to measure the respiratory rate, and after each video, they were asked to judge whether it was ‘low’, ‘normal’ or ‘high’. We did not provide a definition of these three categories, as a categorical description of the respiratory rate is often used in daily practice.

Statistical analyses

All statistical analyses were performed using IBM SPSS statistical software version 25 (Chicago, Illinois, USA). We used descriptive statistics to summarize the respondents’ profession, experience, and preferred method of respiratory rate assessment.

In order to assess how accurate the respondents’ measurements were, we decided to use descriptive analysis and calculate medians with interquartile ranges (IQR). In addition, we calculated the proportion of measurements that were within 4 breaths/minute of the true respiratory rate. This cut-off value was chosen since we expected that a majority of the respondents would measure for 15 seconds and multiply by 4. A deviation of 1 breath would therefore result in a deviation of 4 from the true rate. To investigate if there were significant differences in measurements between groups of professionals, we compared groups for each video.
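The reasoning behind the 4/min cut-off can be illustrated with a minimal sketch. It assumes a respondent counts whole breaths over a 15-second window and multiplies by 4, and it treats "within 4/min" as an absolute deviation of at most 4. The function names and the example measurements are hypothetical and are not part of the study's analysis, which was performed in SPSS.

```python
def rate_from_count(breaths_counted, window_seconds=15):
    """Extrapolate a number of counted breaths to breaths/minute."""
    return breaths_counted * 60 / window_seconds


def proportion_within(measurements, true_rate, tolerance=4):
    """Share of measurements that deviate at most `tolerance` from the true rate."""
    hits = sum(1 for m in measurements if abs(m - true_rate) <= tolerance)
    return hits / len(measurements)


# Miscounting by one breath in a 15-second window shifts the result by 4/min.
print(rate_from_count(5), rate_from_count(6))  # 20.0 24.0

# Made-up measurements for a video with a true rate of 22/min.
example = [20, 24, 22, 28, 16, 26]
print(proportion_within(example, true_rate=22))
```

With a 30-second window the same one-breath miscount would shift the result by only 2/min, which is why the tolerance is tied to the assumed counting window.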

We further determined the interobserver-agreement of the measured respiratory rates, by calculating the intraclass correlation coefficients (ICC) and their 95% confidence intervals (CI), based on a single-measurement, absolute-agreement, 2-way random effects model. This was done for all videos together, as well as combined for video 1, 3 and 5 (respiratory rate >20 breaths/minute), and for videos 2 and 4 (respiratory rate <20 breaths/minute). ICC values less than 0.50 are considered indicative of poor interobserver-agreement, between 0.50 and 0.75 moderate agreement, between 0.75 and 0.90 good agreement, and values higher than 0.90 indicate excellent agreement.[16] In order to achieve a large, representative group of participants, we limited the number of videos to 5. This was in accordance with the sample size we calculated to investigate interobserver agreement. We additionally calculated the effect of showing 10 instead of 5 videos to reduce the width of the confidence intervals, but this did not result in narrower confidence intervals.
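As an illustration of the single-measurement, absolute-agreement, 2-way random effects ICC described above, the following sketch computes it from the mean squares of a two-way layout. It is a plain-Python approximation of the point estimate only (no confidence interval), not the SPSS procedure used in the study. The function name `icc_a1` and the small ratings matrix are assumptions for demonstration; in the study's setting, rows would be the 5 videos and columns the respondents.

```python
def icc_a1(ratings):
    """Single-measurement, absolute-agreement ICC for a targets x raters matrix.

    `ratings` is a list of rows (one per target, e.g. a video), each holding
    one value per rater. Requires at least 2 targets and 2 raters.
    """
    n = len(ratings)      # number of targets
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for targets (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Small illustrative matrix: 6 targets rated by 4 raters (not study data).
demo = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_a1(demo), 2))  # 0.29
```

Because the absolute-agreement form keeps the between-rater mean square in the denominator, systematic differences between raters lower the coefficient, unlike a consistency-type ICC.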

In addition, the respondents’ measurements of the respiratory rate were compared with their categorical judgments (‘low’, ‘normal’, ‘high’). We used the following cut-off values to define a low, normal and high respiratory rate: <12 breaths/minute for ‘low’, 12 through 20 for ‘normal’, and >20 for ‘high’. These are widely used cut-off points for adults.[6] Cohen’s Kappa statistics were used to measure the agreement between the respondents’ measurements and their categorical answers. Kappa values of 0.6–0.8 represent moderate agreement, values of 0.8–0.9 strong agreement, and values >0.9 almost perfect agreement.[17]
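The comparison between measurements and categorical judgments can be sketched as follows, using the cut-off values stated above (<12 low, 12 through 20 normal, >20 high) and the standard formula for Cohen's Kappa. The function names and example data are hypothetical and only illustrate the calculation.

```python
from collections import Counter


def categorize(rate):
    """Map a measured rate (breaths/minute) to the category used in the analysis."""
    if rate < 12:
        return "low"
    if rate <= 20:
        return "normal"
    return "high"


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equally long label sequences.

    Undefined (division by zero) when chance agreement equals 1, i.e. when
    both sequences contain only one and the same category.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)


# Made-up example: one respondent-style inconsistency (20/min judged 'high').
measured = [20, 22, 19, 28, 13, 20]
judged = ["high", "high", "normal", "high", "normal", "normal"]
derived = [categorize(r) for r in measured]
print(derived)
print(cohens_kappa(derived, judged))
```

In this toy example the only disagreement is a measurement of exactly 20/min that was judged 'high', which mirrors the boundary case discussed in the results.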

In order to evaluate the potential clinical relevance of accurate respiratory rate measurements, we calculated how often an incorrect measurement of the respiratory rate would have resulted in an incorrect result on 4 clinical prediction/diagnostic rules for critical illness: SIRS, qSOFA, NEWS, and MEWS (Table 1).
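A minimal sketch of this comparison is given below. It scores only the respiratory-rate component of each rule and checks whether a measured rate yields a lower, equal, or higher score than the true rate. Because Table 1 is not reproduced in this text, the thresholds are assumptions taken from the commonly published versions of the rules (SIRS >20/min, qSOFA ≥22/min, and the NEWS and MEWS respiratory-rate bands); the function names are hypothetical.

```python
def sirs_rr(rate):
    """SIRS respiratory criterion (assumed: met when rate > 20/min)."""
    return 1 if rate > 20 else 0


def qsofa_rr(rate):
    """qSOFA respiratory criterion (assumed: met when rate >= 22/min)."""
    return 1 if rate >= 22 else 0


def news_rr(rate):
    """NEWS respiratory-rate points (assumed bands: <=8, 9-11, 12-20, 21-24, >=25)."""
    if rate <= 8:
        return 3
    if rate <= 11:
        return 1
    if rate <= 20:
        return 0
    if rate <= 24:
        return 2
    return 3


def mews_rr(rate):
    """MEWS respiratory-rate points (assumed bands: <9, 9-14, 15-20, 21-29, >=30)."""
    if rate < 9:
        return 2
    if rate <= 14:
        return 0
    if rate <= 20:
        return 1
    if rate <= 29:
        return 2
    return 3


def classify_effect(measured, true_rate, rule):
    """Return 'under', 'same' or 'over' relative to the score at the true rate."""
    diff = rule(measured) - rule(true_rate)
    if diff < 0:
        return "under"
    if diff > 0:
        return "over"
    return "same"


RULES = {"SIRS": sirs_rr, "qSOFA": qsofa_rr, "NEWS": news_rr, "MEWS": mews_rr}

# True rate 25/min (video 5) measured as 24/min: only the assumed NEWS bands change.
for name, rule in RULES.items():
    print(name, classify_effect(24, 25, rule))
```

Rules with more respiratory-rate bands near the rates shown in the videos have more boundaries that a small measurement error can cross, which is one way to read the differences between the rules.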

Results

Respondents and method of assessment

In total, 452 respondents filled out the questionnaire within 3 weeks after sending out the first invitation (median 3, IQR 2–7 days). After exclusion of 4 incomplete questionnaires, we included 448 respondents in the analyses. The study sample consisted of nurses, consultants, residents, medical students, general practitioners (GPs) and other healthcare professionals (Table 2). Of these participants, 432 (96.4%) assessed the respiratory rate on a regular basis.

Table 2. Respondents and proportion of measurements within 4/min from the true respiratory rate*.

https://doi.org/10.1371/journal.pone.0223155.t002

Accuracy of respiratory rate measurements

Fig 2 shows the measured respiratory rates for each video. In general, the median reported respiratory rate was 1–3 breaths/minute higher than the true rate. IQRs were between 2 and 4 breaths/minute, and the overall range of measurements was 6 to 64/min.

Fig 2. Measured respiratory rates for each video.

* Extreme values (<8/>40) are not depicted in these graphs.

https://doi.org/10.1371/journal.pone.0223155.g002

Table 2 shows the proportion of measurements within 4/min of the true respiratory rate. Overall, 78.2% of measurements were within this range (67.4%, 81.9%, 81.9%, 87.9%, and 71.7% for videos 1–5, respectively). We found no significant differences in this proportion between the different groups of professionals (Table 2).

Interobserver-agreement

For all respiratory rate measurements of the 5 videos together, the ICC was 0.64 (95% CI 0.39–0.94), which indicates moderate agreement. For videos with a high respiratory rate (videos 1, 3 and 5; all >20 and ≥22/min), the ICC was 0.29 (95% CI 0.10–0.94), indicating poor agreement. Videos with a low respiratory rate (videos 2 and 4; both <20/min) showed an ICC of 0.50 (95% CI 0.16–0.99), indicating moderate agreement.

Agreement between measurements and categorical judgments

Table 3 shows the agreement between the respondents’ measurements and their categorical judgments. For all videos together, 324 (14.5%) inconsistencies were present. Most (n = 194, 8.7%) of these occurred when a respondent measured a “normal” respiratory rate (12 through 20/min), and incorrectly judged this to be “high”. In most (n = 148, 76.3%) of these cases, the respiratory rate was measured as exactly 20/minute. In 68 cases (3.0%), a respondent measured a “high” respiratory rate (>20 breaths/minute), and incorrectly judged this to be “normal” (n = 64, 2.9%) or “low” (n = 4, 0.2%). Cohen’s Kappa was 0.71 for all videos together, which represents moderate agreement. However, for all individual videos, Cohen’s kappa was lower (0.27–0.59).

Table 3. Agreement between measurements and categorical judgments*.

https://doi.org/10.1371/journal.pone.0223155.t003

Potential effect on clinical prediction/diagnostic rules

Table 4 shows the potential effect of incorrect respiratory rate measurements on SIRS, qSOFA, NEWS, and MEWS. Of these rules, SIRS was least affected, with misclassification in 8.8%. qSOFA scores changed in 8.9%, NEWS in 18.2%, and MEWS scores changed in 37.1% of cases. Overall, 4.5–7.1% of patients would incorrectly receive a lower score, while 3.9–32.2% would receive a higher one, when compared to the score based on their true respiratory rate.

Table 4. Effect of respiratory rate measurements on clinical prediction/diagnostic rules*.

https://doi.org/10.1371/journal.pone.0223155.t004

Discussion

This study is, to our knowledge, the first that used a large, heterogeneous group of professionals to measure and categorize different clinically relevant respiratory rates. Our study shows that these respiratory rate measurements by health care professionals are not accurate, and that the interobserver-agreement is suboptimal, which may have an important effect on the results of four common clinical prediction/diagnostic rules.

We designed this study using simple tools, available to the majority of healthcare professionals today. We made five videos and shared them using e-mail and social media, after which 448 professionals completed and returned the questionnaire within three weeks. Median measured respiratory rates were slightly higher than the true respiratory rate, 78.2% of measurements were within 4 breaths per minute from the true rate, and the ICC was moderate. These results are in line with those of previous studies.[18,19] Remarkable is the fact that 14.5% of responses showed inconsistencies when comparing the respondents’ measurements and their categorical judgments. In addition, incorrect respiratory rate measurements may in theory have led to both overestimation (12.9%) and underestimation (5.4%) of the score of four common prediction/diagnostic rules.

The measured respiratory rates varied widely. While IQRs were between 2 and 4/min, ranges were wide (overall 6–64/min). Overall, 78.2% of measurements were within 4 breaths per minute of the true rate. We did not find any differences between professional groups regarding the proportion of measurements within 4/min of the true rate. These results suggest that respiratory rate assessment by different groups of healthcare professionals is suboptimal.

With a value of 0.64 (95% CI 0.39–0.94), the ICC was moderate. Previous studies have demonstrated values as low as 0.26 (95% CI 0.16–0.35), but also as high as 0.99 (95% CI 0.97–1.00).[14,15] A possible explanation for this wide range is the difference in design between these studies. One study, with a low ICC (0.26), compared values recorded in patient charts to values measured manually by residents.[14] These values were not obtained at the exact same time, and while the participating residents were informed and prepared, the nurses who performed the measurements were not. Another study, with a high ICC (0.99), performed a simulation using 5 videos as well.[15] Respondents were mostly experienced nurses, and the respiratory rates in the videos varied largely: 5, 10, 15, 30 and 60 breaths/min. For professionals like these, it is relatively easy to differentiate between a respiratory rate of 15 and 60, or even 30 breaths/minute. However, measuring a respiratory rate just above or below commonly used cut-off points of >20 or ≥22 breaths/minute is more difficult. Therefore, the smaller range of respiratory rates in our videos, and our large, heterogeneous group of (future) healthcare professionals may have resulted in our less favourable ICCs. As the respiratory rate has been proven to predict adverse outcomes and is incorporated in many clinical prediction/diagnostic rules, this is an important finding.[2,20,21]

When comparing the respondents’ measurements and their categorical judgments, 14.5% of the answers were inconsistent. Respondents measuring a normal (12-20/min) respiratory rate, while judging this as ‘high’, caused the most inconsistencies (8.7%). In over 75% of these cases, the measured respiratory rate was exactly 20/min, which could suggest that some respondents believe that a respiratory rate of 20/min is abnormal. We did not provide a definition of “low”, “normal”, or “high”, but there is no current guideline which supports the use of a cut-off point <20/min for an abnormal respiratory rate. It would be worthwhile to investigate if education would improve these results, as these results suggest a lack of knowledge regarding common cut-off points.

One of the most interesting results of this study was found in the impact of incorrect respiratory rate measurements on daily practice. We entered the respondents’ answers into four commonly used prediction/diagnostic rules, as a proxy of the “true consequence” of incorrect measurements. This resulted in incorrect scores for SIRS in 8.8%, for qSOFA in 8.9%, for NEWS in 18.2%, and for MEWS in 37.1%. While median measurements were higher than the true respiratory rate in all videos, the incorrect measurements resulted in both incorrect lower and higher scores (Table 4). In daily practice, this could have led to delayed diagnosis and treatment of (critically) ill patients or overalerting and eventually alarm fatigue.

By performing this video-based questionnaire, we created the opportunity to have 448 healthcare professionals measure the respiratory rate of the same patient breathing at a constant rate. This design also has limitations. Respondents could only visually measure the respiratory rate. Some professionals normally use palpation of the chest to optimize their measurement. However, we made sure that the volunteer’s breaths could be seen clearly in all videos, and we expect that the restriction to visual assessment had no major influence on the results. In order to provide high quality, stable recordings, we had to select specific sections of video, resulting in 4/5 videos being slightly less than 1 minute long. This could have resulted in suboptimal measurements by 8.3% of respondents, as they reported that they usually measure the respiratory rate for a full minute. Finally, we did not include a video with a low respiratory rate, so we cannot draw conclusions regarding the ability of healthcare professionals to recognize bradypnea.

Notwithstanding these limitations, this study shows that, even when professionals are asked to measure the respiratory rate to the best of their ability, results are still suboptimal. In crowded EDs, quick and reliable methods to accurately measure the respiratory rate could be valuable, especially since many EDs and hospitals rely on these measurements to identify patients at risk, for instance, of sepsis. Therefore, further research should be undertaken to investigate the reliability of non-invasive methods to measure the respiratory rate, especially in EDs. This could help to avoid incorrect alarms and, even more importantly, delays in diagnosis and treatment, even when patients are potentially very ill.

In conclusion, using simple tools available to most healthcare professionals today, we showed that accuracy and interobserver-agreement of respiratory rate measurements by healthcare professionals are suboptimal. The clinical relevance of incorrect measurements is illustrated by alterations in the score of four common prediction/diagnostic rules. This happened in 8.8–37.1% of cases, with the clinically most important effect being a potential delay in diagnosis and treatment of (critically) ill patients.

Acknowledgments

Audrey Merry, epidemiologist, has provided extensive statistical support.

References

  1. Fieselmann JF, Hendryx MS, Helms CM, Wakefield DS. Respiratory rate predicts cardiopulmonary arrest for internal medicine inpatients. J Gen Intern Med. 1993;8: 354–360. pmid:8410395
  2. Subbe CP, Davies RG, Williams E, Rutherford P, Gemmell L. Effect of introducing the Modified Early Warning score on clinical outcomes, cardio-pulmonary arrests and intensive care utilisation in acute medical admissions. Anaesthesia. 2003;58: 797–802. pmid:12859475
  3. Peltan ID, Brown SM, Bledsoe JR, Sorensen J, Samore MH, Allen TL, et al. ED Door-to-Antibiotic Time and Long-term Mortality in Sepsis. Chest. 2019;155: 938–946. pmid:30779916
  4. Ferrer R, Martin-Loeches I, Phillips G, Osborn TM, Townsend S, Dellinger RP, et al. Empiric antibiotic treatment reduces mortality in severe sepsis and septic shock from the first hour: results from a guideline-based performance improvement program. Crit Care Med. 2014;42: 1749–1755. pmid:24717459
  5. Kumar A, Roberts D, Wood KE, Light B, Parrillo JE, Sharma S, et al. Duration of hypotension before initiation of effective antimicrobial therapy is the critical determinant of survival in human septic shock. Crit Care Med. 2006;34: 1589–1596. pmid:16625125
  6. Smith GB, Prytherch DR, Meredith P, Schmidt PE, Featherstone PI. The ability of the National Early Warning Score (NEWS) to discriminate patients at risk of early cardiac arrest, unanticipated intensive care unit admission, and death. Resuscitation. 2013;84: 465–470. pmid:23295778
  7. Subbe CP, Kruger M, Rutherford P, Gemmel L. Validation of a modified Early Warning Score in medical admissions. QJM. 2001;94: 521–526. pmid:11588210
  8. Singer M, Deutschman CS, Seymour CW, Shankar-Hari M, Annane D, Bauer M, et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016;315: 801–810. pmid:26903338
  9. Bone RC, Sibbald WJ, Sprung CL. The ACCP-SCCM consensus conference on sepsis and organ failure. Chest. 1992;101: 1481–1483. pmid:1600757
  10. Bianchi W, Dugas AF, Hsieh YH, Saheed M, Hill P, Lindauer C, et al. Revitalizing a vital sign: improving detection of tachypnea at primary triage. Ann Emerg Med. 2013;61: 37–43. pmid:22738682
  11. Mukkamala SG, Gennings C, Wenzel RP. R = 20: bias in the reporting of respiratory rates. Am J Emerg Med. 2008;26: 237–239.
  12. Leuvan CH, Mitchell I. Missed opportunities? An observational study of vital sign measurements. Crit Care Resusc. 2008;10: 111–115. pmid:18522524
  13. Flenady T, Dwyer T, Applegarth J. Explaining transgression in respiratory rate observation methods in the emergency department: A classic grounded theory analysis. Int J Nurs Stud. 2017;74: 67–75. pmid:28622531
  14. Semler MW, Stover DG, Copland AP, Hong G, Johnson MJ, Kriss MS, et al. Flash mob research: a single-day, multicenter, resident-directed study of respiratory rate. Chest. 2013;143: 1740–1744. pmid:23197319
  15. Nielsen LG, Folkestad L, Brodersen JB, Brabrand M. Inter-Observer Agreement in Measuring Respiratory Rate. PLoS One. 2015;10: e0129493. pmid:26090961
  16. Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med. 2016;15: 155–163. pmid:27330520
  17. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22: 276–282.
  18. Brabrand M, Hallas P, Folkestad L, Lautrup-Larsen CH, Brodersen JB. Measurement of respiratory rate by multiple raters in a clinical setting is unreliable: A cross-sectional simulation study. J Crit Care. 2018;44: 404–406. pmid:29310091
  19. Edmonds ZV, Mower WR, Lovato LM, Lomeli R. The reliability of vital sign measurements. Ann Emerg Med. 2002;39: 233–237. pmid:11867974
  20. Seymour CW, Liu VX, Iwashyna TJ, Brunkhorst FM, Rea TD, Scherag A, et al. Assessment of Clinical Criteria for Sepsis: For the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016;315: 762–774. pmid:26903335
  21. Fieselmann JF, Hendryx MS, Helms CM, Wakefield DS. Respiratory rate predicts cardiopulmonary arrest for internal medicine inpatients. J Gen Intern Med. 1993;8: 354–360. pmid:8410395