Comparison of devices used to measure blood pressure, grip strength and lung function: A randomised cross-over study

  • Carli Lessof,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Writing – original draft

    Affiliation National Centre for Research Methods, University of Southampton, Southampton, United Kingdom

  • Rachel Cooper,

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliations Faculty of Medical Sciences, Translational and Clinical Research Institute, AGE Research Group, Newcastle University, Newcastle upon Tyne, United Kingdom, NIHR Newcastle Biomedical Research Centre, Newcastle University and Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, United Kingdom

  • Andrew Wong,

    Roles Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Writing – review & editing

    Affiliation MRC Unit for Lifelong Health and Ageing at UCL, London, United Kingdom

  • Rebecca Bendayan,

    Roles Methodology, Writing – review & editing

    Affiliations Department of Biostatistics and Health Informatics of the Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, United Kingdom, NIHR Biomedical Research Centre at South London and Maudsley, NHS Foundation Trust and King’s College London, London, United Kingdom

  • Rishi Caleyachetty,

    Roles Investigation, Methodology, Writing – review & editing

    Affiliations Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom, Warwick Medical School, University of Warwick, Warwick, United Kingdom

  • Hayley Cheshire,

    Roles Project administration, Writing – review & editing

    Affiliation Hayley Cheshire Research, Bournemouth, United Kingdom

  • Theodore Cosco,

    Roles Investigation, Writing – review & editing

    Affiliation Department of Gerontology, Simon Fraser University, Vancouver, Canada and Oxford Institute of Population Ageing, University of Oxford, Oxford, United Kingdom

  • Ahmed Elhakeem,

    Roles Investigation, Writing – review & editing

    Affiliation MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, United Kingdom

  • Anna L. Hansell,

    Roles Methodology, Writing – review & editing

    Affiliation Centre for Environmental Health and Sustainability, University of Leicester, United Kingdom

  • Aradhna Kaushal,

    Roles Investigation, Writing – review & editing

    Affiliation Research Department of Behavioural Science and Health, UCL, London, United Kingdom

  • Diana Kuh,

    Roles Conceptualization, Funding acquisition, Writing – review & editing

    Affiliation MRC Unit for Lifelong Health and Ageing at UCL, London, United Kingdom

  • David Martin,

    Roles Supervision, Writing – review & editing

    Affiliation National Centre for Research Methods, University of Southampton, Southampton, United Kingdom

  • Cosetta Minelli,

    Roles Methodology, Writing – review & editing

    Affiliation National Heart & Lung Institute, Imperial College London, United Kingdom

  • Stella Muthuri,

    Roles Investigation, Writing – review & editing

    Affiliation MRC Unit for Lifelong Health and Ageing at UCL, London, United Kingdom

  • Maria Popham,

    Roles Project administration, Writing – review & editing

    Affiliation MRC Unit for Lifelong Health and Ageing at UCL, London, United Kingdom

  • Seif O. Shaheen,

    Roles Methodology, Writing – review & editing

    Affiliation Institute of Population Health Sciences, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom

  • Patrick Sturgis,

    Roles Methodology, Supervision, Writing – review & editing

    Affiliation Department of Methodology, London School of Economics, United Kingdom

  • Rebecca Hardy

    Roles Conceptualization, Funding acquisition, Methodology, Supervision, Writing – review & editing

    R.J.Hardy@lboro.ac.uk

    Affiliations Social Research Institute, UCL, London, United Kingdom, School of Sport, Exercise and Health Sciences, Loughborough University, Loughborough, United Kingdom


Abstract

Background

Blood pressure, grip strength and lung function are frequently assessed in longitudinal population studies, but the measurement devices used differ between studies and within studies over time. We aimed to compare measurements ascertained from different commonly used devices.

Methods

We used a randomised cross-over study. Participants were 118 men and women aged 45–74 years whose blood pressure, grip strength and lung function were assessed using two sphygmomanometers (Omron 705-CP and Omron HEM-907), four handheld dynamometers (Jamar Hydraulic, Jamar Plus+ Digital, Nottingham Electronic and Smedley) and two spirometers (Micro Medical Plus turbine and ndd Easy on-PC ultrasonic flow-sensor) with multiple measurements taken on each device. Mean differences between pairs of devices were estimated along with limits of agreement from Bland-Altman plots. Sensitivity analyses were carried out using alternative exclusion criteria and summary measures, and using multilevel models to estimate mean differences.

Results

The mean difference between sphygmomanometers was 3.9mmHg for systolic blood pressure (95% Confidence Interval (CI): 2.5, 5.2) and 1.4mmHg for diastolic blood pressure (95% CI: 0.3, 2.4), with the Omron HEM-907 measuring higher. For maximum grip strength, the mean difference when either one of the electronic dynamometers was compared with either the hydraulic or spring-gauge device was 4-5kg, with the electronic devices measuring higher. The differences were small when comparing the two electronic devices (difference = 0.3kg, 95% CI: -0.9, 1.4), and when comparing the hydraulic and spring-gauge devices (difference = 0.2kg, 95% CI: -0.8, 1.3). In all cases limits of agreement were wide. The mean difference in FEV1 between spirometers was close to zero (95% CI: -0.03, 0.03) and limits of agreement were reasonably narrow, but a difference of 0.47l was observed for FVC (95% CI: 0.42, 0.53), with the ndd Easy on-PC measuring higher.

Conclusion

Our study highlights potentially important differences in measurement of key functions when different devices are used. These differences need to be considered when interpreting results from modelling intra-individual changes in function and when carrying out cross-study comparisons, and sensitivity analyses using correction factors may be helpful.

Introduction

Blood pressure, grip strength and lung function are commonly assessed in longitudinal population studies. All three are non-invasive measures of physiological function that are practical for a nurse or interviewer to administer in a home or clinical setting using portable equipment. They avoid the subjectivity of self-reports of health, enable researchers and clinicians to track changes in health and function over the life course [1] and are important biomarkers of healthy ageing [2]. Their repeat assessment within longitudinal studies, and inclusion in many studies, facilitates comparisons over time and across ages and cohorts [3,4].

Although there have been a number of initiatives to encourage standardisation of these measures [5–7], different devices have been adopted by different studies for a variety of practical reasons [8,9]. Furthermore, the device used within a long-running longitudinal study will often need to change over time as obsolete or outdated models are replaced with devices that are more technologically advanced and improve or extend measurement, are less costly, more portable or easier to use. Because devices of this kind are only subject to moderate regulation [10,11], the measures obtained from different makes and models of device are unlikely to be equivalent. This has important implications for research which either compares findings across studies or considers change in function longitudinally. For example, in a study modelling age-related changes in blood pressure across the life course which used data from eight British longitudinal studies, switching from a manual sphygmomanometer to an automated device, without correction for the difference in measurement, resulted in a steeper increase in mean trajectory of systolic blood pressure [4]. Similarly, artefactual findings attributable to a change in device have been observed in studies of lung function [12,13]. Indeed, concerns about potential differences in measures due to differences in spirometry devices have contributed to study investigators in the UK discouraging within- and cross-study analyses [14,15].

There are existing studies which have shown differences between devices used to measure blood pressure [16–20], grip strength [21–24] and lung function [6,12,13,25–27], but these have not yet compared all the devices commonly used in cohort and longitudinal population studies in the UK and many other countries. Further, these are only occasionally discussed in the context of both within- and between-study comparisons. To address this gap, a randomised cross-over trial was undertaken to compare measurements between devices used to assess blood pressure, grip strength and lung function commonly used in UK longitudinal population studies within the CLOSER consortium [28].

Methods

Study design and sample

For each of blood pressure, grip strength and lung function, a randomised cross-over study was carried out, so as to make within-person measurement comparisons. The study was conducted following established (CONSORT) guidelines [29]. The target sample, based on sample size calculations (S1 Appendix), was 120 men and women from the general population aged 45 to 74 years comprising 20 men and 20 women from each of three age groups (45–54, 55–64, 65–74). Participants were drawn from a list of individuals who had participated in a market research study, consented to be re-contacted for research purposes, and were living in London and the South East of England. An invitation letter and information sheet were sent, and this was followed up with a telephone recruitment process including assessment of health-related exclusion criteria (S1 Appendix). Eligible participants were then invited to attend a face-to-face assessment and each participant was measured on every machine (Table 1) at a single assessment visit.

All 90-minute face-to-face assessments took place in central London between October 2015 and January 2016 and were conducted by one of seven researchers who were trained and tested in all relevant protocols. All participants gave informed, written consent. The analytical dataset was pseudo anonymised with each participant given a study number so that individuals could not be identified. Ethical approval for data collection was given by University College London (UCL) (Ethics Project Number: 6338/001) and, for analysis, by the University of Southampton (Ethics Project Number: 18498). Participants received feedback on their results, advice to contact their General Practitioner if their blood pressure was elevated, and a gift voucher.

During the assessment, each participant was assessed in the sequence shown in Table 2. Blood pressure was measured consecutively on each device and the remaining measures were ordered to ensure that there was sufficient time between the four grip strength and two spirometry measurements to avoid participant fatigue. Multiple measurements were recorded on each device as would be done in survey research. Height and weight were also measured and a short self-completion questionnaire was administered (S2 Appendix).

For each of the three measures, the order of devices was determined before fieldwork began, using computer-generated random numbers within each age-sex stratum. Individuals were randomly allocated to one of two possible orders of blood pressure and lung function devices and to one of 24 possible orders of grip strength devices.
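As an illustration of the allocation scheme described above, the following Python sketch enumerates the possible device orders (two for each pair of devices, 4! = 24 for the four dynamometers) and draws one at random for each participant slot within each age-sex stratum. It is not the study's randomisation code: the helper `allocate_orders`, the seed and the use of a simple unrestricted random draw are assumptions made for illustration only.

```python
import itertools
import random

BP_DEVICES = ["Omron 705-CP", "Omron HEM-907"]
GRIP_DEVICES = ["Jamar Hydraulic", "Jamar Plus+ Digital",
                "Nottingham Electronic", "Smedley"]
SPIROMETERS = ["Micro Medical Plus", "ndd Easy on-PC"]


def all_orders(devices):
    """Return every possible ordering of the given devices."""
    return list(itertools.permutations(devices))


def allocate_orders(strata, per_stratum, seed=2015):
    """Hypothetical helper: draw device orders at random within each stratum."""
    rng = random.Random(seed)  # fixed seed so the list can be regenerated
    allocation = []
    for stratum in strata:
        for slot in range(1, per_stratum + 1):
            allocation.append({
                "stratum": stratum,
                "slot": slot,
                "bp_order": rng.choice(all_orders(BP_DEVICES)),
                "grip_order": rng.choice(all_orders(GRIP_DEVICES)),
                "spiro_order": rng.choice(all_orders(SPIROMETERS)),
            })
    return allocation


# Six age-sex strata with a target of 20 participants each (120 in total).
strata = [(sex, age) for sex in ("men", "women")
          for age in ("45-54", "55-64", "65-74")]
allocation = allocate_orders(strata, per_stratum=20)
print(len(all_orders(BP_DEVICES)), len(all_orders(GRIP_DEVICES)))  # 2 24
print(len(allocation))  # 120
```

A simple random draw like this does not guarantee equal numbers per order within a stratum; a blocked scheme would be needed for that.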

Blood pressure, grip strength and lung function measurement

Standardised measurement protocols were used as follows. For blood pressure, the participant was asked to sit on a chair with legs uncrossed and their right arm resting comfortably, palm up, on a table, with the sphygmomanometers positioned so that they could not see the display. The participant was asked to expose their right arm, making sure that rolled up sleeves did not restrict circulation and that any watches or bracelets had been removed, and the sphygmomanometer cuff was then positioned over the brachial artery. After 3 minutes of quiet rest, 3 readings with a minute’s rest between each reading were recorded using the first device. The device was then changed and after a further 2 minutes’ rest, 3 readings were taken using the second device. There was no talking until three readings on both devices had been completed.

Grip strength assessment was based on a published measurement protocol [30]. While seated in a chair with fixed arms, participants were asked to place their forearm on the arm of the chair in the mid-prone position (the thumb facing up) with their wrist just over the end of the arm of the chair in a neutral but slightly extended position. Adjustments were made to each dynamometer to accommodate different hand sizes according to the make and model of the device. On hearing the words “And Go”, the participant was encouraged, through strong verbal instruction, to squeeze as hard as possible for a few seconds until told to stop. For each device, two measurements were carried out in each hand in the sequence Left-Right-Left-Right. The value on the display was recorded to the nearest 0.1kg for the Jamar Plus+ and Nottingham Electronic, to the nearest 0.5kg for the Smedley and to the nearest 1kg for the Jamar Hydraulic.

Lung function measurements adhered to the American Thoracic Society/European Respiratory Society (ATS/ERS) lung function protocol [6]. The procedure was explained and demonstrated, and the participant then had a practice blow without completely emptying their lungs. All measurements were carried out with the participant standing unless they felt unable to do so. During measurement, maximum effort was encouraged verbally. In addition, the ndd Easy on-PC was linked to a laptop which showed a cartoon of a child blowing up a balloon. This represents a real-time trace and as the participant is encouraged to exhale until the balloon pops this helps ensure a maximal FVC is achieved. After each trial the researcher recorded whether it satisfied the protocol, for example a trial was classified as not valid if the participant did not form a tight seal around the mouthpiece or coughed during the procedure, and in these instances, feedback was provided before the next attempt. Participants had up to five attempts to produce three valid measurements of lung function from each spirometer.

Readings for blood pressure, grip strength and lung function using the Micro Medical spirometer were data entered twice, independently, and compared to ensure accuracy. Lung function readings taken using the ndd Easy on-PC spirometer were downloaded directly from the laptop.

Other measures

Height was measured using a portable Marsden Leicester stadiometer and weight using Tanita 352 scales according to standardised procedures, from which body mass index was calculated as weight (kg)/height (m)². Responses to the self-completion questionnaire provided additional information on: age at completing full-time education, self-rated health, smoking history, medication use and musculoskeletal, cardiovascular and respiratory conditions which might influence performance on the functional tests (S2 Appendix).

Primary outcome measures

For the purposes of the main analyses, outcomes commonly used in epidemiological research were derived. The mean of the second and third readings of systolic blood pressure and diastolic blood pressure in millimeters of mercury (mmHg) were used. For grip strength, the maximum of the four readings in kilograms (kg) was used. For lung function, the maximum forced expiratory volume in 1 second (FEV1) and forced vital capacity (FVC) in millilitres (ml) from the highest quality readings (quality A or B) were used. Quality grade A was when 3 or more acceptable tests were achieved with repeatability within 100 ml, and B when 3 acceptable tests were achieved with repeatability within 150 ml, as per ATS/ERS criteria [6].
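The derivation rules above can be sketched in a few lines of Python. The function names and the handling of missing readings (returning `None`) are assumptions for illustration; the text does not say how incomplete sets of readings were treated beyond excluding participants with missing data.

```python
from statistics import mean


def bp_outcome(readings):
    """Mean of the second and third of three readings (mmHg)."""
    if len(readings) != 3 or any(r is None for r in readings):
        return None  # assumption: incomplete sets are treated as missing
    return mean(readings[1:3])


def grip_outcome(readings):
    """Maximum of the recorded grip strength readings (kg)."""
    valid = [r for r in readings if r is not None]
    return max(valid) if valid else None


def lung_outcome(values, quality_grade):
    """Maximum FEV1 or FVC, kept only for quality grade A or B sessions."""
    if quality_grade not in ("A", "B") or not values:
        return None
    return max(values)


# Hypothetical readings for one participant on one device of each type.
print(bp_outcome([128.0, 124.0, 122.0]))       # 123.0
print(grip_outcome([31.5, 33.0, 29.5, 30.0]))  # 33.0
print(lung_outcome([2.91, 3.02, 2.98], "A"))   # 3.02
print(lung_outcome([2.91, 3.02, 2.98], "D"))   # None
```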

Statistical analyses

We described relevant characteristics by randomisation group for each measure. For each device we estimated the reliability using intraclass correlations (or Rho) and within-person standard deviations using a variance-components model [31]. To investigate order effects we used two sample t-tests to compare the difference in mean values between groups with the measurements carried out in one sequence (device A followed by device B) compared with the opposite order (BA). For grip strength where 4 devices were tested, 6 pairwise comparisons were made, ignoring the exact placement of devices within the sequence.
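To make the reliability measures concrete, the sketch below estimates an intraclass correlation and within-person standard deviation from repeated readings using the one-way analysis-of-variance estimator for balanced data. This is a stand-in for the variance-components model cited above, not the authors' STATA code, and the function name and example data are hypothetical.

```python
from math import sqrt
from statistics import mean


def variance_components(data):
    """One-way ANOVA estimates for balanced repeated readings.

    data: list of per-person lists, each holding the same number of readings.
    Returns the intraclass correlation and the within-person SD.
    """
    n = len(data)
    k = len(data[0])
    if any(len(row) != k for row in data):
        raise ValueError("this sketch requires the same number of readings per person")
    person_means = [mean(row) for row in data]
    grand_mean = mean(person_means)
    # Between-person and within-person mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(data, person_means)
              for x in row) / (n * (k - 1))
    var_between = max((msb - msw) / k, 0.0)  # truncate negative estimates at zero
    icc = var_between / (var_between + msw)
    return {"icc": icc, "within_sd": sqrt(msw)}


# Hypothetical example: three people, two readings each.
result = variance_components([[10.0, 12.0], [20.0, 22.0], [30.0, 32.0]])
print(round(result["icc"], 3), round(result["within_sd"], 3))  # 0.98 1.414
```

An intraclass correlation close to 1 means that most of the variation in readings is between people rather than between repeated readings on the same person.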

We calculated the differences in measurement between pairs of devices then assessed the mean within-person differences between pairs of devices using paired t-tests. The assumption that the mean differences were normally distributed was checked by plotting histograms, and Bland-Altman plots (the difference between measures versus the average of the measures from the two devices for each individual) were used to assess whether the variation was dependent on the magnitude of the measurements [32,33]. The mean difference in values between the two devices, and the 95% limits of agreement, which give the range in which we would expect 95% of future differences in measurements between the two devices to lie, were plotted [33,34].
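The quantities behind a Bland-Altman plot can be computed as in the following sketch: the per-person difference and average, the mean difference, and the 95% limits of agreement taken as the mean difference plus or minus 1.96 standard deviations of the differences. The function name and example readings are hypothetical, and plotting is left out so that the example needs only the standard library.

```python
from statistics import mean, stdev


def bland_altman(device_a, device_b):
    """Mean difference and 95% limits of agreement for paired measurements."""
    if len(device_a) != len(device_b):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(device_a, device_b)]
    avgs = [(a + b) / 2 for a, b in zip(device_a, device_b)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample SD of the within-person differences
    return {
        "mean_diff": mean_diff,
        "loa_lower": mean_diff - 1.96 * sd_diff,
        "loa_upper": mean_diff + 1.96 * sd_diff,
        "points": list(zip(avgs, diffs)),  # x = average, y = difference
    }


# Hypothetical systolic readings (mmHg) for four people on two devices.
ba = bland_altman([120.0, 130.0, 140.0, 150.0], [118.0, 125.0, 137.0, 144.0])
print(ba["mean_diff"])                                       # 4.0
print(round(ba["loa_lower"], 2), round(ba["loa_upper"], 2))  # 0.42 7.58
```

Plotting `points` with horizontal lines at the mean difference and the two limits gives the usual Bland-Altman display; a trend in the points suggests that the difference depends on the magnitude of the measurement.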

We also performed a series of sensitivity analyses to test the robustness of the results. We repeated analyses having: (i) excluded measurements where the devices were administered in the incorrect order (n = 2 for blood pressure, n = 5 for grip strength and n = 1 for lung function); (ii) removed extreme outliers identified using scatter plots (n = 1 for blood pressure and n = 2 for grip strength) and; (iii) used alternative outcome definitions commonly used in analyses. For blood pressure, we considered the mean of three readings [35] and the second reading only [36] and for grip strength, the mean of the four readings [37,38]. For lung function, we used the highest reading of FEV1 and FVC drawn from all available readings irrespective of whether they adhered to the ATS/ERS quality criteria.

Finally, we used multilevel modelling, as an alternative statistical approach, to estimate the differences between devices, using all available readings rather than a summary measure, in order to account for variance between readings. The models treat the repeated readings as Level 1 and the individual as Level 2 to account for non-independence of measurements from the same person. Model 1 included device treated as a fixed effect. Model 2 also included covariates to account for the order in which the devices were administered and the position of the reading in the sequence (1 to 3 for blood pressure, 1 or 2 for the dominant and non-dominant hands for grip strength, and 1 to 5 for lung function). Model 3 was additionally adjusted for age, sex and, for blood pressure only, body mass index.
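For a comparison of two devices, Model 1 as described above could be written as a two-level random-intercept model. The notation below is ours rather than the authors', and the normality assumptions are the conventional ones for this kind of model rather than something stated in the text.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{device}_{ij} + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)
```

Here y_{ij} is reading i (Level 1) from individual j (Level 2), device_{ij} indicates which device produced the reading, \beta_1 is the mean difference between devices, and the random intercept u_j allows for non-independence of readings from the same person. In this notation, Model 2 would add fixed-effect terms for device order and reading position, and Model 3 would add age, sex and, for blood pressure, body mass index. For the four dynamometers, the single device indicator would be replaced by three indicator variables.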

Data cleaning and management were carried out using Excel, IBM-SPSS Version 22 and STATA 14.0 and analyses were conducted using STATA 15.0.

Results

During fieldwork, 118 assessments were completed, with 18–21 participants in each of the age-sex strata (S1 Table). Of the seven researchers, three carried out 20–30 assessments, two carried out 10–20 assessments and two carried out fewer than ten assessments.

The socio-demographic characteristics of the randomised groups were reasonably well balanced as were key aspects of cardiovascular, musculoskeletal and respiratory health (Tables 3 and 4). The reliability of every device was good. The intraclass correlations were lowest for blood pressure (0.89–0.94), due to the acknowledged within-person variation in this measure (S2 Table). The values for grip strength of dominant hand were above 0.95 for all devices except the Smedley dynamometer (0.92). Reliability was best for lung function (≥0.96), where within-person standard deviations were small. Reliability was slightly better when including only assessments adhering to the ATS/ERS quality criteria because two measures must be within 150ml of each other. There was no evidence of order effects for blood pressure or lung function. For grip strength, there was evidence of an order effect for the comparison between the Nottingham Electronic and Smedley dynamometers (difference = -3.08kg (95% CI = -5.93, -0.23, p = 0.03)) (S3 Table). Histograms show that for all three measures, the mean differences between devices were approximately normally distributed (S1 Fig).

Table 3. Characteristics of the study population by first device used (N = 118).

https://doi.org/10.1371/journal.pone.0289052.t003

Table 4. Cardiovascular, musculoskeletal and respiratory health status of the study population by first device used (N = 118).

https://doi.org/10.1371/journal.pone.0289052.t004

Blood pressure

Three participants were excluded from analyses due to missing readings leaving 115 for analysis. The mean difference in SBP between the two devices was 3.9mmHg (95% CI: 2.5, 5.2, p<0.001) and for DBP was 1.4mmHg (95% CI: 0.3, 2.4, p = 0.01), with the Omron HEM-907 measuring higher than the Omron 705-CP (Table 5). The Bland-Altman plots showed that as blood pressure increased, the difference between the two devices remained approximately constant (Figs 1 and 2). The limits of agreement were wide, being -10.6 to 18.3mmHg for SBP and -9.8 to 12.5mmHg for DBP.
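The reported limits of agreement and confidence interval for SBP can be checked against each other with a short calculation. The sketch below back-calculates the standard deviation of the differences from the limits of agreement and then rebuilds an approximate 95% confidence interval for the mean difference with n = 115. The use of 1.96 in both steps is an assumption (a t-based critical value would differ slightly), and the function names are hypothetical.

```python
from math import sqrt


def implied_sd_from_loa(lower, upper, z=1.96):
    """SD of the differences implied by 95% limits of agreement."""
    return (upper - lower) / (2 * z)


def approx_ci(mean_diff, sd_diff, n, crit=1.96):
    """Approximate 95% confidence interval for a mean difference."""
    half_width = crit * sd_diff / sqrt(n)
    return mean_diff - half_width, mean_diff + half_width


# Values reported in the text for SBP.
sd_sbp = implied_sd_from_loa(-10.6, 18.3)
ci_low, ci_high = approx_ci(3.9, sd_sbp, n=115)
print(round(sd_sbp, 2))
print(round(ci_low, 2), round(ci_high, 2))
```

The implied standard deviation is a little over 7mmHg, and the rebuilt interval runs from roughly 2.55 to 5.25mmHg, in line with the reported 2.5 to 5.2 given that the published figures are rounded.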

Fig 1. Bland Altman plot for SBP.

Plot of the difference in mean SBP (mmHg) between the Omron 705-CP and Omron HEM-907 by the average SBP with 95% limits of agreement.

https://doi.org/10.1371/journal.pone.0289052.g001

Fig 2. Bland Altman plot for DBP.

Plot of the difference in mean DBP (mmHg) between the Omron 705-CP and Omron HEM-907 by the average DBP with 95% limits of agreement.

https://doi.org/10.1371/journal.pone.0289052.g002

Table 5. Differences in mean and limits of agreement for each pair of devices used to measure blood pressure, grip strength and lung function.

https://doi.org/10.1371/journal.pone.0289052.t005

Grip strength

All 118 participants were included in the analyses. There was no evidence of a difference in mean maximum grip strength when comparing the two electronic dynamometers, the Nottingham Electronic and Jamar Plus+ (difference = 0.3kg (95% CI: -0.9, 1.4, p = 0.6)), or when comparing the hydraulic and spring-gauge dynamometers, the Jamar Hydraulic and Smedley (difference = 0.2kg (95% CI: -0.8, 1.3, p = 0.7)). However, there were mean differences in maximum grip strength of between 4 and 5kg when comparing either of the electronic dynamometers with either the hydraulic or spring-gauge dynamometer (Table 5). The limits of agreement varied depending on the pair of devices being compared: for example, these were narrower (-2.0 and 10.1 kg) when comparing the Jamar Plus+ and Jamar Hydraulic but very wide (-10.6 and 20.5 kg) when comparing the Nottingham Electronic and Smedley dynamometers. Even in cases where the mean difference was near zero, the limits of agreement indicated substantial differences in measurement between devices. The Bland-Altman plots (Figs 3–8) showed that for the comparisons of the Smedley dynamometer with all other devices, the difference increased at higher magnitudes of mean grip strength (Figs 4, 6 and 8).

Fig 3. Bland Altman plots of grip strength (Jamar Plus+–Nottingham).

Plot of the difference in maximum grip strength (kg) between devices by average maximum grip strength on both devices with 95% limits of agreement.

https://doi.org/10.1371/journal.pone.0289052.g003

Fig 4. Bland Altman plots of grip strength (Jamar Hydraulic–Smedley).

Plot of the difference in maximum grip strength (kg) between devices by average maximum grip strength on both devices with 95% limits of agreement.

https://doi.org/10.1371/journal.pone.0289052.g004

Fig 5. Bland Altman plots of grip strength (Jamar Plus+–Jamar Hydraulic).

Plot of the difference in maximum grip strength (kg) between devices by average maximum grip strength on both devices with 95% limits of agreement.

https://doi.org/10.1371/journal.pone.0289052.g005

Fig 6. Bland Altman plots of grip strength (Jamar Plus+–Smedley).

Plot of the difference in maximum grip strength (kg) between devices by average maximum grip strength on both devices with 95% limits of agreement.

https://doi.org/10.1371/journal.pone.0289052.g006

Fig 7. Bland Altman plots of grip strength (Nottingham–Jamar Hydraulic).

Plot of the difference in maximum grip strength (kg) between devices by average maximum grip strength on both devices with 95% limits of agreement.

https://doi.org/10.1371/journal.pone.0289052.g007

Fig 8. Bland Altman plots of grip strength (Nottingham–Smedley).

Plot of the difference in maximum grip strength (kg) between devices by average maximum grip strength on both devices with 95% limits of agreement.

https://doi.org/10.1371/journal.pone.0289052.g008

Lung function

Twelve participants had missing lung function measures and just under a third (n = 32 for FEV1 and n = 39 for FVC) of the remaining participants were excluded because there were no readings of a sufficiently high quality. There was no evidence of a difference in mean FEV1 between devices (difference = 0.00 litres (95% CI:-0.03,0.03, p = 0.9)) but there was evidence of a difference in FVC (-0.47 litres (95% CI:-0.53,-0.42, p<0.001)) with the ndd Easy on-PC measuring higher than the Micro Medical (Table 5). The Bland-Altman plots suggested that for FEV1, the difference between the two devices was approximately constant as measurements increased and close to zero (Fig 9) with reasonably narrow limits of agreement (-0.25 and 0.25 litres). The plot for FVC suggested that the difference between devices remained constant as values of FVC increased (Fig 10) but the limits of agreement were wider (-0.92 and -0.03).

Fig 9. Bland Altman plot for FEV1.

Plot of the difference in mean maximum FEV1 between the Micro Medical and ndd Easy on-PC ultrasonic flow-sensor by average maximum FEV1 on both devices with 95% limits of agreement.

https://doi.org/10.1371/journal.pone.0289052.g009

Fig 10. Bland Altman plot for FVC.

Plot of the difference in mean maximum FVC between the Micro Medical and ndd Easy on-PC ultrasonic flow-sensor by average maximum FVC on both devices with 95% limits of agreement.

https://doi.org/10.1371/journal.pone.0289052.g010

Sensitivity analyses

When we repeated the analyses having excluded measurements where the devices were administered in the incorrect order (n = 8), removed outliers (n = 3), included the lung function readings that did not meet ATS/ERS criteria (n = 32 for FEV1 and n = 39 for FVC), and used alternative definitions of outcomes, there were only small changes in the estimated differences between devices such that the conclusions were unaltered (S4 Table). The only differences found were a small number of additional order effects (S5 Table), but these had no impact on the findings when order of device was controlled for through multilevel analysis. Indeed, when the data were reanalysed using multilevel models, the estimates of differences between devices showed only marginal changes, though the standard errors were reduced (S6–S8 Tables).

Discussion

In a randomised cross-over study of 118 adults aged 45–74 years, we found evidence of differences in measurement of blood pressure, grip strength and lung function when assessed using different devices. For blood pressure, the newer Omron HEM-907 measured higher than the older Omron 705-CP with wide limits of agreement. For grip strength, the two electronic dynamometers recorded measurements on average 4-5kg higher than either the hydraulic or the spring-gauge dynamometer, but there were only small mean differences when comparing the two electronic dynamometers or the hydraulic and spring-gauge dynamometers. However, limits of agreement were wide for all comparisons. For lung function, the ndd Easy on-PC measures of FVC were an average of 0.47 litres higher than those for the Micro Medical, but there was no difference between measures of FEV1 and the limits of agreement were reasonably narrow.

We are aware of only a few studies that have compared combinations of dynamometers previously. For example, King [21] compared the Jamar Hydraulic with the Jamar Plus+ dynamometer and, in contrast to our findings, reported that the electronic dynamometer had consistently lower readings than the hydraulic device and narrower limits of agreement. However, the study population was younger, with an average age of 32 years, comprising a convenience sample of 40 men and women, and may have had better function than our older sample, which could influence comparability across machines. Another study reported a difference of 3.2kg (limits of agreement -6.3 to 12.6) when comparing the Smedley dynamometer and the Jamar Hydraulic dynamometer, which contrasts with our finding of a smaller mean difference (0.2kg) but wider limits of agreement (-10.8 to 11.3) [22]. However, this other study was carried out in an older, smaller sample of 55 participants aged 65–99 years recruited from a retirement home and social day care centre. Another study [23] found that the Smedley dynamometer measured lower than the Jamar+ Digital, similar to our study, although in this other study there were other potentially important variations in measurement protocol: measures using the Smedley device were undertaken in a standing position and those using the Jamar device were undertaken seated. Our findings provide some reassurance that there is a lack of bias in measurement between specific device combinations (i.e. the Jamar Plus+ and Nottingham electronic; the Jamar Hydraulic and Smedley), although the limits of agreement suggest that the variation can still be substantial.

We have not identified a comparison of Micro Medical or other turbine spirometers with the ndd Easy on-PC spirometer. However, in a study of 35 volunteers, the Micro Medical turbine spirometer, used in our study, gave lower readings compared with the Vitalograph Micro pneumotachograph spirometer [13], for both FEV1 (mean difference of 0.24l) and FVC (0.34l). Another study of 49 volunteers found that the handheld ndd Easy on-PC spirometer produced systematically lower values than a pneumotachograph spirometer (Masterscreen) [25], for both FEV1 (mean difference of 0.24l) and for FVC (0.37l).

For lung function, the accuracy of measurement relies primarily on optimal coaching: a maximally deep breath, a rapid blast and appropriate encouragement as well as a full seal around the mouthpiece and correct body posture [6]. The ndd Easy on-PC spirometer presents visualisation of the volume-time graph in real time, meaning that the participant can be encouraged to blow until the curve has reached a plateau, that is, when the true FVC has been achieved. In the absence of this visual display the forced manoeuvre may be terminated prematurely, and the FVC underestimated. We propose that this is the most likely explanation for the substantially higher FVC values obtained using the ndd Easy on-PC device than the Micro Medical device in our study, while there was no difference for FEV1. For FEV1 the mean difference between the 2 spirometers was zero and is, therefore, within the 150ml ATS/ERS criteria for replication of measurement. In addition, the limits of agreement did not exceed the 350ml criterion set in previous spirometry studies [27]. Whether using a group correction for FVC is valid, however, remains debatable: in the SAPALDIA study, a group correction from a quasi-experimental study was found not to be adequate, and an approach using spirometer-specific reference equations from longitudinal measurements to describe individualised correction terms was preferred [12].

In considering the potential clinical significance of the differences between devices, we have referred to published normative or predicted values of blood pressure, grip strength and lung function [3,39,40]. Based on analysis of age-related differences in mean blood pressure in the Health Survey for England 2016, the mean differences in SBP and DBP between devices that we observed are equivalent to an age difference of approximately five years, although the possible non-linearity of change with age in diastolic blood pressure across the age range of interest [41] makes that comparison more difficult. Further, the within-person standard deviation for systolic blood pressure is larger than the mean difference between devices. For grip strength [3], the observed 4-5kg difference is equivalent to an age difference of approximately 5 years among men and 10 years among women aged 65 years and above. For lung function, based on the National Health and Nutrition Examination Survey (NHANES) III data [42], predicted values for five-year age-groups (with male height of 175cm and female height of 160cm) show that a difference of 0.47l in FVC is equivalent to an age difference of around 15 years, between 45–75 years. Therefore, together with the wide limits of agreement and good measurement reliability for each device, the differences that we observed between devices are likely to have important practical implications for both grip strength and lung function. For example, the differences between dynamometers may result in discrepancies in clinical diagnoses which use cut-points when identifying an individual as sarcopenic [43]. Similarly, the difference in FVC, but not FEV1, between machines will have implications for defining participants with COPD based on the ratio FEV1/FVC.
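
To make the last point concrete, the short Python sketch below shows how a shift in FVC alone, with FEV1 unchanged, can move a participant across a ratio cut-point. The 0.47l difference is the FVC difference discussed above. The FEV1 and FVC values are hypothetical, and the fixed cut-point of 0.70 is an assumption chosen for illustration, since the definition of COPD used in any given study may differ.

```python
def fev1_fvc_ratio(fev1_litres, fvc_litres):
    """Return the FEV1/FVC ratio for one participant."""
    if fvc_litres <= 0:
        raise ValueError("FVC must be positive")
    return fev1_litres / fvc_litres


# Assumption for illustration: a fixed-ratio cut-point of 0.70.
RATIO_CUT_POINT = 0.70

# FVC difference between devices discussed in the text (litres).
FVC_DEVICE_DIFFERENCE = 0.47

# Hypothetical participant (illustrative only, not study data).
fev1 = 2.80              # identical on both devices
fvc_lower_device = 3.90  # device giving the lower FVC reading
fvc_higher_device = fvc_lower_device + FVC_DEVICE_DIFFERENCE

ratio_lower_device = fev1_fvc_ratio(fev1, fvc_lower_device)
ratio_higher_device = fev1_fvc_ratio(fev1, fvc_higher_device)

below_cut_point_lower_device = ratio_lower_device < RATIO_CUT_POINT
below_cut_point_higher_device = ratio_higher_device < RATIO_CUT_POINT

print(f"ratio with lower FVC reading:  {ratio_lower_device:.3f}")
print(f"ratio with higher FVC reading: {ratio_higher_device:.3f}")
print(f"classification differs: "
      f"{below_cut_point_lower_device != below_cut_point_higher_device}")
```

In this hypothetical case the same participant falls above the assumed cut-point on one device and below it on the other, although FEV1 is identical.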

Maintaining consistency in the make and model of device used in studies reduces the likelihood of measurement differences, but is not always realistic given that equipment becomes obsolete and new technology can improve measurement, for example through automation (as is the case with the Omron 907), the transition from analogue to digital (as is the case with the transition from the Jamar hydraulic to Jamar Plus+ devices) or the introduction of visual encouragement and specific feedback (as provided by the ndd Easy on-PC). An important implication of our findings is, therefore, that researchers would be well advised to include simple experiments to assess machine comparability when a new device is introduced into a study. Conducting external comparison studies, such as ours, would also help interpretation for both within-study and between-study comparisons. In addition, the differences between devices need to be considered in the context of the reliability of measurements for each device being compared. Our analysis showed good reliability of measurements, particularly for the dynamometers and spirometers, suggesting the differences observed are important. The ATS/ERS quality control for lung function ensures excellent reliability, but does result in exclusion of those who cannot meet the criteria.

A key strength of this study design was that it used the same standardised measurement protocols for all devices. This is important because, for all three functional measures, the type of device is only one of several factors which can affect measurements unless these other factors are kept constant, as they were in our study. Blood pressure is affected by multiple factors [10] including the participant talking, actively listening, being exposed to cold, ingesting alcohol, having a distended bladder and recent smoking [44], and also by aspects of the measurement protocol such as arm position and cuff size [45]. For grip strength, the values and precision of measurements have also been shown to be influenced by a range of factors [30,37] including whether allowance is made for hand size and hand-dominance [46], dynamometer handle shape [47], position of the elbow [48] and wrist during testing [49], setting of the dynamometer [50,51], effort and encouragement, frequency of testing, time of day and training of the assessor [30,51]. The study also included a relatively large sample size, based on an a priori sample size calculation, compared with other similar studies, and implemented a randomised design. While confidence in the results rests primarily on this randomised design [29], the fact that participants were drawn from a large database of members of the public, who had been involved in previous market research and consented to be re-contacted, suggests they may be more representative of the general population than the small-scale volunteer samples used in many previous studies. We also acknowledge the limitations of the study. The study findings cannot be generalised beyond the parameters of the research design; for example, results might differ for those outside the sampled age range (i.e., 45 to 74), and while the trial compared devices most commonly used in UK population-based studies, no comment can be made about device combinations which were not included [15].
While standardising the measurement protocols was an important aspect of the research design, it meant deviating from the protocol for the Smedley dynamometer (normally assessed standing rather than sitting) and so may limit the applicability of the findings for this device [30]. Furthermore, in the primary analyses of lung function, a number of participants were excluded due to missing or low-quality readings, particularly on the ndd Easy on-PC, thus reducing the sample size and power of these analyses. Nevertheless, sensitivity analyses using all available readings, irrespective of quality, suggested that this did not have a big impact on findings. Indeed, sensitivity analyses considering outliers, incorrectly ordered tests and alternative coding of measures, all showed that our results were robust. Assessor may be a source of variation in our study which we have not accounted for, although this variation was minimised by the consistent training and protocol, and is not likely to have had a substantial impact on differences between devices since this was a within-person comparison and the same researcher assessed the same person on all machines.

In conclusion, this randomised cross-over study showed measurement differences between devices commonly used to assess blood pressure, grip strength and lung function which researchers should be aware of when carrying out comparative research between studies and within studies over time.

Supporting information

S1 Table. Sample size by age group and sex.

https://doi.org/10.1371/journal.pone.0289052.s001

(DOCX)

S2 Table. Reliability of the devices included in the study.

https://doi.org/10.1371/journal.pone.0289052.s002

(DOCX)

S3 Table. Assessment of order effects for all measures.

https://doi.org/10.1371/journal.pone.0289052.s003

(DOCX)

S4 Table. Sensitivity analysis for differences in mean and limits of agreement for all measures.

https://doi.org/10.1371/journal.pone.0289052.s004

(DOCX)

S5 Table. Sensitivity analysis of order effects for all measures.

https://doi.org/10.1371/journal.pone.0289052.s005

(DOCX)

S6 Table. Sensitivity analysis using multilevel models for blood pressure.

https://doi.org/10.1371/journal.pone.0289052.s006

(DOCX)

S7 Table. Sensitivity analysis using multilevel models for grip strength.

https://doi.org/10.1371/journal.pone.0289052.s007

(DOCX)

S8 Table. Sensitivity analysis using multilevel models for lung function.

https://doi.org/10.1371/journal.pone.0289052.s008

(DOCX)

S1 Fig. Histograms of mean differences in SBP (mmHg), DBP (mmHg), grip strength (kg) and lung function (FEV1 and FVC, litres) for all device combinations.

https://doi.org/10.1371/journal.pone.0289052.s009

(DOCX)

Acknowledgments

We thank the study participants, Kantar who provided access to the study sample, George Kyriakopoulos, now at BP (British Petroleum), for providing study advice, and Shaun Scholes, at UCL, for assistance with the analysis of Health Survey for England data referred to in the discussion.

References

  1. Kuh D, Cooper R, Hardy R, Richards M, Ben-Shlomo Y. A life course approach to healthy ageing. Oxford: Oxford University Press; 2014.
  2. Lara J, Cooper R, Nissan J, Ginty AT, Khaw K-T, Deary IJ, et al. A proposed panel of biomarkers of healthy ageing. BMC Medicine. 2015;13(1):222. pmid:26373927
  3. Dodds RM, Syddall HE, Cooper R, Benzeval M, Deary IJ, Dennison EM, et al. Grip strength across the life course: normative data from twelve British studies. PLoS One. 2014;9(12):e113637. pmid:25474696
  4. Wills AK, Lawlor DA, Matthews FE, Sayer AA, Bakra E, Ben-Shlomo Y, et al. Life course trajectories of systolic blood pressure using longitudinal data from eight UK cohorts. PLoS Med. 2011;8(6):e1000440. pmid:21695075
  5. Standardization of methods of measuring the arterial blood pressure. A joint report of the committees appointed by the Cardiac Society of Great Britain and Ireland and the American Heart Association. 1939;1(3):261–7.
  6. Miller MR, Hankinson J, Brusasco V, Burgos F, Casaburi R, Coates A, et al. Standardisation of spirometry. European Respiratory Journal. 2005;26(2):319–38. pmid:16055882
  7. Reuben DB, Magasi S, McCreath HE, Bohannon RW, Wang YC, Bubela DJ, et al. Motor assessment using the NIH Toolbox. Neurology. 2013;80(11 Suppl 3):S65–75. pmid:23479547
  8. Goisis A, Brown M, Kumari M, Sullivan A. Overview of bio measures in longitudinal and life course research. 2014.
  9. Tolonen H, Koponen P, Naska A, Männistö S, Broda G, Palosaari T, et al. Challenges in standardization of blood pressure measurement at the population level. BMC Medical Research Methodology. 2015;15(1):33. pmid:25880766
  10. Jones DW, Appel LJ, Sheps SG, Roccella EJ, Lenfant C. Measuring blood pressure accurately: new and persistent challenges. JAMA. 2003;289(8):1027–30. pmid:12597757
  11. Mohandas A, Foley KA. Medical devices: adapting to the comparative effectiveness landscape. Biotechnol Healthc. 2010;7(2):25–8. pmid:22478818
  12. Bridevaux PO, Dupuis-Lozeron E, Schindler C, Keidel D, Gerbase MW, Probst-Hensch NM, et al. Spirometer Replacement and Serial Lung Function Measurements in Population Studies: Results From the SAPALDIA Study. Am J Epidemiol. 2015;181(10):752–61. pmid:25816817
  13. Orfei L, Strachan DP, Rudnicka AR, Wadsworth M. Early influences on adult lung function in two national British cohorts. Archives of disease in childhood. 2008;93(7):570–4. pmid:17626144
  14. Craig R, Mindell J. Health Survey for England 2010. Respiratory health. 2011.
  15. McFall S, Petersen J, Kaminska O, Lynn P. Understanding Society Waves 2 and 3 Nurse Health Assessment, 2010–2012. Guide to Nurse Health Assessment ISER, University of Essex. 2014.
  16. Stang A, Moebus S, Mohlenkamp S, Dragano N, Schmermund A, Beck EM, et al. Algorithms for converting random-zero to automated oscillometric blood pressure values, and vice versa. Am J Epidemiol. 2006;164(1):85–94. pmid:16675536
  17. Campbell NRC, McKay DW. Accurate blood pressure measurement. Why does it matter? 1999;161(3):277–8.
  18. Wan Y, Heneghan C, Stevens R, McManus RJ, Ward A, Perera R, et al. Determining which automatic digital blood pressure device performs adequately: a systematic review. J Hum Hypertens. 2010;24(7):431–8. pmid:20376077
  19. Skirton H, Chamberlain W, Lawson C, Ryan H, Young E. A systematic review of variability and reliability of manual and automated blood pressure readings. J Clin Nurs. 2011;20(5–6):602–14. pmid:21320189
  20. Bolling K. The Dinamap 8100 calibration study: HM Stationery Office; 1994.
  21. King TI, 2nd. Interinstrument reliability of the Jamar electronic dynamometer and pinch gauge compared with the Jamar hydraulic dynamometer and B&L Engineering mechanical pinch gauge. Am J Occup Ther. 2013;67(4):480–3.
  22. Guerra RS, Amaral TF. Comparison of hand dynamometers in elderly people. J Nutr Health Aging. 2009;13(10):907–12. pmid:19924352
  23. Kim M, Shinkai S. Prevalence of muscle weakness based on different diagnostic criteria in community-dwelling older adults: A comparison of grip strength dynamometers. Geriatr Gerontol Int. 2017;17(11):2089–95. pmid:28517036
  24. Svens B, Lee H. Intra- and inter-instrument reliability of Grip-Strength Measurements: GripTrack™ and Jamar® hand dynamometers. The British Journal of Hand Therapy. 2005;10(2):47–55.
  25. Milanzi EB, Koppelman GH, Oldenwening M, Augustijn S, Aalders-de Ruijter B, Farenhorst M, et al. Considerations in the use of different spirometers in epidemiological studies. Environ Health. 2019;18(1):39. pmid:31023382
  26. Hosie HE, Nimmo WS. Measurement of FEV1 and FVC. Comparison of a pocket spirometer with the Vitalograph. Anaesthesia. 1988;43(3):233–8.
  27. Gerbase MW, Dupuis-Lozeron E, Schindler C, Keidel D, Bridevaux PO, Kriemler S, et al. Agreement between spirometers: a challenge in the follow-up of patients and populations? Respiration. 2013;85(6):505–14. pmid:23485575
  28. O’Neill D, Benzeval M, Boyd A, Calderwood L, Cooper C, Corti L, et al. Data resource profile: cohort and longitudinal studies enhancement resources (CLOSER). International journal of epidemiology. 2019;48(3):675–6i. pmid:30789213
  29. Schulz KF, Altman DG, Moher D, Group C. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. Trials. 2010;11(1):32.
  30. Roberts HC, Denison HJ, Martin HJ, Patel HP, Syddall H, Cooper C, et al. A review of the measurement of grip strength in clinical and epidemiological studies: towards a standardised approach. Age and ageing. 2011;40(4):423–9. pmid:21624928
  31. Rabe-Hesketh S, Skrondal A. Multilevel and longitudinal modeling using Stata: STATA press; 2008.
  32. Bland JM, Altman D. Statistical methods for assessing agreement between two methods of clinical measurement. The lancet. 1986;327(8476):307–10. pmid:2868172
  33. Chhapola V, Kanwal SK, Brar R. Reporting standards for Bland–Altman agreement analysis in laboratory research: a cross-sectional survey of current practice. Annals of Clinical Biochemistry. 2015;52(3):382–6. pmid:25214637
  34. Giavarina D. Understanding bland altman analysis. Biochemia medica. 2015;25(2):141–51. pmid:26110027
  35. Chobanian AV, Bakris GL, Black HR, Cushman WC, Green LA, Izzo JL Jr, et al. Seventh report of the joint national committee on prevention, detection, evaluation, and treatment of high blood pressure. Hypertension. 2003;42(6):1206–52. pmid:14656957
  36. Hardy R, Wadsworth ME, Langenberg C, Kuh D. Birthweight, childhood growth, and blood pressure at 43 years in a British birth cohort. International Journal of Epidemiology. 2004;33(1):121–9. pmid:15075157
  37. Sousa-Santos A, Amaral T. Differences in handgrip strength protocols to identify sarcopenia and frailty-a systematic review. BMC geriatrics. 2017;17(1):238. pmid:29037155
  38. Mathiowetz V, Weber K, Volland G, Kashman N. Reliability and validity of grip and pinch strength evaluations. Journal of Hand Surgery. 1984;9(2):222–6. pmid:6715829
  39. Scholes S, Neave A. Health Survey for England 2016: Physical activity in adults. Leeds: Health and Social Care Information Centre. 2017.
  40. NHANES normative values. Available from: https://vitalograph.com/resources/nhanes-normal-values.
  41. Mutz J, Lewis CM. Lifetime depression and age-related changes in body composition, cardiovascular function, grip strength and lung function: sex-specific analyses in the UK Biobank. Aging (Albany NY). 2021; 3:17038–17079. pmid:34233295
  42. Thomas ET, Guppy M, Straus SE, Bell KJ, Glasziou P. Rate of normal lung function decline in ageing adults: a systematic review of prospective cohort studies. BMJ open. 2019;9(6):e028150. pmid:31248928
  43. Cooper R, Lessof C, Wong A, Hardy R. The impact of variation in the device used to measure grip strength on the identification of low muscle strength: Findings from a randomised cross-over study. Journal of Frailty, Sarcopenia and Falls. 2021;6(4):225–230. pmid:34950813
  44. Handler J. The importance of accurate blood pressure measurement. The Permanente Journal. 2009;13(3):51. pmid:20740091
  45. Bilo G, Sala O, Perego C, Faini A, Gao L, Głuszewska A, et al. Impact of cuff positioning on blood pressure measurement accuracy: may a specially designed cuff make a difference? Hypertens Res. 2017;40(6):573–80. pmid:28077860
  46. Incel NA, Ceceli E, Durukan PB, Erdem HR, Yorgancioglu ZR. Grip strength: effect of hand dominance. Singapore medical journal. 2002;43(5):234–7. pmid:12188074
  47. Amaral JF, Mancini M, Novo Júnior JM. Comparison of three hand dynamometers in relation to the accuracy and precision of the measurements. Brazilian Journal of Physical Therapy. 2012;16(3):216–24. pmid:22801514
  48. Balogun JA, Akomolafe CT, Amusa LO. Grip strength: effects of testing posture and elbow position. Archives of physical medicine and rehabilitation. 1991;72(5):280–3. pmid:2009042
  49. O’Driscoll SW, Horii E, Ness R, Cahalan TD, Richards RR, An K-N. The relationship between wrist position, grasp size, and grip strength. The Journal of hand surgery. 1992;17(1):169–77. pmid:1538102
  50. Firrell JC, Crain GM. Which setting of the dynamometer provides maximal grip strength? The Journal of hand surgery. 1996;21(3):397–401. pmid:8724468
  51. Fess E. Clinical assessment recommendations. American society of hand therapists. 1981:6–8.