Factorial validity and invariance of the Patient Health Questionnaire (PHQ)-9 among clinical and non-clinical populations

  • Satomi Doi,

    Roles Formal analysis, Writing – original draft

    doi.hlth@tmd.ac.jp

    Affiliations Department of Global Health Promotion, Tokyo Medical and Dental University, Tokyo, Japan, National Center of Neurology and Psychiatry, National Center for Cognitive Behavior Therapy and Research, Tokyo, Japan

  • Masaya Ito,

    Roles Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing – review & editing

    Affiliation National Center of Neurology and Psychiatry, National Center for Cognitive Behavior Therapy and Research, Tokyo, Japan

  • Yoshitake Takebayashi,

    Roles Formal analysis, Methodology, Writing – review & editing

    Affiliation National Center of Neurology and Psychiatry, National Center for Cognitive Behavior Therapy and Research, Tokyo, Japan

  • Kumiko Muramatsu,

    Roles Methodology, Supervision

    Affiliation Graduate School of Clinical Psychology, Niigata Seiryo University, Niigata, Japan

  • Masaru Horikoshi

    Roles Supervision

    Affiliation National Center of Neurology and Psychiatry, National Center for Cognitive Behavior Therapy and Research, Tokyo, Japan

Abstract

The Patient Health Questionnaire-9 (PHQ-9) is commonly used to screen for depressive disorder and for monitoring depressive symptoms. However, there are mixed findings regarding its factor structure (i.e., whether it has a unidimensional, two-dimensional, or bi-factor structure). Furthermore, its measurement invariance between non-clinical and clinical populations and that between patients with major depressive disorder (MDD) and MDD with comorbid anxiety disorder (AD) is unknown. Japanese adults with MDD (n = 406), MDD with AD (n = 636), and no psychiatric disorders (non-clinical population; n = 1,163) answered this questionnaire on the Internet. Confirmatory factor analyses showed that the bi-factor model had a better fit than the unidimensional and two-dimensional factor models did. The results of a multi-group confirmatory factor analysis indicated scalar invariance between the non-clinical and only MDD groups, and that between the only MDD and MDD with AD groups. In conclusion, the bi-factor model with two specific factors was supported among the non-clinical, only MDD, and MDD with AD groups. The scalar measurement invariance model was supported between the groups, which indicated the total or sub-scale scores were comparable between groups.

Introduction

Depression is an exceedingly common comorbid condition in several mental disorders. To date, numerous self-report measures of depression have been developed, many of which are commonly used in clinical practice and research [1]. In particular, the Patient Health Questionnaire (PHQ) is one of the most useful measures for monitoring depressive symptoms and for screening for major depressive disorder (MDD) [2]. The 9-item version of the PHQ (PHQ-9) [3] is a brief and simple self-administered measure that corresponds with the nine diagnostic criteria for depressive disorder of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV). The PHQ-9 has been found to have high reliability and validity in Western populations, and can be used as a one- or two-item measure. The guidelines of the National Institute for Health and Clinical Excellence recommend the PHQ-9 for assessing the severity of depressive symptoms in clinical practice [4].

Despite its numerous advantages, the PHQ-9 has at least two main issues stemming from a lack of previous research, namely, its unclear factor structure and measurement invariance. First, there are mixed findings regarding the factor structure of the PHQ-9 in Western populations. Some studies using clinical samples (both in primary care and psychiatric settings) have determined that the PHQ-9 has a unidimensional factor structure [5]. In contrast, other studies on psychiatric patients or individuals with physical illness and depression have determined that a two-dimensional factor structure (including somatic and cognitive/affective symptom factors) had a better fit [6, 7, 8]. However, no studies have yet examined a possible bi-factor model, which represents the existence of both a general factor and specific sub-factors called group factors [9, 10]. For example, similar mixed results regarding the factorial unidimensionality of the Symptom Checklist-90-Revised have been resolved by testing a bi-factor model [11]. Hence, we hypothesized that the bi-factor model would show a better fit than the unidimensional or two-dimensional model.

Second, previous studies have not examined the measurement invariance [12] of the PHQ-9 between non-clinical and clinical populations. Additionally, no studies have yet examined whether the factor structure of the PHQ-9 can be assumed to be equivalent between patients with only MDD and those with MDD and comorbid anxiety disorder (AD). Given the high co-occurrence rate of depression and anxiety [13], it is necessary to examine the factor structure of the PHQ-9 between patients with only MDD and those with MDD who have comorbid AD.

In the present study, using an existing large dataset of non-clinical and clinical populations in Japan, we (1) compared the fit of the unidimensional, two-dimensional, and bi-factor models of the PHQ-9 via a confirmatory factor analysis; and (2) examined the measurement invariance between non-clinical, only MDD, and MDD with AD groups using a multi-group confirmatory factor analysis.

Methods

Participants and procedure

This study was part of a larger web-based survey examining the emotions and psychopathology of Japanese clinical and non-clinical populations [14, 15]. We recruited participants from panelists registered with Macromill Incorporation. This company is one of the largest Japanese internet marketing research companies and has been used in previous studies [16]. A total of 2,830 individuals (1,547 females, 1,283 males; mean age = 42.44 years; SD = 10.39 years; range = 19−79 years) were selected randomly according to their age, gender, and living area from each population, including 619 individuals with MDD, 576 with social anxiety disorder, 619 with panic disorder, 645 with obsessive-compulsive disorder, and 371 without any psychiatric disorder (i.e., non-clinical population). The participants self-reported their own diagnoses by answering items regarding their current diagnoses and treatment of mental disorders, such as the following: “Are you currently diagnosed as having Major Depressive Disorder and being treated for the problem in a medical setting?” Similarly, they were asked to respond “yes” or “no” to questions about their own diagnoses of social anxiety disorder, panic disorder, and obsessive-compulsive disorder. This study was approved by the institutional review board of the National Center of Neurology and Psychiatry (approval number: A2013-002).

Measurements

Japanese version of the Patient Health Questionnaire-9 (J-PHQ-9).

The J-PHQ-9 assesses the frequency with which the nine symptoms of depression occurred over the last two weeks [17]. The participants rate each of the nine items on a scale ranging from 0 (not at all) to 3 (nearly every day). The reliability of the English version of the PHQ-9 is excellent, as evidenced by previous reports of an internal reliability (Cronbach’s α) of .86 to .91 [3, 5] and of test-retest reliability [3]. The construct validity of the English version of the PHQ-9 has been confirmed by previous studies, which reported that increasing PHQ-9 scores were associated with worsening function [3, 18], increasing depression assessed using other measures [6, 18], increasing anxiety [6], and decreasing psychological well-being [6]. Additionally, in the present study, the internal reliability of the J-PHQ-9 was excellent, as evidenced by Cronbach’s α values of .93, .84, and .91 for the total score, somatic score, and cognitive/affective score, respectively. The J-PHQ-9 also had good convergent validity, as it was associated with the Japanese versions of the Kessler Psychological Distress Scale (K6) [19] (r = .81) and the Center for Epidemiologic Studies Depression Scale [20] (r = .86). In this study, we used the sum of the item scores of this scale for our analyses.
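The scoring and reliability computations described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. The item-to-subscale assignment (`SOMATIC_ITEMS`, `COGNITIVE_AFFECTIVE_ITEMS`) is an assumption made only for this example, because the text does not list which items belong to each factor; the function names are likewise hypothetical.

```python
from statistics import variance

# ASSUMPTION: illustrative item-to-subscale assignment (1-based item numbers).
# The text does not state which items form each subscale.
SOMATIC_ITEMS = (3, 4, 5)
COGNITIVE_AFFECTIVE_ITEMS = (1, 2, 6, 7, 8, 9)


def score_phq9(responses):
    """Return total and subscale sum scores for one respondent.

    responses: nine integers, each from 0 (not at all) to 3 (nearly every day).
    """
    if len(responses) != 9 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected nine item scores, each in 0..3")

    def pick(items):
        return sum(responses[i - 1] for i in items)

    return {
        "total": sum(responses),
        "somatic": pick(SOMATIC_ITEMS),
        "cognitive_affective": pick(COGNITIVE_AFFECTIVE_ITEMS),
    }


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents (each a list of item scores)."""
    k = len(rows[0])
    item_variances = sum(variance(column) for column in zip(*rows))
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)
```

For example, `score_phq9([1, 1, 2, 2, 1, 0, 0, 0, 0])` yields a total score of 7 under any item assignment; the subscale values depend on the assumed assignment.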

Statistical analysis

First, we conducted a confirmatory factor analysis of the PHQ-9 using the data collected from the entire sample (n = 2,205). In this analysis, we determined and compared the fit of the three above-stated factor models to the data using the full information maximum likelihood method. In the unidimensional factor model, all items loaded onto a single factor (Fig 1) [21]. In the two-dimensional factor model, each item loaded onto one of two latent factors, somatic or cognitive/affective symptoms (Fig 2) [8]. Finally, in the bi-factor model, we designated somatic and cognitive/affective symptoms as specific group factors, and the sum of the item scores as the general factor (Fig 3).
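As a rough check on how the three models differ in complexity, the sketch below counts free parameters and degrees of freedom for nine items. It assumes a conventional identification that the text does not spell out: factor variances fixed to 1, one residual variance per item, a saturated mean structure, a single covariance between the two factors in the two-dimensional model, and mutually orthogonal factors in the bi-factor model. Under these assumptions the counts come out to 27, 26, and 18, which agree with the degrees of freedom reported with the chi-square statistics in the Results.

```python
def model_degrees_of_freedom(n_items, n_loadings, n_factor_covariances):
    """Degrees of freedom for a CFA model on the covariance structure.

    ASSUMPTIONS (not stated in the text): factor variances are fixed to 1,
    each item has one free residual variance, and item means are saturated.
    """
    n_moments = n_items * (n_items + 1) // 2          # unique variances/covariances
    n_free = n_loadings + n_items + n_factor_covariances
    return n_moments - n_free


N_ITEMS = 9

MODELS = {
    # every item loads on one factor
    "unidimensional": {"n_loadings": 9, "n_factor_covariances": 0},
    # every item loads on exactly one of two correlated factors
    "two_dimensional": {"n_loadings": 9, "n_factor_covariances": 1},
    # every item loads on the general factor and on one orthogonal group factor
    "bi_factor": {"n_loadings": 18, "n_factor_covariances": 0},
}

DEGREES_OF_FREEDOM = {
    name: model_degrees_of_freedom(N_ITEMS, **spec) for name, spec in MODELS.items()
}

if __name__ == "__main__":
    for name, df in DEGREES_OF_FREEDOM.items():
        print(f"{name}: df = {df}")
```

The bi-factor model spends nine additional loadings relative to the unidimensional model, which is why fit comparisons between them should account for model complexity.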

Second, to examine the measurement invariance across non-clinical, only MDD, and MDD with AD populations, we conducted a multi-group confirmatory factor analysis [22]. We examined the measurement invariance of the PHQ-9 scores between the non-clinical and only MDD groups, and between the only MDD and MDD with AD groups. We constructed the following five increasingly restrictive models: where all parameters were free (Model 1: configural invariance); where loadings were invariant (Model 2: metric invariance); where loadings and intercepts were invariant (Model 3: scalar invariance); where loadings, intercepts, and residuals were invariant (Model 4: error variance invariance); and where loadings, intercepts, residuals, and factor means were invariant (Model 5: factor variance invariance). We used the following fit indices to evaluate the models: chi-square, root mean square error of approximation (RMSEA), Akaike information criterion (AIC), Bayesian information criterion (BIC), comparative fit index (CFI), and standardized root mean square residual (SRMR). Goodness-of-fit indices were examined in light of the following standards used in past literature [23]: the chi-square test (χ2) should not be significant; the RMSEA should be < .10 for acceptable fit and < .06 for good fit; the CFI should be ≥ .90 for acceptable fit and > .95 for good fit; and the SRMR should be < .10 for acceptable fit and < .08 for good fit. The following criterion was used for adopting a model: a difference of less than .01 in the ΔCFI index supports the less parameterized model [24].
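The cut-offs and the ΔCFI rule above can be expressed as a small decision procedure. The sketch below is illustrative only: the function names are hypothetical, and the rule is applied to adjacent models in the sequence, which is one common reading of the criterion and an assumption here. Any CFI sequence passed to `select_invariance_model` in an example is made up unless it is taken from Table 3.

```python
def _grade_lower_is_better(value, good_below, acceptable_below):
    """Grade an index for which smaller values indicate better fit."""
    if value < good_below:
        return "good"
    if value < acceptable_below:
        return "acceptable"
    return "poor"


def _grade_cfi(cfi):
    """Grade the CFI: > .95 good, >= .90 acceptable, otherwise poor."""
    if cfi > 0.95:
        return "good"
    if cfi >= 0.90:
        return "acceptable"
    return "poor"


def classify_fit(rmsea, cfi, srmr):
    """Apply the cut-offs stated in the text to three fit indices."""
    return {
        "rmsea": _grade_lower_is_better(rmsea, 0.06, 0.10),
        "cfi": _grade_cfi(cfi),
        "srmr": _grade_lower_is_better(srmr, 0.08, 0.10),
    }


def select_invariance_model(cfis, threshold=0.01):
    """Return the 1-based number of the most restrictive supported model.

    cfis: CFI values ordered from Model 1 (configural) to the most
    restrictive model. ASSUMPTION: each model is compared with the one
    immediately before it, and the search stops at the first drop in CFI
    that is not smaller than the threshold.
    """
    selected = 0
    for i in range(1, len(cfis)):
        if cfis[i - 1] - cfis[i] < threshold:
            selected = i
        else:
            break
    return selected + 1
```

With a purely hypothetical CFI sequence such as `[0.980, 0.978, 0.975, 0.950, 0.940]`, the rule stops at the third model, because the drop from the third to the fourth model is larger than .01.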

Results

Distribution of the PHQ-9 score

Mean values of the total, somatic, and cognitive/affective scores on the PHQ-9 were as follows: non-clinical group: 6.96 (Standard Deviation: SD = 6.46), 3.10 (SD = 2.62), and 3.86 (SD = 4.26); only MDD group: 12.42 (SD = 7.57), 5.16 (SD = 2.80), and 7.26 (SD = 5.27); MDD with AD group: 15.86 (SD = 7.20), 6.17 (SD = 2.58), and 9.69 (SD = 5.10).

Confirmatory factor analysis

We compared the fit indices of the three models (unidimensional, two-dimensional, and bi-factor models) using the entire sample. The fit indices of the unidimensional model (χ2(27) = 1171.93, p < 0.001; RMSEA = .122; CFI = .936; SRMR = .037) and two-dimensional model (χ2(26) = 745.14, p < 0.001; RMSEA = .098; CFI = .960; SRMR = .029) were poorer than those of the bi-factor model (χ2(18) = 373.05, p < 0.001; RMSEA = .083; CFI = .980; SRMR = .020) (Table 1). Therefore, we selected the bi-factor model for subsequent analyses. In addition, Table 1 shows the fit indices of the three models for the non-clinical, only MDD, and MDD with AD groups.

Table 1. The fit indices of the unidimensional, two-dimensional, and bi-factor models.

https://doi.org/10.1371/journal.pone.0199235.t001

Table 2 shows the standardized factor loadings for the bi-factor model using the entire sample. The general factor, the group cognitive/affective factor, and the group somatic factor accounted for 56.9%, 25.5%, and 17.6% of the common variance, respectively.

Table 2. Standardized factor loadings for the bi-factor model using the entire sample.

https://doi.org/10.1371/journal.pone.0199235.t002
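One common way to obtain shares of common variance such as those reported for the general and group factors is to divide each factor's sum of squared standardized loadings by the sum over all factors. Whether the authors used exactly this computation is an assumption; the sketch below only illustrates the idea, and `explained_common_variance` is a hypothetical helper name.

```python
def explained_common_variance(loadings):
    """Share of common variance attributable to each factor.

    loadings: dict mapping a factor name to its list of standardized
    loadings, one entry per item (0.0 where the item does not load).
    ASSUMPTION: factors are orthogonal, as in a standard bi-factor model.
    """
    sum_of_squares = {
        factor: sum(value * value for value in values)
        for factor, values in loadings.items()
    }
    total = sum(sum_of_squares.values())
    return {factor: ss / total for factor, ss in sum_of_squares.items()}
```

The shares always sum to 1, so a general factor share well above one half indicates that most of the common variance is carried by the general factor rather than by the group factors.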

Multi-group confirmatory factor analysis

First, we conducted a multi-group confirmatory factor analysis (of the bi-factor model) for the non-clinical and only MDD groups (Table 3). According to the criterion for adopting the model, Model 3 showed the best fit (scalar invariance), wherein the loadings and intercepts were invariant.

Table 3. Summary of goodness of fit statistics for tested models in multi-group analyses.

https://doi.org/10.1371/journal.pone.0199235.t003

Second, we conducted a multi-group confirmatory factor analysis for the only MDD and MDD with AD groups (Table 3). Similar to the findings pertaining to the non-clinical and only MDD groups, Model 3 showed the best fit (scalar invariance). Scalar invariance indicates that differences in the factor means lead to differences in the item means.
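The link between factor means and item means under scalar invariance can be written out for a simplified single-factor case. The notation below is introduced here for illustration and does not appear in the original text: x_{ig} is the response to item i in group g, τ_i the intercept, λ_i the loading, η_g the latent factor with mean κ_g, and ε_{ig} a zero-mean residual.

```latex
% Measurement model for item i in group g (single factor, for illustration)
x_{ig} = \tau_{ig} + \lambda_{ig}\,\eta_{g} + \varepsilon_{ig},
\qquad \mathbb{E}[\eta_{g}] = \kappa_{g},\quad \mathbb{E}[\varepsilon_{ig}] = 0

% Scalar invariance: equal loadings and intercepts across groups 1 and 2
\lambda_{i1} = \lambda_{i2} = \lambda_{i},
\qquad \tau_{i1} = \tau_{i2} = \tau_{i}

% Consequence: item mean differences are driven only by factor mean differences
\mathbb{E}[x_{i1}] - \mathbb{E}[x_{i2}]
  = (\tau_{i} + \lambda_{i}\kappa_{1}) - (\tau_{i} + \lambda_{i}\kappa_{2})
  = \lambda_{i}\,(\kappa_{1} - \kappa_{2})
```

If the intercepts differed between groups, the item mean difference would contain an extra term τ_{i1} − τ_{i2} unrelated to the latent factor, and observed scores would not be directly comparable.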

Discussion

In this study, we compared a bi-factor model of the PHQ-9 with unidimensional and two-dimensional factor models and examined the measurement invariance of the PHQ-9 across non-clinical, only MDD, and MDD with AD groups. Among both non-clinical and clinical populations, we found that the bi-factor model had the best fit. This explains the mixed results found in previous studies that reported that the PHQ-9 has either a unidimensional or a two-dimensional factor structure [8, 21]. The bi-factor model allows one to use the unidimensional factor model of the PHQ-9, that is, we can use the cut-off point and the total score as a single variable. Additionally, we can use the two-dimensional factor model of the PHQ-9 for assessing more detailed symptoms. Moreover, the general PHQ-9 factor accounted for 56.9% of the common variance, that is, over 40%. Thus, using both the total score and the sub-scale scores allows us to assess patients’ symptoms more precisely. In addition, we may be able to detect changes in patients’ symptoms due to treatment more fully by using the PHQ-9 regularly during treatment. This in turn will aid the implementation of appropriate treatment or the modification of the treatment according to the patients’ needs.

According to the results of the measurement invariance analyses, the scalar invariance model showed the best fit between the non-clinical and only MDD groups, and between the only MDD and MDD with AD groups, which means that we can compare the latent means of the PHQ-9 within each of these pairs of populations. Although the PHQ-9 total, somatic, and cognitive/affective scores of the MDD with AD group were higher than those of the only MDD and non-clinical groups, these populations responded to each item similarly.

This study has several limitations. First, we might have obtained a biased sample because we conducted a web-based survey. For example, patients with more severe depressive symptoms and those who do not use the Internet frequently might have been excluded from this web-based survey. Second, participants were asked to report their own diagnoses and were not interviewed to assess whether they actually had MDD/AD. In other words, some of the participants might not have met the required diagnostic criteria for MDD/AD. This is in part supported by the low mean PHQ-9 scores reported in the present study (M = 12.42 for only MDD, M = 15.86 for MDD with AD) as compared to those found in previous studies conducted in Western countries. For example, Kroenke et al. [3] reported a mean score of 17.1 among 41 patients with MDD, and Petersen et al. [8] reported a mean score of 17.3 among 626 such patients. Future studies must test the higher-order factorial model and assess measurement invariance with participants diagnosed using a structured interview. Finally, we used only Japanese non-clinical and clinical populations, making it unclear whether these results are applicable to a Western population.

Acknowledgments

We thank all the participants who contributed to our larger web-based survey.

References

  1. Furukawa TA. Assessment of mood: guides for clinicians. Psychosom Res. 2010;68(6):581–589.
  2. Spitzer RL, Kroenke K, Williams JB. Patient Health Questionnaire Primary Care Study Group. Validation and utility of a self-report version of PRIME-MD: The PHQ primary care study. JAMA. 1999;282(18):1737–1744. pmid:10568646
  3. Kroenke K, Spitzer RL, Williams JB. The PHQ-9. J Gen Int Med. 2001;16(9):606–613.
  4. National Institute for Health and Clinical Excellence. Depression: The treatment and management of depression in adults (update: Clinical guideline 90) [Internet]. National Institute for Health and Clinical Excellence; 2009 [cited 2016/6/1]. www.nice.org/uk/CG90.
  5. Hansson M, Chotai J, Nordstöm A, Bodlund O. Comparison of two self-rating scales to detect depression: HADS and PHQ-9. Br J Gen Pract. 2009;59(566):e283–288. pmid:19761655
  6. Beard C, Hsu KJ, Rifkin LS, Busch AB, Björgvinsson T. Validation of the PHQ-9 in a psychiatric sample. J Affect Disord. 2016;193:267–273. pmid:26774513
  7. Chilcot J, Rayner L, Lee W, Price A, Goodwin L, Monroe B, et al. The factor structure of the PHQ-9 in palliative care. J Psychosom Res. 2013;75(1):60–64. pmid:23751240
  8. Petersen JJ, Paulitsch MA, Hartig J, Mergenthal K, Gerlach FM, Gensichen J. Factor structure and measurement invariance of the Patient Health Questionnaire-9 for female and male primary care patients with major depression in Germany. J Affect Disord. 2015;170:138–142. pmid:25240840
  9. Credé M, Harms PD. 25 years of higher-order confirmatory factor analysis in the organizational sciences: A critical review and development of reporting recommendations. J Organ Behav. 2015;36(6):845–872.
  10. Yung YF, Thissen D, McLeod LD. On the relationship between the higher-order factor model and the hierarchical factor model. Psychometrika. 1999;64(2):113–128.
  11. Urbán R, Arrindell WA, Demetrovics Z, Unoka Z, Timman R. Cross-cultural confirmation of bi-factor models of a symptom distress measure: Symptom Checklist-90-Revised in clinical samples. Psychiat Res. 2016;239:265–274.
  12. Vandenberg RJ, Lance CE. A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organ Res Methods. 2000;3(1):4–70.
  13. Sartorius N, Üstün TB, Lecrubier Y, Wittchen HU. Depression comorbid with anxiety: Results from the WHO study on “Psychological disorders in primary health care.” Br J Psychiatr. 1996;168(30):38–43.
  14. Ito M, Bentley KH, Oe Y, Nakajima S, Fujisato H, Kato N, et al. Assessing depression related severity and functional impairment: the overall depression severity and impairment scale (ODSIS). PloS One. 2015;10(4):e0122969. pmid:25874558
  15. Ito M, Oe Y, Kato N, Nakajima S, Fujisato H, Miyamae M, et al. Validity and clinical interpretability of Overall Anxiety Severity and Impairment Scale (OASIS). J Affect Disord. 2015;170:217–224. pmid:25259673
  16. Sawada N, Uchida H, Watanabe K, Kikuchi T, Suzuki T, Kashima H, et al. How successful are physicians in eliciting the truth from their patients? A large-scale Internet survey from patients’ perspectives. J Clin Psychiatry. 2012;73(3):311–317. pmid:22490259
  17. Muramatsu K, Kamijima K, Yoshida M, Otsubo T, Miyaoka H, Muramatsu Y, Gejyo F. The Patient Health Questionnaire, Japanese Version: Validity According to the Mini-International Neuropsychiatric Interview–Plus. Psychol Reports. 2007;101(3):952–960.
  18. Martin A, Rief W, Klaiberg A, Braehler E. Validity of the brief Patient Health Questionnaire mood scale (PHQ-9) in the general population. Gen Hosp Psychiat. 2006;28:71–77.
  19. Furukawa TA, Kawakami N, Saitoh M, Ono Y, Nakane Y, Nakamura Y, et al. The performance of the Japanese version of the K6 and K10 in the World Mental Health Survey Japan. Int J Methods Psychiatr Res. 2008;17(3):152–158. pmid:18763695
  20. Shima S, Shikano T, Kitamura T, Asai M. Reliability and validity of CES-D. Jpn J Psychiatry. 1985;27:717–723.
  21. Yu X, Tam WW, Wong PT, Lam TH, Stewart S. The Patient Health Questionnaire-9 for measuring depressive symptoms among the general population in Hong Kong. Compr Psychiatr. 2012;53(1):95–102.
  22. Gregorich SE. Do self-report instruments allow meaningful comparisons across diverse population groups? Testing measurement invariance using the confirmatory factor analysis framework. Med Care. 2006;44(11 Suppl 3):S78. pmid:17060839
  23. Kline RB. Principles and practice of structural equation modeling. New York: Guilford Press; 2015.
  24. Cheung GW, Rensvold RB. Evaluating goodness-of-fit indexes for testing measurement invariance. Struct Equ Modeling. 2002;9(2):233–255.