Exploring the use of machine learning for risk adjustment: A comparison of standard and penalized linear regression models in predicting health care costs in older adults

Hong J. Kan; Hadi Kharrazi; Hsien-Yen Chang; Dave Bodycombe; Klaus Lemke; Jonathan P. Weiner

doi:10.1371/journal.pone.0213258

Abstract

Background

Payers and providers still primarily use ordinary least squares (OLS) to estimate expected economic and clinical outcomes for risk adjustment purposes. Penalized linear regression represents a practical and incremental step forward that provides transparency and interpretability within the familiar regression framework. This study conducted an in-depth comparison of prediction performance of standard and penalized linear regression in predicting future health care costs in older adults.

Methods and findings

This retrospective cohort study included 81,106 Medicare Advantage patients with 5 years of continuous medical and pharmacy insurance from 2009 to 2013. Total health care costs in 2013 were predicted with comorbidity indicators from 2009 to 2012. Using 2012 predictors only, OLS performed poorly (e.g., R² = 16.3%) compared to penalized linear regression models (R² ranging from 16.8 to 16.9%); using 2009–2012 predictors, the gap in prediction performance increased (R²: 15.0% versus 18.0–18.2%). OLS with a reduced set of predictors selected by lasso showed improved performance (R² = 16.6% with 2012 predictors, 17.4% with 2009–2012 predictors) relative to OLS without variable selection but still lagged behind the prediction performance of penalized regression. Lasso regression consistently generated prediction ratios closer to 1 across different levels of predicted risk compared to other models.

Conclusions

This study demonstrated the advantages of using transparent and easy-to-interpret penalized linear regression for predicting future health care costs in older adults relative to standard linear regression. Penalized regression showed better performance than OLS in predicting health care costs. Applying penalized regression to longitudinal data increased prediction accuracy. Lasso regression in particular showed superior prediction ratios across low and high levels of predicted risk. Health care insurers, providers and policy makers may benefit from adopting penalized regression such as lasso regression for cost prediction to improve risk adjustment and population health management and thus better address the underlying needs and risk of the populations they serve.

Citation: Kan HJ, Kharrazi H, Chang H-Y, Bodycombe D, Lemke K, Weiner JP (2019) Exploring the use of machine learning for risk adjustment: A comparison of standard and penalized linear regression models in predicting health care costs in older adults. PLoS ONE 14(3): e0213258. https://doi.org/10.1371/journal.pone.0213258

Editor: Gregor Stiglic, University of Maribor, SLOVENIA

Received: June 13, 2018; Accepted: February 19, 2019; Published: March 6, 2019

Copyright: © 2019 Kan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The code for this study can be found via the following GitHub URL: https://github.com/hkan2018/risk_adjustment_with_penalized_regression. The health insurance administrative claims data that supported the findings of this study were made available by IMS (now part of IQVIA). Restrictions apply to the availability of these data, which were used under a license for the current study and are not publicly available. However, data are available from the authors upon reasonable request and with the permission of IMS. There were no special access privileges used by the authors. IQVIA may be contacted for data access using the following information: Cheryl Boggia, Data Partnerships, US Payer Provider Solutions, Learn more about IQVIA, 201 Broadway 5th floor, Cambridge, MA 02139, USA, Email: Cheryl.Boggia@iqvia.com, M: +1 (617) 733-6878.

Funding: The authors received no specific funding for this study.

Competing interests: I have read the journal's policy and the authors of this manuscript have the following competing interests: This study applied algorithms for grouping diagnosis codes and prescription drugs from the ACG case-mix/risk adjustment methodology, developed at Johns Hopkins Bloomberg School of Public Health. The Johns Hopkins University receives royalties for non-academic use of the software based on the ACG methodology. Dr. Kan, Dr. Chang, Dr. Kharrazi, Dr. Bodycombe, Dr. Lemke and Dr. Weiner receive a portion of their salary support from this revenue. This does not alter our adherence to all PLOS ONE policies on sharing data and materials.

Introduction

Risk adjustment models are applied by payers and health care delivery organizations to adjust for differences in patient characteristics when estimating expected health care resource use, clinical outcomes, and quality of care. Commonly used predictors in risk adjustment models include demographic information and clinical variables. The dominant type of risk adjustment model in practice is standard linear regression based on ordinary least squares (OLS) [1]. For example, the HHS-Hierarchical Condition Categories (HHS-HCC) model, a risk adjustment model adopted for health plans participating in the Affordable Care Act, uses standard linear regression with age, gender, diagnoses and interactions between diagnoses to predict medical expenditure risk [2].

An emerging literature has begun to explore the potential application of machine learning methods to predict health care costs and utilization for risk adjustment purposes [3–6]. These studies compared a variety of machine learning techniques for risk adjustment including penalized regression, random forests, multivariate adaptive regression splines, boosted regression trees, neural networks, and super learner. Early success has demonstrated the potential value of machine learning regression and classification methods for predicting costs and utilization. With new data sources becoming available for population health management [7–9], machine learning methods will become more useful for processing and analyzing increasingly complex population-level health data.

However, despite the potential value of advanced machine learning approaches to predicting risk, payers and providers still rely heavily on OLS regression to risk adjust and manage their patient populations. The slow adoption of advanced machine learning techniques can be partly explained by the unfamiliarity of risk stratification analysts with such techniques and by the complex interpretation and integration of results needed in practice. One approach to moving the needle toward machine learning adoption in risk adjustment practice is through the introduction of incremental, effective and transparent machine learning regression models that stay within the framework of standard linear regression and also perform as well as some more sophisticated but less transparent machine learning techniques [3]. This study concentrated on penalized linear regression models including lasso (least absolute shrinkage and selection operator) [10], ridge [11] and elastic net [12] and conducted a thorough comparison of penalized regression with standard linear regression in predicting total health care costs, which had not previously been reported in the published literature. We focused on older adults (≥65 years old) as they incur disproportionately more health care spending [13].

Multiple factors make penalized linear regression a viable potential next step beyond OLS for risk prediction and adjustment. First, transparency of a risk adjustment model is paramount for care management and resource allocation. Penalized linear regression provides almost the same level of transparency and interpretability as standard linear regression. Some machine learning techniques such as random forests and neural networks are hard to estimate and difficult to interpret, and yet they do not offer better prediction compared to penalized regression in predicting health care costs [3]. Second, although standard linear regression is still the most popular risk adjustment approach, penalized linear regression can be as easily scaled and deployed in environments with limited computational power and thus represents a pragmatic step forward for risk adjustment. Third, penalized regression such as lasso regression selects and retains important variables for prediction. Providers often have incentives to increase the intensity of coding medical services (a practice referred to as “upcoding”), especially those included in a risk adjustment model, in order to maximize reimbursement [14]. Carefully selecting predictors for a risk adjustment model with clinical insights and statistical criteria may curtail the opportunity for upcoding. As an example, HCC models accomplished this by creating a hierarchy of grouped conditions based on only a subset of all available diagnosis codes [2]. In addition, keeping only important variables in a model may facilitate care management as it is easier for care managers to target key risk factors.

The study also assessed the value of penalized regression in generating more parsimonious models as well as using additional predictors collected over a longer period of time. We tested parsimonious OLS models by including only important predictors selected by lasso regression. OLS provides unbiased estimates when specified correctly whereas penalized regression sacrifices unbiasedness for a potential reduction of expected prediction error. Variable selection may reduce the number of irrelevant predictors included in a model and thus increase efficiency and reduce the chance of overfitting. We also compared predictive model performance using baseline predictors from 1 year versus 4 years in the past.

The overall goal of this study was to assess the potential of penalized linear regression models for risk adjustment. Specifically, the study 1) compared standard linear regression with penalized linear regression in predicting future total health care costs in older adults, 2) compared standard linear regression using full and reduced sets of predictors selected by lasso regression, and 3) assessed the value of using longitudinal data from 4 years versus 1 year in the past as predictors.

Methods

This retrospective cohort study used the IMS LifeLink Health Plan Claims Database [15], which comprises fully adjudicated and de-identified medical and pharmaceutical claims from health insurance plans. The database captures a geographically diverse sample of health plan enrollees in the U.S. Charges, allowed and paid amounts are available for all services rendered, as well as date of service for all claims. The database is fully compliant with the Health Insurance Portability and Accountability Act (HIPAA). The Institutional Review Board at the Johns Hopkins Bloomberg School of Public Health reviewed the study proposal and determined that the human subjects research activity described in the application meets the criteria for Exemption under 45 CFR 46.101(b), Category (4). It approved the proposed use of an existing limited data set from commercial health plan claims in the U.S. (IRB No: 00008699). Patients were selected from a large health plan with longitudinal patient records. Patients were required to have 5 years of continuous medical and pharmacy insurance benefits from 2009 to 2013 and be at least 65 years old at the end of 2012. Although they were all Medicare Advantage enrollees, the selected patients were not nationally representative of Medicare Advantage enrollees.

Total health care costs in 2013 were the target outcome for all predictive models. Predictors were extracted from data prior to 2013. Previous diseases and symptoms as indicated by recorded medical diagnoses and pharmacy claims were included as predictors. The Johns Hopkins Adjusted Clinical Groups (ACG) System version 11.0 [16] was applied to medical and pharmacy claims to generate binary comorbidity indicators by grouping International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis codes from inpatient and outpatient claims and National Drug Codes (NDCs) from pharmacy claims. Only diagnoses made by a physician (excluding labs, imaging, and other provisional diagnoses) were included for grouping. The high-level “rolled-up” comorbidity groups have up to 282 diagnosis-based conditions called Expanded Diagnosis Clusters (EDCs) and up to 67 pharmacy-based conditions called Rx-defined Morbidity Groups (RxMGs). EDC and RxMG grouping algorithms were created by clinicians based on clinical judgement and cover a large aggregate set of comorbidities. RxMGs represent conditions treated with medications and do not completely overlap with EDCs which are based solely on diagnosis codes. Comorbidities with zero prevalence were excluded. In addition to the yearly comorbidity indicators, age (at the end of 2012), age squared and sex were included as predictors in all predictive models. To compare predictive model performance using information from baseline periods of different length, yearly EDC and RxMG comorbidity indicators were first extracted from medical and pharmacy claims in 2012 for 1-year prospective prediction models, and then 4 sets of the same yearly indicators were extracted in each of the 4 years from 2009 to 2012 for longitudinal prediction models.

The primary difference between standard and penalized regression is that penalized regression adds a regularization term to the least squares loss function before it is optimized to estimate coefficients. Lasso regression adds the sum of absolute values of coefficient estimates as the regularization term (i.e., L1 regularization) whereas ridge regression adds the sum of squares of coefficient estimates as the regularization term (i.e., L2 regularization). Elastic net adds a weighted average of L1 and L2. One unique feature of lasso regression is that it selects predictors simultaneously with model estimation. We compared standard linear regression with penalized linear regression with lasso (α = 1), ridge (α = 0), and elastic net (0 < α < 1) regularization as defined by ((1 − α)/2)‖β‖₂² + α‖β‖₁ (β is a vector of coefficients). We tested elastic net regularization with α ranging from 0.1 to 0.9 in increments of 0.1. The regularization term is multiplied by a model hyperparameter called lambda that determines the total amount of regularization when added to the least squares loss function. This study used cross-validation to find the optimal value of lambda that achieved the minimum cross-validation mean squared error [17]. In addition, we tested two parsimonious OLS models with 2012 and 2009–2012 predictors, including only predictors selected by lasso regression. The OLS regression predicting 2013 costs with the full set of 2012 predictors represented the standard base case model for comparison purposes.
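As an illustration of the regularization term described above, the following minimal Python sketch evaluates the elastic net penalty and the resulting penalized least squares objective. The function names are hypothetical, and the 1/(2n) scaling of the squared-error term is an assumption that follows the convention documented for glmnet's Gaussian family; it is not taken from the study's own code.

```python
def elastic_net_penalty(beta, alpha):
    """Return (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1.

    alpha = 1 gives the lasso (L1) penalty, alpha = 0 the ridge (L2) penalty.
    """
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return (1.0 - alpha) / 2.0 * l2_squared + alpha * l1


def penalized_loss(X, y, intercept, beta, alpha, lam):
    """Least squares loss plus lam times the elastic net penalty.

    X is a list of rows, y a list of outcomes. The intercept is not
    penalized. The 1/(2n) scaling is an assumed convention.
    """
    n = len(y)
    sse = 0.0
    for row, target in zip(X, y):
        prediction = intercept + sum(x * b for x, b in zip(row, beta))
        sse += (target - prediction) ** 2
    return sse / (2.0 * n) + lam * elastic_net_penalty(beta, alpha)
```

With lam = 0 the objective reduces to the ordinary least squares loss, so OLS can be read as the unpenalized special case of the same family.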

The entire study sample was split into training (75%) and test (25%) sets. All model development and validation were conducted in the training set. OLS was estimated in the training set directly as no model tuning is needed. Penalized linear regression was tuned using 10-fold cross-validation in the training set. Tuned penalized regression models were re-estimated using the entire training dataset. Predictive performance of final estimated models was assessed in the test set by: (1) R squared (R²), representing the percent of total variation of actual costs explained by a model (a higher percent indicates better performance), (2) root mean squared error (RMSE): square root of mean squared differences between predicted and actual costs (a smaller value indicates better performance), (3) mean absolute prediction error (MAPE): mean absolute value of differences between predicted and actual costs (a smaller value indicates better performance), and (4) prediction ratio (PR): sum of predicted costs divided by sum of actual costs (a value closer to 1 indicates better performance). Model performance was assessed in the entire test set as well as within each of the 10 deciles of predicted costs in the test set. All programming was performed in R version 3.4.2 [18] with glmnet version 2.0–16 [19]. The R code can be found at https://github.com/hkan2018/risk_adjustment_with_penalized_regression.
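The four performance measures defined above can be sketched in a few lines of Python. This is an illustrative sketch, not the study's R code: the function name is hypothetical, and R² is computed as 1 minus the ratio of residual to total sum of squares, which is one common definition and an assumption here.

```python
import math


def evaluate_predictions(actual, predicted):
    """Return R squared, RMSE, MAPE and prediction ratio for two equal-length lists."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Residual and total sums of squares on the evaluation (test) data.
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "r2": 1.0 - sse / sst,                    # higher is better
        "rmse": math.sqrt(sse / n),               # lower is better
        "mape": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,  # lower is better
        "pr": sum(predicted) / sum(actual),       # closer to 1 is better
    }
```

Note that MAPE here is the mean absolute prediction error in cost units, as defined in the text, not a percentage error.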

Results

A total of 81,106 patients met the selection criteria with 60,737 split to a training set and 20,369 to a test set. In the entire study sample, mean (standard deviation (SD)) age was 73.8 (6.7) years old and 50.8% were female. Mean (SD) total health care costs in 2013 were $16,509 (41,376). The proportions of patients with a specific EDC (n = 277) or RxMG (n = 67) in 2012 in the training set can be found in S1 Table.

Table 1 shows the performance of all the predictive models using 2012 predictors assessed in the test set. The OLS model with the full set of 2012 predictors assessed in the training set had an R² of 18.5% (data not included in the table) versus 16.3% assessed in the test set, indicating some overfitting. OLS performed poorly, based on R² (16.3%), RMSE (35,801) and MAPE (15,331), compared to ridge, elastic net and lasso penalized regression models, all of which displayed similar performance with R² ranging from 16.8 to 16.9%, RMSE 35,669–35,690, and MAPE 15,244–15,260. However, the prediction ratio of OLS in the entire test set was 1.001 (note that this ratio assessed in the training set would be exactly 1 by the nature of standard linear regression), compared to the prediction ratios of penalized regression models (1.002–1.003), indicating a minor increase in bias of estimates of penalized linear regression as measured by prediction ratio.

Table 1. Prediction performance of models using 2012 predictors in predicting 2013 costs in the test set (n = 20,369).

https://doi.org/10.1371/journal.pone.0213258.t001

Out of the 347 original predictors, lasso regression selected 175 important variables including age and sex with coefficient estimates of all the other predictors shrunk to zero. Using these 175 variables, OLS performance improved (R² = 16.6%, RMSE = 35,749, MAPE = 15,237) relative to OLS with the full set of 347 predictors (R² = 16.3%, RMSE = 35,801, MAPE = 15,331). However, the performance of the parsimonious OLS model still lagged behind those of penalized regression models based on R² (16.6% versus 16.8–16.9%) and RMSE (35,749 versus 35,669–35,690), although the MAPE measure for the parsimonious model showed a small improvement (15,237 versus 15,244–15,260). In addition, the parsimonious OLS model retained the same smaller prediction ratio (1.001) as the OLS model with the full set of predictors. See S1 Table for top 50 most prevalent 2012 EDC and RxMG comorbidity indicators selected by the lasso model.
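To illustrate how lasso can shrink some coefficients exactly to zero, the sketch below implements cyclic coordinate descent with soft-thresholding in plain Python. It is a didactic sketch under stated assumptions: the function names are hypothetical, the objective uses an assumed 1/(2n) scaling of the squared error, the number of sweeps is fixed rather than tested for convergence, and it is not the glmnet implementation used in the study.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimize (1/2n) * sum((y - b0 - X b)^2) + lam * sum(|b_j|).

    X is a list of rows, y a list of outcomes. Returns (intercept, beta).
    """
    n, p = len(y), len(X[0])
    intercept = sum(y) / n
    beta = [0.0] * p
    residual = [yi - intercept for yi in y]
    for _ in range(n_sweeps):
        # The intercept is not penalized: absorb the mean residual into it.
        shift = sum(residual) / n
        intercept += shift
        residual = [r - shift for r in residual]
        for j in range(p):
            column = [row[j] for row in X]
            scale = sum(x * x for x in column) / n
            if scale == 0.0:
                continue  # constant-zero column carries no information
            # Covariance of column j with the residual that excludes column j.
            rho = sum(x * (r + x * beta[j]) for x, r in zip(column, residual)) / n
            new_value = soft_threshold(rho, lam) / scale
            delta = new_value - beta[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(column, residual)]
                beta[j] = new_value
    return intercept, beta
```

The predictors retained by the fit are those with nonzero entries in beta; a parsimonious OLS model of the kind described above would then be refit on those columns only.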

Table 2 shows model performance within each of the 10 deciles of predicted costs for OLS (with the full and reduced sets of predictors), ridge, and lasso regression using 2012 predictors. Among the 4 models, lasso regression showed prediction ratios consistently close to 1 across all the 10 deciles of predicted costs (e.g., PR of 0.979 in decile 1 and 1.019 in decile 10). OLS with the full set of predictors under-predicted costs in low predicted risk deciles and over-predicted costs in high predicted risk deciles (e.g., PR of 0.433 in decile 1 and 1.073 in decile 10). Although the parsimonious OLS and ridge regression improved on prediction ratio compared to OLS with the full set of predictors, both models showed inferior prediction ratios in low and high ends of predicted costs compared to lasso regression (e.g., PR of 0.539 in the parsimonious OLS and 0.754 in ridge regression compared to 0.979 in lasso regression in decile 1; PR of 1.075 in the parsimonious OLS and 1.022 in ridge regression compared to 1.019 in lasso regression in decile 10). Elastic net regression showed similar performance by deciles as lasso regression (see Table A in S2 Table).

Table 2. Prediction performance of models using 2012 predictors in predicting 2013 costs in the test set, by deciles of predicted costs.

https://doi.org/10.1371/journal.pone.0213258.t002
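Decile-level prediction ratios of the kind reported in Table 2 can be computed by ranking patients on predicted cost, cutting the ranking into ten groups, and dividing summed predictions by summed actual costs within each group. The sketch below is illustrative only: the function name is hypothetical, and the handling of ties and of group sizes that do not divide evenly is an assumption, since the article does not describe it.

```python
def prediction_ratio_by_decile(actual, predicted, n_groups=10):
    """Return the prediction ratio within each group of predicted cost, lowest first."""
    # Rank observations from lowest to highest predicted cost.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    ratios = []
    for g in range(n_groups):
        # Integer arithmetic splits the ranking into near-equal contiguous groups.
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        total_actual = sum(actual[i] for i in members)
        total_predicted = sum(predicted[i] for i in members)
        if total_actual:
            ratios.append(total_predicted / total_actual)
        else:
            ratios.append(float("nan"))  # ratio is undefined without actual costs
    return ratios
```

A ratio below 1 in a group indicates under-prediction of costs for that group, and a ratio above 1 indicates over-prediction. A negative ratio can occur when a linear model produces negative predicted costs in the lowest group.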

The longitudinal predictive model included 1,387 predictors over the 4-year period from 2009 to 2012. Table 3 shows the same direction of performance gaps between standard and penalized linear regression with 4 years of predictors as shown by the models with 1 year of data, but the performance gaps enlarged as indicated by R², RMSE and MAPE. For example, the difference in R² between OLS with the full set of predictors (15.0%) and penalized regression models with 4 years of predictors (18.0–18.2%) was larger than between the models with 1 year of data (16.3% versus 16.8–16.9%). However, penalized regression with 4 years of data showed a slightly larger prediction ratio (1.004) compared to 1.002–1.003 in penalized regression with 1 year of data.

Table 3. Prediction performance of models using 2009–2012 predictors in predicting 2013 costs in the test set (n = 20,369).

https://doi.org/10.1371/journal.pone.0213258.t003

Improved performance of penalized regression models with 4 years versus 1 year of predictors (R²: 18.0–18.2% versus 16.8–16.9%) indicates the value of longitudinal data for better prediction performance. However, this gain only occurred with penalized regression. OLS with full 2009–2012 predictors actually had worse performance (e.g., R² = 15.0%) than OLS with full 2012 predictors (R² = 16.3%). It is noteworthy that R² of OLS with 2009–2012 predictors assessed in the training set was 21.5% vs. 15.0% in the test set, indicating more serious overfitting. However, OLS with important predictors over 4 years selected by lasso performed better (e.g., R² = 17.4%) than OLS with full 2012 predictors (R² = 16.3%).

Out of the original 1,387 predictors over the 4-year period, lasso regression selected 276 important predictors, among which 46, 44, 65 and 119 comorbidity indicators came from 2009, 2010, 2011, and 2012, respectively, indicating that all of the 4 previous years of data contributed to prediction of 2013 health care costs with more recent years of comorbidities more likely being selected as important variables. Although the parsimonious OLS regression (e.g., R² = 17.4%) performed better than OLS with the full set of 2009–2012 variables (R² = 15.0%), it still fell short of the performance achieved by penalized regression (R²: 18.0–18.2%), indicating that variable selection for OLS was not enough to achieve the same level of prediction improvement displayed by penalized regression.

Table 4 shows model performance by deciles of predicted costs with 4 years of predictors of the same 4 models (i.e., OLS with full and reduced sets of 2009–2012 predictors, ridge, and lasso regression). Comparing Table 2 and Table 4 shows more pronounced differences in prediction ratios between lasso and the other three models with 4 years of predictors. Prediction ratios of lasso regression were much closer to 1 across low and high levels of predicted costs compared to the other three models (e.g., PR of -0.177 in OLS with the full set of predictors, 0.316 in the parsimonious OLS, and 0.611 in ridge regression, compared to 1.106 in lasso regression in decile 1; PR of 1.164 in OLS with the full set of predictors, 1.109 in the parsimonious OLS, and 1.005 in ridge regression, compared to 1.004 in lasso regression in decile 10). Elastic net regression showed similar performance by deciles as lasso regression (see Table B in S2 Table).

Table 4. Prediction performance of models using 2009–2012 predictors in predicting 2013 costs in the test set, by deciles of predicted costs.

https://doi.org/10.1371/journal.pone.0213258.t004

Discussion

Payers and providers commonly use standard OLS linear regression for risk adjustment and population health management. Although machine learning methods in general have shown promising initial results, payers and providers have been slow in adopting unfamiliar complex methods with difficult-to-interpret results. However, they might be more amenable to techniques such as penalized linear regression, which rests on machine learning fundamentals but retains a familiar and transparent regression framework. This study demonstrated important advantages of using penalized regression versus traditional standard OLS regression to predict future healthcare costs among older adults with demographic and comorbidity variables.

Specifically, our findings showed that penalized linear regression outperformed OLS with full and reduced (selected by lasso) sets of predictors, based on R², RMSE, and MAPE, except for prediction ratio in which OLS showed a slight advantage. Although all penalized regression models performed similarly when evaluated in the entire test set, lasso regression consistently showed superior prediction ratios across high and low levels of predicted risk compared to ridge and OLS. Coefficient shrinkage and variable selection may have helped lasso to achieve better performance across the entire risk spectrum. Built-in variable selection of lasso regression may reduce overfitting as well as the number of irrelevant predictors included in the model. In addition, lasso regression generated a much smaller number of negative predicted costs with only 2 observations in the test set with negative predictions compared to 120 negative predictions by the OLS model (data not shown). Although elastic net regression showed similar performance as lasso within deciles of predicted risk, lasso regression may be preferable for its simpler interpretation with built-in variable selection. In contrast, OLS suffers from biased prediction as indicated by prediction ratio deviating from 1 in low and high risk patients. Alleviating group-level biased prediction is critical to a health plan or a clinical care organization that may enroll a biased population of patients with underlying risk skewed towards either the high or low end of risk spectrum.

This study also demonstrated better prediction of parsimonious OLS models with a smaller set of important comorbidity indicators selected by lasso regression than OLS with the full set of predictors. OLS using a full set of predictors without any variable selection may suffer from including irrelevant predictors leading to increased standard error of estimates [20] and/or overfitting. In practice, including only important predictors in a risk adjustment model can both reduce opportunities for upcoding and facilitate care management by allowing care managers to focus on patients with key risk factors.

This study also compared predictive performance of OLS versus penalized regression models against various temporal cuts of the data to simulate situations where “longer” health care data is available (e.g., Medicare data). Comorbidities from each of the past 4 years contributed to better prediction by penalized regression compared to using only 1 year of prior data, and this gain in performance with longitudinal data could only be harnessed by penalized regression as standard linear regression actually showed worse performance using 4 years of prior predictors. We also compared overall performance of OLS and lasso regression with 1, 2, 3, and 4 years of prior data and saw a clear trend that with an increasing number of years of prior data, OLS lost prediction power while lasso gained prediction power (data not shown). This further confirms the advantage of using penalized regression such as lasso regression to model longitudinal data. Both payer and provider organizations can utilize this advantage of penalized regression to increase the utility of the longer historical data they are accumulating over time.

Although OLS may produce unbiased estimates when specified correctly, in practice, we do not expect a risk adjustment model for health care costs to be correctly specified, meaning incorporating only relevant variables and relating them to the cost outcome with correct functional specification. This is because individuals are exposed to numerous factors related to biology, behavior, health care, social and physical environment that may impact their health and health care through numerous complex and interactive pathways. Thus, it is not advisable to use causal inference and unbiased estimates to guide model selection for risk adjustment models. In this case, techniques like penalized regression that accept some bias in model estimates for a reduction in variance can be appropriate for improving overall expected prediction error. A favorable bias-variance tradeoff was clearly demonstrated for penalized regression in this study. Although penalized regression models produced slightly increased bias as measured by a 0.1% to 0.3% increase in prediction ratios relative to that of OLS in the entire test set, overall, penalized regression clearly achieved better prediction performance than OLS with and without variable selection. Furthermore, penalized regression, especially lasso and elastic net regression, even considerably improved on prediction ratios across low and high levels of predicted risk compared to OLS.
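For reference, the trade-off discussed above corresponds to the standard decomposition of expected squared prediction error at a point x_0. The notation is generic and introduced here only for illustration: it assumes an outcome Y = f(x_0) + ε with mean-zero noise of variance σ² that is independent of the training data, and a fitted model f̂ estimated from a random training sample.

```latex
\mathbb{E}\big[(Y - \hat{f}(x_0))^2\big]
  = \underbrace{\sigma^2}_{\text{irreducible noise}}
  + \underbrace{\big(\mathbb{E}[\hat{f}(x_0)] - f(x_0)\big)^2}_{\text{squared bias}}
  + \underbrace{\operatorname{Var}\big(\hat{f}(x_0)\big)}_{\text{variance}}
```

Penalization typically raises the squared-bias term while lowering the variance term, so the total can fall even though the estimates are no longer unbiased.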

Numerous machine learning techniques exist for regression in the supervised learning setting [17]. Although some machine learning methods such as super learner [3] and deep learning [21] may boost prediction accuracy, they are usually neither easy to train nor easy to understand and interpret, and may require substantial computing power. A transparent modeling technique such as lasso regression is easier to train and scale and, empirically, performed best among the standard and penalized linear regression models tested in this study.

This study only used comorbidity indicators as predictors, derived from recorded diagnoses and filled prescription drugs, reflecting the information a primary care physician (PCP) may typically have access to. A PCP usually knows the diseases, symptoms, and prescription drugs of his or her patients relatively well. Even without complete information, lasso regression demonstrated that only a subset of comorbidities was important for predicting costs. In addition, despite the lack of comprehensive information on health care costs and utilization in EHR systems [22,23], EHRs provide unique data sources for risk stratification [24–28]. Thus, the findings of this study are potentially applicable to both provider and payer settings for practical risk adjustment applications [29].

Finally, although this study did not intend to develop a full risk adjustment model ready to use for payment purposes, it is still worth noting that estimating the impact of improved risk adjustment on actual outcomes such as adverse selection and overpayments to health plans is not as straightforward as it may first appear to be because of the need to consider endogenous response of payers to specific incentives created by a risk adjustment model [30]. For example, an increase in R² of a risk adjustment formula does not necessarily result in an increase or a decrease of government overpayments in the Medicare Advantage program [31]. An empirical investigation of the Hierarchical Condition Categories (HCCs) risk adjustment approach developed by the Centers for Medicare and Medicaid Services (CMS) found that the introduction of the more sophisticated risk adjustment did not alter favorable selection into Medicare Advantage [32]. But we expect that more accurate prediction of costs especially across different levels of risk as demonstrated by penalized regression such as lasso may reduce the room for possible adverse selection and thus make it more difficult to find ways to outmaneuver risk adjustment for financial gains. More research is needed to assess payment-specific issues including risk selection for future new risk adjustment models. Equally importantly, from the clinical care perspective, more accurate identification of patients with low and high future health care needs can help care management programs effectively target appropriate patients for interventions.

The study has a few limitations. The distribution of total health care costs is highly skewed with large outliers. We conducted a sensitivity analysis by assigning $134,074 (the 99th percentile of the distribution of total costs in 2013) to all cases with 2013 health care costs over that amount. The sensitivity analysis did not alter the direction of our findings, although the differences in model performance tended to be less pronounced. Although we used ICD-9-CM diagnosis codes in this study, EDCs derived from ICD-9-CM are consistent with those derived from ICD-10-CM. Thus, our study results are applicable to newer health care data coded with ICD-10-CM as well. This study did not test all regression techniques. However, we tested a generalized linear model with a log link and gamma distribution, which failed to show consistent advantages over standard linear regression. We also tested several more advanced machine learning techniques, including random forests and neural networks, and found no better overall performance than penalized regression. As the study used administrative claims from a particular large health plan in the IMS database, the results may not be generalizable to other health plans or to patients under 65 years old. The sample size of this study was limited. Further research is needed to confirm the findings in larger and more diverse samples and to further establish external validity using test data drawn from a different time period or from a different health plan. We also caution that the clear-cut favorable bias-variance trade-off of penalized regression observed in this study may change with a different outcome variable or even a different data source.
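The top-coding step in the sensitivity analysis can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: the helper names `percentile_nearest_rank` and `top_code` are hypothetical, the nearest-rank percentile rule is one of several conventions and may not match the one used in the study, and the cost values are made up.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile (integer pct, 1..100) by the nearest-rank rule."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def top_code(costs, pct=99):
    """Cap every cost above the pct-th percentile at that percentile value."""
    cap = percentile_nearest_rank(costs, pct)
    return [min(c, cap) for c in costs], cap


# Made-up data: 99 moderate costs plus one extreme outlier.
costs = [1000.0 * i for i in range(1, 100)] + [2_000_000.0]
capped, cap = top_code(costs, pct=99)

print(cap)                        # 99000.0
print(sum(costs) / len(costs))    # 69500.0 (mean before capping)
print(sum(capped) / len(capped))  # 50490.0 (mean after capping)
```

In this toy example a single outlier accounts for a large share of the mean, which illustrates why capping extreme costs can narrow the differences in fit between models.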

In conclusion, this study demonstrated the advantages of using transparent and easy-to-interpret penalized regression models for predicting future health care costs in older adults relative to standard linear regression. In particular, lasso regression showed better prediction performance across different levels of predicted risk. Such predictive analytic techniques, while incorporating underlying machine learning principles, retain the familiar linear regression framework and provide transparency and interpretability along with a gain in prediction performance. As digital data sources become ever more ubiquitous in the health care sector, it is imperative that advances in data science be considered and embraced as appropriate, based on transparent and rigorous assessments. Health care insurers, providers, and policy makers may benefit from adopting penalized regression such as lasso regression for cost prediction to improve risk adjustment and population health management, and thus better address the underlying needs and risk of the populations they serve.

Supporting information

S1 Table. Proportions of patients with EDCs (n = 277) and RxMGs (n = 67) in 2012 in the training sample.

https://doi.org/10.1371/journal.pone.0213258.s001

(DOCX)

S2 Table. Prediction performance of elastic net regression models, by deciles of predicted costs.

https://doi.org/10.1371/journal.pone.0213258.s002

(DOCX)

References

1. Iezzoni L. Risk adjustment for measuring healthcare outcomes. 4th ed. Chicago, IL: Health Administration Press; 2012
2. Kautter J, Pope GC, Ingber M, Freeman S, Patterson L, Cohen M, et al. The HHS-HCC risk adjustment model for individual and small group markets under the Affordable Care Act. Medicare & Medicaid Research Review. 2014;4(3): E1–E46
3. Rose S. A machine learning framework for plan payment risk adjustment. Health Serv Res. 2016;6: 2358–2374
4. Duncan I, Loginov M, Ludkovski M. Testing alternative regression frameworks for predictive modeling of health care costs. North American Actuarial Journal. 1996;1: 65–87
5. Tamang S, Milstein A, Sørensen HT, Pedersen L, Mackey L, Betterton JR, et al. Predicting patient ‘cost blooms’ in Denmark: a longitudinal population-based study. BMJ Open 2017;7: e011580 pmid:28077408
6. Shrestha A, Bergquist S, Montz E, Rose S. Mental health risk adjustment with clinical categories and machine learning. Health Serv Res. 2017. pmid:29244202
7. Kharrazi H, Lasser EC, Yasnoff WA, Loonsk J, Advani A, Lehmann HP, et al. A proposed national research and development agenda for population health informatics: summary recommendations from a national expert workshop. J Am Med Inform Assoc. 2017;24(1):2–12 pmid:27018264
8. Hatef E, Lasser EC, Kharrazi HHK, Perman C, Montgomery R, Weiner JP. A Population Health Measurement Framework: Evidence-Based Metrics for Assessing Community-Level Population Health in the Global Budget Context. Popul Health Manag. 2018;21(4):261–270 pmid:29035630
9. Hatef E, Kharrazi H, VanBaak E, Falcone M, Ferris L, Mertz K, et al. A State-wide Health IT Infrastructure for Population Health: Building a Community-wide Electronic Platform for Maryland's All-Payer Global Budget. Online J Public Health Inform. 2017 Dec 31;9(3):e195 pmid:29403574
10. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (methodological). 1996;58(1): 267–88.
11. Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 1970;12(1): 55–67
12. Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society. Series B (methodological). 2005;67(2): 301–320.
13. Centers for Medicare & Medicaid Services. NHE Fact Sheet, 2014. Available at: http://web.archive.org/web/20160329130935/https://www.cms.gov/research-statistics-data-and-systems/statistics-trends-and-reports/nationalhealthexpenddata/nhe-fact-sheet.html. Cited 22 May 2018.
14. Kronick R, Welch WP. Measuring coding intensity in the Medicare Advantage program. Medicare & Medicaid Research Review 2014;4(2): E1–E19
15. IMS LifeLink® database. Watertown, MA: IQVIA
16. Johns Hopkins Bloomberg School of Public Health. The Johns Hopkins ACG® System, Version 11.0. Available at http://acg.jhsph.org. Cited 22 May 2018.
17. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference and prediction. 2nd ed. New York: Springer Verlag; 2009
18. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2017. URL https://www.R-project.org/.
19. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software. 2010;33(1): 1–22. URL http://www.jstatsoft.org/v33/i01/ pmid:20808728
20. Greene W. Econometric Analysis. 7th ed. Boston: Pearson; 2012.
21. Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, et al. Scalable and accurate deep learning with electronic health records. npj Digital Medicine 2018;1:18.
22. Kharrazi H, Wang C, Scharfstein D. Prospective EHR-based clinical trials: the challenge of missing data. J Gen Intern Med. 2014;29(7):976–8 pmid:24839057
23. Kharrazi H, Gonzalez CP, Lowe KB, Huerta TR, Ford EW. Forecasting the Maturation of Electronic Health Record Functions Among US Hospitals: Retrospective Analysis and Predictive Model. J Med Internet Res. 2018 Aug 7;20(8):e10458. pmid:30087090
24. Kharrazi H, Chi W, Chang HY, Richards TM, Gallagher JM, Knudson SM, et al. Comparing Population-based Risk-stratification Model Performance Using Demographic, Diagnosis and Medication Data Extracted From Outpatient Electronic Health Records Versus Administrative Claims. Med Care. 2017;55(8):789–796 pmid:28598890
25. Lemke KW, Gudzune KA, Kharrazi H, Weiner JP. Assessing markers from ambulatory laboratory tests for predicting high-risk patients. Am J Manag Care. 2018;24(6):e190–e195 pmid:29939509
26. Kan HJ, Kharrazi H, Leff B, Boyd C, Davison A, Chang H, et al. Defining and Assessing Geriatric Risk Factors and Associated Health Care Utilization Among Older Adults Using Claims and Electronic Health Records. Med Care. 2018;56(3):233–239 pmid:29438193
27. Chang HY, Richards TM, Shermock KM, Elder Dalpoas S, J Kan H, Alexander GC, et al. Evaluating the Impact of Prescription Fill Rates on Risk Stratification Model Performance. Med Care. 2017;55(12):1052–1060 pmid:29036011
28. Kharrazi H, Chang HY, Heins SE, Weiner JP, Gudzune KA. Assessing the Impact of Body Mass Index Information on the Performance of Risk Adjustment Models in Predicting Health Care Costs and Utilization. Med Care. 2018;56(12):1042–1050 pmid:30339574
29. Kharrazi H, Weiner JP. A Practical Comparison Between the Predictive Power of Population-based Risk Stratification Models Using Data from Electronic Health Records Versus Administrative Claims: Setting a Baseline for Future EHR-derived Risk Stratification Models. Med Care. 2018;56(2):202–203
30. Glazer J, McGuire TG. Optimal risk adjustment in markets with adverse selection: an application to managed care. Am Econ Rev. 2000;90(4):1055–71
31. Brown J, Duggan M, Kuziemko I, Woolston W. How does risk selection respond to risk adjustment? new evidence from the Medicare Advantage program. Am Econ Rev. 2014;104(10):3335–64 pmid:29533567
32. Morrisey MA, Kilgore ML, Becker DJ, Smith W, Delzell E. Favorable selection, risk adjustment, and the Medicare Advantage program. Health Serv Res. 2013;48(3):1039–56 pmid:23088500
