A Generalized Measurement Model to Quantify Health: The Multi-Attribute Preference Response Model

Paul F. M. Krabbe

doi:10.1371/journal.pone.0079494

Abstract

After 40 years of deriving metric values for health status or health-related quality of life, the effective quantification of subjective health outcomes is still a challenge. Here, two of the best measurement tools, the discrete choice and the Rasch model, are combined to create a new model for deriving health values. First, existing techniques to value health states are briefly discussed followed by a reflection on the recent revival of interest in patients’ experience with regard to their possible role in health measurement. Subsequently, three basic principles for valid health measurement are reviewed, namely unidimensionality, interval level, and invariance. In the main section, the basic operation of measurement is then discussed in the framework of probabilistic discrete choice analysis (random utility model) and the psychometric Rasch model. It is then shown how combining the main features of these two models yields an integrated measurement model, called the multi-attribute preference response (MAPR) model, which is introduced here. This new model transforms subjective individual rank data into a metric scale using responses from patients who have experienced certain health states. Its measurement mechanism largely prevents biases such as adaptation and coping. Several extensions of the MAPR model are presented. The MAPR model can be applied to a wide range of research problems. If extended with the self-selection of relevant health domains for the individual patient, this model will be more valid than existing valuation techniques.

Citation: Krabbe PFM (2013) A Generalized Measurement Model to Quantify Health: The Multi-Attribute Preference Response Model. PLoS ONE 8(11): e79494. https://doi.org/10.1371/journal.pone.0079494

Editor: William John Taylor, University of Otago, New Zealand

Received: June 1, 2013; Accepted: September 22, 2013; Published: November 21, 2013

Copyright: © 2013 Paul F. M. Krabbe. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The author has no support or funding to report.

Competing interests: The author has declared that no competing interests exist.

Introduction

The measurement of health, which is defined as assigning meaningful numbers to an individual’s health status, has proliferated ever since the World Health Organization (WHO) provided its definition of health in 1946 [1]. It was not until 1970 that Fanshel and Bush introduced the first instrument that was able to capture an individual’s health state in a single metric value [2]. Access to single metric values for health states is advantageous as these can be used in health outcomes research, disease modeling studies, and economic evaluations, and to monitor the health status of patient groups in the general community. Often the values for health states are expanded by combining them with the duration of these states to obtain health summary measures. A well-known example of such a summary measure is the disability-adjusted life year (DALY) approach that is being used by the WHO to compare different countries with one another on diverse aspects of health. Health economists often apply a comparable health summary measure, namely the quality-adjusted life year (QALY).

To quantify health states, these must be described and classified in terms of seriousness and assigned meaningful values (variously called utilities, strength of preference, index, or weights). The first step is thus to clarify the concept of health status. Essentially an umbrella concept, it covers independent health domains that together capture the not yet well defined notion of health-related quality of life (HRQoL). The second step is to assign a value to the health-state description by means of an appropriate measurement procedure. In the past, several measurement models have been developed to quantify subjective phenomena and some of these models have found their way into the valuation of health states. Although the scientific enterprise of measuring health states has been going on for about 40 years, there are still concerns about validity.

The aim of this paper is to forge a linkage between two prominent measurement models to create a single general model that, at least in principle, resolves many of the problems posed by widely used but inferior valuation techniques. This new measurement framework for deriving health-state values is called the multi-attribute preference response (MAPR) model. It combines the characteristics of hypothetical health states with a respondent’s health-status characteristics to quantify both the hypothetical states and the location of the patient’s state. In theory, this new model even allows individuals to choose the attributes (i.e., health domains) describing their health states. A health measurement model with such potential flexibility is unprecedented.

The first section of the paper presents some concerns about the validity of current health-state valuation techniques, followed by a section about the basic measurement principles of subjective phenomena (such as health). The next section explains the probabilistic discrete choice model and expands on its relationship to measurement models used in economics and psychology. The subsequent section sketches the history of the Rasch model and summarizes its underlying theory. Finally, the merits of integrating these two measurement models into the MAPR model are discussed. All examples and suggestions in this article apply to health-state valuation. It should be kept in mind that because the MAPR model is very general, it can also be applied in a number of other fields where the goal is to quantify other subjective phenomena.

Existing Valuation Techniques

The standard gamble (SG) and time trade-off (TTO) are frequently used to assign values to health states. The former emerged from the field of economics, the latter from the area of operations research [3], [4]. SG, for years the gold standard, was developed under the expected utility theory of von Neumann and Morgenstern [3]. But as experience shows, assumptions underlying this theory were systematically violated by human behavior. In general, people have difficulty dealing with probabilities and may have an aversion to taking risk. As an alternative, Torrance and colleagues developed TTO, which is simpler to administer than SG. The main drawback of TTO is that the relationship between a health state, its duration, and its value is collapsed into a single measure. The problem is that this requires the values for health states to be independent of the duration of these states. Health-state values have also been derived by another technique, the visual analogue scale (VAS), which stems from the field of psychology [5]. Unfortunately, all of these conventional measurement techniques (SG, TTO, VAS) have theoretical and empirical drawbacks when used to value health states. With the possible exception of the VAS, they put a large cognitive burden on the respondents by demanding a relatively high degree of abstract reasoning [6]. The person trade-off (PTO) is another technique that has been used mainly in the area of policy making [7]. This technique was named by Nord [8], but the technique itself was applied earlier by Patrick et al. [9]. The PTO asks respondents to answer from the perspective of a social decision-maker considering alternative policy choices.

The currently dominant valuation technique for quantifying health states, certainly in the field of health economics, is the time trade-off (TTO). It may be intuitively appealing for three reasons. First, it seems to reflect the actual medical situation. Second, it shows some correspondence to the general health-outcome framework (since the TTO is essentially a QALY equivalence statement). And third, it is grounded in economic thinking (the trade-off principle). Nevertheless, compelling arguments against the TTO have been raised by several authors [10], [11], [12], [13], [14]. In fact, TTO seems to be associated with many problems: practical (difficult for people to perform), theoretical (axiomatic violations, problems in dealing with states worse than dead), and biases (time preference). From a measurement perspective, the TTO technique has been criticized for its susceptibility to framing issues (e.g., duration of the time frame, indifference procedure, states worse than dead). The same holds for the recently introduced technique known as lead-time TTO [15].

Patients versus General Population

Conventionally, values for the health states used in economic evaluations are derived from a representative community sample [16], or in the case of the DALY approach, values for disease states were derived from medical experts [17]. Besides asserting that a sample of the general population is a reflection of the average taxpayer, which is considered fair grounds for arriving at resource allocation, other arguments are put forward. For example, it has been noted that patients may adapt to their health state over a period of time. As a result, they may assign higher values to their own poor health state. Patients may also strategically underrate the quality of their health state, knowing they will directly benefit from doing so (e.g., certain patient groups may be considered as more relevant by policy makers, or effects in cost-effectiveness studies may show more favorable results). The proposition held in this paper is that while adaptation is a real phenomenon, this effect can largely be reduced and eventually eliminated if the health-state values are derived in a fitting measurement framework. Moreover, it is reasonable to assume that healthy people may be inadequately informed or lack the imagination to make an appropriate judgment on the impact of severe health states. This is one of the reasons why researchers in the field of HRQoL are engaged in a debate about which values are more valid [18], [19]. Many researchers assert that individuals are the best judges of their own health status [20]. Therefore, in a health-care context, it is sensible to defend the position that, from a validity perspective, it is the patient’s judgment that should be elicited in order to arrive at health-state values, not that of a sample of unaffected members of the general population. This explains the rise of the so-called patient-reported outcome measurement (PROMs) movement [21]. 
Voices from another area have also stressed that such assessments from patients (experienced utility) should get more attention [22], [23].

Measurement Principles

Interval Level

There are theoretical and methodological differences between the direct valuation techniques (SG, TTO, VAS) and indirect (latent) measurement models such as probabilistic discrete choice (DC; see next section). But they all assume that individuals possess implicit preferences for health states that range from good to bad. And all of the models maintain that it should be possible to reveal these preferences and express them quantitatively. Accordingly, differences between health states should reflect the increments of difference in severity of these states. For that reason, informative (i.e., metric) outcome measures should be at least at the interval level (cardinal data). This means that measures should lie on a continuous scale, whereby the differences between values would reflect true differences (e.g., if a patient’s score increases from 40 to 60, this increase is the same as from 70 to 90). To arrive at health-state values with these qualities, two other basic measurement principles should be fulfilled, namely unidimensionality and invariance.

Unidimensionality

The overall goal is to use health-state values for computational procedures (e.g., computing QALYs, Markov modeling). For that reason, informative (i.e., metric) outcome measures should be at least at the interval level. This implies positioning the values on an underlying unidimensional scale ranging from the worst health state to the best one. An (implicit) assumption made in the field of health-state valuation is that, in general, individuals evaluate health states similarly, which permits the aggregation of individual valuations to arrive at group or societal values. Specific analyses can be applied to find empirical evidence that health-state values represent a unidimensional structure. An early application of the statistical singular value decomposition routine compared TTO and VAS valuation data. The results showed a clear two-dimensional structure for the TTO [24]. Heterogeneous responses (or even distinct response structures) by individuals may indicate that the phenomenon under study (health states) is not characterized as unidimensional or that a certain valuation technique is less appropriate for the task, since it may not fulfill the need for unidimensional responses. Therefore, it is important to determine how similar individuals’ judgments (inter-rater reliability) actually are.

Invariance

Invariance is a critical prerequisite for fundamental measurement (see section: Rasch model). It means that the outcome of judgments between two (or more) health states should not depend on which group of respondents performed the assessments. The resulting judgments among health states should also be independent of the set of health states being assessed [25]. In the setting of health-state valuation the invariance principle appears to be closely related to the unidimensionality requirement. Rasch models embody the invariance principle. Their formal structure permits algebraic separation of the person and health-state parameters. Specifically, the person parameter can be eliminated during the process of statistical estimation of the health-state parameters. Not surprisingly, the invariance principle is a key characteristic of measurement in physics [25].

Discrete Choice Model

Background

Modern probabilistic discrete choice (DC) models, which come from econometrics, build upon the work of McFadden, the 2000 Nobel Prize laureate in economics [26]. DC models encompass a variety of experimental design techniques, data collection procedures, and statistical procedures that can be used to predict the choices that individuals will make between alternatives (e.g., health states). These techniques are applicable when individuals have the ability to choose between two or more distinct (‘discrete’) alternatives.

In the mid-1960s McFadden was working with a graduate student, Phoebe Cottingham, trying to analyze data on freeway routing decisions as a way to study economic decision-making behavior. He developed the first version of what he called the ‘conditional multinomial logistic model’ (also known as the multinomial logistic model and conditional logistic model). McFadden proposed an econometric model in which the utilities of alternatives depend on utilities assigned to their attributes, such as construction cost, route length, and areas of parkland and open space taken up [27]. He developed a computer program that allowed him to estimate this probabilistic model, which was based on an axiomatic theory of choice behavior developed by the mathematical psychologist Luce [28].

The DC strategy was conceived in transport economics and later disseminated into other research fields, especially marketing. There, DC modeling was applied to analyze behavior that could be observed in real market contexts. Instead of modeling the choices people actually make in empirical settings, Louviere and others started to model the choices made by individuals in carefully constructed experimental studies [29]. This entailed presenting the participants with profiles containing features of hypothetical products. Originally, these profiles were known as simulated choice situations, but later they were called discrete choice experiments (DCEs). So, instead of modeling actual choices, as McFadden had with the revealed preferences approach, Louviere modeled choices made in experimental studies with the stated preferences approach. This new approach also made it possible to predict values for alternatives that could not be judged in the real world. More recently, DC models have been used as an alternative way to derive people’s values for health states [30], [31], [32].

Measurement Model

The statistical literature classifies DC models among the probabilistic choice models that are grounded in modern measurement theory and consistent with economic theory (e.g., the random utility model) [33]. What all DC models have in common is that they can establish the relative merit of one phenomenon with respect to others. If the phenomena are characterized by specific attributes or domains with certain levels, extended DC models such as McFadden’s model would permit estimating the relative importance of the attributes and their associated levels. DC modeling has good prospects for health-state valuation [32], [34], [35], [36], [37], [38]. Moreover, DC models have a practical advantage: when conducting DCEs, health states may be evaluated in a self-completion format. The scope for valuation research is thereby widened. Most TTO protocols for deriving values for preference-based health-state instruments are interviewer-assisted, as studies have clearly shown that self-completion is not feasible or leads to inaccurate results [39]. The simplicity of DC tasks, however, facilitates web-based surveys [38].

Discrimination mechanism.

The modern measurement theory inherent in DC models builds upon the early work and basic principles of Thurstone’s Law of Comparative Judgment (LCJ) [40], [41]. In fact, the class of choice- and rank-based models, with its lengthy history (1927 to the present), is one of the few areas in the social and behavioral sciences that has a strong underlying theory. It was Thurstone who introduced the well-known random utility model (RUM), although he used different notation and other terminology. The use of Thurstone’s model based on paired comparisons to estimate health-state values was first proposed by Fanshel and Bush [2] in one of the earliest examples of a composed QALY index model.

In Thurstone’s terminology, choices are mediated by a ‘discriminal process’. He defined this as the process by which an organism identifies, distinguishes, or reacts to stimuli. Consider the theoretical distributions of the discriminal process for any two objects (paired comparisons), like two different health states s and t. In the LCJ model, the standard deviation of the distribution associated with a given health state is called the discriminal dispersion (or variance, in modern scientific language) of that health state. Discriminal dispersions may differ for different health states.

Let v_s and v_t correspond to the scale values of the two health states. The difference (v_s − v_t) is measured in units of discriminal differences. The complete form of the LCJ is the following equation:

$$v_s - v_t = z_{st}\sqrt{\sigma_s^2 + \sigma_t^2 - 2\rho_{st}\sigma_s\sigma_t} \tag{1}$$

where σ_s and σ_t denote the discriminal dispersions of the two health states s and t, ρ_st denotes the correlation between the pairs of discriminal processes s and t, and z_st is the unit normal deviate corresponding to the theoretical proportion of times health state s is judged greater than health state t. The difference is normally distributed with mean v_s − v_t and variance $\sigma_{st}^2 = \sigma_s^2 + \sigma_t^2 - 2\rho_{st}\sigma_s\sigma_t$, where σ_st is the standard deviation of the difference between two normal distributions. In its most basic form (Case V: equal dispersions and uncorrelated discriminal processes, with the resulting constant standard deviation taken as the unit of measurement) the model can be represented as $v_s - v_t = z_{st}$, for which the probability that state s is judged to be better than state t is

$$P(s \succ t) = \Phi(v_s - v_t) \tag{2}$$

where Φ is the cumulative normal distribution with mean zero and variance unity.
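As an illustration of Case V, the following minimal sketch computes the choice probability Φ(v_s − v_t) and recovers scale values from a matrix of paired-comparison proportions by averaging unit normal deviates. The function names, the three hypothetical scale values, and the clipping of proportions at 0 and 1 are assumptions made for this example; they are not part of the LCJ itself.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # cumulative normal with mean 0 and variance 1


def case_v_probability(v_s: float, v_t: float) -> float:
    """Probability that state s is judged better than state t (Case V)."""
    return STD_NORMAL.cdf(v_s - v_t)


def case_v_scale_values(proportions, eps=1e-6):
    """Estimate Case V scale values from a square matrix of proportions.

    proportions[s][t] is the share of judgments in which state s was
    preferred to state t. Each proportion is turned into a unit normal
    deviate z_st; the scale value of s is the mean of z_st over all t
    (with z_ss = 0), which centres the scale at zero.
    """
    n = len(proportions)
    values = []
    for s in range(n):
        total = 0.0
        for t in range(n):
            if s == t:
                continue  # a state is not compared with itself
            # Assumption: clip 0 and 1 so the inverse CDF stays finite.
            p = min(max(proportions[s][t], eps), 1.0 - eps)
            total += STD_NORMAL.inv_cdf(p)
        values.append(total / n)
    return values


# Hypothetical scale values for three health states (best to worst).
true_values = [1.0, 0.0, -1.0]
proportions = [[case_v_probability(a, b) for b in true_values] for a in true_values]
estimated = case_v_scale_values(proportions)
```

Because only differences between scale values are identified, the estimated values are centred at zero; with error-free proportions they reproduce the (zero-centred) values used to generate the data.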

The discrimination mechanism underlying the LCJ is an extension of the ‘just noticeable difference’ that played a major role in early psychophysical research, as initiated by Fechner (1801–1887) and Weber (1795–1878) in Germany. Later on similar discrimination mechanisms were embedded in ‘signal detection theory’, which was used by psychologists to measure the way people make decisions under conditions of uncertainty. Much of the early work in this research field was done by radar researchers [42].

Random utility model.

Thurstone proposed that perceived physical phenomena or subjective concepts (e.g., health states, treatment outcomes, process characteristics) can be expressed as follows: a respondent r has a latent value (utility) for state s, U_rs, which includes a systematic component and an error term. (This is equal to the fundamental idea of true score theory or classical test theory. The latter also consists of an observed score with two components, namely the true score and an error term. It too summarizes different health domains by combining the scores on several items.)

$$U_{rs} = v_s + \varepsilon_{rs} \tag{3}$$

Here, v is the measurable component and is not determined by characteristics of the respondents. In other words, a given health state has the same expected value across all respondents. The assumption in the model proposed by Thurstone is that ε is normally distributed. This assumption yields the probit model. The choice probability is P_rs = Pr(U_rs > U_rt, all t not equal to s), which depends on the difference in value, not on its absolute level. The fact that only differences in value matter has implications for the identification of this model and all its derivatives. In particular, it means that the only parameters that can be estimated are those that capture differences across alternatives.
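The point that only differences in value matter can be checked with a small simulation. The sketch below assumes the random utility form described above (a systematic value plus an error term) with independent normal errors of equal standard deviation for two states; under that assumption the choice probability is Φ((v_s − v_t)/(σ√2)). The values, sample size, and seed are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_choice_share(v_s, v_t, sigma=1.0, n=20000, seed=1):
    """Share of simulated respondents for whom U_rs > U_rt."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        u_s = v_s + rng.gauss(0.0, sigma)  # U_rs = v_s + error
        u_t = v_t + rng.gauss(0.0, sigma)  # U_rt = v_t + error
        wins += u_s > u_t
    return wins / n


def analytic_choice_probability(v_s, v_t, sigma=1.0):
    """Phi((v_s - v_t) / (sigma * sqrt(2))) for independent normal errors."""
    return NormalDist().cdf((v_s - v_t) / (sigma * math.sqrt(2.0)))


share = simulate_choice_share(0.8, 0.3)
shifted = simulate_choice_share(10.8, 10.3)  # same difference, other level
expected = analytic_choice_probability(0.8, 0.3)
```

Shifting both values by the same constant leaves the simulated share essentially unchanged, which is why the absolute level of the scale cannot be estimated from choice data.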

So, in Thurstone’s LCJ, the perceived value of a health state equals its objective level plus a random error. The probability that one health state is judged better than another is the probability that this alternative has the higher perceived value. When the perceived values are interpreted as levels of satisfaction, HRQoL, or utility, this can be interpreted as a model for economic choice in which utility is modeled as a random variable. This assertion was made in 1960 by the economist Marschak, who thereby introduced Thurstone’s work into economics. Marschak called his model the random utility maximization hypothesis or RUM [43], [44]. Like neoclassical economic theory, the RUM assumes that the decision-maker has a perfect discrimination capability. But it also assumes that the analyst has incomplete information, which implies that uncertainty (i.e., randomness) must be taken into account.

Multinomial model.

Another way to analyze comparative data is with the Bradley-Terry-Luce (BTL) model, which was statistically formulated by Bradley and Terry in 1955 [45] and extended by Luce in 1959 [28]. (Later it was recognized that the German mathematician Ernst Zermelo had already published about a probabilistic paired comparison model [46].) The BTL model extends the Thurstone model by allowing a person to choose among more than two options. It postulates that measurement on a ratio scale level can be established if the data satisfy certain structural assumptions [47]. For mathematical reasons the BTL model is based on the simple logistic function instead of the normal distribution of the Thurstone model. It is this mathematical model that McFadden used to develop and construct his own specific type of multinomial logit model. If only pairs of alternatives are judged, the BTL model is nearly identical to Thurstone’s model. However, when more than two alternatives are judged, an important mathematical assumption must be made, namely the independence of irrelevant alternatives (see below).
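How close the logistic (BTL) and normal (Thurstone) formulations are for paired comparisons can be illustrated numerically. The sketch below writes the BTL choice probability as exp(v_s) divided by the sum of exp(v_k) over the choice set, which is one common parameterization and an assumption here, and compares the logistic curve with the unit normal curve after rescaling by the conventional constant 1.702.

```python
import math
from statistics import NormalDist


def btl_probabilities(values):
    """Choice probabilities exp(v_s) / sum_k exp(v_k) for a choice set."""
    peak = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def btl_pair_probability(v_s, v_t):
    """Paired-comparison case: logistic function of v_s - v_t."""
    return 1.0 / (1.0 + math.exp(-(v_s - v_t)))


# Compare the logistic curve with the normal curve on a grid of differences.
# Assumption: 1.702 is the usual constant that rescales the logistic
# argument so that it lines up with the unit normal distribution.
SCALE = 1.702
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(
    abs(NormalDist().cdf(d) - btl_pair_probability(SCALE * d, 0.0)) for d in grid
)
```

On this grid the two curves never differ by as much as one percentage point, so for paired comparisons the choice between the two distributions is largely a matter of mathematical convenience.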

Drawing upon the work of Thurstone, Luce, Marschak, and Lancaster [48], McFadden was able to show how his model fit in with the economic theory of choice behavior. McFadden then investigated further the RUM foundations of the conditional multinomial logistic model. He showed that the Luce model was consistent with the RUM model with IID (independent and identically distributed random variables) additive disturbances if and only if these disturbances had a distribution called extreme value type I. More importantly, instead of one function, as in the classical Thurstone model (only values for health states can be estimated), the conditional multinomial logistic model comprises two functions. First, it contains a statistical model that describes the probability of ranking a particular health state higher than another, given the (unobserved) value associated with each health state. Secondly, it contains a valuation function that relates the value for a given health state to a set of explanatory variables (it will be shown that the same holds for the MAPR model).

Assumptions.

Multinomial logistic regression (MNL) is based on three assumptions: (i) independence of irrelevant alternatives (IIA); (ii) error terms are independent and identically distributed across observations (IID); and (iii) no taste heterogeneity (i.e., homogeneous preferences across respondents). Luce’s choice axiom states that the probability of selecting one item over another from a pool of many items is not affected by the presence or absence of other items in the pool (IIA assumption). The axiom states that if A is preferred to B out of the choice set {A, B}, then introducing a third, irrelevant, alternative X (thus expanding the choice set to {A, B, X}) should not make B preferred to A. In other words, whether A or B is better should not be changed by the availability of X. The IIA axiom simplifies experimental collection of choice data by allowing multinomial choice probabilities to be inferred from binomial choice experiments. It is clear that assumptions i and iii bear some relation to the invariance principle from measurement theory.
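The IIA property and its practical consequence can be sketched in a few lines. The example assumes Luce-type choice probabilities proportional to exp(v); the three latent values and all function names are hypothetical. It shows that the odds of A against B are the same with and without X in the choice set, and that a multinomial choice probability can be inferred from binary choice probabilities alone.

```python
import math


def luce_probabilities(values):
    """Luce choice probabilities for a dict {alternative: latent value}."""
    weights = {name: math.exp(v) for name, v in values.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def multinomial_from_pairs(pair_prob, alternatives, target):
    """Infer P(target | full set) from binary choice probabilities only.

    pair_prob[(a, b)] is the probability that a is chosen from {a, b}.
    Under IIA the odds of b against target are the same in every choice
    set, so the pairwise odds can simply be summed.
    """
    odds = sum(
        pair_prob[(other, target)] / pair_prob[(target, other)]
        for other in alternatives
        if other != target
    )
    return 1.0 / (1.0 + odds)


# Hypothetical latent values for three health states.
values = {"A": 1.2, "B": 0.4, "X": -0.3}
pair = luce_probabilities({"A": values["A"], "B": values["B"]})
full = luce_probabilities(values)
ratio_pair = pair["A"] / pair["B"]
ratio_full = full["A"] / full["B"]  # unchanged by adding X

names = list(values)
pair_prob = {
    (a, b): luce_probabilities({a: values[a], b: values[b]})[a]
    for a in names
    for b in names
    if a != b
}
inferred_a = multinomial_from_pairs(pair_prob, names, "A")
```

If real choice data violate IIA, the two ratios would differ and probabilities inferred from pairs would no longer match the observed multinomial shares.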

Mathematics

In conditional logistic regression, none, some, or all of the observations in a choice set may be marked. McFadden’s choice model (discrete choice) is thus a special case of multinomial logistic regression. In the conditional logit (CL) model, the explanatory variables assume different values for each alternative and the impact of level changes is assumed to be constant across alternatives. The model may be summarized as shown below (Formula 4):

$$v_{rs} = z_{rs}\gamma \tag{4}$$

whereby v_rs are the latent values or utilities of individuals r choosing health state s, z_rs indicates a vector of alternative-specific explanatory variables for individual r, and γ represents a single vector of unknown regression coefficients. Under the assumptions described above, the probability that health state s is chosen is equal to:

$$P_{rs} = \frac{\exp(v_{rs})}{\sum_{k=1}^{K}\exp(v_{rk})} \tag{5a}$$

or,

$$P_{rs} = \frac{\exp(z_{rs}\gamma)}{\sum_{k=1}^{K}\exp(z_{rk}\gamma)} \tag{5b}$$

where K (one k has to be set as reference) is the number of alternatives (e.g., health states) in the choice set (e.g., 2 in most DC applications) and s is the chosen alternative.
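The conditional logit probability can be computed directly once the attribute vectors and coefficients are given. The following is a minimal sketch, not an estimation routine: the attribute coding (problem levels on mobility, pain, and mood) and the coefficient values are invented for illustration.

```python
import math


def conditional_logit_probabilities(z_r, gamma):
    """Choice probabilities for one respondent in one choice set.

    z_r   : list of attribute vectors, one per alternative (z_rs)
    gamma : single coefficient vector shared by all alternatives
    """
    values = [sum(z * g for z, g in zip(z_rs, gamma)) for z_rs in z_r]  # v_rs
    peak = max(values)  # stabilises the exponentials
    weights = [math.exp(v - peak) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical choice set: two health states described by problem levels
# (0 = none, 1 = some, 2 = severe) on mobility, pain and mood.
state_s = [1, 0, 1]
state_t = [0, 2, 0]
gamma = [-0.6, -0.9, -0.4]  # made-up coefficients; higher level = worse

probs = conditional_logit_probabilities([state_s, state_t], gamma)
```

With two alternatives the result reduces to a logistic function of the difference in latent values, here v_rs − v_rt = −1.0 − (−1.8) = 0.8.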

The term multinomial logit (MNL) model refers to a model that generalizes logistic regression by allowing more than two discrete outcomes. It assumes that data are case-specific; that is, each independent variable has a single value for each case. Consider an individual choosing among K alternatives in a choice set. Let x_r represent the characteristics of individual r and β_s the regression parameters.

$$v_{rs} = x_r\beta_s \tag{6}$$

The probability that individual r chooses health state s is:

$$P_{rs} = \frac{\exp(x_r\beta_s)}{\sum_{k=1}^{K}\exp(x_r\beta_k)} \tag{7}$$
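A short sketch of the MNL probability makes the role of the reference alternative concrete. The respondent characteristics and coefficient vectors below are hypothetical; the example also checks that adding the same vector to every β_s leaves all probabilities unchanged, which is the identification issue that a reference alternative resolves.

```python
import math


def multinomial_logit_probabilities(x_r, betas):
    """Choice probabilities for individual r.

    x_r   : characteristics of the individual (the same for every alternative)
    betas : one coefficient vector per alternative (beta_s)
    """
    values = [sum(x * b for x, b in zip(x_r, beta_s)) for beta_s in betas]
    peak = max(values)  # stabilises the exponentials
    weights = [math.exp(v - peak) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical respondent: intercept term, age in decades, chronic illness flag.
x_r = [1.0, 5.5, 1.0]
# Made-up coefficients for three alternatives; the first is the reference
# alternative, whose coefficients are fixed at zero for identification.
betas = [
    [0.0, 0.0, 0.0],
    [0.4, -0.1, 0.3],
    [-0.2, 0.05, -0.5],
]
probs = multinomial_logit_probabilities(x_r, betas)

# Adding the same vector to every beta_s leaves the probabilities unchanged,
# which is why one alternative has to serve as the reference.
shift = [0.7, -0.2, 1.1]
shifted = [[b + c for b, c in zip(beta_s, shift)] for beta_s in betas]
probs_shifted = multinomial_logit_probabilities(x_r, shifted)
```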

Both models can be used to analyze an individual’s choice among a set of K alternatives. The main difference between the two is that the conventional MNL model focuses on the individual as the unit of analysis and takes the individual’s characteristics as explanatory variables. The CL model, in contrast, focuses on the set of alternatives for each individual, while the explanatory variables are characteristics of those alternatives.

It is possible to combine these two models. Doing so would simultaneously take into account the characteristics of both the alternatives and the individuals, using them as explanatory variables. This combination is sometimes called a conditional MNL or mixed model:

$$v_{rs} = x_r\beta_s + z_{rs}\gamma \tag{8}$$

where v_rs is the value of alternative s assigned by individual r. That value (v_rs) depends on both the characteristics of the alternative, z_rs, and the characteristics of the individual, x_r. The probability that individual r chooses health state s is:

$$P_{rs} = \frac{\exp(x_r\beta_s + z_{rs}\gamma)}{\sum_{k=1}^{K}\exp(x_r\beta_k + z_{rk}\gamma)} \tag{9}$$

The most commonly applied types of DC models are presented above. A clear distinction is made between models that take an individual’s characteristics as explanatory variables (MNL) and models with explanatory variables for characteristics of alternatives (i.e., health states). In the next section the Rasch model will be explained. It will be shown that this model has a close similarity to the CL model (Equation 5). As the basic data structure underlying the Rasch measurement model should meet the invariance assumption (see ‘measurement principles’), this rules out incorporating elements of the MNL model (Equations 6–9).