Suboptimal Choice in Pigeons: Stimulus Value Predicts Choice over Frequencies

Aaron P. Smith; Alexandria R. Bailey; Jonathan J. Chow; Joshua S. Beckmann; Thomas R. Zentall

doi:10.1371/journal.pone.0159336

Abstract

Pigeons have shown suboptimal, gambling-like behavior by preferring a stimulus that infrequently signals reliable reinforcement over alternatives that provide more reinforcement overall. As a mechanism for this behavior, recent research has proposed that alternatives whose terminal link stimuli more reliably signal reinforcement (i.e., that have greater stimulus value) will be preferred relatively independently of how frequently those stimuli occur. The present study tested this hypothesis with a simplified design in which a Discriminative alternative led, with equal probability, either to a signal for 100% reinforcement or to a blackout period indicative of 0% reinforcement, whereas a Nondiscriminative alternative always led to a signal that predicted 50% reinforcement. Pigeons showed a strong preference for the Discriminative alternative that persisted when the frequency of the signal for reinforcement was reduced in subsequent phases to 25% and then to 12.5%. In Experiment 2, using the original design of Experiment 1, the probability of reinforcement signaled by the stimulus following choice of the Nondiscriminative alternative was increased to 75% and then to 100%. Preference for the Discriminative alternative decreased only when the signals for reinforcement for the two alternatives predicted the same probability of reinforcement. The ability of several models to predict this behavior is discussed, but terminal link stimulus value offers the most parsimonious account of this suboptimal behavior.

Citation: Smith AP, Bailey AR, Chow JJ, Beckmann JS, Zentall TR (2016) Suboptimal Choice in Pigeons: Stimulus Value Predicts Choice over Frequencies. PLoS ONE 11(7): e0159336. https://doi.org/10.1371/journal.pone.0159336

Editor: Jorge Mpodozis, Universidad de Chile, CHILE

Received: March 18, 2016; Accepted: June 30, 2016; Published: July 21, 2016

Copyright: © 2016 Smith et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: Alexandria R. Bailey was supported by an Oswald Research and Creativity Award.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Suboptimal choice, in which animals choose an alternative that leads to less reinforcement than another, has gained attention in part because such counterintuitive results are inconsistent with optimal foraging theory [1]. This theory, which emphasizes an evolutionary basis of behavior, suggests that animals will prefer alternatives that maximize energy intake, or reinforcement, while minimizing effort. However, various choice procedures have demonstrated conditions in which animals prefer alternatives that provide as little as 10% of the reinforcement provided by the optimal alternative [2–7]. Further, this suboptimal behavior may resemble, and serve as a model for, human gambling [8–10], in which humans regularly engage in behavior that typically results in a net loss of resources [11].

The design of one such experiment is illustrated in Fig 1. In this procedure, pigeons choose between two initial link stimuli. Choice of the left initial link stimulus leads, 20% of the time, to a red stimulus for 10 s that always signals reinforcement and, 80% of the time, to a blue stimulus that signals the absence of reinforcement. Thus, the left initial link predicts 20% reinforcement overall. Alternatively, choice of the right initial link stimulus always leads to a stimulus (a green stimulus 20% of the time and a yellow stimulus 80% of the time) that predicts reinforcement 50% of the time. Thus, even though the right alternative provides 2.5 times as much reinforcement (50% vs. 20%), Stagner and Zentall [6] found that pigeons show a very strong preference for the left, suboptimal alternative. We refer to this procedure as the 4-stimulus design because it uses four terminal link stimuli. Using this and other similar 4-stimulus procedures [2, 12, 13], large and consistent preferences for the suboptimal alternative have been found.
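
The overall reinforcement figures for the two alternatives in Fig 1 follow from weighting each terminal link stimulus by how often it appears and how reliably it is followed by food. The short sketch below checks that arithmetic with exact fractions; the helper name `overall_reinforcement` is ours and is used only for illustration.

```python
from fractions import Fraction

def overall_reinforcement(branches):
    """Expected probability of food for one initial link alternative.

    `branches` lists (probability the terminal link stimulus appears,
    probability that stimulus is followed by food) pairs.
    """
    return sum(p_stim * p_food for p_stim, p_food in branches)

# Left (suboptimal) alternative: red S+ on 20% of trials (always food),
# blue S- on 80% of trials (never food).
left = overall_reinforcement([(Fraction(1, 5), Fraction(1)), (Fraction(4, 5), Fraction(0))])

# Right (optimal) alternative: green (20%) or yellow (80%), each followed by food 50% of the time.
right = overall_reinforcement([(Fraction(1, 5), Fraction(1, 2)), (Fraction(4, 5), Fraction(1, 2))])

print(left, right, right / left)  # 1/5 1/2 5/2
```

The ratio of 5/2 is the 2.5-fold advantage of the right alternative; the model says nothing about preference, only about how much food each alternative delivers.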

Fig 1. Design of the 4-stimulus suboptimal choice experiment using a spatial discrimination between two white keys.

The left alternative shows possible outcomes for choosing the Discriminative alternative while the right alternative shows the Nondiscriminative outcome.

https://doi.org/10.1371/journal.pone.0159336.g001

It appears from this research that pigeons prefer the alternative whose terminal link stimulus best predicts reinforcement (100%), even though that stimulus occurs only 20% of the time. This finding suggests that the signal for reward omission (the S-), which occurs 80% of the time, has little conditioned inhibitory effect. Indeed, Laude, Stagner, and Zentall [14], using a compound cue test that combined the signal for reinforcement with the signal for the absence of reinforcement, found that the S- showed some conditioned inhibitory strength early in training but that this strength weakened with additional training. Additionally, neither using an unavoidable diffuse houselight as the S- signal [15] nor varying the amount of time spent in the presence of the S- signal [16] had much effect on suboptimal preference. Thus, the signal for nonreinforcement appears to have little effect on choice, which strengthens the conclusion that the primary determinant of initial link choice is the predictive value of the conditioned reinforcers [12].

Early research that first described suboptimal choice [3, 5, 17–19] could be interpreted as consistent with this conclusion. Those experiments used a design similar to that shown in Fig 2 (denoted the 3-stimulus design), in which the optimal (right) alternative led to a stimulus that always predicted reinforcement, whereas the suboptimal (left) alternative led, with equal probability, either to a stimulus that was always followed by reinforcement or to a different stimulus that was never followed by reinforcement. With this procedure, however, suboptimal preferences were typically inconsistent, with what often appeared to be large individual differences. If the value of the terminal link conditioned reinforcer determines initial link preference, the pigeons in this earlier research should have been indifferent between the two alternatives, because both alternatives produced a conditioned reinforcer that perfectly predicted reinforcement.

Fig 2. Design of the 3-stimulus suboptimal choice experiment using a spatial discrimination between two white keys.

The left alternative shows possible outcomes for choosing the Discriminative alternative while the right alternative shows the Nondiscriminative outcome.

https://doi.org/10.1371/journal.pone.0159336.g002

Smith and Zentall [20] concluded that these inconsistent results may have arisen because the optimal and suboptimal alternatives were signaled by their spatial location alone. The problem with a spatial initial link discrimination is that when pigeons are indifferent between two alternatives, they often revert to spatial biases, which may give the impression of a strong preference for one alternative. To test this hypothesis, Smith and Zentall [20] used the 3-stimulus design illustrated in Fig 2 but signaled the optimal and suboptimal alternatives with visually distinct initial link stimuli that changed locations randomly from trial to trial. Under these conditions, if the pigeons were indifferent between the two alternatives and developed a spatial bias, initial link preference would be near 50% because the alternatives changed locations. Consistent with their hypothesis, the pigeons showed indifference between the initial link alternatives. Nor was this indifference the result of a failure to discriminate between the initial link cues, because further manipulations of the values of the conditioned reinforcers continued to predict initial link preferences accurately.
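
The logic of randomizing initial link locations can be illustrated with a small simulation. The sketch below models a hypothetical pigeon that ignores the initial link stimuli entirely and pecks the left key on 90% of trials; the bias value, trial count, and function name are illustrative assumptions, not values from any of the cited studies. With a fixed (spatial) assignment, the side bias masquerades as a strong preference, whereas with random locations it collapses toward 50%.

```python
import random

def simulate_choices(n_trials, p_left_bias, randomize_locations, seed=0):
    """Proportion of free choice trials on which a hypothetical, indifferent
    pigeon with a side bias 'chooses' the Discriminative alternative.

    The simulated bird ignores the stimuli and pecks the left key with
    probability `p_left_bias`.
    """
    rng = random.Random(seed)
    chose_discriminative = 0
    for _ in range(n_trials):
        if randomize_locations:
            discriminative_on_left = rng.random() < 0.5  # visual design: location varies
        else:
            discriminative_on_left = True  # spatial design: fixed location
        pecked_left = rng.random() < p_left_bias
        if pecked_left == discriminative_on_left:
            chose_discriminative += 1
    return chose_discriminative / n_trials

fixed = simulate_choices(10_000, p_left_bias=0.9, randomize_locations=False)
randomized = simulate_choices(10_000, p_left_bias=0.9, randomize_locations=True)
print(f"fixed locations: {fixed:.2f}, randomized locations: {randomized:.2f}")
```

Under randomization the expected proportion is 0.5 × 0.9 + 0.5 × 0.1 = 0.5 regardless of the size of the bias, which is why indifference shows up as chance-level choice.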

Although other factors may contribute to suboptimal choice (e.g., [3, 17, 18]), the results reported by Smith and Zentall [20] indicate that the conditioned reinforcement value of the terminal links plays a critical role in choice, a view they termed the stimulus value hypothesis [8, 12, 20]. Mazur reached a similar conclusion (e.g., [3, 21]); however, that research is difficult to interpret because, as described above, it used initial link spatial discriminations. A further, paradoxical prediction of the stimulus value hypothesis is that the relative frequency of the predictive stimulus should play little role in preference for the alternative that produces it. This prediction was supported by recent research with starlings, in which decreasing the frequency of the S+ stimulus produced little decline in initial link preference [7], but it has yet to be demonstrated in pigeons.

Thus, the purpose of the present experiments was to extend the research reported by Smith and Zentall [20] by systematically varying the frequency of the conditioned reinforcers. Additionally, because the S- signal has been found to have little effect on initial link choice [14–16], the present study used a simplified 2-stimulus design involving only two terminal link stimuli, one following each initial link alternative (see Fig 3; see also [3]). In Experiment 1, the reinforcement rates associated with the two alternatives were initially equal: choice of the Discriminative alternative led, with equal probability, either to a 10-s signal for reinforcement or to a 10-s blackout, whereas choice of the Nondiscriminative alternative always led to a terminal link stimulus that signaled reinforcement 50% of the time. If the stimulus value hypothesis is correct, these conditions should result in a preference for the Discriminative alternative. Then, in subsequent phases, reducing the frequency of the Discriminative alternative’s predictive stimulus (and thus its reinforcement rate) should not reduce this preference, even when that alternative becomes suboptimal.
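
The contrast between the two accounts across the three phases of Experiment 1 can be made explicit. The stimulus value hypothesis, as described here, is ordinal rather than quantitative, so the sketch below only compares values: "overall rate" stands in for a simple reinforcement-maximizing account, and "stimulus value" compares how reliably each terminal link stimulus predicts food. The class and function names are hypothetical and the code is not a model proposed by the authors.

```python
from dataclasses import dataclass

@dataclass
class Alternative:
    p_signal: float             # probability the terminal link stimulus appears
    p_food_given_signal: float  # how reliably that stimulus predicts food

    @property
    def overall_rate(self) -> float:
        return self.p_signal * self.p_food_given_signal

def predicted_preference(disc, nondisc, rule):
    """Ordinal prediction of which alternative a rule favors."""
    if rule == "overall_rate":
        a, b = disc.overall_rate, nondisc.overall_rate
    elif rule == "stimulus_value":
        a, b = disc.p_food_given_signal, nondisc.p_food_given_signal
    else:
        raise ValueError(f"unknown rule: {rule}")
    if a > b:
        return "Discriminative"
    if a < b:
        return "Nondiscriminative"
    return "indifferent"

# The nonpredictive stimulus follows every Nondiscriminative choice.
nondiscriminative = Alternative(p_signal=1.0, p_food_given_signal=0.5)
for phase, p in [(1, 0.5), (2, 0.25), (3, 0.125)]:
    discriminative = Alternative(p_signal=p, p_food_given_signal=1.0)
    print(phase,
          predicted_preference(discriminative, nondiscriminative, "overall_rate"),
          predicted_preference(discriminative, nondiscriminative, "stimulus_value"))
```

The overall rate account predicts indifference in Phase 1 and a Nondiscriminative preference in Phases 2 and 3, whereas the stimulus value comparison favors the Discriminative alternative throughout.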

Fig 3. Design of the 2-stimulus suboptimal choice experiment using a visual discrimination.

The left alternative shows possible outcomes for choosing the Discriminative alternative while the right alternative shows the Nondiscriminative outcome.

https://doi.org/10.1371/journal.pone.0159336.g003

Experiment 1

Method

Subjects.

Ten pigeons (five Homing pigeons and five White Carneau), approximately 8–12 years old and originally purchased from the Palmetto Pigeon Plant (Sumter, SC), were used in the experiment; all had previous experience with probabilistic choice tasks. Subjects were housed in individual cages measuring 28 × 38 × 30.5 cm and maintained at 80%–85% of their free-feeding weight, with free access to grit and water, on a 12:12 light-dark cycle (lights off at 7 pm).

Ethics statement.

All research was approved by the University of Kentucky Institutional Animal Care and Use Committee (Protocol 01029L2006).

Apparatus.

The experiment was conducted in a standard LVE/BRS (Laurel, MD) chamber measuring 56 × 42 × 37 cm. The pigeons responded to a panel with three square response keys, each 2.6 cm across and spaced 1.5 cm apart, located approximately 24 cm above the floor. The center key was not used in these experiments. A 12-stimulus inline projector (Industrial Electronics Engineering, Van Nuys, CA) behind each key projected one of four stimuli (red, green, plus on a dark background, or circle on a dark background) onto the response keys. A center-mounted feeder, located 9 cm beneath the keys, was illuminated by a V 0.04-A lamp when raised and allowed access to mixed grain. White noise was generated from outside the chamber, and a computer in an adjacent room controlled the experiment using Med-PC IV [22] with a 10-ms resolution.

Procedure.

Subjects were first trained with an autoshaping procedure. On each trial, one of the four stimuli was presented on the left or right response key and, after either 30 s or a response to the key, the stimulus was turned off and the feeder was raised for 2 s. Sessions consisted of 60 trials (15 reinforcements per stimulus, counterbalanced across locations). Training continued until the pigeons pecked on 95% of trials in two consecutive sessions.

The procedure (see Fig 3) consisted of forced and free choice trials separated by a 10-s intertrial interval. On free choice trials, the initial link alternatives (a plus and a circle on a dark background) were presented concurrently, randomly assigned to the two side keys (counterbalanced for spatial location across trials). A response to either initial link stimulus extinguished both stimuli. Choice of the Discriminative alternative resulted, with equal probability (50%), either in the illumination of the predictive terminal link stimulus (red or green) for 10 s, which was always followed by reinforcement, or in a 10-s blackout period (during which the chamber was dark). Choice of the Nondiscriminative alternative was always followed by the nonpredictive terminal link stimulus for 10 s, which was followed by reinforcement only 50% of the time. Thus, each alternative provided reinforcement on 50% of trials, but the alternatives differed in the terminal link stimulus that signaled that reinforcement. We refer to the stimulus following choice of the Discriminative alternative as the predictive stimulus and to the stimulus following choice of the Nondiscriminative alternative as the nonpredictive stimulus. Forced choice trials were identical to free choice trials except that only one initial link alternative appeared, randomly on either side key, forcing the subject to experience the contingencies associated with that alternative. Sessions consisted of 72 trials (24 free and 48 forced), with initial and terminal link stimuli counterbalanced across subjects. Training continued for at least 15 sessions and until there was no visible or statistically significant trend in choice, a trend being defined as a slope significantly different from zero for a line fit through the last five sessions; this criterion resulted in 25 sessions of training.
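
The statistical part of the stability criterion can be sketched as a least-squares trend test over the last five sessions. The paper does not say which test was used for the slope, so this sketch assumes an ordinary least-squares fit with a two-tailed t test at α = .05 (critical t = 3.182 for 3 degrees of freedom); it cannot capture the visual-inspection part of the criterion, and the helper names `slope_and_t` and `is_stable` are hypothetical.

```python
from statistics import mean

WINDOW = 5          # sessions used for the trend test
T_CRIT_DF3 = 3.182  # two-tailed critical t, alpha = .05, df = WINDOW - 2 = 3

def slope_and_t(values):
    """OLS slope of `values` regressed on session number, and the slope's t statistic."""
    n = len(values)
    xs = range(1, n + 1)
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, values))
    if rss == 0:  # perfectly linear data: any nonzero slope counts as a trend
        return slope, (float("inf") if slope != 0 else 0.0)
    se = (rss / (n - 2) / sxx) ** 0.5
    return slope, slope / se

def is_stable(session_choice, min_sessions=15):
    """Hypothetical stability check: enough sessions and no significant linear
    trend in percentage choice over the last WINDOW sessions."""
    if len(session_choice) < min_sessions:
        return False
    _, t = slope_and_t(session_choice[-WINDOW:])
    return abs(t) < T_CRIT_DF3
```

For instance, `is_stable([80, 82, 79, 81, 80] * 3)` returns True, whereas a run that ends with steadily rising values such as 60, 64, 67, 72, 75 does not.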

In Phase 2, the probability that the predictive stimulus appeared was reduced from 50% to 25%. Training again continued until stability was reached (25 sessions). In Phase 3, this probability was further reduced to 12.5%, and training continued to stability (16 sessions). To ensure proper counterbalancing between stimuli and spatial locations, the number of trials per session in Phase 3 was reduced to 56 (24 free and 32 forced). The probability of reinforcement associated with the Nondiscriminative alternative remained at 50%, so the choices were between 25% and 50% reinforcement in Phase 2 and between 12.5% and 50% reinforcement in Phase 3.
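
The text does not spell out the counterbalancing scheme, but one plausible reading of why Phase 3 needed fewer forced trials is whole-number arithmetic. The sketch below assumes, purely for illustration, that forced trials were split evenly across the two alternatives and the two side keys, and asks how many forced Discriminative trials per side would show the predictive stimulus; free choice trials are not modeled, and the function name is hypothetical.

```python
from fractions import Fraction

def predictive_trials_per_cell(n_forced, p_predictive, n_alternatives=2, n_sides=2):
    """Expected forced Discriminative trials per side key that show the predictive
    stimulus, assuming forced trials are split evenly across alternatives and sides."""
    trials_per_cell = Fraction(n_forced, n_alternatives * n_sides)
    return trials_per_cell * Fraction(p_predictive)

for label, n_forced, p in [("Phase 1", 48, "1/2"), ("Phase 2", 48, "1/4"),
                           ("Phase 3, 48 forced", 48, "1/8"), ("Phase 3, 32 forced", 32, "1/8")]:
    count = predictive_trials_per_cell(n_forced, p)
    print(label, count, "whole number" if count.denominator == 1 else "not a whole number")
```

Under this assumption, 48 forced trials would require 1.5 predictive-stimulus trials per side at 12.5%, whereas 32 forced trials give exactly one.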

Data analysis.

Choice data were examined using linear mixed effects models in JMP over all sessions of training, with subject, phase, and session as factors. Subject was treated as a nominal random factor, phase as a nominal fixed factor, and session as a continuous factor. Latencies to choose and numbers of pecks to the terminal link stimuli during forced choice trials were also analyzed using linear mixed effects models applied to the averages of the last five sessions of training, with subject, phase, and trial type (Discriminative or Nondiscriminative) as factors. Subject was again treated as a random nominal factor, phase as a fixed continuous factor, and trial type as a fixed nominal factor. Reported means and standard errors represent averages over the last five sessions of training in each phase.

Results

Choice data.

In Phase 1 (50% vs. 50%), in which reinforcement was equated between the alternatives, the pigeons showed a strong preference for the Discriminative alternative (M = 88.58; SEM = 5.14; see Fig 4). In Phase 2, when the reinforcement associated with the Discriminative alternative was reduced to 25% (25% vs. 50%), a response key that malfunctioned for several sessions produced a temporary decrease in preference for the Discriminative alternative. Once the key was repaired, preference returned to its previous level (M = 87.17; SEM = 4.37). Thus, the decrease in reinforcement rate appeared to have little effect on initial link preference for the majority of pigeons. In Phase 3, when the reinforcement rate for the Discriminative alternative was again halved, to 12.5%, there appeared to be a small decrease in preference for the Discriminative alternative by the last five sessions of training (M = 79.83; SEM = 5.37). To quantify these apparent trends, we fit a linear mixed effects model, which revealed a significant effect of session, F(1, 9.58) = 14.07, p = .004, and a significant Phase × Session interaction, F(2, 17.7) = 20.06, p < .001. Post-hoc analyses indicated a significantly increasing slope in choice of the Discriminative alternative in Phase 1, p < .001, but slopes not significantly different from zero in Phases 2 and 3, p ≥ .209. Thus, although there were individual differences (see Stimulus Value section below), the pigeons preferred the Discriminative alternative when the reinforcement rates of the two alternatives were equal, and they showed no significant reduction in preference even when its reinforcement rate was reduced by 75% in Phase 3. Table 1 shows the average food earned from each alternative across the last five sessions of each phase.
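
To give a sense of what the Phase 3 preference costs, the sketch below computes the expected (not observed) number of reinforcers per session from the Phase 3 contingencies and the mean preference reported above. It assumes forced trials were split evenly between the two alternatives and that free choices followed the mean preference; the observed values are those in Table 1, which this calculation does not reproduce. The function name is hypothetical.

```python
def expected_food_per_session(n_free, n_forced, pref_disc, p_food_disc, p_food_nondisc):
    """Expected reinforcers per session, assuming forced trials are split evenly
    between alternatives and free choices follow the proportion `pref_disc`."""
    forced = (n_forced / 2) * (p_food_disc + p_food_nondisc)
    free = n_free * (pref_disc * p_food_disc + (1 - pref_disc) * p_food_nondisc)
    return forced + free

# Phase 3 contingencies: 24 free + 32 forced trials, 12.5% vs. 50% reinforcement.
observed_pref = 0.7983  # mean preference over the last five Phase 3 sessions
actual = expected_food_per_session(24, 32, observed_pref, 0.125, 0.5)
optimal = expected_food_per_session(24, 32, 0.0, 0.125, 0.5)
print(f"expected: {actual:.1f}, if always Nondiscriminative: {optimal:.1f}")
# expected: 14.8, if always Nondiscriminative: 22.0
```

Because expected food is linear in preference, using the group mean preference gives the group mean of the expected values.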

Fig 4. Experiment 1: Mean percentage choice (± SEM) of the Discriminative alternative as a function of session for Phases 1–3.

Vertical lines indicate phase changes, the horizontal line indicates indifference between the two alternatives (50%), and the solid horizontal line indicates sessions in which the left response key intermittently malfunctioned.

https://doi.org/10.1371/journal.pone.0159336.g004

Table 1. Average food reinforcers earned from each alternative across the last five sessions of Phases 1–3 in Experiment 1.

https://doi.org/10.1371/journal.pone.0159336.t001

Latency & response rate data.

Overall, latencies to choose the Discriminative alternative were shorter than latencies to choose the Nondiscriminative alternative (see Fig 5). Additionally, across phases, latencies to choose the Discriminative alternative appeared to increase as its reinforcement rate decreased. The linear mixed effects model, however, revealed only a significant effect of trial type, F(1, 9) = 134.61, p < .001, indicating that latencies to choose the Discriminative alternative were significantly shorter; the Trial Type × Phase interaction did not reach statistical significance, p = .053. Response rates to the terminal link stimuli during forced choice trials (see Fig 5) were quite similar for the two alternatives. There appeared to be a small increase in response rates to the Discriminative alternative’s predictive stimulus; however, the linear mixed effects model revealed no significant effect of trial type, p = .052.

Fig 5. Experiment 1: Mean log-transformed latencies (± SEM) to choose (left panel) and terminal link responses (right panel) for Phases 1–3.

https://doi.org/10.1371/journal.pone.0159336.g005

Discussion

The purpose of Experiment 1 was to further test the stimulus value hypothesis by systematically reducing the frequency of the Discriminative alternative’s predictive stimulus (and thus its reinforcement rate) relative to an alternative that had a less predictive terminal link stimulus but, in Phases 2 and 3, provided more food overall. If the stimulus value hypothesis is correct, the Discriminative alternative should be preferred initially and should remain preferred despite a reduction in its reinforcement rate. In support of this prediction, the Discriminative alternative was highly preferred in Phase 1, and when that alternative was devalued by reducing the amount of primary reinforcement associated with it, there was no significant change in preference for it in Phases 2 and 3. Further, even though a key malfunction occurred in Phase 2 (see Fig 4, 25% vs. 50%), once the key was repaired, preference for the Discriminative alternative returned to its Phase 1 level. These findings are consistent with the results found with starlings [7], the general conclusions of Mazur [3, 21], and the predictions made by Smith and Zentall [20]. Additionally, these results suggest that omitting the signal for nonreinforcement following choice of the Discriminative alternative does not change preference for the suboptimal alternative relative to studies that used a signal (S-) for the absence of reinforcement (e.g., [6]). Finally, one might argue that carryover effects from Phase 1 maintained the preferences found in Phases 2 and 3. However, previous research has found similar effects when the probability of the predictive stimulus was only 20% from the outset of training (e.g., [6]), suggesting that carryover effects are not responsible for these results.

The shorter latencies to choose the Discriminative alternative are also consistent with the greater preference for it. Although only qualitative, this finding supports sequential choice theories, which posit that differential choice latencies are indicative of choice preferences [7, 23]. Also of interest, as the Discriminative alternative was devalued in Phases 2 and 3, latencies to choose it began to rise, whereas latencies to choose the Nondiscriminative alternative remained relatively stable. Although the interaction was not statistically significant, the finding that devaluing an alternative selectively affected only latencies to that alternative is consistent with sequential choice theories. Future research using response latencies to the alternatives might reduce the variance of those latencies by requiring an orienting response to the center response key before the initial choice alternatives are presented. Response rates to the terminal link stimuli, however, did not appear to reflect choice preferences, despite a tendency toward increased pecking at the Discriminative alternative’s terminal link stimulus. One might posit that, because of its reduced frequency, the appearance of the Discriminative alternative’s clear signal for reinforcement would elicit greater behavioral activation, as has been suggested elsewhere [24]; however, the present study found no evidence of this (perhaps because of a ceiling effect). Consistent with previous accounts, the present results suggest that response rates are not always indicative of an alternative’s value (see [25] for a discussion).

In Experiment 1, we tested predictions of the stimulus value hypothesis [8, 12, 20] by decreasing the frequency of the Discriminative alternative’s predictive stimulus. The hypothesis can also be tested by increasing the predictive value of the Nondiscriminative alternative’s terminal link stimulus. According to the stimulus value hypothesis, if the contingencies start as in Phase 1, with 50% reinforcement for both alternatives, and the predictive value of the Nondiscriminative alternative’s terminal link stimulus is then increased to 100%, preference for the Discriminative alternative should persist until the terminal link stimuli have equal predictive value (100%). At that point, the hypothesis predicts indifference (50% preference) between the two alternatives, because the conditioned reinforcers following both alternatives would predict reinforcement equally well [20].
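
These predictions for Experiment 2 can be laid out side by side with those of a simple reinforcement-maximizing account. As before, the comparison is only ordinal, because the stimulus value hypothesis is stated qualitatively here; the constants and function names below are hypothetical, and the "overall rate" rule is a stand-in rather than one of the models the authors evaluate.

```python
# Discriminative alternative (unchanged from Phase 1 of Experiment 1):
# its terminal link stimulus appears on 50% of trials and always predicts food.
DISC_SIGNAL_VALUE = 1.0
DISC_OVERALL_RATE = 0.5 * DISC_SIGNAL_VALUE

def compare(a: float, b: float) -> str:
    """Ordinal prediction: which alternative has the larger value."""
    if a > b:
        return "Discriminative"
    if a < b:
        return "Nondiscriminative"
    return "indifference"

def predictions(p_food_nondisc: float) -> dict:
    # The nonpredictive stimulus follows every Nondiscriminative choice, so its
    # signal value and that alternative's overall reinforcement rate coincide.
    return {
        "stimulus value": compare(DISC_SIGNAL_VALUE, p_food_nondisc),
        "overall rate": compare(DISC_OVERALL_RATE, p_food_nondisc),
    }

for p in (0.5, 0.75, 1.0):
    print(p, predictions(p))
```

Only the stimulus value comparison predicts a continued Discriminative preference at 75% and indifference once both terminal link stimuli predict 100% reinforcement.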