The Frequency of Rapid Pupil Dilations as a Measure of Linguistic Processing Difficulty

Vera Demberg; Asad Sayeed

doi:10.1371/journal.pone.0146194

Abstract

While it has long been known that the pupil reacts to cognitive load, pupil size has received little attention in cognitive research because of its long latency and the difficulty of separating effects of cognitive load from the light reflex or effects due to eye movements. A novel measure, the Index of Cognitive Activity (ICA), relates cognitive effort to the frequency of small rapid dilations of the pupil. We report here on a total of seven experiments which test whether the ICA reliably indexes linguistically induced cognitive load: three experiments in reading (a manipulation of grammatical gender match / mismatch, a manipulation of semantic fit, and a comparison of locally ambiguous subject versus object relative clauses, all in German), three dual-task experiments with simultaneous driving and spoken language comprehension (using the same manipulations as in the single-task reading experiments), and a visual world experiment comparing the processing of causal versus concessive discourse markers. These experiments are the first to investigate the effect and time course of the ICA in language processing. All of our experiments support the idea that the ICA indexes linguistic processing difficulty. The effects of our linguistic manipulations on the ICA are consistent for reading and auditory presentation. Furthermore, our experiments show that the ICA can be used within a multi-task paradigm. Its robustness with respect to eye movements means that it is a valid measure of processing difficulty for use within the visual world paradigm, which will allow researchers to assess both visual attention and processing difficulty at the same time, using an eye-tracker. We argue that the ICA is indicative of activity in the locus caeruleus area of the brain stem, which has recently also been linked to P600 effects observed in psycholinguistic EEG experiments.

Citation: Demberg V, Sayeed A (2016) The Frequency of Rapid Pupil Dilations as a Measure of Linguistic Processing Difficulty. PLoS ONE 11(1): e0146194. https://doi.org/10.1371/journal.pone.0146194

Editor: Emmanuel Andreas Stamatakis, University Of Cambridge, UNITED KINGDOM

Received: May 1, 2015; Accepted: December 14, 2015; Published: January 22, 2016

Copyright: © 2016 Demberg, Sayeed. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Linguistic and visual world stimuli required to replicate the experiment are included as supporting information files. Result data required to replicate our analyses is available from the Harvard Dataverse at the following DOI link: http://dx.doi.org/10.7910/DVN/QGEAL1.

Funding: This work was supported by German Research Foundation (DFG); both authors were supported by the Cluster of Excellence Multimodal Computing and Interaction EXC284. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Pupil size has long been known to reflect arousal [1] and cognitive load in a variety of different tasks such as arithmetic problems [2], digit recall [3], attention [4] as well as language complexity [5–9], grammatical violations, context integration effects [10] and recently even pragmatic effects [11]. All of these studies have looked at the overall effect of pupil dilation; however, raw pupil dilation as a measure of cognitive load is always at risk of confounding the load reflex with the light reflex, especially in settings where the visual surroundings change or where the screen cannot in all conditions be fully controlled for luminosity of all objects on the screen (note also that the light reflex can even pose a problem in constant lighting conditions, because the pupil exhibits irregular oscillation under the influence of constant light). The “Index of Cognitive Activity” or ICA [12–14] proposes to solve this problem by identifying only those rapid dilations which are related to cognitive load but not the light reflex. The ICA separates the effect of the light reflex on pupil size (which causes larger and slower changes in pupil size) from the effect of the load reflex (observable in the frequency of rapid small dilations) by decomposing the raw pupil size signal with different wavelets to obtain high vs. low frequency components of the signal. The ICA is therefore more robust with respect to changes in ambient light than macro-level pupil dilation [12].
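The idea of splitting a pupil trace into slow and fast components can be illustrated with one level of the Haar wavelet transform, the simplest discrete wavelet. This is an assumption made for illustration only: the wavelets, decomposition depth and thresholds used by the licensed ICA software are not described in this article. In the sketch, the approximation coefficients carry the slow changes (of the kind produced by the light reflex) and the detail coefficients carry the fast ones.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)


def haar_step(signal):
    """One level of the orthonormal Haar discrete wavelet transform.

    Returns (approx, detail): `approx` holds scaled pairwise sums (slow,
    low-frequency content), `detail` holds scaled pairwise differences
    (fast, high-frequency content). The input length must be even.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * SQRT_HALF for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * SQRT_HALF for i in pairs]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_step, reconstructing the original samples."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SQRT_HALF, (a - d) * SQRT_HALF])
    return out


# Hypothetical trace: a slow drift (light-reflex-like) plus a small,
# rapidly alternating component (load-reflex-like).
slow = [3.0 + 0.001 * i for i in range(8)]
jitter = [0.02 if i % 2 == 0 else -0.02 for i in range(8)]
trace = [s + j for s, j in zip(slow, jitter)]

_, detail_slow = haar_step(slow)    # near zero: the drift lives in `approx`
_, detail_trace = haar_step(trace)  # dominated by the fast component
```

On the slowly drifting trace the detail coefficients stay close to zero, whereas the small, rapid fluctuation shows up almost entirely in the detail coefficients.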

Furthermore, the ICA is a highly dynamic measure: it has low autocorrelation at a lag of 100ms, and almost no correlation with its own value 200ms earlier [15]. Because its value decays so quickly, responses to stimuli that follow each other closely should overlap little, which makes the ICA potentially suitable for multi-task settings.
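The autocorrelation figures can be made concrete with the usual sample autocorrelation estimator applied to a series of per-100-ms values, where lag 1 corresponds to 100ms and lag 2 to 200ms. This is a generic sketch; the estimator actually used in [15] is not stated here, and the input series below is hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at `lag` steps.

    Standard estimator: lagged cross-products of deviations from the
    overall mean, divided by the total sum of squared deviations.
    With 100 ms bins, lag=1 is 100 ms and lag=2 is 200 ms.
    """
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


# Hypothetical per-100-ms values; a strictly alternating series is used
# only because its autocorrelations are easy to verify by hand.
alternating = [1.0, -1.0] * 50
r100 = autocorrelation(alternating, 1)  # -0.99
r200 = autocorrelation(alternating, 2)  # 0.98
```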

If it reliably reflects linguistic manipulations, the ICA could constitute a useful new method for assessing processing load with an eye-tracker, in auditory experiments, and in naturalistic environments that are not well suited to EEG, e.g. while driving a car. It could therefore usefully complement the range of experimental paradigms currently used. Secondly, if it proves to be robust with respect to eye movements (which it theoretically should be, because eye-movement related changes would also show up in the signal as slower effects than the load reflex and hence be filtered out during wavelet analysis), the ICA has potential as an additional measure in the visual world paradigm, allowing researchers to assess visual attention and cognitive load at the same time.

The goals of the experiments presented in this article are to test whether

the ICA is sensitive to linguistic manipulations;
the ICA is robust with respect to fixation position on the screen, making it suitable for use within the visual world paradigm;
the ICA allows us to tease apart effects of overlapping stimuli, for example in dual task settings.

This article presents the results of three self-paced reading experiments (Expts 1–3), three dual task experiments (driving and language comprehension; Expts 4–6), and one visual world experiment (Expt 7). The results of these experiments support the hypothesis that the ICA is sensitive to linguistic manipulations: we found significant effects in the expected direction in all of our experiments. Furthermore, the experiments show that the ICA is suitable for use in dual task scenarios and is robust with respect to fixation position, making it an ideal additional measure in visual world studies.

But how is it that the pupil muscles can indicate cognitive processing load or indeed linguistically induced processing difficulty? The fact that the light reflex is different from the load reflex and that the two can be disentangled [12] suggests that the ICA probably has little to do with vision, but is rather a non-functional symptom of cognitive load. Recent literature relates pupil dilation to activity in the locus caeruleus (LC) area [16, 17]; [16] report a strong correlation in primates between activity in the locus caeruleus area and pupil size. The LC is a small bilateral region in the brain stem. LC neurons emit the neurotransmitter norepinephrine (NE), which can be thought of as having an amplifying effect, i.e., neurons will fire at lower levels of excitation and also be inhibited more easily. Norepinephrine therefore facilitates the functional integration of different brain regions, as its enhancing property makes it easier for neurons to synchronize, and thus it is likely to help processing in the presence of processing difficulty [18]. For a literature review on the LC system, see also [19]. The LC could plausibly affect language processing, because projections from the LC area are very widespread in the brain and reach most language-related cortical areas [20].

The LC-NE system is also known to affect heart rate and skin conductance, which are established (but slow) indicators of stress as well as cognitive load. NE is thought to have a role in memory retrieval and memory consolidation [18] and has been found to facilitate the functional integration of different brain regions involved in a task. Medium levels of LC-NE activation lead to increased performance, while low activation levels occur during drowsiness, and high activation levels lead to distractibility. This is consistent with reports on the ICA [12, 14], which observe very low numbers of rapid dilations in a relaxed task, and very high numbers of rapid dilations during focused concentration on a task.

While the LC area is not usually argued to have much of a role in language processing, the LC-P3 hypothesis [21] argues for a tight correlation between activity in the LC region and the P3b component observed in ERP studies as a general reaction to task-relevant stimuli. The P3b component in turn has been proposed to be related to, or functionally equivalent to, the P600 effect often observed in psycholinguistic experiments [22]. The potential relation between the LC/NE system and the P600 has received further support from a recent paper [23], which showed in single-trial alignment analyses that P600 latency is more strongly aligned with response times (which have been argued to be aligned with the P3b [24]) than with stimulus onset. Based on the P600-as-LC/NE-P3 hypothesis [23], we would like to suggest an interpretation of the ICA in which unexpected stimuli or other difficulties with language processing cause the higher cortical areas involved in language processing to signal to the brain stem (in particular, to the LC area) that processing resources are needed. The LC area releases norepinephrine, which floods the brain, thereby enhancing information processing in the language processing areas while, as a side effect, also innervating the pupil muscles (and also affecting heart rate, skin conductance etc.). This effect of NE on the pupil muscles may thus be what we measure when we observe more frequent rapid dilations of the pupil as a reaction to our linguistic manipulations.

The Index of Cognitive Activity

The Index of Cognitive Activity is a measure of cognitive load which has previously been evaluated only on a small range of tasks [12–14, 25, 26], including digit span tasks, visual tasks, and a simulated driving task. Using the ICA as a measure of processing load is motivated by the finding that pupil size can be affected by two different processes: lighting conditions and cognitive activity. In overall pupil dilation, these two effects are confounded even when lighting conditions are stable, because of the so-called “light reflex”: the pupil oscillates irregularly and continually. Pupil dilation is controlled by two groups of muscles: circular muscles, which make the pupil contract, and radial muscles, which make the pupil dilate. Furthermore, the activation and inhibition patterns are known to differ between reactions to light and reactions to cognitive activity [12, 27]: dilations due to cognitive activity are very rapid and small, while changes in pupil size due to lighting are slower and larger. The ICA disentangles these patterns by performing a wavelet analysis on the pupil dilation record to remove all large oscillations and retain only the small and rapid dilations. The remaining small dilations and constrictions of the pupil include both the load-related ones in which we are interested and some random noise. [12] applies a denoising technique which tests for significance of changes in the signal and sets all non-significant changes to zero. The resulting signal then contains only those rapid dilations and constrictions that exceed the denoising threshold. For the calculation of the ICA, only the rapid dilations are considered. For an example of a 2-second recording from our own experiment along with marks for where a rapid dilation was detected, see Fig 1.
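The denoising and counting steps can be sketched as follows. The significance test used in [12] is not specified in this article, so the sketch substitutes a standard wavelet-denoising rule: the universal threshold of Donoho and Johnstone (noise level σ estimated as median(|c|) / 0.6745, threshold σ·√(2 ln n)) with hard thresholding. It also assumes the input coefficients are signed so that positive values correspond to dilations. Both choices are ours, not the ICA software's.

```python
import math
import statistics


def universal_threshold(coeffs):
    """Universal threshold sigma * sqrt(2 ln n) (Donoho & Johnstone), with
    the noise level sigma estimated robustly as median(|c|) / 0.6745."""
    sigma = statistics.median([abs(c) for c in coeffs]) / 0.6745
    return sigma * math.sqrt(2 * math.log(len(coeffs)))


def hard_threshold(coeffs, thr):
    """Set every coefficient whose magnitude does not exceed `thr` to zero."""
    return [c if abs(c) > thr else 0.0 for c in coeffs]


def count_rapid_dilations(coeffs):
    """Denoise, then count the surviving positive coefficients.

    Assumes positive coefficients mark dilations; surviving negative
    coefficients (constrictions) are discarded, because only dilations
    enter the ICA.
    """
    thr = universal_threshold(coeffs)
    return sum(1 for c in hard_threshold(coeffs, thr) if c > 0)


# Hypothetical high-frequency coefficients: low-level noise plus three
# large events (two dilations, one constriction).
coeffs = [0.01, -0.01, 0.02, -0.02] * 25
coeffs[10], coeffs[50], coeffs[70] = 1.0, -1.0, 0.8
n_dilations = count_rapid_dilations(coeffs)  # 2
```

Hard thresholding keeps surviving coefficients at full size, which suits counting events; a soft threshold would shrink them but would not change the count.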


Fig 1. An example of a pupil size recording from our experiment, showing the time points at which a rapid dilation was detected by the ICA software (red triangles).

https://doi.org/10.1371/journal.pone.0146194.g001

[14] demonstrates, in an experiment with a 2 × 2 design (no task vs. math calculations, crossed with a light vs. dark room), that the ICA values obtained using this procedure depend on the level of cognitive load, but not on lighting conditions. The ICA has also been related to the task-evoked pupillary response (TEPR), see [28], but has been shown to be more robust with respect to light changes [14].

The method proposed in [12, 14] for calculating the ICA is to count the number of rapid small pupil dilations per second, to normalize by dividing by the number of expected rapid dilations per second, and then to transform the result using the hyperbolic tangent function. The method is patented, and the analysis program, the Cognitive Workload Module, has to be licensed from EyeTracking Inc., San Diego, CA. For details see [12]; results reported here were obtained using EyeWorks 3.8. To obtain a continuous measure, blinks are factored out by linear interpolation from adjacent time spans. By default, one ICA value per second is produced. Existing publications by Marshall report averaged, tanh-transformed ICA values per second.
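The per-second computation described above amounts to ICA = tanh(observed rate / expected rate). The sketch below implements only that formula; the expected rate used by the Cognitive Workload Module is not given in this article, so the value in the usage line is a hypothetical placeholder.

```python
import math


def ica_from_counts(n_dilations, seconds, expected_per_second):
    """ICA-style value: observed dilation rate, normalised by an expected
    rate, passed through tanh (grows from 0 and saturates towards 1)."""
    if seconds <= 0 or expected_per_second <= 0:
        raise ValueError("seconds and expected_per_second must be positive")
    rate = n_dilations / seconds
    return math.tanh(rate / expected_per_second)


# EXPECTED_RATE is a hypothetical placeholder, not the value used by the
# licensed software.
EXPECTED_RATE = 30.0
at_expected = ica_from_counts(30, 1.0, EXPECTED_RATE)  # tanh(1) ≈ 0.762
```

The saturation of tanh at high rates compresses the upper end of the scale, which is one reason transformed ICA values are skewed while raw counts are not.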

For our experiments here, we are interested in a much more fine-grained analysis; in particular, we would like to learn about the time course of the ICA and get an idea of where we should look for a stimulus-related effect. However, existing studies by Marshall and colleagues average ICA values across blocks, i.e., they do not provide any analysis of the more exact time course of the ICA in response to an experimental manipulation. Furthermore, these studies do not report effect sizes for different levels of difficulty on the same task; rather, they report effect sizes for doing no task vs. solving math problems, or for doing one task vs. two tasks at the same time. Therefore, we cannot derive from the published studies any specific predictions regarding the expected effect size for linguistic manipulations, or regarding the exact time course of the ICA effect.

The only work that provides a more detailed analysis of the time course of the ICA is [15], who employ the same kind of steering task as the one described here as part of experiment 4. [15] show in a cross-correlation analysis of a steering task and the ICA that the correlation between stimulus and effect on the ICA is highest at a time lag of roughly 1.1 seconds. We subsequently conducted a more detailed analysis of these data, and found that subjects who performed well on the task (most subjects) had a latency between stimulus and ICA of 1 second, while subjects who performed very poorly on the driving task showed a delayed effect or no effect on the ICA (we also replicate this observation in this article as part of experiment 4). As all our participants here are highly proficient (native) speakers of German, we hypothesize that we will be able to observe the effects of our linguistic manipulations with a similar lag of approximately 1s. For all of the experiments reported in this article, we report results for a 500ms window of observations (750–1250ms post stimulus onset). This window was chosen post hoc, due to the exploratory nature of the current study. From Marshall’s studies [14], we know that we can expect an ICA in the range of 0.67 to 0.87 (which corresponds to 2.4–3.9 rapid dilations per 100ms) for our driving task. For other tasks (e.g., math or visual tasks), Marshall reports much lower ICA values; however, those values are averaged across blocks which do not require continuous attention. As the critical regions of all the tasks we evaluate require continuous attention, we expect to see values in the range of 2.4 to 3.9 rapid dilations per 100ms across all of our experiments.
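Under the tanh mapping described earlier in this section, the two ranges quoted here can be checked against each other: inverting ICA = tanh(rate / expected) gives expected = rate / artanh(ICA). The sketch assumes that the same normalising rate applies at both ends of the range, which the text does not state explicitly.

```python
import math


def implied_expected_rate(ica_value, observed_rate):
    """Invert ICA = tanh(observed_rate / expected) for the expected rate.

    Assumes the tanh normalisation described for the ICA; the unit of the
    result is the unit of `observed_rate`.
    """
    if not 0 < ica_value < 1:
        raise ValueError("ica_value must lie strictly between 0 and 1")
    return observed_rate / math.atanh(ica_value)


# Endpoints quoted in the text (rates in rapid dilations per 100 ms).
low_end = implied_expected_rate(0.67, 2.4)   # ≈ 2.96
high_end = implied_expected_rate(0.87, 3.9)  # ≈ 2.93
```

Both endpoints imply a normalising rate of roughly 2.9–3.0 dilations per 100ms, so, under this assumption, the quoted ICA range and count range are mutually consistent.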

In order to provide a more detailed insight into the time course of the ICA, we calculate a per-100-ms ICA value from the number of rapid dilations per 100ms. Because blinks last about 100ms, interpolating across blinks does not make sense at this short time scale. We furthermore excluded all data points for which the pupil size estimate was more than 2.5 standard deviations below the average pupil size of that participant, to avoid partial blinks. For our analyses here, we report the number of rapid dilations per time span directly, for ease of interpretation and because the number of rapid dilations is normally distributed, whereas ICA values, due to the hyperbolic tangent transformation, have a skewed distribution; for more detail, see also the analysis in Demberg (2013) [15].
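The per-100-ms counts can be obtained by binning the time stamps of detected rapid dilations relative to stimulus onset. A minimal sketch, assuming dilation events are available as times in milliseconds; the function and argument names are hypothetical. At the 250Hz sampling rate used in these experiments, each 100ms bin spans 25 samples per eye.

```python
def bin_events(event_times_ms, window_start_ms, window_end_ms, bin_ms=100):
    """Count events (e.g. detected rapid dilations) in consecutive bins.

    Times are in ms relative to stimulus onset. An event at time t falls
    into bin floor((t - window_start_ms) / bin_ms); events outside
    [window_start_ms, window_end_ms) are ignored, as is any trailing
    partial bin.
    """
    n_bins = int((window_end_ms - window_start_ms) // bin_ms)
    counts = [0] * n_bins
    for t in event_times_ms:
        if window_start_ms <= t < window_end_ms:
            idx = int((t - window_start_ms) // bin_ms)
            if idx < n_bins:
                counts[idx] += 1
    return counts


# Hypothetical dilation times for one trial, binned from 0 to 1000 ms.
per_bin = bin_events([5, 95, 100, 250, 999, 1000, -4], 0, 1000)
# -> [2, 1, 1, 0, 0, 0, 0, 0, 0, 1]
```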

All of the data reported in this article was recorded using an EyeLink II eye tracker, at 250Hz on both eyes. The ICA has however also been successfully employed by the developers with other trackers, see [29] for details. The wavelet transformation and calculation of rapid small dilations was performed using the EyeWorks Workload Module software. For each of the experiments, we decided to recruit 24 participants before starting to run the experiment. Each of the experiments also contained 24 items (with the exception of experiment 7, for which we had 20 items).

Overview

Experiments 1, 2 and 3 directly address our question of whether the ICA reflects linguistically induced cognitive load. We chose three well-established experimental manipulations, testing for effects of ungrammaticality, thematic fit, and subject vs. object relative clauses. Experiments 4–6 test the same stimuli in a dual-task setting where the participants perform a steering task in a driving simulator while listening to speech-synthesized versions of the stimuli. Experiment 7 tests the processing of short stories containing causal vs. concessive discourse connectives in the visual world paradigm.

Ethics Statement

The Head of the Saarland University ethics committee confirmed that a study does not need approval if no confidential information is collected, the experiments do not induce a stressful situation, and they do not involve negative or emotionally adverse stimuli. All data were anonymized before the authors had access for analysis. Written informed consent was obtained from all participants prior to the start of the experiment. The person to whom the eye image (Striking image) belongs provided consent to have the image of their eye published.

Experiment 1: Grammatical Gender Mismatch in SPR

Procedure

We recruited 24 German native speakers as participants in our experiment. Participants were 19–36 years old (average age 24 years); 18 of them were female, and 23 of them were right-eye dominant. All of our participants received course credit for their participation.

Materials were presented using the word-by-word self-paced reading paradigm with center-of-screen presentation in order to minimize eye movements. The materials for the experiment consisted of three training examples as well as 96 German sentences: 24 stimuli each from experiments 1–3, plus 24 fillers, which were unambiguous relative clauses containing either semantic or grammatical problems. Half of the items seen by a subject were thus grammatical and plausible, and half were either ungrammatical or implausible. Each person saw only one version of each item, and in all experiments we made sure that the critical region was not sentence-final, in order to avoid sentence wrap-up effects in the critical region.

Each sentence was followed by a question asking whether the sentence had been grammatical and made sense. Participants responded using a response pad. Answers were balanced so that “yes” was the correct answer half of the time. Experiment duration was 20–30 minutes.

Materials

The materials for the gender mismatch experiment included 24 items, where the gender of the determiner and adjective did vs. did not match the grammatical gender of the noun (see Example (1), with the noun in bold face; the full set of items is provided in S1 Text).

(1). Simone hatte eine(n) schreckliche(n) Traum und keine Lust zum Weiterschlafen.
“Simone had a[masc/fem] horrible[masc/fem] dream[masc] and didn’t feel like sleeping any longer.”

Data Analysis

We centered and scaled pupil size estimates for each participant. To make sure that our results can be attributed to actual changes in pupil size (and not to partial blinks or track loss), we excluded all data points where the pupil size estimate was more than 2.5 standard deviations below the average pupil size of that participant, which resulted in a loss of 2% of data points. Binary predictors are encoded using dummy coding.
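The per-participant standardisation and partial-blink exclusion can be sketched as follows, assuming one participant's pupil-size samples are held in a plain list. The sketch uses the sample standard deviation; whether the original analysis used the sample or the population SD is not stated. Note that a deep dip also inflates the SD it is compared against, which makes this criterion somewhat conservative.

```python
import statistics


def standardise_and_filter(pupil_sizes, cutoff_sd=2.5):
    """Centre and scale one participant's pupil-size samples (z-scores) and
    drop samples more than `cutoff_sd` SDs below that participant's mean,
    treating them as likely partial blinks or track loss.

    Returns (kept_z_scores, fraction_excluded).
    """
    mean = statistics.mean(pupil_sizes)
    sd = statistics.stdev(pupil_sizes)
    z = [(x - mean) / sd for x in pupil_sizes]
    kept = [v for v in z if v >= -cutoff_sd]
    return kept, 1 - len(kept) / len(z)


# Hypothetical trace: stable samples plus one blink-like dip to 0.
samples = [5.0, 5.1] * 49 + [0.0]
kept, frac_excluded = standardise_and_filter(samples)  # drops 1 of 99 samples
```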

The data (for this and all other experiments reported in this article) was analysed using R version 3.1.2, lme4 package version 1.1–7 [30]. Spline models were calculated using package gamm4 version 0.2–3 [31].

All of the generalized linear mixed effects models reported in this article include random intercepts under participant and item, as well as random slopes for the predictor of interest (i.e., the linguistic manipulation) under both participant and item, unless otherwise specified (i.e., when a model with the full random effects structure did not converge). Confidence intervals were computed using the “Wald” method. We calculated separate models for the ICA on the left eye and on the right eye for a period of 500ms centered on 1s post stimulus onset, as well as regression models that include the data from both eyes in a single model. In these combined models, a random effect of eye was included as a nested effect under subject.

The generalized linear mixed effects models used the number of ICA events as the response variable: the numbers of ICA events in the 100ms bins of the 500ms target window were summed, and values for missing 100ms time windows were interpolated. Each trial is thus represented by one data point per eye. As this “raw ICA” response variable is a count variable, we use a Poisson distribution in our mixed effects models. We compared models with and without the linguistic condition as a predictor to see whether our manipulation affected the ICA. As additional predictors in the model, we include the order in which items were shown within the experiment (as a main effect and as an interaction between grammatical condition and item order), to account for any learning effects during the experiment. Furthermore, we tested whether fixation position or the presence of large saccades affected the ICA: we calculated average fixation position in terms of X and Y coordinates, as well as the maximal difference between fixation positions during our critical region. As X and Y axis fixation position may not be linear predictors of the ICA, we always include these factors in our regression models as non-parametric smooth functions using gam / gamm4 models.
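Summing the 100ms bins of the target window after filling missing bins can be sketched like this. The text states that missing 100ms windows were interpolated but not how; linear interpolation between the nearest valid neighbours, with the nearest valid value copied at the edges, is our assumption, and the function names are hypothetical. Interpolated totals need not be integers, and how such values were handled in the Poisson models is not stated.

```python
def interpolate_missing(values):
    """Fill None entries by linear interpolation between the nearest valid
    neighbours; gaps at either edge copy the nearest valid value."""
    valid = [i for i, v in enumerate(values) if v is not None]
    if not valid:
        raise ValueError("no valid bins to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled[i] = float(values[right])
        elif right is None:
            filled[i] = float(values[left])
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


def window_total(bin_counts, first_bin, n_bins=5):
    """Sum `n_bins` consecutive 100 ms bins (5 bins = 500 ms), starting at
    index `first_bin`, after interpolating missing bins."""
    filled = interpolate_missing(bin_counts)
    return sum(filled[first_bin:first_bin + n_bins])


# Hypothetical per-100-ms counts for one trial and one eye (None = missing).
counts = [3, 2, None, 4, None, None, 7, 1]
total = window_total(counts, first_bin=2)  # 3 + 4 + 5 + 6 + 7 = 25.0
```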

We performed backward model selection for fixed effects throughout the experimental analyses reported in this article. Predictors are included in the final models only if they significantly improve model fit. We additionally calculated models with maximal random effects structure as well as models where we performed forward selection on random effects. All of our tables report results for maximal random effects structure, and the text additionally reports results for the random effects structure determined by model selection.

Results

Self-paced Reading Times and Question-Answer Accuracy.

As a first sanity check of whether our experimental manipulation was successful, we analysed the self-paced reading times. As expected, reading times were longer on the critical region when the grammatical gender of the noun did not match the gender marking of the preceding determiner, see Fig 2. A linear mixed effects model with random slopes under subject and item also confirmed that the effect is significant at the critical region: β = 176.87, t = 2.829, 95%CI = [54.32, 299.41]. We also observe that participants read more slowly in the grammatical condition a few words after the critical region. We interpret this effect as a consequence of the grammaticality judgment task that participants had to perform in this experiment: once they had figured out that the sentence was ungrammatical, they read faster because they knew how to answer the question; when the sentence was grammatical, they paid more attention and read more slowly so as not to miss any problem with the sentence. Question answer accuracies were very high in both conditions for this experiment: 96% correct “yes” answers for the grammatically correct sentences, and also 96% correct “no” answers for the ones containing a gender violation.
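Since a Wald interval is β ± 1.96·SE and SE = β / t, the reported interval can be reproduced from β and t alone. The sketch below does exactly that with the normal critical value 1.96; a discrepancy in the last digit comes from t being reported to three decimals.

```python
def wald_ci_from_t(beta, t_value, z_crit=1.96):
    """Wald interval beta ± z_crit * SE, recovering SE as |beta / t|."""
    if t_value == 0:
        raise ValueError("t_value must be non-zero")
    se = abs(beta / t_value)
    return beta - z_crit * se, beta + z_crit * se


# Reading-time effect at the critical region, as reported in the text.
ci_low, ci_high = wald_ci_from_t(176.87, 2.829)  # ≈ (54.33, 299.41)
```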


Fig 2. Self-paced reading times for the grammatical gender violation experiment.

The critical region for the experiment is located at word number 0.

https://doi.org/10.1371/journal.pone.0146194.g002

Next, we are interested in whether we find a higher frequency of rapid pupil dilations (i.e., higher ICA values) in the critical region of the more difficult linguistic condition.

Index of Cognitive Activity.

Fig 3 shows the average number of rapid dilations per 100ms bin over time; the plot is based on the 100ms bins. We can see a sharp increase in the number of ICA events for the ungrammatical condition in the period from 600 to 1200ms after the onset of the critical word on the display. We can also see in the figure that the ICA is very similar for both eyes (compare dotted and solid lines of each color). Another interesting observation from this figure is that, later in the trial, the ICA values are lower in the ungrammatical condition than in the grammatical one, much like what we observed in the self-paced reading times.
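Condition curves like those in Fig 3 can be computed as per-bin means across trials. A minimal sketch, assuming each trial is a list of per-100-ms counts of equal length, with `None` marking excluded bins; the function name and data are hypothetical.

```python
from statistics import mean


def mean_curve(trials):
    """Per-bin mean across trials, skipping missing (None) bins.

    `trials` is a list of equal-length lists of per-100-ms counts for one
    condition (and one eye). Returns None for bins missing in every trial.
    """
    n_bins = len(trials[0])
    if any(len(tr) != n_bins for tr in trials):
        raise ValueError("all trials must have the same number of bins")
    curve = []
    for b in range(n_bins):
        values = [tr[b] for tr in trials if tr[b] is not None]
        curve.append(mean(values) if values else None)
    return curve


# Hypothetical counts for two trials of one condition.
curve = mean_curve([[1, 2, None], [3, None, None]])  # [2, 2, None]
```

Computing one curve per condition and eye gives the lines that are compared in such a plot.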


Fig 3. Raw ICA for the grammatical gender violation experiment.

Critical region onset for the experiment is located at word number 0; the grey bar marks the hypothesised critical region of 500ms, peaking at 1s. The difference between conditions in this region is significant at p < 0.05 on both eyes, see also Table 1.

https://doi.org/10.1371/journal.pone.0146194.g003

Regression analysis showed that the predictors for item order, the interaction of item order and grammaticality, X axis fixation position, and Y axis fixation position did not improve model fit for regression models of either eye for the gender mismatch experiment.

We ran generalized linear mixed effects models for both eyes (separately and combined) to test whether the difference between conditions is statistically significant for a time period of 500ms, centered on 1s. This region from 750 to 1250ms post critical region onset is marked by a grey bar in Fig 3. In the grammaticality experiment, models with a fixed effect for condition and full random effects structure showed significantly better fit than models without the fixed effect (but still including all the same random effects) (left eye: χ² = 4.34, p = 0.037; right eye: χ² = 5.38, p = 0.02; both: χ² = 4.66, p = 0.031). The resulting model for both eyes is shown in Table 1. We can see that gender mismatch is a significant positive predictor of average ICA values in the region of 750–1250ms post critical region onset. Separate models for each eye yield equivalent results for the predictor gender mismatch (left eye: β = 0.133, z = 2.34, p = 0.019, with maximal random effects structure; right eye: β = 0.112, z = 2.46, p = 0.014, without a slope of gender mismatch under item, as the model with the full random effects structure did not converge).
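The p-values in this paragraph can be checked from the test statistics. A likelihood-ratio χ² with one degree of freedom (one added fixed effect) has upper-tail probability erfc(√(χ²/2)), and a Wald z has two-sided p = erfc(|z|/√2). With these formulas, χ² = 4.34, 5.38 and 4.66 give p ≈ 0.037, 0.020 and 0.031, and z = 2.34 and 2.46 give p ≈ 0.019 and 0.014.

```python
import math


def p_two_sided_from_z(z):
    """Two-sided p-value for a Wald z statistic (standard normal reference)."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_from_chi2_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    A chi-square(1) variable is the square of a standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if chi2 < 0:
        raise ValueError("chi2 must be non-negative")
    return math.erfc(math.sqrt(chi2 / 2))


# Statistics as reported in the text.
lrt_p = {eye: p_from_chi2_df1(x)
         for eye, x in {"left": 4.34, "right": 5.38, "both": 4.66}.items()}
wald_p = {eye: p_two_sided_from_z(z)
          for eye, z in {"left": 2.34, "right": 2.46}.items()}
```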


Table 1. Linear mixed effects models for both eyes in the self-paced reading experiment with grammatical gender match / mismatch manipulation.

https://doi.org/10.1371/journal.pone.0146194.t001