Order among chaos: Cross-linguistic differences and developmental trajectories in pseudoword reading aloud using pronunciation Entropy

Elisabetta De Simone; Elisabeth Beyersmann; Claudio Mulatti; Jonathan Mirault; Xenia Schmalz

doi:10.1371/journal.pone.0251629

Abstract

In this work we propose the use of Entropy to measure variability in pronunciations in pseudoword reading aloud: pseudowords for which participants give many different pronunciations receive higher Entropy values. Monolingual adults, monolingual children, and bilingual children proficient in different European languages varying in orthographic depth were tested. We predicted that Entropy values would increase with increasing orthographic depth. Moreover, higher Entropy was expected for younger than for older children, as reading experience improves the knowledge of grapheme-phoneme correspondences (GPCs). We also tested whether interference from a second language would lead to higher Entropy. Results show that orthographic depth affects Entropy, but only when the items are not strictly matched across languages. We also found that Entropy decreases across age, suggesting that GPC knowledge becomes refined throughout grades 2-4. We found no differences between bilingual and monolingual children. Our results indicate that item characteristics play a fundamental role in pseudoword pronunciation variability, that reading experience is associated with reduced variability in responses, and that, in bilinguals, knowledge of a second orthography does not seem to interfere with pseudoword reading aloud.

Citation: De Simone E, Beyersmann E, Mulatti C, Mirault J, Schmalz X (2021) Order among chaos: Cross-linguistic differences and developmental trajectories in pseudoword reading aloud using pronunciation Entropy. PLoS ONE 16(5): e0251629. https://doi.org/10.1371/journal.pone.0251629

Editor: Madelon van den Boer, Universiteit van Amsterdam, NETHERLANDS

Received: May 20, 2020; Accepted: April 30, 2021; Published: May 19, 2021

Copyright: © 2021 De Simone et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data and script files are available at this link https://osf.io/94wjx/.

Funding: EB was supported by a Discovery Early Career Researcher Award (DECRA) by the Australian Research Council (DE190100850), https://www.arc.gov.au/grants/discovery-program/discovery-early-career-researcher-award-decra. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

It is common practice in reading research to use pseudowords in order to test participants’ ability to use grapheme-phoneme correspondences (GPCs) to correctly retrieve sound from print [1]. This ability is considered fundamental to learning to read: since children at the beginning of reading acquisition do not have a large sight vocabulary, they need to rely more heavily on their knowledge of letter-sound correspondences to assemble the correct pronunciation, a process known as phonological decoding [2].

Pseudowords have received a great deal of attention in this field. Pseudowords are graphotactically legal stimuli with plausible pronunciations [3]. Their importance lies in their helpfulness in predicting poor reading skills: studies have shown that dyslexic readers perform worse than their non-impaired peers on pseudoword reading aloud tasks [4, 5]. Pseudoword reading is usually assessed by calculating reaction times (the time between stimulus onset and voice onset) and reading accuracy (the number of errors that participants make while reading). Concerning reaction times, two assumptions underlie their use for inference. Firstly, the researcher has to assume that if a participant takes more time to name a particular item, that item is more difficult than others. Secondly, the researcher has to hypothesize about the features of that particular item that make it difficult to name. For example, when 100 participants read aloud two pseudowords, “rop” and “wap”, they might have faster reaction times to the former than to the latter. With this finding, we can calculate differences, on the linguistic level, between these two pseudowords (e.g., in terms of vowel consistency, orthographic neighborhood or letter bigram frequency). This would allow for indirect inferences about which linguistic characteristics affect reading aloud processes, which would, in turn, allow us to hypothesize a cognitive structure that would explain why this particular characteristic should affect reading processes. The transcribed responses of the participants give more direct information about the cognitive processes [6–8]. For example, for the two pseudowords above, participants might pronounce the former consistently as /ɹɔp/, whereas for the latter, some participants might say /wæp/ and others /wɔp/. This is more direct evidence that consistency (i.e., the presence of more than one possible pronunciation for the letter cluster wa, in “wasp” versus “wax”) affects reading aloud processes.

As for accuracy, since pseudowords do not have conventional pronunciations, it is difficult to decide whether they are pronounced correctly or not [7, 9]. Often, faced with the variety of responses participants give, researchers need to arbitrarily decide whether a pseudoword is correctly read by analyzing all the plausible pronunciations that they think it could have [9–11]. Even if software is used to score accuracy, decisions need to be made concerning response accuracy. For example, if we accept any pronunciation as correct whenever there is at least one instance of the grapheme-phoneme correspondence in the language, we would consider the pronunciation /jɔn/ for the English pseudoword <yan> as correct, although, intuitively, most English native speakers would consider this pronunciation incorrect, because it corresponds to the vowel pronunciation of the word <yacht>.

With this in mind, we aim to investigate the number and kind of different pronunciations participants give, information that is not captured by only scoring the answers as correct or incorrect [6–8]. The quantification of response variability to a given pseudoword may be a more sensitive measure of pseudoword reading aloud performance, since it does not involve any kind of arbitrary decision by the researchers. Of course, the variability of responses and accuracy may be correlated: if participants give many different pronunciations to a given pseudoword, by definition, the variability will be high for this item. This also implies that any scoring scheme would likely mark more responses as incorrect.

Considering this, our study’s goal was to test an alternative variable, namely pseudoword reading aloud Entropy, as a way of quantifying participants’ pseudoword reading aloud performance [12]. This approach has the advantage that, rather than making decisions about whether a given pronunciation is incorrect, we can include and analyze all responses.

Pseudoword pronunciation Entropy

Entropy is a concept first introduced by Shannon’s Information Theory [13], which can be defined as the degree of chaos within a closed system. Earlier studies in psycholinguistic research used Entropy as a measure to investigate processing difficulty in sentence comprehension [14], quantify orthographic transparency in different orthographies (using word onsets: [15–17], using mono-syllabic words: [18], using whole words: [19]), and to assess variability in responses to disyllabic English pseudowords [20] as well as diversity in vowel pronunciation in German and English children reading aloud pseudowords [12].

In the present study, we use Entropy to calculate the variability of responses to both monosyllabic and multisyllabic pseudowords. With this in mind, we focus in this study on the following three aspects:

Orthographic depth, by investigating orthographies varying in depth (English, German, French, Italian);
Age (adults and children) and grade (2, 3, 4, for monolingual German children);
Bilingualism (comparing bilingual English-German children, reading German items, with monolingual German children).

Entropy values are calculated as follows: the more alternative pronunciations a given pseudoword has, the bigger its Entropy value is. Since Entropy focuses on the whole pseudoword pronunciation, Entropy values are not affected by the readers’ strategy to retrieve sound from larger (morphemes, bodies) or smaller embedded reading units (letters, graphemes). For each pseudoword, we have the transcription of each participant reading this particular item. Entropy is calculated, for each item, by taking the proportion of each type of response, multiplying it by its logarithm, and summing the resulting values over all pronunciations given to this item. This process is described in the formula:

$$H_j = -\sum_{i=1}^{N} p(i|j) \log p(i|j)$$

where p(i|j) refers to the proportion of responses i for item j, and N is the number of different pronunciations provided across the participants. The sum is multiplied by −1 because the logarithm of a proportion (i.e., a number between 0 and 1) is never positive; the sign change turns the negative sum into a positive Entropy value, which is easier to interpret. An example of how Entropy is calculated for a specific item can be found in Table 1.

Table 1. How to calculate Entropy from participants’ pronunciations for the pseudoword <wap>.

https://doi.org/10.1371/journal.pone.0251629.t001

When all participants provided the same pronunciation for a given pseudoword, the Entropy value of that item was zero, because log 1 = 0. Higher Entropy values (H > 0) resulted when participants gave different pronunciations, and Entropy increased as the distribution over these pronunciations approached equiprobability. This formula allows us to focus on item-level differences, that is, to calculate Entropy per item, while for subject-level performances, we average across participants.

To summarize, Entropy is used here as a measure of the variability of the pronunciations that participants give to the same pseudoword (pseudoword pronunciation variability). For example, in a sample of five participants, Participants 1 and 3 could read the pseudoword <wap> as /wæp/; Participant 2, instead, could read the item as /wɔp/, while Participants 4 and 5 could agree on yet another pronunciation: /wælp/. These different choices would increase the Entropy value associated with the pseudoword <wap>, calculated as seen in Table 1. However, the same five participants could agree on the pronunciation of another pseudoword: for example, all of them could read <drell> as /drel/. In this case, the Entropy value of <drell> would be equal to zero. As we will discuss below, there are reasons to think that pseudoword pronunciation variability (Entropy) may vary according to Language, Bilingualism and Age.
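
The five-participant example can be turned into a short calculation. The following is a minimal sketch, not the authors' script: the function name `pronunciation_entropy` is hypothetical, and the natural logarithm is an assumption, since the text does not state the base (another base only rescales the values).

```python
from collections import Counter
from math import log


def pronunciation_entropy(responses, base=None):
    """Entropy of the pronunciations given to one pseudoword.

    `responses` holds one transcription per participant.
    `base` is the logarithm base (None = natural log); the base is an
    assumption, because the source text does not specify it.
    """
    counts = Counter(responses)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total  # proportion p(i|j) of this pronunciation
        log_p = log(p) if base is None else log(p, base)
        entropy -= p * log_p  # subtracting flips the sign, so H >= 0
    return entropy


# Responses from the worked example in the text (five participants).
wap = ["wæp", "wɔp", "wæp", "wælp", "wælp"]
drell = ["drel"] * 5

print(pronunciation_entropy(wap))    # three pronunciations: H > 0
print(pronunciation_entropy(drell))  # full agreement: H = 0
```

With the natural logarithm, <wap> comes out at roughly 1.05 and <drell> at 0. For N different pronunciations, the value can never exceed log N, which is reached when all N pronunciations are equally frequent.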

Orthographic depth

As for orthography, the relationship between letters and sounds can affect Entropy. The closeness of this relationship is referred to as orthographic depth, and is traditionally described as a continuum [21]. For example, on the shallow end of the continuum are orthographies like Finnish or Italian, where one letter typically corresponds to one sound (e.g., <i> only maps to /i/), while on the deep end are orthographies with a high degree of inconsistency between their letters and sounds, like English (e.g., in the word “gist”, <g> is read /ʤ/, but the grapheme itself could be read /g/ as well) [22].

Shallow orthographies are easier to read and learn [21, 23–26] because of the straightforward mapping between graphemes and phonemes. Italian and German, for instance, are considered to have shallow orthographies [21]. We therefore expected that the Entropy values of pseudowords read by our Italian and German participants would be very low, because the consistent correspondences between graphemes and phonemes should lead to no or very few alternative pronunciations (e.g., in Italian, <fulm> can only be read /fulm/, because each letter in that pseudoword corresponds to only one phoneme).

On the opposite end of the continuum are deep orthographies like English. Children learning to read in deep orthographies have been found to take longer to learn the correspondences between letters and sounds, because of their inconsistent and unpredictable relationships (the same grapheme <i>, found in words like <kit> and <pint>, will be read /ɪ/ in the first case and /aɪ/ in the second). As a result, it takes longer to acquire the ability to read accurately [21, 23–25]. We expected that the pseudoword Entropy values for English-speaking children and adults would be the highest, because letters are normally associated with more than one sound, leading to multiple alternative pronunciations (for example, the pseudoword <sind> can be read /sɪnd/ or /saɪnd/). For this reason, we predicted higher response variability in English-speaking children than in adults (because of children’s scarcer knowledge of GPCs), and higher response variability in English-speaking participants than in French-, Italian- and German-speaking children, who are learning to associate graphemes to phonemes in more consistent and transparent orthographies.

Complexity and unpredictability

More recent work suggests that orthographic depth should not be seen as a single continuum, but rather as a multidimensional space [27–29]. Even within Europe, orthographies differ on many aspects which are difficult to condense into a single construct. While inconsistency of the print-to-speech correspondences has always been central to the concept of orthographic depth, the study from Schmalz, Marinus, Coltheart and Castles [28] showed that, across orthographies, inconsistency can result either from “complexity” or “unpredictability” which, according to models of reading, should have differential effects on cognitive processes underlying reading and reading acquisition.

Complexity, on the one hand, can lead to inconsistency on the level of letters or graphemes due to the presence of multiletter correspondences (<aw> → /ɔ:/; this is a complex correspondence because the reading of the individual letters will not give the exact pronunciation), or due to the presence of context-sensitive correspondences (<g[i]> → /ʤ/; <g[a]> → /g/) or from both (<ch[r]> → /k/; <ch[i]> → /tʃ/). The French word “ciseaux”, for example, contains three complex correspondences: the context-sensitive rule dictates that <c[i]> is read /s/, while the multiletter grapheme <au> corresponds to /o/ and a position correspondence dictates that the plural morpheme <x> is silent because of its position at the end of the word. Nonetheless, even if there are three different context correspondences, the pronunciation is entirely predictable. Unpredictability, on the other hand, refers to the degree to which the reading system is capable of correctly translating written words into their phonological equivalents [28]. The pronunciation of the word “yacht”, for example, is unpredictable, because this word cannot be read correctly without the reader having encountered it before.

Within languages, complexity and unpredictability are correlated, which makes it difficult to dissociate them. In the English orthography, for example, consider the words “range” and “flange”. English context-sensitive correspondences state that an <a> before the ending <nge> should be read as /eɪ/, as in “range” (/reɪnʤ/). However, “flange” is not read /fleɪnʤ/, but /flænʤ/. In this case the same grapheme is read differently in the same context: in “range” a complex correspondence is applied (a + nge), while in “flange” a simple grapheme-phoneme correspondence is used (<a> is read /æ/). Thus, the complex context-sensitive correspondence alone cannot predict how <a> should be read, and readers are often unsure about which strategy to apply (context-sensitive or simple GPCs). Instances like this one are not rare, and they make the English orthography both highly complex and unpredictable.

The French orthography, on the contrary, is high in complexity, but low in unpredictability. On the one hand, it presents many complex correspondences, caused by multiletter and context-sensitive graphemes (like <au> and <c>, respectively). On the other hand, these correspondences are mostly predictable (<au> will always be read as /o/, while <c> will always be read /s/ before <i, e> and /k/ before <a, o, u>).

Considering the relation between complexity and unpredictability, in the current study we will look at languages that are simple and predictable (Italian and, to a lesser degree, German), complex and predictable (French) and complex and unpredictable (English), in order to investigate the possibility that these features may differentially affect Entropy.

Bilingualism

Another factor that may influence pseudoword pronunciation Entropy is bilingualism. Two scenarios are possible. In the first, when told to read pseudowords in Language A, individuals could show interference from Language B, by associating phonemes of Language B to graphemes of Language A. For example, an English/German bilingual may read a German pseudoword like “moch” as /moʦ/ instead of /moχ/, because the grapheme <ch> is read differently in English. Similarly, Treiman, Kessler and Evans [30] found interference from French on English <c> and <g> pronunciation in English-speaking students who had just started learning French. Thus, a grapheme-phoneme correspondence from Language B that interferes with reading Language A may increase Entropy for bilingual individuals compared to monolingual individuals.

The second scenario goes in the opposite direction. Studies have shown that bilingualism improves metalinguistic awareness, that is, the ability “to think about and reflect upon the nature and functions of language” [31]. Metalinguistic awareness refers to different aspects of language, such as word awareness and phonological awareness. Moreover, results from Yelland, Pollard and Mercuri [43] show that this improved metalinguistic awareness in bilingual children also enhances reading skills, at least with regard to word recognition. Consequently, there are reasons to believe that bilingual children’s metalinguistic awareness could improve their overall understanding of and sensitivity to GPCs, especially if one of the languages is more transparent than the other. For example, the prior learning of one consistent orthography could help children understand the mechanisms underlying the GPCs in the other language, because they already have experience with associating letters to sounds, thus producing a facilitatory effect on the other language.

Aim and hypothesis

Our study’s goal was to evaluate the use of Entropy (H) in participants’ pseudoword reading aloud responses. Although Entropy has already been used to measure the diversity of vowel pronunciations in German and English children reading aloud pseudowords across grades [12], and alternative pronunciations of disyllabic pseudowords in English [20], we are the first, to our knowledge, to use it to compare individual responses to both monosyllabic and multisyllabic pseudowords across age (primary school children and adults) and languages (shallow and deep orthographies), including a consideration of bilingualism (in children).

In Experiment 1, we re-analyze novel and published pseudoword reading aloud data from different languages (Italian, German and English) which are on different points along the orthographic depth continuum. In Experiments 2, 3 and 4 we report new data from different age groups. According to the Orthographic Depth Hypothesis [32], we expect that readers of shallow orthographies (like Italian, and, to a lesser degree, German) will be associated overall with low Entropy values, because the very predictable and consistent GPCs of their orthography should prevent the possibility of many different alternative pronunciations for pseudowords.

Readers of deep orthographies (like English) will be more likely to be associated with higher Entropy values: this is because in deep orthographies different phonemes can be assigned to one grapheme, which translates to a higher probability that the same pseudoword will be read differently, depending on which phonemes the individual decides to assign to the graphemes contained in the given pseudoword.

A second prediction concerns age. Adults, as well as children from different grades (2, 3 and 4), participated in this study. We expected that, overall, children would show greater variability in responses than adults in all language groups (with the exception of Italian, for which we only have data from children), because their reading skills are still developing, that is, their knowledge of grapheme-phoneme mappings is still incomplete. Hence, children may assign a greater number of phonemes to a given grapheme, because of greater uncertainty regarding GPCs. A direct comparison will be made among monolingual German children in grades 2, 3 and 4 to investigate whether younger children show greater response variability than older children. Overall, we expected grade 2 children’s responses to show higher Entropy values than those of grade 3 and grade 4 children, and grade 3 children’s responses to show higher Entropy values than those of grade 4 children.

With respect to bilingualism, as discussed earlier in the introduction, we believe that two outcomes may be possible. If it is true that grapheme-phoneme correspondences from one language interfere with the reading of the other language, we would expect higher Entropy values in bilingual children’s responses. However, if it is true that enhanced metalinguistic awareness in bilinguals leads to enhanced reading skills compared to monolinguals, we would expect that, on the contrary, bilingual children’s responses will be associated with lower Entropy values compared to monolinguals.

Experiment 1: Entropy in German and English adults reading matched pseudowords

In the first experiment, we re-analyzed pseudoword reading aloud data from a previously published study [33]. This study aimed to compare the nature of sublexical processing in English and German. The items were chosen such that they were matched on orthographic characteristics, such as the number of letters and orthographic neighborhood. In the published study, only RT data were analyzed. Here, we are extending the published data by providing new insights into the role of Entropy on pseudoword reading in German and English.

Methods

Participants.

German (n = 19) and Australian (n = 48) adults participated in this study. All were staff or students at universities in Germany and Australia, respectively, and received course credit or a small monetary compensation for their participation. The procedure was approved by the ethics committees of both Macquarie University, Australia (Macquarie University Faculty of Human Sciences (FHS) Ethics Committee) and Ludwig-Maximilian University, Germany (Ethikkommission bei der Medizinischen Fakultät der LMU München).

Materials.

Participants read aloud pseudowords in their respective language, which were chosen with respect to the size of their body-neighborhood (see [34]). The size of the body-neighborhood (body-N) for all items was measured using the CELEX database, which is available for both German and English. In the original experiment, participants read aloud both words and pseudowords (in their respective languages) varying in body-N while being matched across body-N conditions on length and orthographic neighborhood. Here, we analyze only the pseudoword data. The pseudowords were monosyllabic and matched on the number of letters and orthographic neighborhood [35], as well as on body-neighborhood [34]. Moreover, all items had consistent bodies (i.e., while the number of body-neighbors was manipulated, all body-neighbors had the same pronunciation). Altogether, there were 90 English and 90 German pseudowords, half of which contained high-frequency bodies and the other half low-frequency bodies.

Procedure.

Each participant was tested individually in a dimly lit, sound-proof testing booth. Each item was shown on the screen for 5 seconds or until the voicekey was triggered, in random order. The items were presented, one at a time, using the software DMDX [55], which created audio recordings for each participant and each item. Here, we analyze only the pseudoword reading aloud responses. A native speaker of each language transcribed the participants’ responses from the recorded audio files, and a scorer who had received training in the phonology of the respective language scored the pronunciation accuracy. The scorers were told to follow a lenient marking criterion, that is, all legally possible grapheme-phoneme relations (including context-inappropriate relations) were considered correct [23, 36, 37]. We then calculated the Entropy for each pseudoword, using the formula described in the introduction, and analyzed the data using the statistical environment R [38]. Afterwards, as an additional analysis, we accounted for non-plausible pronunciations and random noise (meaningless misreadings, such as “dolt” read as /bolt/) by calculating the Levenshtein distance [39] between the most common reading of a given pseudoword and each of its alternative readings. We normalized the distances obtained (by dividing the distance by the number of phonemes) so that they could be compared with one another. Since our shortest items had three letters, we decided to exclude all pronunciations whose normalized Levenshtein distance was higher than 0.334. With the resulting, reduced datasets, we then re-calculated Entropy and re-ran the statistical tests (this re-analysis will be referred to from now on as the “pronunciation plausibility analysis”). The Python scripts which we used to calculate the Entropy values, as well as supplementary files, can be found here: https://osf.io/94wjx/.
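
The plausibility filter described above can be sketched in a few lines. This is an illustration under stated assumptions, not the script from the OSF repository: the function names are hypothetical, each character of a transcription is treated as one phoneme, and the distance is divided by the length of the most common reading (the text says only "the number of phonemes").

```python
from collections import Counter


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def plausible_responses(responses, threshold=0.334):
    """Keep responses whose normalized distance to the modal reading is <= threshold."""
    modal = Counter(responses).most_common(1)[0][0]
    kept = []
    for response in responses:
        # Assumption: normalize by the length of the most common reading.
        distance = levenshtein(modal, response) / len(modal)
        if distance <= threshold:
            kept.append(response)
    return kept


# Hypothetical transcriptions for one three-phoneme item.
print(plausible_responses(["wap", "wap", "wop", "walp", "bok"]))
```

For a three-phoneme item, one edit gives a normalized distance of 1/3 ≈ 0.333, which stays below the 0.334 cutoff, while two or more edits are excluded. This is consistent with the cutoff being motivated by the three-letter items.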

Results and discussion

Non-responses (1 trial from the German data, 6 trials from the English data) were excluded before calculating the Entropy. For German, the median of the Entropy value, across all items, was 0.48 (min = 0, max = 2.21), and for English, the median Entropy was 0.39 (min = 0, max = 1.96).

As the Entropy measure is still relatively new to the field of pseudoword reading, the first question we asked was whether Entropy for each item depends on random or systematic factors. As the English sample was larger than the German sample, we randomly split the English sample 25 times into two groups of 24 participants each, and calculated the item-level Entropy for each item for the two different sub-samples. The mean of the correlations between the two halves across the 25 splits (fifty sub-samples in total) was 0.89, with a standard deviation of 0.02. All of the correlations, r(90), were significant at p < 0.001.
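
The split-half check can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the data layout (`data[participant][item]`), the function names, the natural logarithm and the toy responses are all assumptions made for the example.

```python
import random
from collections import Counter
from math import log, sqrt


def entropy(responses):
    """Item-level Entropy (natural log assumed) of a list of transcriptions."""
    total = len(responses)
    return -sum((c / total) * log(c / total) for c in Counter(responses).values())


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_correlations(data, n_splits=25, seed=0):
    """Split participants at random into two halves, n_splits times.

    For each split, item-level Entropy is computed per half and the two
    lists of item Entropies are correlated. Returns one r per split.
    """
    rng = random.Random(seed)
    participants = sorted(data)
    items = sorted(data[participants[0]])
    correlations = []
    for _ in range(n_splits):
        shuffled = participants[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        groups = (shuffled[:half], shuffled[half:])
        per_group = [[entropy([data[p][item] for p in group]) for item in items]
                     for group in groups]
        correlations.append(pearson(per_group[0], per_group[1]))
    return correlations


# Made-up toy data: 8 participants, 3 items with increasing variability.
toy = {p: {"drell": "drel",
           "wap": ["wæp", "wɔp"][p % 2],
           "sind": ["sɪnd", "saɪnd", "sind", "zɪnd"][p % 4]}
       for p in range(8)}
rs = split_half_correlations(toy)
print(sum(rs) / len(rs))  # mean split-half correlation for the toy data
```

A high mean correlation across splits indicates that an item's Entropy is stable across independent groups of readers, which is the sense in which the measure depends on systematic rather than random factors.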

The second question was whether and how Entropy correlated with accuracy. Two scorers scored English pronunciation accuracy, while one scorer scored German pronunciation accuracy. We then calculated a correlation matrix between Entropy, accuracy, number of answers and percentage of the most common responses for both groups. Table 2 shows the results for English-speaking participants, while Table 3 shows the results for German-speaking participants. The agreement between scorers was calculated with Cohen’s kappa to measure inter-rater reliability [40]. Results show that, for the English data, the scorers were in moderate agreement (κ = 0.57).
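
For readers unfamiliar with Cohen's kappa, the statistic compares the observed agreement between two scorers with the agreement expected by chance from their marginal label frequencies. The sketch below uses a hypothetical function name and made-up accuracy scores (1 = correct, 0 = incorrect); it does not use the study's data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same responses."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance, from each rater's marginal label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up scores for ten responses.
scorer_1 = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
scorer_2 = [1, 1, 0, 0, 1, 1, 1, 1, 0, 1]
print(cohens_kappa(scorer_1, scorer_2))
```

In this toy case the scorers agree on 8 of 10 responses, but chance agreement is already 0.58, so kappa is about 0.52. This shows how a moderate kappa can coexist with a fairly high raw agreement rate when most responses are scored as correct.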

Table 2. Intercorrelations for English-speaking participants (Exp 1).

https://doi.org/10.1371/journal.pone.0251629.t002

Table 3. Intercorrelations for German-speaking participants (Exp 1).

https://doi.org/10.1371/journal.pone.0251629.t003

For English, Entropy was weakly correlated with accuracy, significantly so for scorer 2 (r = 0.26, p < 0.05) but not for scorer 1 (r = 0.04, p = 0.70). This result was unexpected: Entropy was expected to be correlated negatively with accuracy, because it was calculated based on the number of pronunciations. This suggests that scorers were more likely to accept several alternative pronunciations as correct for English than for German, with the latter showing a negative correlation (r = −0.34, p < 0.05).

As expected, we found a significant positive correlation between Entropy and the number of pronunciations per English pseudoword (r = 0.73, p < 0.001), showing that items with a high Entropy received more different pronunciations than items with a low Entropy, and a significant negative correlation with the percentage of the most common pronunciation (r = −0.86, p < 0.001). In German participants, Entropy correlated negatively with the accuracy scoring (r = −0.34, p < 0.05). This is more in line with what we would expect: when accuracy is high, Entropy is naturally low. However, since we could not recruit a second scorer for the German data, the reliability of this correlation remains to be seen. For the other measures, Entropy correlated positively with the number of pronunciations (r = 0.92, p < 0.001) and negatively with the percentage of the most common response (r = −0.94, p < 0.001).

The third, theoretically relevant question was whether or not the observed Entropy differed between the English and German readers. To visualize the distribution of the Entropy values, we generated a density plot of the English and German Entropy values (see Fig 1). Fig 1 shows that the distribution is right-skewed, with many items having an Entropy value close to zero. Therefore, we performed a Mann-Whitney test, with language as a predictor of Entropy. The difference in Entropy between English and German was not significant, W = 3710, p = 0.33, 95% CI = [−0.15, 0.10]. The pronunciation plausibility analysis confirmed the non-significance of the original analysis: W = 3689, p = 0.29, 95% CI = [−0.15, 0.09].
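
To make the test statistic concrete, the sketch below computes the Mann-Whitney U statistic by direct pairwise comparison. The function name and the toy Entropy values are hypothetical, and no p-value is computed; as far as we are aware, this U for the first sample is the quantity that R's `wilcox.test` reports as W, but which language served as the first sample in the reported analysis is not stated in the text.

```python
def mann_whitney_u(sample_x, sample_y):
    """U statistic for sample_x: number of (x, y) pairs with x > y, ties counted as 0.5."""
    u = 0.0
    for x in sample_x:
        for y in sample_y:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Made-up item-level Entropy values for two small sets of items.
group_a = [0.0, 0.39, 0.5]
group_b = [0.0, 0.48, 1.2]
print(mann_whitney_u(group_a, group_b))
print(mann_whitney_u(group_b, group_a))
```

The two U values always sum to the number of pairs. With 90 items per language there are 90 × 90 = 8100 pairs, so a value of 4050 would indicate no tendency for either language to have higher Entropy; the reported W = 3710 lies close to that midpoint.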

Fig 1. Distribution of Entropy values for German and English adults.

https://doi.org/10.1371/journal.pone.0251629.g001

Tables 2 and 3 in S1 Appendix show the participants’ pronunciations of the ten items with the highest Entropy values. Participants mistakenly read some pseudowords as real words, but there was no significant difference in the number of real-word pronunciations between German (m = 0.05, sd = 0.22) and English adults (m = 0.02, sd = 0.16): p = 0.18. A list can be found in Table 1 in S1 Appendix.

Both in English and in German, we found a non-normal distribution of Entropy values, with many Entropy values being close to zero (suggesting consistent pronunciations across participants). Thus, even in the English orthography, despite a number of items which result in a high degree of variability of responses, there is often a consensus about how to pronounce a given item (see also [20] for a similar conclusion). Mousikou, Sadat, Lucas and Rastle [20] argue that this agreement in English pseudoword pronunciation, despite the inconsistency of its orthography, can be explained by the influence that a pseudoword’s orthographic neighbors have on its pronunciation (for example, key could interfere with the pronunciation of kuy), and by the fact that, even if a grapheme maps onto several phonemes (<i> can be read as /ai/, /ɪ/ or /ɜ:/), participants will tend to pronounce it with the phoneme that is most frequently associated with it. For example, participants read the pseudoword “dize” mostly as /daiz/ (14 participants) and less often as /dɪze/ (5 participants).

In German, the analysis of the ten items with the highest Entropy values revealed a few phonotactic rules that were not systematically applied to pseudowords. For example, final consonant devoicing, which normally makes a voiced final consonant voiceless in words (e.g., Rad, "bike", is read as /rat/), was not always applied: the pseudoword gund was read as gunt only half of the time. Two context-sensitive correspondences also triggered higher Entropy values. The first concerns the pronunciation of the grapheme <s> before the grapheme <p>. Normally, in words like Sport, the <s> is read as /ʃ/. However, in our data, participants read pseudowords like sprau as either /ʃprau/ or /sprau/. Similarly, the grapheme <n> before a final <g> should yield the phoneme /ŋ/, but participants' productions in pseudowords like quang varied among /ŋ/, /ŋg/ and final /n/.

The present cross-linguistic comparison did not reveal differences in Entropy between English and German. Previous studies have found differences in accuracy as a function of orthographic depth (e.g., [21]). Since low accuracy should be reflected in high Entropy, we expected to find higher Entropy values in English than in German. However, most previous reading aloud studies were conducted with children [23–25]. Adult studies have often used lenient marking criteria, and accuracy tends to reach ceiling. Thus, there is little evidence to suggest that cross-linguistic differences in accuracy or pronunciation variability persist into adulthood. The current analysis overcomes this limitation by using Entropy instead of a lenient marking criterion and suggests that, in adulthood, orthographic depth has a minimal influence on the heterogeneity of pseudoword reading aloud responses.

Experiment 2: Entropy in German monolingual children and German/English bilingual children

The aim of the second experiment was to test whether there were differences in Entropy in a younger population, that is, in primary school children. Although the results of Experiment 1 show that the Entropy of pseudoword reading aloud responses did not differ between German and English-speaking adults, this does not rule out that Entropy differences may exist between German and English-speaking primary school children who are still in the process of learning to read. Entropy differences in adults may be washed out because the skilled reading system has already established an optimal prediction system for letter-sound correspondences, which may not yet have reached the same level of precision in developing readers. Experiment 2 put this hypothesis to the test by acquiring data from monolingual German children and German/English bilingual children in grades 2, 3, and 4 reading matched pseudowords both in German and in English. This allowed us to compare Entropy within the same items across grades (in German monolingual children) and across orthographies within the same participants.

Overall, we predicted higher Entropy in younger than in older children, because their knowledge of the GPCs may not be fully developed, which could lead to a greater level of noise in their decisions about how to pronounce a given GPC [12]. Moreover, Entropy was expected to be higher for the English than for the German items, because the depth of English may make it more difficult for children to learn the GPCs. Such a finding would be in line with previous studies suggesting that pseudoword reading aloud accuracy is lower in English than in shallower orthographies (e.g., [21]). Finally, we hypothesized that Entropy may be higher in bilingual than in monolingual children, because the knowledge of GPCs within one language may interfere with pseudoword reading aloud responses in the other language [30].