Research


My research focuses on how people understand speech. I’m interested in how understanding speech influences our perception of low-level acoustic input and vice versa. I am particularly interested in how learning processes, such as discriminative error-driven learning, are involved in speech comprehension, and in the neural underpinnings of prediction, speech perception and learning. I'm also interested in eye movements as a window onto the cognitive processes involved in speech perception and reading. In my PhD work, I presented a new method for the analysis of 'Visual World' eyetracking data, namely Generalised Additive Mixed Models (GAMMs). These nonlinear mixed-effects regression models are highly valuable for the analysis of time series data such as eyetracking data: they allow for nonlinear effects and interactions, as well as potentially nonlinear random effects, while also handling the non-independence of data points and potential autocorrelation in time series.

Selected representative publications

See Publications for more

 

Of mice and men: Speech sound acquisition as discriminative learning from prediction error, not just statistical tracking

Jessie S. Nixon

Cognition, 2020

Despite burgeoning evidence that listeners are highly sensitive to statistical distributions of speech cues, the mechanism underlying learning may not be purely statistical tracking. Decades of research in animal learning suggest that learning results from prediction and prediction error. Two artificial language learning experiments test two predictions that distinguish error-driven from purely statistical models; namely, cue competition – specifically, Kamin’s (1968) ‘blocking’ effect (Experiment 1) – and the predictive structure of learning events (Experiment 2).

In Experiment 1, prior knowledge of an informative cue blocked learning of a second cue. This finding may help explain second language learners’ difficulty in acquiring native-level perception of non-native speech cues. In Experiment 2, learning was better with a discriminative (cue–outcome) order than with a non-discriminative (outcome–cue) order. Experiment 2 suggests that learning speech cues, including reversing effects of blocking, depends on (un)learning from prediction error and on the temporal order of auditory cues versus semantic outcomes.

Together, these results show that (a) existing knowledge of acoustic cues can block later learning of new cues, and (b) speech sound acquisition depends on the predictive structure of learning events. When feedback from prediction error is available, this drives learners to ignore salient non-discriminative cues and effectively learn to use target cue dimensions. These findings may have considerable implications for the field of speech acquisition.
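The cue-competition logic behind blocking can be sketched with the Rescorla–Wagner update rule, ΔV = αβ(λ − ΣV). The toy simulation below (parameter values and trial counts are invented for illustration, not those of the experiments) shows why a pretrained cue A leaves little prediction error for a later cue B to learn from:

```python
# Illustrative Rescorla-Wagner simulation of Kamin's blocking effect.
# On each trial, every cue present is updated by alpha * beta * (lambda - V_total).

def train(trials, alpha=0.1, beta=1.0, lam=1.0, V=None):
    """Update associative strengths V over a list of trials.
    Each trial is a set of cues that co-occur with the outcome."""
    V = dict(V or {})
    for cues in trials:
        total = sum(V.get(c, 0.0) for c in cues)  # summed prediction
        error = lam - total                       # prediction error
        for c in cues:
            V[c] = V.get(c, 0.0) + alpha * beta * error
    return V

# Blocking group -- Phase 1: cue A alone predicts the outcome.
V = train([{"A"}] * 50)
# Phase 2: the compound A+B predicts the same outcome.
V = train([{"A", "B"}] * 50, V=V)

# Control group: only the compound phase, no pretraining on A.
V_control = train([{"A", "B"}] * 50)

print(V["B"], V_control["B"])
```

In the pretrained group, B ends with almost no associative strength, whereas in the control group A and B share it roughly equally: the classic blocking pattern.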

Photo by Jason Rosewell, Unsplash


 

Photo by Azmaan Baluch on Unsplash

The temporal dynamics of perceptual uncertainty: eye movement evidence from Cantonese segment and tone perception

Jessie S. Nixon, Jacolien van Rij, Peggy Mok, R. Harald Baayen and Yiya Chen

Journal of Memory and Language, 2016

Since its introduction in the late 1990s, the ‘visual world’ eyetracking paradigm (VWP) has come to be highly valued in linguistic and related research as a means of investigating moment-by-moment online processing. However, the very aspect that makes the visual world paradigm so exciting as a research tool – tracking looking behaviour over time – also creates challenges for statistical analysis. Several papers in the late 2000s raised issues with the analysis methods typically used for analysing VWP data (e.g. Barr, 2008; Mirman et al., 2008). In this paper, we present a new solution to these issues. We introduce the use of Generalised Additive Mixed Models (GAMMs) for the analysis of VWP data. As GAMMs have not previously been used to analyse VWP data, we provide a brief introduction to the use and benefits of GAMMs for analysing time series data. We also introduce a new measure of uncertainty (i.e. a new response variable), Euclidean distance from the target and competitor.

Two visual world eyetracking experiments investigated how acoustic cue value and statistical variance affect perceptual uncertainty during Cantonese consonant (Experiment 1) and tone perception (Experiment 2). Participants heard low- or high-variance acoustic stimuli. Euclidean distance of fixations from target and competitor pictures over time was analysed using Generalised Additive Mixed Modelling. Distance of fixations from target and competitor pictures varied as a function of acoustic cue, providing evidence for gradient, nonlinear sensitivity to cue values. Moreover, cue value effects significantly interacted with statistical variance, indicating that the cue distribution directly affects perceptual uncertainty. Interestingly, the time course of effects differed between target distance and competitor distance models. The pattern of effects over time suggests a global strategy in response to the level of uncertainty: as uncertainty increases, verification looks increase accordingly. Low variance generally creates less uncertainty, but can lead to greater uncertainty in the face of unexpected speech tokens.
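As a rough illustration of the Euclidean-distance response variable, the sketch below computes the distance of a single gaze sample from target and competitor picture centres (all coordinates here are invented for illustration; they are not from the experiments):

```python
import math

def euclidean_distance(gaze, centre):
    """Distance in pixels between a gaze sample and a picture centre."""
    return math.hypot(gaze[0] - centre[0], gaze[1] - centre[1])

# Hypothetical screen layout: target picture top-left, competitor top-right.
target_centre = (256, 192)
competitor_centre = (768, 192)

gaze_sample = (300, 250)  # one (x, y) gaze sample
d_target = euclidean_distance(gaze_sample, target_centre)
d_competitor = euclidean_distance(gaze_sample, competitor_centre)
```

Computed per sample over the trial, these two distances give the continuous target-distance and competitor-distance curves that are then modelled with GAMMs.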

 

Prediction and error in early infant speech learning: A speech acquisition model

Jessie S. Nixon & Fabian Tomaschek

Cognition, 2021

In the last two decades, statistical clustering models have emerged as a dominant model of how infants learn the sounds of their language. However, recent empirical and computational evidence suggests that purely statistical clustering methods may not be sufficient to explain speech sound acquisition. To model early development of speech perception, the present study used a two-layer network trained with Rescorla-Wagner learning equations, an implementation of discriminative, error-driven learning.

The model contained no a priori linguistic units, such as phonemes or phonetic features. Instead, expectations about the upcoming acoustic speech signal were learned from the surrounding speech signal, with spectral components extracted from an audio recording of child-directed speech as both inputs and outputs of the model. To evaluate model performance, we simulated infant responses in the high-amplitude sucking paradigm using vowel and fricative pairs and continua. The simulations were able to discriminate vowel and consonant pairs and predicted the infant speech perception data. The model also showed the greatest amount of discrimination in the expected spectral frequencies. These results suggest that discriminative error-driven learning may provide a viable approach to modelling early infant speech sound acquisition.
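A minimal sketch of this kind of two-layer discriminative network is given below, using a Rescorla–Wagner/delta-rule update with continuous-valued 'spectral' vectors as both cues and outcomes. The toy input (random vectors paired with a deterministic transform of themselves) is only a stand-in for structure in real child-directed speech; the dimensionality, learning rate and input-generating scheme are all invented for illustration:

```python
import random

random.seed(0)

n = 8                     # number of spectral components (toy dimensionality)
eta = 0.01                # learning rate (alpha * beta)
W = [[0.0] * n for _ in range(n)]   # cue-to-outcome weights, no built-in units

def predict(cue):
    """Expected upcoming spectrum given the current spectral slice."""
    return [sum(cue[i] * W[i][j] for i in range(n)) for j in range(n)]

errors = []
for _ in range(1000):
    cue = [random.random() for _ in range(n)]
    outcome = cue[1:] + cue[:1]      # stand-in for structure in the signal
    pred = predict(cue)
    delta = [outcome[j] - pred[j] for j in range(n)]   # prediction error
    errors.append(sum(d * d for d in delta) / n)
    for i in range(n):               # Rescorla-Wagner / delta-rule update
        for j in range(n):
            W[i][j] += eta * cue[i] * delta[j]

print(sum(errors[:100]) / 100, sum(errors[-100:]) / 100)
```

Prediction error shrinks over training: the network discovers the regularities in the signal from prediction and error alone, with no a priori phonemes or features.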

Photos by Chien Pham and Irina Murza. Unsplash


 

Photo by Kaitlyn Baker on Unsplash

Keys to the future? An examination of statistical versus discriminative accounts of serial pattern learning

Fabian Tomaschek, Michael Ramscar & Jessie S. Nixon

Cognitive Science, 2024

Sequence learning is fundamental to a wide range of cognitive functions. Explaining how sequences—and the relations between the elements they comprise—are learned is a fundamental challenge to cognitive science. However, although hundreds of articles addressing this question are published each year, the actual mechanisms by which sequences are learned are rarely investigated. We present three experiments that seek to examine these mechanisms during a typing task. Experiments 1 and 2 tested learning during typing single letters on each trial. Experiment 3 tested for “chunking” of these letters into “words.” The results of these experiments were used to examine the mechanisms that could best account for them, with a focus on two particular proposals: statistical transitional probability learning and discriminative error-driven learning.

Experiments 1 and 2 showed that error-driven learning was a better predictor of response latencies than either n-gram frequencies or transitional probabilities. No evidence for chunking was found in Experiment 3, probably due to interspersing visual cues with the motor response. In addition, learning occurred across a greater distance in Experiment 1 than Experiment 2, suggesting that the greater predictability that comes with increased structure leads to greater learnability. These results shed new light on the mechanism responsible for sequence learning. Despite the widely held assumption that transitional probability learning is essential to this process, the present results suggest instead that the sequences are learned through a process of discriminative learning, involving prediction and feedback from prediction error.

 

Photo by Adam Winger on Unsplash

The perceptual span is dynamically adjusted in response to foveal load by beginning readers

Johannes M. Meixner, Jessie S. Nixon & Jochen Laubrock

Journal of Experimental Psychology: General, 2022

The perceptual span describes the size of the visual field from which information is obtained during a fixation in reading. Its size depends on characteristics of writing system and reader, but—according to the foveal load hypothesis—it is also adjusted dynamically as a function of lexical processing difficulty. Using the moving window paradigm to manipulate the amount of preview, here we directly test whether the perceptual span shrinks as foveal word difficulty increases.

We computed the momentary size of the span from word-based eye-movement measures as a function of foveal word frequency, allowing us to separately describe the perceptual span for information affecting spatial saccade targeting and temporal saccade execution. First fixation duration and gaze duration on the upcoming (parafoveal) word N + 1 were significantly shorter when the current (foveal) word N was more frequent. We show that the word frequency effect is modulated by window size. Fixation durations on word N + 1 decreased with high-frequency words N, but only for large windows, that is, when sufficient parafoveal preview was available. This provides strong support for the foveal load hypothesis. To investigate the development of the foveal load effect, we analyzed data from three waves of a longitudinal study on the perceptual span with German children in Grades 1 to 6. Perceptual span adjustment emerged early in development at around second grade and remained stable in later grades. We conclude that the local modulation of the perceptual span indicates a general cognitive process, perhaps an attentional gradient with rapid readjustment.

 

Age estimation in foreign-accented speech by non-native speakers of English

Dan Jiao, Vicky Watson, Sidney Gig-Jan Wong, Ksenia Gnevsheva & Jessie S. Nixon

Listeners are able to estimate speakers’ ages only approximately, with a mean estimation error of around ten years. Interestingly, accuracy varies considerably, depending on a number of social aspects of both speaker and listener, including age, gender and native language or language variety. The present study considers the effects of four factors on age perception. It investigates whether there is a main effect of speakers’ native language (Arabic, Korean and Mandarin) even when they are speaking a second language, English. It also investigates a particular speaker-listener relationship, namely the degree of linguistic familiarity. Linguistic familiarity was expected to be greater between Mandarin and Korean than between Mandarin or Korean and Arabic. In addition, it considers the effect of the acoustic cues of mean fundamental frequency (F0) and speech rate on age estimates.

Fifteen Arabic-accented, fifteen Korean-accented and twenty Mandarin-accented English speakers participated as listeners. They heard audio stimuli produced by forty-eight speakers, equally distributed between native Arabic, Korean and Mandarin speakers, reading a short passage in English. Listeners were instructed to estimate speakers’ ages in years. Listeners’ age estimates and reaction times were recorded.

Results indicate a significant main effect of speaker native language on perceived age such that Mandarin speakers were estimated to be younger than Arabic speakers. There was also a significant effect of linguistic familiarity on age estimation accuracy. Age estimates were more accurate with greater linguistic familiarity, i.e., native Korean and Mandarin listeners estimated ages of speakers of their own native languages more accurately than native Arabic speakers’ ages and vice versa. In terms of acoustic cues, mean F0 and speech rate were significant predictors of age estimation. These effects suggest that in perception, age may be marked not only by biological changes that occur over the lifetime, but also by language-specific socio-cultural features.

Photo by Cristina Gottardi, Unsplash


 

More publications

Background photo by Kamto Wong