About The Speaker

Yi Xu
Yi Xu is Professor of Speech Sciences at the Department of Speech, Hearing and Phonetic Sciences, Division of Psychology and Language Sciences, University College London. He received his BA in English language and literature in 1981 from Shandong University, China, his MA in phonetics in 1984 from the Institute of Linguistics, Chinese Academy of Social Sciences, China, and his Ph.D. in linguistics in 1993 from the University of Connecticut, USA. He has taught at Northwestern University and the University of Chicago in the USA. His research covers speech production, speech perception, speech prosody and computational modelling of tone, intonation and speech acquisition.
Main areas of research
Cognitive and computational psychology
Linguistics
Neurosciences
Applied and developmental psychology
Biological psychology
Artificial intelligence
How does human language become digital?
It is by now widely recognized that human language is a digital system, and it is this that gives language the power to form an infinite number of expressions by combining a finite set of discrete units. What is less clear, however, is how language becomes digital. One influential view is that digitization starts at the level of syntax, where discrete lexical items are combined into an infinite number of “hierarchically structured expressions”. These expressions are then externalized at the sensorimotor interface and turned into audible speech (Chomsky, 2013). The problem with this view, however, is that it overlooks two basic facts: a) the number of words in any language is enormously large, and b) words are themselves composed of discrete units, including morphemes, syllables and phonemes.
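To make the combinatorial point concrete, here is a minimal sketch (my illustration, not part of the talk): even a handful of discrete units yields a large inventory of forms, and allowing sequences of unbounded length makes the set of possible expressions infinite. The unit inventories below are invented for illustration.

```python
# Toy illustration of the combinatorial power of discrete units
# (illustrative example, not the speaker's analysis).
from itertools import product

consonants = ["p", "t", "k", "m", "n"]   # 5 discrete consonants (invented)
vowels = ["a", "i", "u"]                 # 3 discrete vowels (invented)

# Combining the finite sets yields all possible CV syllables.
syllables = ["".join(cv) for cv in product(consonants, vowels)]
print(len(syllables))                    # 15 CV syllables

# Words as syllable sequences: the count grows exponentially with length,
# so with no bound on length the set of possible forms is infinite.
for n in range(1, 5):
    print(n, len(syllables) ** n)        # 15, 225, 3375, 50625
```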
In this talk, I will argue that, instead of syntax, phonetics is the level at which language is digitized. That is, rather than being an interface for externalizing a digital syntactic core, phonetics is what makes language digital in the first place. I will show that phonetic digitization is achieved through the discretization of consonants, vowels and laryngeal categories in a self-organizing process of interactive articulatory encoding and perceptual decoding. The encoding is enabled by the syllable, an articulatory mechanism that synchronizes the onsets of consonantal, vocalic and laryngeal gestures. The decoding is done by directly processing the acoustic signal of each phonetic category, without separately extracting abstract cues. Both production and perception skills are acquired through learning, a process that resolves variability arising from coarticulation, speaker differences and multi-functional interactions. Of the two, perceptual learning is much easier, as it requires no prior knowledge of articulation; production learning, in contrast, relies on the guidance of perception. The digitization of language is therefore realized by maximizing production-perception consistency only at the level of contrastive phonetic categories.
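As a rough intuition for how discrete categories might self-organize out of variable acoustic input, consider the following toy sketch (my own illustration under strong simplifying assumptions, not the speaker's model): unsupervised clustering of simulated vowel tokens in F1-F2 formant space recovers discrete categories despite token-to-token variability. The formant values, the noise level and the choice of k-means are all illustrative assumptions.

```python
# Toy sketch (not the speaker's model): discrete vowel categories emerging
# from continuous, variable acoustic tokens via unsupervised clustering in
# F1-F2 formant space. All numeric values are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean formants (F1, F2 in Hz) for three vowel categories.
prototypes = np.array([[300.0, 2300.0],   # roughly /i/-like
                       [700.0, 1200.0],   # roughly /a/-like
                       [350.0,  800.0]])  # roughly /u/-like

# Each spoken token varies (coarticulation, speaker differences, etc.).
tokens = np.vstack([p + rng.normal(0.0, 60.0, size=(200, 2)) for p in prototypes])

# Plain k-means: the "listener" discovers discrete categories with no labels.
centers = tokens[rng.choice(len(tokens), size=3, replace=False)]
for _ in range(50):
    # Assign every token to its nearest current category center.
    labels = np.argmin(np.linalg.norm(tokens[:, None] - centers, axis=2), axis=1)
    # Move each center to the mean of its assigned tokens (keep it if empty).
    centers = np.array([tokens[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(3)])

print(np.round(centers))  # recovered centers lie close to the prototypes
```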
In summary, it is the discretization of phonetic categories that creates the basic digits of language, which can then be combined, often recursively, into the morphemes, words, phrases and sentences that make up the hierarchical structure of all human languages.