About
Vaclav Brezina
Lancaster University

Main areas of research
- Corpus linguistics
- Statistics
- Applied linguistics
Bio
My main research interests are corpus linguistics, statistics, and the application of corpus methods to the study of speech and writing, learner language, collocations, phraseology and vocabulary. I am also interested in corpus design and the development of corpus tools.
Recent works
Corpus linguistics and AI: #LancsBox X in the context of emerging technologies – 2025 – V Brezina
Adjective + noun collocations in L2 spoken English: How robust is the role of proficiency? – 2025 – D Gablasova, V Brezina
Developing a coding scheme for annotating opinion statements in L2 interactive spoken English with application for language teaching and assessment – 2025 – Y Jung, D Gablasova, V Brezina, H Schmück
Keynote Lecture
How to Identify Multi-Word Expressions in Corpora?
Outline
Multi-word expressions (MWEs) such as "of course", "in the light of", "as good as new" or "take into account" are central to fluent communication, yet relatively difficult to identify systematically. This lecture draws on the forthcoming Frequency Dictionary of Multi-Word Expressions in British English (Brezina & Gablasova, Routledge, 2026) to present a clear, corpus-based method for analysing a wide range of MWEs across genres.
I begin by contrasting corpus evidence with current AI language models. Modern AI generates language by answering a simple question, "What is the next word?", a principle long used in corpus linguistics to measure collocation. Yet while AI can imitate fluent usage, corpora remain more transparent and reliable for identifying MWEs because they trace patterns directly to authentic human interaction in speech and writing.
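The "next word" principle can be illustrated with transitional probability: the share of a node word's occurrences that are followed by a given word. The sketch below is a minimal illustration under stated assumptions (whitespace tokenisation, a toy sentence, and a hypothetical function name `next_word_probabilities`); it is not presented as the method used in the dictionary or in #LancsBox X.

```python
from collections import Counter


def next_word_probabilities(tokens: list[str], node: str) -> dict[str, float]:
    """Hypothetical helper: estimate P(next word | node) from a token list."""
    # Count every adjacent word pair (bigram) in the text
    bigrams = Counter(zip(tokens, tokens[1:]))
    # Count the node only where a following word exists, so the values sum to 1
    node_count = sum(1 for word in tokens[:-1] if word == node)
    if node_count == 0:
        return {}
    return {
        following: count / node_count
        for (first, following), count in bigrams.items()
        if first == node
    }


tokens = "of course it is of course not of interest".split()
print(next_word_probabilities(tokens, "of"))  # course: 2/3, interest: 1/3
```

A language model estimates the same kind of conditional probability, but over learned representations; here every value can be traced back to the counted pairs in the text.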
The lecture outlines a practical framework combining frequency, association strength and dispersion to capture the core phraseology of contemporary British English. Examples from the dictionary illustrate how this method reveals stable, meaningful MWEs and supports applications in language teaching, language testing, lexicography and applied linguistic research. The central claim is this: AI can model language, but corpora allow us to understand it.
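The abstract does not specify which measures the dictionary uses, so the sketch below assumes three common choices to show how frequency, association strength and dispersion can work together as filters: raw frequency, logDice (Rychlý's association measure, maximum 14) and Gries's DP (0 for perfectly even distribution across texts, approaching 1 for concentration in few texts). The function name `find_bigram_mwes`, the thresholds and the toy texts are hypothetical illustrations, not values from the dictionary.

```python
import math
from collections import Counter


def find_bigram_mwes(texts: list[list[str]], min_freq: int = 3,
                     min_logdice: float = 7.0, max_dp: float = 0.5):
    """Hypothetical filter: keep bigrams passing frequency, logDice and DP cut-offs.

    texts: one token list per text; the texts are the corpus parts for DP.
    Thresholds are illustrative placeholders only.
    """
    unigrams, bigrams, per_text = Counter(), Counter(), []
    for tokens in texts:
        unigrams.update(tokens)
        counts = Counter(zip(tokens, tokens[1:]))
        bigrams.update(counts)
        per_text.append(counts)  # bigram counts per text, used for dispersion

    total_tokens = sum(len(t) for t in texts)
    if total_tokens == 0:
        return []
    # Expected share of occurrences in each text = its relative size in tokens
    expected = [len(t) / total_tokens for t in texts]

    results = []
    for (w1, w2), freq in bigrams.items():
        if freq < min_freq:  # 1) frequency
            continue
        # 2) association strength: logDice = 14 + log2(2 f(xy) / (f(x) + f(y)))
        log_dice = 14 + math.log2(2 * freq / (unigrams[w1] + unigrams[w2]))
        # 3) dispersion: DP = 0.5 * sum |observed share - expected share|
        observed = [counts[(w1, w2)] / freq for counts in per_text]
        dp = 0.5 * sum(abs(o - e) for o, e in zip(observed, expected))
        if log_dice >= min_logdice and dp <= max_dp:
            results.append((f"{w1} {w2}", freq, round(log_dice, 2), round(dp, 2)))
    # Most frequent candidates first
    return sorted(results, key=lambda row: row[1], reverse=True)


texts = [
    "of course we will take into account the cost".split(),
    "we take into account what they say of course".split(),
    "of course the cost was high".split(),
]
for mwe, freq, log_dice, dp in find_bigram_mwes(texts, min_freq=2):
    print(f"{mwe}: freq={freq}, logDice={log_dice}, DP={dp}")
```

Because the sketch counts only contiguous two-word sequences, longer expressions such as "take into account" would need n-gram extraction beyond bigrams, and the cut-off values would have to be tuned against real corpus data.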
