Dirk Geeraerts
Dirk Geeraerts is a professor emeritus of linguistics at the University of Leuven, Belgium. His main research area involves the overlapping fields of lexical semantics, lexicology, and lexicography, with a theoretical focus on cognitive linguistics and a descriptive focus on lexical variation and change. A prominent member of the first generation of cognitive linguists, he played an instrumental role in the international expansion of cognitive linguistics, as the founder of the journal Cognitive Linguistics and as the editor (with Hubert Cuyckens) of the Oxford Handbook of Cognitive Linguistics. From 1995 to 2005, he was editor-in-chief, with T. den Boon, of the Van Dale Groot Woordenboek van de Nederlandse Taal. His publications include the following monographs:
1985 Paradigm and Paradox. Leuven: Leuven University Press
1994 (with S. Grondelaers and P. Bakema) The Structure of Lexical Variation. Berlin: Mouton de Gruyter
1997 Diachronic Prototype Semantics. Oxford: OUP
2000 (with S. Grondelaers and D. Speelman) Convergentie en divergentie in de Nederlandse woordenschat. Amsterdam: Meertens
2006 Words and Other Wonders. Papers on Lexical and Semantic Topics. Berlin: Mouton de Gruyter
2010 Theories of Lexical Semantics. Oxford: OUP
2017 Conceptual Structure and Conceptual Variation. Shanghai: Foreign Language Education Press
2018 Ten Lectures on Cognitive Sociolinguistics. Leiden: Brill
2024 (with D. Speelman, K. Heylen, M. Montes, S. De Pascale, K. Franco and M. Lang) Lexical Variation and Change. A Distributional Semantic Approach. Oxford: OUP
Lexical Semantics, Lexicography and LLMs
In an attempt to provide some background for the overall theme of the Euralex 21 conference, the talk explores the position of LLMs in the field of lexical studies from two angles, a lexicographic and a theoretical linguistic one.
From a theoretical perspective, LLMs currently constitute the epitome of distributional semantics, and distributional semantics (for the reasons that I specified in Theories of Lexical Semantics, 2010) is eminently suited as a methodological basis for usage-based cognitive semantics, allowing for a convergence of major theoretical trends in lexical semantics. But given that LLMs have taken distributional semantics well beyond the shape it had in 2010, does that evaluation still hold?
For the lexicographical perspective, I will first draw attention to the too often ignored process through which lexicography not only gave a major descriptive impetus to the development of corpus linguistics, but also specifically contributed to an essential step in the emergence of computational methods for corpus research: LLMs are a tool with at least to some extent lexicographic roots. But again, given that the tool has grown well beyond its original format, how does that affect its relationship to lexicography?
Tony Veale
Tony Veale is the outgoing chair of the International Association for Computational Creativity, and the author of several monographs on the topic of creative language generation, including Exploding The Creativity Myth: The Computational Foundations of Linguistic Creativity (Bloomsbury, 2012), Twitterbots: Making Machines That Make Meaning (with Mike Cook; MIT Press, 2017) and Your Wit Is My Command: Building AIs with a Sense of Humor (MIT Press, 2021). He has researched the crossover between AI and language for three decades in academia and in industry. He teaches Computational Creativity and Generative AI as an associate professor in UCD’s school of Computer Science.
YOU TALK FUNNY! ONE DAY ME TALK FUNNY TOO: Investigating the Capacity of Large Language Models for Humour
Recent developments in the scaling and training of large language models (LLMs) have led to a dramatic change in how the public views Artificial Intelligence. No longer the vaguely aspirational preserve of science fiction stories, AI is now expected to work, and not just in the laboratory but in a wide range of consumer products. Yet as AI outperforms people on tasks that were once considered yardsticks of human intelligence, one area of human experience still holds out, for now at least: our very human sense of humour. This is not for want of trying, as this talk will show. There is good reason for computer science to take humour seriously. By building computer systems with a sense of humour, capable of appreciating the jokes of human users or even of generating jokes of their own, we can turn academic theories into practical realities that amuse, explain, provoke, and delight. The writer Clive James once pronounced that one should not trust anyone lacking a sense of humour, even, indeed, to post a letter, for what is humour but our sense of equanimity and poise in the face of the unpredictable when common sense has been pushed to the brink? My talk will describe where researchers are on this road to more humorous machines, and explore how we might go further towards giving LLMs a robustly human funny bone. The talk will also cover related issues such as acceptability and value alignment in LLMs, since humour often pushes the bounds of what is socially acceptable in polite company.
Kory Stamper
Kory Stamper has been a lexicographer for twenty-six years. Her career began at Merriam-Webster, where she was trained in lexicography by E. Ward Gilman, and continued at Cambridge Dictionaries and Dictionary.com, where she was most recently the Senior Editor of Lexicography leading a team of lexicographers and thesaurists. She has written dictionaries and thesauruses for native speakers of English and English-language learners. She specializes in the analysis and reorganization of lexical data inside dictionaries and thesauruses with the goal of presenting salient lexicographical information more easily to a wide variety of users. Her writing on language and lexicography for general audiences has appeared in The New York Times, the Guardian, the Washington Post, and the Times Literary Supplement, and in her best-selling book Word By Word: The Secret Life of Dictionaries. She is the current president of the Dictionary Society of North America, and a member of the Language Council of the Miami Nation of Indiana, where she practices relational lexicography in the revitalization and recording of the Eastern Myaamia dialect.
Case Studies in the Successes and Limits of Frame Semantics in Practical Lexicography
Practical, commercial lexicography in the United States, in particular, is a field that relies heavily on tradition, and it has been loath to abandon the tried-and-true methods of corpus creation, analysis, and defining that have been established since the time of Murray. Yet frame semantics has provided a broader lens through which the practical lexicographer can view meaning, and its integration (though slow) into the practice of lexicography has yielded defining methods that are more user-oriented while giving the lexicographer tools to move beyond their own unconscious or implicit biases, something that is increasingly important in successful modern lexicography. But technological and social changes in the last several decades – the ease with which mis- and disinformation moves into the mainstream, the rise of generative AI and the regular presentation of generated text as natural language, the proliferation of varieties of English accessible to the lexicographer that are sometimes themselves removed from context, and the changing ways in which online dictionaries are used – have presented difficulties to the practical lexicographer who seeks to integrate frame semantics deeper into their practice. This paper will present case studies on the successes of integrating frame semantics into lexicographical practice, and the current challenges that lexicographers face when the “frame” itself is illusory, shifting, or debated.
Tiago Torrent
Tiago Timponi Torrent is a Cognitive Linguist working on Multimodal Natural Language Processing within the framework of Frame Semantics and Construction Grammar. He is the head of the FrameNet Brasil Computational Linguistics Lab, PI of ReINVenTA – Research and Innovation Network for Video and Text Analysis of Multimodal Objects – and Professor of the Graduate Program in Linguistics at the Federal University of Juiz de Fora, Brazil. He is a Research Productivity Grantee of the Brazilian National Council for Scientific and Technological Development (CNPq), and winner of the 2021 edition of the Technology in Linguistic Research Award of the Brazilian Linguistics Association (ABRALIN). Tiago Torrent served as a Guest Professor at the Department of Swedish, Multilingualism and Language Technology at the University of Gothenburg. He is one of the co-authors of Copilots for Linguists: AI, Constructions, and Frames.
Possible Futures for Semantic Lexical Resources in the Age of Artificial Intelligence
Large Language Models (LLMs) have been dominating the discussion fora on language technology for at least the past seven years. As much as LLMs have spurred progress in NLP, recent research has demonstrated that their performance seems to reach a limit which cannot be overcome with more training data. Therefore, hybrid approaches combining LLMs and Language Resources have been gaining momentum. In this talk I explore possible futures for research in semantic lexical resources in combination with LLMs and AI techniques. As examples of possible research paths, I discuss the application of the FrameNet model to the development of a tool for identifying territories prone to suffer from gender-based violence, as well as to the growing field of multimodal NLP.
Lana Hudeček
Lana Hudeček is a Senior Research Fellow (equivalent to Full Professor) at the Institute for the Croatian Language. She serves as the principal investigator and head of the Croatian Web dictionary – Mrežnik project. This project was initially funded by the Croatian Science Foundation from 2017 to 2021, later became an internal project of the Institute for the Croatian Language from 2021 to 2023, and in 2024, received funding from the European Union within the NextGenerationEU program. Between 2007 and 2011, she led the Croatian Normative Desk Dictionary project (Hrvatski normativni jednosveščani rječnik), funded by the Ministry of Science and Education. This project culminated in the creation of two dictionaries: The First School Dictionary of the Croatian Language (Prvi školski rječnik hrvatskoga jezika), published in 2008, and The School Dictionary of the Croatian Language (Školski rječnik hrvatskoga jezika), published in 2012. Both are also available online at rjecnik.hr. Her primary areas of interest include lexicography, terminology, standard language, and language planning. She has collaborated on numerous international and Croatian projects and is the author or co-author of many monographs and papers. Her contributions have been recognized through six awards received in collaboration with co-authors.
Croatian Web Dictionary – MREŽNIK
From 2017 until 2021, the Croatian Web Dictionary – Mrežnik was a project of the Croatian Science Foundation; from 2022 until now, it has been an internal project of the Institute of Croatian Language, and from 2024, it will be funded by the EU program NextGenerationEU. The project goals are to compile an e-dictionary of the Croatian language that is online, free, corpus-based, monolingual, hypertext, searchable, normative, and based on the contemporary results of e-lexicography and computational linguistics. Mrežnik consists of three modules: for adult native speakers of Croatian, schoolchildren, and non-native speakers of Croatian. It will be the central meeting point of the existing language resources of the Institute of Croatian Language and Linguistics but also of all language resources created within the project. Croatian Web Dictionary – Mrežnik is conceived as a dynamic dictionary that will be further compiled and edited even after the end of the NextGenerationEU project, as it is a long-term project of the Institute for the Croatian Language. The Mrežnik project was launched primarily because in 2016, at the time of the project application, Croatia was still one of the countries that did not have an online dictionary of their national language compiled according to the rules of contemporary e-lexicography. The need for extensive scientific research in e-lexicography was also recognized, i.e., getting to know the theory and practice of creating e-dictionaries and the possibilities that new dictionary platforms offer. Mrežnik is compiled taking into account semantic relations and the systematic nature of language. The systematic nature of the dictionary can be seen in almost all areas: accentuation of entry words, the selection and accentuation of forms in the grammatical block, the definition of words that belong to closed grammatical and semantic groups, etc.
The two essential computer tools for compiling this three-module dictionary are Sketch Engine, a corpus query system (loaded with the corpora) to support language analysis, and TLex, a dictionary writing system. Word Sketches are specially adapted to the needs of the project and are based on a Sketch Grammar developed for it. In 2022, a part of the dictionary (A – F) was exported from TLex to both the web application (https://rjecnik.hr/mreznik/) and the CLARIN European science infrastructure repository (clarin.si repository and the github.com public data management system). The presentation will focus on the corpora and wordlist(s), normative and pragmatic aspects of Mrežnik, micro- and macrostructure of Mrežnik, and the place of grammar in Mrežnik. The fact that Mrežnik is the first gamified Croatian web dictionary and the first dictionary with recorded pronunciation will be stressed. The comparison of the three modules will also be addressed, and it will be shown that the user was always at the center of all lexicographic decisions.
Eve Sweetser
Eve Sweetser is Professor Emerita of Linguistics and former Director of the Program in Celtic Studies at the University of California Berkeley. Her primary research interests include historical linguistics, semantics and meaning changes, the semantics of grammatical constructions, cognitive linguistics, metaphor and iconicity, subjectivity and viewpoint, the relationship between language and gesture, and the Celtic language family. Her 1990 book, From Etymology to Pragmatics (Cambridge University Press), explores generalizations about synchronic and diachronic patterns of meaning in the areas of modal verbs and conjunctions. Her 2005 book, Mental Spaces in Grammar: Conditional Constructions (CUP 2005), was coauthored with Barbara Dancygier, and examines the syntax and semantics of a wide range of English conditional constructions, using a Mental Spaces model of semantics. More recently, she and Dancygier co-edited Viewpoint in Language: A Multi-Modal Perspective (CUP 2012); and their co-authored Cambridge linguistics textbook Figurative Language was published in March 2014. She has published articles on topics including modality, polysemy, metaphor, conditional constructions, grammatical meaning, performativity, gesture, and Medieval Welsh poetics.
METANET – its current status and future directions
The new MetaNet Wiki is under construction, and will (fairly soon) become accessible, including databases of French, English, and Spanish metaphors relating to (1) COVID and the pandemic; (2) cancer; and (3) climate change. Our largest financial support is a Canadian SSHRC grant, specifically to examine these metaphors in North American and French varieties of French, English, and Spanish. The database, collected initially into spreadsheet format, is gathered from a combination of reviews of the published literature on metaphors for these target domains and searches of online major (nationally important) media sources, with an attempt to balance the content. For example, our COVID data includes data from the #ReFrameCOVID project and publications on metaphors for the pandemic as well as data we have collected ourselves. Comparative data has been gathered for Canadian and Hexagonal French, as well as for Iberian, Mexican, Bolivian, Chilean and US Spanish, and Canadian and U.S. English varieties. The most recent extension, still in its initial stages, is to Mandarin COVID/pandemic metaphors, in a collaboration with a group of scholars from Peking University and Beijing Language and Culture University as well as Mandarin-speaking researchers at UCB and UBC. We’ll give examples of spreadsheets with their tagged parameters.
This project has of course all of the scholarly community’s shared concerns with the methods of metaphor identification and analysis: the paper will recap the MetaNet procedure for searching and identifying data, as well as our process for multiple layers of verification by multiple analysts, before an entry is accepted into the database.
The particular focus of this paper will be on the comparative aspects of the project. How can we use the wealth of documentation of English metaphors as a resource, without allowing our knowledge about English to “colonialize” our other databases? At a higher level, we will discuss how the project has been approaching both cross-linguistic contrasts and contrasts between dialects/varieties of the “same” language; we would also like to comment on issues which have specifically arisen as our work on Mandarin has developed.
Alexander Ziem
Prof. Dr. Alexander Ziem holds the Chair of German Linguistics at Heinrich Heine University Düsseldorf, Germany, and is the director and initiator of the German FrameNet and Constructicon (https://framenet-constructicon.hhu.de/). After teaching and research activities at the Technical University Berlin and the University of Basel (Switzerland), he was a Fellow at the International Computer Science Institute (FrameNet) in Berkeley, California, USA, in 2013 and 2014, as well as Principal Investigator of the DFG projects “Linguistic Constructions of ‘Crises’ in Germany (1973-2009)” and “Methods and Methodologies of Discourse Analysis”. From 2011 to 2019, Ziem headed the Integrated Research Training Group for CRC 991 “The Structure of Representations in Language, Cognition and Science”. In 2019, he was a visiting professor at Università degli Studi di Milano in Italy; at the same time, from 2018 to 2021, together with Prof. Oliver Czulo and Prof. Tiago Torrent (Juiz de Fora, Brazil) he led the DAAD-funded cooperation project “Comparing semantic frames and grammatical constructions across languages: from linguistic analyses and multilingual resources to machine translation”. His current research focuses in particular on developing and expanding the German FrameNet and Constructicon, also taking into account the Global FrameNet perspective. Generally, his work focuses on cognitive semantics, construction grammar (including at the interface with phraseology) as well as corpus and discourse linguistics.
Conceptual Metaphors in the German FrameNet Constructicon: From Conception to Implementation
There are increasing efforts to build large databases on structures and functions of linguistic units specific to a given target language – the development of constructica is probably one of the most prominent. It should be without doubt that conceptual metaphors (CM) constitute an integral part of a constructicon. But how can CM be integrated into a constructicon? Focussing on this general issue, the talk will begin with a brief introduction to the German FrameNet and Constructicon. The project “German FrameNet-Constructicon” (www.german-constructicon.de) investigates form-meaning structures of current German in the continuum of lexicon and grammar. The overall aim is to compile an inventory of constructions of the standard variety of German that is as representative as possible. The inventory takes the form of a constructicon in which multiple structural relationships between (meanings of) constructions are recorded and documented. On this basis, the intention is to also scrutinize other (historical, areal) varieties, including relations among constructions within and across varieties. The FrameNet-Constructicon comprises three repositories: (a) an index of meaning-bearing linguistic forms, (b) a FrameNet for capturing the meanings of linguistic forms, including the ConceptualMetaphorNet (CoMetNet), in which frames for conceptual metaphors are documented, and (c) construction entries for each form-meaning/function pair. One subproject specifically aims at developing CoMetNet as part of the German FrameNet. This subproject relies on several general project goals, including (a) identifying and mapping German lexical and grammatical constructions and the families in which they are organized, and (b) capturing meanings of constructions with frames, including conceptual metaphors, and documenting the meaning structures in a FrameNet repository.
With reference to the existing infrastructure, the main focus of the talk will lie on (a) scrutinizing the habitat of CM in a constructicon, (b) explaining how CM – conceived as frames with conventionalized mapping between a source and a target frame – have been conceptually integrated in the constructicon, and (c) elaborating on how we empirically identified and implemented CM in our system (constructicon).