Large Language Models and Lexicography

Chair: Simon Krek

Length: half-day (09:00 – 13:00)

Format: open call for papers/presentations/demos + invited talks
Audience size: 30 participants

Description of the agenda:
The proposed workshop will delve into the integration of large language models (LLMs) in lexicography. The workshop aims to explore how these models aid in linguistic analysis and generation of dictionary data, enhancing dictionary development through automation of processes.
The expected topics will also include identifying new word usages and trends, and how LLMs facilitate multilingual lexicography, as well as the ethical implications of AI in lexicography, including concerns about bias and cultural sensitivity. The workshop will be of interest to lexicographers and language technology experts, offering insights into the trends of AI-assisted lexicography and preparing them for digital transformation.
Structure and schedule: two invited talks in the morning + coffee break + accepted presentations

Figurative Language (Research) and Large Language Models

Chairs: Kristina Š. Despot, Ana Ostroški Anić


Figurative and metaphorical language constitute a fundamental aspect of human communication and cognition. Despite its ubiquity, figurative language identification and interpretation, as a high-level semantic task, has been a challenge for NLP for at least a decade (Shutova 2011, Tong et al. 2021). Most existing work focuses on identifying figurative language at the word level, often failing to capture novel metaphors effectively. The conventional nature of identified metaphors limits the ability to handle creative and innovative uses of figurative language.

This workshop aims to bring attention to the intersection of figurative language research and the capabilities of Large Language Models (LLMs). Recent advancements in LLMs present new opportunities for metaphor detection and sub-type labeling, as well as for the automatic production and interpretation of metaphors. LLMs, such as GPT-3, demonstrate superior language understanding and contextual semantic comprehension compared to earlier models, making them promising for figurative language research. However, the evaluation of state-of-the-art language models on the task of figurative language production and interpretation reveals that while models perform above chance, there is still a notable gap compared to human performance, especially in zero- or few-shot settings (Liu et al. 2022; He et al. 2022; Despot, Ostroški Anić, and Veale 2023). Moreover, existing metaphor generation research has primarily focused on constructing single metaphors (‘A is like B’) rather than extended metaphors (‘A is like B, C is like D’). LLMs also fall short of human writers in metaphor writing, which emphasizes the need for further improvements before they can generate metaphors that meet writers’ requirements (Kim et al. 2023).

Generative LLMs, trained on vast amounts of text, have the potential to offer a rich pool of metaphor ideas and develop underlying links among sub-metaphors, presenting compelling alternatives to traditional approaches. The workshop addresses the challenges specific to metaphor generation and understanding and contributes toward understanding the constraints involved. It also offers an opportunity to discuss the creation of a comprehensive, annotated dataset of metaphors for future research. Such a dataset should contain novel, high-quality metaphors that are unlikely to have been included in the training data, which is a challenge given the enormous text corpora used to train LLMs.

We aim to provide a platform for researchers and practitioners to collaborate on advancing figurative language understanding using state-of-the-art LLMs, but also on advancing LLMs using state-of-the-art figurative language theories and resources. We would like to facilitate discussion of methodologies for evaluating language models on nonliteral language understanding tasks; explore applications of LLMs in metaphor detection and generation, emphasizing their potential to address shortcomings identified in the research; and define future research directions, particularly in enhancing LLMs to meet the requirements of metaphor generation in various contexts, including creative and extended metaphors.

The proposed workshop, “Figurative Language (Research) and Large Language Models,” aims to bridge the gap between figurative language research and the capabilities of modern LLMs. By addressing the challenges in nonliteral language understanding and exploring the potential of LLMs in metaphor generation, the workshop intends to propose benchmarks, foster collaboration, share insights, and pave the way for advancements in the field.

This will be achieved through presentations covering the topic of figurative language and large language models, and a general discussion on the topic. The workshop will be of relevance to those with an interest in semantic resources, figurative language, and large language models.

There will be a minimum of 15 attendees, with a maximum of 30 attendees.

Format and Activities

The proposed workshop will last 3.5 hours (including a break).

It will consist of presentations followed by general discussion.

Program outline

9.30. – 10.00.    Kristina Despot and Ana Ostroški Anić: Welcome and introduction – An overview of figurative language repositories, recent attempts at integrating LLMs into figurative language research, and benchmarks for testing how well LLMs model nonliteral language

10.00. – 10.40.    Eve Sweetser: MetaNet – its current status and future directions

10.40. – 11.20.    Alexander Ziem: Conceptual metaphors in the German FrameNet Constructicon: from conception to implementation

11.20. – 11.40.    Coffee break

11.40. – 12.20.    Marko Robnik Šikonja, Kristina Š. Despot, Ana Ostroški Anić, Polona Gantar: A proposal of multilingual figurative language benchmarks

12.20. – 13.00.    General discussion (main discussants Tony Veale and Tiago Torrent)


Despot, Kristina Š.; Ana Ostroški Anić, and Tony Veale. 2023. “Somewhere along your pedigree, a bitch got over the wall!” A proposal of implicitly offensive language typology. Lodz Papers in Pragmatics, 19(2), pages 385–414.

He, Q., Cheng, S., Li, Z., Xie, R., & Xiao, Y. (2022). Can Pre-trained Language Models Interpret Similes as Smart as Human? ArXiv, abs/2203.08452.

Kim, Jeongyeon; Sangho Suh, Lydia B Chilton, and Haijun Xia. 2023. Metaphorian: Leveraging Large Language Models to Support Extended Metaphor Creation for Science Writing. In Designing Interactive Systems Conference (DIS ’23), July 10–14, 2023, Pittsburgh, PA, USA. ACM, New York, NY, USA, 21 pages.

Liu, Emmy; Chenxuan Cui, Kenneth Zheng, and Graham Neubig. 2022. Testing the Ability of Language Models to Interpret Figurative Language. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4437–4452, Seattle, United States. Association for Computational Linguistics.

Shutova, Ekaterina. 2011. Computational approaches to figurative language.

Tong, Xiaoyu; Ekaterina Shutova, and Martha Lewis. 2021. Recent advances in neural metaphor processing: A linguistic, cognitive and social perspective. In NAACL 2021.

Lexicography and Accessibility

Chairs: Geraint Paul Rees, Blanca Arias-Badia, Elisenda Bernal, Sergi Torner

A great deal of research in accessibility studies and lexicography shares a similar aim: to make resources accessible. In accessibility studies, this relates to the extent to which a wide range of resources (e.g., drinking water, buildings, or information) can be accessed by any person irrespective of personal features such as having a disability or impairment. In lexicography, it relates to the extent to which linguistic information is structured and presented in such a way that it is easily accessible to users in general. Traditionally, the needs of those with physical impairments have been of greater concern in accessibility studies than in lexicography. However, some recent research has highlighted how the websites of several commercial dictionaries are effectively inaccessible for users who cannot navigate with the mouse (Rees 2023). The visual accessibility of resources has been researched extensively in lexicography, typically without considering the specific needs of persons with low vision or blindness. Research has been conducted on phenomena including ‘signposts’ (DeCesaris, 2012; Dziemianko, 2016, 2019a; Nesi & Tan, 2011), the effect of advertisements in online dictionaries on users (Dziemianko, 2019b, 2020), dictionary typography (Hao et al., 2022), and studies on how to best visualize information in resources such as writing assistants (Frankenberg-Garcia et al., 2019). A concern for the understandability of lexicographic information, which is especially relevant for people with intellectual disabilities, is not only evident in dictionaries of easy-to-understand language (Perego, 2020) such as Diccionario fácil (García Muñoz, 2019) and Hurraki (Hurraki, 2022) but also more generally in the debates surrounding controlled defining vocabularies (Kamińsky, 2021) and innovative definition styles (Rundell, 2006).

Despite the common ground between lexicography and accessibility studies, currently, many lexicographic resources are, to varying degrees, inaccessible for users with disabilities (Arias-Badia & Torner, 2023; Rees, 2023). This is due in part to a lack of understanding of the challenges users with disabilities face when accessing lexicographic resources and/or a lack of the knowledge and resources necessary to tackle these challenges. The proposed workshop aims to raise awareness of the accessibility issues faced by users with disabilities when using dictionaries and other lexicographic resources and make participants aware of tools and techniques to mitigate them. This will be achieved through a series of short presentations on the findings of research in accessibility studies accompanied by examples of how these have been or could be applied in lexicography. Presentations will be followed by hands-on activities in which participants apply the lessons of previous research.
It is hoped that the lessons learned in the workshop will not only benefit users of lexicographic resources who have disabilities but also users in general. The workshop will be of relevance to those with an interest in lexicography research, as well as those who have worked on, are currently working on, or plan to work on practical lexicography projects of all scales. It will be of particular significance to those interested in dictionary user interfaces, dictionary use studies, definition styles, and controlled defining vocabularies. Accordingly, it is expected that the workshop will appeal to a broad cross-section of EURALEX attendees. It could feasibly take place with a minimum of 12 attendees to a maximum of 40 attendees.

Format and Activities
The proposed workshop will last approximately four hours (including a break). It will consist of a series of short presentations followed by guided practical exercises for workshop participants, as illustrated in this indicative agenda:

Block 1: Welcome and introduction to accessibility
Introductions (30 minutes): a short presentation about the relationship between accessibility studies and lexicography.
Block 2: Web accessibility
Previous research on the accessibility of online dictionaries (30 minutes)
This presentation summarises the findings of previous research on the accessibility of online dictionaries (e.g., Arias-Badia & Torner, 2023; Rees, 2023). In doing so, it will introduce tools and techniques for evaluating the accessibility of websites and apps. The Web Content Accessibility Guidelines (WCAG) (W3C, n.d.), an international standard that aims to make web content more accessible to people with disabilities, will be discussed. Special emphasis will be placed on those guidelines relevant to lexicographic resources.
Online dictionary accessibility: a guided analysis (30 minutes)
With the participation of the audience, a workshop organizer carries out a live analysis of an online lexicographic resource.
Online dictionary accessibility: freer analysis (30 minutes)
Participants conduct an accessibility analysis of an online lexicographic resource of their choice. Workshop organizers will be on hand to help if required.
Break (30 minutes)
Block 3: Easy-to-understand language
Easy-to-understand language applications for lexicography (60 minutes)
This block introduces the notions of easy-to-understand language, easy-to-read language, and clear and plain language and shows the extent to which they have been applied to lexicography to date. The potential of applying the principles of easy-to-understand language in prospective dictionary projects is explored, with a focus on definition writing. Participants are asked to apply the principles of easy-to-understand language to the definition of a selection of lexical units.
Block 4: Conclusions and questions
Summary and Q&A (30 minutes)
A summary of the key takeaways of the workshop is provided. Participants are given the opportunity to ask any remaining questions or share any comments they may have about lexicography and accessibility.
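
As an illustration of the kind of automated check that could support the guided analysis in Block 2, the following is a minimal sketch, not a tool used by the organizers. It assumes static HTML and flags only two issues: images without a text alternative, and click handlers on elements that cannot be reached with the keyboard. The class name, the function name, the sample markup, and the simplified list of focusable elements are all assumptions made for illustration; a full WCAG evaluation covers far more than this.

```python
from html.parser import HTMLParser

# Elements treated as keyboard-focusable by default (a simplifying assumption).
NATIVELY_FOCUSABLE = {"a", "button", "input", "select", "textarea"}


class AccessibilityChecker(HTMLParser):
    """Collects two simple, static accessibility warnings from HTML markup."""

    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        line, _ = self.getpos()
        # Images need a text alternative (an empty alt marks a decorative image).
        if tag == "img" and "alt" not in attributes:
            self.issues.append((line, "img without alt attribute"))
        # A click handler on a non-focusable element is unreachable by keyboard.
        if (
            "onclick" in attributes
            and tag not in NATIVELY_FOCUSABLE
            and "tabindex" not in attributes
        ):
            self.issues.append((line, f"<{tag}> is clickable but not focusable"))


def check_html(markup):
    """Return a list of (line number, message) pairs for the given markup."""
    checker = AccessibilityChecker()
    checker.feed(markup)
    checker.close()
    return checker.issues


# Hypothetical dictionary-entry markup, invented for this example.
SAMPLE_ENTRY = """<div class="entry">
<img src="cat.png">
<img src="divider.png" alt="">
<span onclick="playAudio()">Listen</span>
<button onclick="playAudio()">Listen</button>
</div>"""

for line, message in check_html(SAMPLE_ENTRY):
    print(f"line {line}: {message}")
```

A static check of this kind can only complement manual testing with a keyboard and a screen reader, which remains necessary for judging how a dictionary entry is actually experienced.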

Arias-Badia, B., & Torner, S. (2023). Bridging the gap between website accessibility and lexicography: Information access in online dictionaries. Universal Access in the Information Society.
DeCesaris, J. (2012). On the Nature of Signposts. In R. V. Fjeld & J. M. Torjusen (Eds.), Proceedings of the 15th EURALEX International Congress (pp. 532–540). Department of Linguistics and Scandinavian Studies, University of Oslo.
Dziemianko, A. (2016). An insight into the visual presentation of signposts in English learners’ dictionaries online. International Journal of Lexicography, 29(4), 490–524.
Dziemianko, A. (2019a). Homogeneous or Heterogeneous? Insights into Signposts in Learners’ Dictionaries. International Journal of Lexicography, 32(4), 432–457.
Dziemianko, A. (2019b). The role of online dictionary advertisements in language reception, production, and retention. ReCALL, 31(1), 5–22.
Dziemianko, A. (2020). Smart advertising and online dictionary usefulness. International Journal of Lexicography, 33(4), 377–403.
Frankenberg-Garcia, A., Lew, R., Rees, G. P., Roberts, J., Sharma, N., & Butcher, P. (2019, September). Collocations in e-Lexicography: Lessons from Human-Computer Interaction Research [Workshop presentation]. Pre-conference workshop on collocations at eLex 2019, Sintra.
García Muñoz, Ó. (2019). Diccionario Fácil, una propuesta colaborativa para públicos con dificultades de comprensión lectora. Diccionario Fácil, Una Propuesta Colaborativa Para Públicos Con Dificultades de Comprensión Lectora, 327–345.
Hao, J., Xu, H., & Hu, H. (2022). A Multimodal Communicative Approach to the Analysis of Typography in Online English Learner’s Dictionaries. International Journal of Lexicography, 35(2), 234–260.
Hurraki. (2022). Main Page—Hurraki—Plain Language Dictionary, Hurraki—Dictionary for Plain Language.
Kamińsky, M. P. (2021). Defining with Simple Vocabulary in English Dictionaries. John Benjamins.
Nesi, H., & Tan, K. H. (2011). The Effect of Menus and Signposting on the Speed and Accuracy of Sense Selection. International Journal of Lexicography, 24(1), 79–96.
Perego, E. (2020). Accessible Communication: A Cross-Country Journey. Frank & Timme.
Rees, G. P. (2023). Online Dictionaries and Accessibility for People with Visual Impairments. International Journal of Lexicography, 36(2), 107–132.
Rundell, M. (2006). More than one Way to Skin a Cat: Why Full-Sentence Definitions Have not Been Universally Adopted. In E. Corino, C. Marello, & C. Onesti (Eds.), Proceedings of the 12th EURALEX International Congress (pp. 323–337). Edizioni dell’Orso.
W3C. (n.d.). Web Content Accessibility Guidelines (WCAG) Overview. Web Accessibility Initiative (WAI). Retrieved 15 November 2021, from