Workshops – EURALEX XXI

Large Language Models and Lexicography

Chair: Simon Krek

Length: half-day (09:00 – 13:00)

Format: open call for papers/presentations/demos + invited talks
Audience size: 30 participants

Description of the agenda:
The proposed workshop will delve into the integration of large language models (LLMs) in lexicography. The workshop aims to explore how these models aid in linguistic analysis and generation of dictionary data, enhancing dictionary development through automation of processes.
The expected topics will also include identifying new word usages and trends, and how LLMs facilitate multilingual lexicography, as well as the ethical implications of AI in lexicography, including concerns about bias and cultural sensitivity. The workshop will be of interest to lexicographers and language technology experts, offering insights into the trends of AI-assisted lexicography and preparing them for digital transformation.
Structure and schedule: two invited talks in the morning + coffee break + accepted presentations

Figurative Language and Large Language Models

Chairs: Kristina Š. Despot, Ana Ostroški Anić

Rationale

Figurative and metaphorical language constitutes a fundamental aspect of human communication and cognition. Despite its ubiquity, figurative language identification and interpretation, as a high-level semantic task, has been a challenge for NLP for at least a decade (Shutova 2011, Tong et al. 2021). Most existing work focuses on identifying figurative language at the word level and often fails to capture novel metaphors effectively. Because the metaphors identified are mostly conventional, these approaches are limited in their ability to handle creative and innovative uses of figurative language.

This workshop aims to bring attention to the intersection of figurative language research and the capabilities of Large Language Models (LLMs). Recent advancements in LLMs present new opportunities for metaphor detection and sub-type labeling, as well as for their automatic production and interpretation. LLMs demonstrate superior language understanding and contextual semantic comprehension compared to earlier models, making them promising for figurative language research. However, the evaluation of state-of-the-art language models on the task of figurative language production and interpretation reveals that while models perform above chance, there is still a notable gap compared to human performance, especially in zero- or few-shot settings (Liu et al. 2022; He et al. 2022; Despot, Ostroški Anić, and Veale 2023). Additionally, LLMs fall short of human writers in metaphor writing, which underscores the need for further improvements in LLMs for generating metaphors that meet writers’ requirements (Kim et al. 2023).

Generative LLMs, trained on vast amounts of text, have the potential to offer a rich pool of metaphor ideas, presenting compelling alternatives to traditional approaches. The workshop addresses the challenges specific to metaphor generation and understanding and contributes toward understanding the constraints involved. It also offers an opportunity to discuss the creation of a comprehensive, annotated dataset of metaphors for future research: one that contains novel, high-quality metaphors that are unlikely to have been included in the training data. This is a challenge given the enormous text corpora used to train LLMs.

We aim to provide a platform for researchers and practitioners to collaborate on advancing figurative language understanding using state-of-the-art LLMs, and also on advancing LLMs using state-of-the-art figurative language theories and resources. We would like to facilitate discussion of methodologies for evaluating language models on nonliteral language understanding tasks; explore applications of LLMs in metaphor detection and generation, emphasizing their potential to address shortcomings identified in the research; and define future research directions, particularly in enhancing LLMs to meet the requirements of metaphor generation in various contexts, including creative and extended metaphors.

The proposed workshop, “Figurative Language and Large Language Models,” aims to bridge the gap between figurative language research and the capabilities of modern LLMs. By addressing the challenges in nonliteral language understanding and exploring the potential of LLMs in metaphor generation, the workshop intends to propose benchmarks, foster collaboration, share insights, and pave the way for advancements in the field.

This will be achieved through presentations covering the topic of figurative language and large language models, and a general discussion on the topic. The workshop will be of relevance to those with an interest in semantic resources, figurative language, and large language models.

Format and Activities

The proposed workshop will last 3.5 hours (including a break).

It will consist of presentations followed by general discussion.

Registration for the Workshop

This workshop is held in conjunction with the EURALEX 2024 conference; full details are available on the conference website: https://euralex.jezik.hr/workshops/. Registration for the workshop is managed independently of the conference registration. Participants may choose to register for the conference only, the workshop only, or both. For EURALEX 2024 attendees who have paid the conference registration fee, access to the workshop is complimentary. Participants attending only the workshop must pay a registration fee of 70 EUR, which includes a coffee break and lunch at the hotel restaurant. For information regarding travel and accommodation, please consult the conference website at https://euralex.jezik.hr/venue/. When booking hotel accommodation, be sure to follow the provided instructions and use the designated code: we have arranged a special rate for EURALEX participants who book by 7 August 2024.

To register for the Workshop, please send an email to kdespot@ihjj.hr with your name, affiliation, and a few sentences describing your academic background. Please use “FL & LLMs” as the subject of your email. Conference registration will open in mid-May. Once we receive your expression of interest, we will send you a link to register and pay the registration fee.

Program outline

October 8th 2024, Hotel Croatia, Cavtat, Croatia

9:30 – 9:45 Kristina Despot and Ana Ostroški Anić: Welcome and Introduction

9:45 – 10:30 Eve Sweetser and Elise Stickles: MetaNet – Its Current Status and Future Directions

10:30 – 11:15 Alexander Ziem: Conceptual Metaphors in the German FrameNet Constructicon: From Conception to Implementation

11:15 – 11:45 Coffee Break

11:45 – 12:05 Kristina Š. Despot, Ana Ostroški Anić, Polona Gantar, Mija Bon, Matej Klemen, Marko Robnik Šikonja, Simon Krek, Benedikt Perak, and Jaka Čibej: Creating a Multilingual Figurative Language Dataset

12:05 – 13:00 General discussion and research plan. Main discussants: Tony Veale and Tiago Torrent

References

Despot, Kristina Š., Ostroški Anić, Ana and Veale, Tony (2023). “Somewhere along your pedigree, a bitch got over the wall!” A proposal of implicitly offensive language typology. Lodz Papers in Pragmatics 19/2, pp. 385–414. https://doi.org/10.1515/lpp-2023-0019

He, Q., Cheng, S., Li, Z., Xie, R., & Xiao, Y. (2022). Can Pre-trained Language Models Interpret Similes as Smart as Human? ArXiv, abs/2203.08452.

Kim, Jeongyeon; Sangho Suh, Lydia B. Chilton, and Haijun Xia (2023). Metaphorian: Leveraging Large Language Models to Support Extended Metaphor Creation for Science Writing. In Designing Interactive Systems Conference (DIS ’23), July 10–14, 2023, Pittsburgh, PA, USA. ACM, New York, NY, USA, 21 pages. https://doi.org/10.1145/3563657.3595996

Liu, Emmy; Chenxuan Cui, Kenneth Zheng, and Graham Neubig (2022). Testing the Ability of Language Models to Interpret Figurative Language. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4437–4452, Seattle, United States. Association for Computational Linguistics.

Shutova, Ekaterina (2011). Computational approaches to figurative language.

Tong, Xiaoyu; Ekaterina Shutova, and Martha Lewis (2021). Recent advances in neural metaphor processing: A linguistic, cognitive and social perspective. In NAACL 2021.

Lexicography and Accessibility

Chairs: Geraint Paul Rees, Blanca Arias-Badia, Elisenda Bernal, Sergi Torner

Universitat Pompeu Fabra, Barcelona, Spain

Schedule
8 October 2024, Hotel Croatia, Cavtat, Croatia. Šipun Hall.

9.30 – 10.00 – Welcome and introduction to accessibility
10.00 – 11.00 – Easy-to-understand language applications for lexicography
11.00 – 11.30 – Coffee break
11.30 – 13.00 – Web accessibility and discussion

Rationale

A great deal of research in accessibility studies and lexicography shares a similar aim: to make resources accessible. In accessibility studies, this relates to the extent to which a wide range of resources (e.g., drinking water, buildings, or information) can be accessed by any person irrespective of personal features such as having a disability or impairment. In lexicography, it relates to the extent to which linguistic information is structured and presented in such a way that it is easily accessible to users in general. Traditionally, the needs of those with physical impairments have been of greater concern in accessibility studies than in lexicography. However, some recent research has highlighted how the websites of several commercial dictionaries are effectively inaccessible for users who cannot navigate with the mouse (Rees, 2023). The visual accessibility of resources has been researched extensively in lexicography, typically without considering the specific needs of persons with low vision or blindness. Research has been conducted on phenomena including ‘signposts’ (DeCesaris, 2012; Dziemianko, 2016, 2019a; Nesi & Tan, 2011), the effect of advertisements in online dictionaries on users (Dziemianko, 2019b, 2020), dictionary typography (Hao et al., 2022), and how best to visualise information in resources such as writing assistants (Frankenberg-Garcia et al., 2019). A concern for the understandability of lexicographic information, which is especially relevant for people with intellectual disabilities, is evident not only in dictionaries of easy-to-understand language (Perego, 2020) such as Diccionario fácil (García Muñoz, 2019) and Hurraki (Hurraki, 2022) but also more generally in the debates surrounding controlled defining vocabularies (Kamiński, 2021) and innovative definition styles (Rundell, 2006).
Despite the common ground between lexicography and accessibility studies, many lexicographic resources are currently, to varying degrees, inaccessible for users with disabilities (Arias-Badia & Torner, 2024; Rees, 2023). This is due in part to a lack of understanding of the challenges users with disabilities face when accessing lexicographic resources and/or a lack of the knowledge and resources necessary to tackle these challenges. The proposed workshop aims to raise awareness of the accessibility issues faced by users with disabilities when using dictionaries and other lexicographic resources and to make participants aware of tools and techniques to mitigate them. This will be achieved through a series of short presentations on the findings of research in accessibility studies accompanied by examples of how these have been or could be applied in lexicography.

Presentations will be followed by hands-on activities in which participants apply the lessons of previous research.

It is hoped that the lessons learned in the workshop will not only benefit users of lexicographic resources who have disabilities, but also users in general. The workshop will be of relevance to those with an interest in lexicography research, as well as those who have worked on, are currently working on, or plan to work on practical lexicography projects of all scales. It will be of particular significance to those interested in dictionary user interfaces, dictionary use studies, definition styles, and controlled defining vocabularies. Accordingly, it is expected that the workshop will appeal to a broad cross section of EURALEX attendees.

Format and Activities

The workshop will last approximately 3.5 hours (including a break). It will consist of a series of short presentations followed by guided practical exercises for workshop participants.

References
Arias-Badia, B., & Torner, S. (2024). Bridging the gap between website accessibility and lexicography: Information access in online dictionaries. Universal Access in the Information Society. https://doi.org/10.1007/s10209-023-01031-9

DeCesaris, J. (2012). On the Nature of Signposts. In R. V. Fjeld & J. M. Torjusen (Eds.), Proceedings of the 15th EURALEX International Congress (pp. 532–540). Department of Linguistics and Scandinavian Studies, University of Oslo.

Dziemianko, A. (2016). An insight into the visual presentation of signposts in English learners’ dictionaries online. International Journal of Lexicography, 29(4), 490–524. https://doi.org/10.1093/ijl/ecv040

Dziemianko, A. (2019a). Homogeneous or Heterogeneous? Insights into Signposts in Learners’ Dictionaries. International Journal of Lexicography, 32(4), 432–457. https://doi.org/10.1093/ijl/ecz011

Dziemianko, A. (2019b). The role of online dictionary advertisements in language reception, production, and retention. ReCALL, 31(1), 5–22. https://doi.org/10.1017/S0958344018000149

Dziemianko, A. (2020). Smart advertising and online dictionary usefulness. International Journal of Lexicography, 33(4), 377–403. https://doi.org/10.1093/ijl/ecaa017

Frankenberg-Garcia, A., Lew, R., Rees, G. P., Roberts, J., Sharma, N., & Butcher, P. (2019, September). Collocations in e-Lexicography: Lessons from Human Computer Interaction Research [Workshop presentation]. Pre-conference workshop on collocations at eLex 2019, Sintra.

García Muñoz, Ó. (2019). Diccionario Fácil, una propuesta colaborativa para públicos con dificultades de comprensión lectora. Diccionario Fácil, Una Propuesta Colaborativa Para Públicos Con Dificultades de Comprensión Lectora, 327–345.

Hao, J., Xu, H., & Hu, H. (2022). A Multimodal Communicative Approach to the Analysis of Typography in Online English Learner’s Dictionaries. International Journal of Lexicography, 35(2), 234–260. https://doi.org/10.1093/ijl/ecab031

Hurraki. (2022). Main Page—Hurraki—Plain Language Dictionary, Hurraki—Dictionary for Plain Language. https://hurraki.org/english/w/index.php?title=Main_Page&oldid=1048

Kamiński, M. P. (2021). Defining with Simple Vocabulary in English Dictionaries. John Benjamins.

Nesi, H., & Tan, K. H. (2011). The Effect of Menus and Signposting on the Speed and Accuracy of Sense Selection. International Journal of Lexicography, 24(1), 79–96. https://doi.org/10.1093/ijl/ecq040

Perego, E. (2020). Accessible Communication: A Cross-Country Journey. Frank & Timme.

Rees, G. P. (2023). Online Dictionaries and Accessibility for People with Visual Impairments. International Journal of Lexicography, 36(2), 107–132. https://doi.org/10.1093/ijl/ecac021

Rundell, M. (2006). More than one Way to Skin a Cat: Why Full-Sentence Definitions Have not Been Universally Adopted. In E. Corino, C. Marello, & C. Onesti (Eds.), Proceedings of the 12th EURALEX International Congress (pp. 323–337). Edizioni dell’Orso.

W3C. (n.d.). Web Content Accessibility Guidelines (WCAG) Overview. Web Accessibility Initiative (WAI). Retrieved 15 November 2021, from https://www.w3.org/WAI/standards-guidelines/wcag/
