RC2 – Can patterns of word usage tell us what lemon and moon have in common? Analyzing the semantic content of distributional semantic models

Lecturer: Pia Sommerauer
Fields: Computational linguistics, cognitive linguistics

Content

Can the patterns of textual contexts in which words appear tell you (or your model) that both a lemon and the moon are described as yellow and round but differ in (almost) every other respect? In other words: how much information about concepts is encoded in patterns of word usage (i.e. distributional data)?

In this course, I will take stock of what we know about the semantic content encoded in data-derived meaning representations (e.g. Word2Vec), which are commonly used in Natural Language Processing and cognitive modelling (e.g. metaphor interpretation).

I will focus on how we can find out whether (and what) semantic knowledge they represent (beyond a general sense of semantic word similarity and relatedness). Drawing on methods in the area of neural network interpretability, I will discuss how we can “diagnose” semantic knowledge to find out whether a model can in fact distinguish flying from non-flying birds or tell you what lemons and the moon have in common.
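To make the idea of "diagnosing" a property more concrete, here is a minimal sketch of a diagnostic classifier in plain Python. It is not the method used in the course or in the papers listed below: the 3-dimensional vectors are invented for illustration and do not come from any trained model, and a simple perceptron stands in for whatever classifier one would use in practice. The classifier is trained to predict a property (can fly) from word vectors and is then tested on words it has not seen.

```python
# Toy 3-dimensional "embeddings" with a binary label (1 = can fly, 0 = cannot).
# ASSUMPTION: all numbers are invented for illustration, not taken from a model.
TRAIN = [
    ("sparrow", [0.90, 0.10, 0.30], 1),
    ("penguin", [0.20, 0.90, 0.40], 0),
    ("eagle",   [0.80, 0.20, 0.50], 1),
    ("ostrich", [0.10, 0.80, 0.60], 0),
    ("swallow", [0.85, 0.15, 0.20], 1),
    ("kiwi",    [0.15, 0.85, 0.30], 0),
]
# Words withheld from training, used to test generalisation.
HELD_OUT = {"seagull": [0.80, 0.10, 0.40], "emu": [0.20, 0.80, 0.50]}


def predict(weights, bias, vector):
    """Return 1 if the linear score is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1 if score > 0 else 0


def train_perceptron(examples, epochs=10):
    """Train a perceptron: adjust weights only when a prediction is wrong."""
    weights = [0.0] * len(examples[0][1])
    bias = 0.0
    for _ in range(epochs):
        for _, vector, label in examples:
            error = label - predict(weights, bias, vector)
            if error:
                weights = [w + error * x for w, x in zip(weights, vector)]
                bias += error
    return weights, bias


weights, bias = train_perceptron(TRAIN)
predictions = {word: predict(weights, bias, vec) for word, vec in HELD_OUT.items()}
print(predictions)
```

If the classifier generalises to held-out words, the property is at least recoverable from the vectors. That alone does not show that the model represents the property as such, since the classifier may rely on correlated information (for instance, general similarity among the positive examples).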

Objectives

Become familiar with linguistic theories of the semantic information encoded in linguistic context and what we can expect from it
Understand how distributional word representations are created, evaluated and used (with practical examples)
Understand why distributional word representations provide rich information for machine learning systems, but at the same time do not allow for straightforward semantic interpretation
Understand the challenges of diagnostic methods and how they can be dealt with
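As a rough illustration of the second objective (how distributional word representations are created), the sketch below builds count-based context vectors from a tiny corpus and compares them with cosine similarity. The corpus, the window size and the function names are assumptions made for this example. Models such as Word2Vec learn dense vectors by prediction rather than by raw counting, so this only shows the underlying distributional idea.

```python
import math
from collections import Counter, defaultdict

# ASSUMPTION: a made-up toy corpus; real models are trained on very large corpora.
corpus = [
    "the lemon is yellow and round",
    "the moon is yellow and round tonight",
    "the lemon tastes sour",
    "the moon orbits the earth",
    "the car drives on the road",
]


def cooccurrence_vectors(sentences, window=2):
    """Count, for every word, the words occurring within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = cooccurrence_vectors(corpus)
sim_lemon_moon = cosine(vectors["lemon"], vectors["moon"])
sim_lemon_car = cosine(vectors["lemon"], vectors["car"])
print(sim_lemon_moon, sim_lemon_car)
```

In this toy corpus, lemon and moon share contexts such as "yellow", so their vectors come out more similar to each other than lemon and car. The similarity score does not say which property they share, which is the interpretation problem the objectives refer to. Raw counts are also dominated by frequent words such as "the", which is why count-based models usually apply a weighting scheme before comparing vectors.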

Literature

Lenci, A., 2008. Distributional semantics in linguistic and cognitive research. Italian journal of linguistics, 20(1), pp.1-31. http://www.italian-journal-linguistics.com/wp-content/uploads/ALenci.pdf
Gladkova, A. and Drozd, A., 2016, August. Intrinsic evaluations of word embeddings: What can we do better? In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP (pp. 36-42). https://www.aclweb.org/anthology/W16-2507.pdf
Sommerauer, P. and Fokkens, A., 2018, November. Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (pp. 276-286). https://www.aclweb.org/anthology/W18-5430.pdf
Sommerauer, P., Fokkens, A. and Vossen, P., 2019. Towards Interpretable, Data-derived Distributional Semantic Representations for Reasoning: A Dataset of Properties and Concepts. In Wordnet Conference (p. 85). https://clarin-pl.eu/dspace/handle/11321/718
A practical introduction to working with word embedding models in Python (also suitable for people with limited coding skills): https://github.com/PiaSommerauer/distributional_semantics/blob/master/embeddings_intro.ipynb

Lecturer

Pia Sommerauer is a PhD student at the Computational Lexicology and Terminology Lab at Vrije Universiteit Amsterdam. Her research focuses on the type of semantic information captured by distributional representations of word meaning and whether these representations could be used for semantic reasoning. Together with her supervisors Antske Fokkens and Piek Vossen, she has published papers on this topic at venues specializing in lexical semantics and model interpretability.

Website: https://piasommerauer.github.io/