Interpretability and Model Analysis in NLP

Both research and commercial NLP applications rely on state-of-the-art deep learning models, but their inherent opacity remains a challenge. Model analysis and interpretability is the subfield of Natural Language Processing that focuses on better understanding such blackbox models.

This research area is broadly concerned with understanding blackbox NLP models: what happens during training and at inference time, how to interpret model predictions, how and when models fail, and how we can help them generalize to unseen data.

BERT remains one of the most popular NLP models, but we still know little about how it achieves its remarkable performance, and to what extent we should trust its linguistic knowledge.
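
One common entry point for this kind of analysis is inspecting BERT's self-attention maps, as done in several of the papers listed below. The following is a minimal sketch using the Hugging Face transformers library; the checkpoint name and example sentence are illustrative and not taken from the project.

# Sketch: extracting BERT's self-attention maps for inspection.
# Assumes the Hugging Face `transformers` library and the public
# bert-base-uncased checkpoint; the example sentence is arbitrary.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer (12 for BERT-base),
# each of shape (batch, num_heads, seq_len, seq_len).
for layer, att in enumerate(outputs.attentions):
    # Average attention each head directs at the [CLS] token (position 0).
    cls_share = att[0, :, :, 0].mean(dim=-1)
    print(f"layer {layer:2d}: attention to [CLS] per head =",
          [round(x, 3) for x in cls_share.tolist()])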

Bhargava, P., Drozd, A., & Rogers, A. (2021). Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics. Proceedings of the Second Workshop on Insights from Negative Results in NLP, 125–135. https://aclanthology.org/2021.insights-1.1 

Kovaleva, O., Kulshreshtha, S., Rogers, A., & Rumshisky, A. (2021). BERT Busters: Outlier LayerNorm Dimensions that Disrupt BERT. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. https://aclanthology.org/2021.findings-acl.300.pdf

Rogers, A., Kovaleva, O., & Rumshisky, A. (2020). A Primer in BERTology: What We Know About How BERT Works. Transactions of the Association for Computational Linguistics, 8, 842–866. https://doi.org/10.1162/tacl_a_00349

Prasanna, S., Rogers, A., & Rumshisky, A. (2020). When BERT Plays the Lottery, All Tickets Are Winning. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 3208–3229. https://www.aclweb.org/anthology/2020.emnlp-main.259/

Featured in The Gradient

Prior relevant work by the current SODAS staff

Kovaleva, O., Romanov, A., Rogers, A., & Rumshisky, A. (2019). Revealing the Dark Secrets of BERT. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 4356–4365. https://doi.org/10.18653/v1/D19-1445

Rogers, A., Drozd, A., Rumshisky, A., & Goldberg, Y. (2019). Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP. https://www.aclweb.org/anthology/papers/W/W19/W19-2000

Rogers, A., Hosur Ananthakrishna, S., & Rumshisky, A. (2018). What’s in Your Embedding, And How It Predicts Task Performance. Proceedings of the 27th International Conference on Computational Linguistics, 2690–2703. http://aclweb.org/anthology/C18-1228

Li, B., Liu, T., Zhao, Z., Tang, B., Drozd, A., Rogers, A., & Du, X. (2017). Investigating different syntactic context types and context representations for learning word embeddings. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2411–2421. http://aclweb.org/anthology/D17-1257

Rogers, A., Drozd, A., & Li, B. (2017). The (Too Many) Problems of Analogical Reasoning with Word Vectors. Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), 135–148. http://www.aclweb.org/anthology/S17-1017

Drozd, A., Gladkova, A., & Matsuoka, S. (2016). Word embeddings, analogies, and machine learning: Beyond king - man + woman = queen. Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, 3519–3530. https://www.aclweb.org/anthology/C/C16/C16-1332.pdf
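
The last three entries above examine the vector-offset ("king - man + woman = queen") approach to word analogies. Below is a minimal sketch of that approach, assuming the gensim library and a publicly downloadable set of GloVe vectors; the model name is an illustrative choice, not one used in these papers.

# Sketch: the 3CosAdd vector-offset method for word analogies,
# the procedure scrutinized in the analogy papers listed above.
# Assumes gensim and its downloader; "glove-wiki-gigaword-100"
# is an illustrative choice of pretrained vectors.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")

# "man is to king as woman is to ?"  ->  king - man + woman
candidates = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=5)
for word, score in candidates:
    print(f"{word}\t{score:.3f}")

Note that gensim's most_similar excludes the query words themselves from the candidates, one of the evaluation artifacts discussed in the papers above.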

Being able to explain the predictions of blackbox models is a prerequisite for their safe deployment, especially in areas where their decisions can have significant consequences and must avoid certain kinds of bias.

González, A. V., Rogers, A., & Søgaard, A. (2021). On the Interaction of Belief Bias and Explanations. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. https://aclanthology.org/2021.findings-acl.259
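
As a rough illustration of the kind of explanation studied in the paper above, the sketch below computes gradient × input saliency scores for a sentiment classifier. The checkpoint name is illustrative, and this generic saliency method is not necessarily the one used in the paper.

# Sketch: gradient x input token saliency for a text classifier.
# Assumes the Hugging Face `transformers` library; the checkpoint
# name is illustrative, not taken from the project.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")

# Detach the input embeddings so gradients are collected on them directly.
embeddings = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeddings.requires_grad_(True)

outputs = model(inputs_embeds=embeddings, attention_mask=inputs["attention_mask"])
predicted = int(outputs.logits.argmax(dim=-1))
outputs.logits[0, predicted].backward()

# Saliency per token: dot product of the gradient with the embedding.
saliency = (embeddings.grad * embeddings).sum(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, score in zip(tokens, saliency.tolist()):
    print(f"{token:>12s}  {score:+.4f}")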

Researchers

Internal researchers

Name Title Phone E-mail
Anna Rogers Postdoc +45 35 32 65 48 arogers@sodas.ku.dk

Funded by:

Copenhagen Center for Social Data Science (SODAS)

Full project name:
Interpretability and model analysis: understanding blackbox models in NLP

Contact

Anna Rogers
Postdoc
Social Data Science
Mail: arogers@sodas.ku.dk
Phone: +45 35 32 65 48

External researchers:

Name Title Phone E-mail
Alexander Drozd Research scientist at RIKEN CCS +81-80-4332-5304 E-mail
Anna Rumshisky Associate professor at UMass Lowell +978-934-3619 E-mail
Anders Søgaard Professor at DIKU, UCPH +45 35 32 90 65 E-mail