LLMs' unintended memories

Speaker: Yves-Alexandre de Montjoye

Title: LLMs' unintended memories

Abstract: A year ago, ChatGPT surprised the world with its extraordinary language generation capabilities. Chatbots have since become one of the fastest-adopted consumer products in history, with investments in genAI forecast to reach $12B this year. In this talk, I will first review the fast-evolving literature on the document-level membership inference task for LLMs: the methods proposed to detect, a posteriori, whether a specific piece of text was seen during training by an LLM and at least partially memorized, the distribution shift concerns, and some of the solutions proposed. I will then discuss the use of randomized controlled setups to causally study LLM memorization. In particular, I will discuss how randomized controlled setups have shed light on the determinants of memorization and shown LLMs to have a mosaic memory. I will conclude the talk with some thoughts on the security and privacy challenges ahead when it comes to LLMs and the use of synthetically generated trap sequences for membership inference.
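To give a flavor of the membership inference task discussed in the talk: a common family of methods scores a candidate document by how "easy" the model finds it to predict (e.g., its average per-token negative log-likelihood) and flags unusually easy documents as likely training members. The sketch below is illustrative only, not a method from the talk: it stands in a toy unigram language model for the LLM, and the `UnigramLM` class, the `threshold`, and the example documents are all hypothetical choices made for the demo.

```python
import math
from collections import Counter

# Toy stand-in for an LLM: a unigram language model with add-one smoothing.
# In a real membership inference attack, the score would come from an actual
# LLM's token log-probabilities; the attack logic is the same.
class UnigramLM:
    def __init__(self, corpus):
        tokens = [t for doc in corpus for t in doc.split()]
        self.counts = Counter(tokens)
        self.total = len(tokens)
        self.vocab = len(self.counts) + 1  # +1 reserves mass for unseen tokens

    def avg_nll(self, doc):
        # Average negative log-likelihood per token; lower means the model
        # finds the document more "familiar".
        toks = doc.split()
        nll = 0.0
        for t in toks:
            p = (self.counts[t] + 1) / (self.total + self.vocab)
            nll -= math.log(p)
        return nll / len(toks)

def infer_membership(model, doc, threshold):
    # Predict "member" (seen during training) when the document is
    # unusually easy for the model to predict.
    return model.avg_nll(doc) < threshold

train_docs = [
    "the model was trained on this exact sentence",
    "membership inference asks whether a document was seen",
]
lm = UnigramLM(train_docs)

member = train_docs[0]
non_member = "completely unrelated vocabulary about quantum chromodynamics"
# A document from the training set should score a lower average NLL
# than an unseen document drawn from a different distribution.
```

Loss-thresholding is the simplest baseline; the distribution shift concern mentioned in the abstract arises precisely because, when member and non-member documents come from different distributions (as above), the score gap may reflect that shift rather than memorization.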