Transformers and sequential data representations for big administrative data
Public defence of PhD thesis by Magnus Lindgaard Nielsen.
This thesis explores the use of sequential data representations with big administrative data to predict outcomes across multiple domains. Moving away from traditional tabular data representations, i.e., rectangles of numbers, enables more granular representations in which each person is represented by a sequence of life events. These life-event sequences are then used as input to the transformer architecture to predict diverse life outcomes across four articles. In articles 1, 2, and 3, the outcomes of interest are higher education completion, vulnerable youth status, academic performance in primary school, and income in early midlife; article 4 expands to 21 nationwide prediction tasks covering all facets of life. The contribution is threefold. First, the articles remain anchored in important outcomes that are of interest in the social sciences, both academically and for policy. Second, the thesis explores new methods and how they can produce better predictive models. Third, it examines the benefits of expanding the diversity of data.
Assessment committee
- Professor Roberta Sinatra, University of Copenhagen (chair)
- Professor Matthew Salganik, Princeton University
- Professor Luca Maria Aiello, IT University of Copenhagen
Supervisor
- Associate Professor Andreas Bjerre-Nielsen, University of Copenhagen
