6 February 2024

Processing a life like a language helps predict future personal events

Photo by Christian Bowen on Unsplash

Life is a messy thing: One event happens after another, some related to the same story thread, some belonging to an entirely different part of the narrative. How can we use sequences of events to predict what will happen next in a person’s life? In an already much-discussed paper in Nature Computational Science, researchers from the Technical University of Denmark (DTU), SODAS, and other institutions show how we can model life events by treating them like another series of information bits: language.

The language of life

“The paper proposes a novel method for studying socioeconomic and demographic phenomena by drawing parallels between life trajectories and written language,” says Germans Savcisens, PhD candidate at DTU and first author of the paper. “It introduces an algorithm, life2vec, that can compress information about the progression of someone’s life, and represent it as sequences of numbers, or vectors. Using Danish registry data, life2vec gives us interpretable predictions of early mortality, the chance that someone will move, and personality nuances related to extraversion.”

In many cases, the methods we use to analyze sequential data related to human lives, such as health outcomes, work histories, and behavior, are insufficient to handle the complex structure of such data. Most classical statistical models (for example, linear regression) require you to predefine specific features. To include information about a person’s past experience with moving house, for instance, we could include the total number of times they moved, or we could count the times they moved as a child, as a teenager and so on. We may also want a feature that indicates whether a person moved from a smaller to a larger city. There are infinitely many possible features to specify, and each choice we make potentially discards crucial pieces of information.
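To make the feature problem concrete, here is a minimal Python sketch of hand-crafted features for a person's history of moves. The `Move` record, its fields, and the age cut-offs are assumptions invented for this illustration; they do not come from the paper or the registry.

```python
from dataclasses import dataclass


@dataclass
class Move:
    # Hypothetical record of one move; all field names are assumptions.
    age: int             # age of the person at the time of the move
    from_city_size: int  # population of the city they left
    to_city_size: int    # population of the city they moved to


def handcrafted_features(moves):
    """Collapse a list of moves into a fixed set of predefined features."""
    return {
        "n_moves_total": len(moves),
        "n_moves_child": sum(1 for m in moves if m.age < 13),
        "n_moves_teen": sum(1 for m in moves if 13 <= m.age < 20),
        "moved_to_larger_city": any(
            m.to_city_size > m.from_city_size for m in moves
        ),
    }


moves = [
    Move(age=5, from_city_size=20_000, to_city_size=600_000),
    Move(age=16, from_city_size=600_000, to_city_size=50_000),
    Move(age=30, from_city_size=50_000, to_city_size=50_000),
]
features = handcrafted_features(moves)
print(features)
```

The four numbers that come out no longer say in which order the moves happened or what else was going on in the person's life at the time. That is the kind of information a predefined feature set can lose.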

It gets even more complicated if we try to combine various data sources. For example, we may have information about the date on which each person moved houses, but also when they were diagnosed with Alzheimer's disease or got a job as an astrophysicist. Each of these has a very different significance when it comes to life outcomes. It is not a trivial task to tell a linear model how to distinguish between these events or how to consider the combinations.

The researchers’ solution to this problem came from the idea that human language, when analyzed by computers, comes with similar issues. Just as each event contributes to a person’s life story, each word we add to a sentence brings new context and meaning. In the field of Natural Language Processing, we have models called transformers that can deal with such complex sequences of input. These are the models behind ChatGPT, for example, or Bing Copilot.

So, the authors of the paper propose a method with two major steps: representing individual life trajectories in a text-like format, and using the new transformer-based model life2vec to learn representations of these life trajectory “texts”.

Predicting the lives of Danes

“We used the Danish National Registry, which consists of fine-grained socioeconomic and health information about 6 million Danish residents,” explains Savcisens. “This way, we could assemble life trajectories for each resident: each record in the registry becomes a text-like sentence describing the life event. Next, we created a transformer-based model that could learn the structure of this new ‘language’ of socioeconomic and health events. We call the result life2vec.”
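As a rough illustration of the first step, the sketch below turns a few invented records into text-like "sentences" ordered by date. The field names, values, and the `FIELD_value` token format are assumptions made for this example and are not the actual encoding used by life2vec.

```python
from datetime import date

# Hypothetical registry records; field names and codes are invented.
records = [
    {"date": date(2012, 3, 1), "type": "job",
     "occupation": "astrophysicist", "income_bracket": 7},
    {"date": date(2010, 9, 15), "type": "move",
     "municipality": "Copenhagen"},
    {"date": date(2015, 6, 20), "type": "diagnosis", "code": "G30"},
]


def record_to_sentence(record):
    """Turn one record into a list of 'FIELD_value' tokens, skipping the date."""
    tokens = []
    for key, value in record.items():
        if key == "date":
            continue
        tokens.append(f"{key.upper()}_{value}")
    return tokens


def life_to_text(records):
    """Order records chronologically and emit one sentence per event."""
    ordered = sorted(records, key=lambda r: r["date"])
    return [record_to_sentence(r) for r in ordered]


trajectory = life_to_text(records)
for sentence in trajectory:
    print(" ".join(sentence))
```

Each life then becomes a chronological sequence of short sentences, which is the kind of input a transformer-based language model is designed to read.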

Life2vec can be used to predict the next events in a person’s life. For example, when used to predict mortality, it outperforms other advanced algorithms by 17%. Moreover, it is possible to check the decision-making process and see what factors life2vec takes into account when it makes a prediction. These interpretations are consistent with existing findings in social sciences. For example, mental health and income are highly correlated with mortality outcomes.

Life2vec also learns meaningful relationships between the different concepts in this new language of life events. It turns each event in a sequence into a vector, or a series of numbers. These vectors turn out to capture real meaning; for example, the vectors for different diagnoses related to pregnancy end up being close to one another, and the vectors for professional occupations related to aviation are similar to each other, too. This opens up the possibility of using life2vec to study how life events can fall into different categories and potentially find the causal links between them.
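One common way to measure how "close" two such vectors are is cosine similarity. The sketch below uses made-up three-dimensional vectors and hypothetical token names; real learned vectors have many more dimensions, and the article does not say which similarity measure the authors used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors invented for illustration; not taken from life2vec.
embeddings = {
    "OCCUPATION_pilot": [0.9, 0.1, 0.0],
    "OCCUPATION_flight_attendant": [0.8, 0.2, 0.1],
    "DIAGNOSIS_pregnancy_related": [0.0, 0.1, 0.9],
}

sim_aviation = cosine_similarity(
    embeddings["OCCUPATION_pilot"],
    embeddings["OCCUPATION_flight_attendant"],
)
sim_mixed = cosine_similarity(
    embeddings["OCCUPATION_pilot"],
    embeddings["DIAGNOSIS_pregnancy_related"],
)
print(f"aviation pair: {sim_aviation:.2f}, mixed pair: {sim_mixed:.2f}")
```

In this toy setup the two aviation occupations score close to 1, while the occupation and the diagnosis score close to 0, mirroring the clustering described above.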

The researchers hope that their results will become a part of a conversation about the ethical and transparent use of AI in decision-making. “We have provided insights into what state-of-the-art computational technology can infer about human behavior, what is possible today, and what are the current limitations on predictive power. In our case, we do it under the strict regulations of the Danish state,” Savcisens adds.

“Large technology corporations are probably using similar technologies on a vast quantity of human-generated data to predict and influence our behavior. Legislation of big tech is still in the early stages of development, so I hope our work will contribute towards more informed and well-considered regulation of AI.”