Researchers have created an Artificial Intelligence tool that uses sequences of life events—such as health history, education, job and income—to predict everything from a person’s personality to their mortality.
Built using transformer models, which power large language models (LLMs) like ChatGPT, the new tool, life2vec, is trained on a data set pulled from the entire population of Denmark—6 million people. The data set was made available only to the researchers by the Danish government.
The tool the researchers built based on this complex set of data is capable of predicting the future, including the lifespan of individuals, with an accuracy that exceeds state-of-the-art models. But despite its predictive power, the team behind the research says it is best used as the foundation for future work, not an end in and of itself.
“Even though we’re using prediction to evaluate how good these models are, the tool shouldn’t be used for prediction on real people,” says Tina Eliassi-Rad, professor of computer science and the inaugural President Joseph E. Aoun Professor at Northeastern University. “It is a prediction model based on a specific data set of a specific population.”
Eliassi-Rad brought her AI ethics expertise to the project. “These tools allow you to see into your society in a different way: the policies you have, the rules and regulations you have,” she says. “You can think of it as a scan of what is happening on the ground.”
By involving social scientists in the process of building this tool, the team hopes it brings a human-centered approach to AI development that doesn’t lose sight of the humans amid the massive data set their tool has been trained on.
“This model offers a much more comprehensive reflection of the world as it’s lived by human beings than many other models,” says Sune Lehmann, an author of the paper, which was recently published in Nature Computational Science. A Research Briefing on the topic appears in the same issue of the journal.
At the heart of life2vec is the massive data set that the researchers used to train their model. The data is held by Statistics Denmark, the central authority on Danish statistics, and, although tightly regulated, can be accessed by some members of the public, including researchers. It is so tightly controlled because it includes a detailed registry of every Danish citizen.
The many events and elements that make up a life are spelled out in the data, from health factors and education to income. The researchers used that data to create long patterns of recurring life events to feed into their model, taking the transformer model approach used to train LLMs on language and adapting it for a human life represented as a sequence of events.
“The whole story of a human life, in a way, can also be thought of as a giant long sentence of the many things that can happen to a person,” says Lehmann, a professor of networks and complexity science at DTU Compute, Technical University of Denmark and previously a postdoctoral fellow at Northeastern.
The model uses the information it learns from observing millions of life event sequences to build what are called vector representations in embedding spaces, where it starts to categorize and draw connections between life events like income, education or health factors. These embedding spaces serve as a foundation for the predictions the model ends up making.
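To make the idea of a life as a "sentence" of events more concrete, here is a minimal Python sketch. It is not the researchers' code: the event records, the token format, and the helper names (`to_tokens`, `build_vocab`, `init_embeddings`) are all assumptions made for illustration, and the vectors are random placeholders rather than learned values. In a trained model, those vectors would be adjusted during training so that related events end up close together.

```python
import random

# Hypothetical life-event records: (age, event type, value). Illustrative only.
life = [
    (22, "job", "nurse"),
    (18, "education", "high_school"),
    (40, "health", "checkup"),
    (23, "income", "bracket_3"),
]

def to_tokens(events):
    """Order events by age and flatten them into a 'sentence' of tokens."""
    ordered = sorted(events, key=lambda e: e[0])
    return [f"{kind}:{value}" for _age, kind, value in ordered]

def build_vocab(sequences):
    """Give each distinct token an integer id, in first-seen order."""
    vocab = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab

def init_embeddings(vocab_size, dim, seed=0):
    """Random placeholder vectors; a real model would learn these values."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]

tokens = to_tokens(life)                     # chronological event tokens
vocab = build_vocab([tokens])                # token -> integer id
ids = [vocab[t] for t in tokens]             # the sequence as ids
table = init_embeddings(len(vocab), dim=4)   # one vector per token
vectors = [table[i] for i in ids]            # what a transformer would consume
```

The list `vectors` stands in for the input a transformer would process; everything downstream, including attention layers and the prediction head, is omitted here.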
One of the life events that the researchers predicted was a person’s probability of mortality.
“When we visualize the space that the model uses to make predictions, it looks like a long cylinder that takes you from low probability of death to high probability of death,” Lehmann says. “Then we can show that in the end where there’s high probability of death, a lot of those people actually died, and in the end where there’s low probability of dying, the causes of death are something that we couldn’t predict, like car accidents.”
The paper also illustrates how the model is capable of predicting individual answers to a standard personality questionnaire, specifically when it comes to extroversion.
Eliassi-Rad and Lehmann note that although the model makes highly accurate predictions, they are based on correlations, highly specific cultural and societal contexts and the kinds of biases that exist in every data set.
“This kind of tool is like an observatory of society—and not all societies,” Eliassi-Rad says. “This study was done in Denmark, and Denmark has its own culture, its own laws and its own societal rules. Whether this can be done in America is a different story.”
Given all those caveats, Eliassi-Rad and Lehmann view their predictive model less like an end product and more like the beginning of a conversation. Lehmann says major tech companies have likely been creating these kinds of predictive algorithms for years in locked rooms. He hopes this work can start to create a more open, public understanding of how these tools work, what they are capable of, and how they should and shouldn’t be used.
https://news.northeastern.edu/2023/12/19/predictive-ai-human-lifespan-model/