SODAS Data Discussion 11 Feb 2022

Copenhagen Center for Social Data Science (SODAS) aspires to be a resource for all students and researchers at the Faculty of Social Sciences. We therefore invite researchers across the faculty to present ongoing research projects, project applications, or simply loose ideas related to social data science.

The rules are simple: each ten-minute research presentation is followed by twenty minutes of debate. No papers are circulated beforehand, and presentations may not exceed five slides.

Author:
Clara Johan Vandeweerdt, post-doctoral fellow at the University of Copenhagen, Department of Political Science.

Title:
Using Twitter bios to map Americans' identities over time

Abstract:
What social identities do Americans have, and how are those identities changing over time? The most common method for answering this question has been to ask closed-ended survey questions about ethnicity, gender, partisanship and so on. The downside of this approach is that it struggles to capture rare identities, or identity categories that the researcher had not anticipated. A second approach has been to use open-ended survey questions. These, however, turn out to be rather difficult for respondents to understand, and answers are likely to be heavily influenced by context (e.g. previous survey questions, example identities used to clarify the question, or the assumed purpose of the research).

Recently, a third method has emerged: extracting identities from the self-descriptions ("bios") that users enter on social media. These bios have the advantage of being spontaneously written self-presentations, which people are able to adapt over time. So far, a few studies have tracked the appearance and disappearance of specific identity terms in user bios on Twitter. In this paper, we take a far more comprehensive approach: we use word embeddings and clustering to group all of the words used in Twitter bios into categories such as "family" or "religious". This allows us to monitor over-time change in the self-presentation of about 3.2 million geo-located American users since mid-2020, across dozens of identity categories that arise from the data itself. The method has exciting application possibilities, including pinpointing uncommon combinations of identities (e.g. "Democrat" and "religious"), linking changes in identities to real-world events (such as elections, scandals or protests) and showing how identities spread through networks.
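The abstract does not say which embedding model or clustering algorithm the study uses, so the following is only a minimal sketch of the general pipeline under stated assumptions: hand-made two-dimensional vectors stand in for real word embeddings, plain k-means (Lloyd's algorithm) stands in for whatever clustering the authors apply, and a category's prevalence in a period is taken to be the share of bios containing at least one word from that cluster. EMBEDDINGS, kmeans and category_shares are hypothetical names, and the bios are invented examples, not data from the study.

```python
import math
from collections import Counter

# Hypothetical toy vectors standing in for real word embeddings
# (which would normally come from a model trained on bio text).
EMBEDDINGS = {
    "mom": (0.90, 0.10), "dad": (0.85, 0.15),
    "wife": (0.80, 0.05), "grandma": (0.95, 0.20),
    "christian": (0.10, 0.90), "jesus": (0.05, 0.85),
    "blessed": (0.15, 0.95), "faith": (0.20, 0.80),
}


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(vectors, k, iters=50):
    """Lloyd's k-means; returns {word: cluster index}."""
    words = list(vectors)
    points = [vectors[w] for w in words]
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(iters):
        # Assign each word to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return dict(zip(words, labels))


def category_shares(bios, word_to_cluster, k):
    """Share of bios containing at least one word from each cluster."""
    counts = Counter()
    for bio in bios:
        # A set, so each bio counts at most once per cluster.
        counts.update({word_to_cluster[w] for w in bio.lower().split() if w in word_to_cluster})
    return [counts[j] / len(bios) for j in range(k)]


clusters = kmeans(EMBEDDINGS, k=2)
family, religious = clusters["mom"], clusters["jesus"]

bios_t1 = ["proud mom and wife", "dad | engineer", "blessed christian"]
bios_t2 = ["blessed mom", "faith first", "jesus saves", "coffee lover"]
shares_t1 = category_shares(bios_t1, clusters, k=2)
shares_t2 = category_shares(bios_t2, clusters, k=2)
print("t1:", round(shares_t1[family], 2), round(shares_t1[religious], 2))  # t1: 0.67 0.33
print("t2:", round(shares_t2[family], 2), round(shares_t2[religious], 2))  # t2: 0.25 0.75
```

Note that k-means needs the number of clusters fixed in advance, whereas the abstract says the categories "arise from the data itself", so the actual study may choose the number of categories differently or use another clustering method.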

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Author:
Marilena Hohmann, IT University of Copenhagen.

Title:
Estimating Polarization on a Social Network

Abstract:
Measuring polarization on social networks is important, but current methods either look only at the distribution of opinion values, ignoring the network structure, or are unable to meaningfully reduce the complexity of the input structure. Typically, users' opinions are represented on a spectrum from -1 to +1. We propose a polarization score that estimates the average distance a random walker on the network has to cover to travel from one opinion to the other. Unlike other methods, this approach takes into account both the dispersion of the opinion scores and how they are distributed on the network.

We validate our score on toy examples, on synthetic networks of increasing complexity, and on real-world Twitter data about issues such as gun control, Obamacare, and abortion in the US. We show that our polarization score correctly distinguishes between scenarios that current alternative approaches cannot tell apart.
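The announcement describes the score only in words and gives no formula, so the sketch below is one possible reading, not the presenter's method. It assumes that "the distance a random walker has to cover" can be approximated by the expected hitting time of a simple random walk, takes the sign of each opinion on the -1 to +1 scale as the user's side, and uses the opinion's magnitude to weight where walks start. The functions build_graph, hitting_times, directed_score and polarization, and the two-clique network, are hypothetical illustrations.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency lists from an edge list."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return dict(adj)


def hitting_times(adj, targets, tol=1e-12, max_iter=100_000):
    """Expected steps for a simple random walk from each node to first reach any
    node in `targets`. Solves h(v) = 1 + mean(h(u) for u in neighbours of v),
    with h = 0 on targets, by Jacobi iteration. Assumes a connected graph."""
    h = {v: 0.0 for v in adj}
    for _ in range(max_iter):
        new = {
            v: 0.0 if v in targets else 1.0 + sum(h[u] for u in adj[v]) / len(adj[v])
            for v in adj
        }
        if max(abs(new[v] - h[v]) for v in adj) < tol:
            return new
        h = new
    raise RuntimeError("no convergence; check that the graph is connected")


def directed_score(adj, opinions, sign):
    """Mean hitting time from nodes on side `sign` to the opposite side,
    weighting each start node by the magnitude of its opinion."""
    sources = {v: abs(o) for v, o in opinions.items() if o * sign > 0}
    targets = {v for v, o in opinions.items() if o * sign < 0}
    if not sources or not targets:
        return 0.0  # one side is empty: there is nothing to travel between
    h = hitting_times(adj, targets)
    return sum(w * h[v] for v, w in sources.items()) / sum(sources.values())


def polarization(adj, opinions):
    """Hypothetical symmetrised score: average of the +1 -> -1 and -1 -> +1 directions."""
    return (directed_score(adj, opinions, +1) + directed_score(adj, opinions, -1)) / 2


# Toy network: two 4-node cliques joined by a single edge (3-4).
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
edges += [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
G = build_graph(edges + [(3, 4)])

segregated = {v: 1.0 if v < 4 else -1.0 for v in G}  # each clique shares one opinion
mixed = {v: 1.0 if v % 2 == 0 else -1.0 for v in G}  # opinions interleaved across cliques
print(round(polarization(G, segregated), 2))  # 15.25
print(round(polarization(G, mixed), 2))       # 1.45
```

Both opinion assignments contain exactly four +1 and four -1 users, so a measure based only on the distribution of opinion values cannot tell them apart. The network-aware sketch gives a much higher score when each opinion is confined to its own clique.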

Venue: CSS, Øster Farimagsgade 5, SODAS conference room 1.2.26.
You can also join via Zoom: https://ucph-ku.zoom.us/j/64418196849