SODAS Data Discussion, Frederik Georg Hjorth & Snorre Ralund

Copenhagen Center for Social Data Science (SODAS) is pleased to announce that we are continuing the success of the SODAS Data Discussions this spring.


SODAS aspires to be a resource for all students and researchers at the Faculty of Social Sciences. We therefore invite researchers across the faculty to present ongoing research projects, project applications or just a loose idea that relates to the subject of social data science.


Every month two researchers will present their work. The rules are simple: short research presentations of ten minutes are followed by twenty minutes of debate. No papers will be circulated beforehand, and the presentations cannot be longer than five slides.

Frederik Georg Hjorth, Assistant Professor, Department of Political Science, and Snorre Ralund, PhD student at SODAS, will present at the fourth and final SODAS Data Discussion of this spring on the 10th of May.

Frederik Georg Hjorth:
Establishment Responses to Populist Challenges: Evidence from Legislative Speech

In recent years, many political systems have witnessed the rise of right-wing populist parties, sometimes challenging foundational norms of the established political system. In the face of such challenges, establishment actors face the important choice of how to respond to the challenger party. A rich literature in comparative party politics examines this question. Though typologies abound, the literature broadly speaking identifies two types of response: either engaging with the challenger on par with other parties, or employing a strategy of disparagement, i.e. seeking to portray the challenger as democratically illegitimate. However, the existing literature conceptualizes this response solely as a party system- or party-level phenomenon.
This paper argues that legislative speech, previously unexamined in the literature, offers a window into individual-level variation in establishment responses to right-wing populist challenges. I revisit an oft-studied case in the literature, responses in the Danish party system to the entry of the right-wing populist Danish People's Party in the mid-1990s. I take a text-as-data approach, applying machine learning methods to a total of around 130,000 paragraphs of legislative speech in order to characterize responses at the level of individual speeches. I link these speech-level estimates to an original data set on political and demographic characteristics of individual legislators.
Using this novel approach, which allows for a uniquely granular characterization of responses to right-wing populist parties, I uncover systematic individual-level, within-party variation in legislators' choice of an engagement or a disparagement response to the entry of the Danish People's Party. The results suggest that political and demographic characteristics of individual legislators, unexamined in the existing literature, play an important role in explaining establishment responses to populist challenges.

Snorre Ralund:
When models don’t speak data: Resolution limit in topic modelling

Topic modelling is a statistical method for the analysis of high-dimensional sparse data, widely used across many scientific fields. Applications range from automated text classification and recommendation systems to dimensionality reduction and topic-based search engines, and topic modelling has become the de facto standard for modelling text within the social sciences.
Although topic modelling has been widely used in both research and industry for many years, we still lack an understanding of potentially critical behavior of the algorithm under different empirical scenarios. A growing number of studies document critical behavior such as high levels of multimodality and instability, but it has only recently been proposed to systematically study the behavior of the algorithm using well-designed synthetic data (Shi et al. 2019).
The model is praised for its ability to model linguistic phenomena such as polysemy (words carry different meanings depending on context) and heteroglossia (documents contain different voices and themes), and, compared to its supervised counterparts in automated content analysis, for its data-driven and inductive qualities. In this paper, we use synthetic data to test the algorithm's ability to let the data drive the results. We show resolution limits and problems with inference related to the local concentration (document level), global prevalence (corpus level), and separability (word overlap) of the latent topics, with critical consequences for real-world scenarios.

The SODAS Data Discussion will take place at SODAS' new location in building 1, 2nd floor, room 26 (1.2.26) of the CSS Campus, University of Copenhagen, from 11.00 am to 12.00 pm.

If you have questions or want to know more, please write Agnete Vienberg Hansen at