Text as Data

Textual data on human communication abound. We leave textual traces of a great variety of our everyday doings. We formulate our intimate concerns in Google searches, coordinate community initiatives in Facebook groups and articulate political ideologies on social media platforms. Text has become a fundamental medium through which many people interact, express and position themselves. In the digital society, the analysis, categorization and organization of textual content is of the utmost importance to governments, businesses, the press and academics alike. Computational text analysis is one of the central fields within the wider data revolution. Many methods are being imported into the social sciences from other fields, especially computer science. But concerns about bias in text models and about interpretative validity are growing within academia and beyond. This lecture series starts from the premise that text is not just data but social data. Texts are tied to specific contexts and cultural practices. These properties give text data its great potential, but also a high risk of being misinterpreted and misclassified, which undermines both interpretation and measurement. More generally, textual data present challenges that range from core concerns within machine learning to deep methodological issues in the social sciences regarding quantification, interpretation and how to combine qualitative and quantitative modes of analysis. For this lecture series we have invited scholars who have made valuable contributions to the interdisciplinary field of computational text analysis.

Mining discourse patterns of hate and counter speech in Facebook

Gregor Wiedemann, post-doctoral researcher at the University of Hamburg, will give a lecture. Wiedemann works as a researcher in the Language Technology (NLP) group at the University of Hamburg (Germany). He studied political science and computer science in Leipzig and Miami. In 2016, he received his doctoral degree in computer science for his dissertation “Text Mining for Qualitative Data Analysis in the Social Sciences”. Wiedemann has worked on several projects in the digital humanities and computational social science, where he developed methods and workflows for analyzing large text collections.


Social media networks offer their users the opportunity to discuss the content published by traditional mass media. Contrary to the idea of “echo chambers”, people with very different political attitudes meet on the Facebook pages of mass media providers. There, users are increasingly confronted with hate speech comments, but also with efforts to contradict abusive language and discrimination through counter speech. The talk presents findings from a recent study of such discursive hate- and counter-speech patterns on German Facebook. Based on a large corpus of one million user comments from 2017, a framework for computer-aided critical discourse analysis of social media data is introduced. With the help of topic modeling and text classification, the material is structured in a way that allows precise navigation through thematically and categorically filtered subsets. This supports quantitative statistics as well as seamless integration with qualitative data analysis steps. In a reflective summary of the methodological procedure, I also give an outlook on promising innovations in natural language processing for the automatic analysis of text as social data.

The lecture will take place in building 1, 2nd floor, room 26 (1.2.26) of the CSS Campus, University of Copenhagen, from 11.00 am to 12.30 pm.

If you have questions or want to know more, please write to Sophie Smitt Sindrup Grønning at sophiegroenning@samf.ku.dk.