Study a Master in Social Data Science and learn more about exploring massive data sets. We integrate classical social science with state-of-the-art tools from data science. You can read more about the master here.
The development of machine learning methods and the use of big data have exploded over the past decade, and they are sometimes presented as an alternative to the more traditional econometric approaches that have been (and still are) widely used in the social sciences. Historically, the two fields have been competing. However, the recent literature has emphasized that they are complementary and have strong synergies. Machine learning can leverage econometrics to design new methods with improved flexibility to address new complex problems. Moreover, the collection of big data creates new challenges for traditional econometric methods in terms of computation and identification. Machine learning may allow researchers to tackle some of these challenges.
The goal of this seminar is to bridge the gaps between the two approaches, and to explore new avenues of research that can crossbreed econometric and machine learning methods in a productive way. This idea has recently been flourishing and has produced a number of relevant insights for economists (Mullainathan, Spiess (2017); Athey (2018)). The set of tools stemming from machine learning can broadly be categorized according to whether they enhance current tools for policy evaluation and causal effects (e.g., use of instrumental variables, matching, etc.), or whether they are general procedures for estimating models in high dimensions.
For policy evaluation, for example, the new tools offer the following improvements: i) automatic selection of variables for use in regression models (Belloni et al. (2014)); ii) data-driven estimation of heterogeneous treatment effects (Athey, Wager (2018)).
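The first improvement, automatic selection of control variables, can be illustrated with a rough sketch of a double-selection procedure: one lasso picks controls that predict the outcome, a second picks controls that predict the treatment, and the treatment effect is then estimated by ordinary least squares on the union. The data are simulated, all variable names are hypothetical, and scikit-learn's cross-validated lasso is used here as a convenient stand-in for the penalty choice in the cited paper, so this is an illustration of the idea rather than a faithful implementation.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

# Simulated data (hypothetical): 50 candidate controls, only a few matter.
rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.normal(size=(n, p))
d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)            # treatment
y = 1.0 * d + 2.0 * X[:, 0] + X[:, 2] + rng.normal(size=n)  # true effect of d is 1.0

# Step 1: lasso of the outcome on the controls.
keep_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)
# Step 2: lasso of the treatment on the controls.
keep_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)

# Step 3: OLS of the outcome on the treatment plus the union of selected controls.
selected = np.union1d(keep_y, keep_d)
Z = np.column_stack([d, X[:, selected]])
effect = LinearRegression().fit(Z, y).coef_[0]

print(f"controls kept: {selected.size} of {p}, estimated effect: {effect:.2f}")
```

Selecting on both equations guards against dropping a control that matters for the treatment but only weakly for the outcome, which is the kind of omission that would bias the estimated effect.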
The econometric methods relevant for this seminar include the classical frequentist approach and Bayesian econometrics. Students may choose between the two approaches (or decide to use both).
Participants in the seminar will get a chance to work on a number of different projects that are theoretical, applied, or both. Examples of seminar projects include replicating older studies that do not make use of these new methods, to check the robustness of their findings against more flexible approaches; collecting new data and applying the methods to these data; and extending the new methods in relevant ways.
Course Coordinator: Samantha Breslin
Level: Bachelor, Bachelor choice, Full Degree Master choice
Digital technologies are ubiquitous in our lives today. We socialize, pay our bills, track our fitness, and so on, using the internet and digital devices. Our interactions with digital technologies have produced new forms of identification, as well as new scales of data about ourselves. The growth of ‘big data’ has, in turn, produced new ways of examining our digital selves. This course looks at the nexus of identities, digital data and technologies, and methods, drawing on literature from anthropology, social data science, science and technology studies, and related fields. The course considers how identities have creatively flourished, but also provides a critical interrogation of how gender, race, and other forms of difference and inequality are reproduced in and through digital data and technologies.
This course begins with considering the history of digital data and technologies, and the methods and tools used to understand digital identities from the fields of anthropology and data science. This includes examining differing approaches to ethics in these fields. The course will also explore theories about identity and different ways identities are constructed and performed through digital technologies, such as social media, internet cultures, and fitness trackers. We will also explore the identities of those who design and build these technologies, the politics and norms reproduced through technologies themselves, and the effects they have, with particular attention to the role of gender and race. Finally, we will consider the political economic contexts of these technologies and the formation of digital identities.
The rise of new types of digital data and the varieties of social life taking place on social media platforms enable new relations between quantitative and qualitative methods of inquiry and analysis. How such new complementarities are best exploited for social-scientific and practical purposes will be the focus of this course.
The course will offer students the opportunity to learn digital methods across the qualitative and quantitative spectrum, and to experiment with how these complement one another. The course will be structured around a mini research project, which takes students through several stages: from research design, data collection, data cleaning, to analysis. Methodologically, students will learn a wide range of methods and skills for conducting digital and computational research.
First, the course will take the form of a Python programming boot camp where students will learn the Python programming necessary for being efficient and flexible in working with digital data. We will learn the basic syntax, data types and structures in Python and how to manipulate and work with tabular, text and network data. The boot camp will also introduce various Python libraries that will be used throughout the course, including numpy, pandas, networkx, scikit-learn and scipy. This first part will take up one third of the course.
Second, the course will introduce students to the fundamentals of digital ethnography on social media platforms and other online spaces (e.g. Facebook groups, Twitter hashtags, Reddit threads, intranets, discussion fora, etc.). Digital ethnography involves conducting participant observation and interviews in digital spaces for the purpose of learning the dynamics of a particular online social setting. Digital ethnography thus provides the interpretative grounding which ensures that the categories and social processes that will be quantitatively assessed through digital and computational methods later in the course, are grounded in the meaning-making practices of the actors themselves.
Third, students will learn how to scrape and clean data and how to do visualizations, network analysis, clustering and multidimensional scaling in Python. These techniques will be used to explore, map out and visualize the varying densities and differences in data for the purpose of qualitatively exploring patterns in and across online social settings.
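As a rough illustration of the network-analysis part of this step, the sketch below builds a small graph with networkx and produces a first map of who is central and which groups of users are disconnected from each other. The reply edges and user names are made up for the example, and connected components are used only as a crude stand-in for the clustering methods mentioned above.

```python
import networkx as nx

# Hypothetical reply edges scraped from an online forum: (author, replied_to).
replies = [
    ("ana", "ben"), ("ben", "cem"), ("cem", "ana"), ("dan", "ana"),
    ("eva", "finn"), ("finn", "gro"),
]

G = nx.Graph()
G.add_edges_from(replies)

# Degree centrality: share of the other users each user is directly connected to.
centrality = nx.degree_centrality(G)
most_central = max(centrality, key=centrality.get)

# Connected components as a crude first map of separate conversation clusters.
clusters = sorted(
    (sorted(component) for component in nx.connected_components(G)),
    key=len,
    reverse=True,
)

print("most central user:", most_central)
print("clusters:", clusters)
```

A map like this can guide qualitative work, for instance by pointing to which clusters or central users deserve a closer reading.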
Fourth, students will learn how to use computational text analysis, including supervised machine learning, to quantify aspects of their qualitative inquiry. The course will go through more classical issues in content analysis around the construction of coding schemes and various validity issues relevant for working with textual data. The course will also work through how to train models to automatically label textual data in ways that are sensitive to the biases and limitations of automated text analysis. Finally, the course will run through various simple analytical strategies for analyzing the labeled data.
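A minimal sketch of what training a model to label text automatically can look like with scikit-learn is given below. The posts, the two-category coding scheme and the choice of a TF-IDF representation with logistic regression are all assumptions made for illustration; a real project would need a much larger hand-coded sample and a held-out set for validation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical hand-coded posts following a two-category coding scheme.
texts = [
    "great initiative, fully support this",
    "I support the proposal",
    "love this idea, great work",
    "this plan is a terrible mistake",
    "terrible idea, I oppose it",
    "I oppose this awful plan",
]
labels = ["support", "support", "support", "oppose", "oppose", "oppose"]

# Turn each text into TF-IDF word weights, then fit a linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Label new, uncoded posts and keep the class probabilities.
unlabeled = ["I fully support this great idea", "an awful, terrible plan"]
predicted = model.predict(unlabeled)
probabilities = model.predict_proba(unlabeled)

for text, label, probs in zip(unlabeled, predicted, probabilities):
    print(f"{label:8s} (max prob. {probs.max():.2f})  {text}")
```

Cases with a low maximum probability are natural candidates for manual re-coding, which is one simple way to keep the automated labels tied to the qualitative coding scheme.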
Students will first be tasked with finding a project idea and devising a research design that combines qualitative and quantitative digital methods around an online case study. Following this, students will be tasked with collecting data through ethnography, digital methods and computational social science programming. Students will then be tasked with conducting an analysis of the case study, while reflecting on the methodological, epistemological and practical aspects of combining heterogeneous datasets and methods.
"Social Data Science: Econometrics and Machine Learning" is one of two new courses in Social Data Science that build on the introductory summer school course in social data science. The courses introduce students to three new essential data structures and teach state-of-the-art methods for applying data science and machine learning techniques through practical examples and hands-on experience. In each course, we discuss how novel social data science applications apply these tools.
"Social Data Science: Econometrics and Machine Learning" focuses on tree-based models and causal models, as well as on methods for analyzing social networks.
The course has a dual focus on both methods and data structures. In terms of methods, we investigate the intersection of microeconometrics and machine learning. From a data perspective, we cover relational data, which may include complex and social networks, and Geographic Information System (GIS) data. The methods include some very recent developments in econometrics, in terms of hybrid statistical models that are built for econometrics but leverage machine learning, as well as fundamental and novel models for estimation in networks.
The course begins by introducing spatial/GIS data and then introduces tree- and kernel-based models. The course then proceeds to cover machine learning models for causal inference. We next introduce networks and relational data as a canonical data type. Networks are essential for representing systems of interaction such as information transmission and social behavior, as well as risk in the interbank markets. Finally, we cover methods for estimating choice models and how they relate to estimating models for social spillovers and network formation.
"Social Data Science: Text Data and Deep Learning" is one of two new courses in Social Data Science that build on the introductory summer school course in social data science. The courses introduce students to three new essential data structures and teach state-of-the-art methods for applying data science and machine learning techniques through practical examples and hands-on experience. In each course, we discuss how novel social data science applications apply these tools.
"Social Data Science: Text Data and Deep Learning" focuses on methods for analyzing unstructured data. Unstructured data such as images, video and text used to be confined to small-N qualitative studies within the social sciences. However, recent developments in both natural language processing (NLP) and computer vision (CV), broadly speaking the field of AI, hold great promise for social data scientists wishing to supplement deep qualitative readings and analysis of unstructured data with quantitative insights and generalization from large corpora of unstructured text and images.
The course begins with an introduction to neural networks and transfer learning. In many cases involving unstructured data, the high dimensionality of both language and vision means that the old supervised learning paradigm of training models from scratch using limited instructive samples (training data) is either impossible or very inefficient. In these cases, transfer learning can be used to adapt large pre-trained models, trained on very large labeled or unlabeled datasets, to a specific task. Next, we cover text as data, an abundant and readily available data source in the form of news articles, speeches, forum threads, social media posts, encyclopedias, et cetera. Lastly, the course introduces methods for using digital images as data.
The objective of this course is to learn how to gather, work with and analyze modern quantitative social science data. Increasingly, social data that capture how people behave and interact with each other are available online in new, challenging forms and formats. This opens up the possibility of gathering large amounts of interesting data to investigate existing theories and new phenomena, provided that the analyst has sufficient computer literacy while also being aware of the promises and pitfalls of working with various types of data.
In addition to core computational concepts, the class exercises will focus on the following topics:
1. Gathering data: Learning how to collect and scrape data from websites as well as working with APIs.
2. Data manipulation tools: Learning how to go from unstructured data to a dataset ready for analysis. This includes importing, preprocessing, transforming and merging data from various sources.
3. Visualization tools: Learning best practices for visualizing data in different steps of a data analysis. Participants will learn how to visualize raw data as well as effective tools for communicating results from statistical models for broader audiences.
4. Prediction tools: Covering key implementations of statistical learning algorithms. Participants will learn how to apply and interpret these models in practice.
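The data manipulation topic (item 2) can be sketched with pandas as follows. The two small tables stand in for data gathered from a website and from an API; their contents and column names are invented for the example.

```python
import pandas as pd

# Hypothetical raw records, e.g. posts scraped from a website...
posts = pd.DataFrame({
    "user": [" Ana", "ben ", "ANA", "cem", "ben"],
    "text": ["first post", "hello", "second post", "hi there", None],
})
# ...and user profiles fetched from an API.
profiles = pd.DataFrame({"user": ["ana", "ben"], "region": ["north", "south"]})

# Preprocess: normalise the join key and drop rows without text.
posts["user"] = posts["user"].str.strip().str.lower()
posts = posts.dropna(subset=["text"])

# Merge the two sources; how="left" keeps posts whose author has no profile.
merged = posts.merge(profiles, on="user", how="left")

# Transform: count posts per region (posts with unknown region are left out).
posts_per_region = merged.groupby("region")["text"].count()

print(merged)
print(posts_per_region)
```

Checking how many rows end up with a missing region after the merge is a quick way to see how much of the data the second source actually covers.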
You can find the course description here.
More information about the course here.
The age of social big data brings with it a range of ethical, legal and political issues. From the ethics of protecting individual online privacy, to the legal frameworks regulating internet giants such as Facebook and Google, new data governance issues surface at a rapid pace. This course provides students with an introduction to key legislative, political and ethical principles and debates from the perspectives of anthropology, law, sociology, political science, and related disciplines, concerning the governance of data, needed for a range of analysis and management positions across private, public and non-profit organizations.
Data governance concerns the overall management of the availability, usability, integrity and security of data used in private, public and non-profit organizations. Comprehensive data governance addresses issues of data stewardship, ownership, compliance, privacy, data risks, data sensitivity and data sharing, including how such issues exist between different entities within the same organization. It involves thinking through issues such as: What do new forms of data-driven surveillance mean for relations between citizens, businesses and nation states, and how are new legal issues such as the legal basis for decision support systems and algorithmic decisions in public and private organizations addressed within current European legislation?
Students will be taught how to develop and implement ethically and politically informed procedures and infrastructures for organizing, managing and maintaining data and data products in public and private organizations. The course also introduces the most recent ethical and social-scientific models of data governance, including organizational models and risk assessments, and asks students to apply them to a real-world case of problem solving.
Casework takes students through the main phases of data governance analysis and practice: identification of a data-related problem and its internal and external stakeholders; analysis of how legal, technical-infrastructural and social-organizational components of the problem interrelate; pre-screening of possible solutions, including their respective risks; and final proposal and pilot check of a new data governance scheme expected to be robust in the face of foreseeable near- and mid-term challenges. By drawing on cutting-edge research in anthropology, law, sociology, and related disciplines, the students will also be able to contextualize and situate the case-based work within existing scientific debates concerning data governance and ethics.
The course is organized into three parts. First, we begin with an introduction to what can be done with social big data under current Danish and EU laws. This is followed by a consideration and discussion of what should (and should not) be done in more political and ethical terms. And finally, the course will discuss what could be done in terms of governance in different sectors of public administration (health, education, etc.), in the private business sector and in the non-profit sector.
More information about the course here.
Course Coordinator: Hjalmar Alexander Bang Carlsen
Level: Bachelor, Bachelor choice, Full Degree Master, Full Degree Master choice
This course equips students with the analytic skills and reflexive capacities needed to engage critically but productively with various new 'device-aware' styles of social analysis assisted by digital and computational means. It does so, firstly, by way of reading paradigmatic analyses of the nature of social behaviors, networks and ideas, focusing both on classical concepts and contemporary research frontiers.
Examples are drawn from across all the social-science disciplines, and core interdisciplinary convergences are identified. Second, the course takes students through all the methodological steps of social research design, analysis and interpretation, tying these steps to practical examples and to students' own projects (from other courses). Here, key initial questions include: what are the implications of working with different data types (static vs. dynamic; broad vs. deep); how to think about and practically handle data biases stemming from digital platforms and devices (noise, bots etc.); how to build ethical considerations in from the start of data harvesting (digital research ethics)?
In a next step, students are introduced to key methodological traditions often underlying the analysis of behaviors, networks and ideas, respectively (causal-experimental; pattern search; meaning-oriented), as well as to ways of working across them using various digital data sources as well as combining with other sources (including both quantitative and qualitative). In a final step, students learn how to think critically about the interpretation of their social data analyses, including issues of internal and external validity, representativeness and generalizability, as well as analytical induction and concept work.
Rounding off, thirdly, students are introduced to frameworks for thinking about the changing place of social research in digital societies, including the possibilities and challenges opened up by greater interdisciplinary collaboration as well as new types of academia-industry-government partnerships.
More information about the course here.
The objective of this course is to teach students how to leverage the data science toolbox for use in social science. We emphasize the use of new data sources associated with communication, behavior, transactions, etc., which are increasingly available through the web and by collection from the various devices we use. These new sources of structured and unstructured data allow for testing and validation of existing theories in social science as well as development of new ones. Performing these analyses, however, requires an ability to understand and apply methods from the computational sciences. We build on the foundational course in social data science to teach these fundamental skills.
We introduce students to the essentials of data structure and structuring and teach state of the art methods for applying data science and machine learning techniques. We do this by using practical examples and provide students with hands-on experience. We will build on the knowledge from the basic Social Data Science course.
The first canonical data structure we introduce is network and relational data. This data type is ubiquitous when analyzing data from social media, communication on cell phones or data on physical meetings. The second data type is spatial data which includes data on shape and structure of shops, buildings, administrative boundaries, etc. but also includes personal data from GPS on smartphones, cars and much more. The final data type is text data which is present everywhere as documents, online discussions etc. For each of the three datatypes we will teach various tools to work with them in practice.
We teach students a high level of applied machine learning. We will provide an in-depth review of the advantages and disadvantages of standard machine learning techniques, i.e. supervised machine learning (regression, classification) and unsupervised learning. In addition, we will teach tools from the frontier of applied social data science that leverage machine learning for causal inference.
The teaching is built around empirical examples: the course aims at developing good practices in data analysis, including thorough exploratory analysis, reliable collection and cleaning of data, visualization skills and statistical sensitivity analysis.
The course will emphasize a complete approach to working with data: from data collection, through data structuring (i.e. parsing, cleaning, transformation, and merging), to exploratory analysis and, finally, reporting of the results.
Course Coordinator: Hjalmar Alexander Bang Carlsen
Level: Full Degree Master, Full Degree Master choice
It is widely recognized that growth in digital data formats enables new relations between quantitative and qualitative methods of inquiry and analysis, thus posing questions as to how such new complementarities are best exploited for social-scientific and practical purposes.
This course will run through a number of ways in which qualitative and quantitative modes of inquiry complement one another. Together they compose a research strategy that secures both the strength of qualitative methods in valid interpretation of the meaning and consequence of social actions and the strength of quantitative methods in generalization and structural analysis.
The course will be structured around a mini research project, from data collection to analysis. In the data collection part we will combine quantitative sampling theory and qualitative case selection theory to ensure a good sample. For exploring the data we will discuss and use a set of methods meant to map out one's data and display its varying densities and differences (network analysis, clustering, multidimensional scaling).
These maps work as navigational devices to ensure that one's digital ethnography gets hold of the relevant variation in one's data. The digital ethnography entails a practice of participant observation in digital spaces (e.g. a Facebook group, a company intranet board) for the purpose of learning the details of the values and motivations characteristic of social interaction in this setting.
The digital ethnography provides the interpretative grounding which ensures that the categories and social processes that will be quantitatively assessed at the later stage have a hold in the meaning-making practices of the actors themselves. In order to explore, generalize and test large-scale patterns, we need to translate our ethnographic description into something quantifiable.
We will use supervised machine learning in order to scale our qualitative categorization/coding. The student will learn the pros and cons of various strategies regarding sampling, optimization, and the use of unsupervised methods as input.
Importantly, we will pay a lot of attention to bias detection and correction to ensure the reliability and validity of our supervised machine learning model. This categorized dataset can then be used for more or less simple quantitative analysis, which together with the digital ethnography will make up the mini analysis.
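One simple way to look for bias in a supervised model's labels is to compare its predictions with a held-out sample of hand-coded items, category by category. The sketch below uses only the Python standard library; the labels are invented and the helper name `per_class_scores` is hypothetical. It computes precision and recall per category, plus the gap between the predicted and hand-coded share of each category.

```python
from collections import Counter

# Hypothetical held-out sample: hand-coded labels vs. model predictions.
hand_coded = ["support", "support", "oppose", "oppose", "oppose", "neutral", "neutral", "support"]
predicted = ["support", "oppose", "oppose", "oppose", "support", "neutral", "oppose", "support"]


def per_class_scores(truth, pred):
    """Return {label: (precision, recall)} computed from paired label lists."""
    hits = Counter(t for t, p in zip(truth, pred) if t == p)
    n_true, n_pred = Counter(truth), Counter(pred)
    return {
        label: (
            hits[label] / n_pred[label] if n_pred[label] else 0.0,
            hits[label] / n_true[label],
        )
        for label in n_true
    }


scores = per_class_scores(hand_coded, predicted)

# Gap between predicted and hand-coded share of each category:
# a positive value means the model over-counts that category.
n_true, n_pred = Counter(hand_coded), Counter(predicted)
share_gap = {label: (n_pred[label] - n_true[label]) / len(hand_coded) for label in scores}

for label, (precision, recall) in scores.items():
    print(f"{label:8s} precision={precision:.2f} recall={recall:.2f} share gap={share_gap[label]:+.3f}")
```

If a category is systematically over- or under-counted, any downstream quantitative analysis of category shares inherits that error, which is why such checks belong before the analysis rather than after it.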
More information about the course here.
Today we see that a number of central societal questions increasingly rest on complicated technological and scientific knowledge about which experts often disagree. This applies, for example, to questions of genetically modified foods, digital formats, energy supply, climate adaptation, assessment of environmental risks, stem cell research, vaccines, etc. But social and cultural themes such as cultural heritage, comma placement, multiculturalism, economic forecasts, legal norms, etc. also constitute fields of knowledge where experts often clash.
KOVIKO aims to enable the student to map, analyze and graphically visualize such technological and/or scientific controversies, primarily through the use of digital methods and tools (such as Issue Crawler). In this way, students gain competences that are central to navigating a global knowledge and risk society.
More information about the course here.