Virtual Data Discussion w/ Terne Sasha Thorn Jakobsen & Kristoffer Pade Glavind
The Copenhagen Center for Social Data Science (SODAS) is pleased to announce that we are continuing with SODAS Data Discussions this fall.
SODAS aspires to be a resource for all students and researchers at the Faculty of Social Sciences. We therefore invite researchers across the faculty to present ongoing research projects, project applications or just a loose idea that relates to the subject of social data science.
Every month two researchers will present their work. The rules are simple: short research presentations of ten minutes are followed by twenty minutes of debate. No papers will be circulated beforehand, and the presentations cannot be longer than five slides.
Kristoffer Pade Glavind: TripAdvisor: Large online review dataset
I present a dataset from the online review site TripAdvisor. It includes a large body of reviews (>20 million) from all restaurants in several major European cities, and overall stats for restaurants and sights from about 30 big cities around the world. Further, it includes all reviews for >100,000 individual users. I believe that this dataset has the potential to answer a wide range of questions, e.g. regarding trust, information, cultural differences, travel patterns, fake reviews, composition of cities and more. I am looking for research ideas and potentially for collaborators.
Terne Sasha Thorn Jakobsen: Confounds in cross-topic argument mining
Argument mining, the process of finding and extracting arguments from text, can be an important step in the fight against misinformation. A big challenge is learning models that generalise across topics rather than relying on within-topic confounds. Recent work in argument mining approaches the problem by training with multiple topics and performing cross-topic evaluations on held-out topics. We question whether this evaluation protocol is sufficient. We emulate the evaluation protocol with state-of-the-art models, for both single-task and multi-task learning, and analyse the models using the interpretability tool LIME and through ablation experiments. The analysis shows that cross-topic argument mining still relies heavily on within-topic confounds and does not generalise to distant topics.
The SODAS Data Discussion will take place online via Zoom from 11.00 am to 12.00 noon.
Join Zoom Meeting: https://ucph-ku.zoom.us/j/67702148237?pwd=dWZ2eXNUOFF5Q0YxVHphUmZYZUVCZz09
Meeting ID: 677 0214 8237
Passcode: 164600
If you want to attend the event or want to know more, please write to Katrine Herold at katrine.herold@sodas.ku.dk.