“I just love the internet” – Sebastian Barfort makes the business case for social data science
“Good to be back”, says Sebastian Barfort, looking around at the photos on the walls at SODAS. Almost ten years ago, he finished his PhD in Economics on this campus. Today, he’s an associate partner at consultancy firm Round. He’s also a member of the employer panel for KU’s Master’s program in Social Data Science, making sure students learn skills that are relevant for the job market. We talk about how he uses data in his own job, and what kind of work social data scientists are best at.
How do you use social data science skills to help companies at Round?
Sebastian: Ultimately, any company decision is a bet on human behavior. All products are guided by assumptions about the decisions that people will make, whether they are consumers or employees of another company. Because of digitization, we now have a lot more information available to inform those bets.
We think digital data is important for companies to analyze for a few reasons. First, there’s so much of it, and that means you can say something about pretty small segments of the consumer base. For many companies a small percentage of the consumers, say ten to twenty percent, drives most of the revenue. In a survey, you might not be able to catch enough of them to understand what they want or how they behave. Also, if someone had negative experiences with the company, they might be more honest about those in an anonymous review on a website than in a survey or a focus group.
On top of that, this kind of data is super scalable. We could expand an analysis to customers in Asia or Latin America at a pretty small cost. And we can go back in time. In a survey, you can’t really ask, “how did you feel about this supermarket two years ago?” We know from academic studies that people’s memory of those things is imperfect to say the least (smiles). And it’s not just noisy, but biased. With online data, we can actually check what people were saying and doing at the time.
What kinds of social data do you use at Round?
It depends. When a product is directly relevant to the consumer, and people are willing to tell the world about their interactions with it, we can use social media data. That was the case for our work with the toy company Maileg, where the toys are aesthetically very pleasing and so people were happy to post pictures of them in their kids’ rooms and so on.
In other situations, we will use news articles, product reviews from review platforms, e-commerce website data, or specialized forums and chat platforms where people are talking specifically about the area that we’re interested in.
We always try to find data that approximates behavior. Whenever we can, we go beyond expressions to look at what people actually do. With Maileg, it turned out that there was a segment of grown-up consumers who were creating whole universes with the toys. You know, I just love the internet. I can get really absorbed by these niche communities. I’m amazed by their enthusiasm and I learn a lot from it.
At SODAS, we like to combine quantitative and qualitative methods to look at a subject from all angles. Do you do the same?
Yes, it’s actually very rare that we will do a quant-only or qual-only analysis. Online data lends itself quite naturally to mixed methods. Before we decide what data to collect, we talk to people about which online sources they actually use when they want to talk about or find out more about a product. And humans who read the posts, look at the photos, and see the facial expressions of kids using the toys still add a lot of value to our analysis.
Still, I’m excited to find out how much you can let state-of-the-art models do these things as well. Large language models are making the difference between machine and human approaches smaller. Already they can tell us something about mood, colors, atmosphere. Whether we can test our small-sample “netnographic” insights using models on large datasets is a really interesting area of active research. We don’t know yet. But at Round we read a lot of research papers in computational social science and social data science, to stay on top of what models are capable of.
Working with Maileg, you used social data science to help valuate the brand of a toy company. What other use cases in the business world do you see?
Firms like ours can analyze any group that is interesting to understand for an organization and that has some sort of online activity. Say a large foundation wants to increase focus on a particular disease. In order to get that change to happen in the world, you need stakeholders to pick up on your ideas. So they need to know if key people in the media landscape are talking about the themes they are putting out there. If not, then what are they picking up on? Could their interests be added to the foundation’s agenda? Are there new opportunities here the foundation wasn’t aware of?
Sometimes, companies are interested in broad societal debates. Maybe a life science company is producing a new weight loss drug. Political discussions about health, beauty, medicine, and exercise will impact things like subsidies or how the product should be marketed, and so these will be crucial for the company to understand and track going forward.
As a member of the employer panel for our Social Data Science education, where do you think the program has most potential for the outside world?
Social data science means combining an understanding of how people actually behave, informed by social science, with the technical skills to handle the intricacies of modern data sources. That’s more important than ever because the volume of information is exploding. In a study we did using Reddit data, even though we had data going several years back, 60% of our data had actually been generated in the past 12 months.
And the data is complex. It’s easy to count likes, or conversions on a website, or logins. We are already pretty good at squeezing value out of that data. The huge potential is in unstructured data. That is how we communicate important things with each other as humans. We use poems, songs, paintings, tonality. And that kind of data is more and more accessible these days. Social data science can give students the right foundation to tackle problems related to these highly valuable data sources.
What would you like to see the program improve or add?
You should make sure students are aware of their competitive advantage in the market, compared to a pure data scientist or a machine learning engineer. Or compared to a sociologist who spent more time specializing.
Your graduates should be sharper on how their added value is greater than the sum of the parts. Right now it only becomes clear once you work with them, or once they’ve had some time to work with their own skills.
I think many of your students would benefit from spending less time coding and more time thinking, “what can I do, as someone who’s also trained as an anthropologist?” Rather than adopting the vocabulary and approaches from more technical fields, students with a more qualitative background can think deeply about what kind of training data to generate, how to incorporate contextual information into labeling protocols, how to fine-tune models for a specific problem, and how to validate insights.
At Round, we want everyone to have a mixed background, but we always staff people with technical skills together with people who can’t code at all. I don’t want people with a physics or pure engineering background to analyze human behavior without qualified input from people who are trained in these matters. In my experience, social scientists bring a more nuanced view of all the things about humans that are hard to study, and especially a humility around all the things we don’t really know or understand. I find that to be incredibly valuable.