Nick Beauchamp is an Assistant Professor in Northeastern University’s Department of Political Science. He recently spoke at the Center for Data Science’s Text as Data conference.
What did you study in school? How did you get to what you study now?
Like many data scientists, I took a somewhat winding path to arrive where I am now. Although I started college studying physics and math, I wound up double-majoring in philosophy and literature. I then went to grad school in English at Johns Hopkins, where I worked on a thesis that combined literature, philosophy of mind, and politics.
How did you get from English studies and politics to data science?
I think because of my background in literature, I was immediately interested in computational text analysis: inferring ideology from speech and the dynamics of opinion change. After doing my work in English at Johns Hopkins, I went to the Carter Center to work on election observation and electoral fraud analysis. That led me back to grad school in politics with a focus on quantitative methods at NYU.
How do the methodologies in data science differ from the social or hard sciences?
The natural tendency in the sciences is towards specialization. We are in a rare – and perhaps brief – moment where one can work in a number of fields – computer science, statistics, a substantive field such as politics – and produce novel and important work. With the social or hard sciences, interdisciplinary synthesis is the exception more than the norm, and I hope the interdisciplinary aspect of data science continues.
When did you start to incorporate data science into your research?
During my graduate work at NYU, one of my first projects was analyzing congressional speech using computational methods to determine speaker ideology. But I already had some programming experience at that point.
Could you tell me about some of the projects you’re working on?
I’m currently working on a project that algorithmically generates more persuasive text by looking at how people shape their opinions through political rhetoric surrounding Obamacare. By analyzing participants’ reactions to three-sentence paragraphs, we can figure out which topics are the most persuasive, and look at ways in which the structuring of arguments affects how people respond. It gives us insight not only into what makes a convincing argument, but also into how ideas shape political opinions.
I’ve also been modeling online arguments within political groups, which tend to be much more deliberate and persuasive than arguments between differing political ideologies. This allows us to discern mental connections between topics, how these mental connections take shape, and how one gradually shapes their ideology.
There are so many ways you can use data science to look at a problem in politics. Why does text analysis appeal to you?
I’m fundamentally interested in how people think about and argue politics. Ideology in politics has for decades been modeled as a simple left-right spectrum (or perhaps two dimensions), which seemed to me to deeply miss out on the full complexity of political thought – the ways ideas and beliefs interrelate and affect each other, and the ways we learn and change our beliefs. Words and text are really the only window into the full complexity of all of this, and after all, it is through words that we learn and change all of our beliefs. Text analysis lets us see into this incredibly complex psychological and social world, if we develop the right tools to model and understand it.
How has the way in which you incorporate data science into political science changed over the years?
In the “old days” I would develop all my tools from scratch. I still do a lot of that, but text analysis has become a booming industry in the last 5 years, which means there are now a lot of sophisticated models to build upon.
What do you think is the single biggest effect that data science has had on political science, or how we approach political science?
Data science now allows us to build more complex models of social behavior, with richer psychological and social dimensions that speak to personality, ideas, and social movements beyond the left/right dichotomy.
Are there any ways in which text analysis is being used differently in political science than in other fields?
Political science has more sharply defined subject outcomes than other social sciences. But text analysis is also a bit hampered in political science by our emphasis on a single left-right dimension, which can sometimes limit the analysis and prevent models from capturing the richer complexity of speech.
Are there any areas of your field that aren’t being impacted by data science that should be?
So far the modeling side of our field, as in economics, tends to be dominated by analytic, game-theoretic models rather than computational or agent-based models. The real world of data science is extremely messy and it’s important to have models that are analytically tractable and theoretically understandable. But it will also be interesting to see how more complex computational models will allow more interaction between the messy world of data and the theoretical world of modeling.
For you, what was the biggest takeaway from the Text as Data Conference?
I thought there were a number of great papers focusing on long-standing fundamental problems in text analysis, such as n-grams and sentiment. But I want to see broader social theories incorporated into the computer-science work you see in natural language processing. There’s a lot of room for growth on the theory side – not modeling per se, but social theories that have ramifications beyond small-scale political domains.
On a chalkboard at CDS we have the question: “What does it mean to be a data scientist?” Could you answer this question?
A data scientist is someone who finds theoretical and methodological connections across disciplines, and is ready to make new discoveries wherever she finds them. The surge in computation means that many of these discoveries are entirely novel, and are new insights into complex systems that could not have been made with previous methods.