Analyzing political texts has traditionally required political experts with a deep understanding of the source material. Yet this approach is slow and expensive. Can we produce political data using non-experts to lower costs and increase efficiency?
This question is at the heart of CDS’ Michael Laver’s research paper, co-written with Kenneth Benoit (LSE / TCD), Drew Conway (NYU), Benjamin Lauderdale (LSE), and Slava Mikhaylov (UCL), titled “Crowd-Sourced Text Analysis: Reproducible and Agile Production of Political Data.”
A major problem with relying on expert data is reproducibility. Researchers must be able to reproduce a study so that its results can be independently verified or disproved. But expert-generated data, Laver says, is challenging to reproduce because the pool of experts is usually very small, while the pool of political data is often quite large. It would be too expensive for political scientists to repeat the analysis of a large corpus of political texts, or to rehire from a restricted supply of skilled labor.
Laver’s approach, however, proposes using non-experts from the crowd to do the work of experts, relying on the sheer number of analysts to arrive at results “indistinguishable from expert approaches.”
Laver and the team used Crowdflower, an online aggregator of crowd-sourcing companies, to find non-expert analysts. The analysts were then asked to answer simple questions about text fragments from political texts. Their responses were used to build a scale measuring a latent trait.
For example, the researchers asked the non-expert analysts if a particular sentence fragment referred to economic policy, social policy, or neither. Then, they used a scaling model based on Item Response Theory (a theory developed to control for the difficulty of survey questions and the bias of analysts) to aggregate rater responses to the fragments.
“Intuitively, we could do this by simply averaging the rater responses for each text fragment… That works, but does not take account of the possibility that some text fragments are harder to rate than others (with higher variance in the rater responses) and some raters are ‘worse’ than others… or systematically biased in some sense,” Laver explained.
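To make that intuition concrete, here is a minimal sketch, not the authors’ actual model, of the difference between naively averaging crowd ratings and crudely adjusting for systematic rater bias. The ratings matrix and its values are purely illustrative:

    import numpy as np

    # Hypothetical ratings: rows are text fragments, columns are raters.
    # NaN marks fragments a given rater did not see. (Illustrative data only.)
    ratings = np.array([
        [1.0, 2.0, 1.0, np.nan],
        [3.0, np.nan, 2.0, 3.0],
        [np.nan, 1.0, 1.0, 2.0],
    ])

    # Naive aggregation: average each fragment's ratings, ignoring rater quality.
    naive_scores = np.nanmean(ratings, axis=1)

    # A crude adjustment in the spirit of an IRT-style model: estimate each
    # rater's bias as their average deviation from the fragment means, then
    # subtract it before re-averaging. A full model would also estimate
    # per-fragment difficulty/variance and per-rater reliability jointly.
    rater_bias = np.nanmean(ratings - naive_scores[:, None], axis=0)
    adjusted_scores = np.nanmean(ratings - rater_bias[None, :], axis=1)

    print("naive:   ", naive_scores)
    print("adjusted:", adjusted_scores)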
They compared the results from crowd-sourcing to a control group of four to six experts who analyzed the same text fragments and concluded that “uncertainty over the crowd-based estimates collapse as we increase the number of workers per sentence.” The more non-experts rated an item, the closer the aggregate rating came to the expert rating.
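The statistical intuition behind that finding can be illustrated with a toy simulation (this is not the paper’s analysis; the true position, noise level, and rater counts are made up): the spread of the averaged rating shrinks roughly with the square root of the number of raters per sentence.

    import numpy as np

    rng = np.random.default_rng(0)

    true_position = 1.5   # hypothetical "true" placement of one sentence
    rater_noise_sd = 1.0  # assumed spread of individual crowd judgments

    # The standard error of the mean rating shrinks roughly as 1/sqrt(n_raters),
    # which is the intuition behind "uncertainty collapses as workers are added".
    for n_raters in (1, 5, 20, 100):
        sims = rng.normal(true_position, rater_noise_sd, size=(10_000, n_raters))
        mean_ratings = sims.mean(axis=1)
        print(f"{n_raters:3d} raters -> std of aggregate estimate: "
              f"{mean_ratings.std():.3f}")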
The team’s results have many implications for political science research, foremost among them that crowd-sourcing “takes us significantly closer to a true scientific replication standard.” While more work needs to be done on refining quality control for crowd workers, Laver is optimistic about the potential of crowd-sourcing, especially for researchers with little funding. Crowd-sourcing could help them generate new data sets for far less money than hiring experts would require.
“We are still in the early days of crowd-sourced data generation in the social sciences… we now have a new method for collecting political data that allows us to do things we could not do before,” Laver said.