Big Data, Big Questions: Can NLP Reveal Power Imbalances?

It looks like NLP researchers may soon be joining philosophers like Jacques Derrida and Michel Foucault to theorize about the relationship between power and language.

Last Friday, the NLP & Text-As-Data Seminar heard from Vinodkumar Prabhakaran, a postdoctoral fellow at Stanford University who specializes in computational sociolinguistics.

One of his research projects focuses on the workplace. Today, 96% of all office communication, Prabhakaran explained, occurs through media like email. But although email may be more convenient, it has also led more people to write online in ways they would not speak face to face. For example, someone may feel comfortable being sharper over email than in person, thanks to the detached quasi-anonymity of a digital screen.

A drawback of this ‘online disinhibition’ is more frequent incivility between superiors and subordinates, which could in turn reduce productivity. Because the first step in tackling the problem is identifying such moments of incivility, Prabhakaran’s project involves building an NLP tagger that can predict the direction of power by analyzing a single email between two people.

Working with an email corpus containing around 36,000 emails from a company called Enron, Prabhakaran and his team started by analyzing the structure of email threads and recording who initiated conversations, who ended conversations, and how long conversations lasted between employees and their superiors.
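To make the thread-structure step concrete, here is a minimal Python sketch of the kind of features described above: who initiated a thread, who ended it, and how long it lasted. The `Email` class, the `thread_features` function, and the example addresses are hypothetical names chosen for illustration; this is not the team's actual code or feature set.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Email:
    # Hypothetical minimal record for one message in a thread.
    sender: str
    recipient: str
    sent_at: datetime


def thread_features(thread):
    """Summarize one thread: who started it, who ended it, how long it ran."""
    ordered = sorted(thread, key=lambda m: m.sent_at)
    first, last = ordered[0], ordered[-1]
    return {
        "initiator": first.sender,
        "closer": last.sender,
        "num_messages": len(ordered),
        "duration_hours": (last.sent_at - first.sent_at).total_seconds() / 3600,
    }


# Invented example thread between a manager and an analyst.
thread = [
    Email("manager@example.com", "analyst@example.com", datetime(2001, 5, 1, 9, 0)),
    Email("analyst@example.com", "manager@example.com", datetime(2001, 5, 1, 10, 30)),
    Email("manager@example.com", "analyst@example.com", datetime(2001, 5, 1, 12, 0)),
]
features = thread_features(thread)
```

Features like these say nothing about what an email contains; they only describe the shape of the exchange, which is why the content had to be analyzed separately.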

Then, they performed dialog tagging on the content. Each email was categorized as conversational (‘How are you today?’), informational (‘The meeting is at 3 o’clock’), an information request (‘What is the subject of the meeting?’), or a commitment (‘I will submit the report by Friday’).
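The four categories can be pictured as a small label set. The sketch below pairs them with a deliberately naive keyword rule, only to show what assigning a label to a sentence looks like. The `DialogAct` enum and the `toy_tag` function are assumptions made for illustration; the real tagger was trained on annotated data, and nothing here reflects how it works internally.

```python
from enum import Enum


class DialogAct(Enum):
    # The four categories named in the talk.
    CONVERSATIONAL = "conversational"
    INFORMATIONAL = "informational"
    INFORMATION_REQUEST = "information request"
    COMMITMENT = "commitment"


def toy_tag(sentence: str) -> DialogAct:
    """Naive, hand-written rules; a stand-in for a trained tagger."""
    text = sentence.strip().lower()
    if text.startswith(("i will", "i'll")):
        return DialogAct.COMMITMENT
    if text.startswith(("how are you", "hello", "thanks")):
        return DialogAct.CONVERSATIONAL
    if text.endswith("?"):
        return DialogAct.INFORMATION_REQUEST
    return DialogAct.INFORMATIONAL


# The example sentences from the article.
examples = [
    "How are you today?",
    "The meeting is at 3 o'clock",
    "What is the subject of the meeting?",
    "I will submit the report by Friday",
]
tags = [toy_tag(s) for s in examples]
```

Rules this simple break quickly on real email, which is part of why a learned tagger is needed in the first place.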

The most revealing category, however, is Overt Displays of Power (ODP). A superior writing “Do you think you could send the report by today?”, as Prabhakaran points out, transmits a softer tone than one who writes “I need the report by today.” To Prabhakaran, the former gives the employee room to respond in multiple ways, while the latter appears to restrict the employee’s response options, a trait that he believes is characteristic of superiors who display ODP. After training the tagger to recognize these categories, Prabhakaran’s team found that it distinguished superiors from subordinates with 61.8% accuracy.

More work is still required to see whether the assumptions and conclusions Prabhakaran’s team has made can be applied to companies other than the one they investigated. Additionally, it remains to be seen how accurately email corpora reflect the actual culture of a company, given that some people may send many more emails than others and skew the overall results. And, of course, there is always the tricky snag called context: what if one’s superior really does need the report ‘by today’? These probing questions demand complex answers and deeper reflection, but they are precisely why NLP continues to be an increasingly dynamic field.

by Cherrie Kwok

NYU Center for Data Science