May 2, 2017

Crowd Control: Improving Quality Assurance in Crowd Sourced Labeling

Although computers surpass humans in speed and raw computational power, there are still tasks they cannot perform well, like language translation, writing product descriptions, or image tagging. Enter crowdsourcing micro-task platforms like Amazon Mechanical Turk (AMT), where humans are paid to complete the work that computers can’t.

These platforms provide affordable labor for companies that need human judgment, and they promise high throughput. A key challenge, however, is that workers in the “crowd” vary widely in skill, which leads to equally varied quality of work. Inserting a separate quality-verification stage is possible, but it drives up the cost of getting tasks completed. How, then, can we improve the quality of crowd-sourced work while keeping costs low?

Panagiotis G. Ipeirotis and Foster Provost, two professors affiliated with CDS and working at NYU’s Stern School of Business, address this very question in their recent paper, “Cost-Effective Quality Assurance in Crowd Labeling.”

Seeking to balance the main advantage of micro-crowdsourcing (low cost) against its main challenge (uneven quality), Ipeirotis and Provost propose a two-phase framework. Taking labeling as an example of a crowd-sourced task, phase one is label allocation, in which workers label objects as usual. Phase two is inference, in which an algorithm estimates the true label of each object from the noisy labels the workers provided.
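To make the inference phase concrete, here is a minimal sketch in Python (function and variable names are hypothetical) of one common way to infer true labels from noisy crowd labels: alternate between estimating each object’s label by accuracy-weighted voting and re-estimating each worker’s accuracy against those labels. It illustrates the general idea only, not the authors’ exact algorithm.

```python
from collections import defaultdict

def infer_labels(labels, n_iters=20):
    """Estimate true labels and worker accuracy from noisy crowd labels.

    labels: list of (worker_id, object_id, label) tuples.
    Returns (estimated_labels, worker_accuracy).
    Simplified iterative scheme for illustration; not the paper's algorithm.
    """
    workers = {w for w, _, _ in labels}
    objects = {o for _, o, _ in labels}

    accuracy = {w: 0.8 for w in workers}  # start by trusting workers equally
    estimates = {}

    for _ in range(n_iters):
        # Step 1: pick each object's label by accuracy-weighted voting.
        for o in objects:
            votes = defaultdict(float)
            for w, obj, lab in labels:
                if obj == o:
                    votes[lab] += accuracy[w]
            estimates[o] = max(votes, key=votes.get)

        # Step 2: re-estimate each worker's accuracy against those labels.
        for w in workers:
            answered = [(obj, lab) for ww, obj, lab in labels if ww == w]
            correct = sum(1 for obj, lab in answered if estimates[obj] == lab)
            accuracy[w] = (correct + 1) / (len(answered) + 2)  # smoothed

    return estimates, accuracy


# Example: three workers label two images; w3 disagrees with the majority.
noisy = [("w1", "img1", "cat"), ("w2", "img1", "cat"), ("w3", "img1", "dog"),
         ("w1", "img2", "dog"), ("w2", "img2", "dog"), ("w3", "img2", "cat")]
labels_hat, quality = infer_labels(noisy)
print(labels_hat)   # e.g. {'img1': 'cat', 'img2': 'dog'}
print(quality)      # w3 ends up with a lower estimated accuracy
```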

What makes Ipeirotis and Provost’s framework distinctive is that it allocates tasks to workers based on their ability. Because labeling tasks vary in difficulty, the framework maintains “algorithmic estimates of object and worker quality from all the labels obtained so far” and uses them to assign each worker tasks that more closely match their abilities. By continually updating these estimates as workers perform, this adaptive system could improve the quality of the work that crowdsourcing platforms produce, since workers would only be allocated tasks within their skill level.
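The allocation phase can then use those running quality estimates to decide which object a worker should label next. The sketch below (again with hypothetical names, and a far simpler rule than the paper’s) routes the objects whose labels are still most uncertain to workers with high estimated accuracy, and easier objects to everyone else.

```python
def allocate_next_object(worker_id, worker_accuracy, object_uncertainty,
                         hard_threshold=0.5, strong_worker=0.85):
    """Choose the next object for a worker, matching ability to difficulty.

    worker_accuracy:     dict worker_id -> estimated accuracy in [0, 1]
    object_uncertainty:  dict object_id -> current label uncertainty in [0, 1]
    Illustrative heuristic only; the paper's allocation policy is richer.
    """
    acc = worker_accuracy.get(worker_id, 0.5)  # unknown workers get a neutral prior

    if acc >= strong_worker:
        # Send accurate workers the objects we are least sure about.
        pool = [o for o, u in object_uncertainty.items() if u >= hard_threshold]
    else:
        # Everyone else gets objects that are closer to settled.
        pool = [o for o, u in object_uncertainty.items() if u < hard_threshold]

    if not pool:  # fall back if the preferred pool is empty
        pool = list(object_uncertainty)

    # Within the pool, prioritize the most uncertain object.
    return max(pool, key=object_uncertainty.get)
```

In practice the two phases feed each other: every new label updates both the object’s uncertainty and the worker’s estimated accuracy before the next assignment is made.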

The paper also presents a value metric that measures the quality of a worker’s labeling, which lets employers reward workers who meet their quality requirements with bonuses. For interested parties, the software that Ipeirotis and Provost have created is available for use here.
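As a toy illustration of how such a quality measure could drive bonuses (the thresholds and amounts below are made up, and the paper defines its value metric more carefully), a payout rule might look like this:

```python
def payout_cents(estimated_accuracy, base_cents=5, quality_bar=0.90, bonus_cents=2):
    """Pay a per-task bonus (in cents) only to workers whose estimated
    labeling quality clears the employer's bar. All numbers are hypothetical."""
    return base_cents + (bonus_cents if estimated_accuracy >= quality_bar else 0)

print(payout_cents(0.95))  # 7 -> earns the bonus
print(payout_cents(0.70))  # 5 -> base pay only
```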

by Nayla Al-Mamlouk