Popularity Prediction for Social Media over Arbitrary Time Horizons
Daniel Haimovich, Dima Karamshuk, Thomas Leeper, Evgeniy Riabenko, Milan Vojnovic
Conference on Knowledge Discovery and Data Mining (KDD)
Large online services employ thousands of people to label content for applications such as video understanding, natural language processing, and content policy enforcement. While labelers typically reach their decisions by following a well-defined “protocol,” humans may still make mistakes. A common countermeasure is to have multiple people review the same content; however, this process is often time-intensive and requires accurate aggregation of potentially noisy decisions.
In this paper, we present CLARA (Confidence of Labels and Raters), a system developed and deployed at Facebook for aggregating reviewer decisions and estimating their uncertainty. We perform extensive validations and describe the deployment of CLARA for measuring the base rate of policy violations, quantifying reviewers’ performance, and improving their efficiency. In our experiments, we found that CLARA (a) provides an unbiased estimator of violation rates that is robust to changes in reviewer quality, with accurate confidence intervals, (b) provides an accurate assessment of reviewers’ performance, and (c) improves efficiency by reducing the number of reviews based on review certainty, and enables operators to select a threshold on the cost/accuracy efficiency frontier.
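To make the idea of aggregating noisy reviewer decisions and stopping once the result is certain enough more concrete, here is a minimal Python sketch. It is not CLARA's actual model, which the abstract does not specify. It assumes binary decisions, independent reviewers, and a known accuracy for each reviewer. The function names, the prior, and the 0.95 threshold are all hypothetical choices made for illustration.

```python
from math import prod


def posterior_violation(votes, accuracies, prior=0.5):
    """Posterior probability that an item is violating, given reviewer votes.

    Assumptions (illustrative only): reviewers are independent, and
    accuracies[i] is the probability that reviewer i's vote matches the
    true label, whichever label is true.

    votes: list of bools, True meaning "reviewer says violating".
    """
    # Likelihood of the observed votes if the item is truly violating.
    like_pos = prod(a if v else 1 - a for v, a in zip(votes, accuracies))
    # Likelihood of the observed votes if the item is truly benign.
    like_neg = prod(1 - a if v else a for v, a in zip(votes, accuracies))
    numerator = prior * like_pos
    return numerator / (numerator + (1 - prior) * like_neg)


def review_until_certain(votes, accuracies, prior=0.5, threshold=0.95):
    """Consume votes in order and stop once either label is certain enough.

    Returns (posterior, number_of_reviews_used). The threshold is a
    hypothetical operating point trading review cost against accuracy.
    """
    p, used = prior, 0
    for used in range(1, len(votes) + 1):
        p = posterior_violation(votes[:used], accuracies[:used], prior)
        if max(p, 1 - p) >= threshold:
            break
    return p, used


if __name__ == "__main__":
    # Three reviewers, each assumed 90% accurate, all voting "violating".
    p, used = review_until_certain([True, True, True], [0.9, 0.9, 0.9])
    print(f"posterior={p:.4f}, reviews used={used}")
```

Under these assumptions, raising the threshold requests more reviews per item and lowering it requests fewer, which is one simple way to picture moving along a cost/accuracy frontier.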