Crowdsourcing with Contextual Uncertainty

Conference on Knowledge Discovery & Data Mining (KDD)


We study a crowdsourcing setting where we need to infer the latent truth about a task given observed labels together with context in the form of a classifier score. We present Theodon, a hierarchical nonparametric Bayesian model, developed and deployed at Meta, that captures both the prevalence of label categories and the accuracy of labelers as functions of the classifier score. Theodon uses Gaussian processes to model the non-uniformity of mistakes over the range of classifier scores. For our experiments, we used data generated from integrity applications at Meta as well as public datasets. We showed that Theodon (1) obtains 1–4% improvement in AUCPR predictions on items’ true labels compared to state-of-the-art baselines for public datasets, (2) is effective as a calibration method, and (3) provides detailed insights on labelers’ performances.

Latest Publications