Research from Meta

All Publications

October 24, 2012
Haiping Zhao, Minghui Yang, Xin Qi, Mark Williams, Charlie Gao, Guilherme Ottoni, Drew Paroski, Scott MacVicar, Jason Evans, Stephen Tu

Scripting languages are widely used to quickly accomplish a variety of tasks because of the high productivity they enable. Among other reasons, this increased productivity results from a combination o...

August 16, 2012
Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, Michael Paleczny

Key-value stores are a vital component in many scale-out enterprises, including social networks, online retail, and risk analysis. Accordingly, they are receiving increased attention from the research community in an effort to improve their performance, scalability, reliability, cost, and power consumption. To be effective, such efforts require a detailed understanding of realistic key-value workloads. And yet little is known about these workloads outside of the companies that operate them. This paper aims to address this gap.

August 13, 2012
Dhruba Borthakur, Randy Katz, Prashanth Mohan, Tathagata Das, David Zats

Web applications have now become so sophisticated that rendering a typical page may require hundreds of intra-datacenter flows. At the same time, web sites must meet strict page creation deadlines of 200-300ms to satisfy user demands for interactivity. Long-tailed flow completion times make it challenging for web sites to meet these constraints. They are forced to choose between rendering a subset of the complex page, or delay its rendering, thus missing deadlines and sacrificing either quality or responsiveness. Either option leads to potential financial loss.

August 12, 2012
Kedar Bellare, Suresh Iyengar Parthasarathy, Aditya Parameswaran, Vibhor Rastogi

In entity matching, a fundamental issue while training a classifier to label pairs of entities as either duplicates or non-duplicates is the one of selecting informative examples. Although active learning presents an attractive solution to this problem, previous approaches minimize the misclassification rate (0-1 loss) of the classifier, which is an unsuitable metric for entity matching due to class imbalance (i.e., many more non-duplicate pairs than duplicate pairs).

June 22, 2012
Lars Backstrom, Paolo Boldi, Marco Rosa, Johan Ugander, Sebastiano Vigna

Frigyes Karinthy, in his 1929 short story “Lancszemek” (in English, “Chains”) suggested that any two persons are distanced by at most six friendship links. Stanley Milgram in his famous experiments challenged people to route postcards to a fixed recipient by passing them only through direct acquaintances. Milgram found that the average number of intermediaries on the path of the postcards lay between 4.4 and 5.7, depending on the sample of people chosen.

June 1, 2012
Mateusz Berezecki, Eitan Frachtenberg, Michael Paleczny, Ken Steele

Power consumption of data centers had become an important factor in the economy and sustainability of large-scale Web services. Researchers and practitioners are spending considerable effort to characterize Web-scale workloads and evaluating their applicability to alternative, more power-efficient architectures. One such workload in particular is the caching layer, which stores expensive-to-regenerate data in fast storage to reduce service times.

May 16, 2012
Adam D. I. Kramer

In this paper we study large-scale emotional contagion through an examination of Facebook status updates. After a user makes a status update with emotional content, their friends are significantly more likely to make a valence-consistent post.