Publications - Meta Research

December 23, 2011

Eitan Frachtenberg, Ali Heydari, Harry Li, Amir Michael, Jacob Na, Avery Nisbet, Pierluigi Sarti

Large-scale datacenters consume megawatts in power and cost hundreds of millions of dollars to equip. Reducing the energy and cost footprint of servers can therefore have substantial impact.

Areas

Systems & Infrastructure

Paper

December 1, 2011

Raman R. Khanna, Leah S. Karliner, Matthias Eck, Eric Vittinghoff, Christopher J. Koenig, Margaret C. Fang

Paper

Performance of an online translation tool when applied to patient educational material

We evaluate the accuracy of state-of-the-art online machine translation systems for translating patient educational material.

Areas

Natural Language Processing & Speech

Paper

July 24, 2011

Chi Wang, Rajat Raina, David Fong, Ding Zhou, Jiawei Han, Greg Badros

Paper

Learning Relevance from a Heterogeneous Social Network and Its Application in Online Targeting

The rise of social networking services in recent years presents new research challenges for matching users with interesting content. While the content-rich nature of these social networks offers many...

Areas

Machine Learning

Paper

July 17, 2011

Adam D. I. Kramer, Cindy K. Chung

Paper

Dimensions of Self-Expression in Facebook Status Updates

We describe the dimensions along which Facebook users tend to express themselves via status updates using the semi-automated text analysis approach, the Meaning Extraction Method (MEM).

Areas

Data Science, Human Computer Interaction & UX

Paper

July 17, 2011

Lars Backstrom, Eytan Bakshy, Jon Kleinberg, Thomas Lento, Itamar Rosenn

Paper

Center of Attention: How Facebook Users Allocate Attention across Friends

An individual’s personal network — their set of social contacts — is a basic object of study in sociology. Studies of personal networks have focused on their size (the number of contacts) and their composition (in terms of categories such as kin and co-workers). Here we propose a new measure for the analysis of personal networks, based on the way in which an individual divides his or her attention across contacts. This allows us to contrast people who focus a large fraction of their interactions on a small set of close friends with people who disperse their attention more widely.

Areas

Data Science

Paper

July 5, 2011

Jonathan Chang, Eric Sun

Paper

Location3: How Users Share and Respond to Location-Based Data on Social Networking Sites

In August 2010 Facebook launched Places, a location-based service that allows users to check into points of interest and share their physical whereabouts with friends. The friends who see these events in their News Feed can then respond to these check-ins by liking or commenting on them.

Areas

Data Science

Paper

July 1, 2011

Mateusz Berezecki, Eitan Frachtenberg, Michael Paleczny, Ken Steele

Paper

Many-core key-value store

Scaling data centers to handle task-parallel workloads requires balancing the cost of hardware, operations, and power. Low-power, low-core-count servers reduce costs in one of these dimensions, but may require additional nodes to provide the required quality of service or increase costs by underutilizing memory and other resources.

Areas

Systems & Infrastructure

Paper

June 20, 2011

Rubao Lee, Tian Luo, Yin Huai, Fusheng Wang, Yongqiang He, Xiaodong Zhang

Paper

YSmart: Yet Another SQL-to-MapReduce Translator

MapReduce has become an effective approach to big data analytics in large cluster systems, where SQL-like queries play an important role as the interface between users and systems. However, based on our Facebook daily operation results, certain types of queries are executed at an unacceptably low speed by Hive (a production SQL-to-MapReduce translator). In this paper, we demonstrate that existing SQL-to-MapReduce translators that operate in a one-operation-to-one-job mode and do not consider query correlations cannot generate high-performance MapReduce programs for certain queries, due to the mismatch between complex SQL structures and the simple MapReduce framework. We propose and develop a system called YSmart, a correlation-aware SQL-to-MapReduce translator. YSmart applies a set of rules to use the minimal number of MapReduce jobs to execute multiple correlated operations in a complex query. YSmart can significantly reduce redundant computations, I/O operations, and network transfers compared to existing translators. We have implemented YSmart and evaluated it intensively on complex queries on two Amazon EC2 clusters and one Facebook production cluster. The results show that YSmart can outperform Hive and Pig, two widely used SQL-to-MapReduce translators, by more than four times for query execution.

Areas

Data Science, Machine Learning, Systems & Infrastructure

Paper

June 12, 2011

Dhruba Borthakur, Joydeep Sen Sarma, Jonathan Gray, Kannan Muthukkaruppan, Nicolas Spiegelberg, Hairong Kuang, Karthik Ranganathan, Dmytro Molkov, Aravind Menon, Samuel Rash, Rodrigo Schmidt, Amitanand Aiyer

Paper

Apache Hadoop goes realtime at Facebook

Facebook recently deployed Facebook Messages, its first ever user-facing application built on the Apache Hadoop platform. Apache HBase is a database-like layer built on Hadoop designed to support billions of messages per day.

Areas

Databases, Systems & Infrastructure

Paper

April 10, 2011

Tao Stein, Roger Chen, Karan Mangla

Paper

Facebook Immune System

Popular Internet sites are under attack all the time from phishers, fraudsters, and spammers. They aim to steal user information and expose users to unwanted spam. The attackers have vast resources at their disposal. They are well-funded, with full-time skilled labor, control over compromised and infected accounts, and access to global botnets.

Areas

Security & Privacy, Systems & Infrastructure

Paper
