Apache Hadoop Goes Realtime at Facebook

Borthakur, Dhruba; Sarma, Joydeep Sen; Gray, Jonathan; Muthukkaruppan, Kannan; Spiegelberg, Nicolas; Kuang, Hairong; Ranganathan, Karthik; Molkov, Dmytro; Menon, Aravind; Rash, Samuel; Schmidt, Rodrigo; Aiyer, Amitanand
Borthakur, D
Sarma, J
Gray, J
Muthukkaruppan, K
Spiegelberg, N
Kuang, H
Ranganathan, K
Molkov, D
Menon, A
Rash, S
Schmidt, R
Aiyer, A

Facebook recently deployed Facebook Messages, its first ever
user-facing application built on the Apache Hadoop platform.
Apache HBase is a database-like layer built on Hadoop designed
to support billions of messages per day. This paper describes the
reasons why Facebook chose Hadoop and HBase over other
systems such as Apache Cassandra and Voldemort and discusses
the application’’s requirements for consistency, availability,
partition tolerance, data model and scalability. We explore the
enhancements made to Hadoop to make it a more effective
realtime system, the tradeoffs we made while configuring the
system, and how this solution has significant advantages over the
sharded MySQL database scheme used in other applications at
Facebook and many other web-scale companies. We discuss the
motivations behind our design choices, the challenges that we
face in day-to-day operations, and future capabilities and
improvements still under development. We offer these
observations on the deployment as a model for other companies
who are contemplating a Hadoop-based solution over traditional
sharded RDBMS deployments.

Sigmod 2011
Citations range: 
RealtimeHadoopSigmod2011.pdf431.96 KB