Automatic Optimization for MapReduce Programs

Jahani, Eaman; Cafarella, Michael J.; Ré, Christopher

The MapReduce distributed programming framework has become popular,
despite evidence that current implementations are inefficient, requiring
far more hardware than traditional relational databases to complete
similar tasks. MapReduce jobs are amenable to many traditional database
query optimizations (B+Trees for selections, column-store-style
techniques for projections, etc.), but existing systems do not apply
them, substantially because free-form user code obscures the true data
operation being performed. For example, a selection in SQL is easily
detected, but a selection
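
As an illustration of how user code can hide a data operation, the following is a minimal sketch in plain Python. It is not the authors' system and does not use any real MapReduce API; the record layout `(url, rank)`, the threshold `10`, and the names `map_fn` and `run_map` are all hypothetical. The filter that SQL would state declaratively in a `WHERE` clause appears here only as an ordinary `if` statement inside the map function.

```python
from typing import Iterator, List, Tuple

# Hypothetical record layout, assumed for illustration only: (url, rank)
Record = Tuple[str, int]


def map_fn(key: int, value: Record) -> Iterator[Record]:
    """User-written map function. The selection is an ordinary `if`."""
    url, rank = value
    # Roughly what SQL would express as:
    #   SELECT url, rank FROM pages WHERE rank > 10
    if rank > 10:
        yield url, rank


def run_map(records: List[Record]) -> List[Record]:
    """Toy driver that calls map_fn on every record.

    It treats map_fn as opaque, so it must scan all input; it cannot
    tell that an index on `rank` would let it skip most records.
    """
    out: List[Record] = []
    for key, value in enumerate(records):
        out.extend(map_fn(key, value))
    return out


pages = [("a.com", 3), ("b.com", 42), ("c.com", 11)]
selected = run_map(pages)
```

In this sketch the driver has no way to know that `map_fn` keeps only records with `rank > 10` unless it inspects the function body, which is the kind of obstacle to index-based optimization that the abstract describes.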
