Optimizing Joins in a Map-Reduce Environment

Afrati, Foto N.; Ullman, Jeffrey D.

Implementations of map-reduce are being used to perform
many operations on very large data sets. We examine strategies
for joining several relations in the map-reduce environment.
Our new approach begins by identifying the “map-key,” the
set of attributes that identify the Reduce process to which a
Map process must send a particular tuple. Each attribute of
the map-key gets a “share,” which is the number of buckets
into which its values are hashed, to form a component of the
identifier of a Reduce process. Relations have their tuples
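
To make the map-key and share idea concrete, the following is a minimal Python sketch of how a tuple's Reduce-process identifier could be formed. It illustrates only the mechanism described above and is not the authors' implementation. The names bucket, reducer_id, shares, and row, the attribute names, the share values, and the choice of CRC32 as the hash function are all assumptions made for the example. The sketch handles only a tuple that carries every map-key attribute.

```python
import math
import zlib


def bucket(value, share):
    """Hash a value into one of `share` buckets, numbered 0 .. share-1.

    CRC32 is an assumed stand-in for the hash function; it is used here
    because it gives the same result on every run.
    """
    return zlib.crc32(repr(value).encode("utf-8")) % share


def reducer_id(row, shares):
    """Form the identifier: one bucket number per map-key attribute."""
    return tuple(bucket(row[attr], share) for attr, share in shares.items())


# Hypothetical map-key {b, c} with shares 4 and 3.
shares = {"b": 4, "c": 3}

# Hypothetical tuple that has a value for every map-key attribute.
row = {"a": 17, "b": "x42", "c": 7}

rid = reducer_id(row, shares)

# Each identifier is a vector of bucket numbers, so the number of distinct
# identifiers is the product of the shares.
num_ids = math.prod(shares.values())
print(rid, num_ids)
```

On this reading, choosing the shares fixes how many distinct identifiers exist, since each identifier is one combination of bucket numbers.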
