Processing theta-joins using MapReduce

Okcan, Alper; Riedewald, Mirek

Joins are essential for many data analysis tasks, but they are not supported directly by the MapReduce paradigm. While there has been progress on equi-joins, the implementation of join algorithms in MapReduce in general is not sufficiently understood. We study the problem of how to map arbitrary join conditions to Map and Reduce functions, i.e., to a parallel infrastructure that controls data flow based on key equality only. Our proposed join model simplifies the creation of, and reasoning about, joins in MapReduce. Using this model, we derive a surprisingly simple randomized algorithm, called 1-
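The core difficulty named above, evaluating an arbitrary join predicate on infrastructure that groups records by key equality only, can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not the authors' algorithm: each tuple of S picks a random row of a rows x cols grid of reducer keys and is replicated to every column, while each tuple of T picks a random column and is replicated to every row. Every (s, t) pair therefore meets at exactly one reducer key, where the predicate is checked. The function name `theta_join` and its `rows`, `cols`, and `seed` parameters are hypothetical.

```python
import random
from collections import defaultdict


def theta_join(S, T, theta, rows=2, cols=3, seed=0):
    """Hypothetical sketch: evaluate an arbitrary predicate theta(s, t)
    using only key-equality grouping, by replicating tuples over a
    rows x cols grid of reducer keys (simulated in memory, not Hadoop)."""
    rng = random.Random(seed)
    # reducer key (row, col) -> (S tuples received, T tuples received)
    groups = defaultdict(lambda: ([], []))

    # "Map" phase: emit (key, tuple) pairs; grouping uses key equality only.
    for s in S:
        r = rng.randrange(rows)        # random row for this S tuple
        for c in range(cols):          # replicate to every column
            groups[(r, c)][0].append(s)
    for t in T:
        c = rng.randrange(cols)        # random column for this T tuple
        for r in range(rows):          # replicate to every row
            groups[(r, c)][1].append(t)

    # "Reduce" phase: each key checks theta on its local S x T pairs.
    # A pair (s, t) meets only at (row of s, column of t), so no
    # result is produced twice and none is missed.
    out = []
    for key in sorted(groups):
        s_part, t_part = groups[key]
        out.extend((s, t) for s in s_part for t in t_part if theta(s, t))
    return out


print(sorted(theta_join([1, 5, 9], [2, 6], lambda s, t: s < t)))
# [(1, 2), (1, 6), (5, 6)]
```

The grid shape trades replication (each S tuple is copied `cols` times, each T tuple `rows` times) against the number of pairs each reducer must test; the sketch simply takes the shape as a fixed parameter.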
