Block-based Load Balancing for Entity Resolution with MapReduce

Authors: 
Kolb, L; Thor, A; Rahm, E

The effectiveness and scalability of MapReduce-based implementations
of complex data-intensive tasks depend on an even redistribution of
data between map and reduce tasks. In the presence of skewed data,
sophisticated redistribution approaches thus become necessary to
achieve load balancing among all reduce tasks to be executed in
parallel. For the complex problem of entity resolution with blocking,
we propose BlockSplit, a load balancing approach that supports
blocking techniques to reduce the search space of entity resolution.
The evaluation on a real cloud infrastructure shows the value and
effectiveness of the proposed approach.

Year: 
2011
Venue: 
CIKM 2011
URL: 
http://dbs.uni-leipzig.de/de/publication/block_based_lb_for_er_with_mr
Attachment: 
cikm_poster_paper.pdf (674.79 KB)