Parallel Sorted Neighborhood Blocking with MapReduce

Authors: 
Kolb, L; Thor, A; Rahm, E

Cloud infrastructures enable the efficient parallel execution of data-intensive
tasks such as entity resolution on large datasets. We investigate challenges and
possible solutions of using the MapReduce programming model for parallel entity
resolution. In particular, we propose and evaluate two MapReduce-based
implementations for Sorted Neighborhood blocking that either use multiple
MapReduce jobs or apply a tailored data replication.
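
For orientation, the basic sequential Sorted Neighborhood method that the abstract refers to can be sketched as follows. This is a minimal illustration and not either of the paper's two MapReduce implementations: the function name `sorted_neighborhood_pairs`, the example records, and the two-character blocking key are all assumptions made for this sketch. Records are sorted by a blocking key, and a window of fixed size slides over the sorted list so that only records within the same window become candidate pairs.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sorted_neighborhood_pairs(
    records: Sequence[T],
    key: Callable[[T], str],
    window: int,
) -> List[Tuple[T, T]]:
    """Return candidate pairs using a sliding window over key-sorted records.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if window < 2:
        raise ValueError("window must be at least 2")

    # Step 1: sort all records by their blocking key.
    ordered = sorted(records, key=key)

    # Step 2: pair each record with the next (window - 1) records.
    pairs: List[Tuple[T, T]] = []
    for i, record in enumerate(ordered):
        upper = min(i + window, len(ordered))
        for j in range(i + 1, upper):
            pairs.append((record, ordered[j]))
    return pairs


# Example with an assumed blocking key: the first two letters of a name.
names = ["smith", "smyth", "adams", "zeller"]
candidates = sorted_neighborhood_pairs(names, key=lambda s: s[:2], window=2)
```

The sort requires a global order over all records, which is what makes the method non-trivial to partition across parallel workers: records that are neighbors in the sorted order may land in different partitions.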

Year: 
2011
Venue: 
BTW 2011
URL: 
http://dbs.uni-leipzig.de/de/publication/title/parallel_sorted_neighborhood_blocking_with_mapreduce
Citations: 
0
Citations range: 
n/a
Attachment: 
Kolb2011ParallelSortedNeighborhoodBlockingwithMapReduce.pdf (533.25 KB)