Cloud infrastructures enable the efficient parallel execution of data-intensive
tasks such as entity resolution on large datasets. We investigate the challenges of,
and possible solutions for, using the MapReduce programming model for parallel entity
resolution. In particular, we propose and evaluate two MapReduce-based implementations
for Sorted Neighborhood blocking that either use multiple MapReduce jobs or apply
tailored data replication.
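
For orientation, the following is a minimal sequential sketch of Sorted Neighborhood blocking as it is commonly described: entities are sorted by a blocking key, and a window of fixed size slides over the sorted list so that only entities within the same window are compared. It is not one of the two MapReduce-based implementations proposed here. The function name, the `key` parameter, and the `window` parameter are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sorted_neighborhood_pairs(
    entities: Sequence[T],
    key: Callable[[T], str],
    window: int,
) -> List[Tuple[T, T]]:
    """Return candidate pairs for matching (hypothetical helper).

    Entities are sorted by their blocking key; each entity is then
    paired with the (window - 1) entities that follow it in sort order.
    """
    if window < 2:
        raise ValueError("window must be at least 2")

    ordered = sorted(entities, key=key)
    pairs: List[Tuple[T, T]] = []
    for i, left in enumerate(ordered):
        # Slicing past the end of the list is safe and simply yields
        # fewer successors for the last few entities.
        for right in ordered[i + 1 : i + window]:
            pairs.append((left, right))
    return pairs


if __name__ == "__main__":
    names = ["delta", "alpha", "charlie", "bravo"]
    # Assumption for illustration: the blocking key is the name itself.
    for pair in sorted_neighborhood_pairs(names, key=lambda s: s, window=3):
        print(pair)
```

The difficulty in a parallel setting is visible in this sketch: once the sorted list is split across partitions, entities near a partition boundary still need to be compared with their neighbors in the adjacent partition.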