MapDupReducer: Detecting Near Duplicates over Massive Datasets

Authors: 
Wang, Chaokun; Wang, Jianmin; Lin, Xuemin; Wang, Wei; Wang, Haixun; Li, Hongsong; Tian, Wanpeng; Xu, Jun; Li, Rui
Author: 
Wang, C
Wang, J
Lin, X
Wang, W
Li, H
Tian, W
Xu, J
Li, R

Near duplicate detection benefits many applications, e.g.,
on-line news selection over the Web by keyword search. The
purpose of this demo is to show the design and implementation
of MapDupReducer, a MapReduce-based system capable of
detecting near duplicates over massive datasets efficiently.

Year: 
2010
Venue: 
SIGMOD 2010 (Demo)
URL: 
http://portal.acm.org/citation.cfm?id=1807296
Citations: 
0
Citations range: 
n/a
Attachment: 
SIGMOD10-MRppjoin-Final.pdf (338.49 KB)