Search: mapreduce, Data Integration, MapReduce

9 results

Results

Multi-pass sorted neighborhood blocking with MapReduce

... challenges and possible solutions of using the MapReduce programming model for parallel entity resolution using Sorting ... blocking (SN). We propose and evaluate two efficient MapReduce-based implementations for single- and multi-pass SN that either ...

Publication - kolb - 11/09/2023 - 21:05 - 1 attachment
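The Sorted Neighborhood (SN) idea behind this result and the next can be shown with a minimal, sequential Python sketch. Records are sorted by a blocking key, and only records that fall inside a sliding window of size w are paired. Multi-pass SN repeats this with several keys and takes the union of the candidate pairs. The function name sorted_neighborhood, the sample records, and the keys are hypothetical. This sketch does not show how the papers distribute the work across MapReduce tasks.

```python
from operator import itemgetter


def sorted_neighborhood(records, key_funcs, window):
    """Multi-pass Sorted Neighborhood blocking (sequential sketch).

    records:   list of records
    key_funcs: one blocking-key function per pass
    window:    window size w; each record is paired with the w - 1
               records that follow it in the sorted order
    Returns a set of candidate index pairs (i, j) with i < j.
    """
    pairs = set()
    for key in key_funcs:
        # Sort record indices by this pass's blocking key (stable sort).
        order = sorted(range(len(records)), key=lambda i: key(records[i]))
        # Slide the window over the sorted order.
        for pos, i in enumerate(order):
            for j in order[pos + 1 : pos + window]:
                pairs.add((min(i, j), max(i, j)))
    # The union over all passes is the multi-pass result.
    return pairs


# Hypothetical example: (last name, first name) records, window size 2.
records = [("smith", "john"), ("smyth", "jon"), ("doe", "jane"), ("smith", "jane")]
by_last, by_first = itemgetter(0), itemgetter(1)
print(sorted(sorted_neighborhood(records, [by_last, by_first], 2)))
# → [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]
```

A single pass with window 2 misses pairs whose keys are not adjacent. The second pass on first names recovers (0, 1) and (2, 3), which is the motivation for multi-pass SN.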

Parallel Sorted Neighborhood Blocking with MapReduce

... challenges and possible solutions of using the MapReduce programming model for parallel entity resolution. In particular, we propose and evaluate two MapReduce-based implementations for Sorted Neighborhood blocking that either ...

Publication - kolb - 11/16/2023 - 15:49 - 1 attachment

Fuzzy Joins Using MapReduce

... a similarity threshold. The computation model is a single MapReduce job. Because we allow only one MapReduce round, the Reduce function must be designed so a given output pair is ...

Publication - admin - 11/09/2023 - 23:16 - 1 attachment
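A single MapReduce round has to emit each output pair only once. That requirement can be illustrated with a small in-memory simulation. The sketch assumes fixed-length strings compared by Hamming distance d, and it uses a splitting scheme chosen for illustration. Each string is cut into d + 1 segments, so by the pigeonhole principle two strings within distance d must agree on at least one segment. Map emits a string under each (segment index, segment value) key. Reduce compares the strings in one group and emits a pair only from the group of the first segment on which the two strings agree. All function names are hypothetical, and this is not necessarily the algorithm the paper proposes.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of positions where equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def segments(s, d):
    """Split s into d + 1 contiguous segments whose lengths differ by at most 1."""
    n, k = len(s), d + 1
    bounds = [n * i // k for i in range(k + 1)]
    return [s[bounds[i] : bounds[i + 1]] for i in range(k)]


def map_phase(strings, d):
    # Map: emit each string under the key (segment index, segment value).
    groups = defaultdict(list)
    for s in strings:
        for idx, seg in enumerate(segments(s, d)):
            groups[(idx, seg)].append(s)
    return groups


def reduce_phase(groups, d):
    out = []
    for (idx, _), group in groups.items():
        for p in range(len(group)):
            for q in range(p + 1, len(group)):
                a, b = group[p], group[q]
                if hamming(a, b) > d:
                    continue
                # Two strings may share several segments and therefore meet in
                # several groups; emit only from the first agreeing segment.
                first = next(
                    i
                    for i, (x, y) in enumerate(zip(segments(a, d), segments(b, d)))
                    if x == y
                )
                if first == idx:
                    out.append(tuple(sorted((a, b))))
    return out


def fuzzy_join(strings, d):
    """All pairs of strings within Hamming distance d, each emitted once."""
    return reduce_phase(map_phase(strings, d), d)


print(sorted(fuzzy_join(["0000", "0001", "0011", "1111"], 1)))
# → [('0000', '0001'), ('0001', '0011')]
```

The "first agreeing segment" rule is one way to satisfy the single-round requirement: every reducer can decide on its own whether it owns a pair, without any coordination.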

Learning-based Entity Resolution with MapReduce

... can be realized in a cloud infrastructure using MapReduce. We propose and evaluate two efficient MapReduce-based strategies for pair-wise similarity computation and ...

Publication - kolb - 11/09/2023 - 21:05 - 1 attachment

Efficient Parallel Set-Similarity Joins Using MapReduce

... set-similarity joins in parallel using the popular MapReduce framework. We propose a 3-stage approach for end-to-end set- ... (Data Integration, Entity Resolution, Hadoop, ics.uci.edu, MapReduce) ...

Publication - kolb - 11/09/2023 - 23:27 - 1 attachment
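Set-similarity joins are commonly sped up with prefix filtering. The minimal in-memory sketch below is not the paper's 3-stage Hadoop pipeline. It assumes Jaccard similarity with threshold t and a global token order that puts the rarest tokens first. Under that order, a record with |x| tokens needs only its first |x| - ceil(t * |x|) + 1 tokens indexed, because two records with Jaccard similarity of at least t must share a token within those prefixes. The two loops stand in for a map step (index prefix tokens) and a reduce step (verify candidates). The name prefix_filter_join is hypothetical.

```python
import math
from collections import Counter, defaultdict


def prefix_filter_join(records, t):
    """records: dict id -> set of tokens; t: Jaccard threshold in (0, 1].

    Returns the set of id pairs (a, b), a < b, with Jaccard(a, b) >= t.
    """
    eps = 1e-9  # guards against float error, e.g. 0.7 * 10 == 7.000000000000001
    # Global token order: rarest tokens first, ties broken by token value.
    freq = Counter(tok for toks in records.values() for tok in toks)

    def ordered(toks):
        return sorted(toks, key=lambda tok: (freq[tok], tok))

    # "Map": emit each record id under every token in its prefix.
    buckets = defaultdict(list)
    for rid, toks in records.items():
        toks = ordered(toks)
        prefix_len = len(toks) - math.ceil(t * len(toks) - eps) + 1
        for tok in toks[:prefix_len]:
            buckets[tok].append(rid)

    # "Reduce": verify each candidate pair that shares a prefix token.
    result = set()
    for rids in buckets.values():
        for i in range(len(rids)):
            for j in range(i + 1, len(rids)):
                a, b = records[rids[i]], records[rids[j]]
                if len(a & b) >= t * len(a | b) - eps:
                    # A pair can share several prefix tokens; the set removes repeats.
                    result.add(tuple(sorted((rids[i], rids[j]))))
    return result


records = {1: {"a", "b", "c"}, 2: {"a", "b", "c", "d"}, 3: {"x", "y"}}
print(prefix_filter_join(records, 0.7))  # → {(1, 2)}
```

Here records 1 and 2 have Jaccard similarity 3/4, which passes t = 0.7 but not t = 0.8. Record 3 never shares a prefix token with the others, so it is never compared.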

Block-based Load Balancing for Entity Resolution with MapReduce

... The effectiveness and scalability of MapReduce-based implementations of complex data-intensive tasks depend on ... (Data Integration, Entity Resolution, load-balancing, MapReduce, Object Matching, Parallel Data Processing) ...

Publication - kolb - 11/10/2023 - 02:16 - 1 attachment

Load Balancing for MapReduce-based Entity Resolution

... The effectiveness and scalability of MapReduce-based implementations of complex data-intensive tasks depend on an ... search space of entity resolution, utilize a preprocessing MapReduce job to analyze the data distribution, and distribute the entities of ...

Publication - admin - 11/16/2023 - 15:38 - 1 attachment
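These load-balancing results describe a preprocessing MapReduce job that analyzes the data distribution before matching. The simplified Python sketch below rests on three assumptions. First, the analysis step only counts entities per blocking key. Second, a block of n entities costs n(n - 1)/2 comparisons. Third, whole blocks are assigned to reduce tasks greedily: the largest block goes first, always to the currently least-loaded task. This greedy scheme is a stand-in chosen for illustration and is not the strategy proposed in the papers. In particular, it never splits a block, so one oversized block still lands on a single reduce task. The names block_sizes and assign_blocks are hypothetical.

```python
import heapq
from collections import Counter


def block_sizes(entities, blocking_key):
    """Preprocessing pass: count the entities in each block."""
    return Counter(blocking_key(e) for e in entities)


def assign_blocks(sizes, num_reducers):
    """Greedily assign whole blocks to reduce tasks by comparison count.

    Returns (assignment: block -> reducer id, loads: comparisons per reducer).
    """
    # Each block of n entities needs n * (n - 1) / 2 pairwise comparisons.
    work = {b: n * (n - 1) // 2 for b, n in sizes.items()}
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    # Largest blocks first; each goes to the currently least-loaded reducer.
    for block in sorted(work, key=lambda b: (-work[b], str(b))):
        load, r = heapq.heappop(heap)
        assignment[block] = r
        heapq.heappush(heap, (load + work[block], r))
    loads = [0] * num_reducers
    for block, r in assignment.items():
        loads[r] += work[block]
    return assignment, loads


names = ["anna", "anne", "andy", "bob", "bert", "carl", "cora", "cody", "dave"]
sizes = block_sizes(names, lambda name: name[0])  # hypothetical key: first letter
assignment, loads = assign_blocks(sizes, 2)
print(loads)  # → [4, 3]
```

Counting block sizes first lets the balancing decision use comparison counts rather than entity counts. The two measures diverge sharply when block sizes are skewed.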

MapDupReducer: Detecting Near Duplicates over Massive Datasets

... show the design and implementation of MapDupReducer, a MapReduce-based system capable of detecting near duplicates over massive ... (Data Integration, Hadoop, MapReduce, PPJoin+) ...

Publication - admin - 11/10/2023 - 00:16 - 1 attachment

Dedoop: efficient deduplication with Hadoop

... tool called Dedoop (Deduplication with Hadoop) for MapReduce-based entity resolution (ER) of large datasets. Dedoop supports a ... Specified workflows are automatically translated into MapReduce jobs for parallel execution on different Hadoop clusters. To achieve ...

Publication - cat - 11/09/2023 - 23:05 - 0 attachments