Search: mapreduce, Rahm, E, MapReduce

6 results

Multi-pass sorted neighborhood blocking with MapReduce

... challenges and possible solutions of using the MapReduce programming model for parallel entity resolution using Sorting ... blocking (SN). We propose and evaluate two efficient MapReduce-based implementations for single- and multi-pass SN that either ...

Publication - kolb - 11/09/2023 - 21:05 - 1 attachment

Parallel Sorted Neighborhood Blocking with MapReduce

... challenges and possible solutions of using the MapReduce programming model for parallel entity resolution. In particular, we propose and evaluate two MapReduce-based implementations for Sorted Neighborhood blocking that either ...

Publication - kolb - 11/16/2023 - 15:49 - 1 attachment

Learning-based Entity Resolution with MapReduce

... can be realized in a cloud infrastructure using MapReduce. We propose and evaluate two efficient MapReduce-based strategies for pair-wise similarity computation and ...

Publication - kolb - 11/09/2023 - 21:05 - 1 attachment

Block-based Load Balancing for Entity Resolution with MapReduce

... The effectiveness and scalability of MapReduce-based implementations of complex data-intensive tasks depend on ... (Data Integration, Entity Resolution, load-balancing, MapReduce, Object Matching, Parallel Data Processing) ...

Publication - kolb - 11/10/2023 - 02:16 - 1 attachment

Load Balancing for MapReduce-based Entity Resolution

... The effectiveness and scalability of MapReduce-based implementations of complex data-intensive tasks depend on an ... search space of entity resolution, utilize a preprocessing MapReduce job to analyze the data distribution, and distribute the entities of ...

Publication - admin - 11/16/2023 - 15:38 - 1 attachment

Dedoop: efficient deduplication with Hadoop

... tool called Dedoop (Deduplication with Hadoop) for MapReduce-based entity resolution (ER) of large datasets. Dedoop supports a ... Specified workflows are automatically translated into MapReduce jobs for parallel execution on different Hadoop clusters. To achieve ...

Publication - cat - 11/09/2023 - 23:05 - 0 attachments