Map-reduce-merge: simplified relational data processing on large clusters

Yang, Hung-chih; Dasdan, Ali; Hsiao, Ruey-Lung; Parker, D. Stott

Map-Reduce is a programming model that enables easy development of
scalable parallel applications to process vast amounts of data on
large clusters of commodity machines. Through a simple interface with
two functions, map and reduce, this model facilitates parallel
implementation of many real-world tasks such as data processing for
search engines and machine learning.
However, this model does not directly support processing
multiple related heterogeneous datasets. While processing
relational data is a common need, this limitation causes dif-
