incremental MapReduce

Ad-hoc data processing in the cloud

Authors: 
Logothetis, Dionysios; Yocum, Kenneth

Ad-hoc data processing has proven to be a critical paradigm for Internet companies processing large volumes of unstructured data. However, the emergence of cloud-based computing, where storage and CPU are outsourced to multiple third parties across the globe, implies large collections of highly distributed and continuously evolving data. Our demonstration combines the power and simplicity of the MapReduce abstraction with a wide-scale distributed stream processor, Mortar. While our incremental MapReduce operators avoid data re-processing, the stream processor man-

Year: 
2008
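The abstract does not say how the incremental operators avoid re-processing. The sketch below shows one common way to get that property. It assumes the reduce function is associative, so partial results can be merged. Under that assumption, the operator keeps per-key reduce state, maps only the newly arrived records, and folds their partial result into that state. The class `IncrementalMapReduce`, its `add_batch` method and the `word_map` helper are hypothetical and illustrative. They are not Mortar's interface.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, Tuple

# Hypothetical signatures: map emits (key, value) pairs for one record,
# reduce merges two values that share a key.
MapFn = Callable[[Any], Iterable[Tuple[Hashable, Any]]]
ReduceFn = Callable[[Any, Any], Any]


class IncrementalMapReduce:
    """Hypothetical incremental MapReduce operator (illustrative sketch only)."""

    def __init__(self, map_fn: MapFn, reduce_fn: ReduceFn) -> None:
        self.map_fn = map_fn
        # Assumed associative, so partial results from separate batches
        # can be merged without revisiting the records behind them.
        self.reduce_fn = reduce_fn
        self.state: Dict[Hashable, Any] = {}

    def _merge(self, table: Dict[Hashable, Any], key: Hashable, value: Any) -> None:
        # Fold value into table[key], or initialize the key on first sight.
        table[key] = self.reduce_fn(table[key], value) if key in table else value

    def add_batch(self, records: Iterable[Any]) -> Dict[Hashable, Any]:
        # 1. Map and pre-reduce only the newly arrived records.
        delta: Dict[Hashable, Any] = {}
        for record in records:
            for key, value in self.map_fn(record):
                self._merge(delta, key, value)
        # 2. Merge the delta into the stored state; earlier records are never re-read.
        for key, value in delta.items():
            self._merge(self.state, key, value)
        # Return a copy so callers cannot mutate the operator's state.
        return dict(self.state)


def word_map(line: str):
    # Hypothetical word-count map: emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]


imr = IncrementalMapReduce(word_map, lambda a, b: a + b)
first = imr.add_batch(["a b a", "c"])  # only these two lines are processed
result = imr.add_batch(["b c c"])      # only the new line is processed
print(result)  # {'a': 2, 'b': 2, 'c': 3}
```

Merging happens in arrival order and the reduce is assumed associative. Under those conditions the incremental result equals a full recomputation over every record seen so far. Some aggregates cannot be written as a merge of partial results, such as an exact median. Those would not fit this scheme as written.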