Ad-hoc data processing in the cloud

Authors: 
Logothetis, Dionysios; Yocum, Kenneth

Ad-hoc data processing has proven to be a critical paradigm
for Internet companies processing large volumes of unstructured
data. However, the emergence of cloud-based computing,
where storage and CPU are outsourced to multiple
third-parties across the globe, implies large collections
of highly distributed and continuously evolving data. Our
demonstration combines the power and simplicity of the
MapReduce abstraction with a wide-scale distributed stream
processor, Mortar. While our incremental MapReduce operators
avoid data re-processing, the stream processor manages
the placement and physical data flow of the operators
across the wide area. We demonstrate a distributed web
indexing engine against which users can submit and deploy
continuous MapReduce jobs. A visualization component illustrates
both the incremental indexing and index searches
in real time.

Year: 
2008
Venue: 
VLDB 2008
URL: 
http://portal.acm.org/citation.cfm?id=1454159.1454204
Citations: 
0
Citations range: 
n/a
Attachment: 
Logothetis2008Adhocdataprocessinginthecloud.pdf (571.88 KB)