MRShare: Sharing Across Multiple Queries in MapReduce

Nykiel, T; Potamias, M; Mishra, C; Kollios, G; Koudas, N

Large-scale data analysis lies at the core of modern enterprises
and scientific research. With the emergence of cloud computing,
the use of an analytical query processing infrastructure (e.g.,
Amazon EC2) can be directly mapped to monetary value. MapReduce
has been a popular framework in the context of cloud computing,
designed to serve long-running queries (jobs) that can be
processed in batch mode. Because different jobs often perform
similar work, there are many opportunities for sharing. In
principle, sharing similar work reduces the overall amount of
