Nephele/PACTs: A Programming Model and Execution Framework for Web-Scale Analytical Processing

Battré, D; Ewen, S; Hueske, F; Kao, O; Markl, V; Warneke, D

We present a parallel data processor centered around a programming model of so-called Parallelization Contracts (PACTs) and the scalable parallel execution engine Nephele [18]. The PACT programming model is a generalization of the well-known map/reduce programming model, extending it with further second-order functions, as well as with Output Contracts that give guarantees about the behavior of a function. We describe methods to transform a PACT program into a data flow for Nephele, which executes its sequential building blocks in parallel and deals with communication, synchronization and fault tolerance. Our definition of PACTs allows several types of optimizations to be applied to the data flow during the transformation.

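The abstract does not show how a second-order function relates to the user code it wraps. Here is a minimal sketch, assuming records are plain `(key, value)` pairs processed in a single Python process. Each second-order function decides which records form an independent subset, and the user-defined first-order function (`udf`) is called once per subset, so in a real engine the subsets could run in parallel (this sketch runs them sequentially). The names `pact_map`, `pact_reduce` and `pact_match` are hypothetical. The two-input `pact_match` only illustrates the kind of further second-order function the abstract mentions and is not taken from the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

Record = Tuple[Hashable, object]  # hypothetical (key, value) record


def _group_by_key(records: Iterable[Record]) -> Dict[Hashable, List[object]]:
    """Collect the values of all records that share a key."""
    groups: Dict[Hashable, List[object]] = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return groups


def pact_map(records: Iterable[Record],
             udf: Callable[[Record], Iterable[Record]]) -> Iterator[Record]:
    """Each record on its own is an independent subset."""
    for record in records:
        yield from udf(record)


def pact_reduce(records: Iterable[Record],
                udf: Callable[[Hashable, List[object]], Iterable[Record]]) -> Iterator[Record]:
    """All records with the same key form one subset."""
    for key, values in _group_by_key(records).items():
        yield from udf(key, values)


def pact_match(left: Iterable[Record], right: Iterable[Record],
               udf: Callable[[Hashable, object, object], Iterable[Record]]) -> Iterator[Record]:
    """Two inputs: every pair of records with equal keys is one subset."""
    right_groups = _group_by_key(right)
    for key, left_value in left:
        for right_value in right_groups.get(key, []):
            yield from udf(key, left_value, right_value)


# Word count expressed with the map and reduce contracts.
lines = [(0, "a b a"), (1, "b c")]
words = pact_map(lines, lambda rec: [(w, 1) for w in rec[1].split()])
counts = dict(pact_reduce(words, lambda key, ones: [(key, sum(ones))]))
print(counts)  # {'a': 2, 'b': 2, 'c': 1}

# A join of two inputs expressed with the two-input contract.
orders = [("u1", 10), ("u2", 5), ("u1", 7)]
users = [("u1", "Ann"), ("u3", "Cy")]
joined = list(pact_match(orders, users, lambda key, order, user: [(key, (user, order))]))
print(joined)  # [('u1', ('Ann', 10)), ('u1', ('Ann', 7))]
```

With only map and reduce, the join would have to be emulated, for example by tagging and co-reducing both inputs. A dedicated two-input contract lets the user function see exactly the matching pairs.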
The system as a whole is designed to be as generic as (and compatible with) map/reduce systems, while overcoming several of their major weaknesses: 1) The functions map and reduce alone are not sufficient to express many data processing tasks both naturally and efficiently. 2) Map/reduce ties a program to a single fixed execution strategy, which is robust but highly suboptimal for many tasks. 3) Map/reduce makes no assumptions about the behavior of the functions. Hence, it offers only very limited optimization opportunities. With a set of examples and experiments, we illustrate how our system is able to naturally represent and efficiently execute several tasks that do not fit the map/reduce model well.
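To illustrate why guarantees about a function's behavior open up optimization opportunities, here is a toy planner that works under stated assumptions. It assumes a hypothetical `same_key` output annotation, meaning that a function never changes the keys of the records it receives. A reduce-style step needs its input partitioned by key. Without the annotation, the planner must assume that any user function may rewrite keys, so it re-shuffles. With the annotation, it reuses the existing partitioning. `Operator`, `plan` and the annotation string are illustrative and are not the Nephele or PACT API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Operator:
    name: str
    input_contract: str                    # "map" or "reduce"
    output_contract: Optional[str] = None  # hypothetical annotation, e.g. "same_key"


def plan(operators: List[Operator]) -> List[str]:
    """Return physical steps, inserting a shuffle only where a reduce needs one."""
    steps: List[str] = []
    partitioned_by_key = False  # assume the source carries no partitioning guarantee
    for op in operators:
        if op.input_contract == "reduce" and not partitioned_by_key:
            steps.append("shuffle by key")  # bring equal keys to the same worker
            partitioned_by_key = True
        steps.append(op.name)
        # Unless the function promises to keep keys unchanged, it may emit
        # arbitrary keys, so any existing partitioning must be treated as lost.
        partitioned_by_key = partitioned_by_key and op.output_contract == "same_key"
    return steps


annotated = [
    Operator("count", "reduce", "same_key"),
    Operator("normalize", "map", "same_key"),
    Operator("top", "reduce"),
]
unannotated = [
    Operator("count", "reduce", "same_key"),
    Operator("normalize", "map"),
    Operator("top", "reduce"),
]
print(plan(annotated))    # ['shuffle by key', 'count', 'normalize', 'top']
print(plan(unannotated))  # ['shuffle by key', 'count', 'normalize', 'shuffle by key', 'top']
```

Dropping the annotation on `normalize` forces a second shuffle. This is the kind of data movement a system that knows nothing about its functions has to accept.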
