Workflow Engine

Nova: Continuous Pig/Hadoop Workflows

Olston, Christopher; Chiou, Greg; Chitnis, Laukik; Liu, Francis; Han, Yiping; Larsson, Mattias; Neumann, Andreas; Rao, Vellanki B. N.; Sankarasubramanian, Vijayanand; Seth, Siddharth; Tian, Chao; ZiCornell, Topher; Wang, Xiaodan

This paper describes Nova, a workflow manager developed and
deployed at Yahoo, which pushes continually arriving data
through graphs of Pig programs executing on Hadoop clusters.
(Pig is a structured dataflow language and runtime for the
Hadoop map-reduce system.)
Nova is like data stream managers in its support for
stateful incremental processing, but unlike them in that it
deals with data in large batches using disk-based processing.
Batched incremental processing is a good fit for a large
fraction of Yahoo’s data processing use-cases, which deal with

