Nova: Continuous Pig/Hadoop Workflows

Authors: 
Olston, Christopher; Chiou, Greg; Chitnis, Laukik; Liu, Francis; Han, Yiping; Larsson, Mattias; Neumann, Andreas; Rao, Vellanki B. N.; Sankarasubramanian, Vijayanand; Seth, Siddharth; Tian, Chao; ZiCornell, Topher; Wang, Xiaodan
Author: 
Olston, C
Chiou, G
Chitnis, L
Liu, F
Han, Y
Larsson, M
Neumann, A
Rao, V
Sankarasubramanian, V
Seth, S
Tian, C
ZiCornell, T
Wang, X

This paper describes a workflow manager developed and deployed at Yahoo called Nova, which pushes continually-arriving data through graphs of Pig programs executing on Hadoop clusters. (Pig is a structured dataflow language and runtime for the Hadoop map-reduce system.)
Nova is like data stream managers in its support for stateful incremental processing, but unlike them in that it deals with data in large batches using disk-based processing. Batched incremental processing is a good fit for a large fraction of Yahoo’s data processing use-cases, which deal with continually-arriving data and benefit from incremental algorithms, but do not require ultra-low-latency processing.

Year: 
2011
Venue: 
SIGMOD 2011
URL: 
http://infolab.stanford.edu/~olston/publications/sigmod11.pdf
Citations: 
0
Citations range: 
n/a
Attachment: 
sigmod11.pdf (773.13 KB)