Comet: Batched Stream Processing in Data Intensive Distributed Computing

He, B; Yang, M; Guo, Z; Chen, R; Su, B; Lin, W; Zhou, L

Performance and resource optimization is an important research problem in data intensive distributed computing. We present a new batched stream processing model that captures query correlations to expose I/O and computation redundancies for optimizations. The model is inspired by our empirical study on a trace from a production large-scale data processing cluster, which reveals significant redundancies caused by strong temporal and spatial correlations among queries.
We have developed Comet, a query processing system that embraces the batched stream processing
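The abstract names I/O redundancy among correlated queries as one target for optimization. The following is a minimal sketch of that single idea, assuming a hypothetical query shape (a filter over one input dataset followed by a sum). Queries in the same batch that read the same dataset share one scan instead of scanning it separately. It is not Comet's implementation, and the names `Query` and `run_batch` are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, int]


@dataclass
class Query:
    # Hypothetical query shape: filter records of one dataset, then sum a field.
    name: str
    dataset: str
    predicate: Callable[[Record], bool]
    field: str


def run_batch(queries: List[Query],
              storage: Dict[str, List[Record]]) -> Tuple[Dict[str, int], int]:
    """Run a batch of queries, scanning each referenced dataset only once.

    Returns the per-query results and the number of dataset scans performed.
    """
    # Group queries by the input they read; correlated queries land together.
    by_dataset: Dict[str, List[Query]] = defaultdict(list)
    for q in queries:
        by_dataset[q.dataset].append(q)

    results = {q.name: 0 for q in queries}
    scans = 0
    for dataset, group in by_dataset.items():
        scans += 1  # one shared scan feeds every query in the group
        for record in storage[dataset]:
            for q in group:
                if q.predicate(record):
                    results[q.name] += record[q.field]
    return results, scans


# Usage: three queries, two of which read the same dataset.
storage = {
    "clicks": [{"user": 1, "bytes": 10}, {"user": 2, "bytes": 20},
               {"user": 1, "bytes": 5}],
    "logs": [{"user": 3, "bytes": 7}],
}
queries = [
    Query("u1", "clicks", lambda r: r["user"] == 1, "bytes"),
    Query("all", "clicks", lambda r: True, "bytes"),
    Query("logs", "logs", lambda r: True, "bytes"),
]
results, scans = run_batch(queries, storage)
print(results, scans)  # {'u1': 15, 'all': 35, 'logs': 7} 2
```

Run one at a time, the three queries would need three scans. The batch needs two, because the two `clicks` queries share one pass over the data.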


Dataflow Processing and Optimization on Grid and Cloud Infrastructures

Tsangaris, M.; Kakaletris, G.; Kllapi, H.; Papanikos, G.; Pentaris, F.; Polydoras, P.; Sitaridi, E.; Stoumpos, V.; Ioannidis, Y.

Complex on-demand data retrieval and processing is a characteristic of several applications and combines the notions of querying & search, information filtering & retrieval, data transformation & analysis, and other data manipulations. Such rich tasks are typically represented by data processing graphs, having arbitrary data operators as nodes and their producer-consumer interactions as edges. Optimizing and executing such graphs on top of distributed architectures is critical for the success of the corre-
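The abstract models a task as a data processing graph, with operators as nodes and producer-consumer interactions as edges. The following is a minimal single-machine sketch of that representation, assuming each operator is a plain Python function whose arguments are the outputs of its producers. The graph runs in topological order, so every producer finishes before its consumers. The class `DataflowGraph` and its methods are hypothetical. The sketch does no distribution or optimization, both of which the abstract identifies as the actual problem.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, Iterable, List


class DataflowGraph:
    """Operators are nodes; an edge producer -> consumer feeds data forward."""

    def __init__(self) -> None:
        self.ops: Dict[str, Callable[..., Any]] = {}
        self.inputs: Dict[str, List[str]] = {}  # node -> its producers

    def add(self, name: str, op: Callable[..., Any],
            inputs: Iterable[str] = ()) -> None:
        self.ops[name] = op
        self.inputs[name] = list(inputs)

    def run(self) -> Dict[str, Any]:
        # TopologicalSorter takes a mapping of node -> predecessors and
        # yields every producer before any of its consumers.
        outputs: Dict[str, Any] = {}
        for name in TopologicalSorter(self.inputs).static_order():
            args = [outputs[p] for p in self.inputs[name]]
            outputs[name] = self.ops[name](*args)
        return outputs


# Usage: a source, a filter, two consumers of the filter, and a sink.
g = DataflowGraph()
g.add("source", lambda: [3, 1, 4, 1, 5, 9, 2, 6])
g.add("filter", lambda xs: [x for x in xs if x > 2], ["source"])
g.add("transform", lambda xs: [x * 10 for x in xs], ["filter"])
g.add("count", lambda xs: len(xs), ["filter"])
g.add("sink", lambda t, c: (sum(t), c), ["transform", "count"])
out = g.run()
print(out["sink"])  # (270, 5)
```

The `filter` node has two consumers. Its output is computed once and reused, which is the kind of sharing that a graph representation makes explicit.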
