Complex on-demand data retrieval and processing is a characteristic of several applications and combines
the notions of querying & search, information filtering & retrieval, data transformation & analysis,
and other data manipulations. Such rich tasks are typically represented by data processing graphs, with
arbitrary data operators as nodes and their producer-consumer interactions as edges. Optimizing
and executing such graphs on top of distributed architectures is critical for the success of the corresponding
applications and presents several algorithmic and systemic challenges. This paper describes
a system under development that offers such functionality on top of Ad-hoc Clusters, Grids, or Clouds.
Operators may be user defined, so their algebraic and other properties as well as those of the data they
produce are specified in associated profiles. Optimization is based on these profiles, must satisfy a variety
of objectives and constraints, and takes into account the particular characteristics of the underlying
architecture, mapping high-level dataflow semantics to flexible runtime structures. The paper highlights
the key components of the system and outlines the major directions of its development.