Dryad is a general-purpose distributed execution engine for
coarse-grain data-parallel applications. A Dryad applica-
tion combines computational “vertices” with communica-
tion “channels” to form a dataﬂow graph. Dryad runs the
application by executing the vertices of this graph on a set of
available computers, communicating as appropriate through
ﬁles, TCP pipes, and shared-memory FIFOs.
The vertices provided by the application developer are
quite simple and are usually written as sequential programs
with no thread creation or locking. Concurrency arises from
Dryad scheduling vertices to run simultaneously on multi-
ple computers, or on multiple CPU cores within a computer.
The application can discover the size and placement of data
at run time, and modify the graph as the computation pro-
gresses to make eﬃcient use of the available resources.
Dryad is designed to scale from powerful multi-core sin-
gle computers, through small clusters of computers, to data
centers with thousands of computers. The Dryad execution
engine handles all the diﬃcult problems of creating a large
distributed, concurrent application: scheduling the use of
computers and their CPUs, recovering from communication
or computer failures, and transporting data between ver-