Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance

Schad, J; Dittrich, J; Quiané-Ruiz, JA

Cloud computing has gained so much popularity largely
because of its ease of use and its ability to scale
computing resources on demand. As a result, users can now
rent computing nodes on large commercial clusters through
several vendors, such as Amazon and Rackspace.
However, despite the attention paid by cloud providers,
performance unpredictability is a major issue in cloud
computing for (1) database researchers performing wall-clock
experiments and (2) database applications providing
service-level agreements. In this paper, we study the
performance variance of the most widely used cloud
infrastructure (Amazon EC2) from different perspectives. We
use established microbenchmarks to measure performance
variance in CPU, I/O, and network. We also use a multi-node
MapReduce application to quantify the impact on real
data-intensive applications. We collected data for an entire
month and compared it with the results obtained on a local
cluster.
Our results show that EC2 performance varies considerably
and often falls into two bands with a large performance gap
between them, which is somewhat surprising. We observe in
our experiments that these two bands correspond to the
different virtual system types provided by Amazon. Moreover,
we analyze results considering different availability zones,
points in time, and locations. This analysis indicates that,
among other factors, the choice of availability zone also
influences the performance variability. A major conclusion
of our work is that the variance on EC2 is currently so high
that wall-clock experiments may only be performed with
considerable care. To this end, we provide some hints to
users.
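
As a rough illustration of how run-to-run variance of this kind might be quantified, the sketch below computes the coefficient of variation (sample standard deviation divided by the mean) for two sets of repeated wall-clock measurements. The abstract does not name the metric used, so treating the coefficient of variation as the measure is an assumption here. The function name and all runtime values are hypothetical and are not data from the paper.

```python
import statistics


def coefficient_of_variation(samples):
    """Return the sample standard deviation divided by the mean."""
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("mean is zero; the coefficient of variation is undefined")
    return statistics.stdev(samples) / mean


# Hypothetical wall-clock runtimes in seconds (illustrative values only).
# The cloud runs are chosen to fall into two bands; the local runs are tight.
cloud_runs = [100.0, 102.0, 140.0, 98.0, 143.0, 101.0]
local_runs = [100.0, 101.0, 99.0, 100.0, 102.0, 98.0]

cloud_cov = coefficient_of_variation(cloud_runs)
local_cov = coefficient_of_variation(local_runs)

print(f"cloud COV: {cloud_cov:.3f}")
print(f"local COV: {local_cov:.3f}")
```

Because the measure is normalized by the mean, it allows a comparison between benchmarks whose absolute runtimes differ, which is why it is a common choice for this kind of study.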

Proceedings of the ...