Adaptively parallelizing distributed range queries

Vigfusson, Ymir; Silberstein, Adam; Cooper, Brian F.; Fonseca, Rodrigo
Vigfusson, Y
Fonseca, R
Silberstein, A
Cooper, B

We consider the problem of how to best parallelize range
queries in a massive scale distributed database. In tradi-
tional systems the focus has been on maximizing paral-
lelism, for example by laying out data to achieve the highest
throughput. However, in a massive scale database such as
our PNUTS system [11] or BigTable [10], maximizing par-
allelism is not necessarily the best strategy: the system has
more than enough servers to saturate a single client by re-
turning results faster than the client can consume them, and
when there are multiple concurrent queries, maximizing par-
allelism for all of them will cause disk contention, reducing
everybody’s performance. How can we find the right par-
allelism level for each query in order to achieve high, con-
sistent throughput for all queries? We propose an adaptive
approach with two aspects. First, we adaptively determine
the ideal parallelism for a single query execution, which is
the minimum number of parallel scanning servers needed to
satisfy the client, depending on query selectivity, client load,
client-server bandwidth, and so on. Second, we adaptively
schedule which servers will be assigned to different query
executions, to minimize disk contention on servers and en-
sure that all queries receive good performance. Our sched-
uler can be tuned based on different policies, such as favoring
short versus long queries or high versus low priority queries.
An experimental study demonstrates the effectiveness of our
techniques in the PNUTS system.

VLDB 2009
Citations range: 
Vigfusson2009Adaptivelyparallelizingdistributedrangequeries.pdf227.15 KB