The Case for Determinism in Database Systems

Thomson, Alexander; Abadi, Daniel J.

Replication is a widely used method for achieving high availability in database systems. Due to the nondeterminism inherent in traditional concurrency control schemes, however, special care must be taken to ensure that replicas don’t
diverge. Log shipping, eager commit protocols, and lazy synchronization protocols are well-understood methods for
safely replicating databases, but each comes with its own cost in availability, performance, or consistency.
In this paper, we propose a distributed database system which combines a simple deadlock avoidance technique with


Column-Stores vs. Row-Stores: How different are they really?

Abadi, Daniel J.; Madden, Samuel R.; Hachem, Nabil

There has been a significant amount of excitement and recent work on column-oriented database systems (“column-stores”). These
database systems have been shown to perform more than an order of magnitude better than traditional row-oriented database systems
(“row-stores”) on analytical workloads such as those found in data warehouses, decision support, and business intelligence applications.
The elevator pitch behind this performance difference is straightforward: column-stores are more I/O efficient for read-only
queries since they only have to read from disk (or from memory)


A comparison of approaches to large-scale data analysis

Pavlo, Andrew; Paulson, Erik; Rasin, Alexander; Abadi, Daniel J.; DeWitt, David J.; Madden, Samuel; Stonebraker, Michael

There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the
basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some
have called MR a dramatically new computing model [8, 17]. In this paper, we describe and compare both paradigms. Furthermore,
we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark
consisting of a collection of tasks that we have run on an open source


Data management in the cloud: Limitations and opportunities

Abadi, Daniel J.

Recently the cloud computing paradigm has been receiving significant excitement and attention in the
media and blogosphere. To some, cloud computing seems to be little more than a marketing umbrella,
encompassing topics such as distributed computing, grid computing, utility computing, and
software-as-a-service, that have already received significant research focus and commercial implementation.
Nonetheless, there exist an increasing number of large companies that are offering cloud computing
