cs.yale.edu

The Case for Determinism in Database Systems

Authors: 
Thomson, Alexander; Abadi, Daniel J.

Replication is a widely used method for achieving high availability in database systems. Due to the nondeterminism inherent in traditional concurrency control schemes, however, special care must be taken to ensure that replicas don’t
diverge. Log shipping, eager commit protocols, and lazy synchronization protocols are well-understood methods for
safely replicating databases, but each comes with its own cost in availability, performance, or consistency.
In this paper, we propose a distributed database system which combines a simple deadlock avoidance technique with

Year: 
2010

Column-Stores vs. Row-Stores: How different are they really?

Authors: 
Abadi, Daniel J.; Madden, Samuel R.; Hachem, Nabil

There has been a significant amount of excitement and recent work
on column-oriented database systems (“column-stores”). These
database systems have been shown to perform more than an order
of magnitude better than traditional row-oriented database systems
(“row-stores”) on analytical workloads such as those found in
data warehouses, decision support, and business intelligence applications.
The elevator pitch behind this performance difference is
straightforward: column-stores are more I/O efficient for read-only
queries since they only have to read from disk (or from memory)

Year: 
2008

A comparison of approaches to large-scale data analysis

Authors: 
Pavlo, Andrew; Paulson, Erik; Rasin, Alexander; Abadi, Daniel J.; DeWitt, David J.; Madden, Samuel; Stonebraker, Michael

There is currently considerable enthusiasm around the MapReduce
(MR) paradigm for large-scale data analysis [17]. Although the
basic control flow of this framework has existed in parallel SQL
database management systems (DBMS) for over 20 years, some
have called MR a dramatically new computing model [8, 17]. In
this paper, we describe and compare both paradigms. Furthermore,
we evaluate both kinds of systems in terms of performance and development
complexity. To this end, we define a benchmark consisting
of a collection of tasks that we have run on an open source

Year: 
2009

Data management in the cloud: Limitations and opportunities

Authors: 
Abadi, Daniel J.

Recently the cloud computing paradigm has been receiving significant excitement and attention in the
media and blogosphere. To some, cloud computing seems to be little more than a marketing umbrella,
encompassing topics such as distributed computing, grid computing, utility computing, and
software-as-a-service, that have already received significant research focus and commercial implementation.
Nonetheless, there exist an increasing number of large companies that are offering cloud computing

Year: 
2009