csail.mit.edu

Column-Stores vs. Row-Stores: How different are they really?

Authors: 
Abadi, Daniel J.; Madden, Samuel R.; Hachem, Nabil

There has been a significant amount of excitement and recent work
on column-oriented database systems (“column-stores”). These
database systems have been shown to perform more than an order
of magnitude better than traditional row-oriented database systems
(“row-stores”) on analytical workloads such as those found in
data warehouses, decision support, and business intelligence applications.
The elevator pitch behind this performance difference is
straightforward: column-stores are more I/O efficient for read-only
queries since they only have to read from disk (or from memory)

Year: 
2008

Column-stores for wide and sparse data

Authors: 
Abadi, Daniel J.

ABSTRACT
While it is generally accepted that data warehouses and
OLAP workloads are excellent applications for column-stores,
this paper speculates that column-stores may well be suited
for additional applications. In particular we observe that
column-stores do not see a performance degradation when
storing extremely wide tables, and column-stores handle sparse
data very well. These two properties lead us to conjecture
that column-stores may be good storage layers for Semantic
Web data, XML data, and data with GEM-style schemas.

Year: 
2007

A comparison of approaches to large-scale data analysis

Authors: 
Pavlo, Andrew; Paulson, Erik; Rasin, Alexander; Abadi, Daniel J.; DeWitt, David J.; Madden, Samuel; Stonebraker, Michael

There is currently considerable enthusiasm around the MapReduce
(MR) paradigm for large-scale data analysis [17]. Although the
basic control flow of this framework has existed in parallel SQL
database management systems (DBMS) for over 20 years, some
have called MR a dramatically new computing model [8, 17]. In
this paper, we describe and compare both paradigms. Furthermore,
we evaluate both kinds of systems in terms of performance and development
complexity. To this end, we define a benchmark consisting
of a collection of tasks that we have run on an open source

Year: 
2009