programming language

Interpreting the data: Parallel analysis with Sawzall

Tue, 10/12/2010 - 11:58 — admin

Authors:

Pike, R; Dorward, S; Griesemer, R; Quinlan, S

Very large data sets often have a flat but regular structure and span multiple disks and
machines. Examples include telephone call records, network logs, and web document
repositories. These large data sets are not amenable to study using traditional database techniques, if
only because they can be too large to fit in a single relational database. On the other hand, many
of the analyses done on them can be expressed using simple, easily distributed computations:
filtering, aggregation, extraction of statistics, and so on.

Year:

2005

Pig Latin: A not-so-foreign language for data processing

Fri, 10/16/2009 - 13:30 — admin

Authors:

C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins

There is a growing need for ad-hoc analysis of extremely
large data sets, especially at internet companies where
innovation critically depends on being able to analyze terabytes
of data collected every day. Parallel database products, e.g.,
Teradata, offer a solution, but are usually prohibitively
expensive at this scale. Besides, many of the people who
analyze this data are entrenched procedural programmers, who
find the declarative, SQL style to be unnatural. The success
of the more procedural map-reduce programming model, and

Year:

2008

Cloud Computing publication categorizer