Pig Latin: A not-so-foreign language for data processing

Fri, 10/16/2009 - 13:30 — admin

Authors:

C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins

There is a growing need for ad-hoc analysis of extremely
large data sets, especially at internet companies where
innovation critically depends on being able to analyze terabytes
of data collected every day. Parallel database products, e.g.,
Teradata, offer a solution, but are usually prohibitively
expensive at this scale. Besides, many of the people who
analyze this data are entrenched procedural programmers, who
find the declarative, SQL style to be unnatural. The success
of the more procedural map-reduce programming model, and
its associated scalable implementations on commodity
hardware, is evidence of the above. However, the map-reduce
paradigm is too low-level and rigid, and leads to a great deal
of custom user code that is hard to maintain, and reuse.

We describe a new language called Pig Latin that we have
designed to fit in a sweet spot between the declarative style
of SQL, and the low-level, procedural style of map-reduce.
The accompanying system, Pig, is fully implemented, and
compiles Pig Latin into physical plans that are executed
over Hadoop, an open-source, map-reduce implementation.
We give a few examples of how engineers at Yahoo! are using
Pig to dramatically reduce the time required for the
development and execution of their data analysis tasks, compared to
using Hadoop directly. We also report on a novel debugging
environment that comes integrated with Pig, that can lead
to even higher productivity gains. Pig is an open-source,
Apache-incubator project, and available for general use.
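
To make the "sweet spot" described in the abstract concrete, here is a minimal Python sketch of the step-by-step dataflow style: each statement names one intermediate result, instead of stating the whole query at once as SQL would. This is not Pig Latin and not code from the paper. The sample records, the function name avg_pagerank_by_category, and both thresholds are assumptions made up for illustration. The comments name the Pig Latin operators that each step roughly resembles.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input: (url, category, pagerank) records, invented for illustration.
urls = [
    ("a.example/1", "news", 0.9),
    ("a.example/2", "news", 0.5),
    ("b.example/1", "sports", 0.1),
    ("b.example/2", "sports", 0.7),
    ("c.example/1", "news", 0.3),
]


def avg_pagerank_by_category(records, min_rank=0.2, min_group_size=2):
    """Average pagerank per category, written as a sequence of named steps."""
    # Step 1 (roughly FILTER ... BY): keep only records above the rank threshold.
    good_urls = [r for r in records if r[2] > min_rank]

    # Step 2 (roughly GROUP ... BY): collect the ranks of each category.
    groups = defaultdict(list)
    for _url, category, rank in good_urls:
        groups[category].append(rank)

    # Step 3 (a second filter): keep only groups that are large enough.
    big_groups = {c: ranks for c, ranks in groups.items()
                  if len(ranks) >= min_group_size}

    # Step 4 (roughly FOREACH ... GENERATE): emit one aggregate per group.
    return {c: mean(ranks) for c, ranks in big_groups.items()}


result = avg_pagerank_by_category(urls)
```

With this sample data only the news category survives both filters. In SQL the same query would be a single SELECT with WHERE, GROUP BY, and HAVING clauses. In hand-written map-reduce the grouping and both filters would have to be coded into custom map and reduce functions.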

Year:

2008

Venue:

SIGMOD 2008

URL:

http://research.yahoo.com/files/sigmod08.pdf

Citations:

Citations range:

n/a

Attachment	Size
Olston2008PigLatinAnotsoforeignlanguagefordataprocessing.pdf	585.65 KB

websearch
