Towards a scalable enterprise content analytics platform

Fri, 10/16/2009 - 14:42 — admin

Authors:

Beyer, Kevin; Ercegovac, Vuk; Krishnamurthy, Rajasekar; Raghavan, Sriram; Rao, Jun; Reiss, Frederick; Shekita, Eugene J.; Simmen, David; Tata, Sandeep; Vaithyanathan, Shivakumar; Zhu, Huaiyu

With the tremendous growth in the volume of semi-structured and unstructured content within enterprises
(e.g., email archives, customer support databases, etc.), there is increasing interest in harnessing this
content to power search and business intelligence applications. Traditional enterprise infrastructure
for analytics is geared towards analytics on structured data (in support of OLAP-driven reporting and
analysis) and is not designed to meet the demands of large-scale compute-intensive analytics over
semi-structured content. At the IBM Almaden Research Center, we are developing an “enterprise content
analytics platform” that leverages the Hadoop map-reduce framework to support this emerging class of
analytic workloads. Two core components of this platform are SystemT, a high-performance rule-based
information extraction engine, and Jaql, a declarative language for expressing transformations over
semi-structured data. In this paper, we present our overall vision of the platform, describe how SystemT
and Jaql fit into this vision, and briefly describe some of the other components that are under active
development.

Year:

2009

Venue:

IEEE Data Engineering 2009

URL:

http://sites.computer.org/debull/A09mar/sandeep.pdf

Citations:

Citations range:

n/a

Attachment	Size
Simmen2009Towardsascalableenterprisecontentanalyticsplatform.pdf	86.74 KB
