Implementation Issues of A Cloud Computing Platform

Peng, B; Cui, B; Li, X

Cloud computing is Internet-based system development in which large, scalable computing resources
are provided “as a service” over the Internet to users. The concept of cloud computing incorporates
web infrastructure, software as a service (SaaS), Web 2.0 and other emerging technologies, and has
attracted increasing attention from industry and the research community. In this paper, we describe our
experience and the lessons learnt in constructing a cloud computing platform. Specifically, we design a
GFS-compatible file system with variable chunk size to facilitate massive data processing, and introduce
some implementation enhancements to MapReduce to improve system throughput. We also discuss
some practical issues of system implementation. In association with the China web archive (Web InfoMall),
which we have been accumulating since 2001 (it now contains over three billion Chinese web pages),
this paper presents our attempt to implement a platform for a domain-specific cloud computing service,
with large-scale web text mining as the targeted application. We hope that researchers besides ourselves
will benefit from the cloud when it is ready.

IEEE Data Engineering 2009