Seamless Cloud Computing (SeCCo)
Cloud computing undoubtedly represents one of the major paradigm shifts of the past decade. To leverage the new possibilities of this model, we nevertheless need to rethink the nature and use of essentially all classic ingredients of computing, including networks, operating systems, and programming models, while keeping cross-cutting concerns such as reliability, scalability, privacy, and security in mind.
The excitement about the new possibilities of cloud computing
is stifled by the current rigid support for developing and deploying applications in clouds. In particular, existing support is geared towards vanilla setups where everything is colocated in a single datacenter, which contradicts many existing and emerging real-world scenarios. Bringing all data to the ``mothership'' prior to executing an analysis task is tedious at best and in most cases exhibits poor performance.
``Seamless cloud computing'' (SeCCo) aims at smoothing the boundaries to and between computing clouds. We propose a specific middleware
for SeCCo, consisting of two layers.
Deployment across multiple datacenters, clouds, and possibly cloud vendors further exacerbates known security concerns. We are therefore also investigating means of assuring (cross-)cloud data processing.
- The backbone of SeCCo consists of a highly scalable overlay network called Atmosphere,
which is specifically designed to interconnect multiple data centers, potentially from different cloud vendors. Protocols such as BGP or DNS are used in combination with
specific roundtrip measurement mechanisms to dynamically adapt the overlay graph to changes in topology. To abstract from different technologies
and clouds, our overlay substrate promotes a content-based addressing mechanism which clearly separates logical and physical addresses.
This substrate also guides the deployment of more specific protocols for assured communication within individual applications.
We provide several such protocols with varying qualitative and quantitative guarantees on reliability, security, and performance.
- On top of this layer we deploy appropriately modified versions of well-known systems such as Hadoop MapReduce, the Hadoop Distributed File System, and Apache Pig. These augmented systems rely on Atmosphere for much of their basic communication and take geo-distribution into account to perform individual tasks as efficiently as possible. For instance, for a simple sequence of MapReduce jobs performed uniformly on a dataset partitioned across several datacenters, speedups easily reach 2-3x depending on characteristics of the jobs such as associativity.
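The separation of logical (content-derived) and physical addresses in the overlay substrate can be illustrated with a minimal sketch. Note that the names below (`ContentRouter`, `publish`, `resolve`) are purely illustrative and not part of Atmosphere's actual API; the sketch only shows the general idea of content-based addressing, where applications bind to stable content identifiers while physical endpoints may change underneath.

```python
import hashlib

class ContentRouter:
    """Toy content-based address resolver (illustrative only, not SeCCo's API).

    Applications address data by a logical, content-derived identifier;
    the router maps it to whatever physical endpoints currently host it.
    """

    def __init__(self):
        self._table = {}  # logical id -> set of physical endpoints

    @staticmethod
    def logical_id(topic: str) -> str:
        # Logical address derived purely from content, not from location.
        return hashlib.sha256(topic.encode()).hexdigest()[:16]

    def publish(self, topic: str, endpoint: str) -> None:
        # Register a physical endpoint (e.g. "host:port") for this content.
        self._table.setdefault(self.logical_id(topic), set()).add(endpoint)

    def resolve(self, topic: str) -> set:
        # Physical endpoints can change (VM migration, failover) without
        # affecting the logical address applications use.
        return self._table.get(self.logical_id(topic), set())

router = ContentRouter()
router.publish("sensors/temperature", "10.0.1.5:9000")
router.publish("sensors/temperature", "172.16.0.8:9000")
endpoints = router.resolve("sensors/temperature")
```

Because callers only ever hold the logical identifier, the overlay is free to re-map it to new physical locations as the topology changes.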
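The role of associativity in the speedup can be sketched as follows. Instead of shipping raw partitions to one datacenter, each datacenter first reduces its own partition locally and only the small partial aggregates cross the wide-area links; this is valid precisely because the reduce operation is associative. The function names and word-count example below are our own illustration, not code from the modified Hadoop stack.

```python
from functools import reduce
from collections import Counter

def local_reduce(records):
    """Per-datacenter combine step: aggregate the local partition in place."""
    counts = Counter()
    for word in records:
        counts[word] += 1
    return counts

def global_merge(partials):
    """Merge partial aggregates from all datacenters.

    Correctness relies on the reduce operation being associative:
    merging partials yields the same result as reducing all raw records.
    """
    return reduce(lambda a, b: a + b, partials, Counter())

# Hypothetical dataset partitioned across three datacenters:
dc_partitions = [["a", "b", "a"], ["b", "c"], ["a"]]
totals = global_merge(local_reduce(p) for p in dc_partitions)
# totals == Counter({"a": 3, "b": 2, "c": 1})
```

Only the compact `Counter` objects traverse inter-datacenter links, which is where the 2-3x speedups for associative jobs come from.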
This project was bootstrapped with seed funds from the
Purdue Research Foundation (PRF), and generously supported by an education research award from Amazon AWS for testing in Amazon EC2.