Practical Assured Big Data Analysis
The "pay-as-you-go" cloud model has strong potential for efficiently supporting big data analysis. Because of privacy and security concerns, however, government and enterprise institutions are reluctant to move data and the corresponding computations to the cloud. This reluctance is justified: the current state of affairs requires absolute trust in the cloud infrastructure provider's intentions and, in the face of multi-tenancy, in the deployed security defense mechanisms as well. This project addresses all three challenges of information security: confidentiality, integrity, and availability.
Confidentiality
We are developing a system named Crypsis that allows data analysis jobs to execute directly on encrypted data, thus eliminating the need to trust the cloud provider. Crypsis transforms arbitrary data analysis scripts written in Pig Latin so that they can be executed over encrypted data. Because fully homomorphic encryption is prohibitively expensive, Crypsis avoids it and instead employs existing, practical partially homomorphic encryption (PHE) schemes. Crypsis also takes a broader view of the computation: when PHE alone would fail, it can perform part of the computation on the client side, depending on the resources available there. We also present a set of program transformations that reduce the cost of data analysis computations in this larger picture.
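To make the idea of computing over PHE ciphertexts concrete, here is a minimal sketch of an additively homomorphic scheme (Paillier) used to sum encrypted values without decrypting them. It is an illustration only: the text above does not say which PHE schemes Crypsis uses, and the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`) and the small fixed demo primes are assumptions made for this example. The keys here are far too small to be secure.

```python
import math
import secrets

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair built from two fixed Mersenne primes (demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 simplifies to 1 + m*n.
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

# Client side: encrypt a column of values before upload.
n, priv = keygen()
column = [120, 75, 305]
encrypted_column = [encrypt(n, v) for v in column]

# Cloud side: compute SUM using only ciphertexts and the public key.
encrypted_sum = encrypted_column[0]
for c in encrypted_column[1:]:
    encrypted_sum = add_encrypted(n, encrypted_sum, c)

# Client side: decrypt the aggregate.
total = decrypt(n, priv, encrypted_sum)
```

A scheme like this supports addition but not, for example, multiplying two encrypted values or comparing them. Operations of that kind are where a different scheme or client-side computation would be needed.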
Integrity and Availability
Byzantine fault tolerant (BFT) replication offers many benefits for enforcing integrity and availability in clouds. Existing BFT systems, however, are not suited to typical data-flow processing cloud applications, which analyze large amounts of data in a parallelizable manner. In fact, the practical limits of such applications depend directly on throughput, that is, the amount of data that can be processed per unit of time. To address this, we are building a solution that secures computations run in the cloud by combining BFT replication with fault isolation. Our solution uses variable-degree clustering, approximated and offline output comparison, smart deployment, and separation of duty to achieve a parameterized trade-off between fault tolerance and overhead in practice.
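The output-comparison idea can be sketched as a digest vote across replicas. This is a simplified illustration and not the project's actual protocol: the helper names (`digest`, `agreed_output`) and the rule of accepting an output once f + 1 replicas agree are assumptions made for this example. Comparing short digests rather than full outputs keeps the comparison cheap when outputs are large.

```python
import hashlib
from collections import Counter

def digest(records):
    """Order-insensitive SHA-256 digest of a replica's output records."""
    h = hashlib.sha256()
    for rec in sorted(records):
        h.update(rec.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()

def agreed_output(replica_outputs, f):
    """Return an output backed by at least f + 1 matching replicas, else None.

    If at most f replicas are faulty, any output reported by f + 1 replicas
    was produced by at least one correct replica.
    """
    digests = [digest(out) for out in replica_outputs]
    winner, votes = Counter(digests).most_common(1)[0]
    if votes >= f + 1:
        return replica_outputs[digests.index(winner)]
    return None

# Three replicas run the same job; one returns a corrupted count.
good = ["a\t3", "b\t5"]
outputs = [good, list(reversed(good)), ["a\t3", "b\t6"]]
result = agreed_output(outputs, f=1)
```

Here `result` is the output shared by the two agreeing replicas. Had all three replicas disagreed, the function would return `None`, signaling that the job should be rerun or escalated.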
[Figure: a source Pig Latin script and the corresponding transformed Pig Latin script]
- Julian Stephen
- Savvas Savvides
- Vinai Sundaram
- Masoud Ardekani
- Patrick Eugster
J. Stephen, S. Savvides, R. Seidel, and P. Eugster
Practical Confidentiality Preserving Big Data Analysis
6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '14).
J. Stephen and P. Eugster
Secure Cloud-based Data Analysis with ClusterBFT
14th ACM/IFIP/USENIX International Middleware Conference (Middleware 2013), December 2013.
Crypsis Technical Reports
The source code for Crypsis is licensed under the terms of the GPL v3
license, unless otherwise noted (such as in the case of third-party
components and libraries).
A copy of the GNU General Public License can be found here.