Practical Assured Big Data Analysis

  • Introduction

    The "pay-as-you-go" cloud model has strong potential for efficiently supporting big data analysis. Due to privacy and security concerns, however, government and enterprise institutions are reluctant to move their data and the corresponding computations to the cloud. This reluctance is justified: today, using the cloud requires absolute trust in the infrastructure provider's intentions and, in the face of multi-tenancy, in the security defense mechanisms it deploys. In this project we address all three challenges of information security: confidentiality, integrity, and availability.


    We are developing a system named Crypsis that executes data analysis jobs directly on encrypted data, eliminating the need to trust the cloud provider. Crypsis transforms arbitrary data analysis scripts written in Pig Latin so that they can run over encrypted data. It avoids fully homomorphic encryption because of its prohibitive cost and instead employs existing, practical partially homomorphic encryption (PHE) schemes. Crypsis also takes a broader view of the computation: when PHE alone cannot support an operation, it can perform part of the computation on the client side, depending on the resources available there. We also present a set of program transformations that reduce the cost of data analysis computations in this combined cloud-and-client setting.
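
To make the idea of computing on encrypted data concrete, here is a minimal sketch of an additively homomorphic scheme. It uses a toy Paillier construction purely as a well-known example of PHE; the text above does not say which schemes Crypsis employs, and the function names (`keygen`, `encrypt`, `decrypt`, `encrypted_sum`) and the tiny primes are assumptions made for illustration only. It assumes Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Return (public modulus n, private key) from two distinct primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt a plaintext 0 <= m < n under the public modulus n."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1        # random blinding factor in [1, n-1]
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def decrypt(n: int, priv, c: int) -> int:
    """Recover the plaintext using the private key (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def encrypted_sum(n: int, ciphertexts) -> int:
    """Multiply ciphertexts mod n^2, which adds the plaintexts underneath."""
    total = encrypt(n, 0)
    for c in ciphertexts:
        total = total * c % (n * n)
    return total


if __name__ == "__main__":
    n, priv = keygen(1009, 1013)           # toy primes, far too small for real use
    salaries = [1200, 3400, 560]           # plaintexts that never leave the client
    uploaded = [encrypt(n, s) for s in salaries]
    result = encrypted_sum(n, uploaded)    # the only step run by the untrusted cloud
    print(decrypt(n, priv, result))        # 5160
```

The cloud-side step sees only ciphertexts and the public modulus, yet the client decrypts the correct sum. An additive scheme like this cannot, by itself, multiply two encrypted values or compare them, which is the kind of gap that client-side partial computation is meant to cover.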

    Integrity and Availability

    Byzantine fault tolerant (BFT) replication offers many benefits for enforcing integrity and availability in clouds. Existing BFT systems, however, are not suited to typical data-flow processing applications in the cloud, which analyze large amounts of data in a parallelizable manner. In such applications, the practical limits of data processing depend directly on the amount of data that can be processed per unit of time. We are therefore building a solution that secures computations running in the cloud by combining BFT replication with fault isolation. It uses variable-degree clustering, approximated and offline output comparison, smart deployment, and separation of duty to achieve a parameterized trade-off between fault tolerance and overhead in practice.
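
The core of replication with output comparison can be sketched as follows. This is a simplified illustration, not the project's implementation: it compares exact SHA-256 digests synchronously rather than using the approximated and offline comparison mentioned above, and every name in it (`run_replicated`, `digest`, `word_count`, `honest_replica`, `faulty_replica`) is hypothetical.

```python
import hashlib
from collections import Counter
from typing import Callable, List, Optional, Sequence

Job = Callable[[Sequence[str]], List[str]]
Replica = Callable[[Job, Sequence[str]], List[str]]


def digest(records: Sequence[str]) -> str:
    """Order-insensitive SHA-256 digest of a job's output records."""
    h = hashlib.sha256()
    for rec in sorted(records):
        h.update(rec.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def run_replicated(job: Job, data: Sequence[str],
                   replicas: Sequence[Replica], f: int) -> Optional[List[str]]:
    """Run the job on every replica; accept an output only if at least
    f + 1 replicas produced the same digest, otherwise return None."""
    outputs = [replica(job, data) for replica in replicas]
    digests = [digest(out) for out in outputs]
    winner, votes = Counter(digests).most_common(1)[0]
    if votes < f + 1:
        return None
    return outputs[digests.index(winner)]


def word_count(lines: Sequence[str]) -> List[str]:
    """A small data-flow style job: count words across all input lines."""
    counts = Counter(word for line in lines for word in line.split())
    return sorted(f"{word}\t{count}" for word, count in counts.items())


def honest_replica(job: Job, data: Sequence[str]) -> List[str]:
    return job(data)


def faulty_replica(job: Job, data: Sequence[str]) -> List[str]:
    # Simulates a compromised node that tampers with the result.
    return job(data) + ["bogus\t1"]


if __name__ == "__main__":
    data = ["a b a", "b c"]
    replicas = [honest_replica, faulty_replica, honest_replica]
    print(run_replicated(word_count, data, replicas, f=1))
```

With 2f + 1 replicas and at most f faulty ones, at least f + 1 honest replicas agree, so the tampered output is outvoted. Comparing digests instead of full outputs keeps the comparison cheap relative to the data volume, which matters when the replicated job is itself a large parallel analysis.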

    Sample Transformation

    [Figures: Source Pig Latin Script; Transformed Pig Latin Script]



  • Publications

  • J. Stephen, S. Savvas, R. Seidel, and P. Eugster
    Practical Confidentiality Preserving Big Data Analysis
    6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud '14).

  • J. Stephen and P. Eugster
    Secure Cloud-based Data Analysis with ClusterBFT
    14th ACM/IFIP/USENIX International Middleware Conference (Middleware 2013), December 2013.

  • Technical Reports

  • Crypsis Technical Reports
  • License

    The source code for Crypsis is licensed under the terms of the GPL v3, unless otherwise noted (for example, in the case of third-party components and libraries). A copy of the GNU General Public License can be found here.