CS 542: Distributed Database Systems, Spring 2016

 

 

 

Time: T, Th 10:30-11:45AM

Room: LWSN 1106

Instructor: Prof. Bharat Bhargava (Office: LWSN 2116F, Tel: 494-6013, Email: bbshail@purdue.edu, Office hours: Tuesday 11:45-1:00 PM) Available in Lawson Commons after class.

TA: Tatiana  Kuznetsova.  Office: LWSN B116D, Email: tkuznets@purdue.edu, Office hours: Monday 11:00am-12:00pm, Tuesday 1:30-2:30pm

Mail list: cs542@cs.purdue.edu.

Midterm: March 9 (Wednesday), 08:00 PM, BRNG 2290

Final Exam: May 4 (Wednesday), 08:00 AM - 10:00 AM, LWSN B151

Teaching Material

This course will deal with the fundamental issues in large distributed systems which are motivated by the computer networking and distribution of processors, and control. The theory, design, specification, implementation, and performance large systems will be discussed. Concurrency, Consistency, Integrity, Reliability, Privacy, and Security in distributed systems will be included.

A special feature of the course includes interesting problems in Mobile Ad hoc networks and Cloud Computing that can benefit from research ideas in distributed systems. Research related to Mobile Computing, Streaming databases, Video conferencing, Peer to Peer systems, Cloud computing will be covered.

Textbooks:

Principles of Distributed Database Systems, Prentice Hall, Tamer Oszu and Patrick Valduriez, (Copy on reserve in LWSN reception office book shelf and on the Springer website from the university network) (MAIN TEXT)

Concurrency Control and Reliability in Distributed Systems, Van Nostrand and Reinhold Publishers by Bharat Bhargava (Ed.), 1987 (Copy on reserve in LWSN reception office book shelf)

Transaction Processing: Concepts and Techniques, Morgan Kaufmann, Jim Gray and Andreas Reuter, 1992 (Copy on reserve in LWSN reception office book shelf)

Course Outline:

a)    Architecture, General Systems Issues, Example Systems ....(3 Hrs.)

b)    Distributed control for synchronization and concurrency (will include the models for concurrent processing and transactions, theory of serializability, classes of concurrency control approaches, performance evaluation of these classes, centralized control vs. decentralized control).........(12 Hrs.)

c)    Distributed commitment/termination (involves preservation of atomicity of transaction execution, blocking/non-blocking protocols).........(6 Hrs.)

d)    Resiliency in distributed systems (involves design of protocols for site failure, network partitioning, loss of messages or variable transmission delays, consistent recovery)..........(3-6 hours.)

e)    Security in distributed systems. (involves study of a variety of attacks on the components of system (such as on routing protocols in ad hoc networks), privacy issues in Peer to Peer systems, trusted collaboration and dissemination of data among cooperative entities).........(9-12 hours)

f)     Design and Implementation of Prototype/commercial systems, Experimental Evaluations. Details of peer to peer system developed at Purdue and several commercial systems .....(9 Hrs)

g)    Research Issues in Cloud Computing (see web site under CS 590: http://www.cs.purdue.edu/homes/bb/#teaching)

Assignments and Grading Policy:

The following assignments/exams etc. are planned.

a)    Non programming assignments ....4 (Once every two weeks: 20% of grade)

b)    Programming assignments in the form of a project that you select from a choice of 14-15 projects. You may suggest a project based on your interest. (30% of grade)

c)    Mid Term and Final Exams ....2 (20% of grade each)

d)    Class contributions (e.g., class attendance and participation, discussions, outside reading/presentation of research papers) .... (10% of grade) You should attend all classes for highest class contribution credit.

e)    You can expect a grade of A- if you score 90-93, an A if you score 94 or higher. Same scale for B and C grade. If you do not do any part of the project or home works or score less than 75%, a C grade is possible.

f)     To pass the qualifier, a grade of A- in class and 80% in qualifier questions is expected. Your exam will be reviewed/graded by another examiner and this policy may vary slightly from year to year.

 

Your programming projects can involve implementing some component for integrity, security, reliability, or privacy policy. Or communication facility for LAN, or routing in mobile adhoc networks can be developed.

 

You can conduct experiments on ns2, or peer to peer system prototype, JAVA RMI (I suggest using java and java RMI for communications. If you haven't used RMI, the Hello World tutorial at http://java.sun.com is a good starting point.

Using piazza.com

We will use piazza.com for the class. I encourage you to register there if you are not registered yet. Once you are registered, you will be able to find the class CS542 "Distributed Databases" and join it as a student. If you can't find the class then use the following link to sign-up for it:
Join CS542 (Spring'16) in piazza

Late Submission Policy

Penalty for late homework is 10% for each class day after due date. Problems on grading of assignments/exams/projects must be resolved within one week after the graded work is returned/project score is posted. The grades will not be modified after the one week time period.

Absence from Class

You must attend all classes. Absence in five or more classes will lower your final grade.

Academic Dishonesty

One can not use any part of another person's work in his/her assignments and projects. If an overlap in any submission for grading is detected, an automatic grade of F in course will be assigned to the student and it will be reported to graduate school and deans. You are welcomed to discuss and learn from others. If you use any material in your homework, you have to put a reference to it. If you do a group project, specify individual contributions clearly at the time of submission and let me and TA know in advance of collaborating. Read the following links: Course Policies, Academic Integrity.

Comprehensive Reading list (Read at least one paper)

 

Implementation

      Building Distributed Database Systems, Bharat Bhargava.

      The Raid Distributed Database System, Bharat Bhargava and John Riedl, IEEE Trans on Software Engineering, 15(6), June 1989.

      PROMISE: Peer-to-Peer Media Streaming Using CollectCast, M. Hefeeda, A. HabibB.Botey, D. Xu, and B. Bhargava, in Proc. of ACM Multimedia, 45-54, Berkeley, CA, November 2003.

      Adaptability Experiments in the RAID Distributed Database System , (1990) Bharat  Bhargava, Karl  Friesen, Abdelsalam  Helal, John Riedl, Computer Science Technical Reports. Paper 825.

      Operating System Support for Database Management, Michael  Stonebraker, Communications of the ACM, Volume 24 Issue 7, Pages 412-418, July 1981.

 

Concurrency

      Concurrency Control in Database Systems Bharat Bhargava, IEEE Trans on Knowledge and Data Engineering,11(1), Jan.-Feb. 1999

      A Model for Adaptable Systems for Transaction Processing, Bharat Bhargava and John Riedl, IEEE Transactions on Knowledge and Data Engineering, 1(4), Dec 1989.

      A Causal Model for Analyzing Distributed Concurrency Control Algorithms, B. Bhargava and C. Hua, IEEE Trans. on Software Engineering, SE-9, 470-486, 1983.

      Global scheduling for flexible transactions in heterogeneous distributed database systems, A. Zhang, M. Nodine, and B. Bhargava. IEEE TKDE, 13(3), 2001.

      Concurrency Control in Distributed Database Systems , P. Bernstein, N. Goodman, ACM Computer Survey, 13(2), 1981.

      Concurrency control in a system for distributed databases (SDD-1), P. Bernstein, D. Shipman, J. Rothnie, ACM Transactions on Database Systems, 5(1), 1980.

      The Transaction Concept: Virtues and Limitations , Jim Gray, VLDB, 1981.

      On Optimistic Methods for Concurrency Control , H.T. Kung and John T. Robinson, ACM Trans. Database Systems, 6(2), 1981.

      The serializability of concurrent database updates, C. Papadimitriou, Journal of the ACM (JACM), 26(4), 1979.

      Granularity of Locks and Degrees of Consistency in a Shared Data Base, J. Gray, R. Lorie, G. Putzolu, I. Traiger. Modelling in Data Base Management Systems, G.M. Nijssen (ed). North Holland Publishing Company, 1976.

 

Communication Network

      Communication Facilities for Distributed Transaction Processing Systems, E. Mafla, and B. Bhargava, IEEE Computer, 24(8), 1991.

      A Communication Framework for Digital Libraries , B. Bhargava and M. Annamalai, Multimedia Tools and Applications International Journal, Volume 10, No 2, pp 205-236, April 2000.

      Evolution of a communication system for distributed transaction processing in RAID, B. Bhargava, Y. Zhang, and E. Mafla, Computing Systems Journal, 4(3), 1991.

      Communication Facilities for Distributed Transaction Processing, B. Bhargava.

      WANCE: Wide area network communication emulation systems, Y. Zhang and B. Bhargava, IEEE workshop on Parallel and Distributed Systems, 1993:: Slides

      Bandwidth Measurements for VMs in Cloud, Amit Gupta, Rohit Ranchal:: Slides

 

Security

      Trust-Based Privacy Preservation for Peer-to-peer Media Streaming, Y. Lu, W, Wang, D. Xu, and B. Bhargava, in Proceedings of Secure Knowledge Management (SKM) Amherst, NY, September 2004:: Slides

      Private and Trusted Collaborations, B. Bhargava and L. Lilien, in Proceedings of Secure Knowledge Management (SKM) Amherst, NY, Septemeber 2004.

      A Taxonomy of Security Attacks and Issues in Vehicular Ad-Hoc Networks (VANETs), Parul Tyagi, Deepak Dembla, International Journal of Computer Applications (0975 - 8887) Volume 91 - No.7, April 2014.

      Detecting Service Violation in Internet and Mobile Ad Hoc Networks:: Slides, Abstract

      Trust-based Privacy Preservation for Peer-to-peer Data Sharing:: Slides

      2004 December ICDCIT "Vulnerabilities and Threats in Distributed Systems":: Slides, Extended Slides

      Privacy:: Slides

      Collaborative Attack:: Slides

      Attacks on Networks

 

Reliability

      An Experimental Analysis of Replicated Copy Control During Site Failure and Recovery, B. Bhargava, P. Noll, and D. Sabo. In Proceedings of International Conference on Data Engineering, p.82-91, Feb. 1988.

      Resilient Concurrency Control in Distributed Database Systems, B. Bhargava, IEEE Trans. on Reliability, R-31(5): 437-443, 1984.

      Optimism and Consistency in partitioned distributed database systems, S. B. Davidson, ACM Trans. on Database Systems, 9(3): 456-481, 1984.

      Consistency in Partitioned Networks, S. B. Davidson, H. Garcia-Molina, and D. Skeen, ACM Computer Survey, 17(3): 341-370, 1985.

      Nonblocking commit protocols, D. Skeen, ACM SIGMOD, 1981.

      A Decentralized Termination Protocol, D. Skeen, IEEE Symposium on Reliability in Distributed Software and Database Systems, July, 1981.

      A Formal Model of Crash Recovery in a Distributed System, D. Skeen and M. Stonebraker, IEEE Trans. on Software Engineering, 9(3): 219-228, 1983.

      Detection of Mutual Inconsistency in Distributed Systems, Jr. D. Parker, et al., IEEE Trans. on Software Engineering, SE-9, 1983.

      How to Assign Votes in a Distributed System, Hector Garcia-Molina, Daniel Barbara, J. ACM 32(4): 841-860, 1985.

      A History of the Virtual Synchrony Replication Model, Ken Birman. Appears in Replication:Theory and Practice. B. Charron-Bost, F. Pedone, A. Schiper (Eds), Replication, LNCS 5959, pp. 91–120, 2010.

      Reliable communication in the presence of failures, K.Birman, T. Joseph, ACM Transactions on Computer Systems (TOCS), Volume 5 Issue 1, Feb. 1987.

      Transaction Processing and Consistency Control of Replicated Copies during Failures in Distributed Databases, B. Bhargava, Journal of Management Information Systems, Vol. 4 No. 2, Fall 1987.

      Site Recovery in Replicated Distributed Database Systems, B. Bhargava, Computer Science Technical Reports, Report Number: 85-564, 1985.

      Consistency in Partitioned Networks, S. B. Davidson, H. Garcia-Molina, and D. Skeen, ACM Computer Survey, 17(3): 341-370, 1985.

      A Dynamic Majority Determination Algorithm for Reconfiguration of Network Partitions, B. Bhargava and P.L. Ng, Information Sciences 46, pp 27-45, 1987.

      PLANET: Making Progress with Commit Processing in Unpredictable Environments, Gene Pang, Tim Kraska, Michael Franklin, Alan Fekete, SIGMOD 2014.

 

Mobile

      Peer-to-Peer File-sharing over Mobile Ad Hoc Networks, G. Ding and B. Bhargava, in the first International Workshop on Mobile Peer-to-Peer Computing, Orlando, Florida, March 2004.

      Data Consistency in Intermittently Connected Distributed Systems, E. Pitouri and B. Bhargava, IEEE TKDE, 11(6), 1999.

      Maintaining Consistency of Data in Mobile Distributed EnvironmentsEvaggelia Pitoura, Bharat Bhargava, ICDCS, 1995.

      Optimal File Allocation in Multiple Computer System, W. W. Chu, IEEE Transaction on Computers, 885-889, October 1969.

 

Cloud Computing

      http://www.cs.purdue.edu/homes/bb/cs590/

      Hey, You, Get Off of My Cloud

      Amazon's response to "Hey, you, get off my cloud"

      An Entity-centric Approach for Privacy and Identity Management in Cloud Computing
A Mobile-Cloud Collaborative Traffic Lights Detector for Blind Navigation

      Paper- Above the Clouds: A Berkeley View of Cloud Computing
PPT- Above the Clouds: A Berkeley View of Cloud Computing

      NIST Cloud Computing
NIST Cloud Computing Slides

      Chunxiao Li, Anand Raghunathan, Niraj K. Jha, Secure Virtual Machine Execution under an Untrusted Management OS(Slides) (Presenter: Pelin Angin or Prof Raghunathan of ECE)

      Security and Privacy in Cloud Computing http://www.cs.purdue.edu/homes/bb/cloud.html

      Mehdi Azarmi, et. al., End to End security in Service Oriented Architecture (Slides) (Presenter: Mehdi Azarmi)

Homework

Homework assignments are generally due on Thursday in class before the lecture starts. Check individual homework for exact due date.
The submitted homework should be typed. If you can not submit the printed homework in class before the lecture, then send it to TA in pdf format before the due time.

      Homework 1 [Link] Due to Jan 26

      Homework 2 [Link] Due to Feb 11

      Homework 3 [Link] Due to April 5

      Homework 4 [Link] Due to April 26

Projects [Link]

The programming projects may involve using a mini version of the RAID system or the other systems to implement some algorithm or communication facility for transaction processing.
Please, hand in all project plans (proposals) on Jan. 21, 2016.
The hard deadline for Project Proposal is Feb. 11, 2016.
Last Date for Project Submission: April 14, 2016, 11:59 pm.
Demonstration of projects will be scheduled on the next week after submission.

Lecture slides

       IntroductionDDBMSDistributed DB background

       Dist. DB ArchitectureDist. DB Design - FragmentationDist. DB Design - Data Location

       Implementation issues and RAID system

       Dist. Query ProcessingDist. Query Optimization

       Dist. Transaction Management:  Transaction ConceptsACID, Transaction Models

       Concurrency Control IdeasConcurrency Control AlgorithmsConcurrency-Control ResearchFormal Concurrency ControlDeadlocks

       Optimistic CCDistributed & CentralizedPerformance

       Commit/Termination Protocols:  2PC3PCDegrees of CommitmentTermination

       Reliability and Fault-Tolerant MechanismsFailure and Recovery, Logging

       Site Failure and Recovery, Partitioning Reliability PartitionMutual Consistency

       Distributed Version Management

       Mobile Database Systems

       Privacy, Trust and AuthenticationPrivacy-Anonymity

       P2P systems

Book Slides

Please find the slides with the following links.

      Ch1-Introduction

      Ch2-Background

      Ch3-Distribution Design

      Ch4-Data Integration

      Ch5-Semantic

      Ch6-Query Intro

      Ch7-Query Localization

      Ch8-Query Optimization

      Ch9-Query MDB

      Ch10-Transactions

      Ch11-Conc Control

      Ch12-Reliability

      Ch13-Replication

      Ch14-Parallel DBMS

      Ch15-Object DBMS

      Ch16-P2P

      Ch17-Web

      Ch18-Current Issues