CS 542: Distributed Database Systems, Spring 2022


Main Projects

  1. Centralized Two Phase Locking
  2. Local vs Global Transactions in Optimistic CC
  3. Replicated Copy Control: Correct Read-From Precedence
  4. Adaptable CC
  5. Semantics
  6. Three Phase Commit Protocol
  7. Site Failure in Partially Replicated Database
  8. Network Partitions (Two Partitions)
  9. Fail-Locks and Database Recovery
  10. Peer-to-Peer Multimedia Distribution
  11. Verifying Data Integrity in Peer-to-Peer Video Streaming

More Projects

Projects in security, privacy, and wireless sensor network management are also available:
  1. Active Bundle for Private Data Dissemination.
  2. Service Management in SOA.
  3. Adaptable Security in Mobile Cloud.
  4. Centralized Packet Management for Wireless Sensor Networks.

Project Description

A detailed description of each project follows.

1. Centralized Two Phase Locking (2PL)

You will implement the centralized 2PL (global 2PL) approach to concurrency control, including deadlock detection and resolution.

A minimum of four sites is required.
This is an individual project.
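
As a starting point, here is a minimal Python sketch (purely illustrative, not a required interface) of the central lock manager's deadlock detector: a waits-for graph with depth-first cycle detection. All names (LockManager, waits_for, find_cycle) are hypothetical.

    # Minimal sketch of deadlock detection in a centralized 2PL lock manager.
    # The waits-for graph maps each waiting transaction to the transactions
    # holding the locks it needs; a cycle in the graph means deadlock.
    from collections import defaultdict

    class LockManager:
        def __init__(self):
            self.waits_for = defaultdict(set)   # txn -> txns it waits on

        def add_wait(self, waiter, holder):
            self.waits_for[waiter].add(holder)

        def remove_txn(self, txn):
            # Call when a transaction commits or is chosen as the victim.
            self.waits_for.pop(txn, None)
            for deps in self.waits_for.values():
                deps.discard(txn)

        def find_cycle(self):
            # Depth-first search; a back edge to a node on the current
            # stack is a deadlock cycle.
            visited, on_stack = set(), set()

            def dfs(txn, path):
                visited.add(txn)
                on_stack.add(txn)
                path.append(txn)
                for nxt in self.waits_for[txn]:
                    if nxt in on_stack:
                        return path[path.index(nxt):]   # the cycle itself
                    if nxt not in visited:
                        cycle = dfs(nxt, path)
                        if cycle:
                            return cycle
                on_stack.discard(txn)
                path.pop()
                return None

            for txn in list(self.waits_for):
                if txn not in visited:
                    cycle = dfs(txn, [])
                    if cycle:
                        return cycle
            return None

    lm = LockManager()
    lm.add_wait("T1", "T2"); lm.add_wait("T2", "T3"); lm.add_wait("T3", "T1")
    print(lm.find_cycle())    # -> ['T1', 'T2', 'T3']

A common resolution policy is to abort the youngest transaction in the returned cycle and release its locks so the others can proceed.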

 

2. Local vs Global Transactions in Optimistic CC

You will implement an optimistic approach to concurrency control.

One problem you will have to solve is how to keep the list of committed transactions, against which a global transaction must validate, fixed at each site without delaying local transactions.

A minimum of four sites is required.
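
One possible shape for the validation logic, sketched in Python under hypothetical names and assuming each transaction tracks its read and write sets: a global transaction validates against a snapshot of the committed list, so local transactions can keep committing in the meantime. This is a hedged sketch, not the required design.

    # Sketch of backward validation in optimistic CC at a single site.
    # snapshot_for_global() fixes the committed list for a global
    # transaction's validation without delaying local commits.

    class Site:
        def __init__(self):
            self.committed = []   # list of (write_set, commit_ts) pairs

        def validate(self, read_set, start_ts, committed_view):
            # Backward validation: fail if any transaction that committed
            # after this one started wrote something this one read.
            for write_set, commit_ts in committed_view:
                if commit_ts > start_ts and write_set & read_set:
                    return False
            return True

        def commit_local(self, read_set, write_set, start_ts, clock):
            if self.validate(read_set, start_ts, self.committed):
                self.committed.append((frozenset(write_set), clock))
                return True
            return False

        def snapshot_for_global(self):
            return list(self.committed)

    s = Site()
    s.commit_local({"x"}, {"x"}, start_ts=0, clock=1)   # local T1 commits
    view = s.snapshot_for_global()                      # fixed for global G1
    print(s.validate({"x"}, 0, view))                   # -> False (conflict)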

 

3. Replicated Copy Control: Correct Read-From Precedence

Transactions are validated and committed by the concurrency control server, while updates are posted to the database at each site by the I/O server (or access manager). There is therefore an interval between the time a transaction Ti commits and the time its updates are posted to the database. During this interval, a new transaction can read an old value: with respect to concurrency control, the new transaction is serialized after Ti's commit, but with respect to database access it reads as if it preceded Ti. The research and implementation questions are:

  1. How to eliminate this interval, and
  2. How to establish the correct read-from relation for each pair of transactions across all servers on a site?

A minimum of two sites is required.
This is an individual project.
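
One way to attack the interval, sketched below in Python under hypothetical names: tag each committed write with a commit sequence number, queue it at the I/O server, and make reads consult the pending queue before the stored copy, so a reader always observes the value of the latest committed transaction.

    # Sketch: closing the commit-to-apply window with a pending-write queue.
    # Reads check committed-but-unapplied writes first, so the read-from
    # relation always matches the commit order.

    class IOServer:
        def __init__(self):
            self.store = {}     # item -> (value, commit_seq), applied
            self.pending = {}   # item -> (value, commit_seq), committed only

        def on_commit(self, commit_seq, writes):
            # Called by the concurrency control server at commit time.
            for item, value in writes.items():
                self.pending[item] = (value, commit_seq)

        def apply_one(self, item):
            # The access manager posts updates to the database lazily.
            if item in self.pending:
                self.store[item] = self.pending.pop(item)

        def read(self, item):
            # Consult pending writes first: never return a stale value.
            if item in self.pending:
                return self.pending[item]
            return self.store.get(item)

    io = IOServer()
    io.on_commit(1, {"x": 10})
    print(io.read("x"))   # -> (10, 1) even before apply_one("x") runs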

 

4. Adaptable CC

You will implement an adaptable concurrency control mechanism that can switch either the protocol (optimistic vs. pessimistic) or the control (distributed vs. centralized).

If interested, first read the relevant papers. Then discuss your ideas and the scope of the project with the TA.
A group project is possible.
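
As one illustration of switching the protocol, a controller might track the recent abort rate under optimistic validation and fall back to locking when conflicts become frequent. This is a hedged Python sketch; the window size and threshold are arbitrary assumptions.

    # Sketch of a protocol switch for adaptable CC: run optimistically while
    # conflicts are rare, switch to 2PL when the recent abort rate is high.
    from collections import deque

    class AdaptableCC:
        def __init__(self, window=100, threshold=0.2):
            self.outcomes = deque(maxlen=window)   # True = validation failed
            self.threshold = threshold
            self.mode = "optimistic"

        def record(self, aborted):
            self.outcomes.append(aborted)
            rate = sum(self.outcomes) / len(self.outcomes)
            # Hysteresis could be added so the mode does not oscillate.
            self.mode = "pessimistic" if rate > self.threshold else "optimistic"

    cc = AdaptableCC()
    for aborted in [False] * 70 + [True] * 30:
        cc.record(aborted)
    print(cc.mode)   # -> 'pessimistic' (abort rate 0.3 > 0.2)

The same skeleton can drive the distributed-vs-centralized switch: replace the mode flag with the identity of the site that currently owns lock management.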

 

5. Semantics


The objective of this project is to identify semantics useful for concurrency control and/or recovery. The semantics will be provided by the user as part of each transaction (e.g., breakpoints as suggested by Lynch, types of locks, commutativity of operations, user-defined dependencies among transactions, types of objects and operations, etc.).

The concurrency control protocol will use these semantics to serialize transactions and eliminate partial executions. You will need to implement a concurrency control method and change the transaction generator to add the semantics.

If interested, first read the relevant paper. Then discuss your ideas and the scope of the project with the TA.
This is an individual project.
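
For example, commutativity semantics let the scheduler admit operation pairs that a plain read/write conflict test would order. Below is a hypothetical Python conflict test that the transaction generator could feed with per-operation semantic tags; the tag names are assumptions.

    # Sketch: a semantics-aware conflict test. Two increments on the same
    # item commute, so they need not be serialized against each other.

    COMMUTES = {("increment", "increment"), ("append", "append")}

    def conflicts(op1, op2):
        # op = (kind, item); return True if the pair must be ordered.
        kind1, item1 = op1
        kind2, item2 = op2
        if item1 != item2:
            return False
        if kind1 == "read" and kind2 == "read":
            return False     # reads never conflict
        if (kind1, kind2) in COMMUTES:
            return False     # semantics say the order is irrelevant
        return True          # default read/write conflict rule

    # Example: two deposits to the same account may interleave freely.
    assert not conflicts(("increment", "acct42"), ("increment", "acct42"))
    assert conflicts(("read", "acct42"), ("increment", "acct42"))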

 

6. Three Phase Commit Protocol

The objective of this project is to implement a basic three-phase commit (3PC) protocol.

A minimum of four sites is required.
This is an individual project.
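
A minimal Python sketch of the coordinator's side of 3PC, with message transport, timeouts, and the termination protocol elided; the Participant stub and the synchronous send/reply style are simulation assumptions.

    # Sketch of a 3PC coordinator. The extra PRECOMMIT round between the
    # vote and the final commit is what distinguishes 3PC from 2PC.

    class Participant:
        # Stub: a real participant logs its state and replies over the
        # network; here every vote is YES.
        def send(self, msg):
            return "YES" if msg == "CANCOMMIT?" else "ACK"

    def coordinator_3pc(participants):
        # Phase 1: solicit votes; any NO (or timeout) means abort.
        if not all(p.send("CANCOMMIT?") == "YES" for p in participants):
            for p in participants:
                p.send("ABORT")
            return "aborted"
        # Phase 2: all voted yes -> move everyone to the prepared state.
        # After PRECOMMIT, no site can be blocked forever by a coordinator
        # failure, which is the point of the third phase.
        for p in participants:
            p.send("PRECOMMIT")
        # Phase 3: commit. On ack timeouts the coordinator still commits;
        # laggards catch up via the termination protocol (elided here).
        for p in participants:
            p.send("DOCOMMIT")
        return "committed"

    print(coordinator_3pc([Participant() for _ in range(4)]))  # -> committed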

 

7. Site Failure in Partially Replicated Database

More details are available in the following:

The objective of this project is to implement the protocol that reads one copy and writes all available copies.

Note that reads and writes of the database will be simulated; only the data structures will actually be updated.

A minimum of three sites is required.
This is an individual project.
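
A Python sketch of the read-one / write-all-available rule over simulated copies; the availability map stands in for real site up/down status, and all names are illustrative.

    # Sketch of read-one / write-all-available over a partially replicated,
    # simulated database: only the data structures are updated, as required.

    class ReplicaManager:
        def __init__(self, copies):   # copies: item -> {site: value}
            self.copies = copies
            self.up = {s: True for sites in copies.values() for s in sites}

        def read(self, item):
            # Read any one available copy.
            for site, value in self.copies[item].items():
                if self.up[site]:
                    return value
            raise RuntimeError("no available copy of " + item)

        def write(self, item, value):
            # Write every copy whose site is currently up.
            wrote = [s for s in self.copies[item] if self.up[s]]
            for site in wrote:
                self.copies[item][site] = value
            return wrote   # missed sites need recovery (see project 9)

    rm = ReplicaManager({"x": {"s1": 0, "s2": 0, "s3": 0}})
    rm.up["s3"] = False      # simulate a site failure
    rm.write("x", 42)        # posts to s1 and s2 only
    print(rm.read("x"))      # -> 42, from any available copy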

 

8. Network Partitions (Two Partitions)


Several protocols for dealing with this problem are given in Chapter 1 of Prof. Bhargava's book. You will need to implement an optimistic or a pessimistic approach. For the pessimistic approach, you may read the technical report "Maintaining Mutual Consistency of Replicated Files in Partitioned Distributed Database System", which is available from Prof. Bhargava.

The objective of this project is to implement the concurrency control components needed to handle transaction processing during network partitions.

A minimum of three sites is required.
Simulate two partitions.
No need to do any actual reading or writing of files for any transaction.
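
For the pessimistic approach, one common building block is a majority-quorum test: a partition may update an item only if it can reach a majority of that item's copies. A hedged Python sketch:

    # Sketch of a pessimistic partition rule: allow an update only in the
    # partition holding a majority of the item's copies.

    def can_update(item_sites, partition_sites):
        # item_sites: all sites holding a copy of the item.
        # partition_sites: sites reachable in this partition.
        reachable = len(set(item_sites) & set(partition_sites))
        return reachable > len(item_sites) / 2

    # With copies on sites {A, B, C} and partitions {A, B} | {C}:
    assert can_update({"A", "B", "C"}, {"A", "B"})    # majority side
    assert not can_update({"A", "B", "C"}, {"C"})     # minority side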

 

9. Fail-Locks and Database Recovery (Multiple Failures)

See notes for Site Failure in Partially Replicated Database above.

Instead of recovering the site itself, you will concentrate on ensuring that the database is fully recovered at the failed site. The operational sites will maintain fail-locks (one per failed site) on copies that are updated while that site is down. The recovering site will collect the fail-lock tables from all sites and mark the corresponding data items as unavailable, while allowing transactions to proceed on all other data items. The recovered site must itself continue to maintain a fail-lock table if any other site is still down.

The unavailable data items will become available again through one of two approaches.

You may measure, by simulation, the time it takes to bring all unavailable copies up to date.

A minimum of three sites is required.
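
A Python sketch of the fail-lock bookkeeping under hypothetical names: operational sites record, per failed site, the items they updated; the recovering site takes the union of those entries and marks the items unavailable until they are refreshed.

    # Sketch of fail-lock maintenance for database recovery.
    from collections import defaultdict

    class FailLockTable:
        def __init__(self):
            self.locks = defaultdict(set)   # failed_site -> items updated

        def on_update(self, item, down_sites):
            # An operational site sets a fail-lock, per failed site, on
            # every copy that site missed.
            for site in down_sites:
                self.locks[site].add(item)

    def recover(my_site, tables):
        # The recovering site collects fail-lock tables from all peers and
        # marks the union of fail-locked items as unavailable; transactions
        # on all other data items may proceed immediately.
        unavailable = set()
        for table in tables:
            unavailable |= table.locks.get(my_site, set())
        return unavailable

    t = FailLockTable()
    t.on_update("x", down_sites=["s3"])   # s3 missed the update to x
    print(recover("s3", [t]))             # -> {'x'} unavailable at s3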

 

10. Peer-to-Peer Multimedia Distribution

Most of the content distributed over current P2P systems such as Kazaa and Gnutella consists of multimedia files, e.g., music and videos. Nonetheless, current systems distribute these files by bulk download, which ignores the temporal inter-dependency among the different segments of a multimedia file. This means that a client has to download the entire file before it can start playing it. Consider, for example, a one-hour movie recorded at 1 Mb/s being downloaded by a client with an in-bound bandwidth of 1.5 Mb/s. Ignoring all protocol overhead and retransmissions, the client will have to wait 40 minutes before the movie can start!
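
The 40-minute figure can be checked directly from the stated rates:

    # Startup delay under bulk-file download, using the numbers above.
    duration_s   = 3600                          # one-hour movie
    encode_rate  = 1.0                           # Mb/s
    bandwidth    = 1.5                           # Mb/s in-bound
    file_size    = duration_s * encode_rate      # 3600 Mb
    wait_minutes = file_size / bandwidth / 60    # download must finish first
    print(wait_minutes)                          # -> 40.0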

In this project, you will develop a real-time media streaming system on top of a P2P lookup service. You can use any lookup service you want; examples include JXTA (jxta.org), Gnutella, and FreePastry. The real-time streaming system will start playing the requested movie after a short waiting period (on the order of seconds). For more information, refer to the relevant papers.

You should do the following:

A minimum of four sites (a client and at least three senders) is required.
If interested, first read the relevant papers. Then discuss your ideas and the scope of the project with the TA.
A group project is possible depending on the scope -- for example, P2P streaming with data integrity.
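
As a starting point for the client's scheduler, a hypothetical Python sketch that assigns segments to senders round-robin and starts playback once a short prefix is buffered; a real scheduler would weight senders by their measured bandwidth.

    # Sketch of a P2P streaming client: request segments round-robin from
    # the senders, begin playback after a small startup buffer is in hand.

    def schedule(num_segments, senders):
        # segment index -> sender, round-robin.
        return {i: senders[i % len(senders)] for i in range(num_segments)}

    def can_start_playback(received, startup_prefix=3):
        # Play out only when the first startup_prefix segments are present.
        return all(i in received for i in range(startup_prefix))

    plan = schedule(10, ["peerA", "peerB", "peerC"])
    print(plan[3])                              # -> 'peerA'
    print(can_start_playback({0, 1, 2, 5}))     # -> True: seconds, not minutes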

 

11. Verifying Data Integrity in Peer-to-Peer Video Streaming

In P2P video streaming, multiple senders serve one client, each providing different segments of the video; the segments are assembled and ordered by the client. A major concern in the P2P environment is whether a peer provides genuine data: corrupt data from a peer severely degrades the quality of the streaming session. In this project, you will implement distributed data verification protocols in a P2P environment. The verification protocol has to run in real time, i.e., during the streaming session. Two data verification protocols are proposed in the reference paper.

You should do the following:

A minimum of four sites (a client and at least three senders) is required.
If interested, first read the relevant paper. Then discuss your ideas and the scope of the project with the TA.
A group project is possible depending on the scope -- for example, P2P streaming with data integrity.
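
The referenced paper proposes its own two protocols; purely as a generic illustration of real-time verification, each arriving segment can be hashed and checked against a digest list obtained from a trusted source before it is handed to the player. The digest list and all names here are assumptions.

    # Generic sketch of per-segment integrity checking during streaming.
    import hashlib

    def verify_segment(index, data, trusted_digests):
        # Reject any segment whose hash does not match the published digest.
        return hashlib.sha256(data).hexdigest() == trusted_digests[index]

    trusted = {0: hashlib.sha256(b"segment-0-bytes").hexdigest()}
    assert verify_segment(0, b"segment-0-bytes", trusted)
    assert not verify_segment(0, b"tampered-bytes", trusted)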

 

12. Active Bundle for Private Data Dissemination

The aim of this project is to ensure secure data dissemination in a distributed environment. Data owners specify policies for data sharing; these policies must be enforced before data is shared with different hosts (including unknown or untrusted hosts). The goal is to implement a policy-based, controlled data dissemination mechanism that can selectively disseminate data based on host credentials and owner policies. This enables data privacy throughout the data's lifecycle (i.e., from the data owner to the final destination, traversing intermediate nodes). One way to achieve this is the Active Bundle mechanism.
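
A toy Python sketch of the selective-disclosure idea behind an active bundle: the bundle carries data items plus owner policies and reveals only the items whose policy the requesting host's credentials satisfy. The encryption and self-protecting code that a real active bundle carries are elided, and all names are hypothetical.

    # Toy sketch of policy-based selective dissemination (active bundle).

    class ActiveBundle:
        def __init__(self, items, policies):
            self.items = items         # name -> value
            self.policies = policies   # name -> required credential set

        def disclose_to(self, host_credentials):
            # Release only the items whose policy the host satisfies.
            return {name: value for name, value in self.items.items()
                    if self.policies[name] <= set(host_credentials)}

    bundle = ActiveBundle(
        items={"email": "a@b.edu", "ssn": "xxx-xx-1234"},
        policies={"email": {"authenticated"},
                  "ssn": {"authenticated", "trusted"}})
    print(bundle.disclose_to({"authenticated"}))   # only the email is shared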


13. Service Management in SOA

Modern Web-scale solutions (e.g., Netflix, Amazon, Airbnb) are based on a microservice architecture. These solutions elastically auto-scale capacity to handle changes in demand and to ensure quality of service under peak load. Elastic scalability creates a highly dynamic environment in which services are automatically provisioned and unprovisioned: newly provisioned services must be registered so that existing services can discover them, services must be informed when other services are unprovisioned, and the health of all services in the system must be tracked.

This project is about building a Directory service that provides a registry for locating the different services in the system and the number of instances available for each. Because of the elastic nature of the system, services must be registered and removed automatically: the Directory service keeps track of registered services and automatically removes unhealthy services from its registry.
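
A minimal Python sketch of such a registry: instances register and send heartbeats, and entries that miss their heartbeat deadline are evicted on lookup. The TTL value and all names are arbitrary assumptions.

    # Sketch of a service registry with heartbeat-based health tracking.
    import time

    class Directory:
        def __init__(self, ttl=10.0):
            self.ttl = ttl        # seconds allowed between heartbeats
            self.instances = {}   # (service, instance_id) -> last seen

        def register(self, service, instance_id):
            self.instances[(service, instance_id)] = time.time()

        def heartbeat(self, service, instance_id):
            # A heartbeat simply refreshes the registration timestamp.
            self.instances[(service, instance_id)] = time.time()

        def lookup(self, service):
            self._evict()
            return [iid for (svc, iid) in self.instances if svc == service]

        def _evict(self):
            # Automatically drop instances that stopped heartbeating.
            now = time.time()
            self.instances = {k: t for k, t in self.instances.items()
                              if now - t <= self.ttl}

    d = Directory(ttl=10.0)
    d.register("billing", "b-1")
    print(d.lookup("billing"))   # -> ['b-1'] while heartbeats keep arriving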


14. Adaptable Security in Mobile Cloud

The goal of this project is to advance real-time mobile-cloud computing of data- and/or computation-intensive applications by providing a secure adaptability framework that achieves maximum performance with minimum cost (communication and power) despite varying network and cloud conditions. The aim is to develop a secure mobile-cloud computing framework that partitions a mobile application for distributed execution across the mobile and cloud platforms, where the decision about the execution location of each application partition is made dynamically.

 

15. Centralized packet management for wireless sensor networks

In wireless sensor networks (WSNs), packets can be lost for several reasons: node failures; external interference and noise, which degrade link quality; receiver buffer overflow; and so on. It is important to detect such packet loss. The goal of this project is to implement a centralized algorithm that detects whether a packet was received by a sink node and estimates link quality between nodes in the wireless sensor network. Also consider a distributed implementation of the packet-loss analysis, maintaining databases with the necessary service data on different nodes.

The algorithm should be implemented in the TOSSIM network simulator under TinyOS. The implementation language is nesC, a C dialect used in sensor networks. The Collection Tree Protocol (CTP) can be employed to calculate link quality between nodes in the WSN. You may find useful links to TinyOS and TOSSIM tutorials in the references.
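
The project itself is written in nesC; purely to illustrate the detection logic in a compact form, here is a Python sketch of sequence-number-based loss detection at the sink. CTP's actual link estimator (ETX) is richer than the delivery ratio computed here.

    # Sketch of the detection logic only (the project implements it in nesC):
    # the sink tracks per-node sequence numbers; gaps indicate lost packets.

    class Sink:
        def __init__(self):
            self.expected = {}   # node_id -> next expected sequence number
            self.lost = {}       # node_id -> count of missing packets

        def on_packet(self, node_id, seq):
            exp = self.expected.get(node_id, seq)
            if seq > exp:
                # A gap in sequence numbers: packets were lost en route.
                self.lost[node_id] = self.lost.get(node_id, 0) + (seq - exp)
            self.expected[node_id] = seq + 1

        def delivery_ratio(self, node_id, received):
            # A rough link-quality estimate for the path from this node.
            total = received + self.lost.get(node_id, 0)
            return received / total if total else 1.0

    sink = Sink()
    for seq in [0, 1, 4]:        # packets 2 and 3 from node 7 were lost
        sink.on_packet(7, seq)
    print(sink.lost[7])                          # -> 2
    print(sink.delivery_ratio(7, received=3))    # -> 0.6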

References: