CS 542: Distributed Database Systems, Spring 2022


Main Projects

  1. Centralized Two Phase Locking
  2. Local vs Global Transactions in Optimistic CC
  3. Replicated Copy Control: Correct Read-From Precedence
  4. Adaptable CC
  5. Semantics
  6. Three Phase Commit Protocol
  7. Site Failure in Partially Replicated Database
  8. Network Partitions (Two Partitions)
  9. Fail-Locks and Database Recovery
  10. Peer-to-Peer Multimedia Distribution
  11. Verifying Data Integrity in Peer-to-Peer Video Streaming

More Projects

Projects in security, privacy, and wireless sensor network management are also available:
  1. Active Bundle for Private Data Dissemination.
  2. Service Management in SOA.
  3. Adaptable Security in Mobile Cloud.
  4. Centralized Packet Management for Wireless Sensor Networks.

Project Description

A detailed description of each project follows.

1. Centralized Two Phase Locking (2PL)

You will implement the centralized 2PL (global 2PL) approach to concurrency control, including deadlock detection and resolution.

A minimum of four sites is required.
This is an individual project.
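
As a starting point, here is a minimal Python sketch (purely illustrative, not a required interface) of the central lock manager's deadlock detector: a waits-for graph with depth-first cycle detection. All names (LockManager, waits_for, find_cycle) are hypothetical.

    # Minimal sketch of deadlock detection in a centralized 2PL lock manager.
    # The waits-for graph maps each waiting transaction to the transactions
    # holding the locks it needs; a cycle in the graph means deadlock.
    from collections import defaultdict

    class LockManager:
        def __init__(self):
            self.waits_for = defaultdict(set)   # txn -> txns it waits on

        def add_wait(self, waiter, holder):
            self.waits_for[waiter].add(holder)

        def remove_txn(self, txn):
            # Call when a transaction commits or is chosen as the victim.
            self.waits_for.pop(txn, None)
            for deps in self.waits_for.values():
                deps.discard(txn)

        def find_cycle(self):
            # Depth-first search; a back edge to a node on the current
            # stack is a deadlock cycle.
            visited, on_stack = set(), set()

            def dfs(txn, path):
                visited.add(txn)
                on_stack.add(txn)
                path.append(txn)
                for nxt in self.waits_for[txn]:
                    if nxt in on_stack:
                        return path[path.index(nxt):]   # the cycle itself
                    if nxt not in visited:
                        cycle = dfs(nxt, path)
                        if cycle:
                            return cycle
                on_stack.discard(txn)
                path.pop()
                return None

            for txn in list(self.waits_for):
                if txn not in visited:
                    cycle = dfs(txn, [])
                    if cycle:
                        return cycle
            return None

    lm = LockManager()
    lm.add_wait("T1", "T2"); lm.add_wait("T2", "T3"); lm.add_wait("T3", "T1")
    print(lm.find_cycle())    # -> ['T1', 'T2', 'T3']

A common resolution policy is to abort the youngest transaction in the returned cycle and release its locks so the others can proceed.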

 

2. Local vs Global Transactions in Optimistic CC

You will implement an optimistic approach to concurrency control.

One problem you will have to solve is how to keep the list of committed transactions, against which a global transaction must validate, fixed at each site without delaying local transactions.

A minimum of four sites is required.
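
One possible shape for the validation logic, sketched in Python under hypothetical names and assuming each transaction tracks its read and write sets: a global transaction validates against a snapshot of the committed list, so local transactions can keep committing in the meantime. This is a hedged sketch, not the required design.

    # Sketch of backward validation in optimistic CC at a single site.
    # snapshot_for_global() fixes the committed list for a global
    # transaction's validation without delaying local commits.

    class Site:
        def __init__(self):
            self.committed = []   # list of (write_set, commit_ts) pairs

        def validate(self, read_set, start_ts, committed_view):
            # Backward validation: fail if any transaction that committed
            # after this one started wrote something this one read.
            for write_set, commit_ts in committed_view:
                if commit_ts > start_ts and write_set & read_set:
                    return False
            return True

        def commit_local(self, read_set, write_set, start_ts, clock):
            if self.validate(read_set, start_ts, self.committed):
                self.committed.append((frozenset(write_set), clock))
                return True
            return False

        def snapshot_for_global(self):
            return list(self.committed)

    s = Site()
    s.commit_local({"x"}, {"x"}, start_ts=0, clock=1)   # local T1 commits
    view = s.snapshot_for_global()                      # fixed for global G1
    print(s.validate({"x"}, 0, view))                   # -> False (conflict)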

 

3. Replicated Copy Control: Correct Read-From Precedence

Transactions are validated and committed by the concurrency control server, while updates are posted to the database at each site by the I/O server (or access manager). There is therefore an interval between the time a transaction Ti commits and the time its updates are posted to the database. During this interval, a new transaction can read an old value: with respect to concurrency control, the new transaction is serialized after Ti's commit, but with respect to database access it reads as if it preceded Ti. The research and implementation questions are:

  1. How to eliminate this interval, and
  2. How to establish the correct read-from relation for each pair of transactions across all servers on a site?

A minimum of two sites is required.
This is an individual project.
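
One way to attack the interval, sketched below in Python under hypothetical names: tag each committed write with a commit sequence number, queue it at the I/O server, and make reads consult the pending queue before the stored copy, so a reader always observes the value of the latest committed transaction.

    # Sketch: closing the commit-to-apply window with a pending-write queue.
    # Reads check committed-but-unapplied writes first, so the read-from
    # relation always matches the commit order.

    class IOServer:
        def __init__(self):
            self.store = {}     # item -> (value, commit_seq), applied
            self.pending = {}   # item -> (value, commit_seq), committed only

        def on_commit(self, commit_seq, writes):
            # Called by the concurrency control server at commit time.
            for item, value in writes.items():
                self.pending[item] = (value, commit_seq)

        def apply_one(self, item):
            # The access manager posts updates to the database lazily.
            if item in self.pending:
                self.store[item] = self.pending.pop(item)

        def read(self, item):
            # Consult pending writes first: never return a stale value.
            if item in self.pending:
                return self.pending[item]
            return self.store.get(item)

    io = IOServer()
    io.on_commit(1, {"x": 10})
    print(io.read("x"))   # -> (10, 1) even before apply_one("x") runs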

 

4. Adaptable CC

You will implement an adaptable concurrency control mechanism that can switch either the protocol (optimistic vs. pessimistic) or the control (distributed vs. centralized).

If interested, first read the relevant papers. Then discuss your ideas and the scope of the project with the TA.
A group project is possible.
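
As one illustration of switching the protocol, a controller might track the recent abort rate under optimistic validation and fall back to locking when conflicts become frequent. This is a hedged Python sketch; the window size and threshold are arbitrary assumptions.

    # Sketch of a protocol switch for adaptable CC: run optimistically while
    # conflicts are rare, switch to 2PL when the recent abort rate is high.
    from collections import deque

    class AdaptableCC:
        def __init__(self, window=100, threshold=0.2):
            self.outcomes = deque(maxlen=window)   # True = validation failed
            self.threshold = threshold
            self.mode = "optimistic"

        def record(self, aborted):
            self.outcomes.append(aborted)
            rate = sum(self.outcomes) / len(self.outcomes)
            # Hysteresis could be added so the mode does not oscillate.
            self.mode = "pessimistic" if rate > self.threshold else "optimistic"

    cc = AdaptableCC()
    for aborted in [False] * 70 + [True] * 30:
        cc.record(aborted)
    print(cc.mode)   # -> 'pessimistic' (abort rate 0.3 > 0.2)

The same skeleton can drive the distributed-vs-centralized switch: replace the mode flag with the identity of the site that currently owns lock management.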

 

5. Semantics


The objective of this project is to identify semantics useful for concurrency control and/or recovery. The semantics will be provided by the user as part of each transaction (e.g., breakpoints as suggested by Lynch, types of locks, commutativity of operations, user-defined dependencies among transactions, types of objects and operations, etc.).

The concurrency control protocol will use these semantics to serialize transactions and eliminate partial executions. You will need to implement a concurrency control method and change the transaction generator to add the semantics.

If interested, first read the relevant paper. Then discuss your ideas and the scope of the project with the TA.
This is an individual project.
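
For example, commutativity semantics let the scheduler admit operation pairs that a plain read/write conflict test would order. Below is a hypothetical Python conflict test that the transaction generator could feed with per-operation semantic tags; the tag names are assumptions.

    # Sketch: a semantics-aware conflict test. Two increments on the same
    # item commute, so they need not be serialized against each other.

    COMMUTES = {("increment", "increment"), ("append", "append")}

    def conflicts(op1, op2):
        # op = (kind, item); return True if the pair must be ordered.
        kind1, item1 = op1
        kind2, item2 = op2
        if item1 != item2:
            return False
        if kind1 == "read" and kind2 == "read":
            return False     # reads never conflict
        if (kind1, kind2) in COMMUTES:
            return False     # semantics say the order is irrelevant
        return True          # default read/write conflict rule

    # Example: two deposits to the same account may interleave freely.
    assert not conflicts(("increment", "acct42"), ("increment", "acct42"))
    assert conflicts(("read", "acct42"), ("increment", "acct42"))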

 

6. Three Phase Commit Protocol

The objective of this project is to implement a basic three-phase commit (3PC) protocol.

A minimum of four sites is required.
This is an individual project.
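
A minimal Python sketch of the coordinator's side of 3PC, with message transport, timeouts, and the termination protocol elided; the Participant stub and the synchronous send/reply style are simulation assumptions.

    # Sketch of a 3PC coordinator. The extra PRECOMMIT round between the
    # vote and the final commit is what distinguishes 3PC from 2PC.

    class Participant:
        # Stub: a real participant logs its state and replies over the
        # network; here every vote is YES.
        def send(self, msg):
            return "YES" if msg == "CANCOMMIT?" else "ACK"

    def coordinator_3pc(participants):
        # Phase 1: solicit votes; any NO (or timeout) means abort.
        if not all(p.send("CANCOMMIT?") == "YES" for p in participants):
            for p in participants:
                p.send("ABORT")
            return "aborted"
        # Phase 2: all voted yes -> move everyone to the prepared state.
        # After PRECOMMIT, no site can be blocked forever by a coordinator
        # failure, which is the point of the third phase.
        for p in participants:
            p.send("PRECOMMIT")
        # Phase 3: commit. On ack timeouts the coordinator still commits;
        # laggards catch up via the termination protocol (elided here).
        for p in participants:
            p.send("DOCOMMIT")
        return "committed"

    print(coordinator_3pc([Participant() for _ in range(4)]))  # -> committed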

 

7. Site Failure in Partially Replicated Database

More details are available in the following:

The objective of this project is to implement the protocol that reads one copy and writes all available copies.

Note that reads and writes of the database will be simulated; only the data structures will actually be updated.

A minimum of three sites is required.
This is an individual project.
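
A Python sketch of the read-one / write-all-available rule over simulated copies; the availability map stands in for real site up/down status, and all names are illustrative.

    # Sketch of read-one / write-all-available over a partially replicated,
    # simulated database: only the data structures are updated, as required.

    class ReplicaManager:
        def __init__(self, copies):   # copies: item -> {site: value}
            self.copies = copies
            self.up = {s: True for sites in copies.values() for s in sites}

        def read(self, item):
            # Read any one available copy.
            for site, value in self.copies[item].items():
                if self.up[site]:
                    return value
            raise RuntimeError("no available copy of " + item)

        def write(self, item, value):
            # Write every copy whose site is currently up.
            wrote = [s for s in self.copies[item] if self.up[s]]
            for site in wrote:
                self.copies[item][site] = value
            return wrote   # missed sites need recovery (see project 9)

    rm = ReplicaManager({"x": {"s1": 0, "s2": 0, "s3": 0}})
    rm.up["s3"] = False      # simulate a site failure
    rm.write("x", 42)        # posts to s1 and s2 only
    print(rm.read("x"))      # -> 42, from any available copy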

 

8. Network Partitions (Two Partitions)


Several protocols for dealing with this problem are given in Chapter 1 of Prof. Bhargava's book. You will need to implement an optimistic or a pessimistic approach. For the pessimistic approach, you may read the technical report "Maintaining Mutual Consistency of Replicated Files in Partitioned Distributed Database System", which is available from Prof. Bhargava.

The objective of this project is to implement the concurrency control components needed to handle transaction processing during network partitions.

A minimum of three sites is required.
Simulate two partitions.
No need to do any actual reading or writing of files for any transaction.
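
For the pessimistic approach, one common building block is a majority-quorum test: a partition may update an item only if it can reach a majority of that item's copies. A hedged Python sketch:

    # Sketch of a pessimistic partition rule: allow an update only in the
    # partition holding a majority of the item's copies.

    def can_update(item_sites, partition_sites):
        # item_sites: all sites holding a copy of the item.
        # partition_sites: sites reachable in this partition.
        reachable = len(set(item_sites) & set(partition_sites))
        return reachable > len(item_sites) / 2

    # With copies on sites {A, B, C} and partitions {A, B} | {C}:
    assert can_update({"A", "B", "C"}, {"A", "B"})    # majority side
    assert not can_update({"A", "B", "C"}, {"C"})     # minority side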

 

9. Fail-Locks and Database Recovery (Multiple Failures)

See notes for Site Failure in Partially Replicated Database above.

Instead of recovering the site itself, you will concentrate on ensuring that the database is fully recovered at the failed site. The operational sites will maintain fail-locks (one per failed site) on copies that are updated while that site is down. The recovering site will collect the fail-lock tables from all sites and mark the corresponding data items as unavailable, while allowing transactions to proceed on all other data items. The recovered site must itself continue to maintain a fail-lock table if any other site is still down.

The unavailable data items will become available again through one of two approaches.

You may measure, by simulation, the time it takes to bring all unavailable copies up to date.

A minimum of three sites is required.
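
A Python sketch of the fail-lock bookkeeping under hypothetical names: operational sites record, per failed site, the items they updated; the recovering site takes the union of those entries and marks the items unavailable until they are refreshed.

    # Sketch of fail-lock maintenance for database recovery.
    from collections import defaultdict

    class FailLockTable:
        def __init__(self):
            self.locks = defaultdict(set)   # failed_site -> items updated

        def on_update(self, item, down_sites):
            # An operational site sets a fail-lock, per failed site, on
            # every copy that site missed.
            for site in down_sites:
                self.locks[site].add(item)

    def recover(my_site, tables):
        # The recovering site collects fail-lock tables from all peers and
        # marks the union of fail-locked items as unavailable; transactions
        # on all other data items may proceed immediately.
        unavailable = set()
        for table in tables:
            unavailable |= table.locks.get(my_site, set())
        return unavailable

    t = FailLockTable()
    t.on_update("x", down_sites=["s3"])   # s3 missed the update to x
    print(recover("s3", [t]))             # -> {'x'} unavailable at s3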

 

10. Peer-to-Peer Multimedia Distribution

Most of the content distributed over current P2P systems such as Kazaa and Gnutella consists of multimedia files, e.g., music and videos. Nonetheless, current systems distribute these files by bulk download, which ignores the temporal inter-dependency among the different segments of a multimedia file. This means that a client has to download the entire file before it can start playing it. Consider, for example, a one-hour movie recorded at 1 Mb/s being downloaded by a client with an in-bound bandwidth of 1.5 Mb/s. Ignoring all protocol overhead and retransmissions, the client will have to wait 40 minutes before the movie can start!
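
The 40-minute figure can be checked directly from the stated rates:

    # Startup delay under bulk-file download, using the numbers above.
    duration_s   = 3600                          # one-hour movie
    encode_rate  = 1.0                           # Mb/s
    bandwidth    = 1.5                           # Mb/s in-bound
    file_size    = duration_s * encode_rate      # 3600 Mb
    wait_minutes = file_size / bandwidth / 60    # download must finish first
    print(wait_minutes)                          # -> 40.0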

In this project, you will develop a real-time media streaming system on top of a P2P lookup service. You can use any lookup service you want; examples include JXTA (jxta.org), Gnutella, and FreePastry. The real-time streaming system will start playing the requested movie after a short waiting period (on the order of seconds). For more information, refer to the relevant papers.

You should do the following:

A minimum of four sites (a client and at least three senders) is required.
If interested, first read the relevant papers. Then discuss your ideas and the scope of the project with the TA.
A group project is possible depending on the scope -- for example, P2P streaming with data integrity.
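
As a starting point for the client's scheduler, a hypothetical Python sketch that assigns segments to senders round-robin and starts playback once a short prefix is buffered; a real scheduler would weight senders by their measured bandwidth.

    # Sketch of a P2P streaming client: request segments round-robin from
    # the senders, begin playback after a small startup buffer is in hand.

    def schedule(num_segments, senders):
        # segment index -> sender, round-robin.
        return {i: senders[i % len(senders)] for i in range(num_segments)}

    def can_start_playback(received, startup_prefix=3):
        # Play out only when the first startup_prefix segments are present.
        return all(i in received for i in range(startup_prefix))

    plan = schedule(10, ["peerA", "peerB", "peerC"])
    print(plan[3])                              # -> 'peerA'
    print(can_start_playback({0, 1, 2, 5}))     # -> True: seconds, not minutes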

 

11. Verifying Data Integrity in Peer-to-Peer Video Streaming

In P2P video streaming, multiple senders serve one client, each providing different segments of the video; the segments are assembled and ordered by the client. A major concern in the P2P environment is whether a peer provides genuine data: corrupt data from a peer severely degrades the quality of the streaming session. In this project, you will implement distributed data verification protocols in a P2P environment. The verification protocol has to run in real time, i.e., during the streaming session. Two data verification protocols are proposed in the reference paper.

You should do the following:

A minimum of four sites (a client and at least three senders) is required.
If interested, first read the relevant paper. Then discuss your ideas and the scope of the project with the TA.
A group project is possible depending on the scope -- for example, P2P streaming with data integrity.
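
The referenced paper proposes its own two protocols; purely as a generic illustration of real-time verification, each arriving segment can be hashed and checked against a digest list obtained from a trusted source before it is handed to the player. The digest list and all names here are assumptions.

    # Generic sketch of per-segment integrity checking during streaming.
    import hashlib

    def verify_segment(index, data, trusted_digests):
        # Reject any segment whose hash does not match the published digest.
        return hashlib.sha256(data).hexdigest() == trusted_digests[index]

    trusted = {0: hashlib.sha256(b"segment-0-bytes").hexdigest()}
    assert verify_segment(0, b"segment-0-bytes", trusted)
    assert not verify_segment(0, b"tampered-bytes", trusted)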

 

12. Active Bundle for Private Data Dissemination

The aim of this project is to ensure secure data dissemination in a distributed environment. Data owners specify policies for data sharing; these policies must be enforced before data is shared with different hosts (including unknown or untrusted hosts). The goal is to implement a policy-based, controlled data dissemination mechanism that can selectively disseminate data based on host credentials and owner policies. This enables data privacy throughout the data's lifecycle (i.e., from the data owner to the final destination, traversing intermediate nodes). One way to achieve this is the Active Bundle mechanism.
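
A toy Python sketch of the selective-disclosure idea behind an active bundle: the bundle carries data items plus owner policies and reveals only the items whose policy the requesting host's credentials satisfy. The encryption and self-protecting code that a real active bundle carries are elided, and all names are hypothetical.

    # Toy sketch of policy-based selective dissemination (active bundle).

    class ActiveBundle:
        def __init__(self, items, policies):
            self.items = items         # name -> value
            self.policies = policies   # name -> required credential set

        def disclose_to(self, host_credentials):
            # Release only the items whose policy the host satisfies.
            return {name: value for name, value in self.items.items()
                    if self.policies[name] <= set(host_credentials)}

    bundle = ActiveBundle(
        items={"email": "a@b.edu", "ssn": "xxx-xx-1234"},
        policies={"email": {"authenticated"},
                  "ssn": {"authenticated", "trusted"}})
    print(bundle.disclose_to({"authenticated"}))   # only the email is shared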


13. Service Management in SOA

Modern Web-scale solutions (e.g., Netflix, Amazon, Airbnb) are based on a microservice architecture. These solutions elastically auto-scale capacity to handle changes in demand and to ensure quality of service under peak load. Elastic scalability creates a highly dynamic environment in which services are automatically provisioned and unprovisioned: newly provisioned services must be registered so that existing services can discover them, services must be informed when other services are unprovisioned, and the health of all services in the system must be tracked.

This project is about building a Directory service that provides a registry for locating the different services in the system and the number of instances available for each. Because of the elastic nature of the system, services must be registered and removed automatically: the Directory service keeps track of registered services and automatically removes unhealthy services from its registry.
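
A minimal Python sketch of such a registry: instances register and send heartbeats, and entries that miss their heartbeat deadline are evicted on lookup. The TTL value and all names are arbitrary assumptions.

    # Sketch of a service registry with heartbeat-based health tracking.
    import time

    class Directory:
        def __init__(self, ttl=10.0):
            self.ttl = ttl        # seconds allowed between heartbeats
            self.instances = {}   # (service, instance_id) -> last seen

        def register(self, service, instance_id):
            self.instances[(service, instance_id)] = time.time()

        def heartbeat(self, service, instance_id):
            # A heartbeat simply refreshes the registration timestamp.
            self.instances[(service, instance_id)] = time.time()

        def lookup(self, service):
            self._evict()
            return [iid for (svc, iid) in self.instances if svc == service]

        def _evict(self):
            # Automatically drop instances that stopped heartbeating.
            now = time.time()
            self.instances = {k: t for k, t in self.instances.items()
                              if now - t <= self.ttl}

    d = Directory(ttl=10.0)
    d.register("billing", "b-1")
    print(d.lookup("billing"))   # -> ['b-1'] while heartbeats keep arriving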


14. Adaptable Security in Mobile Cloud

The goal of this project is to advance real-time mobile-cloud computing of data- and/or computation-intensive applications by providing a secure adaptability framework that achieves maximum performance with minimum cost (communication and power) despite varying network and cloud conditions. The aim is to develop a secure mobile-cloud computing framework that partitions a mobile application for distributed execution across the mobile and cloud platforms, where the decision about the execution location of each application partition is made dynamically.

 

15. Centralized packet management for wireless sensor networks

In wireless sensor networks (WSNs), packets can be lost for several reasons: node failures; external interference and noise, which degrade link quality; receiver buffer overflow; and so on. It is important to detect such packet loss. The goal of this project is to implement a centralized algorithm that detects whether a packet was received by a sink node and estimates link quality between nodes in the wireless sensor network. Also consider a distributed implementation of the packet-loss analysis, maintaining databases with the necessary service data on different nodes.

The algorithm should be implemented in the TOSSIM network simulator under TinyOS. The implementation language is nesC, a C dialect used in sensor networks. The Collection Tree Protocol (CTP) can be employed to calculate link quality between nodes in the WSN. You may find useful links to TinyOS and TOSSIM tutorials in the references.
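
The project itself is written in nesC; purely to illustrate the detection logic in a compact form, here is a Python sketch of sequence-number-based loss detection at the sink. CTP's actual link estimator (ETX) is richer than the delivery ratio computed here.

    # Sketch of the detection logic only (the project implements it in nesC):
    # the sink tracks per-node sequence numbers; gaps indicate lost packets.

    class Sink:
        def __init__(self):
            self.expected = {}   # node_id -> next expected sequence number
            self.lost = {}       # node_id -> count of missing packets

        def on_packet(self, node_id, seq):
            exp = self.expected.get(node_id, seq)
            if seq > exp:
                # A gap in sequence numbers: packets were lost en route.
                self.lost[node_id] = self.lost.get(node_id, 0) + (seq - exp)
            self.expected[node_id] = seq + 1

        def delivery_ratio(self, node_id, received):
            # A rough link-quality estimate for the path from this node.
            total = received + self.lost.get(node_id, 0)
            return received / total if total else 1.0

    sink = Sink()
    for seq in [0, 1, 4]:        # packets 2 and 3 from node 7 were lost
        sink.on_packet(7, seq)
    print(sink.lost[7])                          # -> 2
    print(sink.delivery_ratio(7, received=3))    # -> 0.6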

References: