CS 44800: Introduction to Relational Database Management Systems

Project 3: Logging and Recovery

Team selection due 4:00pm EST on Wednesday, December 1; individual portion due 11:59pm EST on Monday, December 6; team portion due 11:59pm EST on Thursday, December 9, 2021

Please turn in code on the lab machines through turnin (turnin -c cs448 -p project3 <submission folder>) and the report through Gradescope, as with Project 2. Do not turn in class files or other cruft created by your development environment; turn in only the source needed to compile and run your project and the tests you use, along with a README.txt showing how to run your tests. As with Project 2, your code should run on the lab machines (amber01 - amber30.cs.purdue.edu). Make sure that you mark the start/end of each part in Gradescope. Please typeset your report; handwritten figures/drawings are accepted where needed.

This assignment has three options for the individual portion and will be done in teams of two or three. As with Project 1, teammates should be from your PSO, but need not be the same as in Projects 1 or 2 (although they can be). The tasks are reasonably independent; each of you can choose one of Tasks 1, 2, or 3 (as long as you each do something different). Note that Tasks 2 and 3 have some overlap, Task 2 isn't very interesting if Task 1 hasn't been done (think about why), and Task 3 is a bit harder to do nicely if Task 2 isn't done, so it is probably best to do Tasks 1 and 2 if you have a two-person team. Please send your PSO instructor your team selection, and which team member will be doing which task, by 4:00pm EST on Wednesday, December 1.

We encourage you to discuss your individual portions with your teammates. Even though you are primarily responsible for one task, understanding what your teammate(s) are doing will make the team integration portion much easier (since you'll think more about what integration requires while doing your individual task). It will also give you an opportunity to learn about parts of the system that you don't need to modify for your task. Finally, explaining what you are doing to your teammate(s) will help you solidify your understanding of the parts of the system you are working with.

SimpleDB code base

We recommend you start with the default SimpleDB 3.4 code base, which does implement basic logging and recovery. The code is available in the lab machines (amber01 - amber30.cs.purdue.edu) at /homes/cs448/SimpleDB.zip, or can be downloaded using https.

SimpleDB already includes support for checkpointing; however, it only does a checkpoint after recovering and before accepting any new queries. This is a quiescent checkpoint: all transactions must have completed, and nothing may be running, when the checkpoint is taken. SimpleDB also supports only undo logging, which means that to commit a transaction, all pages modified by that transaction must be flushed to disk before the commit record is flushed to the log.

Individual Task 1: Undo/Redo Logging

The current implementation of SimpleDB requires that all pages modified by a transaction be written to disk before the transaction can commit (and write its commit record to the log). As a result, SimpleDB implements only undo logging: the data on disk already reflects every committed (or aborted) transaction, so there is never anything to redo.
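As a rough illustration of what this costs, here is a minimal in-memory model (the class and field names are hypothetical, not SimpleDB's) that counts data-page writes when ten transactions in a row each update the same page and commit. With `forcePages` set it follows the undo-only commit order described above; without it, it behaves as an undo/redo commit that forces only the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of commit processing; not SimpleDB's classes.
class CommitModel {
    final List<String> log = new ArrayList<>();                   // appended log records
    int flushedLogSize = 0;                                       // records forced to stable storage
    int pageWrites = 0;                                           // data-page writes to disk
    final Map<String, Integer> dirtyPageOwner = new HashMap<>();  // dirty page -> last modifying txnum

    void setValue(int txnum, String page) {
        log.add("SETVAL " + txnum + " " + page);  // log the change, then modify the page in memory
        dirtyPageOwner.put(page, txnum);
    }

    // forcePages == true models the undo-only commit; false models an undo/redo commit.
    void commit(int txnum, boolean forcePages) {
        if (forcePages) {
            flushedLogSize = log.size();          // write-ahead: log records reach disk before pages
            dirtyPageOwner.entrySet().removeIf(e -> {
                boolean mine = e.getValue() == txnum;
                if (mine) pageWrites++;           // write this transaction's page to disk
                return mine;
            });
        }
        log.add("COMMIT " + txnum);
        flushedLogSize = log.size();              // force the log through the commit record
    }
}

class CommitModelDemo {
    public static void main(String[] args) {
        for (boolean force : new boolean[] {true, false}) {
            CommitModel m = new CommitModel();
            for (int t = 1; t <= 10; t++) {
                m.setValue(t, "hot");
                m.commit(t, force);
            }
            System.out.println((force ? "undo-only" : "undo/redo") + ": " + m.pageWrites + " page writes");
        }
    }
}
```

Running `CommitModelDemo` prints 10 page writes for the undo-only case and 0 for the undo/redo case; in the latter, the hot page stays dirty and would be written once later, when it is replaced or checkpointed.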

Task 1 is to implement undo/redo logging, so that modified pages need not be written to disk before a transaction commits. Undo is already implemented; the remaining steps are:

Adding the new value to the log record (it currently records only the old value). The setInt/setString methods in RecoveryManager receive newval, which would need to be passed to SetIntRecord.writeToLog/SetStringRecord.writeToLog, along with updates to the log record structures to store it. Hint: Page here is just a wrapper around the array of bytes that is passed to LogManager; creating one does not create a database page.
Implementing a redo function that takes a log record and, if the transaction is in the completed transaction list, writes the new value in the appropriate place. This is almost identical to the undo function, except that it uses the new value.
Implementing an iterator that moves forward through the log for the redo pass (the current iterator moves back through the log until it reaches a checkpoint record.)
Turning off the forced write of pages when a transaction commits, so that you see the performance improvement from having an undo/redo log.
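The first three steps above can be sketched with a self-contained model. Everything here is an assumption for illustration: the record layout (`SetIntRec`, opcode 4, no file name), the `LogEntry` wrapper, and a `Map` standing in for the disk are hypothetical, whereas SimpleDB's real records are built through `Page` and appended with `LogManager`. The undo pass runs backward as SimpleDB's does, and a new redo pass then runs forward over the same records.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical undo/redo SETINT record carrying both values. The layout is an assumption
// and, unlike SimpleDB's record, omits the file name.
final class SetIntRec {
    static final int SETINT = 4;   // assumed opcode
    final int txnum, blknum, offset, oldval, newval;

    SetIntRec(int txnum, int blknum, int offset, int oldval, int newval) {
        this.txnum = txnum; this.blknum = blknum; this.offset = offset;
        this.oldval = oldval; this.newval = newval;
    }

    byte[] toBytes() {   // [opcode][txnum][blknum][offset][oldval][newval]
        return ByteBuffer.allocate(6 * Integer.BYTES).putInt(SETINT).putInt(txnum)
                .putInt(blknum).putInt(offset).putInt(oldval).putInt(newval).array();
    }

    static SetIntRec fromBytes(byte[] rec) {
        ByteBuffer bb = ByteBuffer.wrap(rec);
        if (bb.getInt() != SETINT) throw new IllegalArgumentException("not a SETINT record");
        // Java evaluates arguments left to right, matching the write order above.
        return new SetIntRec(bb.getInt(), bb.getInt(), bb.getInt(), bb.getInt(), bb.getInt());
    }
}

// Hypothetical log entry: START/COMMIT/ROLLBACK carry only a txnum; SETINT carries a payload.
final class LogEntry {
    final String kind;
    final int txnum;
    final byte[] payload;

    private LogEntry(String kind, int txnum, byte[] payload) {
        this.kind = kind; this.txnum = txnum; this.payload = payload;
    }

    static LogEntry control(String kind, int txnum) { return new LogEntry(kind, txnum, null); }

    static LogEntry setInt(int txnum, int blknum, int offset, int oldval, int newval) {
        return new LogEntry("SETINT", txnum, new SetIntRec(txnum, blknum, offset, oldval, newval).toBytes());
    }
}

final class UndoRedoRecovery {
    // `disk` maps "blknum:offset" to the int stored there.
    static void recover(List<LogEntry> log, Map<String, Integer> disk) {
        Set<Integer> committed = new HashSet<>();
        // Undo pass, newest to oldest: restore old values of every transaction that did not commit.
        // A COMMIT follows its transaction's updates, so it is seen before them on this pass.
        for (int i = log.size() - 1; i >= 0; i--) {
            LogEntry e = log.get(i);
            if (e.kind.equals("COMMIT")) {
                committed.add(e.txnum);
            } else if (e.kind.equals("SETINT") && !committed.contains(e.txnum)) {
                SetIntRec r = SetIntRec.fromBytes(e.payload);
                disk.put(r.blknum + ":" + r.offset, r.oldval);
            }
        }
        // Redo pass, oldest to newest: reapply new values of committed transactions.
        for (LogEntry e : log) {
            if (e.kind.equals("SETINT") && committed.contains(e.txnum)) {
                SetIntRec r = SetIntRec.fromBytes(e.payload);
                disk.put(r.blknum + ":" + r.offset, r.newval);
            }
        }
    }
}

class UndoRedoDemo {
    public static void main(String[] args) {
        List<LogEntry> log = new ArrayList<>();
        log.add(LogEntry.control("START", 1));
        log.add(LogEntry.setInt(1, 0, 0, 0, 5));
        log.add(LogEntry.control("START", 2));
        log.add(LogEntry.setInt(2, 0, 4, 0, 7));
        log.add(LogEntry.control("COMMIT", 1));
        // Crash state: T1's committed write never reached disk; T2's uncommitted write did.
        Map<String, Integer> disk = new HashMap<>(Map.of("0:0", 0, "0:4", 7));
        UndoRedoRecovery.recover(log, disk);
        System.out.println(disk.get("0:0") + " " + disk.get("0:4"));   // prints "5 0"
    }
}
```

This sketch redoes only committed transactions and undoes every other one, including rolled-back ones. That is a conservative choice: if rollbacks also stop forcing their pages, a rolled-back change could be on disk at the crash, and running the undo pass before the redo pass keeps a later committed write to the same slot from being overwritten.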

Individual Task 2: Fuzzy Checkpointing

In class, we discussed a non-quiescent checkpoint, where there is a start checkpoint log record that lists all transactions running at the time of the start checkpoint, and an end checkpoint log record once all pages modified at the start have been written out. Task 2 is to implement this capability. This will require:

Storing the list of active transactions with the checkpoint log record (probably the most difficult part of the task)
Causing all the modified buffers to be written to disk. This is also part of Task 3, and isn't that hard.
Creating a new end checkpoint log record, and writing it to the log.
Updating the recovery process so that it doesn't stop until it has first seen an end checkpoint and then a start checkpoint (the existing checkpoint record).
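A sketch of two of the new pieces, under assumed names and an assumed opcode (none of these are SimpleDB's): a start-checkpoint record whose body is a count followed by the active transaction numbers, and the backward-scan stopping rule from the last step. Stopping at that start checkpoint is enough for redo, because the end record guarantees that everything dirty at the start has been flushed; an unfinished transaction from the stored active list may still have older records to undo, which is what the list is for.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical start-checkpoint record listing the transactions active when it was written.
final class StartCheckpointRec {
    static final int START_CKPT = 6;   // assumed opcode
    final List<Integer> active;

    StartCheckpointRec(List<Integer> active) { this.active = List.copyOf(active); }

    // Layout: [opcode][count][txnum 1]...[txnum count]
    byte[] toBytes() {
        ByteBuffer bb = ByteBuffer.allocate((2 + active.size()) * Integer.BYTES);
        bb.putInt(START_CKPT).putInt(active.size());
        for (int tx : active) bb.putInt(tx);
        return bb.array();
    }

    static StartCheckpointRec fromBytes(byte[] rec) {
        ByteBuffer bb = ByteBuffer.wrap(rec);
        if (bb.getInt() != START_CKPT) throw new IllegalArgumentException("not a start checkpoint");
        int n = bb.getInt();
        List<Integer> active = new ArrayList<>(n);
        for (int i = 0; i < n; i++) active.add(bb.getInt());
        return new StartCheckpointRec(active);
    }
}

final class CheckpointScan {
    // Scanning backward, return the index of the first START_CKPT seen after an END_CKPT,
    // i.e. the start of the most recent completed checkpoint; 0 means scan the whole log.
    static int stopIndex(List<String> kinds) {
        boolean sawEnd = false;
        for (int i = kinds.size() - 1; i >= 0; i--) {
            String k = kinds.get(i);
            if (k.equals("END_CKPT")) {
                sawEnd = true;
            } else if (k.equals("START_CKPT") && sawEnd) {
                return i;
            }
            // A START_CKPT with no later END_CKPT (crash mid-checkpoint) is passed over.
        }
        return 0;
    }
}

class FuzzyCheckpointDemo {
    public static void main(String[] args) {
        byte[] rec = new StartCheckpointRec(List.of(3, 7, 9)).toBytes();
        System.out.println(StartCheckpointRec.fromBytes(rec).active);   // [3, 7, 9]
        List<String> log = List.of("START", "START_CKPT", "SETINT", "END_CKPT", "SETINT", "START_CKPT");
        System.out.println(CheckpointScan.stopIndex(log));              // 1
    }
}
```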

Individual Task 3: Forced Checkpoint

Undo recovery needs to go back to the previous checkpoint to undo any transactions that may still have been running. If the database stays up for a long time, this could be expensive. A better approach is to take checkpoints either periodically or on demand. Task 3 is to implement one of these. You can implement a timer that causes a checkpoint to occur (the buffer manager has an example of a timer: if a transaction waits too long for a buffer to become available, it times out). Alternatively, you can implement a new "checkpoint" command that causes a checkpoint to occur.
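One way to get the periodic option, sketched with the standard `ScheduledExecutorService` rather than the wait-and-time-out style the buffer manager uses. `PeriodicCheckpointer` is a hypothetical name, and the `Runnable` stands in for whatever actually flushes the modified buffers and writes the checkpoint record.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical periodic checkpointer. The Runnable stands in for the real work:
// flush every modified buffer, then append a checkpoint record and flush the log.
final class PeriodicCheckpointer implements AutoCloseable {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "checkpointer");
        t.setDaemon(true);                        // don't keep the JVM alive just for checkpoints
        return t;
    });

    PeriodicCheckpointer(Runnable checkpoint, long period, TimeUnit unit) {
        // Fixed delay: the next checkpoint starts `period` after the previous one finishes,
        // so two checkpoints never overlap. An uncaught exception would cancel the schedule,
        // so failures are reported and the timer keeps running.
        timer.scheduleWithFixedDelay(() -> {
            try {
                checkpoint.run();
            } catch (RuntimeException e) {
                e.printStackTrace();
            }
        }, period, period, unit);
    }

    @Override
    public void close() {
        timer.shutdownNow();                      // stop future checkpoints
    }
}

class PeriodicCheckpointDemo {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch three = new CountDownLatch(3);
        try (PeriodicCheckpointer cp = new PeriodicCheckpointer(() -> {
            System.out.println("checkpoint");
            three.countDown();
        }, 50, TimeUnit.MILLISECONDS)) {
            System.out.println(three.await(5, TimeUnit.SECONDS) ? "saw 3 checkpoints" : "timed out");
        }
    }
}
```

For the on-demand option, the same `Runnable` could instead be invoked by the code that handles a new "checkpoint" command.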

The current checkpoint is part of the recovery process: during recovery, SimpleDB undoes all in-progress transactions, flushes all buffers modified by the undo, and then writes a checkpoint record. You'll need to find a different way to flush all modified buffers. The page and buffer managers currently have code to flush a modified page when it is replaced, which you can use as an example. You could either keep a list of all modified buffers or go through all buffers and check whether each one is modified; either is an in-memory operation, so it should be fast.
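A minimal sketch of the second option (scan every buffer), using hypothetical `Buf` and `Pool` classes that only loosely mirror how a SimpleDB buffer remembers its modifying transaction and the LSN of its latest log record. The one rule worth keeping from the replacement code is write-ahead logging: the log must be flushed through a buffer's LSN before that buffer's page is written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical buffer, loosely modeled on how a SimpleDB buffer remembers the
// modifying transaction and the LSN of its latest log record.
final class Buf {
    final int blknum;
    int modifiedBy = -1;   // txnum of the last modifier, or -1 if the buffer is clean
    int lsn = -1;          // LSN of the log record for the latest modification

    Buf(int blknum) { this.blknum = blknum; }

    void setModified(int txnum, int lsn) { this.modifiedBy = txnum; this.lsn = lsn; }
}

final class Pool {
    final List<Buf> buffers = new ArrayList<>();
    int flushedLsn = -1;                                 // log is on disk through this LSN
    final List<Integer> diskWrites = new ArrayList<>();  // blocks written, in order

    void flushLog(int lsn) { flushedLsn = Math.max(flushedLsn, lsn); }

    // Write every modified buffer, whichever transaction modified it. This is a
    // linear scan over an in-memory list, cheap compared with the disk writes.
    synchronized void flushAllModified() {
        for (Buf b : buffers) {
            if (b.modifiedBy >= 0) {
                flushLog(b.lsn);           // write-ahead rule: the log record goes out before the page
                diskWrites.add(b.blknum);  // stands in for writing the page's contents to disk
                b.modifiedBy = -1;         // the buffer is clean again
            }
        }
    }
}

class FlushAllDemo {
    public static void main(String[] args) {
        Pool pool = new Pool();
        for (int i = 0; i < 4; i++) pool.buffers.add(new Buf(i));
        pool.buffers.get(1).setModified(7, 10);
        pool.buffers.get(3).setModified(8, 12);
        pool.flushAllModified();
        System.out.println(pool.diskWrites + " log flushed through LSN " + pool.flushedLsn);
    }
}
```

Running `FlushAllDemo` prints `[1, 3] log flushed through LSN 12`: only the two modified buffers are written, and a second call writes nothing.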

Perhaps the hardest part of this task is that a checkpoint is only safe if no transactions are running, unless Task 2 (Fuzzy Checkpoint) is also done. You can get nearly full credit if you simply assume that no transactions are running (in other words, it is acceptable if your code silently results in a corrupt database when other transactions are running and the database crashes and recovers), provided you note in your report that this could occur with your code. For full credit, you should handle this possibility, either by using a fuzzy checkpoint or by waiting for other transactions to finish before doing the checkpoint.
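The "wait for other transactions to finish" option can be sketched as a small gate that transactions pass through when they start and finish. `CheckpointGate` and its methods are hypothetical; in SimpleDB the calls would need to be wired into wherever transactions begin and commit or roll back. The gate stops admitting new transactions while a checkpoint waits, so a steady stream of new work cannot postpone the checkpoint indefinitely.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical gate between transactions and a quiescent checkpoint. Transactions call
// begin()/end(); checkpoint() stops admitting new transactions, waits for the running
// ones to finish, then runs the checkpoint action. It must not be called from inside
// an active transaction, or it would wait for itself forever.
final class CheckpointGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private int active = 0;
    private boolean checkpointing = false;

    void begin() throws InterruptedException {
        lock.lock();
        try {
            while (checkpointing) changed.await();   // new transactions wait out the checkpoint
            active++;
        } finally {
            lock.unlock();
        }
    }

    void end() {
        lock.lock();
        try {
            active--;
            changed.signalAll();                     // a waiting checkpoint may now proceed
        } finally {
            lock.unlock();
        }
    }

    void checkpoint(Runnable action) throws InterruptedException {
        lock.lock();
        try {
            while (checkpointing) changed.await();   // one checkpoint at a time
            checkpointing = true;                    // stop admitting new transactions
            try {
                while (active > 0) changed.await();  // wait for running transactions to finish
                action.run();                        // e.g. flush modified buffers, log a checkpoint
            } finally {
                checkpointing = false;
                changed.signalAll();                 // release transactions blocked in begin()
            }
        } finally {
            lock.unlock();
        }
    }
}

class CheckpointGateDemo {
    public static void main(String[] args) throws InterruptedException {
        CheckpointGate gate = new CheckpointGate();
        AtomicBoolean ran = new AtomicBoolean(false);
        gate.begin();                                // a transaction is running
        Thread cp = new Thread(() -> {
            try {
                gate.checkpoint(() -> ran.set(true));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        cp.start();
        Thread.sleep(100);
        System.out.println("ran while transaction active: " + ran.get());   // false
        gate.end();                                  // the transaction finishes
        cp.join();
        System.out.println("ran after it finished: " + ran.get());          // true
    }
}
```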

Individual Report Contents:

Instructions for running your code (should also be in README.txt file in the code)
An overview of how your approach works and changes made to the code. Include any limitations (e.g., if you did Task 3, you may note that the database could recover to an incorrect state if another transaction was running during the checkpoint.)
How you tested for correctness, with short test results
How you tested to see if the desired improvements actually materialized (e.g., if you have fewer page writes or faster commits with undo/redo logging.)

Team Portion: Integration

The team portion is simply to put your pieces together and make them run together. This may involve turning off some features, such as waiting for other transactions to finish before checkpointing if you did Task 3. You may also find that multiple tasks change the same modules, so integration will be easiest if you communicate well from the beginning.

Your team report should include:

An overview of how you chose which algorithms to use, and a discussion of changes made to the code (relative to what you turned in for your individual tasks).
How you have tested for correctness (and short test results). You may find this requires test cases beyond what you've done for the individual portions, but if you've thought about this in advance, you may find you've already created sufficient tests.
How you have tested for performance improvements (and short test results).

We have enabled a team submission feature in Gradescope, but how this works doesn't show up in the instructor view. Your report should include the names and CAREER ID (email address, not the PUID number) of all teammates, and which one of you is turning in the code and full report. If the team feature seems to work (e.g., when you submit in Gradescope, you can name multiple people as working on the project), then just turn in once as a group. If you don't see this option, then only one person should turn in the full report; the others should just list the names and CAREER IDs of the team, and who is turning in the full report.

The team portion is due four days after the last individual portion is submitted. If one of your team members is late on their individual portion (and uses late days or is penalized for late work), there will be no late penalty on the team portion (and no late days used for it) until more than four days after the last individual portion is submitted.

The team code should be turned in by one team member on the lab machines using turnin -c cs448 -p team3 <submission folder>, and the report through Gradescope using the team submission feature.