CS 44800: Introduction to Relational Database Management Systems

Assignment 4: Query Processing, Concurrency Control

Due 11:59pmEDT Monday, 15 November 2021

Update 11/12 12:35 - 5.1: Clarified meaning of equivalent.

Update 11/17 10:30 - 5.2: Clarified that queries should be plural.

Please turn in a PDF through Gradescope. Make sure that you mark the start/end of each question in Gradescope. Assignments will be graded based on what you mark as the start/end of each question. Please typeset your answers where feasible, handwritten answers will be acceptable for drawings (e.g., trees).

Note: Exercises with book numbers are adapted from Silberschatz, Korth & Sudarshan; while you are encouraged to look there for further information (and may need to for some details) please answer the question as asked below (they may not be exactly the same.)

1. Query Transformation

(Based on SKS 15.2): Given the bank database of SKS Figure 15.14, and the query:
select T. branch_name from branch T, branch S where T.assets > S.assets and S.branch_city = "Brooklyn"

Give an efficient relational algebra expression for this query. Explain why it should be efficient.
Explain the processing steps you would use for your expression. E.g., are operations pipelined or blocking, algorithms used, etc.
How much memory (buffer space) would you require, and how many I/Os? (Note that you may not be able to give an exact number, but describe in terms of, for example, the size of the tables or depth of indexes.)
Describe a situation where your expression may not be the most efficient, or justify that it would be impossible for some other expression to be more efficient.

2. Cost Estimation

SKS 16.6

3. Query Transformation

(Based on SKS 16.17(b)): Show that σ_θ₁&θ₂(E₁ ⋈_θ₃ E₂) = σ_θ₁(E₁ ⋈_θ₃ (σ_θ₂(E₂)), where θ₂ uses only attributes from E₂. How could use use this fact to improve query performance?

4. Cost Estimation Techniques

(Based on SKS 16.20)

Explain how to use a histogram to estimate the size of a selection of the form σ_A≤v(R).
Describe how, or give an example where, the histogram approach could give a better estimate than using min/max and v.

Note: Questions above this point should be fairly quick, and material that you have a reasonable handle on from the project. They cover material that will be on the second midterm. The material below will not be on the second midterm.

5. Serializable Schedules

Give an example of a serializable schedule of two or more transactions that you would expect to be faster than the conflict-equivalent (or view equivalent) serial schedule.
Give an example of SQL queries that you think could lead to the situation you describe, where running them concurrently is faster than running one then the other.

6. Two-phase locking

Assume R₁(A) means Transaction 1 reads item A, W₂(B) means Transaction 2 Writes B, ...

Given the schedule: R₁(A) R₂(A) R₁(B) W₁(B) R₂(C) R₁(C) W₂(A) W₂(C) W₁(B) R₂(A)

Is the schedule serializable? Explain.
Show an order of locking and unlocking (with just a single lock type) that satisfies two-phase locking. Is the schedule able to complete, or is it impossible for the reads/writes to occur as given with two-phase locking?
Show a schedule with reads/write rearranged assuming you were using 2-phase locking. Does the schedule complete, or does it deadlock?
Does your schedule from the previous part satisfy strict 2-phase locking?
Repeat parts 2-4 using shared and exclusive locks. Does this improve anything?

7. Regular vs. Strict Two-Phase Locking

What is the advantage of strict two-phase locking over basic two-phase locking?
Give an example schedule where regular two-phase locking has a problem that would be prevented by strict two-phase locking.
Give an example schedule where regular two-phase locking could give better/faster results than strict two-phase locking.

8. Deadlock Avoidance

Timestamp based deadlock avoidance schemes, such as wait-die and wound-wait, prevent deadlines from occuring.

Prove that a deadlock cannot occur using wound-wait.
Give an example schedule where wait-die would cause a transaction to abort/rollback and have to restart, but a deadlock would not occur.
The wound-wait and wait-die schemes lead to unnecessary restarts, but deadlock detection is complicated and costly, and lock timeout lead to the possibility of starvation. Can you develop a mechanism that would prevent the unnecessary rollback you gave in part 2, but would not face starvation? Hint: start with the simplest to implement mechanisms discussed.