CS44800: Introduction to Database Systems, Fall 2019

Home Syllabus Schedule Homework, Projects, & Handouts Readings

Weekly Readings

Please go over the concepts in chapters 1-3.
Read about various types of, data model, schema/subschema, data independence, Fig. 1.1, Fig 2.3, Fig 3.1.
Read about benefits of using a DBMS.
Read appendix B on disk parameters.

Main ideas that we have discussed in first two weeks of class.

  1. Need to think about user, database, schema/catalog, DBMS software/utilities.
  2. Need to think about implementing query processing, recovery, privacy, concurrency sotware.
  3. Parameters for performance evaluation and how to implement ideas that can improve performance (query optimization, I/O optimization, reducing I/Os by getting more data in one I/O operation for use later).
  4. Relationship among data, implementing hierarchical and network model that query/update one record at a time vs relation model that deal with set of records. Type of queries for different models. Hierarchical model needs, commands such get next, get next within parent. Network model goes for one entity to another entity via links and uses the command find next. I introduced these briefly, but more in "other materials" after slides for Chapter 29.
We move on to discuss Chapter 16 briefly, so we know how to think of implementing I/O.

B Tree, B+ Tree, I/O time, insertion in B-Tree, related to Chapter 17.
Please, read: https://en.wikipedia.org/wiki/B-tree

  1. Overview.
    See tree below.

  2. A B-tree (Bayer & McCreight 1972) of order 5 (Knuth 1998).
    Image by Wikipedia
  3. Difference between B Tree and B+ Tree.
    See https://en.wikipedia.org/wiki/B%2B_tree.
    A B+ tree can be viewed as a B-tree in which each node contains only keys (not key-value pairs), and to which an additional level is added at the bottom with linked leaves.
    See tree below.

  4. A simple B+ tree example linking the keys 1-7 to data values d1-d7.
    The linked list (red) allows rapid in-order traversal. This particular tree's branching factor is b=4.
    Image by Wikipedia
  5. Time to search a sorted file.
    Large databases have historically been kept on disk drives. The time to read a record on a disk drive far exceeds the time needed to compare keys once the record is available. The time to read a record from a disk drive involves a seek time and a rotational delay. The seek time may be 0 to 20 or more milliseconds, and the rotational delay averages about half the rotation period. For a 7200 RPM drive, the rotation period is 8.33 milliseconds. For a drive such as the Seagate ST3500320NS, the track-to-track seek time is 0.8 milliseconds and the average reading seek time is 8.5 milliseconds. For simplicity, assume reading from disk takes about 10 milliseconds.
    Naively, then, the time to locate one record out of a million would take 20 disk reads times 10 milliseconds per disk read, which is 0.2 seconds.
  6. An index speeds the search.
  7. See an example of Insertion and the tree below.

  8. A B Tree insertion example with each iteration.
    The nodes of this B tree have at most 3 children (Knuth order 3).
    Image by Wikipedia
If students get a chance, please go over the example 1 on page 605, example 2 on page 606, example 3 on page 609, example 4 on page 614.

Useful readings from Chapter 16:

Useful readings from Chapter 17:

  1. Please read about constraints in relational model.
  2. Please think about how to store constraints(in schema or in another relation). Also, think about how to store schema and subschema (again in a relational model).
  3. How to verify constraints (after each action or after a set of actions)? What overheads in terms of disk block accesses are caused? How to minimize the overhead?
  4. In chapter 19, please see how index structures are used when sorting a large file.

In chapter 5, please look up the following in addition to the introduction to Relational Model:

In chapter 6, please read about:

Read about Relational Algebra operations such as:

Next, I want to focus of Functional dependencies and normal forms in chapter 14.

All questions in midterm and final are based on topics covered and emphasized in class lectures. If you go over the readings under course web page, you will see the topics that I have covered and you should read these in the book.

Go over the terms in bold in various chapters.
Go over the Review Questions after each chapter and see if you can answer them.

Please answer questions in bullet form and be direct. Each question will be followed by 1/4 page of space to write your answer.
Midterm will have some multiple choice questions and final will have many more and may be all of them.
I tried to ask few questions based on projects ( supplied by TAs).

I will cover Relational Algebra, chapter 8, on Tuesday. I plan to ask some questions on this topic in midterm

Chapter 14 slides and corresponding pages in Chapter 14, pages 459-468 and pages 471-474.

I will go over these sections of book in class on 10th October. I started to talk about this material on 3rd October.

NOTE: Normal Form (First, second, third, BCNF, etc) will be covered in final exam. (NOT MIDTERM)

Chapter 14

Chapter 15

Chapter 20

Chapter 21

Chapter 20

Chapter 21

Chapter 22

Read about 2PL that guarantees serializability (pdf).

NOTE: This is also a handout under the Chapter 21: Concurrency Control Techniques.

Further Readings and Preparing for Final Examination:

In the Final Exam in addition to TRUE/FALSE, there will Multiple choice questions on: