CS 542: Distributed Database Systems

Monday, Wednesday, and Friday from 10:30-11:20
LWSN 1106

Chris Clifton

Email: clifton_nospam@cs_nojunk.purdue.edu
Office hours: By appointment (or just drop by LWSN 2142F, I'm generally in 8:30-5)

Fundamental issues in distributed database systems that are motivated by computer networking and the distribution of processors and databases. The theory, design, specification, implementation, and performance of distributed database systems.

Teaching Assistant

None.

Course Methodology

The course will be taught through lectures, supplemented with reading. The primary reading will be from the text, with supplementary material from current research literature where appropriate. The written assignments and projects are also a significant component of the learning experience.

For now, Professor Clifton will not have regular office hours. Feel free to drop by anytime, or send email with some suggested times to schedule an appointment.

You can also send things to the course email list (if traffic goes beyond 1-2 messages per week, we'll start a newsgroup instead).

Prerequisites

The official requirement is a bit confused due to the transition to Banner, but you should have some level of database background, such as CS 448 (Introduction to Relational Database Systems), CS 541 (Database Systems), or the equivalent. Students who have not had a prior database course, but feel they have equivalent experience gained elsewhere, please see the instructor.

Evaluation/Grading

Evaluation will be a subjective process (see my grading standards); however, it will be based primarily on your understanding of the material as evidenced in:

Exams will be open note / open book. To avoid a disparity between resources available to different students, electronic aids are not permitted.

Projects and assignments will be evaluated on a ten point scale:

10: Exceptional work. So good that it makes up for substandard work elsewhere in the course. These will be rare.
8: What I'd expect of a Ph.D. candidate. This corresponds to an A grade.
6: Good enough for a Master's degree, but not what I'd like to see for a Ph.D. candidate. This corresponds to a B grade.
4: Okay for a Master's candidate who does extremely well in other courses. This corresponds to a C grade.
2: Not good enough for a graduate student. But something.
0: Missing work, or so bad that you needn't have bothered.

Late work will be penalized 10% per day (24-hour period). This penalty will apply except in case of a documented emergency (e.g., a medical emergency), or by prior arrangement if doing the work in advance is impossible due to the fault of the instructor (e.g., you are going to a conference and ask to start the project early, but I don't have it ready yet).

Blackboard will be used to record/distribute grades and turn in assignments.

Projects

Stay tuned...

Qualifier Requirements

The qualifying exam will consist of an hour-long supplement given at the end of the course. Passing the qualifier will require suitable performance both in the course and on the qualifying exam. All computer science students who have not passed a relevant fourth qualifier are encouraged to take the exam, even those who do not currently plan to pursue a Ph.D.

Policy on Intellectual Honesty

Please read the departmental academic integrity policy above. This will be followed unless I provide written documentation of exceptions. In particular, I encourage interaction: you should feel free to discuss the course with other students. However, unless otherwise noted, work turned in should reflect your own efforts and knowledge.

For example, if you are discussing an assignment with another student, and you feel you know the material better than the other student, think of yourself as a teacher. Your goal is to make sure that after your discussion, the student is capable of doing similar work independently; their turned-in assignment should reflect this capability. If you need to work through details, try to work on a related, but different, problem.

If you feel you may have overstepped these bounds, or are not sure, please come talk to me and/or note on what you turn in that it represents collaborative effort (the same holds for information obtained from other sources that provided substantial portions of the solution). If I feel you have gone beyond acceptable limits, I will let you know, and if necessary we will find an alternative way of ensuring you know the material. Help you receive in such a borderline case, if cited and not part of a pattern of egregious behavior, is not in my opinion academic dishonesty, and will at most result in a requirement that you demonstrate your knowledge in some alternate manner.

Text

Principles of Distributed Database Systems, by M. Tamer Ozsu and Patrick Valduriez. Prentice Hall, 1999, ISBN 0-13-659707-6

Syllabus (numbers correspond roughly to weeks):

The schedule is still in progress; however, it will be similar to the previous offering.

  1. Course Introduction, Relational Database Review, Concurrency Control Review
    Reading: Ozsu Chapters 1, 2 (skip 2.4.2), skim 3 and 4.1, read 10.1, 10.2, and 11.1. This seems like a lot, but most of it SHOULD be review.
  2. January 19: No Class (Martin Luther King, Jr. Day)
    Concurrency Control: Serializability Theory, Two-Phase Locking. Reading: 11.2, 11.3.1.
    Assignment 1 (due 30 January).
  3. Distributed Locking, deadlock detection. Reading: 11.3.3, 11.6.
    Non-Locking Schedulers. Reading: 11.4.
  4. Recovery Algorithms. Reading: 12.1-12.4.
    Project Part 1 (due 17 February).
  5. Distributed Recovery. Reading: 12.5-12.10.
    Optional reading (I'd recommend at least one version to get used to the difference between textbook treatment and a research paper):
    Skeen, Dale, A Formal Model of Crash Recovery in a Distributed System, IEEE Transactions on Software Engineering 9(3), May 1983, pp.219-228. (Preliminary version from SIGMOD'81.)
    February 13: Guest Lecture, Prof. Bharat Bhargava: Communications for Transaction Processing. Reading: Skim Chapter 3.
  6. Assignment 2 (due 23 February).
    Replicated Data. Reading: 12.7.3-12.7.5.
    February 20: Guest Lecture, Prof. Sunil Prabhakar.
  7. Distributed Database Design. Reading: 5-5.3
    Project Part 2 (due 13 March).
    February 25: Guest Lecture, Prof. Ahmed Elmagarmid: Data Integration. Further reading on data integration.
  8. Distributed Database Design. Reading: 5.4-5.6.
    March 5: Midterm Review
  9. Introduction to Query Processing. Reading: Chapter 7.
    The midterm will be March 11, in class.
    March 16-20: Spring Break
  10. March 23: Drop Date
    Query Processing. Reading: Chapters 8, 9.1.
    Assignment 3 (due 3 April).
  11. Query Processing. Reading: Chapter 9.2-9.4.
    Hash Join (pages 23-27)
    Project 3 begins April 3, due April 23 18:00 (demos to be scheduled that day.)
  12. Parallel Databases. You may also find Section 15.1 of the book to be useful.
    April 6: Guest Lecture, Prof. Walid Aref.
    Distributed In-Memory DB: Microsoft's Project Velocity. Slides from Murali Krishnaprasad, Slides from Anil Nori at the 2008 Microsoft Professional Development Conference.
  13. Alon Halevy, Dataspaces: A New Abstraction for Data Management, Keynote talk at the International Conference on Database Systems for Advanced Applications, Singapore, April 13, 2006.
  14. Anastassia Ailamaki, Database Architectures for New Hardware, invited tutorial at the 30th International Conference on Very Large Data Bases, Toronto, Canada, August 2004.
    Arnon Rosenthal and Marianne Winslett, Security of Shared Data in Large Systems: State of the Art and Research Directions, tutorial at the 30th International Conference on Very Large Data Bases, Toronto, Canada, August 2004.
    Assignment 4 (due 1 May).
  15. April 27: Project 3 presentations.
    April 29: Guest Lecture, Prof. Ahmed Elmagarmid: Data Integration continued. Further reading on data integration.
    Review

Final exam Monday, 4 May, 2009: 10:20-12:20, LWSN 1106.

Qualifying Exam, Wednesday, 6 May, 2009: 11:00-12:00, LWSN 1106.

