CS 541: Database Systems

Tuesday and Thursday from 9:00-10:15
LWSN 1106

Chris Clifton

Email: clifton_nospam@cs_nojunk.purdue.edu
Office hours: By appointment (or just drop by LWSN 2142F, I'm generally in 8:30-5)

Fundamentals for the logical design of database systems. The entity-relationship model, semantic model, relational model, hierarchical model, network model. Implementations of the models. Design theory for relational databases. Design of query languages and the use of semantics for query optimization. Design and verification of integrity assertions, and security. Introduction to intelligent query processing and database machines.

Teaching Assistant

None. Please bear with me, I'm a bit swamped through the end of October.

Course Methodology

The course will be taught through lectures, supplemented with reading. The primary reading will be from the text, with supplementary material from current research literature where appropriate. The written assignments and projects are also a significant component of the learning experience.

For now, Professor Clifton will not have regular office hours. Feel free to drop by anytime, or send email with some suggested times to schedule an appointment.

You can also send things to the course email list (if traffic goes beyond 1-2/week, we'll start a newsgroup instead.)

Prerequisites

The official requirement is CS 448 (Introduction to Relational Database Systems). Students who have not had an undergraduate database course, but feel they have equivalent experience gained elsewhere, please see the instructor. In particular, you should have a reasonable understanding of data structures (e.g., weeks 5, 6, 8, 9, and 10 of CS 251), and a user's knowledge of databases (i.e., you can write queries in SQL or some other query language, understand the ideas if not all the theory behind designing tables/database, differences between logical and physical database design - e.g., CS 348.) If you have this background, but not the official prerequisite, it is likely you will be able to manage the course with additional self-study, but your workload may be higher.

Evaluation/Grading

Evaluation will be a subjective process (see my grading standards), however it will be based primarily on your understanding of the material as evidenced in:

Exams will be open note / open book. To avoid a disparity between resources available to different students, electronic aids are not permitted.

Projects and assignments will be evaluated on a ten point scale:

10
Exceptional work. So good that it makes up for substandard work elsewhere in the course. These will be rare.
8
What I'd expect of a Ph.D. candidate. This corresponds to an A grade.
6
Good enough for a Master's degree, but not what I'd like to see for a Ph.D. candidate. This corresponds to a B grade.
4
Okay for a Master's candidate who does extremely well in other courses. This corresponds to a C grade.
2
Not good enough for a graduate student. But something.
0
Missing work, or so bad that you needn't have bothered.

Late work will be penalized 1 point per day (24 hour period). This penalty will apply except in case of documented emergency (e.g., medical emergency), or by prior arrangement if doing the work in advance is impossible due to fault of the instructor (e.g., you are going to a conference and ask to start the project early, but I don't have it ready yet.)

WebCT will be used to record/distribute grades (and, if I figure it out, for turning in assignments.)

Projects

The following is taken from the way things were done last spring. There may be changes, but this should give you a general idea.

The projects are an important part of the course, and will involve a significant amount of C++ programming. The first projects will be SQL programming assignments. The remaining projects will be performed in teams of two. The purpose is for each team to build parts of a working single-user relational database management system. You will start almost from scratch - a few basic components may be provided to you. By the end of the course, you will have built a simple DBMS by completing four separate assignments.

In most of the assignments, you will be given C++ class definitions with function templates. You will need to actually implement the functions. Implementing the various interfaces involves several hundred lines of code. At the end of the project, you will understand the basic relational DBMS concepts because you have implemented them. Each of your assignments builds upon the code written in the previous assignments.

At the end of every assignment, you have the option of throwing away your code and instead using code supplied by the TA.

Qualifier Requirements

The qualifying exam will consist of an hour-long supplement given at the end of the course. Passing the qualifier will require both suitable performance in the course and on the qualifying exam. All computer science students are encouraged to take the exam, even if you do not currently plan to pursue a Ph.D.

Policy on Intellectual Honesty

Please read the departmental academic integrity policy above. This will be followed unless I provide written documentation of exceptions. In particular, I encourage interaction: you should feel free to discuss the course with other students. However, unless otherwise noted work turned in should reflect your own efforts and knowledge.

For example, if you are discussing an assignment with another student, and you feel you know the material better than the other student, think of yourself as a teacher. Your goal is to make sure that after your discussion, the student is capable of doing similar work independently; their turned-in assignment should reflect this capability. If you need to work through details, try to work on a related, but different, problem.

If you feel you may have overstepped these bounds, or are not sure, please come talk to me and/or note on what you turn in that it represents collaborative effort (the same holds for information obtained from other sources that you provided substantial portions of the solution.) If I feel you have gone beyond acceptable limits, I will let you know, and if necessary we will find an alternative way of ensuring you know the material. Help you receive in such a borderline case, if cited and not part of a pattern of egregious behavior, is not in my opinion academic dishonesty, and will at most result in a requirement that you demonstrate your knowledge in some alternate manner.

Text

Database Management Systems, by Raghu Ramakrishnan and J. Gehrke. McGraw Hill, 2003, ISBN 0-07-246563-8.

Syllabus (numbers correspond to roughly to week):

This schedule is currently from a previous semester - dates and chapters will change. Depending on the background of the class, I may compress or expand the first four weeks. Numbers in parentheses correspond to chapters in the text. Some topics will be supplemented with readings from current research literature, this depends on how quickly we can move through the first four items on the list.

  1. Course Introduction
  2. Data Modeling (PDF) Assignment 1: Entity-Relationship Data Model (Starts 30 August, due 11 September).
  3. Using a Relational Database (PDF)
    September 4: Guest lecture by Prof. Elisa Bertino: SQL
    September 6: Guest lecture by Dr. Dan Lin (Slides)
    Project 1: SQL (Starts 13 September, due 25 September).
  4. Relational Theory: Keys, Dependencies, and Normalization (PDF) (3.4-3.5, 19)
  5. Relational Theory: Relational Algebra and Calculus (PDF) (4)
    Assignment 2: Relational Theory (Starts 25 September, due 2 October).
  6. Storage mechanisms, (PDF), Disk Layout (9)
    September 27: Guest Lecture by Prof. Walid Aref: Indexing (8)
  7. Indexing (10)
    Hashing / Bitmap Indexes (11) (another treatment)
    Project 2 starts 10/4, due 10/30
  8. October 9: No classes (fall break)
    October 11: Prof. Luo Si: Guest Lecture on research issues at the intersection of database and Information Retrieval
  9. Query Processing (another treatment) (12)
  10. Midterm October 23, in class. Covers items 1-7 above (PDF). Old Solutions.
  11. Query Optimization (PDF) (14,15) October 30: Guest Lecture by Prof. Ahmed Elmagarmid
    Project 2 ends 10/30.
    Assignment 3 starts 11/5, due 11/13
  12. Concurrency Control (16,17)
    Project 3 starts
  13. Handling Failure (18)
    More on Transactions (17)
  14. Practical SQL: Procedures, Authorizations, Transactions (see also Chapters 6, 21)
    Project 4 due 12/4.
    Assignment 4 starts 11/27, due 12/6
    Database Privacy/Security Case Study: Hippocratic Databases
  15. Data Streaming (PDF)
    Review

Final exam 12 December, 13:00-15:00, TBD. (old Exam, solutions).

Qualifying Exam TBD in TBD.


Valid XHTML 1.1