CS 54100: Database Systems

MWF 09:30-10:20

KNOY B031

Chris Clifton

Email: clifton_nospam@cs_nojunk.purdue.edu

Course Outline

Course Topics

Fundamentals for the logical design of database systems. The entity-relationship model, semantic model, relational model, hierarchical model, network model. Implementations of the models. Design theory for relational databases. Design of query languages and the use of semantics for query optimization. Design and verification of integrity assertions, and security. Introduction to intelligent query processing and database machines.

Teaching Assistant

Duong Nguyen
Office: HAAS G072
Office hours: MR 13:30-15:00.
Office hours to be held at: LWSN B116
Phone:
Email: nduong@purdue.edu

Mailing List

There will be a course email list used for high-priority announcements. This will use your @purdue.edu email address; make sure this is forwarded to someplace you look on a regular basis.

We will be using Blackboard for turning in assignments, recording and distributing grades.

Course Methodology

The course will be taught through lectures, supplemented with reading. The primary reading will be from the text, with supplementary material from current research literature where appropriate. The written assignments and projects are also a significant component of the learning experience.

For review (and if you miss a lecture), you can pick them up as a vodcast/podcast. Be warned that the audio isn't great; you only see what is on the screen, not what is written on the chalkboard; and you can't ask (or answer) questions; so it isn't really a viable alternative to attending lecture.

You can also use Mixable (a bulletin-board style discussion tool limited to the class). Critical announcements (only things that are of importance to all students) may be sent to the course email list.

Prerequisites

The official requirement is CS 448 (Introduction to Relational Database Systems). Students who have not had an undergraduate database course, but feel they have equivalent experience gained elsewhere, please see the instructor. In particular, you should have a reasonable understanding of data structures (e.g., weeks 5, 6, 8, 9, and 10 of CS 251), and a user's knowledge of databases (i.e., you can write queries in SQL or some other query language, understand the ideas if not all the theory behind designing tables/database, differences between logical and physical database design - e.g., CS 348.) If you have this background, but not the official prerequisite, it is likely you will be able to manage the course with additional self-study, but your workload may be higher.

Evaluation/Grading

Evaluation will be a subjective process (see my grading standards), however it will be based primarily on your understanding of the material as evidenced in:

Exams will be open note / open book. To avoid a disparity between resources available to different students, electronic aids are not permitted.

Projects and assignments will be evaluated on a ten point scale:

10
Exceptional work. So good that it makes up for substandard work elsewhere in the course. These will be rare.
8
What I'd expect of a Ph.D. candidate. This corresponds to an A grade.
6
Good enough for a Master's degree, but not what I'd like to see for a Ph.D. candidate. This corresponds to a B grade.
4
Okay for a Master's candidate who does extremely well in other courses. This corresponds to a C grade.
2
Not good enough for a graduate student. But something.
0
Missing work, or so bad that you needn't have bothered.

Late work will be penalized 1 point per day (24 hour period). This penalty will apply except in case of documented emergency (e.g., medical emergency), or by prior arrangement if doing the work in advance is impossible due to fault of the instructor (e.g., you are going to a conference and ask to start the project early, but I don't have it ready yet.)

Blackboard will be used to record/distribute grades (and, in some cases, for turning in assignments.)

Projects

The following is taken from the way things were done last spring. There may be changes, but this should give you a general idea.

The projects are an important part of the course, and will involve a significant amount of C++ programming. The first projects will be SQL programming assignments. The remaining projects will be performed in teams of two. The purpose is for each team to build parts of a working single-user relational database management system. You will start almost from scratch - a few basic components may be provided to you. By the end of the course, you will have built a simple DBMS by completing four separate assignments.

In most of the assignments, you will be given C++ class definitions with function templates. You will need to actually implement the functions. Implementing the various interfaces involves several hundred lines of code. At the end of the project, you will understand the basic relational DBMS concepts because you have implemented them. Each of your assignments builds upon the code written in the previous assignments.

At the end of every assignment, you have the option of throwing away your code and instead using code supplied by the TA.

Qualifier Requirements

The qualifying exam will consist of an hour-long supplement given at the end of the course. Passing the qualifier will require both suitable performance in the course and on the qualifying exam. All computer science students are encouraged to take the exam, even if you do not currently plan to pursue a Ph.D.

Policy on Intellectual Honesty

Please read the departmental academic integrity policy above. This will be followed unless I provide written documentation of exceptions. In particular, I encourage interaction: you should feel free to discuss the course with other students. However, unless otherwise noted work turned in should reflect your own efforts and knowledge.

For example, if you are discussing an assignment with another student, and you feel you know the material better than the other student, think of yourself as a teacher. Your goal is to make sure that after your discussion, the student is capable of doing similar work independently; their turned-in assignment should reflect this capability. If you need to work through details, try to work on a related, but different, problem.

If you feel you may have overstepped these bounds, or are not sure, please come talk to me and/or note on what you turn in that it represents collaborative effort (the same holds for information obtained from other sources that you provided substantial portions of the solution.) If I feel you have gone beyond acceptable limits, I will let you know, and if necessary we will find an alternative way of ensuring you know the material. Help you receive in such a borderline case, if cited and not part of a pattern of egregious behavior, is not in my opinion academic dishonesty, and will at most result in a requirement that you demonstrate your knowledge in some alternate manner.

Text

Database Management Systems, by Raghu Ramakrishnan and J. Gehrke. McGraw Hill, 2003, ISBN 0-07-246563-8.

Syllabus (numbers correspond to roughly to week):

This schedule is currently from a previous semester - dates and chapters will change. Depending on the background of the class, I may compress or expand the first four weeks. Numbers in parentheses correspond to chapters in the text. Some topics will be supplemented with readings from current research literature, this depends on how quickly we can move through the first four items on the list.

  1. Course Introduction
  2. Data Modeling Assignment 1 (Starts 20 January, due 27 January). (Solutions.)
  3. Relational Theory: Keys, Dependencies, and Normalization (3.4-3.5, 19)
  4. Using a Relational Database
    Project 1: SQL. Part 1 due February 6, Part 2 due February 13.
    Note: 3/4 of the class has a reasonable knowledge of SQL. For the rest of you, reading Chapters 6.1-6.6 and doing Project 1 will bring you up to speed.
    February 3: Guest Lecture by Prof. Walid Aref: Relational Algebra
  5. Relational Theory: Relational Algebra and Calculus (4)
    DBMS Internals Overview
    Assignment 2: Relational Theory (Starts 13 February, due 17 February). (Solutions.)
  6. Storage Mechanisms: Project 2 starts 17 February, due 2 March
  7. Indexing (10)
  8. Hashing / Bitmap Indexes
    Midterm preparation non-assignment.
  9. Review for midterm.
    March 7: Midterm, in class. (Solutions.)
    March 9: Guest Lecture by Prof. Elisa Bertino: Database Security / Access Control
  10. Assignment 3 starts, due 26 March. (Solutions.)
    Query Processing (another treatment) (12)
  11. Query Optimization (14,15)
    Project 3 starts, due 6 April.
    Optional reading: GRACE hash join, Hybrid hash join.
  12. Concurrency Control (16,17)
    Assignment 4 starts, due 11 April. (solutions)
    Project 4 starts, due 20 April.
  13. Handling Failure (18)
  14. Practical SQL: Procedures, Authorizations, Transactions (see also Chapters 6, 21)
    New DBMS models: Hadoop, NoSQL, ...
    Data Warehousing / Column Stores
    Assignment 5 starts, due 27 April.
  15. Review
    April 27: Guest Lecture by Prof. Walid Aref: Database Research Directions

You may also want to see the canonical syllabus.

Final Exam Wednesday, 2 May, 1-3pm, KNOY B031. (old solutions).

Qualifying exam, Friday, 4 May, 9-11am, LWSN B134. (Sample from a past year.)

May 5, 21:00: Earliest time that you should plan to leave campus if you don't know know your exam schedule (available late February.) I bought a ticket to go home earlier is not a valid excuse for an exam to be rescheduled.


This page last modified

Valid XHTML 1.1