Advanced Topics in Distributed Systems

Monday, Wednesday, and Friday from 1:30-2:20
RHPH 162 (note change)

Chris Clifton

Email: clifton_nospam@cs_nojunk.purdue.edu

Design and control of distributed computing systems (operating systems and database systems). Topics include principles of naming and location, atomicity, resource sharing, concurrency control and other synchronization, deadlock detection and avoidance, security, distributed data access and control, integration of operating systems and computer networks, distributed systems design, consistency control, and fault tolerance.

A more detailed course description prepared for the CEE program is available, as is a course preview briefing containing more detailed information on requirements and expectations. The course outline is given below.

More course information may be available in WebCT (direct link).

Please add yourself to the course mailing list. Send mail to mailer@cs.purdue.edu containing the line:

add your email to cs603

Feel free to send things to the course mailing list if you feel it is appropriate. An example might be a pointer to a particularly helpful on-line manual describing an API used in one of the projects.

Course Methodology

The course will be taught through lectures, with class participation expected and encouraged. There will be frequent reading assignments to supplement the lectures.

For now, Professor Clifton will not have regular office hours. Feel free to drop by anytime, or send email with some suggested times to schedule an appointment. You can also try H.323/T.120 desktop videoconferencing (e.g., SunForum, Microsoft NetMeeting). To do so, open an H.323 connection to blitz.cs.purdue.edu; send email if there is no response.

Prerequisites

The official requirement is CS 503 (Operating Systems), with CS 542 (Distributed Database Systems) recommended. The practical requirement is a solid undergraduate background in computer science, including some database and operating systems theory, and substantial programming experience. If you don't have 503 but feel you have sufficient background, please send me an explanation of why you feel you are prepared, along with a phone number and times when I can call to discuss approving your registration.

Text

The following is recommended (it will be a useful reference for much of the lab work in the course):

Internetworking with TCP/IP Vol.III: Client-Server Programming and Applications,
D. E. Comer and D. Stevens,
Prentice Hall,
(choose appropriate version for your favorite platform),
0-13-032071-4

The following have been recommended in the past, and may provide useful background reading. However, none are required.

Distributed Systems, 1993
Sape Mullender
Addison-Wesley
0-201-62427-3

Distributed Algorithms, 1997
Nancy Lynch
Morgan Kaufmann
1-55860-348-4

Distributed Operating Systems, 1995
Tanenbaum
Prentice Hall
0-13-219908-4

Evaluation/Grading:

Evaluation will be a subjective process; however, it will be based primarily on your understanding of the material as evidenced in the exams and projects.

Exams will be open note / open book. To avoid a disparity between resources available to different students, electronic aids are not permitted. (If everyone has a notebook with wireless connection and all agree they want to use them in the exams, I could relax this.)

I will evaluate projects on a five point scale:

5
Exceptional work. So good that it makes up for substandard work elsewhere in the course. These will be rare.
4
What I'd expect of a Ph.D. candidate. This corresponds to an A grade.
3
Good enough for a Master's degree, but not what I'd like to see for a Ph.D. candidate. This corresponds to a B grade.
2
Okay for a Master's candidate who does extremely well in other courses. This corresponds to a C grade.
1
Not good enough for a graduate student. But something.
0
Missing work, or so bad that you needn't have bothered.

Projects

A substantial portion of your education in this course will come through performing programming projects: building components of a distributed system. Some examples of what projects might involve are:

My current expectation is that all projects will be done individually, as it is probable that some of the CEE students will not be collocated with other students in the course.

Policy on Intellectual Honesty

Please read the above link to the policy written by Professor Spafford. This will be followed unless I provide written documentation of exceptions.

Late work will be penalized except in case of documented emergency (e.g., medical emergency), or by prior arrangement if doing the work in advance is impossible through the fault of the instructor (e.g., you are going to a conference and ask to start the project early, but I don't have it ready yet).

The penalty for late work is 1 point (of the possible 5) if turned in after the deadline, and one additional point for each week late.

Syllabus (numbers correspond to week):

Project start/due dates are tentative!

  1. Course overview, Components of a distributed system
  2. Communication Mechanisms
  3. Remote Method Invocation: Mechanisms
    First project starts January 23
  4. Naming
    First project design due January 30
  5. Clock Synchronization

    Other Reading:
    Leslie Lamport and P. M. Melliar-Smith, "Synchronizing clocks in the presence of faults," Journal of the ACM 32(1) (January 1985).
    Jennifer Lundelius and Nancy Lynch, "A new fault-tolerant algorithm for clock synchronization," Proceedings of the Third Annual ACM Symposium on Principles of Distributed Computing, 1984, Vancouver, British Columbia, Canada.

    First project due February 11.
  6. Process Synchronization
    Second project starts February 15.
  7. Distributed Transactions
    Reading:
    Skeen, Dale, "A Formal Model of Crash Recovery in a Distributed System," IEEE Transactions on Software Engineering 9(3), May 1983, pp. 219-228. (preliminary on-line version from SIGMOD'81)
    Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman, Concurrency Control and Recovery in Database Systems, Chapter 7: Distributed Recovery, Addison Wesley, 1987.
  8. Distributed Data: Replication
    Second project due March 1.
  9. Mid-Semester Review
    March 8, in class: Midterm on material from weeks 1-7.
  10. Processes, code migration
    Third project starts March 20.
  11. Distributed Object systems: CORBA (OMG)
    Reading: CORBA Overview from The Common Object Request Broker: Architecture and Specification, Object Management Group, 2001.
    CORBA Security Service (reading).
    Third project due April 3, fourth project starts.
  12. Distributed Object Systems:
  13. Distributed Coordination: Jini. Further reading: Jan Newmarch's Guide to JINI Technologies.
  14. Fault Tolerance
    Fourth project due April 19.
  15. Review

Final exam Thursday, May 2, 2002 from 1:00pm to 3:00pm in RHPH 164.