CS 603: Advanced Topics in Distributed Systems
Class description
Writing programs which will run over a set of physically distributed processors is inherently hard, as one has to deal with latency and failures of both processors and communication -- basically the fact that it takes two to tango.
Most industrial-scale applications developed nowadays involve some form of remote interaction. However, these applications are still mostly built in ad-hoc style, i.e., as sets of logical components, which are then somehow glued together. This provides little reliability in the presence of latency and failures.
This class will focus on models, algorithms, languages, and systems for programming distributed applications.
Instructor
Teaching assistant
Schedule
Grading
25% Homework
25% Programming assignments
40% Final project
10% Participation in class
Academic honesty and ethical behavior are required in this course, as it is in all courses at Purdue University (here is the guide for academic integrity). The class will be conducted according to the policy written by Professor Eugene Spafford.
Subject to changes
| Week | Module | Monday | Wednesday | Programming | |
| Aug 21 - Aug 25 | 1 | Basic |
Intro - background on networks and services Reading: ?Chapters 1 and 2 from Reliable Distributed Systems ?Why do Internet Services Fail, and What Can be Done about It? D. Oppenheimer, A.Ganapathi and D. A. Patterson, 2003. Why Do Computers Stop and What Can be Done about It?Jim Gray, 1985. ? |
- distributed systems models Fundamentals of Fault-tolerant Distributed Computing in Asynchronous Environments. Felix C. Gartner, ACM Surveys 31(1):1-26, 1999. |
|
| Aug 28 - Sep 1 | 2 |
Events and clocks - causality - snapshots - lamport clocks - vector clocks Reading:
?Time, Clocks, and The Ordering
of Events in Distributed Systems. L.
Lamport, CACM 27(7): 558-565, 1978.
?Virtual Time and
Global States of Distributed Systems,
F. Mattern, WDPA’89.
|
Reliable broadcast - terminating Mullender Chap.5 / (A Modular Approach to) Fault-tolerant Broadcasts and Related Problems. V. Hadzilacos and S. Toueg, Distributed Systems, 97-145, 1993. |
||
| Sep 4 - Sep 8 | 3 | (Labor day) |
Fault-tolerant Broadcasts in the Crash Recovery Model
(talk by Dr. R. Boichat) Reading: Deconstructing Paxos. R. Boichat, P. Dutta, S. Frolund, R. Guerraoui, SIGACT News 41(1): 47-47, 2003 |
Chat due Friday midnight (with sockets) | |
| Sep 11 - Sep 15 | 4 | (Reliable broadcast cont'd) |
Programming support intro - libraries, languages, reflection - "middleware" - limitations - examples Reading:
?Concurrency, Distribution, and
Parallelism in Object-Oriented Programming.
J-P. Briot, R. Guerraoui, K.P. Löhr, ACM Computing
Surveys 30(3): 291-329, September 1999
|
||
| Sep 18 - Sep 22 | 5 |
Java RMI - remote interfaces - proxies - registries Reading: RMI spec http://java.sun.com/j2se/1.5.0/docs/guide/rmi/spec/rmiTOC.html |
AspectJ - AOP general - static, dynamic weaving - safety issues Reading:
An Overview of AspectJ.
G. Kiczales, E. Hilsdale, J. Hugunin, M. Kersten, J. Palm, and W.G.
Griswold. ECOOP 2001, 327-353
|
||
| Sep 25 - Sep 29 | 6 | Classic |
Failure detectors - accuracy, completeness - omega - comparisons Reading:
Unreliable Failure Detectors for Reliable
Distributed Systems. T.D. Chandra and S. Toueg, JACM, 43(2):
225-267, 1996
|
Consensus - impossibility - chandra&toueg - weakest failure detector - leader-based Reading:
?
Impossibility of Distributed Consensus
with One Faulty Process. M.
Fischer, N. Lynch, and M. Paterson. JACM 32(2): 217-246,
1985.
Solving Consensus using Chandra-Toueg’s Unreliable Failure Detectors: A General Quorum-based Approach, A. Mostfaoui and M. Raynal, DISC’99, 49-63, 1999.
?Leader-based
Consensus. A. Mostefaoui and M. Raynal.
IPL 11(1):95-107, 2001.
|
Assignment on channels and broadcast due Saturday midnight |
| Oct 2 - Oct 6 | 7 | (Consensus cont'd) |
Group communication and state machine replication - state machine replication - total order broadcast - virtual synchrony Reading:
Implementing Fault-Tolerant Services Using the
State Machine Approach: A Tutorial. F.B.
Schneider. ACM Surveys 22(2): 299-319, 1990
?Group Communication
Specifications: A Comprehensive Survey.
G. Chockler, I. Keidar, R. Vitenberg. CSUR 23(4): 427-469,
2001.
|
||
| Oct 9 - Oct 13 | 8 | (October break) | (Group comm. cont'd) | Revised chat due Friday midnight | |
| Oct 16 - Oct 20 | 9 | Byzantine failures (guest lecture by Prof. C. Nita-Rotaru) |
Transactions - non-blocking atomic commit - 2PC and 3PC Reading:
?A Formal Model of Crash
Recovery in a Distributed System. D. Skeen and M.
Stonebraker, IEEE TSE
9(3): 219-228, 1983.
?Non-blocking
Atomic Commit in Asynchronous Distributed Systems with Failure
Detectors. R. Guerraoui, DC 15(1):17-25, 2002.
|
||
| Oct 23 - Oct 27 | 10 | (absence) | (absence) | Assignment on consensus, atomic broadcast, and virtual synchrony due Friday midnight | |
| Oct 30 - Nov 3 | 11 | Advanced |
Probabilistic broadcast Bimodal Multicast. K.
Birman, M. Hayden, O.Ozkasap, Z. Xiao, M. Budiu, and Y. Minsky, ACM TOCS
17(2): 41-88, 1999. |
Probabilistic agreement - consensus - ordering Randomized Protocols for Asynchronous Consensus. J. Aspnes, DC 16(2-3): 165-175, 2003. From Binary Consensus to Multivalued Consensus in Asynchronous Message-passing Systems. A. Mostéfaoui, M. Raynal, F. Tronel, IPL 73(5-6): 207-212, 2000. |
|
| Nov 6 - Nov 10 | 12 |
Presentations Fault-tolerant Queries and Aggregation in Sensor Networks. Chris Global Snapshot Algorithms for Fault-tolerance in Virtual Machines. Ardalan |
Presentations Bounds on Distributed Failure Detection. Jayaram |
||
| Nov 13 - Nov 17 | 13 |
Presentations Event Correlation Languages for Distributed Programming. Nick Aspect-oriented Distributed Programming. Armand |
Presentations Token-based Probabilistic Non-blocking Atomic Commit. Dasarath Stability and Persistence in Probabilistic Broadcast. Patrick
|
||
| Nov 20 - Nov 24 | 14 |
- uniform reliable broadcast - terminating reliable broadcast - leader election |
(Thanksgiving) | ||
| Nov 27 - Dec 1 | 15 |
|
|||
| Dec 4 - Dec 8 | 16 | Final reports for individual projects due | |||