Programming Assignment 2: A Chord-Like Ring
Note: Questions are fully expected on this programming assignment. Do not hesitate to post and ask both the professor and TA. Some instruction is below, but it will doubtlessly be insufficient as a comprehensive guide. (complete, but may be updated)Background/Protocol
The purpose of PA2 is to understand some of the complexities of building a simple ring protocol such as Chord. You should gain a better appreciation for the challenges in distributed consistency, and at the same time increase your familiarity with Mace. As with PA1, we are not implementing all of Chord. Reading the Chord paper is important for understanding and background, but discrepancies in the protocols should be settled here, or with instructor questions. Generally, we are implementing the Chord Join protocol and stabilization protocol, and ignoring the finger tables completely.
For this project, you may use cloud11.cs.purdue.edu through cloud15.cs.purdue.edu. The necessary tools for building and running Mace and the model checker should all be available on those machines.
You should be able to build on your earlier Mace checkout. However, in general: you should use cmake to build Mace out-of-source, and within your scratch directory on these machines. (Mace, when compiled, is quite large, and so keeping the build on /scratch will keep your directory from causing you grief from the sysadmins)
You are welcome to do this project on any machine on which you can build Mace, which should be generally true of Linux and OSX systems. We will use a departmental gentoo build for testing them.
For this project, you only need to submit your final HW2Ring.mac. However, there will be a question which you should answer as a text comment in the file. (Standard C++ commenting syntax applies)
We will be implementing the OverlayRouterServiceClass and
OverlayServiceClass in Mace. The former provides a
getNextHop method, on which routing or DHTs can be implemented.
The latter provides a joinOverlay method, which provides the node
a bootstrap set. Though there are upcall notifications you should make for a
complete implementation, no upcalls are required for this assignment.
I have heavily commented the skeleton file, so you should be able to gain a lot of information there. As before, there are elipses (...) where I have removed code from my working implementation. However, in this case, you will be responsible for defining some whole messages and transitions.
The protocol you should implement is as follows:
- Join Protocol (messages already defined in skeleton): On joinOverlay, the node routes a Join message to its predecessor (or successor, see comments), which responds with a JoinReply. On receiving the JoinReply, stabilization is used immediately to notify the joining node's successor that it is done joining, to update it's predecessor. A join timer should execute periodically to make sure that the joining node keeps trying to join until it succeeds.
- Stabilization Protocol: Periodically, each node should execute an anti-entropy-like protocol in which it asks its successor for its predecessor, to verify that no nodes have been inserted between them. This could be done as a 2-message or 3-message protocol (when does the node's successor update it's predecessor if needs be).
- Socket failure handling: You should assume a socket failure implies the peer has died in your implementation. Part of the error handler has already been implemented. In a comment in your implementation, you should also answer the question: Does your protocol handle arbitrary socket failures which are caused by network errors (i.e. neither node failed, but the TCP socket did)? If not (and mine doesn't), why not? What circumstance causes this? If so, what extra steps did you take to ensure that it does?
- Successor Lists: Each node should maintain a list of it's next MAX_SUCCESSORS successors. This should include it's immediate successor. You should also implement a safety property to verify the successor list never grows too big.
- Single node errors: Your protocol should be robust to single failures of a single node (other than node 0). [In my implementation failing node 0 can cause failures similarly to what happens with arbitrary socket failures].
Environment/Development
To get started on the homework, you should setup your environment as follows:
- In your existing directory, "CS505", under mace/services/.
- (Already completed for HW1): Edit mace/services/CMakeLists.txt, and list CS505 (or your directory name)
in the SERVICE_DIRS variable. To avoid problems, place it last. You can also
remove all other services EXCEPT interfaces, Transport, and SimApplication, as
these will not be needed for this assignment. This will make the build faster
and smaller. The diff from this modification should look like:
Index: CMakeLists.txt =================================================================== --- CMakeLists.txt (revision 1037) +++ CMakeLists.txt (working copy) @@ -1,4 +1,4 @@ -SET(SERVICE_DIRS interfaces Transport RequestTransport Http File RandTree ReplayTree RanSub Pastry Bamboo Chord GenericOverlayRoute GenericTreeMulticast DHT ScribeMS SplitStreamMS SimApplication) +SET(SERVICE_DIRS interfaces Transport SimApplication CS505) WRITE_FILE(${EXTERNAL_VARS_FILE} " # Service Includes and Links" APPEND) - Inside CS505, place HW2Ring.mac. HW2Ring.mac is the file you will be editing. I took my solution, and removed sections of the code, replacing them with an elipsis (...) in a comment. We will be using a pre-existing simulated application, SimOverlayRouterApp. You do not need to do anything - it is already being compiled in SimApplication. You are invited to read it and become familiar with it, but you are not required to develop anything in it. You are also welcome to edit it to test different things than those tested, though you will not be submitting your modifications.
- Place CS505Tests.cc in mace/application/modelchecker/. This still contains the HW1 test, but I've added the HW2 test below it.
- (Already done for HW1): Edit mace/application/modelchecker/CMakeLists.txt, and add CS505Tests.cc to
modelchecker_SRCS next to ServiceTests.cc (the first time modelchecker_SRCS
appears in the file). The diff from this modification would appear like:
Index: CMakeLists.txt =================================================================== --- CMakeLists.txt (revision 1037) +++ CMakeLists.txt (working copy) @@ -1,6 +1,6 @@ IF(${PROJECT_NAME} STREQUAL "Mace") SET(APPS modelchecker) -SET(modelchecker_SRCS ServiceTests.cc ${EXTERNAL_TEST_SRCS}) +SET(modelchecker_SRCS ServiceTests.cc CS505Tests.cc ${EXTERNAL_TEST_SRCS}) SET(LIBNAME MaceMC) ELSE(${PROJECT_NAME} STREQUAL "Mace") SET(APPS ${PROJECT_NAME}_modelchecker) - Edit mace/application/CMakeLists.txt to remove all applications other than the model checker. To do so, comment out all lines except for ADD_SUBDIRECTORY(modelchecker), by prefixing each line with a sharp (#).
- Run "make rebuild_cache" from the build directory. Since you've added a new .mac file, you need to tell the cmake build system to re-detect the files to build.
When editing Mace files (.mac), it is helpful in most editors to use the syntax highlighting for C++. If you are a Vim user, there is a vim syntax file you can use. If you wish to contribute syntax files for other editors, I am happy to share them with others.
Model checking / Debugging
To run and debug this project, we will be using the Mace modelchecker. To setup your environment for model checking:- First, compile Mace
- Then, create a symbolic link from the compiled modelchecker into mace/application/modelchecker directory in the source code. (The Mace scripts all assume the model checker resides there based on prior setup). You should find the modelchecker binary in the build/mace/application/modelchecker/ directory.
- Create a subdirectory under the model checker, for example "HW2". Place params.default into that directory. For convenience, we have also made available params.default.neterror for testing with network errors, and params.default.failnodes". Note that you should name them params.default when you are using them.
- In params default, you can change the number of nodes to expermient with, as well as whether to allow node errors and tcp errors. Other parameters should probably remain the same.
- To run the model checker, change to the HW2 directory, then execute "../run-modelchecker.pl". It will start searching. If it finds an error, it will output a number of files. The run-modelchecker.pl script will automatically generate a complete logfile of the erroneous run, as well as a graph of the execution. It will name these error files error.log and error.graph. Error.graph contains control characters which color the output on most terminals. Rather than looking at error.log directly, we recommend using mdb (../mdb error.log), which will open an interactive logfile debugger, allowing you to look at the execution, one step at a time. Each step will end with an output of the system state (state of all nodes in the system).
- If the model checker is not finding bugs, it will output lines of code such as:
3.0.0.0:10201 [Sim::pathComplete] Path 94208 ( 88306 in phase) ended at simulator step: 107 random step: 144 cause: STOPPING_CONDITION (OK) Path<sdepth=10>SimRandUtil<prefixlen=0,position=144,searchlen=10,walklen=134> search sequence [ 0 0 0 2 0 3 3 2 0 3 ] 10.0.0.0:10201 [Sim::pathComplete] Path 94464 ( 88562 in phase) ended at simulator step: 189 random step: 254 cause: STOPPING_CONDITION (OK) Path<sdepth=10>SimRandUtil<prefixlen=0,position=254,searchlen=10,walklen=244> search sequence [ 0 0 0 2 0 3 3 11 1 3 ]
Each line represents an execution which completed. It contains the path number, which indicates how many paths have been completed so far. It also tells you a number of other statistics such as how long the path was in simulator steps and random number requests, the length of the search sequence, and the search sequence itself. These give you an understanding of the progress in the search.
Useful hints:
- Debug first with 2 nodes instead of 10! Problems will be easier to understand with just 2 nodes.
- Use error.graph to see what's going on.
- You might consider adding other properties or assertions for things you want to verify or understand better.
- mdb assumes that the mace source directory is present in your home folder. It is recommended to create a symbolic to mace/ from the home folder.
- The ServiceClass and Handlers for Gossip can be found at mace/services/interfaces/[Name ServiceClass/Handler].mh.
- NodeSet is an extension of std::set and represents a set of MaceKey's.
- For generating random numbers, use RandUtil::randInt. Incidentally, NodeSet has a 'random' which does not take any arguments and returns a random element.
Submission and Grading:
Submit only the file HW2Ring.mac, using BlackBoard. You can find the submission section in the "Assignments" section of BlackBoard for the class. Submissions will be graded according to this grading rubric (TXT).