Programming Assignment 4: 3 Phase Commit
Note: Questions are fully expected on this programming assignment. Do not hesitate to post and ask both the professor and TA. Some instruction is below, but it will doubtlessly be insufficient as a comprehensive guide. (complete, but may be updated)Background/Protocol
The aim of this programming assignment is to understand and implement the complete state machine of Three Phase Commit. It involves building a distributed system from a (somewhat) complex state diagram (design), and handle all failure scenarios that the protocol supports. A description of the protocol that is suitable for implementation can be found here. Your task is to implement this state machine in Mace.
For this project, you may use cloud11.cs.purdue.edu through cloud15.cs.purdue.edu. The necessary tools for building and running Mace and the model checker should all be available on those machines.
You should be able to build on your earlier Mace checkout. However, in general: you should use cmake to build Mace out-of-source, and within your scratch directory on these machines. (Mace, when compiled, is quite large, and so keeping the build on /scratch will keep your directory from causing you grief from the sysadmins)
You are welcome to do this project on any machine on which you can build Mace, which should be generally true of Linux and OSX systems. We will use a departmental gentoo build for testing them.
For this project, you only need to submit your final HW4Commit3PC.mac.
We will be implementing the CommitServiceClass which is created
for this assignment. You can assume a static Group Membership, which is
supplied as argument to the joinGroup downcall. (This part of the
code is already complete). I have removed code from my working implementation
for you to fill in. You would need to add state variables, define messages and
handle their delivery. The static group membership is already handled.
You are required to handle a single round of the protocol. Each member
(including the coordinator) initiates the protocol using
startCommit and expects the protocol to call back with upcalls
either Commit or Abort. All members should
either commit or all have to abort. The coordinator is picked as the node with
the least identifier.
The protocol you should implement is as follows:
- Phase 1: Coordinator (initiated by
startCommit) sends out CommitRequest to all cohort members requesting for their choice to commit or abort. Each cohort is in turn initiated bystartCommit, and sends back their response. - Phase 2: The coordinator waits for a commit response from all cohort members. If there was any Abort request, the transaction is aborted and the cohorts are intimated. If all replies were to commit, the coordinator proceeds to prepare each cohort for the commit. Each member acknowledges the receipt of this message.
- Phase 3: In the final phase, once the coordinator gets back all the ackowledgements, it sends out a message to commit, and itself commits. Each member proceeds to commit and replies with an acknowledgement.
More details on how to react to node failures and recovery from failures are handled based on the protocol. This is also provided in the web link provided.
Environment/Development
To get started on the homework, you should setup your environment as follows:
- In your existing directory, "CS505", under mace/services/.
- (Already completed for HW1): Edit mace/services/CMakeLists.txt, and list CS505 (or your directory name)
in the SERVICE_DIRS variable. To avoid problems, place it last. You can also
remove all other services EXCEPT interfaces, Transport, and SimApplication, as
these will not be needed for this assignment. This will make the build faster
and smaller. The diff from this modification should look like:
Index: CMakeLists.txt =================================================================== --- CMakeLists.txt (revision 1037) +++ CMakeLists.txt (working copy) @@ -1,4 +1,4 @@ -SET(SERVICE_DIRS interfaces Transport RequestTransport Http File RandTree ReplayTree RanSub Pastry Bamboo Chord GenericOverlayRoute GenericTreeMulticast DHT ScribeMS SplitStreamMS SimApplication) +SET(SERVICE_DIRS interfaces Transport SimApplication CS505) WRITE_FILE(${EXTERNAL_VARS_FILE} " # Service Includes and Links" APPEND) - Inside CS505, place all the .mac and .mh files from here. HW4Commit3PC.mac is the file you will be editing. I took my solution, and removed sections of the code, replacing them with an elipsis (...) in a comment. We will be using the simulated application, SimCommit3PCApp(.mac). You are invited to read it and become familiar with it, but you are not required to develop anything in it. You are also welcome to edit it to test different things than those tested, though you will not be submitting your modifications.
- Place CS505Tests.cc in mace/application/modelchecker/. This still contains the HW1, HW2 tests but I've added the HW4 test below it. You should make sure you place this back in the original modelchecker directory (and not the one you used for HW3).
- (Already done for HW1): Edit mace/application/modelchecker/CMakeLists.txt, and add CS505Tests.cc to
modelchecker_SRCS next to ServiceTests.cc (the first time modelchecker_SRCS
appears in the file). The diff from this modification would appear like:
Index: CMakeLists.txt =================================================================== --- CMakeLists.txt (revision 1037) +++ CMakeLists.txt (working copy) @@ -1,6 +1,6 @@ IF(${PROJECT_NAME} STREQUAL "Mace") SET(APPS modelchecker) -SET(modelchecker_SRCS ServiceTests.cc ${EXTERNAL_TEST_SRCS}) +SET(modelchecker_SRCS ServiceTests.cc CS505Tests.cc ${EXTERNAL_TEST_SRCS}) SET(LIBNAME MaceMC) ELSE(${PROJECT_NAME} STREQUAL "Mace") SET(APPS ${PROJECT_NAME}_modelchecker) - Edit mace/application/CMakeLists.txt to remove all applications other than the model checker. To do so, comment out all lines except for ADD_SUBDIRECTORY(modelchecker), by prefixing each line with a sharp (#).
- Run "make rebuild_cache" from the build directory. Since you've added a new .mac file, you need to tell the cmake build system to re-detect the files to build.
When editing Mace files (.mac), it is helpful in most editors to use the syntax highlighting for C++. If you are a Vim user, there is a vim syntax file you can use. If you wish to contribute syntax files for other editors, I am happy to share them with others.
Model checking / Debugging
To run and debug this project, we will be using the Mace modelchecker. To setup your environment for model checking:- First, compile Mace
- Then, create a symbolic link from the compiled modelchecker into mace/application/modelchecker directory in the source code. (The Mace scripts all assume the model checker resides there based on prior setup). You should find the modelchecker binary in the build/mace/application/modelchecker/ directory.
- Create a subdirectory under the model checker, for example "HW4". Place params.default into that directory. For
convenience, we have also made available params.default.failnodes for
testing failure of cohort members, and params.default.coordfail to test
with a failing coordinator. Note that you should name them params.default when
you are using them. All nodes request for a Commit by default. You can use the
HW4_ABORT_NODESparameter to let nodes selectively request to abort the transaction. - In params default, you can change the number of nodes to expermient with, as well as whether to allow node errors and tcp errors. Other parameters should probably remain the same.
- To run the model checker, change to the HW4 directory, then execute "../run-modelchecker.pl". It will start searching. If it finds an error, it will output a number of files. The run-modelchecker.pl script will automatically generate a complete logfile of the erroneous run, as well as a graph of the execution. It will name these error files error.log and error.graph. Error.graph contains control characters which color the output on most terminals. Rather than looking at error.log directly, we recommend using mdb (../mdb error.log), which will open an interactive logfile debugger, allowing you to look at the execution, one step at a time. Each step will end with an output of the system state (state of all nodes in the system).
- If the model checker is not finding bugs, it will output lines of code such as:
3.0.0.0:10201 [Sim::pathComplete] Path 94208 ( 88306 in phase) ended at simulator step: 107 random step: 144 cause: STOPPING_CONDITION (OK) Path<sdepth=10>SimRandUtil<prefixlen=0,position=144,searchlen=10,walklen=134> search sequence [ 0 0 0 2 0 3 3 2 0 3 ] 10.0.0.0:10201 [Sim::pathComplete] Path 94464 ( 88562 in phase) ended at simulator step: 189 random step: 254 cause: STOPPING_CONDITION (OK) Path<sdepth=10>SimRandUtil<prefixlen=0,position=254,searchlen=10,walklen=244> search sequence [ 0 0 0 2 0 3 3 11 1 3 ]
Each line represents an execution which completed. It contains the path number, which indicates how many paths have been completed so far. It also tells you a number of other statistics such as how long the path was in simulator steps and random number requests, the length of the search sequence, and the search sequence itself. These give you an understanding of the progress in the search.
Useful hints:
- Node 0 (IP: 1.0.0.0 in the modelchecker) gets picked as coordinator
- Debug first with 2 or 3 nodes instead of 10! Problems will be easier to understand with just 2 nodes.
- Verify your program completely for failure-free case, and then go ahead and implement the failure cases
- Test for cohort member failures and coordinator failures separately
- Use error.graph to see what's going on.
- You might consider adding other properties or assertions for things you want to verify or understand better.
- mdb assumes that the mace source directory is present in your home folder. It is recommended to create a symbolic to mace/ from the home folder.
- NodeSet is an extension of std::set and represents a set of MaceKey's.
- For generating random numbers, use RandUtil::randInt. Incidentally, NodeSet has a 'random' which does not take any arguments and returns a random element.
Submission and Grading:
Submit only the file HW4Commit3PC.mac, using BlackBoard. You can find the submission section in the "Assignments" section of BlackBoard for the class. Submissions will be graded according to this grading rubric (TXT).