x10.lang.clock {
    public void resume();
    public void doNext();
}
The benefit of split-phase operation is that one can partition a computation
into parts that need to be synchronized with other activities and parts that
affect only local state.
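The split-phase idea can be illustrated with a small analogy in plain Java. This is only a sketch, not the X10 runtime: it assumes that `java.util.concurrent.Phaser` is an acceptable stand-in for a clock, with `arrive()` playing the role of `resume()` (signal without blocking) and `awaitAdvance()` playing the role of `doNext()` (block until the phase advances). The class name `SplitPhaseSketch` and the method `runOnePhase` are hypothetical.

```java
import java.util.concurrent.Phaser;

// Analogy only: Phaser stands in for an X10 clock (an assumption of this sketch).
final class SplitPhaseSketch {
    /** Runs `parties` threads through one split-phase step and returns what each thread saw. */
    static int[] runOnePhase(int parties) {
        Phaser clock = new Phaser(parties);
        int[] shared = new int[parties]; // written before the signal, read after the wait
        int[] local = new int[parties];  // touched only by the owning thread
        int[] sums = new int[parties];
        Thread[] threads = new Thread[parties];
        for (int i = 0; i < parties; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                shared[id] = id + 1;         // part that others depend on
                int phase = clock.arrive();  // like resume(): signal, do not block
                local[id] = id * id;         // local-only part overlaps with the wait
                clock.awaitAdvance(phase);   // like doNext(): block until all have arrived
                int sum = 0;
                for (int v : shared) {
                    sum += v;                // safe: every write preceded the phase advance
                }
                sums[id] = sum;
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return sums;
    }
}
```

The work placed between the two calls is exactly the part that affects only local state, so it can proceed while other activities are still arriving.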
The distributed implementation provides exactly the same semantics, but with slightly different timing for splitting phases (see Figure 4). First of all, the clock object is an X10 object and thus has a home place. In a distributed setting, activities at different places, and thus in different VMs, can synchronize on the same clock. Our implementation uses proxy/shadow clocks to extend the implementation to clusters. Every shadow clock holds a remote pointer to the clock object on the home node. Conceptually, the shadow clocks and the corresponding home-node clock together form the single clock that X10 programmers see. On any particular VM there is exactly one of these: either the true clock or a shadow clock. All activities first use the same protocol to call resume/doNext. Only once the clock is locally quiescent is an aggregate resume message sent to the home-node clock. Once the clock is locally quiescent, the next doNext message is sent to the home clock, where it may either block or return, depending on the global state of the clock. After it returns, we know that the clock can be moved to the next phase globally, so a phase split is attempted. If this is the last doNext message on the shadow clock, the clock phases are un-split. The home clock behaves as before, except that it must also keep track of the number of shadow clocks and of the aggregated resume/doNext messages.
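The aggregation step described above can be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: the class names `HomeClock` and `ShadowClock` and all of their members are hypothetical, remote messages are modeled as direct method calls, every place (including the home place) is represented uniformly by a shadow, and the blocking behavior of doNext and the phase splitting itself are omitted.

```java
/** Hypothetical home clock: counts one aggregate resume per shadow clock per phase. */
final class HomeClock {
    private final int shadowCount;   // number of shadow clocks being tracked
    private int resumedShadows = 0;  // aggregate resumes received in the current phase
    private int phase = 0;
    private int messagesReceived = 0;

    HomeClock(int shadowCount) {
        this.shadowCount = shadowCount;
    }

    /** Stands in for the aggregate resume message sent by a shadow clock. */
    synchronized void aggregateResume() {
        messagesReceived++;
        resumedShadows++;
        if (resumedShadows == shadowCount) { // every place is quiescent
            resumedShadows = 0;
            phase++;                         // the clock can move to the next phase
        }
    }

    synchronized int phase() { return phase; }

    synchronized int messagesReceived() { return messagesReceived; }
}

/** Hypothetical shadow clock: aggregates local resumes before contacting the home clock. */
final class ShadowClock {
    private final HomeClock home;      // models the remote pointer to the home clock
    private final int localActivities; // activities registered on this VM
    private int localResumed = 0;

    ShadowClock(HomeClock home, int localActivities) {
        this.home = home;
        this.localActivities = localActivities;
    }

    /** Called by each local activity; only the last caller triggers a message to home. */
    synchronized void resume() {
        localResumed++;
        if (localResumed == localActivities) { // locally quiescent
            localResumed = 0;
            home.aggregateResume();
        }
    }
}
```

In this model, three places with 2, 3, and 1 activities issue six local resume calls but send only three messages to the home clock, which illustrates why aggregation keeps the message count proportional to the number of VMs rather than the number of activities.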
Preliminary performance results show that on our platform (dual-core Xeon Linux clusters with a gigabit interconnect), it takes an average of 7 milliseconds to advance a clock phase among 4 VMs running on 4 hosts.