Home
Abstracts of Selected Papers
2013
2012
2011
2010
2009
2008
2007
2006
2013

  1. 1. Plan B: A Buffered Memory Model for Java

  2. Recent advances in verification have made it possible to envision trusted implementations of real-world languages. Java with its type-safety and fully specified semantics would appear to be an ideal candidate; yet, the complexity of the translation steps used in production virtual machines have made it a challenging target for verifying compiler technology. One of Java's key innovations, its memory model, poses significant obstacles to such an endeavor. The Java Memory Model is an ambitious attempt at specifying the behavior of multithreaded programs in a portable, hardware agnostic, way. While experts have an intuitive grasp of the properties that the model should enjoy, the specification is complex and not well-suited for integration within a verifying compiler infrastructure. Moreover, the specification is given in an axiomatic style that is distant from the intuitive reordering-based reasonings traditionally used to justify or rule out behaviors, and ill suited to the kind of operational reasoning one would expect to employ in a compiler. This paper takes a step back, and introduces a Buffered Memory Model (BMM) for Java. We choose a pragmatic point in the design space sacrificing generality in favor of a model that is fully characterized in terms of the reorderings it allows, amenable to formal reasoning, and which can be efficiently applied to a specific hardware family, namely x86 multiprocessors. Although the BMM restricts the reorderings compilers are allowed to perform, it serves as the key enabling device to achieving a verification pathway from bytecode to machine instructions. Despite its restrictions, we show that it is backwards compatible with the Java Memory Model and that it does not cripple performance.

  3. 2. Compositional and Lightweight Dependent Type Inference for ML

  4. We consider the problem of inferring expressive safety properties of higher-order functional programs using first-order decision procedures. Our approach encodes higher-order features into first-order logic formula whose solution can be derived using a lightweight counterexample guided refinement loop. To do so, we extract initial verification conditions from dependent typing rules derived by a syntactic scan of the program. Subsequent type-checking and type-refinement phases infer and propagate specifications of higher order functions, which are treated as uninterpreted first-order constructs, via subtyping chains. Our technique provides several benefits not found in existing systems: (1) it enables compositional verification and inference of useful safety properties for functional programs; (2) additionally provides counterexamples that serve as witnesses of unsound assertions: (3) does not entail a complex translation or encoding of the original source program into a first-order representation; and, (4) most importantly, profitably employs the large body of existing work on verification of first-order imperative programs to enable efficient analysis of higher-order ones. We have implemented the technique as part of the MLton SML compiler toolchain, where it has shown to be effective in discovering useful invariants with low annotation burden.

2012

  1. 1. Eliminating Read Barriers through Procrastination and Cleanliness

  2. Managed languages use read barriers to interpret forwarding point- ers introduced to keep track of copied objects. For example, in a split-heap managed runtime for a multicore environment, an object initially allocated on a local heap may be copied to a shared heap if it becomes the source of a store operation whose target location resides on the shared heap. As part of the copy operation, a forwarding pointer may be established to allow existing references to the local object to reference the copied version. In this paper, we consider the design of a managed runtime that avoids the need for read barriers. Our design is premised on the availability of a sufficient degree of concurrency to stall operations that would otherwise necessitate the copy. Stalled actions are deferred until the next local collection, avoiding exposing forwarding pointers to the mutator. In certain important cases, procrastination is unnecessary - lightweight runtime techniques can sometimes be used to allow objects to be eagerly copied when their set of incom- ing references is known, or when it can be determined that having multiple copies would not violate program semantics. Experimental results over a range of parallel benchmarks on a number of different architectural platforms including an 864 core Azul Vega 3, and a 48 core Intel SCC, indicate that our approach leads to notable performance gains (20 - 32% on average) without incurring any additional complexity.

  3. 2. Resource-Sensitive Synchronization Inference Using Abduction

  4. We present an analysis which takes as its input a sequential program, augmented with annotations indicating potential parallelization opportunities, and a sequential proof, written in separation logic, and produces a correctly-synchronized parallelized program and proof of that program. Unlike previous work, ours is not an independence analysis; we insert synchronization constructs to preserve relevant dependencies found in the sequential program that may otherwise be violated by a naive translation. Separation logic allows us to parallelize fine-grained patterns of resource usage, moving beyond straightforward points-to analysis. Our analysis works by using the sequential proof to discover dependencies between different parts of the program. It leverages these discovered dependencies to guide the insertion of synchronization primitives into the parallelized program, and to ensure that the resulting parallelized program satisfies the same specification as the original sequential program, and exhibits the same sequential behaviour. Our analysis is built using frame inference and abduc- tion, two techniques supported by an increasing number of separation logic tools.

2011

  1. 1. Accentuating the Positive: Atomicity Inference and Enforcement Using Correct Executions

  2. Concurrency bugs are often due to inadequate synchronization that fail to prevent specific (undesirable) thread interleavings. Such errors, often referred to as Heisenbugs, are difficult to detect, prevent, and repair. In this paper, we present a new technique to increase program robustness against Heisenbugs. We profile correct executions from provided test suites to infer fine-grained atomicity properties. Additional deadlock-free locking is injected into the program to guarantee these properties hold on production runs. Notably, our technique does not rely on witnessing or analyzing erroneous executions. The end result is a scheme that only permits executions which are guaranteed to preserve the atomicity properties derived from the profile. Evaluation results on large, real- world, open-source programs show that our technique can effectively suppress subtle concurrency bugs, with small runtime overheads (typically less than 15%).

  3. 2. Isolating Determinism in Multi-Threaded Programs

  4. Futures are a program abstraction thatexpress a simple form of fork-join parallelism. The expression future (e) declares that e can be evaluated concurrently with the future's continuation. Safe-futures provide additional deterministic guarantees, ensuring that all data dependencies found in the original (non-future annotated) version are respected. In this paper, we present a dynamic analysis for enforcing determinism of safe-futures in an ML-like language with dynamic thread creation and first-class references. Our analysis tracks the interaction between futures (and their continuations) with other explicitly defined threads of control, and enforces an isolation property that prevents the effects of a continuation from being witnessed by its future, indirectly through their interactions with other threads. Our analysis is defined via a lightweight capability-based dependence tracking mechanism that serves as a compact representation of an effect history. Implementation results support our premise that futures and threads can extract additional parallelism compared to traditional approaches for safe-futures.

  5. 3. Composable Asynchronous Events

  6. Although asynchronous communication is an important feature of many concurrent systems, building composable abstractions that leverage asynchrony is challenging. This is because an asynchronous operation necessarily involves two distinct threads of control -- the thread that initiates the operation, and the thread that discharges it. Existing attempts to marry composability with asynchrony either entail sacrificing performance (by limiting the degree of asynchrony permitted), or modularity (by forcing natural abstraction boundaries to be broken). In this paper, we present the design and rationale for asynchronous events, an abstraction that enables composable construction of complex asynchronous protocols without sacrificing the benefits of abstraction or performance. Asynchronous events are realized in the context of Concurrent ML's first-class event abstraction. We discuss the definition of a number of useful asynchronous abstractions that can be built on top of asynchronous events (e.g., composable callbacks) and provide a detailed case study of how asynchronous events can be used to substantially improve the modularity and performance of an I/O-intensive highly concurrent server application.

  7. 4. Relaxed Memory Concurrency and Verified Compilation

  8. In this paper, we consider the semantic design and verified compilation of a C-like programming language for concurrent shared-memory computation above x86 multiprocessors. The design of such a language is made surprisingly subtle by several factors: the relaxed-memory behaviour of the hardware, the effects of compiler optimisation on concurrent code, the need to support high-performance concurrent algorithms, and the desire for a reasonably simple programming model. In turn, this complexity makes verified (or verifying) compilation both essential and challenging. We define a concurrent relaxed-memory semantics for ClightTSO, an extension of CompCert's Clight in which the processor's memory model is exposed for high-performance code. We discuss a strategy for verifying compilation from ClightTSO to x86, which we validate with correctness proofs (building on CompCert) for the most interesting compiler phases.

  9. 5. Modular Reasoning for Deterministic Parallelism

  10. Weaving a concurrency control protocol into a program is difficult and error-prone. One way to alleviate this burden is deterministic parallelism. In this well-studied approach to parallelisation, a sequential program is annotated with sections that can execute concurrently, with automatically injected control constructs used to ensure observable behaviour consistent with the original program. This paper examines the formal specification and verification of these constructs. Our high-level specification defines the conditions necessary for correct execution; these conditions reflect program dependencies necessary to ensure deterministic behaviour. We connect the high-level specification used by clients of the library with the low-level library implementation, to prove that a client's requirements for determinism are enforced. Significantly, we can reason about program and library correctness without breaking abstraction boundaries. To achieve this, we use concurrent abstract predicates, based on separation logic, to encapsulate racy behaviour in the library's implementation. To allow generic specifications of libraries that can be instantiated by client programs, we extend the logic with higher-order parameters and quantification. We show that our high-level specification abstracts the details of deterministic parallelism by verifying two different low-level implementations of the library.

    2010

    1. 1. Analyzing Concurrency Bugs Using Dual Slicing

    2. Recently, there has been much interest in developing analyzes to detect concurrency bugs that arise because of data races, atomicity violations, execution omission, etc. However, determining whether reported bugs are in fact real, and understanding how these bugs lead to incorrect behavior, remains a labor-intensive process. This paper proposes a novel dynamic analysis that automatically produces the causal path of a concurrent failure leading from the root cause to the failure. Given two schedules, one inducing the failure and the other not, our technique collects traces of the two executions, and compares them to identify salient differences. The causal relation between the differences is disclosed by leveraging a novel slicing algorithm called dual slicing that slices both executions alternatively and iteratively, producing a slice containing trace differences from both runs. Our experiments show that dual slices tend to be very small, often an order of magnitude or more smaller than the corresponding dynamic slices; more importantly, they enable precise analysis of real concurrency bugs for large programs, with reasonable overhead.

    3. 2. Analyzing Multicore Dumps to Facilitate Concurrency Bug Reproduction

    4. Debugging concurrent programs is difficult. This is primarily because the inherent non-determinism that arises because of scheduler interleavings makes it hard to easily reproduce bugs that may manifest only under certain interleavings. The problem is exacer- bated in multi-core environments where there are multiple schedulers, one for each core. In this paper, we propose a reproduction technique for concurrent programs that execute on multi-core plat- forms. Our technique performs a lightweight analysis of a failing execution that occurs in a multi-core environment, and uses the result of the analysis to enable reproduction of the bug in a single core system, under the control of a deterministic scheduler. More specifically, our approach automatically identifies the execution point in the re-execution that corresponds to the failure point. It does so by analyzing the failure core dump and leveraging a technique called execution indexing that identifies a related point in the re-execution. By generating a core dump at this point, and compar- ing the differences betwen the two dumps, we are able to guide a search algorithm to efficiently generate a failure inducing schedule. Our experiments show that our technique is highly effective and has reasonable overhead.

    5. 3. Analyzing Lightweight Checkpointing for Concurrent ML

    Transient faults that arise in large-scale software systems can often be repaired by re-executing the code in which they occur. Ascribing a meaningful semantics for safe re-execution in multithreaded code is not obvious, however. For a thread to re-execute correctly a region of code, it must ensure that all other threads that have witnessed its unwanted effects within that region are also reverted to a meaningful earlier state. If not done properly, data inconsistencies and other undesirable behavior may result. However, automatically determining what constitutes a consistent global checkpoint is not straightforward since thread interactions are a dynamic property of the program. In this paper, we present a safe and efficient checkpointing mechanism for Concurrent ML (CML) that can be used to recover from transient faults. We introduce a new linguistic abstraction called stabilizers that permits the specification of per-thread monitors and the restoration of globally consistent checkpoints. Safe global states are computed through lightweight monitoring of communication events among threads (e.g. message-passing operations or updates to shared variables). We present a formal characterization of its design, and provide a detailed description of its implementation within MLton, a whole-program optimizing compiler for Standard ML. Our experimental results on microbenchmarks as well as several realistic, multithreaded, server-style CML applications, including a web server and a windowing toolkit, show that the overheads to use stabilizers are small, and lead us to conclude that they are a viable mechanism for defining safe checkpoints in concurrent functional programs.

    2009

    1. 1. Partial Memoization of Concurrency and Communication

    Memoization is a well-known optimization technique used to eliminate redundant calls for pure functions. If a call to a function f with argument v yields result r, a subsequent call to f with v can be immediately reduced to r without the need to re-evaluate f's body.

    Understanding memoization in the presence of concurrency and communication is significantly more challenging. For example, if f communicates with other threads, it is not sufficient to simply record its input/output behavior; we must also track inter-thread dependencies induced by these communication actions. Subsequent calls to f can be elided only if we can identify an interleaving of actions from these call-sites that lead to states in which these dependencies are satisfied. Similar issues arise if f spawns additional threads.

    In this paper, we consider the memoization problem for a higher-order concurrent language whose threads may communicate through synchronous message-based communication. To avoid the need to perform unbounded state space search that may be necessary to determine if all communication dependencies manifest in an earlier call can be satisfied in a later one, we introduce a weaker notion of memoization called partial memoization that gives implementations the freedom to avoid performing some part, if not all, of a previously memoized call.

    To validate the effectiveness of our ideas, we consider the benefits of memoization for reducing the overhead of recomputation for streaming, server-based, and transactional applications executed on a multi-core machine. We show that on a variety of workloads, memoization can lead to substantial performance improvements without incurring high memory costs.

    1. 2. Exceptionally Safe Futures

    A future is a well-known programming construct used to introduce concurrency to sequential programs. Computations annotated as futures are executed asynchronously and run concurrently with their continuations. Typically, futures are not transparent annotations: a program with futures need not produce the same result as the sequential program from which it was derived. Safe futures guarantee a future-annotated program produce the same result as its sequential counterpart. Ensuring safety is especially challenging in the presence of constructs such as exceptions that permit the expression of non-local control-flow. For example, a future may raise an exception whose handler is in its continuation. To ensure safety, we must guarantee the continuation does not discard this handler regardless of the continuation's own internal control-flow (e.g. exceptions it raises or futures it spawns). In this paper, we present a formulation of safe futures for a higher-order functional language with first-class exceptions. Safety can be guaranteed dynamically by stalling the execution of a continuation that has an exception handler potentially required by its future until the future completes. To enable greater concurrency, we develop a static analysis and instrumentation and formalize the runtime behavior for instrumented programs that allows execution to discard handlers precisely when it is safe to do so.

    1. 3. Semantics-Aware Trace Analysis

    As computer systems continue to become more powerful and complex, so do programs. High-level abstractions introduced to deal with complexity in large programs, while simplifying human reasoning, can often obfuscate salient program properties gleaned from automated source-level analysis through subtle (often non-local) interactions. Consequently, understanding the effects of program changes and whether these changes violate intended protocols become difficult to infer. Refactorings, and feature additions, modifications, or removals can introduce hard-to-catch bugs that often go undetected until many revisions later.

    To address these issues, this paper presents a novel dynamic program analysis that builds a semantic view of program executions. These views reflect program abstractions; and aspects; however, views are not simply projections of execution traces, but are linked to each other to capture semantic interactions among abstractions at different levels of granularity in a scalable manner. We describe our approach in the context of Java and demonstrate its utility to improve regression analysis. We first formalize a subset of Java and a grammar for traces generated at program execution. We then introduce several types of views used to analyze regression bugs along with a novel, scalable technique for semantics differencing of traces from different versions of the same program. Bechmark results on large open-source Java programs demonstrate that semantic-aware trace differencing can identify precise and useful details about the underlying cause for a regression even in programs that use reflection, multithreading, or dynamic code generation, features that typically confound other techniques.

    4. Alchemist: A Transparent Dependence Distance Profiling Infrastructure


Effectively migrating sequential applications to take advantage of parallelism available on multicore platforms is a well-recognized challenge. This paper addresses important aspects of this issue by proposing a novel profiling technique to automatically detect available concurrency in C programs. The profiler, called Alchemist, operates completely transparently to applications, and identifies constructs at various levels of granularity (e.g., loops, procedures, and conditional statements) as candidates for asynchronous execution. Various dependences including read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW), are detected between a construct and its continuation, the execution following the completion of the construct. The time-ordered {\em distance} between program points forming a dependence gives a measure of the effectiveness of parallelizing that construct, as well as identifying the transformations necessary to facilitate such parallelization. Using the notion of post-dominance, our profiling algorithm builds an execution index tree at run-time. This tree is used to differentiate among multiple instances of the same static construct, and leads to improved accuracy in the computed profile, useful to better identify constructs that are amenable to parallelization. Performance results indicate that the profiles generated by Alchemist pinpoint strong candidates for parallelization, and can help significantly ease the burden of application migration to multicore environments.

5. Speculative N-Way Barriers

Speculative execution is an important technique that has historically been used to extract concurrency from sequential programs. While techniques to support speculation work well when computations perform relatively simple actions (e.g., reads and writes to known locations), understanding speculation for multi-threaded programs in which threads communicate through shared references is significantly more challenging, and is the focus of this paper.

We use as our reference point a simple higher-order concurrent language extended with an n-way barrier and a fork/join execution model. Our technique permits the expression guarded by the barrier to speculatively proceed before the barrier has been satisfied (i.e., before all threads that synchronize on that barrier have done so) and to have participating threads that would normally block on the barrier to speculatively proceed as well. Our solution formulates safety properties under which speculation is correct in a fork/join model, and uses traces to validate these properties modularly on a per-thread and per-synchronization basis.

2008

1. Flattening Tuples in an Intermediate SSA Representation

For functional programs, unboxing aggregate data structures such as tuples removes memory indirections and frees dead components of the decoupled structures. To explore the consequences of such optimizations in a whole-program compiler, this paper presents a tuple flattening transformation and a framework that allows the formal study and comparison of different flattening schemes.

We present our transformation over functional SSA, a simply-typed, monomorphic language and show that the transformation is type-safe. The flattening algorithm defined by our transformation has been incorporated into MLton, a whole-program, optimizing compiler for SML. Experimental results indicate that aggressive tuple flattening can lead to substantial improvements in runtime performance, a reduction in code size, and a decrease in total allocation without a significant increase in compilation time.

  1. 2. A Uniform Transactional Execution Environment for Java


Transactional memory (TM) has recently emerged as an effective tool for extracting fine-grain parallelism from declarative critical sections. In order to make STM systems practical, significant effort has been made to integrate transactions into existing programming languages. Unfortunately, existing approaches fail to provide a simple implementation that permits lock-based and transaction-based abstractions to coexist seamlessly. Because of the fundamental semantic differences between locks and transactions, legacy applications or libraries written using locks can not be transparently used within atomic regions. To address these shortcomings, we implement a uniform transactional execution environment for Java programs in which transactions can be integrated with more traditional concurrency control constructs. Programmers can run arbitrary programs that utilize traditionalmutual-exclusion-based programming techniques, execute new programs written with explicit transactional constructs, and freely combine abstractions that use both coding styles.

  1. 3. Protocol Inference Using Static Path Profiles


Specifcation inference tools typically mine commonalities among states at relevant program points. For example, to infer the invariants that must hold at all calls to a procedure p requires examining the state abstractions found at all call-sites to p. Unfortunately, existing approaches to building these abstractions require being able to explore all paths (either static or dynamic) to all of p’s call-sites to derive specifications with any measure of confidence. Because programs that have complex control-flow structure may induce a large number of paths, naive path exploration is impractical.

In this paper, we propose a new specification inference technique that allows us to efficiently explore statically all paths to a program point. Our approach builds static path profiles, profile information constructed by a static analysis that accumulates predicates valid along different paths to a program point. To make our technique tractable, we employ a summarization scheme to merge predicates at join points based on the frequency with which they occur on different paths. For example, predicates present on a majority of static paths to all call-sites of any procedure p forms the pre-condition of p.

We have implemented a tool, Marga, based on static path profiling. Qualitative analysis of the specifications inferred by marga indicates that it is more accurate than existing static mining techniques, can be used to derive useful specification even for APIs that occur infrequently (statically) in the program, and is robust against imprecision that may arise from examination of infeasible or infrequently occurring dynamic paths. A comparison of the specifications generated using marga with a dynamic specification inference engine based on Cute, an automatic unit test generation tool, indicates that Marga generates comparably precise specifications with smaller cost.

4. Quasi-Static Scheduling for Safe Futures.


Migrating sequential programs to effectively utilize next generation multicore architectures is a key challenge facing application developers and implementors. Languages like Java that support complex control- and dataflow abstractions confound classical automatic parallelization techniques. On the other hand, introducing multithreading and concurrency control explicitly into programs can impose a high conceptual burden on the programmer, and may entail a significant rewrite of the original program.

In this paper, we consider a new technique to address this issue. Our approach makes use of futures, a simple annotation that introduces asynchronous concurrency into Java programs, but provides no concurrency control. To ensure concurrent execution does not yield behavior inconsistent with sequential execution (i.e., execution yielded by erasing all futures), we present a new interprocedural summary-based dataflow analysis. The analysis inserts lightweight barriers that block and resume threads executing futures if a dependency violation may ensue. There are no constraints on how threads execute other than those imposed by these barriers.

Our experimental results indicate futures can be leveraged to transparently ensure safety and profitably exploit parallelism; in contrast to earlier efforts, our technique is completely portable, and requires no modifications to the underlying JVM.

2007


1.Static Specification Inference Using Predicate Mining.


The reliability and correctness of complex software systems can be significantly enhanced through well-defined specifications that dictate the use of various units of abstraction (e.g., modules, or procedures). Oftentimes, however, specifications are either missing, imprecise, or simply too complex to encode within a signature, necessitating specification inference. The process of inferring specifications from complex software systems forms the focus of this paper. We describe a static inference mechanism for identifying the preconditions that must hold whenever a procedure is called. These preconditions may reflect both dataflow properties (e.g., whenever p is called, variable x must be non-null) as well as control-flow properties (e.g., every call to p must be preceded by a call to q). We derive these preconditions using an inter-procedural path-sensitive dataflow analysis that gathers predicates at each program point. We apply mining techniques to these predicates to make specification inference robust to errors. This technique also allows us to derive higher-level specifications that abstract structural similarities among predicates (e.g., procedure p is called immediately after a conditional test that checks whether some variable v is non-null.)

We describe an implementation of these techniques, and validate the effectiveness of the approach on a number of large open-source benchmarks. Experimental results confirm that our mining algorithms are efficient, and that the specifications derived are both precise and useful -- the implementation discovers several critical, yet previously, undocumented preconditions for well-tested libraries.

    2.Path-Sensitive Inference of Function Precedence Protocols.


Function precedence protocols define ordering relations among function calls in a program, and constitute an important part of a program's specification. In some instances, precedence protocols are well-understood (e.g., for example, a call to pthread_mutex_init must always be present on all program paths before a call to pthread_mutex_lock). Oftentimes, however, these protocols are neither well-documented, nor easily derived. As a result, protocol violations can lead to subtle errors that are difficult to identify and correct.

In this paper, we present Chronicler, a tool that applies scalable inter-procedural path-sensitive static analysis to automatically infer accurate function precedence protocols. Chronicler computes precedence relations based on a program's control-flow structure, integrates these relations into a repository, and analyzes them using sequence mining techniques to generate a collection of feasible precedence protocols. Deviations from these protocols found in the program are tagged as violations, and represent potential sources of bugs.

We demonstrate Chronicler's effectiveness by deriving protocols for a collection of benchmarks ranging in size from 66K to 2M lines of code. Our results not only confirm the existence of bugs in these programs due to precedence protocol violations, but also highlight the importance of path sensitivity on accuracy and scalability.

    3. Randomized Protocols for Duplicate Elimination in Peer-to-Peer Storage Systems.

    Distributed peer-to-peer systems rely on voluntary participation of peers to effectively manage a storage pool. In such systems, data is generally replicated for performance and availability. If the storage associated with replication is not monitored and provisioned, the underlying benefits may not be realized. Resource constraints, performance scalability, and availability present diverse considerations. Availability and performance scalability, in terms of response time, are improved by aggressive replication, whereas resource constraints limit total storage in the network. Identification and elimination of redundant data pose fundamental problems for such systems. In this paper, we present a novel and efficient solution that addresses availability and scalability with respect to management of redundant data. Specifically, we address the problem of duplicate elimination in the context of systems connected over an unstructured peer-to-peer network in which there is no a priori binding between an object and its location. We propose two randomized protocols to solve this problem in a scalable and decentralized fashion that does not compromise the availability requirements of the applicaiton. Performance results using both large-scale simulations and a prototype built on PlanetLab demonstrate that our protocols provide high probabilistic guarantees while incurring minimal administrative overheads.

      4.Randomized Leader Election.


    We present an efficient randomized algorithm for leader election in large-scale distributed systems that works correctly with high probability. Our algorithm is optimial in message complexity (O(n)) for a set of n total nodes) and has round complexity logarithmic in the number of nodes in the system. The algorithm relies on a balls-and-bins abstraction and works in two phases. The main novelty of the work is in the first phase, where the number of contending processes is reduced in a controlled manner. Probabilistic quorums are used to determine a winner in the second phase. We discuss, in detail, the synchronous version of the algorithm, provide extensions to an asynchronous version, and examine the impact of failures.

      5. Macroprogramming Heterogeneous Sensor Networks Using COSMOS.


    In this paper, we present COSMOS, a novel architecture for macroprogramming heterogeneous sensor network systems. Macroprogramming entails aggregate system behavior specification, as opposed to device-specific applications that indirectly express distributed behavior through explicit messaging between nodes. COSMOS is comprised of a macroprogramming language, mPL, and an operating system, mOS. mPL macroprograms specify distributed system behavior using statically verifiable compositions of reusable user-provided, or system supported functional components. mOS provides component management and a lean execution environment for mPL in heterogeneous resource-constrained sensor networks. COSMOS facilitates composition of complex real-world applications that are robust, scalable, and adaptive in dynamic data-driven sensor network environments. The mOS architecture allows runtime application instantiation, with over-the-air reprogramming of the network. An important and novel aspect of COSMOS is the ability to easily extend its component basis library to add rich macroprogramming abstractions to mPL, tailored to domain and resource constraints, with modification to the OS. A fully functional version of COSMOS is currently in use at the Bowen Labs for Structural Engineering and Purdue University, for high-fidelity structural dynamic measurements. We present comprehensive experimental evaluation using macro- and micro- benchmarks to demonstrate performance characteristics of COSMOS.

    2006


    1. Improving Duplicate Elimination in Storage Systems.

    Minimizing the amount of data that must be stored and managed is a key goal for any storage architecture that purports to be scalable. One way to achieve this goal is to avoid maintaining duplicate copies of the same data. Eliminating redundant data at the source by not writing data which has already been stored, not only reduces storage overheads, but can also improve bandwidth utilization. For these reasons, in the face of today's exponentially growing data volumes, redundant data elimination techniques have assumed critical significance in the design of modern storage systems.

    Intelligent object partitioning techniques identify data that are new when objects are updated, and transfer only those chunks to a storage server. In this paper, we propose a new object partitioning technique, called fingerdiff, that improves upon existing schemes in several important respects. Most notably fingerdiff dynamically chooses a partitioning strategy for a data object based on its similarities with previously stored objects in order to improve storage and bandwidth utilization. We present a detailed evaluation of fingerdiff, and other existing object partitioning schemes, using a set of real-world workloads. We show that for these workloads, the duplicate elimination strategies employed by fingerdiff improve storage utilization on average by 25\%, and bandwidth utilization on average by 40% over comparable techniques.

      2. Stabilizers: A Modular Checkpointing Abstraction for Concurrent Functional Programs


    Transient faults that arise in large-scale software systems can often be repaired by re-executing the code in which they occur. Ascribing a meaningful semantics for safe re-execution in multi-threaded code is not obvious, however. For a thread to correctly re-execute a region of code, it must ensure that all other threads which have witnessed its unwanted effects within that region are also reverted to a meaningful earlier state. If not done properly, data inconsistencies and other undesirable behavior may result. However, automatically determining what constitutes a consistent global checkpoint is not straightforward since thread interactions are a dynamic property of the program.

    In this paper, we present a safe and efficient checkpointing mechanism for Concurrent ML (CML) that can be used to recover from transient faults. We introduce a new linguistic abstraction called stabilizers that permits the specification of per-thread monitors and the restoration of globally consistent checkpoints. Safe global states are computed through lightweight monitoring of communication events among threads (e.g. message-passing operations or updates to shared variables).

    Our experimental results on several realistic, multithreaded, server-style CML applications, including a web server and a windowing toolkit, show that the overheads to use stabilizers are small, and lead us to conclude that they are a viable mechanism for defining safe checkpoints in concurrent functional programs.

      3.Sieve: A Tool for Automatically Detecting Variations Across Program Versions


    Software systems often undergo many revisions during their lifetime because new features are added, bugs repaired, abstractions simplified and refactored, and performance improved. When a revision, even a minor one, does occur, the changes it induces must be tested to ensure that assumed invariants in the original are not violated unintentionally. In order to avoid testing components that are unchanged across revisions, impact analysis is often used to identify those code blocks or functions that are affected by a change. In this paper, we present a new solution to this general problem that uses dynamic programming on instrumented traces of different program binaries to identify longest common subsequences in the strings generated by these traces. Our formulation not only allows us to perform impact analysis, but can also be used to detect the smallest set of locations within these functions where the effect of the changes actually manifest. Sieve is a tool that incorporates these ideas. Sieve is unobtrusive, requiring no programmer or compiler involvement to guide its behavior. Our experiments on multiple versions of open-source C programs shows that Sieve is an effective and scalable tool to identify impact sets and can locate the regions in the affected functions where the changes manifest. These results lead us to conclude that Sieve can play a beneficial role in program testing and software maintenance.

      4.Transparently Reconciling Transactions with Locking for Java Synchronization


    Concurrent data accesses in high-level languages like Java and C\# are typically mediated using mutual-exclusion locks. Threads use locks to guard the operations performed while the lock is held, so that the lock's guarded operations can never be interleaved with operations of other threads that are guarded by the same lock. This way both atomicity and isolation properties of a thread's guarded operations are enforced. Recent proposals recognize that these properties can also be enforced by concurrency control protocols that avoid well-known problems associated with locking, by transplanting notions of transactions found in database systems to a programming language context. While higher-level than locks, software transactions incur significant implementation overhead. This overhead cannot be easily masked when there is little contention on the operations being guarded. We describe how mutual-exclusion locks and transactions can be reconciled transparently within Java's monitor abstraction. We have implemented monitors for Java that execute using locks when contention is low and switch over to transactions when concurrent attempts to enter the monitor are detected. We formally argue the correctness of our solution with respect to Java's execution semantics and provide a detailed performance evaluation for different workloads and varying levels of contention. We demonstrate that our implementation has low overheads in the uncontended case (7% on average) and that significant performance improvements (up to 3X) can be achieved from running contended monitors transactionally.

      5.Trace-based Memory Aliasing Across Program Versions


    One of the major costs of software development is associated with testing and validation of successive versions of software systems. Memory aliasing is an important problem that occurs in many applications towards testing and validating multiple versions, viz., impact analysis, correlating variables across versions to ensure that existing invariants are preserved in the newer version and matching program execution histories. For example, impact analysis is often used to identify code blocks or functions that are affected by a change. Recent work in this area has focused on trace-based techniques, to better isolate affected regions. A variation of this general approach is to also consider operations on memory to generate more refined impact sets. However, the utility of such approach depends on effectively recognizing aliases. There have been some efforts aimed at the memory aliasing problem. In this paper, we address the general memory aliasing problem and present a probabilistic trace-based technique for correlating memory locations. Our approach is based on computing the log-odds ratio, which defines the affinity of locations, based on observed patterns. As part of the aliasing process, the traces for the initial test inputs are aligned without considering aliasing. From the aligned traces, the log-odds ratio of the memory locations are computed. Subsequently, aliasing is used for alignment of successive traces. Our technique can easily be extended to other applications where aliasing is necessary. As a case study, we have implemented our approach for impact analysis, for detecting variations across program versions that uses dynamic traces on memory operations. Using detailed experiments on real versions of software systems, we find a significant change in the regions affected in a function when aliasing detection is used.

      6.Revocation Techniques for Java Concurrency


    This paper proposes two approaches to managing concurrency in Java using a guarded region abstraction. Both approaches use revocation of such regions -- the ability to undo their effects automatically and transparently. These new techniques alleviate many of the constraints that inhibit construction of transparently scalable and robust concurrent applications. The first solution, revocable monitors, augments existing mutual exclusion monitors with the ability to resolve priority inversion and deadlock dynamically, by reverting program execution to a consistent state when such situations are detected, while preserving Java semantics. The second technique, transactional monitors, extends the functionality of revocable monitors by implementing guarded regions as lightweight transactions that can be executed concurrently (or in parallel on multiprocessor platforms). The presentation includes discussion of design and implementation issues for both schemes, as well as a detailed performance study to compare their behavior with the traditional, state-of-the-art implementation of Java monitors based on mutual exclusion.

      7.Dynamic State Restoration Using Versioning Exceptions


    We explore the semantics and analysis of a new kind of control structure called a versioning exception that ensures the state of the program, at the point when an exception handler is invoked, reflects the program state at the point when the handler is installed. Versioning exceptions provide a transaction-like versioning semantics to the code protected by a handler: modifications performed within the dynamic context of the corresponding handler are versioned, and committed to the store only if the computation completes normally. Similar to the role of backtracking in logic programming, this facility allows unwanted effects of computations to be discarded when exceptional or undesirable conditions are detected. We define a novel points-to analysis to efficiently track changes to the store within handler-protected scopes. The role of the analysis is to facilitate optimizations that minimize the number of locations which must be restored when a versioning exception is raised. The analysis is defined by a reachability approximation over locations that indicates which objects have been potentially modified within a handler scope. The analysis is defined for programs which support first-class procedures, locations, and exceptions.

      8.Unstructured Peer-to-Peer Networks for Sharing Processor Cycles

    Motivated by the needs and success of projects such as SETI@home and genome@home, we propose an architecture for a sustainable large-scale peer-to-peer environment for distributed cycle sharing among Internet hosts. Such networks are characterized by highly dynamic state due to high arrival and departure rates. This makes it difficult to build and maintain structured networks and to use state-based resource allocation techniques. We build our system to work in an environment similar to current file-sharing networks such as Gnutella and Freenet. In doing so, we are able to leverage vast network resources while providing resilience to random failures, low network overhead, and an open architecture for resource brokering. This paper describes the underlying analytical and algorithmic substrates based on randomization for job distribution, replication, monitoring, aggregation and oblivious resource sharing and communication between participating hosts. We support our claims of robustness and scalability analytically with high probabilistic guarantees. Our algorithms do not introduce any state dependencies, and hence are resilient to dynamic node arrivals, departures, and failures. We support all analytical claims with a detailed simulation-based evaluation of our distributed framework.

      9.Locality in Structured Peer-to-Peer Networks


    Distributed hash tables (DHTs), used in a number of structured peer-to-peer (P2P) systems provide efficient mechanisms for resource placement and location. A key distinguishing feature of current DHT systems, such as Chord, Pastry, CAN and Tapestry, is the way they handle locality in the underlying network. Topology-based node identifier assignment, proximity routing, and proximity neighbor selection are examples of heuristics used to minimize message delays in the underlying network. While these heuristics are sometimes effective, they all rely on a single global overlay that may install the key of a popular object at a node far from most of the nodes accessing it. Furthermore, a response to a lookup message does not contain any locality information about the nodes holding a copy of the object. We address these issues in Plethora, a novel two-level overlay P2P network. A local overlay in Plethora acts as a locality-aware cache for the global overlay, grouping nodes close together in the underlying network. Local overlays are constructed by exploiting the organization of the Internet into Autonomous Systems (ASs). We present a detailed experimental study that demonstrates performance gains in response time of up to 60% compared to a single global Pastry overlay. We also present efficient distributed algorithms for maintaining local overlays in the presence of node arrivals and departures.