Abstracts of Selected Papers
2008
2007
2006
2008

  1.        1.  A Uniform Transactional Execution Environment for Java

        Transactional memory (TM) has recently emerged as an effective tool
        for extracting fine-grain parallelism from declarative critical
        sections. In order to make STM systems practical, significant effort
        has been made to integrate transactions into existing programming
        languages. Unfortunately, existing approaches fail to provide a
        simple implementation that permits lock-based and
        transaction-based abstractions to coexist seamlessly. Because of
        the fundamental semantic differences between locks and transactions,
        legacy applications or libraries written using locks can not be
        transparently used within atomic regions. To address these shortcomings,
        we implement a uniform transactional execution environment for Java
        programs in which transactions can be integrated with more traditional
        concurrency control constructs. Programmers can run arbitrary programs that
        utilize traditionalmutual-exclusion-based programming techniques,
        execute new programs written with explicit transactional constructs,
        and freely combine abstractions that use both coding styles.

  1.        2.  Protocol Inference Using Static Path Profiles

        Specification inference tools typically mine commonalities
        among states at relevant program points. For example, to infer the
        invariants that must hold at all calls to a procedure p requires examining
        the state abstractions found at all call-sites to p. Unfortunately, existing
        approaches to building these abstractions require being able to explore
        all paths (either static or dynamic) to all of p’s call-sites to derive
        specifications with any measure of confidence. Because programs that
        have complex control-flow structure may induce a large number of
        paths, naive path exploration is impractical.

        In this paper, we propose a new specification inference technique that
        allows us to efficiently explore statically all paths to a program point.
        Our approach builds static path profiles, profile information constructed
        by a static analysis that accumulates predicates valid along different
        paths to a program point. To make our technique tractable, we employ
        a summarization scheme to merge predicates at join points based on
        the frequency with which they occur on different paths. For example,
        predicates present on a majority of static paths to all call-sites of any
        procedure p forms the pre-condition of p.

        We have implemented a tool, Marga, based on static path profiling.
        Qualitative analysis of the specifications inferred by marga indicates
        that it is more accurate than existing static mining techniques, can be
        used to derive useful specification even for APIs that occur infrequently
        (statically) in the program, and is robust against imprecision that may
        arise from examination of infeasible or infrequently occurring dynamic
        paths. A comparison of the specifications generated using marga with a
        dynamic specification inference engine based on Cute, an automatic unit
        test generation tool, indicates that Marga generates comparably precise
        specifications with smaller cost.


        3. Quasi-Static Scheduling for Safe Futures.

        Migrating sequential programs to effectively utilize next generation
        multicore architectures is a key challenge facing application
        developers and implementors.  Languages like Java that support complex
        control- and dataflow abstractions confound classical automatic
        parallelization techniques.  On the other hand, introducing
        multithreading and concurrency control explicitly into programs can
        impose a high conceptual burden on the programmer, and may entail a
        significant rewrite of the original program.

        In this paper, we consider a new technique to address this issue.  Our
        approach makes use of futures, a simple annotation that
        introduces asynchronous concurrency into Java programs, but provides
        no concurrency control.  To ensure concurrent execution does not yield
        behavior inconsistent with sequential execution (i.e., execution
        yielded by erasing all futures), we present a new interprocedural
        summary-based dataflow analysis.  The analysis inserts lightweight
        barriers that block and resume threads executing futures if a
        dependency violation may ensue.  There are no constraints on how
        threads execute other than those imposed by these barriers.

        Our experimental results indicate futures can be leveraged to
        transparently ensure safety and profitably exploit parallelism; in
        contrast to earlier efforts, our technique is completely portable, and
        requires no modifications to the underlying JVM.

        4. Memoizing Multi-Threaded Transactions

        There has been much recent interest in using transactions to simplify
        concurrent programming, improve scalability, and increase performance.
        When a transaction must abort due to a serializability violation,
        deadlock, or resource exhaustion, its effects are revoked, and the transaction
        re-executed.  For long-lived transactions, however, the cost of aborts
        and re-execution can be prohibitive.  To ensure performance,
        programmers are often forced to reason about transaction lifetimes and
        interactions while structuring their code, defeating the simplicity
        transactions purport to provide.

        One way to reduce the overheads of re-executing a failed transaction
        is to avoid re-executing those operations that were unaffected by the
        violation(s) that induced the abort.  Memoization is one way to
        capitalize on re-execution savings, especially if violations are not
        pervasive.  Within a transaction, if a procedure p is applied with
        argument v, and the transaction subsequently aborts, p need only
        be re-evaluated if its argument when the transaction is retried is
        different from v

        In this paper, we consider the memoization problem for transactions in
        the context of Concurrent ML (CML).  Our desig supports multi-threaded
        transactions which allow internal communication through synchronous
        channel-based communication.  The challenge to memoization in this
        context is ensuring that communication actions performed by memoized
        procedures in the original (aborted) execution can be satisfied when the
        transaction is retried.

        We validate the effectiveness of our approach using STMBench7, a
        customizable transaction benchmark. Our results indicate that memoization
        for CML-based transactions can lead to substantial reduction in re-execution costs
        (up to 45 on some configurations), with low memory overheads.


2007
 


        The reliability and correctness of complex software systems can be significantly enhanced
        through well-defined specifications that dictate the use of various units of abstraction (e.g.,             modules, or procedures). Oftentimes, however, specifications are either missing, imprecise, or         simply too complex to encode within a signature, necessitating specification inference. The             process of inferring specifications from complex software systems forms the focus of this                 paper. We describe a static inference mechanism for identifying the preconditions that must             hold whenever a procedure is called. These preconditions may reflect both dataflow                         properties (e.g., whenever p is called, variable x must be non-null) as well as control-flow             properties (e.g., every call to p must be preceded by a call to q).  We derive thes preconditions         using an inter-procedural path-sensitive dataflow analysis that gathers predicates at each                  program point. We apply mining techniques to these predicates to make specification                     inference robust to errors.  This technique also allows us to derive higher-level specifications         that abstract structural similarities among predicates (e.g., procedure p is called immediately             after a conditional test that checks whether  some variable v is non-null.)

        We describe an implementation of these techniques, and validate the effectiveness of the                 approach on a number of large open-source benchmarks.  Experimental results confirm that             our mining algorithms are efficient, and that the specifications derived are both precise
        and useful -- the implementation discovers several critical, yet previously, undocumented                 preconditions for well-tested libraries.



        Function precedence protocols define ordering relations among function calls in a program,             and constitute an important part of a program's specification.  In some instances, precedence         protocols are well-understood (e.g., for example, a call to pthread_mutex_init must         always be present on all program paths before a call to pthread_mutex_lock).                      Oftentimes, however, these protocols are neither well-documented, nor easily derived. As a             result, protocol violations can lead to subtle errors that are difficult to identify and correct.

        In this paper, we present Chronicler,  a tool that applies scalable inter-procedural                             path-sensitive static analysis to automatically infer accurate function precedence protocols.              Chronicler computes precedence relations based on a program's control-flow structure,
        integrates these relations into a repository, and analyzes them using sequence mining                     techniques to generate a collection of feasible precedence protocols.  Deviations from these             protocols found in the program are tagged as violations, and represent potential sources of             bugs.

        We demonstrate \name's effectiveness by deriving protocols for a collection of benchmarks             ranging in size from 66K to 2M lines of code. Our results not only confirm the existence of             bugs in these programs due to precedence protocol violations, but also highlight the
        importance of path sensitivity on accuracy and scalability.


  1. Distributed peer-to-peer systems rely on voluntary participation of peers to effectively
      1.        manage a storage pool.  In such systems, data is generally replicated for performance and availability.  If the storage associated with replication is not monitored and provisioned, the underlying benefits may not be realized.  Resource constraints, performance scalability, and availability present diverse considerations.  Availability and performance scalability, in terms of response time, are improved by aggressive replication, whereas resource constraints limit total storage in the network.  Identification and elimination of redundant data pose fundamental problems for such systems.  In this paper, we present a novel and efficient solution that addresses availability and scalability with respect to management of redundant data.  Specifically, we address the problem of duplicate elimination in the context of systems connected over an unstructured peer-to-peer network in which there is no a priori binding between an object and its location.  We propose two randomized protocols to solve this problem in a scalable and decentralized fashion that does not compromise the availability requirements of the applicaiton.  Performance results using both large-scale simulations and a prototype built on PlanetLab demonstrate that our protocols provide high probabilistic guarantees while incurring minimal administrative overheads.



       We present an efficient randomized algorithm for leader election in large-scale distributed                 systems that works correctly with high probability.  Our algorithm is optimial in message                 complexity (O(n)) for a set of n total nodes) and has round complexity logarithmic in the                 number of nodes in the system.  The algorithm relies on a balls-and-bins abstraction and                 works in two phases.  The main novelty of the work is in the first phase, where the number of         contending processes is reduced in a controlled manner.  Probabilistic quorums are used to             determine a winner in the second phase.  We discuss, in detail, the synchronous version of the         algorithm, provide extensions to an asynchronous version, and examine the impact of failures.



        In this paper, we present COSMOS, a novel architecture for macroprogramming                             heterogeneous sensor network systems.  Macroprogramming entails aggregate system                     behavior specification, as opposed to device-specific applications that indirectly express                 distributed behavior through explicit messaging between nodes.  COSMOS is comprised of a         macroprogramming language, mPL, and an operating system, mOS.  mPL macroprograms             specify distributed system behavior using statically verifiable compositions of reusable                     user-provided, or system supported functional components.  mOS provides component                 management and a lean execution environment for mPL in heterogeneous                                         resource-constrained sensor networks.  COSMOS facilitates composition of complex                     real-world applications that are robust, scalable, and adaptive in dynamic data-driven sensor             network environments.  The mOS architecture allows runtime application instantiation, with         over-the-air reprogramming of the network.  An important and novel aspect of COSMOS is         the ability to easily extend its component basis library to add rich macroprogramming                     abstractions to mPL, tailored to domain and resource constraints, with modification to the                 OS.  A fully functional version of COSMOS is currently in use at the Bowen Labs for                     Structural Engineering and Purdue University, for high-fidelity structural dynamic                         measurements.  We present comprehensive experimental evaluation using macro- and micro-         benchmarks to demonstrate performance characteristics of COSMOS.

2006


       1. Improving Duplicate Elimination in Storage Systems.

        Minimizing the amount of data that must be stored and managed is a key goal for any storage         architecture that purports to be scalable.  One way to achieve this goal is to avoid maintaining         duplicate copies of the same data. Eliminating redundant data at the source by not writing data         which has already been stored, not only reduces storage overheads, but can also improve                 bandwidth utilization.  For these reasons, in the face of today's exponentially growing data             volumes, redundant data elimination techniques have assumed critical significance in the                 design of modern storage systems.
                                                                                                                                            
        Intelligent object partitioning techniques identify data that are {\em new} when objects are             updated, and transfer only those chunks to a storage server.  In this paper, we propose a new         object partitioning technique, called  fingerdiff, that improves upon existing schemes in several         important respects.  Most notably fingerdiff dynamically chooses a partitioning strategy for a         data object based on its similarities with previously stored  objects in order to improve storage         and bandwidth utilization. We present a detailed evaluation of  fingerdiff, and other existing             object partitioning schemes, using a set of real-world workloads. We show that for these                 workloads, the duplicate elimination strategies employed by fingerdiff improve storage                     utilization on average by 25\%, and bandwidth utilization on average by 40% over                         comparable techniques.


        Transient faults that arise in large-scale software systems can often be repaired by                             re-executing the code in which they occur.  Ascribing a meaningful semantics for safe                     re-execution in multi-threaded code is not obvious, however.  For a thread to correctly                     re-execute a region of code, it must ensure that all other threads which have witnessed its                 unwanted effects within that region are also reverted to a meaningful earlier state.  If not done         properly, data inconsistencies and other undesirable behavior may result.  However,                         automatically determining what constitutes a consistent global checkpoint is not                             straightforward since thread interactions are a dynamic property of the program.

        In this paper, we present a safe and efficient checkpointing mechanism for Concurrent ML             (CML) that can be used to recover from transient faults.  We introduce a new linguistic                 abstraction called stabilizers that permits the specification of per-thread monitors and the                 restoration of globally consistent checkpoints.  Safe global states are computed through                 lightweight monitoring of communication events among threads (e.g. message-passing                     operations or updates to shared variables).

        Our experimental results on several realistic, multithreaded, server-style CML applications,             including a web server and a windowing toolkit, show that the overheads to use stabilizers are         small, and lead us to conclude that they are a viable mechanism for defining safe checkpoints         in concurrent functional programs.


        Software systems often undergo many revisions during their lifetime because new features are         added, bugs repaired, abstractions simplified and refactored, and performance improved.                 When a revision, even a minor one, does occur, the changes it induces must be tested to                 ensure that assumed invariants in the original are not violated unintentionally.  In order to                 avoid testing components that are unchanged across revisions, impact analysis is often used to         identify those code blocks or functions that are affected by a change. In this paper, we present         a new solution to this general problem that uses dynamic programming on instrumented traces         of different program binaries to identify longest common subsequences in the strings                     generated by these traces. Our formulation not only allows us to perform impact analysis, but         can also be used to detect the smallest set of locations within these functions where the effect         of the changes actually manifest. Sieve is a tool that incorporates these ideas. Sieve is                     unobtrusive, requiring no programmer or compiler involvement to guide its behavior. Our             experiments on multiple versions of open-source C programs shows that Sieve is an effective         and scalable tool to identify impact sets and can locate the regions in the affected functions             where the changes manifest. These results lead us to conclude that Sieve can play a beneficial         role in program testing and software maintenance.


        Concurrent data accesses in high-level languages like Java and C\# are typically mediated             using mutual-exclusion locks. Threads use locks to guard the operations performed                         while the lock is held, so that the lock's guarded operations can never be interleaved with                 operations of other threads that are guarded by the same lock. This way both atomicity and             isolation properties of a thread's guarded operations are enforced. Recent proposals recognize         that these properties can also be enforced by concurrency control protocols that avoid                     well-known problems associated with locking, by transplanting notions of transactions found         in database systems to a programming language context.  While higher-level than locks,
        software transactions incur significant implementation overhead.  This overhead cannot be             easily masked when there is little contention on the operations being guarded.

        We describe how mutual-exclusion locks and transactions can be reconciled transparently             within Java's monitor abstraction.  We have implemented monitors for Java that execute using         locks when contention is low and switch over to transactions when concurrent attempts to             enter the monitor are detected.  We formally argue the correctness of our solution with respect         to Java's execution semantics and provide a detailed performance evaluation for different                 workloads and varying levels of contention. We demonstrate that our implementation has low         overheads in the uncontended case (7% on average) and that significant performance                     improvements (up to 3X) can be achieved from running contended monitors                                     transactionally.


        One of the major costs of software development is associated with testing and validation of             successive versions of software systems. Memory aliasing is an important problem that occurs         in many applications towards testing and validating multiple versions, viz., impact analysis,             correlating variables across versions to ensure that existing invariants are preserved in the                 newer version and matching program execution histories. For example, impact analysis is             often used to identify code blocks or functions that are affected by a change. Recent work in         this area has focused on trace-based techniques, to better isolate affected regions.  A variation         of this general approach is to also consider operations on memory to generate more refined             impact sets. However, the utility of such approach depends on effectively recognizing aliases.

        There have been some efforts aimed at the memory aliasing problem. In this paper, we                     address the general memory aliasing problem and present a probabilistic trace-based technique         for correlating memory locations. Our approach is based on computing the log-odds ratio,             which defines the affinity of locations, based on observed patterns. As part of the aliasing             process, the traces for the initial test inputs are aligned without considering aliasing. From the         aligned traces, the log-odds ratio of the memory locations are computed. Subsequently,                 aliasing is used for alignment of successive traces. Our technique can easily be extended to             other applications where aliasing is necessary. As a case study, we have implemented our             approach for impact analysis, for detecting variations across program versions that uses                 dynamic traces on memory operations. Using detailed experiments on real versions of                     software systems, we find a significant change in the regions affected in a function when                 aliasing detection is used.


        This paper proposes two approaches to managing concurrency in Java using a guarded                 region abstraction.  Both approaches use revocation of such regions -- the ability to undo their         effects automatically and transparently.  These new techniques alleviate many of the                         constraints that inhibit construction of transparently scalable and robust concurrent                         applications.  The first solution, revocable monitors, augments existing mutual exclusion                 monitors with the ability to resolve priority inversion and deadlock dynamically, by reverting         program execution to a consistent state when such situations are detected, while preserving             Java semantics.  The second technique, transactional monitors, extends the functionality of             revocable monitors by implementing guarded regions as lightweight transactions that can be         executed concurrently (or in parallel on multiprocessor platforms).  The presentation includes         discussion of design and implementation issues for both schemes, as well as a detailed                     performance study to compare their behavior with the traditional, state-of-the-art                             implementation of Java monitors based on mutual exclusion.


        We explore the semantics and analysis of a new kind of control structure called a versioning         exception that ensures the state of the program, at the point when an exception handler is                 invoked, reflects the program state at the point when the handler is installed.  Versioning                 exceptions provide a transaction-like versioning semantics to the code protected by a handler:         modifications performed within the dynamic context of the corresponding handler are                     versioned, and committed to the store only if the computation completes normally.  Similar to         the role of backtracking in logic programming, this facility allows unwanted effects of                     computations to be discarded when exceptional or undesirable conditions are detected.

        We define a novel points-to analysis to efficiently track changes to the store within                         handler-protected scopes.  The role of the analysis is to facilitate optimizations that minimize         the number of locations which must be restored when a versioning exception is raised.  The             analysis is defined by a reachability approximation over locations that indicates which objects         have been potentially modified within a handler scope.  The analysis is defined for programs         which support first-class procedures, locations, and exceptions.

        8.Unstructured Peer-to-Peer Networks for Sharing Processor Cycles,

       Motivated by the  needs and success of  projects such as SETI@home and genome@home,             we propose an architecture  for a sustainable large-scale peer-to-peer environment for                     distributed  cycle sharing among Internet hosts. Such networks are characterized  by highly             dynamic state due to high arrival and departure rates. This makes it difficult to build and
        maintain  structured   networks  and   to  use   state-based  resource allocation techniques.  We         build our system to  work in an environment similar  to current file-sharing    networks  such          as Gnutella    and Freenet. In doing  so, we are able  to leverage vast network resources
        while providing resilience to  random failures, low network  overhead, and an open                         architecture for resource brokering.  This paper describes the underlying   analytical   and                 algorithmic   substrates  based   on randomization  for    job   distribution,  replication,                    monitoring, aggregation and oblivious  resource sharing and  communication between                     participating  hosts.    We support   our  claims of    robustness and scalability   analytically             with  high  probabilistic  guarantees.  Our algorithms do  not introduce  any state                              dependencies,  and hence  are resilient to dynamic  node  arrivals, departures,  and failures.            We support all    analytical claims   with a   detailed  simulation-based evaluation of our                 distributed framework.


        Distributed  hash    tables (DHTs), used in     a number of structured peer-to-peer (P2P)                  systems provide efficient mechanisms  for resource placement and location.  A  key                         distinguishing feature of  current DHT systems,  such as  Chord,  Pastry, CAN  and Tapestry,         is  the way they handle locality   in  the  underlying network.    Topology-based  node                    identifier assignment,  proximity   routing, and  proximity   neighbor selection  are examples         of  heuristics used to minimize message delays in  the underlying  network.   While  these              heuristics are  sometimes effective, they all rely on  a single global  overlay that may install             the  key of a  popular object at  a node  far  from most  of the nodes accessing  it.                             Furthermore,  a response  to a lookup  message does not contain any locality information                 about the nodes holding a copy of the object.    We address  these   issues in Plethora,   a                 novel two-level overlay  P2P network.    A  local  overlay in  Plethora acts as a locality-aware         cache for  the   global overlay, grouping nodes close together in the underlying network.                  Local overlays are constructed by exploiting the  organization of the   Internet into                         Autonomous Systems (ASs).  We  present  a detailed  experimental  study that demonstrates         performance gains in response time of up to 60%  compared to a single global  Pastry                      overlay.    We also    present  efficient  distributed algorithms  for maintaining local   overlays         in the  presence  of node arrivals and departures.