2008
- 1. A Uniform
Transactional Execution Environment for Java
Transactional memory (TM) has recently emerged as an effective tool
for extracting fine-grain parallelism from declarative critical
sections. In order to make STM systems practical, significant effort
has been made to integrate transactions into existing programming
languages. Unfortunately, existing approaches fail to provide a
simple implementation that permits lock-based and
transaction-based abstractions to coexist seamlessly. Because of
the fundamental semantic differences between locks and transactions,
legacy applications or libraries written using locks can not be
transparently used within atomic regions. To address these shortcomings,
we implement a uniform transactional execution environment for Java
programs in which transactions can be
integrated with more traditional
concurrency control constructs. Programmers can run arbitrary programs that
utilize traditionalmutual-exclusion-based programming techniques,
execute new programs written with explicit transactional constructs,
and freely combine abstractions that use both coding styles.
- 2. Protocol
Inference Using Static Path Profiles
Specification inference tools
typically mine commonalities
among states at relevant program
points. For example, to infer the
invariants that must hold at all
calls to a procedure
p
requires examining
the state abstractions found at
all call-sites to p. Unfortunately, existing
approaches to building these
abstractions require being able to explore
all paths (either static or
dynamic) to all of
p’s
call-sites to derive
specifications with any measure of
confidence. Because programs that
have complex control-flow
structure may induce a large number of
paths, naive path exploration is
impractical.
In this paper, we propose a new
specification inference technique that
allows us to efficiently explore
statically all paths to a program point.
Our approach builds static path
profiles, profile information constructed
by a static analysis that
accumulates predicates valid along different
paths to a program point. To make
our technique tractable, we employ
a summarization scheme to merge
predicates at join points based on
the frequency with which they
occur on different paths. For example,
predicates present on a
majority of static paths to all
call-sites of any
procedure
p forms the pre-condition of
p.
We have implemented a tool,
Marga, based on static path profiling.
Qualitative analysis of the
specifications inferred by marga indicates
that it is more accurate than
existing static mining techniques, can be
used to derive useful
specification even for APIs that occur infrequently
(statically) in the program, and
is robust against imprecision that may
arise from examination of
infeasible or infrequently occurring dynamic
paths. A comparison of the
specifications generated using marga with a
dynamic specification inference
engine based on Cute, an automatic unit
test generation tool, indicates
that Marga generates comparably precise
specifications with smaller cost.
3. Quasi-Static
Scheduling for Safe Futures.
Migrating sequential programs to effectively utilize next generation
multicore architectures is a key
challenge facing application
developers and
implementors. Languages like Java that support complex
control- and dataflow
abstractions confound classical automatic
parallelization techniques.
On the other hand, introducing
multithreading and concurrency
control explicitly into programs can
impose a high conceptual burden
on the programmer, and may entail a
significant rewrite of the
original program.
In this paper, we consider a new
technique to address this issue. Our
approach makes use of futures, a simple annotation that
introduces asynchronous
concurrency into Java programs, but provides
no concurrency control. To
ensure concurrent execution does not yield
behavior inconsistent with
sequential execution (i.e., execution
yielded by erasing all futures),
we present a new interprocedural
summary-based dataflow
analysis. The analysis inserts lightweight
barriers that block and resume
threads executing futures if a
dependency violation may
ensue. There are no constraints on how
threads execute other than those
imposed by these barriers.
Our experimental results indicate
futures can be leveraged to
transparently ensure safety and
profitably exploit parallelism; in
contrast to earlier efforts, our
technique is completely portable, and
requires no modifications to the
underlying JVM.
4.
Memoizing Multi-Threaded Transactions
There has been much recent
interest in using transactions to simplify
concurrent programming, improve
scalability, and increase performance.
When a transaction must abort due
to a serializability violation,
deadlock, or resource exhaustion,
its effects are revoked, and the transaction
re-executed. For long-lived
transactions, however, the cost of aborts
and re-execution can be
prohibitive. To ensure performance,
programmers are often forced to
reason about transaction lifetimes and
interactions while structuring
their code, defeating the simplicity
transactions purport to provide.
One way to reduce the overheads
of re-executing a failed transaction
is to avoid re-executing those
operations that were unaffected by the
violation(s) that induced the
abort. Memoization is one way to
capitalize on re-execution
savings, especially if violations are not
pervasive. Within a
transaction, if a procedure p
is applied with
argument v, and the transaction subsequently
aborts, p need only
be re-evaluated if its argument
when the transaction is retried is
different from v.
In this paper, we consider the
memoization problem for transactions in
the context of Concurrent ML
(CML). Our desig supports multi-threaded
transactions which allow
internal communication through synchronous
channel-based
communication. The challenge to memoization in this
context is ensuring that
communication actions performed by memoized
procedures in the original
(aborted) execution can be satisfied when the
transaction is retried.
We validate the effectiveness of
our approach using STMBench7, a
customizable transaction
benchmark. Our results indicate that memoization
for CML-based transactions
can lead to substantial reduction in re-execution costs
(up to 45 on some
configurations), with low memory overheads.
2007
-
The reliability and correctness
of complex software systems can be significantly enhanced
through well-defined
specifications that dictate the use of various units of abstraction
(e.g.,
modules, or procedures). Oftentimes, however, specifications are either
missing, imprecise, or simply too
complex to encode within a signature, necessitating specification
inference. The
process of inferring specifications from complex software systems forms
the focus of this
paper. We describe a static
inference mechanism for identifying the preconditions that must
hold whenever
a procedure is called. These preconditions may reflect both dataflow
properties
(e.g., whenever
p is called,
variable
x must be non-null)
as well as control-flow
properties (e.g., every call to
p must be preceded by a call to
q). We derive thes
preconditions using an
inter-procedural path-sensitive dataflow analysis that gathers
predicates at each
program point. We apply mining
techniques to these predicates to make specification
inference robust to errors. This technique
also allows us to derive higher-level specifications
that abstract structural similarities among
predicates (e.g., procedure
p
is called immediately
after a conditional test that checks whether
some variable
v is non-null.)
We describe an implementation of
these techniques, and validate the effectiveness of the
approach on a number of large open-source
benchmarks. Experimental results confirm that
our mining algorithms are
efficient, and that the specifications derived are both precise
and useful -- the implementation
discovers several critical, yet previously, undocumented
preconditions for well-tested libraries.
-
Function precedence protocols
define ordering relations among function calls in a program,
and constitute
an important part of a program's specification. In some
instances, precedence protocols
are well-understood (e.g., for example, a call to
pthread_mutex_init must
always be present on all program
paths before a call to
pthread_mutex_lock).
Oftentimes, however, these
protocols are neither well-documented, nor easily derived. As a
result,
protocol violations can lead to subtle errors that are difficult to
identify and correct.
In this paper, we present
Chronicler, a tool that applies scalable inter-procedural
path-sensitive static analysis to automatically
infer accurate function precedence protocols.
Chronicler computes precedence
relations based on a program's control-flow structure,
integrates these relations into a
repository, and analyzes them using sequence mining
techniques to generate a collection of feasible
precedence protocols. Deviations from these
protocols found in the program
are tagged as violations, and represent potential sources of
bugs.
We demonstrate \name's
effectiveness by deriving protocols for a collection of benchmarks
ranging in
size from 66K to 2M lines of code. Our results not only confirm the
existence of
bugs in these programs due to precedence protocol violations, but also
highlight the
importance of path sensitivity on
accuracy and scalability.
-
- Distributed peer-to-peer systems rely on voluntary
participation of peers to effectively
-
manage a storage pool. In such systems, data is generally
replicated for performance and availability. If the storage
associated with replication is not monitored and provisioned, the
underlying benefits may not be realized. Resource constraints,
performance scalability, and availability present diverse
considerations. Availability and performance scalability, in
terms of response time, are improved by aggressive replication, whereas
resource constraints limit total storage in the network.
Identification and elimination of redundant data pose fundamental
problems for such systems. In this paper, we present a novel and
efficient solution that addresses availability and scalability with
respect to management of redundant data. Specifically, we address
the problem of duplicate elimination in the context of systems
connected over an unstructured
peer-to-peer network in which there is no a priori binding between an
object and its location. We propose two randomized protocols to
solve this problem in a scalable and decentralized fashion that does
not compromise the availability requirements of the applicaiton.
Performance results using both large-scale simulations and a prototype
built on PlanetLab demonstrate that our protocols provide high
probabilistic guarantees while incurring minimal administrative
overheads.
-
-
We present an efficient randomized
algorithm for leader election in large-scale distributed
systems that works correctly with high
probability. Our algorithm is optimial in message
complexity (
O(n))
for a set of n total nodes) and has round complexity logarithmic in the
number of nodes in the system. The algorithm
relies on a
balls-and-bins
abstraction and
works in two phases. The
main novelty of the work is in the first phase, where the number of
contending processes is reduced
in a controlled manner. Probabilistic quorums are used to
determine a
winner in the second phase. We discuss, in detail, the
synchronous version of the
algorithm, provide extensions to an asynchronous version, and examine
the impact of failures.
-
In this paper, we present COSMOS,
a novel architecture for macroprogramming
heterogeneous
sensor network systems. Macroprogramming entails aggregate system
behavior specification, as
opposed to device-specific applications that indirectly express
distributed behavior through explicit messaging
between nodes. COSMOS is comprised of a
macroprogramming language, mPL, and an operating
system, mOS. mPL macroprograms
specify distributed system
behavior using statically verifiable compositions of reusable
user-provided, or system
supported functional components. mOS provides component
management and a lean execution environment for mPL
in heterogeneous
resource-constrained sensor
networks. COSMOS facilitates composition of complex
real-world applications that are
robust, scalable, and adaptive in dynamic data-driven sensor
network
environments. The mOS architecture allows runtime application
instantiation, with over-the-air
reprogramming of the network. An important and novel aspect of
COSMOS is the ability to easily
extend its component basis library to add rich macroprogramming
abstractions to mPL, tailored to
domain and resource constraints, with modification to the
OS. A fully functional version of COSMOS is
currently in use at the Bowen Labs for
Structural Engineering and Purdue University, for
high-fidelity structural dynamic
measurements. We present comprehensive
experimental evaluation using macro- and micro-
benchmarks to demonstrate performance
characteristics of COSMOS.
1.
Improving
Duplicate Elimination in Storage Systems.
Minimizing the amount of data
that must be stored and managed is a key goal for any storage
architecture that purports to be
scalable. One way to achieve this goal is to avoid maintaining
duplicate copies of the same
data. Eliminating redundant data at the source by not writing data
which has already been stored,
not only reduces storage overheads, but can also improve
bandwidth utilization. For these reasons, in
the face of today's exponentially growing data
volumes, redundant data
elimination techniques have assumed critical significance in the
design of modern storage systems.
Intelligent object partitioning
techniques identify data that are {\em new} when objects are
updated, and
transfer only those chunks to a storage server. In this paper, we
propose a new object partitioning
technique, called
fingerdiff,
that improves upon existing schemes in several
important respects. Most notably
fingerdiff dynamically chooses a
partitioning strategy for a data
object based on its similarities with previously stored objects
in order to improve storage and
bandwidth utilization. We present a detailed evaluation of
fingerdiff, and other existing
object
partitioning schemes, using a set of real-world workloads. We show that
for these
workloads, the duplicate elimination strategies
employed by
fingerdiff
improve storage
utilization on
average by 25\%, and bandwidth utilization on average by 40% over
comparable
techniques.
-
Transient faults that arise in
large-scale software systems can often be repaired by
re-executing the code in which they occur.
Ascribing a meaningful semantics for safe
re-execution in multi-threaded code is not obvious,
however. For a thread to correctly
re-execute a region of code, it must ensure that all
other threads which have witnessed its
unwanted
effects within that region are also reverted to a meaningful earlier
state. If not done
properly, data inconsistencies and other undesirable behavior may
result. However,
automatically determining what constitutes a
consistent global checkpoint is not
straightforward since thread interactions are a dynamic property of the
program.
In this paper, we present a safe
and efficient checkpointing mechanism for Concurrent ML
(CML) that can
be used to recover from transient faults. We introduce a new
linguistic
abstraction called
stabilizers
that permits the specification of per-thread monitors and the
restoration of globally consistent
checkpoints. Safe global states are computed through
lightweight monitoring of communication events among
threads (e.g. message-passing
operations or
updates to shared variables).
Our experimental results on
several realistic, multithreaded, server-style CML applications,
including a
web server and a windowing toolkit, show that the overheads to use
stabilizers are small, and lead
us to conclude that they are a viable mechanism for defining safe
checkpoints in concurrent
functional programs.
-
Software systems often undergo many
revisions during their lifetime because new features are
added, bugs repaired,
abstractions simplified and refactored, and performance improved.
When a revision, even a minor one, does occur, the
changes it induces must be tested to
ensure that
assumed invariants in the original are not violated
unintentionally. In order to
avoid testing
components that are unchanged across revisions, impact analysis is
often used to identify those code
blocks or functions that are affected by a change. In this paper, we
present a new solution to this
general problem that uses dynamic programming on instrumented traces
of different program binaries to
identify longest common subsequences in the strings
generated by these traces. Our formulation not only
allows us to perform impact analysis, but
can also be used to detect the smallest set of
locations within these functions where the effect
of the changes actually manifest. Sieve is a tool
that incorporates these ideas. Sieve is
unobtrusive, requiring no programmer or compiler
involvement to guide its behavior. Our
experiments on multiple versions
of open-source C programs shows that Sieve is an effective
and scalable tool to identify
impact sets and can locate the regions in the affected functions
where the
changes manifest. These results lead us to conclude that Sieve can play
a beneficial role in program
testing and software maintenance.
-
Concurrent data accesses in
high-level languages like Java and C\# are typically mediated
using
mutual-exclusion locks. Threads use locks to
guard the operations performed
while the lock
is held, so that the lock's guarded operations can never be interleaved
with
operations of other threads that are guarded by the
same lock. This way both
atomicity
and
isolation properties of a
thread's guarded operations are enforced. Recent proposals recognize
that these properties can also be
enforced by concurrency control protocols that avoid
well-known problems associated with locking, by
transplanting notions of
transactions
found in database systems to a
programming language context. While higher-level than locks,
software transactions incur
significant implementation overhead. This overhead cannot be
easily masked
when there is little contention on the operations being guarded.
We describe how mutual-exclusion
locks and transactions can be reconciled transparently
within Java's
monitor abstraction. We have implemented monitors for Java that
execute using locks when
contention is low and switch over to transactions when concurrent
attempts to
enter the monitor are detected. We formally argue the correctness
of our solution with respect to
Java's execution semantics and provide a detailed performance
evaluation for different
workloads and varying levels of
contention. We demonstrate that our implementation has low
overheads in the uncontended case
(7% on average) and that significant performance
improvements (up to 3X) can be achieved from running
contended monitors
transactionally.
-
One of the major costs of
software development is associated with testing and validation of
successive
versions of software systems. Memory aliasing is an important problem
that occurs in many applications
towards testing and validating multiple versions, viz., impact
analysis,
correlating variables across versions to ensure that existing
invariants are preserved in the
newer version and matching
program execution histories. For example, impact analysis is
often used to
identify code blocks or functions that are affected by a change. Recent
work in this area has focused on
trace-based techniques, to better isolate affected regions. A
variation of this general
approach is to also consider operations on memory to generate more
refined impact
sets. However, the utility of such approach depends on effectively
recognizing aliases.
There have been some efforts
aimed at the memory aliasing problem. In this paper, we
address the general memory
aliasing problem and present a probabilistic trace-based technique
for correlating memory locations.
Our approach is based on computing the log-odds ratio,
which defines
the affinity of locations, based on observed patterns. As part of the
aliasing
process, the traces for the initial test inputs are aligned without
considering aliasing. From the
aligned traces, the log-odds ratio of the memory locations are
computed. Subsequently,
aliasing is used for alignment of
successive traces. Our technique can easily be extended to
other
applications where aliasing is necessary. As a case study, we have
implemented our
approach for impact analysis, for detecting
variations across program versions that uses
dynamic traces
on memory operations. Using detailed experiments on real versions of
software systems, we find a
significant change in the regions affected in a function when
aliasing detection is used.
-
This paper proposes two approaches to
managing concurrency in Java using a
guarded
region abstraction. Both approaches use
revocation of such regions --
the ability to
undo their
effects automatically and
transparently. These new techniques alleviate many of the
constraints
that inhibit construction of transparently scalable and robust
concurrent
applications. The first solution,
revocable monitors, augments
existing mutual exclusion
monitors with the ability to
resolve priority inversion and deadlock dynamically, by reverting
program execution to a consistent
state when such situations are detected, while preserving
Java
semantics. The second technique,
transactional monitors, extends the
functionality of
revocable monitors by implementing guarded regions
as lightweight transactions that can be
executed concurrently (or in parallel on
multiprocessor platforms). The presentation includes
discussion of design and
implementation issues for both schemes, as well as a detailed
performance study to compare
their behavior with the traditional, state-of-the-art
implementation of Java monitors based on mutual
exclusion.
-
We explore the semantics and
analysis of a new kind of control structure called a
versioning
exception that ensures the state of the program, at
the point when an exception handler is
invoked,
reflects the program state at the point when the handler is
installed. Versioning
exceptions provide a
transaction-like versioning semantics to the code protected by a
handler: modifications performed
within the dynamic context of the corresponding handler are
versioned, and committed to the
store only if the computation completes normally. Similar to
the role of backtracking in logic
programming, this facility allows unwanted effects of
computations to be discarded when
exceptional or undesirable conditions are detected.
We define a novel points-to
analysis to efficiently track changes to the store within
handler-protected scopes. The role of the analysis is to
facilitate optimizations that minimize
the number of locations which must be restored when
a versioning exception is raised. The
analysis is defined by a
reachability approximation over locations that indicates which objects
have been potentially modified
within a handler scope. The analysis is defined for programs
which support first-class
procedures, locations, and exceptions.
8.Unstructured
Peer-to-Peer Networks for Sharing Processor Cycles,
Motivated by the needs and
success of projects such as SETI@home and genome@home,
we propose an
architecture for a sustainable large-scale peer-to-peer
environment for
distributed cycle sharing among Internet hosts. Such networks are
characterized by highly
dynamic state due to high arrival and departure
rates. This makes it difficult to build and
maintain
structured networks and to
use state-based resource allocation techniques.
We build our system to work
in an environment similar to current
file-sharing networks such
as Gnutella and
Freenet. In doing so, we are able to leverage vast network
resources
while providing resilience
to random failures, low network overhead, and an open
architecture
for resource brokering. This paper describes the
underlying analytical and
algorithmic substrates based on
randomization for job
distribution, replication,
monitoring,
aggregation and oblivious resource sharing and
communication between
participating hosts. We support
our claims of robustness and
scalability analytically
with high
probabilistic guarantees. Our algorithms do not
introduce any state
dependencies, and
hence are resilient to dynamic node arrivals,
departures, and failures.
We support all analytical
claims with a detailed simulation-based
evaluation of our
distributed framework.
-
-
Distributed
hash tables (DHTs), used in a
number of structured peer-to-peer (P2P)
systems
provide efficient mechanisms for resource placement and
location. A key
distinguishing feature of current DHT
systems, such as Chord, Pastry, CAN and
Tapestry, is the way they
handle locality in the underlying
network. Topology-based node
identifier assignment, proximity
routing, and proximity neighbor selection are
examples of heuristics used
to minimize message delays in the underlying
network. While these
heuristics are sometimes
effective, they all rely on a single global overlay that
may install
the key of a popular object at a node far
from most of the nodes accessing it.
Furthermore, a response to a lookup message does not
contain any locality information
about the nodes holding a copy of
the object. We address these issues
in Plethora, a
novel two-level overlay P2P
network. A local overlay in
Plethora acts as a locality-aware
cache for the global overlay, grouping nodes close
together in the underlying network.
Local overlays
are constructed by exploiting the organization of the
Internet into
Autonomous
Systems (ASs). We present a detailed
experimental study that demonstrates
performance gains in response time of up to
60% compared to a single global Pastry
overlay. We
also present efficient distributed
algorithms for maintaining local overlays
in the presence of
node arrivals and departures.