Representative Research Projects

Below are some of the research projects actively being pursued in my group.  Besides these, there are a number of other currently less developed ideas my students and I are seriously exploring, broadly in the areas of programming language design, program analysis, type systems, protocol and specification inference, testing, and verification. 

Programming Eventually Consistent Data Stores.


Geo-distributed web applications often favor high availability over strong consistency. In response to this bias, modern-day replicated data stores often eschew sequential consistency in favor of weaker eventual consistency data semantics. While most operations supported by a typical web application can be engineered, with sufficient care, to function under EC, there are oftentimes critical operations that require stronger consistency guarantees. Unfortunately, existing approaches to tunable consistency suffers from poorly-defined implementation-specific semantics, often expressed at a level of abstraction far removed from the application. Programming modern-day distributed systems is thus a complex error-prone endeavor. The Quelea project addresses these concerns by defining a declarative programming model for eventually consistent data stores that abstracts the actual implementation of the data store via high-level programming and system-level models that are agnostic to a specific implementation. By doing so, Quelea frees application programmers from having to reason about their application in terms of low-level implementation specific data store semantics. Instead, programmers can now reason in terms of an abstract model of the data store, and develop applications by defining and composing high-level replicated data types. Quelea is equipped with a formal specification language capable of expressing precise semantics of high-level consistency guarantees (e.g., causal consistency) in the abstract model. Any eventually consistent key-value store can support Quelea by implementing a thin shim layer and a choosen set of high-level consistency guarantees on top of its existing low-level interface.

      Representative publications:

Machine Learning for Specification and Type Inference.


Having precise specifications that describe a program's intended behavior is a critical pre-requisite to ensure reliabiity, extensibiity, and maintainability of modern-day software. Generating these specifications manually is a challenging, often unsuccessful, exercise; unfortunately, existing static and dynamic analysis techniques often produce poor quality specifications that are ineffective in aiding program verification tasks to guarantee a program behaves as expected. We have pursued a recent line of work on automated synthesis of specifications that overcome many of the deficiencies that plague existing specification inference methods. Our main contribution is a formulation of the problem as a sample driven one, in which specifications, represented as terms in a decidable refinement type representation, are discovered from observing a program's sample runs in terms of either program execution paths or input-output values, and automatically verified through the use of expressive refinement type systems. Our approach is realized as a series of inductive synthesis frameworks, which use various logic-based or classification-based learning algorithms to provide sound and precise machine-checked specifications. Experimental results indicate that the learning algorithms are both efficient and effective, capable of automatically producing sophisticated specifications in nontrivial hypothesis domains over a range of complex real-world programs, going well beyond the capabilities of existing solutions.

      Representative publications:

Verified Compilation and Reasoning for Concurrency.

With colleagues at Cambridge, INRIA, and MPI-SWS, we have built a compiler for ClightTSO, a C-like language for expressing shared-memory concurrency above x86 processors. Our approach defines a TSO (total store order) relaxed-memory semantics for ClightTSO that exposes the processor's memory model for expressing high-performance (potentially racy) code. We have implemented a verifying compiler for the language as an extension to Compcert. We have recently extended this work to consider an alternative definition of the Java Memory Model that is more closely aligned with the semantics of hardware memory models such as TSO and Power.

Along with colleagues at Cambridge, and Microsoft Research, we have also explored formal specification and verification techniques for deterministic parallelism. We define a high-level specification that expresses the necessary conditions for correct execution, and examine ways to tie this high-level specification to low-level implementations. We are particularly interested in modular reasoning technique that allow us to reason about program and library correctness without breaking abstraction boundaries. We have recently applied techniques like frame inference and abduction to build program analyses that automatically inject synchronization barriers to enforce determinism in data parallel applications.

      Representative publications:

Abstractions, compilation, and runtime support for high-level concurrent programming.

We are working on a number of topics  related to the design and implementation of scalable concurent and parallel functional programs.   Our research centers around extensions to message-passing dialects of ML (such as CML), software transactions and transactional events, and associated program analyses.  We are also working on runtime infrastructure to build lightweight and efficient parallel and concurrent garbage collectors for ML-like languages.  Much of our implementation effort centers around Multi-MLton, a multicore aware extension of the MLton whole-program optimizing compiler for Standard ML.  In addition, we have an active effort exploring compilation techniques and runtime support for safe futures, a lightweight annotation for expressing deterministic parallelism; we 've studied the behavior and analysis of safe futures for both ML as well as Java.

      Representative publications:
Program analyses and type systems.

We have developed a compositional and lightweight dependent type inference system for ML capable of inferring useful safety properties of ML programs that go beyond the kinds of properties expressible using classical Hindley-Milner polymorphic type inference. Our technique infers refinements, requires no programmer annotations or guidance, is context-sensitive, and is not based on a whole-program analysis.

We are also looking at a variety of different path-sensitive static analysis techniques to enable effective automatic mining of specifications, and to facilitate more effective program optimizations.  Our work on specification inference is in the context of C and Java, while our current focus on program optimizations is centered around developing scalable path sensitive analyses for the SSA intermediate representation in MLton.

      Representative publications:
Program analyses for debugging and testing concurrent programs.

Debugging and testing shared memory concurrent programs is difficult because of complex non-deterministic interactions introduced by scheduler-induced thread interleavings.  We have explored a number of different dynamic analyses to enable efficent debugging, replay, and regression testing of large concurrent C and Java programs.  Our techniques involve analyzing core dumps and execution traces, aligning correct and faulty executions to identify salient sources of divergence that can be used to identify potential faults.

       Representative publications:

Distributed programming for graph-structured applications.

We investigate linguistic extensions to map/reduce abstractions for programming large-scale distributed systems, with special focus on applications that manipulate large, unstructured graphs. We target real-world graph analysis tasks found in comparative analysis of biological networks as an important case study.

We address the following specific questions: (i) how can highly unstructured graph-based formalisms be cast in the map/reduce framework? (ii) how effectively can these specifications leverage existing map/reduce infrastructure? (iii) how can these abstractions and their execution environments be enhanced to provide the semantic expressiveness necessary for programmability and scalable performance? (iv) how can these analysis tasks be integrated into comprehensive scientific resources usable by the wider applications community? Answers to these questions entail exploration of linguistic extensions to existing map/reduce abstractions, defining new implementations on wide-area multicore/SMP platforms, and crafting an expressive graph analysis toolkit suitable for realistic deployment in important domains such as systems biology.

          Representative publications: