Lin Tan
Below are some example projects (our research moves fast: projects complete and exciting new projects start all the time):

Desired experience: Strong coding skills and motivation for research are required. A background in security or machine learning is not required but is a plus.

Possible industry involvement: Some of these projects are funded by Meta/Facebook research awards and J.P.Morgan AI research awards. 

We especially encourage applications from women, Aboriginal peoples, and other groups underrepresented in computing.

Some of the positions are funded by NSF REU, which requires U.S. citizenship or permanent residency. In your email, please indicate whether you are a U.S. citizen or permanent resident.

*** Project 1. Data-Free Model Extraction

Many deployed machine learning models, such as ChatGPT and Codex, are accessible through pay-per-query APIs. It can be profitable for an adversary to extract these models, either to steal them outright or for reconnaissance. Recent model-extraction attacks on Machine Learning as a Service (MLaaS) systems have moved towards data-free approaches, showing that it is feasible to steal models trained on difficult-to-access data. However, these attacks remain limited by the low accuracy of the extracted models and the large number of queries they issue to the models under attack. The high query cost makes such techniques infeasible against online MLaaS systems that charge per query.

In this project, we will design novel approaches that achieve higher accuracy and better query efficiency than prior data-free model extraction techniques.
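
To give a flavor of the setting, below is a minimal, hypothetical sketch of query-based data-free extraction in PyTorch (this is not the DisGUIDE algorithm; the architectures, loss, and hyperparameters are illustrative assumptions). A generator synthesizes queries, the victim is used only as a black box that returns predictions, and a student model is trained to imitate those predictions.

```python
# Hypothetical sketch of data-free model extraction: models and sizes are made up.
import torch
import torch.nn as nn
import torch.nn.functional as F

victim = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))    # stands in for the MLaaS model
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))   # the attacker's copy
generator = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 32))  # maps noise to synthetic queries

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(200):                       # every query batch costs money on a pay-per-query API
    z = torch.randn(64, 8)

    # Generator step: craft queries on which the student and victim disagree the most.
    x = generator(z)
    with torch.no_grad():
        victim_logits = victim(x)             # black-box query: only the outputs are observed
    g_loss = -F.kl_div(F.log_softmax(student(x), dim=1),
                       F.softmax(victim_logits, dim=1), reduction="batchmean")
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    # Student step: imitate the victim's predictions on the synthetic queries.
    x = generator(z).detach()
    with torch.no_grad():
        victim_logits = victim(x)
    s_loss = F.kl_div(F.log_softmax(student(x), dim=1),
                      F.softmax(victim_logits, dim=1), reduction="batchmean")
    opt_s.zero_grad()
    s_loss.backward()
    opt_s.step()
```

The key cost driver is the total number of victim queries, which is exactly what this project aims to reduce while keeping the student's accuracy high.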

Our recent prior work and background can be found here: [DisGUIDE-AAAI23]

*** Project 2. Language Models for Detecting and Fixing Software Bugs and Vulnerabilities

In this project, we will develop machine learning approaches, including code language models, that automatically learn bug, vulnerability, and fix patterns from historical data in order to detect and fix software bugs and security vulnerabilities. We will also study and compare general code language models and domain-specific language models.
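
As a rough, hypothetical sketch of the kind of pipeline involved (the checkpoint name, toy data, and one-step training loop are assumptions for illustration, not our method), one can fine-tune an off-the-shelf code language model on historical buggy/fixed pairs and then generate candidate patches:

```python
# Hypothetical sketch: fine-tune a seq2seq code language model to map buggy code to fixed code.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "Salesforce/codet5-base"               # assumed publicly available checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# One (buggy, fixed) pair mined from project history; in practice there are many thousands.
buggy = "if (idx <= arr.length) { return arr[idx]; }"
fixed = "if (idx < arr.length) { return arr[idx]; }"

inputs = tokenizer(buggy, return_tensors="pt")
labels = tokenizer(fixed, return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss = model(**inputs, labels=labels).loss          # cross-entropy over the tokens of the fix
loss.backward()
optimizer.step()

# After fine-tuning, candidate patches are generated for previously unseen buggy code.
model.eval()
with torch.no_grad():
    patch_ids = model.generate(**inputs, max_length=64, num_beams=5)
print(tokenizer.decode(patch_ids[0], skip_special_tokens=True))
```

In the actual project we will go well beyond this sketch, e.g., comparing general versus domain-specific code language models and validating generated patches against tests and analyzers.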

Our recent prior work and background can be found here: [VulFix-ISSTA23] [CLM-ICSE23] [KNOD-ICSE23]

*** Project 3. Inferring Specifications from Software Text for Finding Bugs and Vulnerabilities

A fundamental challenge of detecting or preventing software bugs and vulnerabilities is to know programmers' intentions, formally called specifications. If we know the specification of a program (e.g., where a lock is needed, what input a deep learning model expects, etc.), a bug detection tool can check if the code matches the specification. 

Building on our expertise as the first to extract specifications from code comments to automatically detect software bugs and bad comments, in this project we will analyze new sources of software textual information (such as API documents and Stack Overflow posts) to extract specifications for bug detection. For example, the API documents of deep learning libraries such as TensorFlow and PyTorch contain rich input-constraint information about tensors.
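
For illustration only (the docstring sentence and the extraction rule below are made up; real documents require much more robust text analysis), a mined input constraint can be turned into an executable check:

```python
# Hypothetical sketch: extract a tensor constraint from an API-document sentence
# and turn it into a runtime specification check.
import re
import torch

doc_sentence = "`input` must be a 2-D tensor of type float32."

# Extract (parameter, rank, dtype) with a simple pattern; real documentation text is messier.
m = re.search(r"`(\w+)` must be a (\d+)-D tensor of type (\w+)", doc_sentence)
param, rank, dtype = m.group(1), int(m.group(2)), m.group(3)

def check(tensor):
    """Specification inferred from the document: reject inputs that violate it."""
    assert tensor.dim() == rank, f"{param} must be {rank}-D, got {tensor.dim()}-D"
    assert str(tensor.dtype) == f"torch.{dtype}", f"{param} must be {dtype}, got {tensor.dtype}"

check(torch.zeros(3, 4, dtype=torch.float32))   # satisfies the extracted specification
check(torch.zeros(3, dtype=torch.float32))      # violates the rank constraint -> AssertionError
```

Checks like this, derived automatically from text, give bug-detection tools the programmer intentions they otherwise lack.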

Our recent prior work and background can be found here: [Software Text Analytics]


*** Project 4. Testing Deep Learning Systems  

We will build cool and novel techniques to make deep learning libraries such as TensorFlow and PyTorch reliable and secure. We will build them on top of our award-winning paper (ACM SIGSOFT Distinguished Paper Award)!

Machine learning systems, including deep learning (DL) systems, demand reliability and security. DL systems consist of two key components: (1) the models and algorithms that perform complex mathematical calculations, and (2) the software that implements those algorithms and models. Here, software includes DL infrastructure code (e.g., code that performs core neural network computations) and application code (e.g., code that loads model weights). Thus, for the entire DL system to be reliable and secure, both the software implementation and the models/algorithms must be reliable and secure. If the software fails to faithfully implement a model (e.g., due to a bug), its output can be wrong even if the model is correct, and vice versa.

This project aims to use novel approaches, including differential testing, to detect and localize bugs in DL software (including code and data) and to address the test oracle challenge.
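
As a small illustrative example of a differential-testing oracle (a toy case, not the EAGLE approach itself), two mathematically equivalent ways of computing the same operation should agree within a tolerance; a large divergence flags a potential bug:

```python
# Toy differential-testing oracle: equivalent implementations of the same computation must agree.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(8, 3, 16, 16)
w = torch.randn(4, 3, 3, 3)

# Implementation A: the library's convolution.
out_a = F.conv2d(x, w, padding=1)

# Implementation B: an equivalent formulation via im2col (unfold) + matrix multiplication.
cols = F.unfold(x, kernel_size=3, padding=1)             # (N, C*k*k, H*W)
out_b = (w.view(4, -1) @ cols).view(8, 4, 16, 16)

# Differential oracle: equivalent implementations must produce close outputs.
max_diff = (out_a - out_b).abs().max().item()
print(f"max difference: {max_diff:.2e}")
assert max_diff < 1e-4, "Equivalent implementations diverge: possible bug"
```

The comparison of equivalent executions serves as the test oracle, so no manually written expected outputs are needed.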

Our recent prior work and background can be found here: [EAGLE-ICSE22] [Fairness-NeurIPS21] [Variance-ASE20]