Lin Tan
Below are some example projects (our research moves fast: projects finish and exciting new ones start all the time):

Desired experience: Strong coding skills and research motivation are required. A background in security or machine learning is a plus but not required.

Possible industry involvement: Some of these projects are funded by Meta/Facebook research awards and J.P. Morgan AI research awards.

We especially encourage applications from women, Aboriginal peoples, and other groups underrepresented in computing.

Some of the positions are funded by NSF REU, which requires U.S. citizenship or permanent residence. In your email, please indicate whether you are a U.S. citizen or permanent resident.

*** Project 1. Robotics Manipulation and Navigation, Vision-Language-Action and Reinforcement Learning

Vision-Language-Action (VLA) models have recently demonstrated strong capabilities in robotic manipulation by leveraging large-scale multimodal pretraining. However, applying reinforcement learning (RL) to improve these models remains computationally expensive.

In this project, we will develop a sample-efficient RL approach for VLA models, aiming to reduce training cost while improving policy robustness and generalization. Our ultimate goal is to enable scalable and efficient RL for VLA models in real-world robotic manipulation.

Our recent prior work and background: [SELP-ICRA25] (Best Paper Award Finalist!)

*** Project 2. Agents and LLMs for Software Engineering, including Detecting and Fixing Bugs and Vulnerabilities

In this project, we will develop machine learning approaches, including agents and large language models, for the entire software engineering process: requirements, design, code generation, test generation, code review, and the detection and fixing of software bugs and security vulnerabilities. We will also build benchmarks for these coding and engineering tasks.

Our recent prior work and background can be found here: [TENET] [USEagent-ICSE26] [RepoCod-ACL25] [VulFix-ISSTA23] [CLM-ICSE23] [KNOD-ICSE23]

*** Project 3. Binary Recovery and Foundation Models

Binary code analysis is the foundation of crucial security and development tasks, including legacy software maintenance, vulnerability detection, malware detection, and binary recovery. Given the growing sophistication of cybercrime, which poses threats worldwide (e.g., cybercrime is predicted to cost $10.5 trillion annually by 2025), effective binary analysis techniques are in high demand. Existing models do not understand the syntax or semantics of binaries. The idea is to build binary foundation models that account for syntax, compiler optimizations, hardware, etc. Our recent binary foundation model Nova is one of the first such models. Our CCS 2024 paper (ReSym) recovers data structures and identifier names from binaries, which could be useful for identifier recovery and renaming.

Our recent prior work and background can be found here: [CoRe-NeurIPS25] (Spotlight!) [Nova-ICLR25] [ReSym-CCS24] (ACM Distinguished Paper Award!)

*** Project 4. Autoformalization: Inferring Specifications from Software Text for Finding Bugs and Vulnerabilities

A fundamental challenge of detecting or preventing software bugs and vulnerabilities is understanding programmers' intentions, formally called specifications. If we know the specification of a program (e.g., where a lock is needed, what input a deep learning model expects, etc.), a bug detection tool can check if the code matches the specification.

Building on our expertise as the first to extract specifications from code comments to automatically detect software bugs and bad comments, in this project we will extract new types of specifications for automated reasoning and bug detection.
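To make the idea of checking code against a specification concrete, here is a toy sketch in Python. It is not the approach used in our papers. The comment pattern, the event-trace format, and the names `extract_lock_spec` and `find_violations` are all hypothetical and chosen only for illustration. It pulls a "caller must hold <lock>" rule out of a comment and flags calls made while that lock is not held.

```python
import re

# Hypothetical comment pattern (an assumption for this sketch):
# "caller must hold <lock_name>".
SPEC_PATTERN = re.compile(r"caller must hold (\w+)", re.IGNORECASE)


def extract_lock_spec(comment):
    """Return the lock name that a comment requires, or None if it has no such rule."""
    match = SPEC_PATTERN.search(comment)
    return match.group(1) if match else None


def find_violations(function_comments, trace):
    """Report calls made without the lock required by the callee's comment.

    trace is a list of (kind, name) events, where kind is
    "lock", "unlock", or "call" (a simplified, made-up format).
    """
    specs = {fn: extract_lock_spec(c) for fn, c in function_comments.items()}
    held = set()
    violations = []
    for kind, name in trace:
        if kind == "lock":
            held.add(name)
        elif kind == "unlock":
            held.discard(name)
        elif kind == "call":
            required = specs.get(name)
            if required is not None and required not in held:
                violations.append((name, required))
    return violations


comments = {"update_counter": "/* Caller must hold counter_lock. */"}
trace = [
    ("call", "update_counter"),   # lock not held yet: violates the comment
    ("lock", "counter_lock"),
    ("call", "update_counter"),   # lock held: consistent with the comment
    ("unlock", "counter_lock"),
]
print(find_violations(comments, trace))
```

A real tool would have to handle free-form natural language and analyze actual program paths rather than a hand-written trace. The sketch only shows the two halves of the problem: inferring the rule from text, then checking code behavior against it.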

Our recent prior work and background: [Software Text Analytics]