Xiangyu Zhang

Xiangyu Zhang Pronunciation (SHAHNG YOO JAHNG)

Samuel Conte Professor of Computer Science

Purdue University

Biography

Xiangyu Zhang is a professor specializing in AI security, software analysis, and cyber forensics. His work involves developing techniques to detect bugs, including security vulnerabilities, in traditional software systems as well as AI models and systems, and to diagnose runtime failures. He has served as the Principal Investigator (PI) for numerous projects funded by organizations such as DARPA, IARPA, ONR, NSF, the Air Force, and industry. Many of the techniques developed by his team have successfully transitioned into practical applications. His research has been published at top venues in the areas of Security, AI, Software Engineering, and Programming Languages, and has been recognized by various distinguished paper awards as well as the prestigious ACM Distinguished Dissertation Awards. He has mentored over 30 PhD students and post-docs, fifteen of whom have secured academic positions at various universities. Many of them have been honored with NSF Career Awards or comparable recognitions.

Interests
  • Artificial Intelligence Security
  • Program Analysis
  • Cyber Forensics

Ongoing Projects

Detecting Deception in Natural Language Processing Applications by Model Interpretation
The goal of this project is to develop principled techniques that can detect deceptive content in natural language conversations. Such content could be generated by humans or by Large Language Models (LLMs). The techniques are based on interpreting conversation content (e.g., using another LLM) and analyzing model internals (e.g., using gradient-based optimization).
On-the-fly Cyber Crime Scene Transcription
There has been an upsurge of cybercrime in recent years. The number of cybercrime incidents in 2022 increased by 600% compared to pre-pandemic levels, and annual damage has reached 6 trillion dollars. At the same time, attack investigation is becoming increasingly challenging. It takes 228 days on average to identify an attack and 80 more days to investigate one. In industries that face more attacks, such as healthcare, detecting an attack takes 329 days on average. Such lengthy dwell time, i.e., the time an attack remains undetected, leads to substantial monetary loss and institutional sabotage. In this project, we aim to develop an AI-based log transcribing technique. We will first define a universal behavior description language that can describe forensics-relevant, high-level system/user behaviors, such as opening a URL, saving an attachment, playing a video, and chatting with a remote agent. The language will be general enough to describe the behaviors of all popular applications. We will then formulate the problem as a machine translation problem: translating the audit logs generated by the underlying operating system (in a very low-level language) into descriptions in the human-understandable high-level language.
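To illustrate the input/output shapes of the transcription idea, here is a toy rule-based sketch. The event names, file paths, and behavior phrases are all hypothetical; the actual project frames this step as machine translation over real audit logs rather than hand-written rules.

```python
# Hypothetical low-level audit events: (operation, argument) pairs,
# standing in for what an OS-level audit log would record.
AUDIT_LOG = [
    ("open", "/home/user/mail/attachment.pdf"),
    ("read", "/home/user/mail/attachment.pdf"),
    ("connect", "93.184.216.34:443"),
]

def transcribe(events):
    """Map low-level events to high-level behavior descriptions.

    A rule-based stand-in for the machine-translation model described
    in the project; it only demonstrates the shape of the task.
    """
    behaviors = []
    for op, arg in events:
        if op == "open" and arg.endswith(".pdf"):
            behaviors.append(f"opened attachment {arg}")
        elif op == "connect":
            behaviors.append(f"contacted remote agent at {arg}")
    return behaviors
```

A learned translator would replace the `if`/`elif` rules, generalizing across applications instead of enumerating cases.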
Program Analysis for Domain Specific Language Extraction of Legacy Software
As part of the DARPA V-SPELLS program, this project aims to automate domain-specific program analysis. Despite the steady progress the community has been making, general program analysis faces inherently hard challenges, such as handling pointers and indirect calls, constructing loop invariants, and decompilation. Fuzzing techniques and bug-finding tools are still largely limited to finding low-level bugs such as memory bugs, and formal methods often require substantial human effort to translate domain-specific and application-specific properties down to annotations on implementation artifacts. This project focuses on lifting implementations to post hoc domain-specific models, providing a new perspective on these hard problems. Instead of dealing with low-level implementation details, we abstract them away so that the high-level semantics become clean and easy to reason about. With lifted domain models, domain-specific properties can be easily checked: existing fuzzers can find complex logical bugs, and formal methods can be substantially simplified and automated. We are interested in lifting implementations in various domains, such as parsers, network protocols, robotic systems, smart contracts, and even binary executables.
Scanning AI Models for Backdoors by Artificial Brain Simulation
As part of the IARPA TrojAI program, this project aims to develop techniques to scan for backdoors injected into AI models of various modalities, such as Computer Vision, Object Detection, Natural Language Processing, and Cyber Security. AI backdoor attacks leverage vulnerabilities in pre-trained models such that inputs stamped with a specific (small) input pattern (e.g., a polygon patch) or subjected to some fixed transformation (e.g., applying a filter) induce intended model misbehaviors, such as misclassification to a target label. The misbehavior-inducing input patterns/transformations are called backdoor triggers. The vulnerabilities are usually injected through various data poisoning methods; some even naturally exist in normally trained models. The attack model of AI backdoors is becoming increasingly similar to that of traditional cyber attacks (on software), while at the same time AI models see more and more applications in critical tasks such as autonomous driving and ID recognition (for access control). Defending against model backdoors is hence a pressing need. In this project, we develop novel analytic techniques to scan AI models for trojans. Our approach analyzes inner neuron behaviors by determining how output activations change when we introduce different stimulations to a neuron. This is analogous to Electrical Brain Stimulation (EBS), a technique invented in the 19th century and widely used ever since to study the functionalities/behaviors of human/animal brain neurons. EBS applies electrical currents of various strengths to stimulate selected neurons and then observes the external behavior, such as happiness or aversive reactions. Hence, we call our technique Artificial Brain Stimulation (ABS).
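A minimal sketch of the stimulation step on a tiny made-up two-layer network: sweep one hidden neuron's activation over a range of "stimulation" values and observe how the predicted label responds. All weights, sizes, and the sweep range are illustrative assumptions, not the project's actual scanner.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))   # input (4 features) -> hidden (8 neurons)
W2 = rng.normal(size=(8, 3))   # hidden -> logits (3 classes)

def predict_with_stimulation(x, neuron, value):
    h = np.maximum(x @ W1, 0.0)    # ReLU hidden layer
    h[neuron] = value              # override one neuron: the "stimulation"
    return int(np.argmax(h @ W2))  # resulting predicted class label

x = rng.normal(size=4)
# Sweep stimulation values for hidden neuron 0 on one input and record
# the predicted label at each strength.
labels = [predict_with_stimulation(x, 0, v) for v in np.linspace(0.0, 20.0, 5)]
```

In a real scan, this sweep is repeated over many inputs and neurons; a neuron whose elevated activation consistently steers predictions toward one fixed label is a backdoor candidate worth reverse-engineering a trigger for.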

TrojAI is a competition-based program. Competitions are organized in rounds, each having a different focus such as Computer Vision (CV), Natural Language Processing (NLP), and Object Detection. In each round, hundreds of AI models are provided, half of them containing trojans. Performers are expected to identify the trojaned models. Team performance is tracked on a public leaderboard. A round ends once a team reaches the round target (and wins the round), and the next round often starts immediately. Our team has achieved top performance over the past three years (please refer to the TrojAI leaderboard).

AI Model Debugging by Analyzing Model Internals with Python Program Analysis
Just as software inevitably contains bugs and software debugging is a key step in software development, AI models may have undesirable behaviors, which we call model bugs, and model debugging is an essential step in intelligent software engineering. Model bugs differ from traditional coding bugs. They are flaws in the model engineering process, such as biased training data and problematic model structures, that lead to undesirable consequences such as low model accuracy and vulnerability to adversarial sample attacks, in which normal inputs are mutated (e.g., by perturbations imperceptible to humans) to induce misclassification. We observe that AI models, especially neural network models, are essentially programs (e.g., in Python) that compute state variable values (called neurons) through multiple program phases (called layers). The values of the neurons in a layer are computed from those of the previous layer through matrix multiplication and an activation function, a kind of thresholding function that determines whether a value shall be used in the computation of the next layer (in which case the neuron is said to be activated). Intuitively, each neuron denotes some abstract feature(s), and the computation from one layer to the next is a further step of feature abstraction. The knowledge acquired in training is encoded in the weight values of the matrices. As such, AI model debugging can substantially benefit from analyzing these programs and their execution states. Hence, we propose to learn from the substantial experience in software debugging built up by the software engineering and program analysis communities over decades of intensive research, and to develop novel analyses that inspect model internals for the diagnosis and repair of model defects.
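The "models are programs" view above can be sketched in a few lines: each layer computes neuron values from the previous layer via matrix multiplication plus a thresholding activation, and the per-layer states are exactly what a model debugger would inspect. The weights and sizes below are illustrative only.

```python
import numpy as np

def relu(v):
    # Thresholding activation: a neuron is "activated" when its value > 0.
    return np.maximum(v, 0.0)

def forward(x, weights):
    """Run the layer-by-layer computation, keeping every execution state.

    `states[i]` holds the neuron values after phase (layer) i, analogous
    to the variable values a software debugger inspects at each step.
    """
    states = [x]
    for W in weights:
        states.append(relu(states[-1] @ W))
    return states

rng = np.random.default_rng(1)
weights = [rng.normal(size=(3, 5)), rng.normal(size=(5, 2))]
states = forward(np.ones(3), weights)
```

Analyses for diagnosis and repair can then be phrased over `states` and `weights`, just as program analyses are phrased over variables and code.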
Principled Co-Reasoning of Software and Natural-Language Artifacts
The goal of this project is to develop principled co-analysis of code and natural language (NL) artifacts, including code comments, change logs, manual pages, constant strings in code, and variable and function names. The proposed research treats NL artifacts as first-class objects, rather than mere sources of additional information, to take full advantage of software NL artifacts. For example, a comment made by a developer at one code location can be propagated to other correlated code locations to help understanding and maintenance.

Research Group


Current Students


Current Postdocs


Current Visitors


Former Students

  • Yapeng Ye, 2024 PhD graduation, Google
  • I Luk Kim, 2023 PhD graduation, Senior Computational Scientist at Purdue
  • Qiuling Xu, 2023 PhD graduation, Netflix
  • Yingqi Liu, 2023 PhD graduation, Research Scientist at Microsoft
  • Hongjun Choi, 2022 PhD graduation, Assistant Professor at Daegu Gyeongbuk Institute of Science and Technology
  • Fei Wang, 2021 PhD graduation, Research Scientist at Meta
  • Dohyeong Kim, 2020 PhD graduation, Google
  • Wen-chuan Lee, 2019 PhD graduation, Senior Scientist at Apple
  • Shiqing Ma, 2019 PhD graduation, Assistant Professor at University of Massachusetts, Amherst, NSF Career Awardee, previously Assistant Professor at Rutgers University
  • Yonghwi Kwon, 2018 PhD graduation, Assistant Professor at University of Maryland, College Park, NSF Career Awardee, previously Assistant Professor at University of Virginia
  • Weihang Wang, 2018 PhD graduation, Assistant Professor at University of Southern California, NSF Career Awardee, previously Assistant Professor at SUNY Buffalo
  • Jianjun Huang, 2017 PhD graduation, Associate Professor at Renmin University (China)
  • Brendan Saltaformaggio, 2017 PhD graduation (co-advised with Dongyan Xu), Associate Professor at Georgia Tech, recipient of the ACM SIGSAC Doctoral Dissertation Award and NSF Career Award
  • Chung Hwan Kim, 2017 PhD graduation (co-advised with Dongyan Xu), Assistant Professor at UT Dallas, previously NEC Labs
  • Fei Peng, 2017 PhD graduation, Senior Manager at Apple
  • Zhui Deng, 2017 PhD graduation (co-advised with Dongyan Xu), Apple
  • Kyuhyung Lee, 2014 PhD graduation (co-advised with Dongyan Xu), Associate Professor at University of Georgia
  • Yunhui Zheng, 2014 PhD graduation, Co-founder and CTO of Sec3, previously Research Staff at IBM TJ Watson
  • Bao Tao, 2014 PhD graduation, Lead of Engineering at Sec3, previously Google
  • William N. Sumner, 2014 PhD graduation, Associate Professor at Simon Fraser University
  • Vinai Sundaram, 2013 PhD graduation (co-advised with Patrick Eugster), Co-founder and CEO of SensorHound Innovations LLC
  • Dasarath Weeratunge, 2012 PhD graduation (co-advised with Suresh Jagannathan), Intel Lab
  • Zhiqiang Lin, 2014 PhD graduation (co-advised with Dongyan Xu), Distinguished Professor at Ohio State University, previously UT Dallas, IEEE Fellow, NSF CAREER Award and AFOSR YIP Award
  • Bin Xin, 2010, Google
  • Mingwu Zhang, 2008 (co-advised with Sunil Prabhakar), Microsoft

Former Postdocs

  • Qingkai Shi, 2022-2023, Assistant Professor at Nanjing University, China
  • Hongyu Liu, 2021-2022, Gifted Young Researcher at Huawei
  • Mijung Kim, 2020-2021, Assistant Professor at Ulsan National Institute of Science and Technology (Korea)
  • Yousra Aafer, 2019-2021, Assistant Professor at University of Waterloo (Canada), recipient of an NSERC Discovery Grant (similar to the NSF Career award in the US)
  • Juan Zhai, 2018-2019, Assistant Professor at University of Massachusetts Amherst, previously Professor of Practice at Rutgers University and Assistant Professor at Nanjing University (China)
  • Wei You, 2019-2022, Associate Professor at Renmin University (China), recipient of Excellent Young Scientists Overseas
  • Peng Liu, 2014-2015, Research Scientist at Uber, previously Google and IBM TJ Watson
  • Zhiyong Shan, 2012-2013, Associate Professor at Wichita State University

Former Visitors (incomplete list)

  • Lei Xu, 2013-2014, Associate Professor at Nanjing University (China)
  • Lin Chen, 2013-2014, Associate Professor at Nanjing University (China)
  • Zhaogui Xu, 2014-2016, Senior Software Engineer at Ant Financial Tech (China)
  • Zhifei Chen, 2015, Assistant Professor at Nanjing University of Science and Technology (China)
  • Chang-Ai Sun, 2014-2015, Professor and Head of the Department of Computer Science and Technology at University of Science and Technology Beijing (China)
  • Juan Zhai, 2014-2015, Assistant Professor at University of Massachusetts Amherst
  • Hao Sun, 2014, Engineer at Alibaba
  • Wanying Ma, 2016-2017, previously PhD student at Nanjing University (China)
  • Zhenhao Tang, 2016-2017, previously PhD student at Nanjing University (China)
  • Yang Zhang, 2016-2018, Professor at Hebei University of Sci. and Tech. (China)
  • Yuan Zhuang, 2019-2020, PhD student at Nanjing University (China)

Personal

I like playing soccer and play twice a week. I coached kids' soccer for many years. I also enjoy scuba diving and all sorts of water sports. More can be found here.

Teaching

  • CS590 AI and Security in fall 2024
  • CS307 Software Engineering in spring 2024
  • CS510 Software Engineering in fall 2023
  • CS240 C Programming in spring 2023
  • CS510 Software Engineering in spring 2022
  • CS307 Software Engineering in fall 2021
  • CS240 C Programming in spring 2020
  • CS590 Program Analysis for Deep Learning in fall 2019
  • CS408 Software Testing in spring 2019

Contact

  • xyzhang at cs dot purdue dot edu
  • 765-496-9415
  • LAWSON 3154K, 305 N. University Street, West Lafayette, IN 47907
  • Monday 12:30 to 13:30