Zhuo Zhang  张倬


Postdoctoral Researcher @ Purdue CS

I am a Postdoctoral Research Associate at Purdue University, advised by Samuel Conte Professor Xiangyu Zhang.

I completed my Ph.D. degree at Purdue University in 2023. Prior to that, I obtained my B.Sc. degree with Zhiyuan Honours from Shanghai Jiao Tong University (SJTU) in 2018.

I am a core member of the CTF team 0ops. Sometimes I play with A*0*E, too.

I am also a Web3 bug hunter. My Web3 bug bounty awards total $?00,000.

My research interests lie mainly in software engineering and program analysis, especially for native code without source.

I am also interested in Web3 security, working on automated bug detection and the identification of MEV strategies in smart contracts.

Run the following command in a terminal (GNU):

$ echo "$(echo "ghNsgnm" | md5sum - | xxd -r -p | base64 | cut -c3-10)@purdue.edu" 

If necessary, click here to get my GPG key.

Please refer to my publications.
  • ODSCAN: Backdoor Scanning for Object Detection Models  
    Siyuan Cheng, Guangyu Shen, Guanhong Tao, Kaiyuan Zhang, Zhuo Zhang, Shengwei An, Xiangzhe Xu, Yingqi Liu, Shiqing Ma, Xiangyu Zhang
    Proceedings of the 45th IEEE Symposium on Security and Privacy (S&P 2024)
    San Francisco, CA, May 2024 (to appear)

  • Demystifying Exploitable Bugs in Smart Contracts
    Zhuo Zhang, Brian Zhang, Wen Xu, Zhiqiang Lin
    Proceedings of the 45th ACM/IEEE International Conference on Software Engineering (ICSE 2023)
    Melbourne, Australia, May 2023   [dataset]
  • Key Words: Smart Contract, Web3, Blockchain
    Abstract:
    Exploitable bugs in smart contracts have caused significant monetary loss. Despite the substantial advances in smart contract bug finding, exploitable bugs and real-world attacks are still trending. In this paper, we systematically investigate 516 unique real-world smart contract vulnerabilities from 2021-2022, and study how many can be exploited by malicious users and cannot be detected by existing analysis tools. We further categorize the bugs that cannot be detected by existing tools into seven types and study their root causes, distributions, difficulties to audit, consequences, and repair strategies. For each type, we abstract them to a bug model (if possible), facilitating finding similar bugs in other contracts and future automation. We leverage the findings in auditing real-world smart contracts, and so far we have been rewarded with $102,660 in bug bounties for identifying 15 critical zero-day exploitable bugs, which could have caused up to $22.52 million in monetary loss if exploited.

  • D-ARM: Disassembling ARM Binaries by Lightweight Superset Instruction Interpretation and Graph Modeling
    Yapeng Ye, Zhuo Zhang, Qingkai Shi, Yousra Aafer, Xiangyu Zhang
    Proceedings of the 44th IEEE Symposium on Security and Privacy (Oakland 2023)
    San Francisco, CA, May 2023  
  • Key Words: Disassembly, ARM, Binary Rewriting, Reverse Engineering
    Abstract:
    ARM binary analysis has a wide range of applications in ARM system security. A fundamental challenge is ARM disassembly. ARM, particularly AArch32, has a number of unique features making disassembly distinct from x86 disassembly, such as the mixing of ARM and Thumb instruction modes, implicit mode switching within an application, and more prevalent use of inlined data. Existing techniques cannot achieve high accuracy when binaries become complex and have undergone obfuscation. We propose a novel ARM binary disassembly technique that is particularly designed to address challenges in legacy code for 32-bit ARM binaries. It features a lightweight superset instruction interpretation method to derive rich semantic information and a graph-theory based method that aggregates such information to produce final results. Our comparative evaluation with a number of state-of-the-art disassemblers, including Ghidra, IDA, P-Disasm, XDA, DDisasm, and Spedi, on thousands of binaries generated from SPEC2000 and SPEC2006 with various settings, and real-world applications collected online, shows that our technique D-ARM substantially outperforms the baselines.

  • Poirot: Probabilistically Recommending Protections for the Android Framework
    Zeinab El-Rewini, Zhuo Zhang, Yousra Aafer
    Proceedings of the 29th ACM Conference on Computer and Communications Security (CCS 2022)
    Los Angeles, CA, November 2022   [bibtex]
  • Key Words: Android Security, Access Control, Mobile Platform Security, Probabilistic Analysis
    Abstract:
         Inconsistent security policy enforcement within the Android framework can allow malicious actors to improperly access sensitive resources. A number of prominent inconsistency detection approaches have been proposed in and across various layers of the Android operating system. However, the existing approaches suffer from high false positive rates as they rely solely on simplistic convergence analysis and reachability based relations to reason about the validity of access control enforcement. We observe that resource-to-access control associations are highly uncertain in the context of Android. Thus, we introduce Poirot, a next-generation inconsistency detection tool that leverages probabilistic inference to generate a comprehensive set of protection recommendations for Android framework APIs. We evaluate Poirot on four Android images and detect 26 total inconsistencies.

  • Constrained Optimization with Dynamic Bound-scaling for Effective NLP Backdoor Defense
    Guangyu Shen, Yingqi Liu, Guanhong Tao, Qiuling Xu, Zhuo Zhang, Shengwei An, Shiqing Ma, Xiangyu Zhang
    Proceedings of the 39th International Conference on Machine Learning (ICML 2022)
    Baltimore, Maryland, July 2022   [code]   [bibtex]
    Abstract:
    Modern language models are vulnerable to backdoor attacks. An injected malicious token sequence (i.e., a trigger) can cause the compromised model to misbehave, raising security concerns. Trigger inversion is a widely-used technique for scanning backdoors in vision models. It cannot be directly applied to NLP models due to their discrete nature. In this paper, we develop a novel optimization method for NLP backdoor inversion. We leverage a dynamically reducing temperature coefficient in the softmax function to provide changing loss landscapes to the optimizer such that the process gradually focuses on the ground truth trigger, which is denoted as a one-hot value in a convex hull. Our method also features a temperature rollback mechanism to step away from local optima, exploiting the observation that local optima can be easily determined in NLP trigger inversion (while not in general optimization). We evaluate the technique on over 1600 models (with roughly half of them having injected backdoors) on 3 prevailing NLP tasks, with 4 different backdoor attacks and 7 architectures. Our results show that the technique is able to effectively and efficiently detect and remove backdoors, outperforming 5 baseline methods. The code is available at https://github.com/PurduePAML/DBS.

  • TensileFuzz: Facilitating Seed Input Generation in Fuzzing via String Constraint Solving
    Xuwei Liu, Wei You, Zhuo Zhang, Xiangyu Zhang
    Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2022)
    Virtual, July 2022   [code]   [bibtex]
  • Key Words: Fuzzing, Constraint Solving, Software Security
    Abstract:
    Seed inputs are critical to the performance of mutation based fuzzers. Existing techniques make use of symbolic execution and gradient descent to generate seed inputs. However, these techniques are not particularly suitable for input growth (i.e., making input longer and longer), a key step in seed input generation. Symbolic execution models very low-level constraints and prefers fixed-size inputs, whereas gradient descent only handles cases where path conditions are arithmetic functions of inputs. We observe that growing an input requires considering a number of relations: length, offset, and count, in which a field is the length of another field, the offset of another field, and the count of some pattern in another field, respectively. String solver theory is particularly suitable for addressing these relations. We hence propose a novel technique called TensileFuzz, in which we identify input fields and denote them as string variables such that a seed input is the concatenation of these string variables. Additional padding string variables are inserted in between field variables. The aforementioned relations are reverse-engineered and lead to string constraints, solving which instantiates the padding variables and hence grows the input. Our technique also integrates linear regression and gradient descent to ensure the grown inputs satisfy path constraints that lead to path exploration. Our comparison with AFL, and a number of state-of-the-art fuzzers that have similar target applications, including Qsym, Angora, and SLF, shows that TensileFuzz substantially outperforms the others, by 39% - 98% in terms of path coverage.

  • Model Orthogonalization: Class Distance Hardening in Neural Networks for Better Security
    Guanhong Tao, Yingqi Liu, Guangyu Shen, Qiuling Xu, Shengwei An, Zhuo Zhang, Xiangyu Zhang
    Proceedings of the 43rd IEEE Symposium on Security and Privacy (Oakland 2022)
    Oakland, CA, May 2022   [code]   [bibtex]
  • Key Words: Backdoor Attack, Model Hardening, Deep Neural Network
    Abstract:
    The distance between two classes for a deep learning classifier can be measured by the level of difficulty in flipping all (or a majority of) samples in a class to the other. The class distances of many pre-trained models in the wild are very small and do not align well with humans’ intuition (e.g., classes turtle and bird have smaller distance than classes cat and dog), making the models vulnerable to backdoor attacks, which aim to cause misclassification by stamping a specific pattern to inputs. We propose a novel model hardening technique called model orthogonalization which is an add-on training step to pre-trained models, including clean models, poisoned models, and adversarially trained models. It can substantially enlarge class distances with reasonable training cost and without much accuracy degradation. Our evaluation on 5 datasets with 22 model structures shows that our technique can enlarge class distances by 177.63% on average with less than 1% accuracy loss, outperforming existing hardening techniques such as adversarial training, universal adversarial perturbation, and directly using generated backdoors. It reduces false positives by 80% for a state-of-the-art backdoor scanner as the enlarged class distances allow the scanner to easily distinguish clean and poisoned models, and substantially outperforms three existing techniques in removing injected backdoors.

  • StochFuzz: Sound and Cost-effective Fuzzing of Stripped Binaries by Incremental and Stochastic Rewriting
    Zhuo Zhang, Wei You, Guanhong Tao, Yousra Aafer, Xuwei Liu, Xiangyu Zhang
    🏆   CSAW 2021 Best Applied Security Paper Award TOP-10 Finalists
    Proceedings of the 42nd IEEE Symposium on Security and Privacy (Oakland 2021)
    Virtually, May 2021   [poster]   [bibtex]   [benchmarks]   [code]
  • Key Words: Fuzz, Binary Rewriting, Probabilistic Analysis
    Abstract:
    Fuzzing stripped binaries poses many hard challenges as fuzzers require instrumenting binaries to collect runtime feedback for guiding input mutation. However, due to the lack of symbol information, correct instrumentation is difficult on stripped binaries. Existing techniques either rely on hardware and expensive dynamic binary translation engines such as QEMU, or make impractical assumptions such as that binaries do not have inlined data. We observe that fuzzing is a highly repetitive procedure providing a large number of trial-and-error opportunities. As such, we propose a novel incremental and stochastic rewriting technique STOCHFUZZ that piggy-backs on the fuzzing procedure. It generates many different versions of rewritten binaries whose validity can be approved/disapproved by numerous fuzzing runs. Probabilistic analysis is used to aggregate evidence collected through the sample runs and improve rewriting. The process eventually converges on a correctly rewritten binary. We evaluate STOCHFUZZ on two sets of real-world programs and compare with five other baselines. The results show that STOCHFUZZ outperforms state-of-the-art binary-only fuzzers (e.g., e9patch, ddisasm, and RetroWrite) in terms of soundness and cost-effectiveness and achieves performance comparable to source-based fuzzers. STOCHFUZZ is publicly available.

  • OSPREY: Recovery of Variable and Data Structure via Probabilistic Analysis for Stripped Binary
    Zhuo Zhang, Yapeng Ye, Wei You, Guanhong Tao, Wen-chuan Lee, Yonghwi Kwon, Yousra Aafer, Xiangyu Zhang
    Proceedings of the 42nd IEEE Symposium on Security and Privacy (Oakland 2021)
    Virtually, May 2021   [bibtex]
  • Key Words: Binary Analysis, Variable Recovery, Probabilistic Analysis, Reverse Engineering
    Abstract:
    Recovering variables and data structure information from stripped binaries is a prominent challenge in binary program analysis. While various state-of-the-art techniques are effective in specific settings, such effectiveness may not generalize. This is mainly because the problem is inherently uncertain due to the information loss in compilation. Most existing techniques are deterministic and lack a systematic way of handling such uncertainty. We propose a novel probabilistic technique for variable and structure recovery. Random variables are introduced to denote the likelihood of an abstract memory location having various types and structural properties such as being a field of some data structure. These random variables are connected through probabilistic constraints derived through program analysis. Solving these constraints produces the posterior probabilities of the random variables, which essentially denote the recovery results. Our experiments show that our technique substantially outperforms a number of state-of-the-art systems, including IDA, Ghidra, Angr, and Howard. Our case studies demonstrate the recovered information improves binary code hardening and binary decompilation.

  • NetPlier: Probabilistic Network Protocol Reverse Engineering from Message Traces
    Yapeng Ye, Zhuo Zhang, Fei Wang, Xiangyu Zhang, Dongyan Xu
    Proceedings of the 28th Network and Distributed System Security Symposium (NDSS 2021)
    Virtually, February 2021   [code]   [bibtex]
  • Key Words: Network Protocol Reverse Engineering, Probabilistic Analysis
    Abstract:
    Network protocol reverse engineering is an important challenge with many security applications. A popular class of methods leverages network message traces. These methods rely on pair-wise sequence alignment and/or tokenization. They have various limitations such as difficulties of handling a large number of messages and dealing with inherent uncertainty. In this paper, we propose a novel probabilistic method for network trace based protocol reverse engineering. It first makes use of multiple sequence alignment to align all messages and then reduces the problem to identifying the keyword field from the set of aligned fields. The keyword field determines the type of a message. The identification is probabilistic, using random variables to indicate the likelihood of each field (being the true keyword). A joint distribution is constructed among the random variables and the observations of the messages. Probabilistic inference is then performed to determine the most likely keyword field, which allows messages to be properly clustered by their true types and enables the recovery of message format and state machine. Our evaluation on 10 protocols shows that our technique substantially outperforms the state-of-the-art and our case studies show the unique advantages of our technique in IoT protocol reverse engineering and malware analysis.

  • ALchemist: Fusing Application and Audit Logs for Precise Attack Provenance without Instrumentation
    Le Yu, Shiqing Ma, Zhuo Zhang, Guanhong Tao, Xiangyu Zhang, Dongyan Xu, Vincent E. Urias, Han Wei Lin, Gabriela Ciocarlie, Vinod Yegneswaran, Ashish Gehani
    Proceedings of the 28th Network and Distributed System Security Symposium (NDSS 2021)
    Virtually, February 2021   [bibtex]
  • Key Words: Cyber Attack Detection, Attack Forensics, Attack Provenance Graph
    Abstract:
    Cyber-attacks are becoming more persistent and complex. Most state-of-the-art attack forensics techniques either require annotating and instrumenting software applications or rely on high-quality execution profiles to serve as the basis for anomaly detection. We propose a novel attack forensics technique ALchemist. It is based on the observations that built-in application logs provide critical high-level semantics and the audit log provides low-level fine-grained information; and the two share a lot of common elements. ALchemist is hence a log fusion technique that couples application logs and the audit log to derive critical attack information invisible in either log. It is based on a relational reasoning engine Datalog and features the capabilities of inferring new relations such as the task structure of execution (e.g., tabs in Firefox), especially in the presence of complex asynchronous execution models, and high-level dependencies between log events. Our evaluation on 15 popular applications including Firefox, Chromium, and OpenOffice, and 14 APT attacks from the literature demonstrates that although ALchemist does not require instrumentation, it is highly effective in partitioning execution to autonomous tasks (in order to avoid bogus dependencies) and deriving precise attack provenance graphs, with very small overhead. It also outperforms NoDoze and OmegaLog, two state-of-the-art techniques that do not require instrumentation.

  • PMP: Cost-Effective Forced Execution with Probabilistic Memory Pre-Planning
    Wei You, Zhuo Zhang, Yonghwi Kwon, Yousra Aafer, Fei Peng, Yu Shi, Carson Makena Harmon, Xiangyu Zhang
    Proceedings of the 41st IEEE Symposium on Security and Privacy (Oakland 2020)
    Virtually, May 2020   [artifact]   [bibtex]
  • Key Words: Malware Analysis, Forced Execution, Probabilistic Analysis
    Abstract:
    Malware is a prominent security threat and exposing malware behavior is a critical challenge. Recent malware often has payload that is only released when certain conditions are satisfied. It is hence difficult to fully disclose the payload by simply executing the malware. In addition, malware samples may be equipped with cloaking techniques such as VM detectors that stop execution upon detecting that the malware is being monitored. Forced execution is a highly effective method to penetrate malware self-protection and expose hidden behavior, by forcefully setting certain branch outcomes. However, an existing state-of-the-art forced execution technique X-Force is very heavy-weight, requiring tracing individual instructions, reasoning about pointer alias relations on-the-fly, and repairing invalid pointers by on-demand memory allocation.
    We develop a light-weight and practical forced execution technique. Without losing analysis precision, it avoids tracking individual instructions and on-demand allocation. Under our scheme, a forced execution is very similar to a native one. It features a novel memory pre-planning phase that pre-allocates a large memory buffer, and then initializes the buffer, and variables in the subject binary, with carefully crafted values in a random fashion before the real execution. The pre-planning is designed in such a way that dereferencing an invalid pointer has a very large chance to fall into the pre-allocated region and hence does not cause any exception, and semantically unrelated invalid pointer dereferences are highly likely to access disjoint (pre-allocated) memory regions, avoiding state corruptions with probabilistic guarantees. Our experiments show that our technique is 84 times faster than X-Force, has 6.5X and 10% fewer false positives and negatives for program dependence detection, respectively, and can expose 98% more malicious behaviors in 400 recent malware samples.

  • BDA: Practical Dependence Analysis for Binary Executables by Unbiased Whole-Program Path Sampling and Per-Path Abstract Interpretation
    Zhuo Zhang, Wei You, Guanhong Tao, Guannan Wei, Yonghwi Kwon, Xiangyu Zhang
    🏆   ACM SIGPLAN Distinguished Paper Award
    Proceedings of the ACM on Programming Languages Volume 3 Issue OOPSLA (OOPSLA 2019)
    Athens, Greece, October 2019   [artifact]   [bibtex]
  • Key Words: Path Sampling, Abstract Interpretation, Binary Analysis, Data Dependence
    Abstract:
    Binary program dependence analysis determines dependence between instructions and hence is important for many applications that have to deal with executables without any symbol information. A key challenge is to identify if multiple memory read/write instructions access the same memory location. The state-of-the-art solution is the value set analysis (VSA) that uses abstract interpretation to determine the set of addresses that are possibly accessed by memory instructions. However, VSA is conservative and hence leads to a large number of bogus dependences and then substantial false positives in downstream analyses such as malware behavior analysis. Furthermore, existing public VSA implementations have difficulty scaling to complex binaries.
    In this paper, we propose a new binary dependence analysis called BDA enabled by a randomized abstract interpretation technique. It features a novel whole program path sampling algorithm that is not biased by path length, and a per-path abstract interpretation avoiding precision loss caused by merging paths in traditional analyses. It also provides probabilistic guarantees. Our evaluation on SPECINT2000 programs shows that it can handle complex binaries such as gcc whereas VSA implementations from state-of-the-art platforms have difficulty producing results for many SPEC binaries. In addition, the dependences reported by BDA are 75 and 6 times smaller than Alto, a scalable binary dependence analysis tool, and VSA, respectively, with only 0.19% of true dependences observed during dynamic execution missed (by BDA). Applying BDA to call graph generation and malware analysis shows that BDA substantially supersedes the commercial tool IDA in recovering indirect call targets and outperforms a state-of-the-art malware analysis tool Cuckoo by disclosing 3 times more hidden payloads.

  • Probabilistic Disassembly
    Kenneth Miller, Yonghwi Kwon, Yi Sun, Zhuo Zhang, Xiangyu Zhang, Zhiqiang Lin
    Proceedings of the 41st ACM/IEEE International Conference on Software Engineering (ICSE 2019)
    Montréal, QC, Canada, May 2019   [code]   [bibtex]
  • Key Words: Disassembly, Binary Rewrite, Probabilistic Analysis
    Abstract:
    Disassembling stripped binaries is a prominent challenge for binary analysis, due to the interleaving of code segments and data, and the difficulties of resolving control transfer targets of indirect calls and jumps. As a result, most existing disassemblers have both false positives (FP) and false negatives (FN). We observe that uncertainty is inevitable in disassembly due to the information loss during compilation and code generation. Therefore, we propose to model such uncertainty using probabilities and propose a novel disassembly technique, which computes a probability for each address in the code space, indicating its likelihood of being a true positive instruction. The probability is computed from a set of features that are reachable to an address, including control flow and data flow features. Our experiments with more than two thousand binaries show that our technique does not have any FN and has only 3.7% FP. In comparison, a state-of-the-art superset disassembly technique has 85% FP. A rewriter built on our disassembly can generate binaries that are only half of the size of those by superset disassembly and run 3% faster. While many widely-used disassemblers such as IDA and BAP suffer from missing function entries, our experiment also shows that even without any function entry information, our disassembler can still achieve 0 FN and 6.8% FP.

Academic Awards

Selected Capture-The-Flag (CTF)

  • 1st place at Paradigm CTF 2023 (w/ Offside Labs)
  • 1st place at the 40th IEEE S&P Celebration Scavenger Hunt
  • 4th place at DEFCON CTF 2018 (w/ A*0*E)
  • 3rd place at DEFCON CTF 2017 (w/ A*0*E)

Selected Web3 Bug Bounties

  • Critical bug report for Anonymous Project, awarded $3,000
  • Critical bug report for Duet Protocol, awarded $50,000
  • Critical bug report for Grizzly.fi, awarded $10,000
  • Critical bug report for ApeX Protocol, awarded ~$25,000
  • Critical bug report for Infinity NFT Marketplace, awarded $20,000
  • Critical bug reports for ENS, awarded ~$40,000

Program Committee Member

  • The ACM Conference on Computer and Communications Security (CCS), 2024
  • International Conference on Software Engineering (ICSE), 2025
  • International Conference on Automated Software Engineering (ASE), 2024
  • International Symposium on Software Testing and Analysis (ISSTA), 2024
  • The ACM ASIA Conference on Computer and Communications Security (ASIACCS), 2024
  • Workshop on Binary Analysis Research (BAR), 2022

Reviewer

  • IEEE Transactions on Software Engineering
  • IEEE/ACM Transactions on Networking
  • The Association for Computational Linguistics (ACL) Rolling Review, 2023

Sub-reviewer

  • USENIX Security Symposium
  • IEEE Symposium on Security and Privacy (Oakland)
  • The Network and Distributed System Security Symposium (NDSS)
  • International Conference on Dependable Systems and Networks (DSN)
  • International Conference on Automated Software Engineering (ASE)
  • International Symposium on Software Testing and Analysis (ISSTA)
  • International Symposium on the Foundations of Software Engineering (FSE)
  • The ACM Conference on Computer and Communications Security (CCS)
  • The ACM Conference on Systems, Programming, Languages, and Applications (OOPSLA)

August 2018 - Present

Mentor: Xiangyu Zhang
Research Assistant: Purdue University, West Lafayette, IN

August 2017 - June 2019

Mentor: Anton Kochkov
Project: radeco
Radare2-based binary analysis framework

June 2016 - July 2017

Mentor: Sen Nie
Research Intern: Keen Security Lab of Tencent, Shanghai, China
FREE-FALL: HACKING TESLA FROM WIRELESS TO CAN BUS  [video]   [writeup]   [slides]