Lin Tan @ University of Waterloo

Software, Data, and Models Resulting from Our Research:

ACL-25 [Code & Data]	Can Language Models Replace Programmers for Coding? RepoCod Says 'Not Yet'. Shanchao Liang, Yiran Hu, Nan Jiang, and Lin Tan. The 63rd Annual Meeting of the Association for Computational Linguistics. July-August, 2025. Vienna, Austria.
ICRA-25 [Demo & Data]	SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models. Yi Wu, Zikang Xiong, Yiran Hu, Shreyash Iyengar, Nan Jiang, Aniket Bera, Lin Tan, and Suresh Jagannathan. The International Conference on Robotics & Automation. May 2025. Atlanta, USA. Best Paper Award Finalist!
ICLR-25 [Model]	Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning. Nan Jiang, Chengxiao Wang, Kevin Liu, Xiangzhe Xu, Lin Tan, Xiangyu Zhang, and Petr Babkin. Acceptance Rate: 32.1%
AAAI-25 [Data]	LATTE: Improving Latex Recognition for Tables and Formulae with Iterative Refinement. Nan Jiang, Shanchao Liang, Chengxiao Wang, Jiannan Wang, and Lin Tan. In the proceedings of AAAI Conference on Artificial Intelligence. February-March, 2025. Philadelphia, Pennsylvania, USA. Acceptance Rate: 23.4% [Poster]
CCS-24 [Code & Data]	ReSym: Harnessing LLMs to Recover Variable and Data Structure Symbols from Stripped Binaries. Danning Xie, Zhuo Zhang, Nan Jiang, Xiangzhe Xu, Lin Tan, and Xiangyu Zhang. In the proceedings of the ACM Conference on Computer and Communications Security, October 2024. Salt Lake City, USA. Won Distinguished Paper Award!
ISSTA-23 [Code & Data]	How Effective are Neural Networks for Fixing Security Vulnerabilities? Yi Wu, Nan Jiang, Hung Viet Pham, Thibaud Lutellier, Jordan Davis, Lin Tan, Petr Babkin, and Sameena Shah. In the proceedings of ACM SIGSOFT International Symposium on Software Testing and Analysis. July 2023. Seattle, USA. Acceptance Rate: 23% (49/215)
ICSE-23 [Code & Data]	Impact of Code Language Models on Automated Program Repair. Nan Jiang, Kevin Liu, Thibaud Lutellier, and Lin Tan. In the proceedings of the International Conference on Software Engineering. May 2023. Melbourne, Australia. Acceptance Rate: 26% (208/796)
ICSE-23 [Code & Data]	KNOD: Domain Knowledge Distilled Tree Decoder for Automated Program Repair. Nan Jiang, Thibaud Lutellier, Yiling Lou, Lin Tan, Dan Goldwasser, and Xiangyu Zhang. In the proceedings of the International Conference on Software Engineering. May 2023. Melbourne, Australia. Acceptance Rate: 26% (208/796)
AAAI-23 [Code & Data]	DisGUIDE: Disagreement-Guided Data-Free Model Extraction. (Oral Presentation) Jonathan Rosenthal, Eric Enouen, Hung Viet Pham, and Lin Tan. In the proceedings of AAAI Conference on Artificial Intelligence. February, 2023. Washington D.C., USA. Acceptance Rate: 19.6%
ICSE-22 [Code & Data]	EAGLE: Creating Equivalent Graphs to Test Deep Learning Libraries. Jiannan Wang, Thibaud Lutellier, Shangshu Qian, Hung Viet Pham, and Lin Tan. In the proceedings of the International Conference on Software Engineering. May 2022. Pittsburgh, USA. Acceptance Rate: 26% (197/751)
ISSTA-22 [Code & Data]	DocTer: Documentation-Guided Fuzzing for Testing Deep Learning API Functions. Danning Xie, Yitong Li, Mijung Kim, Hung Viet Pham, Lin Tan, Xiangyu Zhang, and Mike Godfrey. In the proceedings of ACM SIGSOFT International Symposium on Software Testing and Analysis. July 2022. Virtual. Acceptance Rate: 24% (61/250)
NeurIPS-21 [Code & Data]	Are My Deep Learning Systems Fair? An Empirical Study of Fixed-Seed Training. Shangshu Qian, Hung Viet Pham, Thibaud Lutellier, Zeou Hu, Jungwon Kim, Lin Tan, Yaoliang Yu, Jiahao Chen, and Sameena Shah. In the proceedings of the Conference on Neural Information Processing Systems, December 2021. Virtual. Acceptance Rate: 26%
ASE-21 (Tool) [Code & Data]	DEVIATE: A Deep Learning Variance Testing Framework. Hung Viet Pham, Mijung Kim, Lin Tan, Yaoliang Yu, and Nachiappan Nagappan. In the proceedings of the IEEE/ACM International Conference on Automated Software Engineering, November, 2021. Virtual/Melbourne, Australia.
ICSE-21 [Code & Data]	CURE: Code-Aware Neural Machine Translation for Automatic Program Repair. Nan Jiang, Thibaud Lutellier, and Lin Tan. In the proceedings of the International Conference on Software Engineering. May 2021. Acceptance Rate: 22% (138/615)
ASE-20 [Code & Data]	Problems and Opportunities in Training Deep Learning Software Systems: An Analysis of Variance. Hung Viet Pham, Shangshu Qian, Jiannan Wang, Thibaud Lutellier, Jonathan Rosenthal, Lin Tan, Yaoliang Yu, and Nachiappan Nagappan. In the proceedings of the IEEE/ACM International Conference on Automated Software Engineering, September, 2020. Virtual/Melbourne, Australia. Acceptance Rate: 22.5% (93/414) Won ACM SIGSOFT Distinguished Paper Award!
ISSTA-20 [Code & Data]	CoCoNuT: Combining Context-Aware Neural Translation Models using Ensemble for Program Repair. Thibaud Lutellier, Hung Viet Pham, Lawrence Pang, Yitong Li, Moshi Wei and Lin Tan. In the proceedings of ACM SIGSOFT International Symposium on Software Testing and Analysis. July 2020. Virtual/Los Angeles, United States. Acceptance Rate: 26.5% (43/162)
FSE-18/EMSE-18 [Data] (Journal First)	On the Correctness of Electronic Documents: Studying, Finding, and Localizing Inconsistency Bugs in PDF Readers and Files. (Open Access) Tomasz Kuchta, Thibaud Lutellier, Edmund Wong, Lin Tan, and Cristian Cadar. (* The first two authors contributed equally to this paper) Accepted to the Springer Empirical Software Engineering. (34 pages)
FSE-17 [Data]	QTEP: Quality-aware Test Case Prioritization. Song Wang, Jaechang Nam and Lin Tan. In the Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on the Foundations of Software Engineering. Acceptance Rate: 24% (72/295)
TSE-17 (Journal) [Data]	Measuring the Impact of Code Dependencies on Software Architecture Recovery Techniques. Thibaud Lutellier, Devin Chollak, Joshua Garcia, Lin Tan, Derek Rayside, Nenad Medvidovic and Robert Kroeger. In IEEE Transactions on Software Engineering.
ICSE-15 (SEIP) [Data]	Comparing Software Architecture Recovery Techniques Using Accurate Dependencies. Thibaud Lutellier, Devin Chollak, Joshua Garcia, Lin Tan, Derek Rayside, Nenad Medvidovic and Robert Kroeger. In the proceedings of the International Conference on Software Engineering. Acceptance Rate: 22.5% (23/102)
SANER-15 [Code & Data]	CloCom: Mining Existing Source Code for Automatic Comment Generation. Edmund Wong, Taiyue Liu and Lin Tan. In the proceedings of the IEEE International Conference on Software Analysis, Evolution, and Reengineering. (10 pages) Acceptance Rate: 31.9% (46/144) [Poster]
EMSE-14 (Journal) [Data]	SWordNet: Inferring Semantically Related Words from Software Context. Jinqiu Yang and Lin Tan. In the Springer Empirical Software Engineering. (28 pages) [DOI] [BIBTEX]
ASE-13 [Data]	AutoComment: Mining Question and Answer Sites for Automatic Comment Generation. Edmund Wong, Jinqiu Yang, and Lin Tan. In the proceedings of the IEEE/ACM International Conference on Automated Software Engineering, New Idea Papers. (6 pages) Acceptance Rate: 23% (74/317) [BIBTEX]
ICST-12 [Code & Data]	@tComment: Testing Javadoc Comments to Detect Comment-Code Inconsistencies. Shin Hwei Tan, Darko Marinov, Lin Tan and Gary T. Leavens. In the proceedings of the 5th International Conference on Software Testing, Verification and Validation. April, 2012. (10 pages) Acceptance Rate: 26.9% (39/145). [Slides in PDF] [BIBTEX]
ICSE-09 [Code & Data]	Listening to Programmers - Taxonomies and Characteristics of Comments in Operating System Code. (Alphabetic order) Yoann Padioleau, Lin Tan and Yuanyuan Zhou. In the proceedings of the International Conference on Software Engineering. May, 2009. (11 pages) Acceptance Rate: 12.3% (50/405). [PS] [Slides in PDF] [BIBTEX]