Jianguo Wang

Assistant Professor

Department of Computer Science
Purdue University
West Lafayette, Indiana

Email: csjgwang@purdue.edu
Office: LWSN 1123H
Phone: (765) 496-0726





Biography

Jianguo Wang is currently a Tenure-Track Assistant Professor in the Department of Computer Science at Purdue University.

Prior to joining Purdue, he worked at Zilliz on Milvus, a purpose-built vector database system that has been used in many data science applications, including ChatGPT. Before that, he worked at Amazon Web Services (AWS) on Amazon Aurora, a cloud-native database system. He also interned at Microsoft Research, Oracle, and Samsung, working on various database systems.

He obtained his PhD in Computer Science from the University of California, San Diego, his MPhil from The Hong Kong Polytechnic University, and his Bachelor's degree from Zhengzhou University, China.




Research Interests

In general, I'm interested in new database systems for non-traditional architectures and non-traditional data. Currently, I'm leading a team of students working on Database Systems for the Cloud and Generative AI, including Disaggregated Databases, Vector Databases, and Unified Databases.

  • Disaggregated Database Systems (for the Cloud)
    • Memory-Disaggregated Databases
    • Storage-Disaggregated Databases
    • Disaggregated Databases with Multi-Masters
    • Distributed Shared-Memory & Shared-Storage Databases
    • [Papers: SIGMOD'24a, SIGMOD'24b, VLDBJ'24, VLDB'23, SIGMOD'23, ICDE'23]
    • [Systems: We built OpenAurora, an open-source version of Amazon Aurora, based on PostgreSQL v13.0. OpenAurora is a cloud-native database prototype optimized for storage-disaggregated infrastructure. We are currently working on memory disaggregation and multi-master support within OpenAurora. We hope it will be used by the broader database systems research community.]
    • [External Grants: NSF CAREER Award]
  • Vector Database Systems (for Large Language Models)
  • Unified Database Systems (for Structured/Unstructured Data and Generative AI)
    • We believe that the era of generative AI calls for a unified database system that seamlessly integrates the management of structured and unstructured data, while also natively supporting GenAI capabilities and the GenAI ecosystem.
    • Such a database will (1) efficiently support, at the very least, relational tables, text, documents, images, videos, vectors, and GenAI embedding/inference/fine-tuning in real time; and (2) enable efficient processing of hybrid multimodal queries, which combine traditional SQL queries and new LLM queries, for advanced data analytics.
    • Our goal is threefold: (1) build a unified data infrastructure that provides all the data needed for generative AI (including multimodal GenAI); (2) address critical limitations of LLMs, such as hallucination, lack of real-time data, and high costs; and (3) enable interesting queries that were not possible before (through the unification of data management and GenAI).



Working Experience




Recent Publications

  • Yunan Zhang, Shige Liu, Jianguo Wang.
    Are There Fundamental Limitations in Supporting Vector Data Management in Relational Databases? A Case Study of PostgreSQL.
    Proceedings of International Conference on Data Engineering (ICDE), 2024.

  • Xi Pang, Jianguo Wang.
    Understanding the Performance Implications of the Design Principles in Storage-Disaggregated Databases.
    Proceedings of ACM Conference on Management of Data (SIGMOD), 2024.

  • Yongye Su, Yinqi Sun, Minjia Zhang, Jianguo Wang.
    Vexless: A Serverless Vector Data Management System Using Cloud Functions.
    Proceedings of ACM Conference on Management of Data (SIGMOD), 2024.

  • Qiaolin Yu, Chang Guo, Jay Zhuang, Viraj Thakkar, Jianguo Wang, Zhichao Cao.
    CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage-Disaggregated Infrastructure.
    Proceedings of ACM Conference on Management of Data (SIGMOD), 2024.

  • Jiale Lao, Yibo Wang, Yufei Li, Jianping Wang, Yunjia Zhang, Zhiyuan Cheng, Wanghu Chen, Mingjie Tang, Jianguo Wang.
    GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization.
    Proceedings of Very Large Data Bases Conference (VLDB), 2024.




Team




Hiring

I'm always looking for highly motivated students in database systems (hiring info). Please feel free to contact me if you're interested.




Teaching




Honors and Awards




Services

Academic Services

University Services

National Services

  • NSF Panelist: 2024