Midwest Data Day 2026

Systems • LLMs • AI Data Pipelines

Friday, April 24, 2026  |  Chicago, IL  |  10:00 AM – 5:00 PM (Chicago Time)

A Decade in the Making

It is time for a reunion. A decade after the inaugural Midwest Big Data Opportunities and Challenges (MBDOC) workshop in 2016, we are gathering regional data researchers, practitioners, and industry leaders to reconnect and discuss the current landscape of database systems, LLM infrastructure, and AI data management.

Hosted at the UChicago Data Science Institute, this workshop is designed to reignite connections across institutions including UChicago, UIC, DePaul, Northwestern, UWisc, GoogleWisc, UIUC, Purdue, UMich, and OSU. Beyond reconnecting, we want to use this event to create a sustainable structure for a recurring gathering—perhaps a bi-annual Midwest Data Day—so we don't have to wait another decade to come together.

The Legacy

Midwest Data Day builds on a rich tradition of regional collaboration. Explore the archives of our past gatherings:

Program Schedule

All times are in Central Daylight Time (Chicago).

  • Talk Guidelines: Each presentation is exactly 10 minutes + 3 minutes for Q&A. Please strictly adhere to this time limit to keep the sessions on track.
  • Poster Guidelines: Please bring your physical poster and attach your assigned Poster ID (found in the poster session program). We will collect posters in the morning to mount on 24" x 36" boards (horizontal/vertical). Posters will remain displayed for both the lunch and afternoon reception sessions.
10:00 AM
Welcome & Opening Remarks
Aaron Elmore & Michael Franklin (UChicago)
10:20 AM
Morning Session I: Vector DBs & AI Acceleration
Jianguo Wang (Purdue)
Databases for AI: The Case for Vector Databases
Mohsen Dehghankar (UIC)
Hierarchical Epsilon-Net Graphs: Time Guarantees for HNSW in Approximate Nearest Neighbor Search
Yanran Wu (Purdue)
Position-Aware Dynamic Drafter Selection in Speculative Decoding
11:00 AM
Break
11:10 AM
Morning Session II: DB Architecture & Optimization
Zhongwei Xu (UMich)
LakeHelm: Zero-Shot Lakehouse Advisor for Joint Engine-Format Selection and Configuration
Stavros Sintos (UIC)
Faster Relational Algorithms Using Geometric Data Structures: Applications in Clustering, Diversity, and Beyond
Wenjie Hu (UW-Madison)
Nemo: Efficient Caching with Index Pushdown for Disaggregated OLTP Databases
Udesh Kumarasinghe (Purdue)
iPDB -- Optimizing SQL Queries with ML and LLM Predicates
12:00 PM
Lunch & Student Poster Session I

The poster session will be held alongside lunch. Please feel free to enjoy your meal and socialize around the posters.
Note to Presenters: You are required to be stationed at your poster from 1:30 PM to 2:00 PM (the half hour before the afternoon talk sessions begin).

Posters by Topic
LLMs & GenAI for Data
#3 Ting Cai (UW-Madison)
Columbo: Expanding Abbreviated Column Names for Tabular Data Using Large Language Models
#8 Minh Phan (UW-Madison)
Keyword Search over Tables in Data Catalogs
#9 Mark Tervo (UW-Madison)
SmartCat: A Composable, AI-Enriched Data Catalog
#20 Mahdi Erfanian (UIC)
NeedleDB, a Generative AI-Based Image Database
#21 Xinzhi Wang (Purdue)
SAGE: Selective Attention-Guided Extraction for Token-Efficient Document Indexing
#23 Siyuan (Doug) Dong (UMich)
An Empirical Evaluation of Pretrained LLMs for Query Plan Representation
#29 Sushmitha Reddy Balaji Reddy (Purdue)
Assign: A Graph-Native, Semantically-Routed Infrastructure for Stateful Personalised Learning at Scale
#31 Shashank Mukkera (Purdue)
TransForm: A Dynamic Format Selection Framework for Data Transmission in AI Agent Systems
Vector DBs & AI Acceleration
#1 Jiayi Liu (Purdue)
PostgreSQL-V: An Integrated Vector Database System in PostgreSQL
#10 Mohsen Dehghankar (UIC)
Hierarchical Epsilon-Net Graphs: Time Guarantees for HNSW in Approximate Nearest Neighbor Search
#24 Mohsen Dehghankar (UIC)
An Efficient Matrix Multiplication Algorithm for Accelerating Inference in Binary and Ternary Neural Networks
#28 Muttaqi Ahmad Alladin (Purdue)
Knowledge Graph Construction from Domain Science Papers Using Structure-Aware and Hierarchical RAG Chunking
DB Architecture & Optimization
#4 Abigale Kim (UW-Madison)
Heterogeneous String Pattern Matching
#5 Zhongwei Xu (UMich)
LakeHelm: Zero-Shot Lakehouse Advisor for Joint Engine-Format Selection and Configuration
#11 Alicia Lyu (UW-Madison)
Storing and Indexing Multiple Tables by Interesting Orderings...
#12 Simon Frisk (UW-Madison)
K-Join: Combining Vertex Covers for Parallel Joins
#16 Wenjie Hu (UW-Madison)
Nemo: Efficient Caching with Index Pushdown for Disaggregated OLTP Databases
#18 Zhenghong Yu (UW-Madison)
FlowLog: Efficient and Extensible Datalog via Incrementality
#32 Udesh Kumarasinghe (Purdue)
iPDB Demo -- Optimizing SQL Queries with ML and LLM Predicates
NLP, Semantic Analysis & Notebooks
#2 Chuxuan Hu (UIUC)
DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries
#15 Billy Li (UIUC)
Kishu: World's First Undoable Notebook
#22 Chuxuan Hu (UIUC)
LEAP: LLM-powered End-to-end Automatic Library for Processing Social Science Queries...
Data Integration, Cleaning & Mining
#6 Dev Ahluwalia (UW-Madison)
MadMatcher: A Platform for Scalable and Accurate Entity Matching
#13 Aryan Esmailpour (UIC)
Metric k-clustering using only Weak Comparison Oracles
#14 Rubab Zahra Sarfraz (UIC)
Speeding up Transformation Search With Lightweight Statistics
#25 Khoi Le, Aryan Esmailpour (UIC)
Compact Representations of Geometric Bipartite Graphs via Weighted Biclique Covers
#26 Chenjie Li (UIC)
Refining Labeling Functions with Limited Labeled Data
#30 Yaoxu Song (Purdue)
BASA-KGMD for Knowledge-Graph-Driven Autonomous Material Discovery
Secure & Private Data Management
#17 Xiling Li (Northwestern)
CASTLE: Collaborative Analytics via Secure, Trustworthy and Scalable Query Evaluation
#27 Donghyun Sohn (Northwestern)
HAMMER: Homomorphic Analytics with Multi-party Method for Encrypted Relational Queries
#33 Sana Ebrahimi (UIC)
An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring
02:00 PM
Afternoon Session I: LLMs & GenAI for Data
Ting Cai (UW-Madison)
Columbo: Expanding Abbreviated Column Names for Tabular Data Using Large Language Models
Andrew Crotty (Northwestern)
Making Prompts First-Class Citizens for Adaptive LLM Pipelines
Xinzhi Wang (Purdue)
SAGE: Selective Attention-Guided Extraction for Token-Efficient Document Indexing
Siyuan (Doug) Dong (UMich)
An Empirical Evaluation of Pretrained LLMs for Query Plan Representation
02:50 PM
Break
03:00 PM
Afternoon Session II: NLP & Semantic Data Analysis
Chuxuan Hu (UIUC)
Automating Open-domain Data Analysis
Boris Glavic (UIC)
Relevance-based Data Management and the Value of Data
Billy Li (UIUC)
Kishu: World's First Undoable Notebook
04:00 PM
Concurrent Sessions & Social Hour
  • PIs-Only Town Hall (4:00 PM - 4:15 PM): A brief discussion of future plans for Midwest Data Day (establishing a recurring, bi-annual event).
  • Student Poster Session II & Social Hour (4:00 PM - 5:00 PM): Students will host the second half of the poster session. Please join us for beer, wine, and light appetizers!
05:00 PM
Closing Remarks

Acknowledgments

We would like to extend our sincere gratitude to Aaron Elmore for supporting the lunch and reception through a Google Unrestricted Gift, and to the Data Science Institute (DSI) for its generous support of the workshop.

Directions, Parking & Hotels

The workshop will be hosted at the University of Chicago's Data Science Institute.

For detailed information on how to get here, including building access and public transportation options, please visit the official DSI visiting guide.

Visiting the DSI Guidelines →

Parking Information

  • University Garage: Located at 55th and Ellis, right by the DSI. The rate is approximately $25 per day.
  • Street Parking: There is limited free parking available along the neighborhood streets in Hyde Park.

Hotels

Here is a list of local hotels in Hyde Park commonly used by the Data Science Institute. (Note: We cannot guarantee dedicated parking at any of these locations, so please check with the hotel directly.)

  • The Sophy Hotel
    1411 E 53rd St, Chicago, IL 60615
    About a 15-minute walk to the venue. A slightly more expensive option and the sister hotel to Hyatt Place.
  • Hyatt Place Chicago
    5225 S Harper Ave, Chicago, IL 60615
    About a 15-minute walk to the venue. The less expensive sister hotel to The Sophy.
  • The Study at University of Chicago
    1227 E 60th St, Chicago, IL 60637
    About a 15-minute walk to the venue. This hotel is located on the south end of campus.

Co-Chairs

Aaron Elmore
University of Chicago
aelmore@cs.uchicago.edu
Chunwei Liu
Purdue University
chunwei@purdue.edu