III: Small: In-memory, Distributed, and Adaptive Spatio-textual Query Processing
Walid G. Aref
Department of Computer Science
305 N. University Street
West Lafayette, Indiana 47907
Phone: (765) 494-1997
Fax: (765) 494-0739
The PI acknowledges the support of the National Science Foundation under Grant Number III-1815796.
Project Award Information
NSF Award Number: III-1815796
Duration: 8/1/2018 -- 7/31/2021
Title: In-memory, Distributed, and Adaptive Spatio-textual Query Processing
PI: Walid G. Aref
Project Web Page: <https://www.cs.purdue.edu/homes/aref/IDAS>
The widespread use of GPS-enabled smartphones, along with the popularity of microblogging and social networking services such as Twitter and Facebook, has produced large amounts of text data. Typically, this text data (e.g., tweets) is geo-tagged with the location where it was produced. Many applications make good use of this stream of geo-tagged text data (also termed spatial-keyword or spatio-textual data) and provide services to users based on both the textual and the spatial components of the data. These applications need to process a large number of user queries against spatio-textual data. For example, location-aware ad-targeting publish/subscribe systems must disseminate millions of ads and promotions to millions of users based on the users' locations and textual profiles. This project will address the hurdles that these applications and their underlying systems must overcome in order to function properly.
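To make the ad-targeting example concrete, the following is a minimal sketch of one possible matching semantics, assuming each subscription is an axis-aligned rectangle plus a set of keywords that must all appear (a boolean range spatio-keyword query), and each geo-tagged post is a point plus a set of terms. The names `Subscription`, `Post`, and `match` are hypothetical and are not taken from the project. The linear scan is only a baseline; it is exactly the kind of per-item cost that a scalable spatio-textual server would avoid by indexing the subscriptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    # Hypothetical continuous query: a rectangle plus keywords that must all appear.
    sub_id: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    keywords: frozenset


@dataclass(frozen=True)
class Post:
    # Hypothetical geo-tagged text item, e.g., a tweet.
    x: float
    y: float
    terms: frozenset


def match(post, subscriptions):
    """Return ids of subscriptions whose rectangle contains the post and whose
    keywords are all contained in the post's terms (linear-scan baseline)."""
    hits = []
    for s in subscriptions:
        inside = s.min_x <= post.x <= s.max_x and s.min_y <= post.y <= s.max_y
        if inside and s.keywords <= post.terms:  # subset test on keyword sets
            hits.append(s.sub_id)
    return hits


subs = [
    Subscription("coffee-deal", 0, 0, 10, 10, frozenset({"coffee"})),
    Subscription("pizza-deal", 0, 0, 10, 10, frozenset({"pizza", "cheap"})),
    Subscription("far-away", 50, 50, 60, 60, frozenset({"coffee"})),
]
post = Post(3.0, 4.0, frozenset({"need", "coffee", "now"}))
print(match(post, subs))  # ['coffee-deal']
```

Here the post falls inside the first two rectangles, but only the first subscription's keywords are all present, so only `coffee-deal` matches.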
More specifically, this project will address the following research challenges that face spatio-textual servers:
(1) The scalability challenge of supporting large amounts of spatio-textual data streams and queries;
(2) The expressiveness challenge, exemplified by the lack of mechanisms that adequately express complex spatio-textual queries; querying capabilities need to match the growing sophistication and complexity of continuously evolving location services; and
(3) The adaptivity challenge, where systems need to adapt to changes in the data distribution over time.
In location services, the distribution of location data, users' interests, and hot keyword topics change over time. A scalable spatio-textual server needs to continuously adapt to these changes. The project will address these scalability, expressiveness, and adaptivity challenges when processing large numbers of queries on continuously streamed spatio-textual data. The project will investigate how to support spatio-textual data and queries as first-class citizens in an in-memory distributed data system. Scalable architectures for handling large amounts of spatio-textual data and continuous queries will be investigated. In contrast to tailored solutions, relational-like spatio-textual building-block operators will be developed to express extended-SQL spatio-textual queries, along with cost models, algebraic transformation rules, and query optimization techniques. To address scalability and variation in the workload, adaptive and frequency-aware in-memory distributed indexing and query processing techniques will be developed to dynamically organize and process the continuously evolving spatio-textual data. These techniques will dynamically account for changes in, and differences among, the frequencies of keywords across spatial regions, automatically choosing the spatio-textual data organization that optimizes system performance.
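As an illustration of what "frequency-aware" organization per spatial region could mean, the following is a minimal sketch under stated assumptions; it is not the project's technique. It assumes a uniform grid of square cells (`cell_size`) and one simple, hypothetical policy: within each cell, a keyword whose count reaches `hot_threshold` gets a dedicated posting list, while the long tail of rare keywords in that cell is left unindexed and answered by scanning the cell. The class name `AdaptiveGridIndex` and both parameters are hypothetical. Because counts are kept per cell, the same keyword can be indexed in one region and scanned in another.

```python
import math
from collections import Counter, defaultdict


class AdaptiveGridIndex:
    """Hypothetical frequency-aware grid index. Each cell builds posting lists
    only for keywords that are frequent in that cell (count >= hot_threshold);
    rarer keywords are answered by scanning the cell's objects."""

    def __init__(self, cell_size, hot_threshold):
        self.cell_size = cell_size
        self.hot_threshold = hot_threshold
        self.items = defaultdict(list)      # cell -> [(object id, set of terms)]
        self.counts = defaultdict(Counter)  # cell -> keyword frequencies in that cell
        self.hot = defaultdict(dict)        # cell -> {hot keyword: [object ids]}

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, obj_id, x, y, terms):
        cell = self._cell(x, y)
        terms = set(terms)
        self.items[cell].append((obj_id, terms))
        postings = self.hot[cell]
        for t in terms:
            self.counts[cell][t] += 1
            if t in postings:
                postings[t].append(obj_id)
            elif self.counts[cell][t] >= self.hot_threshold:
                # Promote t in this cell: build its posting list from the
                # objects already stored in the cell (including this one).
                postings[t] = [oid for oid, ts in self.items[cell] if t in ts]

    def is_hot(self, x, y, keyword):
        return keyword in self.hot.get(self._cell(x, y), {})

    def query(self, x, y, keyword):
        """Ids of objects in the cell containing (x, y) that mention keyword."""
        cell = self._cell(x, y)
        postings = self.hot.get(cell, {})
        if keyword in postings:
            return list(postings[keyword])  # indexed path
        return [oid for oid, ts in self.items.get(cell, []) if keyword in ts]  # scan path


idx = AdaptiveGridIndex(cell_size=10.0, hot_threshold=2)
idx.insert("t1", 1, 1, {"concert", "tickets"})
idx.insert("t2", 2, 3, {"concert"})
idx.insert("t3", 4, 4, {"traffic"})
idx.insert("t4", 25, 25, {"concert"})
print(idx.is_hot(1, 1, "concert"))    # True: 2 mentions in cell (0, 0)
print(idx.is_hot(25, 25, "concert"))  # False: 1 mention in cell (2, 2)
print(idx.query(5, 5, "concert"))     # ['t1', 't2']
```

The threshold-based promotion is only one conceivable policy; a real adaptive system would also need to demote keywords whose frequency drops and to weigh query frequency against maintenance cost, which this sketch does not attempt.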
For further information see the project web page: <https://www.cs.purdue.edu/homes/aref/IDAS>