NSF-IDM/CAREER: Database Technologies for Emerging Applications

 

Walid G. Aref

Principal Investigator

Department of Computer Sciences
Purdue University

 

Contact Information

Walid G. Aref

Department of Computer Sciences

1398 Computer Science Building

Purdue University

West Lafayette, Indiana 47907-1398

Phone: (765) 494-1997
Fax : (765) 494-0739
Email: aref@cs.purdue.edu
URL: http://www.cs.purdue.edu/faculty/aref.html

WWW PAGE
http://www.cs.purdue.edu/homes/aref/dbsystems.html

List of Supported and Collaborating Students and Staff

  • M. Mokbel (Ph.D. student)
  • H. Gu (Ph.D. student)
  • I. Ilyas (Ph.D. student)
  • E. Redmond (Undergraduate student)

 

Project Award Information

  • Award Number: IIS-0093116
  • Duration: 9/15/2001 -- 9/14/2006
  • Title: Research and Teaching of Database Technologies for Emerging Applications

 

Keywords

·  Disk scheduling

·  Quality of service

·  Adaptive spatial query processing

·  Spatial indexing

·  High-dimensional indexing

 

Project Summary

This project supports an integrated research and education efforts in the area of database technologies for emerging applications. Its goal is to develop database technologies that address the demands of data-intensive applications, which handle distributed, spatial and multimedia data sources.

The research effort focuses on:

  • The development of new spatial query processing algorithms that can be dynamically and flexibly reordered and adapted during query execution to address the fluctuation in response time and the unexpected delays from the various input web-based spatial data sources
  • The development of indexing techniques and technologies to facilitate the embedding of non-traditional spatial and high-dimensional indexing structures into database systems
  • The development of disk scheduling algorithms that scale up towards achieving multiple quality of service requirements for multimedia applications.

The techniques generated by these research tasks will facilitate the efficient support of modern applications in database systems and will allow database systems to match the dynamically changing demands of these applications.

The education effort focuses on:

  • The involvement of both undergraduate and graduate students in hands-on research activities and experimentation through the development of a lab environment to facilitate the integration of research results and education
  • The development of several curriculum elements including course modules and educational tools

Publications and Products

[1]     W. G. Aref and I.F. Ilyas, An Extensible Index for Spatial Databases, The 13th International Conference on Statistical and Scientific Databases, Virginia, July 2001. [PS], [PDF]

[2]     W. G. Aref and I.F. Ilyas, A Framework for Supporting the Class of Space Partitioning Trees, Technical Report CSD-TR #01-002, Department of Computer Sciences, Purdue University, February 2001.

[3]     W. G. Aref, K. El-Bassyouni, I. Kamel, and M. F. Mokbel, Scalable QoS-Aware Disk Scheduling, International Database Engineering and Applications Symposium (IDEAS), Alberta, Canada, July 2002.

[4]     W. G. Aref, A. Catlin, A. Elmagarmid, J. Fan, J. Guo, M. Hammad, I. Ilyas, M. Marzouk, S. Prabhakar, A. Rezgui, S. Teoh, E. Terzi, Y. Tu, A. Vakali, and X. Zhu . A distributed server for continuous media. In ICDE'02 Proc. of the 18th International Conference on Data Engineering. February 26-March 1. San Jose, California. February 2002.

[5]     W. G. Aref, A. Catlin, A. Elmagarmid, J. Fan, M. Hammad, I. Ilyas, M. Marzouk, and X. Zhu . Search and discovery in digital video libraries. CDS TR #02-005, Computer Sciences Department, Purdue University. February 2002.

Goals, Objectives, and Targeted Activities

1. Adaptive Spatial Query Processing

The goals of this project are to develop and implement a broad class of innovative database technologies that can accommodate the fluctuation, unpredictability, and new data types in data –intensive distributed spatial and multimedia information sources.

 

2. Space Partitioning and High-Dimensional Generalized Search Trees (SP-GiST)
The goal of this project is to develop new indexing techniques and new technologies to facilitate the embedding of non-traditional indexing structures that are commonly used in emerging applications into industrial strength database systems. This includes building a prototype toolbox that aids in constructing non-traditional index structures, studying the relevance of (and benchmarking) the alternative node clustering strategies, and developing a generic high-dimensional index that serves as an umbrella index that can be tailored easily to realize a good variety of high-dimensional indexes.

3. QoS-Aware Disk Scheduling
The goal of this project is to develop new disk scheduling algorithms that scale up towards achieving multiple quality of service requirements in a measurable way, and develop a software environment for tailoring, testing, and tuning disk schedulers to match the dynamic requirements and the workloads of the underlying multimedia applications.

Current Progress

Space Partitioning and High-Dimensional Generalized Search Trees (SP-GiST) [1,2]

Emerging database applications require the use of new indexing structures beyond B-trees and R-trees. Examples are the k-D tree, the trie, the  quadtree, and their variants. They are often proposed as supporting structures in data mining, GIS, and CAD/CAM applications. A common feature of all these indexes is that they recursively divide the space into partitions. A new extensible index structure, termed SP-GiST is developed [1,2] that supports this class of data structures, mainly the class of space-partitioning unbalanced trees. Simple method implementations are developed and are tested that demonstrate how SP-GiST can behave as a k-D tree, a trie, a quadtree, or any of their variants. Issues related to clustering tree nodes into pages as well as concurrency control for SP-GiST are addressed. A dynamic minimum-height clustering technique is applied to minimize disk accesses and to make using such trees in database systems possible and efficient. A prototype implementation of SP-GiST is currently being developed as well as performance studies of the various SP-GiST's tuning parameters.

 

Currently, we study the extensions of SP-GiST to support bulk-loading and to enhance the deletion capabilities and dynamic reclustering after node deletions. We plan to study high-dimensional space-partitioning trees and generalized frameworks for supporting them.

QoS-Aware Disk Scheduling [3]

A new quality of service (QoS) aware disk scheduling algorithm is developed and is tested. It is applicable in environments where data requests arrive with different QoS requirements such as real-time deadline, and user priority. The disk scheduler tries to meet these requirements and minimizes the violations. The number of scheduling parameters can increase depending on the QoS requirements of the underlying system. Previous work on disk scheduling has focused on optimizing the seek times and/or meeting the real-time deadlines. A unified framework for QoS disk scheduling is being developed that scales with the number of scheduling parameters. The general idea is based on modeling the disk scheduler requests as points in the multi-dimensional space, where each of the dimensions represent one of the parameters (e.g., one dimension represents the request deadline, another represents the disk cylinder number, and a third dimension represents the priority of the request, etc.). Then the disk scheduling problem reduces to the problem of finding a linear order to traverse these multi-dimensional points. Space-filling curves are adopted to define a linear order for sorting and scheduling objects that lie in the multi-dimensional space. This generalizes the one-dimensional disk scheduling algorithms (e.g., EDF, SATF, FIFO). A set of goodness measures are adopted that are derived from the alternative QoS requirements. High-dimensional versions of the space-filling curves, e.g., Sweep, Peano, Diagonal, and Hilbert, are compared based on the proposed measures. Several techniques are studied to show how a QoS-aware disk scheduler deals with the progressive arrival of requests over time. Simulation experiments are presented to show a comparison of the alternative techniques and to demonstrate the scalability of the proposed QoS-aware disk scheduling algorithm over other traditional approaches.

We are currently investigating the cascaded use of multiple space-filling curves in disk scheduling with quality of service constraints. We plan to develop a Multi-SFC disk scheduler, where we partition the scheduling space to multiple sub-spaces, and the choice of which space-filling curve to use in each scheduling sub-space is based on the nature of the scheduling parameters representing this sub-space and on a set of goodness measures that reflect the scheduling requirements in the sub-space.

Project References

  • W. G. Aref and I. F. Ilyas. "A Framework for Supporting the Class of Space Partitioning Trees", Technical Report CSD-TR #01-002, Department of Computer Sciences, Purdue University, February 2001.
  • W. G. Aref, K. Elbassiouni, I. Kamel, M. F. Mokbel, and D. Yau. "Scalable QoS-Aware Disk-Scheduling", Technical Report CSD-TR #01-005, Department of Computer Sciences, Purdue University, March 2001.

Potential Related Projects

Joe Hellerstein (Online query processing and the GiST projects), Sharad Mehrotra (High-dimensional indexing and concurrency control for search trees)