NSF-IDM/CAREER:
Database Technologies for Emerging Applications
Walid G. Aref
Principal Investigator
Department of
Contact
Information
Walid G. Aref
Department of Computer Sciences
1398
Phone: (765)
494-1997
Fax : (765)
494-0739
Email:
URL:
http://www.cs.purdue.edu/faculty/aref.html
WWW PAGE
http://www.cs.purdue.edu/homes/aref/dbsystems.html
List of
Supported and Collaborating Students and Staff
Project Award
Information
Keywords
· Disk scheduling
· Quality of service
· Adaptive spatial query processing
· Spatial indexing
· High-dimensional indexing
Project Summary
This project supports an integrated research and education efforts in the area of database technologies for emerging applications. Its goal is to develop database technologies that address the demands of data-intensive applications, which handle distributed, spatial and multimedia data sources.
The research effort focuses on:
The techniques generated by these research tasks will facilitate the efficient support of modern applications in database systems and will allow database systems to match the dynamically changing demands of these applications.
The education effort focuses on:
Publications and Products
[1] W. G. Aref
and I.F. Ilyas, An
Extensible Index for Spatial Databases, The 13th International Conference on
Statistical and Scientific Databases,
[2] W. G. Aref
and I.F. Ilyas, A Framework for Supporting the Class
of Space Partitioning Trees, Technical Report CSD-TR #01-002, Department of
Computer Sciences, Purdue University, February 2001.
[3] W. G. Aref,
K. El-Bassyouni,
[4] W. G. Aref,
A. Catlin, A. Elmagarmid,
J. Fan, J. Guo, M. Hammad,
[5] W. G. Aref,
A. Catlin, A. Elmagarmid,
J. Fan, M. Hammad,
Goals, Objectives, and Targeted Activities
1. Adaptive Spatial Query
Processing
The goals of this project are to
develop and implement a broad class of innovative database technologies that
can accommodate the fluctuation, unpredictability, and new data types in data
–intensive distributed spatial and multimedia information sources.
2. Space Partitioning and High-Dimensional Generalized
Search Trees (SP-GiST)
The goal of this project is to develop new indexing techniques and new
technologies to facilitate the embedding of non-traditional indexing structures
that are commonly used in emerging applications into industrial strength
database systems. This includes building a prototype toolbox that aids in
constructing non-traditional index structures, studying the relevance of (and
benchmarking) the alternative node clustering strategies, and developing a
generic high-dimensional index that serves as an umbrella index that can be
tailored easily to realize a good variety of high-dimensional indexes.
3. QoS-Aware
Disk Scheduling
The goal of this project is to develop new disk scheduling algorithms that
scale up towards achieving multiple quality of service requirements in a
measurable way, and develop a software environment for tailoring, testing, and
tuning disk schedulers to match the dynamic requirements and the workloads of
the underlying multimedia applications.
Current Progress
Space Partitioning and High-Dimensional Generalized Search
Trees (SP-GiST) [1,2]
Emerging
database applications require the use of new indexing structures beyond B-trees
and R-trees. Examples are the k-D tree, the trie, the quadtree,
and their variants. They are often proposed as supporting structures in data
mining, GIS, and CAD/CAM applications. A common feature of all these indexes is
that they recursively divide the space into partitions. A new extensible index
structure, termed SP-GiST
is developed [1,2] that supports this class of data
structures, mainly the class of space-partitioning unbalanced trees. Simple method implementations are developed and are tested that demonstrate
how SP-GiST can behave as a k-D tree, a trie, a quadtree, or any of their
variants. Issues related to clustering tree nodes into pages as well as
concurrency control for SP-GiST are addressed. A
dynamic minimum-height clustering technique is applied to minimize disk
accesses and to make using such trees in database systems possible and
efficient. A prototype implementation of SP-GiST is
currently being developed as well as performance studies of the various SP-GiST's tuning parameters.
Currently,
we study the extensions of SP-GiST to support
bulk-loading and to enhance the deletion capabilities and dynamic reclustering after node deletions. We plan to study
high-dimensional space-partitioning trees and generalized frameworks for
supporting them.
QoS-Aware
Disk Scheduling [3]
A
new quality of service (QoS) aware disk scheduling
algorithm is developed and is tested. It is applicable in environments where
data requests arrive with different QoS requirements
such as real-time deadline, and user priority. The disk scheduler tries to meet
these requirements and minimizes the violations. The number of scheduling
parameters can increase depending on the QoS
requirements of the underlying system. Previous work on disk scheduling has
focused on optimizing the seek times and/or meeting the real-time deadlines. A
unified framework for QoS disk scheduling is being
developed that scales with the number of scheduling parameters. The general
idea is based on modeling the disk scheduler requests as points in the multi-dimensional
space, where each of the dimensions represent one of the parameters (e.g., one
dimension represents the request deadline, another represents the disk cylinder
number, and a third dimension represents the priority of the request, etc.). Then
the disk scheduling problem reduces to the problem of finding a linear order to
traverse these multi-dimensional points. Space-filling curves are adopted to
define a linear order for sorting and scheduling objects that lie in the
multi-dimensional space. This generalizes the one-dimensional disk scheduling
algorithms (e.g., EDF, SATF, FIFO). A set of goodness
measures are adopted that are derived from the alternative QoS
requirements. High-dimensional versions of the space-filling curves, e.g.,
Sweep, Peano, Diagonal, and Hilbert, are compared
based on the proposed measures. Several techniques are studied to show how a QoS-aware disk scheduler deals with the progressive arrival
of requests over time. Simulation experiments are presented to show a
comparison of the alternative techniques and to demonstrate the scalability of
the proposed QoS-aware disk scheduling algorithm over
other traditional approaches.
We are currently investigating the cascaded
use of multiple space-filling curves in disk scheduling with quality of service
constraints. We plan to develop a Multi-SFC disk scheduler, where we partition
the scheduling space to multiple sub-spaces, and the choice of which
space-filling curve to use in each scheduling sub-space is based on the nature
of the scheduling parameters representing this sub-space and on a set of
goodness measures that reflect the scheduling requirements in the sub-space.
Project
References
Potential Related Projects
Joe
Hellerstein (Online query processing and the GiST projects), Sharad Mehrotra (High-dimensional indexing and concurrency control
for search trees)