NSF-IDM/CAREER:
Database Technologies for Emerging Applications
Walid G. Aref
Principal Investigator
Department of
Contact
Information
Walid G. Aref
Department of Computer Sciences
1398
Phone: (765)
494-1997
Fax : (765)
494-0739
Email: aref@cs.purdue.edu
URL:
http://www.cs.purdue.edu/faculty/aref.html
WWW PAGE
http://www.cs.purdue.edu/homes/aref/dbsystems.html
List of
Supported and Collaborating Students and Staff
Project Award
Information
Keywords
· Disk scheduling
· Quality of service
· Adaptive spatial query processing
· Spatial indexing
· High-dimensional indexing
Project Summary
This project supports an integrated research and education efforts in the area of database technologies for emerging applications. Its goal is to develop database technologies that address the demands of data-intensive applications, which handle distributed, spatial and multimedia data sources.
The research effort focuses on:
The techniques generated by these research tasks will facilitate the efficient support of modern applications in database systems and will allow database systems to match the dynamically changing demands of these applications.
The education effort focuses on:
Publications and Products
[1] W. G. Aref
and I.F. Ilyas, An
Extensible Index for Spatial Databases, The 13th International Conference on
Statistical and Scientific Databases,
[2] W. G. Aref
and I.F. Ilyas, A Framework for Supporting the Class
of Space Partitioning Trees, Technical Report CSD-TR #01-002, Department of
Computer Sciences, Purdue University, February 2001.
[3] W. G. Aref,
K. El-Bassyouni,
[4] W. G. Aref,
A. Catlin, A. Elmagarmid,
J. Fan, J. Guo, M. Hammad,
[5] W. G. Aref,
A. Catlin, A. Elmagarmid,
J. Fan, M. Hammad,
Goals, Objectives, and Targeted Activities
1. Adaptive Spatial Query
Processing
The goals of this project are to
develop and implement a broad class of innovative database technologies that
can accommodate the fluctuation, unpredictability, and new data types in data
–intensive distributed spatial and multimedia information sources.
2. Space Partitioning and High-Dimensional Generalized
Search Trees (SP-GiST)
The goal of this project is to develop new indexing techniques and new
technologies to facilitate the embedding of non-traditional indexing structures
that are commonly used in emerging applications into industrial strength
database systems. This includes building a prototype toolbox that aids in
constructing non-traditional index structures, studying the relevance of (and
benchmarking) the alternative node clustering strategies, and developing a
generic high-dimensional index that serves as an umbrella index that can be
tailored easily to realize a good variety of high-dimensional indexes.
3. QoS-Aware
Disk Scheduling
The goal of this project is to develop new disk scheduling algorithms that
scale up towards achieving multiple quality of service requirements in a
measurable way, and develop a software environment for tailoring, testing, and
tuning disk schedulers to match the dynamic requirements and the workloads of
the underlying multimedia applications.
Current
Progress
Research
Space Partitioning and High-Dimensional Generalized Search
Trees (SP-GiST) [1,2]
Emerging
database applications require the use of new indexing structures beyond B-trees
and R-trees. Examples are the k-D tree, the trie, the quadtree,
and their variants. They are often proposed as supporting structures in data
mining, GIS, and CAD/CAM applications. A common feature of all these indexes is
that they recursively divide the space into partitions. A new extensible index
structure, termed SP-GiST
is developed [1,2] that supports this class of data
structures, mainly the class of space-partitioning unbalanced trees. Simple method implementations are developed and are tested that
demonstrate how SP-GiST can behave as a k-D
tree, a trie, a quadtree,
or any of their variants. Issues related to clustering tree nodes into pages as
well as concurrency control for SP-GiST are
addressed. A dynamic minimum-height clustering technique is applied to minimize
disk accesses and to make using such trees in database systems possible and
efficient. A prototype implementation of SP-GiST is
currently being developed as well as performance studies of the various SP-GiST's tuning parameters.
Currently,
we study the extensions of SP-GiST to support
bulk-loading and to enhance the deletion capabilities and dynamic reclustering after node deletions. We plan to study
high-dimensional space-partitioning trees and generalized frameworks for
supporting them.
QoS-Aware
Disk Scheduling [3]
A
new quality of service (QoS) aware disk scheduling
algorithm is developed and is tested. It is applicable in environments where
data requests arrive with different QoS requirements
such as real-time deadline, and user priority. The disk scheduler tries to meet
these requirements and minimizes the violations. The number of scheduling
parameters can increase depending on the QoS
requirements of the underlying system. Previous work on disk scheduling has
focused on optimizing the seek times and/or meeting the real-time deadlines. A
unified framework for QoS disk scheduling is being
developed that scales with the number of scheduling parameters. The general
idea is based on modeling the disk scheduler requests as points in the
multi-dimensional space, where each of the dimensions represent one of the
parameters (e.g., one dimension represents the request deadline, another
represents the disk cylinder number, and a third dimension represents the
priority of the request, etc.). Then the disk scheduling problem reduces to the
problem of finding a linear order to traverse these multi-dimensional points.
Space-filling curves are adopted to define a linear order for sorting and
scheduling objects that lie in the multi-dimensional space. This generalizes
the one-dimensional disk scheduling algorithms (e.g., EDF, SATF, FIFO). A set of goodness measures are adopted that are
derived from the alternative QoS requirements.
High-dimensional versions of the space-filling curves, e.g., Sweep, Peano, Diagonal, and Hilbert, are compared based on the
proposed measures. Several techniques are studied to show how a QoS-aware disk scheduler deals with the progressive arrival
of requests over time. Simulation experiments are presented to show a
comparison of the alternative techniques and to demonstrate the scalability of
the proposed QoS-aware disk scheduling algorithm over
other traditional approaches.
We are currently investigating the cascaded
use of multiple space-filling curves in disk scheduling with quality of service
constraints. We plan to develop a Multi-SFC disk scheduler, where we partition
the scheduling space to multiple sub-spaces, and the choice of which space-filling
curve to use in each scheduling sub-space is based on the nature of the
scheduling parameters representing this sub-space and on a set of goodness
measures that reflect the scheduling requirements in the sub-space.
Education
Multimedia Database and Information
Systems Course
During the past two decades, database
systems have been extended significantly beyond handling only textual data, to
be able to match the demands of emerging multimedia applications. A
proposed course on multimedia database and information systems has recently
been approved and will be offered at the CS department at
The course will cover the following topics: overview of digital
libraries, video conferencing, quality of service; multimedia content
description and standards; multi-dimensional and high-dimensional indexing
structures; image and text databases, similarity-based search techniques, and
multi-feature ranking algorithms; audio and video databases, techniques to
capture, index, store and retrieve video and audio streams by content;
multimedia databases, techniques to store multimedia data containing a mix of
image data, text, audio and video data, as well as other heterogeneous data;
multimedia servers, storage servers that handle real-time constraints, quality
of service guarantees, network bandwidth, and tertiary storage; communication
software and networking for multimedia database and information systems;
multimedia presentations, presentation schedules and delivery; mobile/wireless
and peer-to-peer multimedia database and information systems; multimedia
database security, watermarking, and cryptography.
Project
References
Potential Related Projects
Joe
Hellerstein (Online query processing and the GiST projects), Sharad Mehrotra (High-dimensional indexing and concurrency control
for search trees)