Purdue University


Nile: Data Stream Management System

A growing number of applications in areas such as networking, retail, proteomics, and sensor networks deal with a new and challenging type of data. The data is produced over time in an unpredictable and bursty fashion, in the form of streams of network traffic, retail transactions, peptide spectra, and sensor-measured values. A key requirement of such applications is to continuously monitor the input streams and react to interesting phenomena occurring in them. For example, a sudden rise in the temperature of a sensor-controlled object is a phenomenon that could trigger an alarm in a temperature-monitoring application.

Streaming applications are usually characterized by transient relations, append-only data updates, continuous queries, approximate answers, and one-pass evaluation. These characteristics put them at odds with several assumptions usually made in traditional databases. Indeed, simply storing the arriving data in a traditional database management system and manipulating the stored data is not an option. The major requirements for supporting streaming applications are the following:

  • Processing the whole history of a stream is usually infeasible. It is necessary to limit the scope of interest over the infinite data stream. The concept of a window over a stream is widely used in stream data systems, including Nile.
  • Supporting continuous queries (CQs) is as important as supporting snapshot queries. A continuous query is re-evaluated each time a new data item arrives. Incremental evaluation, as opposed to re-evaluating the whole query from scratch, is essential for processing CQs efficiently.
  • Preserving ordered execution is important in several streaming applications. If the input stream contains data that is ordered (e.g., by timestamp), then the output tuples should also appear as an ordered stream.
  • Due to resource limitations, it may not always be feasible to produce exact answers. Approximate answers may be acceptable for some applications (e.g., data mining and analysis).
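
The first three requirements can be illustrated with a minimal sketch of a time-based sliding-window average that is maintained incrementally. This is not Nile's implementation: the function name `sliding_window_avg`, the `(timestamp, value)` tuple layout, and the half-open window `(ts - window, ts]` are assumptions made only for this illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Reading = Tuple[float, float]  # (timestamp, value); hypothetical tuple layout


def sliding_window_avg(stream: Iterable[Reading], window: float) -> Iterator[Reading]:
    """Yield (timestamp, average) after every arrival.

    The average covers only tuples with timestamps in (ts - window, ts].
    The input is assumed to arrive in non-decreasing timestamp order.
    """
    buf: Deque[Reading] = deque()  # tuples currently inside the window
    total = 0.0                    # running sum, updated incrementally

    for ts, value in stream:
        # Incremental step 1: account for the newly arrived tuple.
        buf.append((ts, value))
        total += value

        # Incremental step 2: expire tuples that slid out of the window.
        while buf and buf[0][0] <= ts - window:
            _, expired_value = buf.popleft()
            total -= expired_value

        # One output per input, so the output stream keeps the input order.
        yield ts, total / len(buf)


if __name__ == "__main__":
    readings = [(1, 20.0), (2, 22.0), (3, 30.0), (5, 40.0)]
    for ts, avg in sliding_window_avg(readings, window=3):
        print(ts, avg)
```

Each arrival costs one addition plus one subtraction per expired tuple, rather than a rescan of the window, and only the tuples inside the window are kept in memory.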

Nile is a full-fledged data stream management system. It introduces the advanced database technologies required for managing and processing online stream data. Our premise is that the system must address the major requirements for supporting streaming applications and be applicable to a wide range of such applications.

Here are some of the salient features of Nile:
  • The stream-in stream-out paradigm
  • Summary Manager with the notion of promising tuples
  • Sliding and predicate windows
  • Negative tuples
  • Shared execution
  • Admission control and quality of service support
  • Context-aware query processing and optimization
  • Built-in online data mining
  • Sensor network support
  • Disk-based data streams