CS541 Spring12 - Project 4: Query Evaluation and Optimization

Overview

Last updated: Monday April 9, 2012.
Due: Midnight, Friday, April 20, 2012. 10% penalty for every late day (Limit: 5 late days)
This project has 3 parts. You work INDIVIDUALLY in part 1 and part 2; and in A GROUP OF 3 in part 3.
Read chapter 12 (Overview of Query Evaluation) in the textbook carefully to get background for this project.
It should go without saying, but START EARLY!

Provided code: MiniSQL Parser

No, you don't have to write the parser! Instead, a complete parser and type checker for the MiniSQL language is provided with the skeleton code. This small subset of SQL includes the basic commands CREATE, DROP, INSERT, SELECT, UPDATE, DELETE, DESCRIBE, and EXPLAIN. Supported data types include INTEGER, FLOAT, and STRING (notice the slight deviations from the SQL standard). The following (incomplete) list illustrates what is not included in the language:

Support for NULL values
Complex expressions / parentheses
for example WHERE a = b + 1
Aliasing (i.e. FROM Employees E)
this means column names should be unique
DISTINCT and ORDER BY
Aggregates, GROUP BY, HAVING, etc.

Once a MiniSQL statement is parsed, the abstract syntax tree (AST) is passed to the Optimizer, which in turn dispatches the query to the corresponding class implementing the Plan interface. In this project, you will implement several of these plan classes, using the provided parser and system Catalog.

PART 1: CREATE and DROP INDEX (INDIVIDUALLY)

Your first task is to implement the CreateIndex and DropIndex commands, allowing you to build and destroy hash indexes on tables. Your implementation of these and the remaining Plan classes must meet the following general requirements:

You must validate all query input.
For example, you should not create an index if the file name already exists. Please review (and call) the appropriate methods in the provided class QueryCheck to fulfill this requirement.
Execute the query using the components we have developed throughout the semester.
If the query affects the system catalogs (i.e. create/drop statements), then call the appropriate method(s) in Catalog to maintain them.
Each query should print a one-line message at the end, such as "Table Created" or "1 row inserted" -- or anything else you would find appropriate.

Some useful hints and tips:

You may want to use CreateTable and DropTable as a reference.
(i.e. these are provided to demonstrate how to use the parser)
Don't forget that CREATE INDEX should actually build the hash index!
(i.e. don't just rename the word "table" in CreateTable to "index")

Part 2: INSERT and SELECT (INDIVIDUALLY)

Your next task is to implement the Insert and Select classes, allowing you to create and query actual data. Remember to fulfill the general requirements listed in part one. For Select, you will implement a basic query optimizer. The parser will give you an array of table names to select from, an array of (unique) column names to project, and an array of selection predicates (in conjunctive normal form). The basic plan is to use FileScans and SimpleJoins for all the tables, add a Selection for each conjunct, and have one Projection at the root of the Iterator tree.

For example:
Given the tables T1(a, b) and T2(c, d) and the following query:

     EXPLAIN SELECT d, a FROM T1, T2 WHERE a = c and b = d or a = 5;

The default (naive) execution plan is as follows:

     Projection : {3}, {0}      
     Selection : b = d OR a = 5
        Selection : a = c OR a = 5          
           SimpleJoin : (cross)            
               FileScan : T2            
               FileScan : T1

(Note how the conditions of the WHERE clause change into the predicates)

In order to get the full credit, you need to implement the pushing selections optimization technique described in textbook (section 12.5.1): if the predicates of a selection involve only the attributes of one table, you should execute the selection before the join

Some more useful hints and tips:

Think carefully about any side effects of INSERT statements.
(i.e. what if the table you're inserting into has indexes?)
The main goal of Select's constructor is to create an Iterator query tree.
(i.e. all you need to do in execute() is call iter.explain() or iter.execute())
For testing purpose, your execution plan will be printed before the returned results.

Note: The Selection class has been changed from the previous project. The default logical operator for the set of predicates is now OR.

Part 3: Query Optimization (WORK IN GROUP OF 3)

Besides basic evaluation plan, there is much space for optimization. We are going to provide you with some MiniSQL queries and expect you to utilize information from the Catalog to optimize the queries. Think about possibilities to optimize those queries.
Query 1
Query 2
Query 3

Getting Started

Here is Project documentation, and Project skeleton code.

Note that this code is a complete starting point for the project, i.e. the bufmgr, heap, index, and relop packages are provided for you (in jar files). The framework classes may be sightly different than the ones used in previous projects.

Running and testing this project will be quite different from the others. Instead of using an automated test driver, you will run the provided command-line utility, Msql, which resembles the behavior of SQL*Plus. Several test queries are provided with the skeleton code, but more queries will be tested at the time of grading. Use the STATS command to view performance counters. The Msql program can receive input from the command line, or from a file. In the latter case, you need to provide the file name as an input parameter. For example:

    java -classpath bin global.Msql mytest.sql

If you use an IDE, feel free to set up your run configuration to accept that input parameter.

Turnin

For individual parts, submit you work via Blackboard. As usual, please include together with your code the Makefile, Readme (which describes your project). All files need to be zipped in a file named: your_career_login_qe.zip.

For group part, one of the group member will submit the work via Blackboard. Also please include with your code the Makefile, Readme (which lists group members, how you do the optimization, new features that you want us to know, and roles of each member, i.e. who do what). All files need to be zipped in a file named: your_career_login1_your_career_login2_your_career_login3_qe.zip.

I should be able to compile/run your program using make on a CS deparment Unix machine. The directory structure of your zip file should be identical to the directory structure of the provided zip file (i.e., having the directory src, the Makefile, ...), except the top-level name (should be your career login above). Your grade may be deduced 5% off if you don’t follow this.