CS541 Spring12 - Project 3: Relational Operators

Overview

Last updated: Tuesday March 27, 2012: the due date should be April 6.
Last updated: Monday March 26, 2012.
Due: Midnight, Friday, April 6, 2012. 10% penalty for every late day (Limit: 5 late days)
In this programming assignment, you work INDIVIDUALLY!
In this project you will implement half a dozen relational operators, which form the basis for the Minibase query processor. The underlying layers and other framework classes are provided with the skeleton code.
Before jumping into this project, make sure you thoroughly understand sections 12.3, 12.4, and 14.4.3 (pages 400 to 409, and 463 to 464) of the textbook.
It should go without saying, but START EARLY!

Part 1: From Records to Tuples

As you have learned in class, a typical optimizer breaks down queries into trees of relational operators, implemented as iterators. The iterators (i.e. HeapScan and HashScan) deal with file access directly, and return records or their ids. In this project, you will build and use a higher-level view of these records with the provided classes Schema and Tuple.

Each high level relational operator you will implement inherits the abstract class Iterator, which contains a schema and requires the following methods:

    protected Schema schema

    public void restart()
    public boolean isOpen()
    public void close()

    public boolean hasNext()
    public Tuple getNext()

Your task is to implement the following wrappers for the heap and index scans:

FileScan
A HeapScan that returns Tuples instead of byte[] 's
KeyScan
A HashScan that returns Tuples instead of RIDs
IndexScan
A BucketScan that returns Tuples instead of RIDs

Some useful hints and tips:

Each constructor should initialize the inherited field schema. (i.e. it's given as a parameter to these three iterators)
The difficult thing is figuring out all the provided code. (i.e. don't be surprised by how little code these classes require)
HashScan scans the hash index for records having a given search key. BucketScan scans the whole hash index. Those classes only return RIDs. You have to build wrapper classes KeyScan and IndexScan that return Tuple.

PART 2: Primitive Operators

Now that you have the basic leaf nodes of most query trees, you can make some more interesting iterators. Your next task is to implement the three fundamental operations of relational algebra:

Selection
Filters another iterator on a set of Predicates. When there are multiple predicates, they are connected by operator "AND" by default (i.e. simply call evaluate() on each one)
Projection
Removes (projects) columns from another iterator. Note: duplicates are OK.
SimpleJoin
More commonly known as "Nested Loops Join"

Some more useful hints and tips:

The tricky part of this project is transforming the algorithms in the book into iterators. (i.e. in each iterator class, you will need to save some sort of state in data fields)
Please notice the join methods in the Schema and Tuple classes. (i.e. you will make use of them when writing your SimpleJoin)

PART 3: Hash Join

Your next task is to implement one final iterator, namely HashJoin. The algorithm is described in section 14.4.3 in the textbook.

For efficiency in hash joins, you should use the as many pages of the buffer pool as possible.

Getting Started

Skeleton code is available here, and the documentation is available here.

Note that this code is a complete starting point for the project, i.e. the bufmgr, heap, and index packages are provided for you (in jar files). Note that the minibase code used in this project is slightly different than the one used in the previous project, e.g., more access to HFPage.

To test your code, simply run the provided RelOperatorTest test driver. You should try to minimize the number of reads, writes, and allocations needed by the test cases.

May be the most challenging part is HashJoin. It will account for 20% of the project's grade.

Turnin

You should turn in your code with the Makefile and a readme file. All files need to be zipped in a file named: your_career_login_ro.zip.

In the readme file, put the feature you would like us to know. I should be able to compile/run your program using make on a CS department Unix machine. You don't need to include the library file and other files provided.

Do not change the directory structure of the code. The directory structure of your zip file should be identical to the directory structure of the provided zip file (i.e., having the directory src, the Makefile, ...), except the top-level name (should be your career login above). Your grade may be deduced 5% off if you don't follow this.