CS541 Spring12 - Project 3: Relational Operators

Overview


Part 1: From Records to Tuples

As you have learned in class, a typical optimizer breaks down queries into trees of relational operators, implemented as iterators. The iterators (i.e. HeapScan and HashScan) deal with file access directly, and return records or their ids. In this project, you will build and use a higher-level view of these records with the provided classes Schema and Tuple.

Each high level relational operator you will implement inherits the abstract class Iterator, which contains a schema and requires the following methods:
    protected Schema schema

    public void restart()
    public boolean isOpen()
    public void close()

    public boolean hasNext()
    public Tuple getNext()
Your task is to implement the following wrappers for the heap and index scans:
  1. FileScan
    A HeapScan that returns Tuples instead of byte[] 's
  2. KeyScan
    A HashScan that returns Tuples instead of RIDs
  3. IndexScan
    A BucketScan that returns Tuples instead of RIDs
Some useful hints and tips:


PART 2: Primitive Operators

Now that you have the basic leaf nodes of most query trees, you can make some more interesting iterators. Your next task is to implement the three fundamental operations of relational algebra:
  1. Selection
    Filters another iterator on a set of Predicates. When there are multiple predicates, they are connected by operator "AND" by default (i.e. simply call evaluate() on each one)
  2. Projection
    Removes (projects) columns from another iterator. Note: duplicates are OK.
  3. SimpleJoin
    More commonly known as "Nested Loops Join"
Some more useful hints and tips:


PART 3: Hash Join

Your next task is to implement one final iterator, namely HashJoin. The algorithm is described in section 14.4.3 in the textbook.

For efficiency in hash joins, you should use the as many pages of the buffer pool as possible.




Getting Started

Skeleton code is available here, and the documentation is available here.

Note that this code is a complete starting point for the project, i.e. the bufmgr, heap, and index packages are provided for you (in jar files). Note that the minibase code used in this project is slightly different than the one used in the previous project, e.g., more access to HFPage.

To test your code, simply run the provided RelOperatorTest test driver. You should try to minimize the number of reads, writes, and allocations needed by the test cases.

May be the most challenging part is HashJoin. It will account for 20% of the project's grade.




Turnin

You should turn in your code with the Makefile and a readme file. All files need to be zipped in a file named: your_career_login_ro.zip.

In the readme file, put the feature you would like us to know. I should be able to compile/run your program using make on a CS department Unix machine. You don't need to include the library file and other files provided.

Do not change the directory structure of the code. The directory structure of your zip file should be identical to the directory structure of the provided zip file (i.e., having the directory src, the Makefile, ...), except the top-level name (should be your career login above). Your grade may be deduced 5% off if you don't follow this.