Project Part 1: Basic Distributed Database

Start date 3 February, due 17 February.

The project will be to build a distributed database, implementing some of the key components / protocols presented in class. The first part of the project will be to simply set up three communicating database servers. Each must be capable of responding to requests from a client (a transaction program), and processing those transactions. The transactions may involve tables at multiple sites.

The database you will support consists of three tables. Schema (as create table statements in SQL) is:

CREATE TABLE supplies (  -- At site "Supplier"
  item			char(10),
  unitquantity		number,
  price			number
);

CREATE TABLE recipe (  -- At site "DiningHall"
  menu			char(10),
  ingredient		char(10),
  servingquantity	number
);

CREATE TABLE budget ( -- At site "Freehafer"
  mealtime		char(10),
  day			number,
  budgetleft		number
);

Transaction programs will be arbitrary collections of the following statements:

INSERT INTO table VALUES (tuple of schema matching table);
UPDATE table SET attribute=value WHERE update_expression;
SELECT attribute_list FROM tables WHERE select_expression;
ROLLBACK;
COMMIT;

The update_expression is simply a list of attribute=value conditions connected by AND. The select_expression may also include join conditions, i.e., both the left and right side of the equality test are attribute names. attribute_list and tables are simply comma-separated lists. An example transaction would be:

INSERT INTO supplies VALUES ('flour    ',30,5);
SELECT price,servingquantity,unitquantity FROM recipe,supplies WHERE ingredient=item;
UPDATE budget SET budgetleft=20 WHERE day=12 AND mealtime='dinner    ';
COMMIT;

Note that a new transaction is assumed to begin after every commit or abort, or with the first statement submitted. If in doubt about syntax/semantics, note that this is a subset of standard SQL supporting only single value updates and equality conditions. Presumably in the real world, some computation with the values read would be used to determine the values to write. This is irrelevant to distributed transaction management, so you won't need to implement anything to support this...

You can either have your database server read statements from the command line, or from a connection from a separate program. However, you must be able to read/process requests from the other sites while processing a command line transaction. You will be asked to demonstrate that your system supports such concurrency by demonstrating a non-serializable set of transactions (specifically, the final state should not be equivalent to any serial execution, which you should demonstrate.) You'll implement concurrency control to prevent this later.

Part 1 does not need to support joins. This will ease query processing, since each statement in a transaction is executed at one site (you just need to determine which site based on the name of the table, and send it to that site.)

Implementation Suggestions

I suggest using java and java RMI for communications. If you haven't used RMI, the Hello World tutorial at http://java.sun.com is a good starting point.

Because of the fixed schema, it should be easy to set up a main-memory database for this. However, you are welcome to make this even easier by using a real database to manage each table, accessing it via JDBC. While I wouldn't recommend using it for real code (because of SQL Injection attacks), the java.sql.Statement.execute method will make parsing trivial (note that every statement is terminated by a ;, and the ; appears nowhere else.) You will have to do some parsing later (to support joins), but not yet. The department runs an Oracle server you can use for this, or you can set up your own DBMS.

Turning in the assignment

You will turn in this assignment by demoing it to me, along with a brief code walkthrough. Also upload a tar file with the code (just source and instructions on how to run - not binaries/etc.) to Blackboard Please email the possible times for an appointment by 2/11 - 30 minutes sometime between 8:30 and 17:00 on 2/17.