CS 54100: Database Systems

Project 1: Relational Databases

Part 1 due beginning of class Monday, 6 February, 2012
Part 2 due beginning of class Monday, 13 February, 2012

Part 1: Database Creation

You are provided with a dataset of store transaction information. Part 1 of this project is to load the data into a relational database. The dataset you are provided consists of a single row where each row describes the purchase of a single item (or a quantity of identical items.) You will need to create an appropriately normalized database to hold the data.

The dataset can be downloaded, or, if you are using a department unix system, try:

ln -s /homes/clifton/.www/project1/data.csv .

to save disk space. This is a comma-separated value file, with string values in quotes, a relatively common format. The first line in the file contains column names; please use these in your schema. You cannot access this from outside Purdue, and please do not share this information outside the class or make it available outside Purdue. If you download the file, please remove it once you have it loaded into the database.

The database management system you will use is the CS department Oracle server. You should already have an account set up, for information on connecting, see: http://www.cs.purdue.edu/oracle/. If you want to use a client other than SQL*Plus, feel free - but be advised that we'll be testing your queries by treating them as a SQL*Plus script.

You can now start typing in sqlplus commands or SQL statements. Be sure to terminate every SQL statement with semicolon (;). If you have a query file, say ttr.sql, you can run it by command
SQL>@ttr

You may find the Oracle documentation helpful, in particular the SQL Reference, SQL*Loader reference (sqlldr is only installed on cs SPARC Solaris systems, such as lore and ector. To use, set your environment as follows:

ORACLE_HOME=/p/oracle/home
path=( /p/oracle/home/bin $path)
LD_LIBRARY_PATH=/p/oracle/home/lib:$LD_LIBRARY_PATH

Alternatively, you can create a file of insert statements as input to sqlplus. ) and SQL*Plus Quick Reference.

Hints

You may find it easier to create a table that holds all the data, load the data into that table, then populate your database using select ... into statements. (when done, drop the temporary table so we don't overload the department server.)
Your constraints should make sense, and they should also match the data you are given. If there are a few data items that violate your constraints, please relax your constraints and include a brief description of what happened and why.
You may find it useful to create key attributes for relations, and create the values for the keys. But you shouldn't find it absolutely necessary (think about why.)

Part 2: SQL Queries

The second part of the assignment is to query a database. You will be assigned a partner. You will be given a set of questions to answer; you will have to write queries to answer them using your partner's database. To make live easier on your partner (improving query performance - we'll learn why later), please issue the command analyze table table_name compute statistics; for each of your tables.

There seem to be two significantly different ways to design the schema. I've tried to partner you up with someone who did it differently, so you can discuss the tradeoffs with them.

Note that in working with a partner, you will be able to check your results against each other (they should be the same, even though the queries will be different.) The team part of the grade will be based partly on correctness of result, and partly on complexity of queries; if you design a poor schema, this will make it hard for your partner to write good queries, and affect a portion both of your scores. You should compare notes/results with your partner

As to the queries, I've provided English-language descriptions of what I want.

What were the total sales for August 8, 2001?
How many customers made purchases on August 8, 2001?
What were the total sales per store for the week August 1-7, 2001?
How many transactions (visits) were made at each store for the week August 1-7, 2001?
What is the average purchase per visit for each store?
What was the most expensive item sold, how much was it, and where was it sold?
What item resulted in the highest gross revenue (quantity times price)?
Which manager had the highest average daily gross revenue?
For items that have multiple colors, what was the most popular color? Only show items that sold more than 100 in that color.

After discussing the schema, you ought to be able to do the first five on your own. The last four are more difficult, and you may find some discussion with your partner helps to figure it out.

Please fully qualify the table names with your partner's username (e.g., SELECT ... FROM CLIFTON.PURCHASES). Not only will this make sure you are getting their tables, not yours, it also makes it much easier for us to try it out when grading.

Turning in the assignment

For Part 1, You will need to turn in four things:

A list of the commands used to create the tables and constraints for your schema. You should document this with comments explaining the constraints, where not obvious. Alternatively, you can have a short description of the constraints/design as a separate document.
An explanation of how you loaded the data (e.g., the SQL*Loader script, or a description of how you created insert statements.)
The output of a script that issues the following commands for each table:
```
describe table_name
select count(*) from table_name;
```
An SQL statement that joins all the tables to reconstruct the original single table view (a lossless natural join).

Also, please grant read access on all your tables to nduong, as well as the partner you are assigned for Part II.
grant select on table_or_view_name to nduong;

For Part 2, please turn in the queries, and the output they give. Hint: In sqlplus, if you have saved your commands in a file part2.sql, the following will generate part2.out:

SET TRIMOUT ON
SET FEED OFF
SET PAGES 999
SET LINESIZE 200
SET TRIMSPOOL ON
SPOOL part2.out
@part2.sql
SPOOL OFF

Please turn the output in through Blackboard.

This page last modified .