You are provided with a dataset of store transaction information. Part 1 of this project is to load the data into a relational database. The dataset you are provided consists of a single row where each row describes the purchase of a single item (or a quantity of identical items.) You will need to create an appropriately normalized database to hold the data.
The dataset can be downloaded, or, if you are using a department unix system, try:
ln -s /homes/clifton/.www/project1/data.csv .
to save disk space. This is a comma-separated value file, with string values in quotes, a relatively common format. The first line in the file contains column names; please use these in your schema. You cannot access this from outside Purdue, and please do not share this information outside the class or make it available outside Purdue. If you download the file, please remove it once you have it loaded into the database.
The database management system you will use is the CS department Oracle server. You should already have an account set up, for information on connecting, see: http://www.cs.purdue.edu/oracle/. If you want to use a client other than SQL*Plus, feel free - but be advised that we'll be testing your queries by treating them as a SQL*Plus script.
You can now start typing in sqlplus commands or SQL statements. Be
sure to terminate every SQL statement with semicolon (;
). If you have
a query file, say ttr.sql, you can run it by command
SQL>@ttr
You may find the Oracle documentation helpful, in particular the SQL Reference, SQL*Loader reference (sqlldr is only installed on cs SPARC Solaris systems, such as lore and ector. To use, set your environment as follows:
ORACLE_HOME=/p/oracle/home path=( /p/oracle/home/bin $path) LD_LIBRARY_PATH=/p/oracle/home/lib:$LD_LIBRARY_PATH
Alternatively, you can create a file of insert statements as input to sqlplus. ) and SQL*Plus Quick Reference.
createkey attributes for relations, and create the values for the keys. But you shouldn't find it absolutely necessary (think about why.)
The second part of the assignment is to query a database.
You will be assigned a partner. You will be given a set
of questions to answer; you will have to write queries to
answer them using your partner's database. To make live easier
on your partner (improving query performance - we'll learn
why later), please issue the command
analyze table
table_name compute statistics;
for each of your tables.
There seem to be two significantly different ways to design the schema. I've tried to partner you up with someone who did it differently, so you can discuss the tradeoffs with them.
Note that in working with a partner, you will be able to
check your results against each other (they should be the
same, even though the queries will be different.)
The team
part of the grade will be based
partly on correctness of result, and partly on complexity
of queries; if you design a poor schema, this will make
it hard for your partner to write good queries, and affect
a portion both of your scores.
You should compare notes/results with your partner
As to the queries, I've provided English-language descriptions of what I want.
After discussing the schema, you ought to be able to do the first five on your own. The last four are more difficult, and you may find some discussion with your partner helps to figure it out.
Please fully qualify the table names with your partner's username
(e.g., SELECT ... FROM CLIFTON.PURCHASES
). Not only
will this make sure you are getting their tables, not yours, it
also makes it much easier for us to try it out when grading.
For Part 1, You will need to turn in four things:
describe table_name select count(*) from table_name;
single tableview (a lossless natural join).
Also, please grant read access on all your tables to nduong,
as well as the partner you are assigned for Part II.
grant select on table_or_view_name to nduong;
For Part 2, please turn in the queries, and the output they give. Hint: In sqlplus, if you have saved your commands in a file part2.sql, the following will generate part2.out:
SET TRIMOUT ON SET FEED OFF SET PAGES 999 SET LINESIZE 200 SET TRIMSPOOL ON SPOOL part2.out @part2.sql SPOOL OFF
Please turn the output in through Blackboard.
This page last modified .