Your task for this project is to identify and perform an association rule mining task. This involves
While you are on your own to select an appropriate data set, I will point you to one easy source: the UCI Machine Learning Repository. This contains many data sets, not all of which are appropriate for association rules, so you'll need to do some thinking. You are also welcome to identify data from other sources, especially those that you find personally of interest.
The project report should contain the following:
find patterns, but what would improve because of the use of the patterns.
client.
client do because of the rules discovered.
Also turn in (likely as a separate plain-text file) a complete listing of the rules found, and instructions (preferably machine-readable/executable) for recreating your results. WEKA provides several ways to do this, from command-line scripts to Explorer - your call.
If you iterate over different attribute sets / parameter settings / etc., only turn in the rule list and scripts for your final iteration. You should include a description of the iterations, and why you needed to make changes from your initial choices, in the project description.
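To make concrete what a "complete listing of the rules" with a reproducible script might look like, here is a minimal sketch of Apriori-style rule mining in plain Python. It is not WEKA's implementation and is not a substitute for the WEKA scripts requested above; the function names (frequent_itemsets, association_rules, format_rule), the toy market-basket data, and the thresholds are all assumptions chosen for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    # Level 1 candidates: every individual item seen in the data.
    candidates = {frozenset([item]) for t in tsets for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates whose support meets the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in tsets if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, pruning any
        # candidate that has an infrequent k-subset.
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) tuples, best confidence first."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    found.append(
                        (sorted(lhs), sorted(itemset - lhs), support, confidence)
                    )
    return sorted(found, key=lambda rule: (-rule[3], -rule[2], rule[0], rule[1]))


def format_rule(rule):
    """Render one rule as a line suitable for a plain-text rule listing."""
    lhs, rhs, support, confidence = rule
    return "{} ==> {}  (support={:.2f}, confidence={:.2f})".format(
        ", ".join(lhs), ", ".join(rhs), support, confidence
    )


if __name__ == "__main__":
    # Hypothetical toy data; replace with your own transactions.
    baskets = [
        ["bread", "milk"],
        ["bread", "diapers", "beer"],
        ["milk", "diapers", "beer"],
        ["bread", "milk", "diapers", "beer"],
        ["bread", "milk", "diapers"],
    ]
    for rule in association_rules(frequent_itemsets(baskets, 0.6), 0.9):
        print(format_rule(rule))
```

Whatever tool you use, recording the thresholds in the script itself (here, minimum support 0.6 and minimum confidence 0.9) is what lets someone else regenerate exactly the rule list you turn in.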
Scoring will be based on:
Extra points will be given for making the problem more challenging, provided you do so appropriately (no extra credit for doing something the hard way when an easy way is available). Examples could include implementing an algorithm other than Apriori that you think will be faster than Apriori on your data, or accessing data directly from a database (JDBC) rather than from comma-separated value or ARFF files. For peak credit, such extra challenges should be documented in a way that enables the rest of the class to make use of what you've done, e.g., simple instructions for connecting directly to a campus-accessible database.
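As a rough illustration of the database option, the sketch below reads one-row-per-item records from a database and groups them into transactions ready for rule mining. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a JDBC connection; the table name purchases, its columns basket_id and item, and both function names are hypothetical.

```python
import sqlite3
from collections import defaultdict


def make_demo_db():
    """Build a small in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (basket_id INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO purchases (basket_id, item) VALUES (?, ?)",
        [(1, "bread"), (1, "milk"), (2, "bread"), (2, "diapers"), (2, "beer")],
    )
    conn.commit()
    return conn


def load_transactions(conn):
    """Group (basket_id, item) rows into one sorted item list per basket."""
    baskets = defaultdict(set)
    query = "SELECT basket_id, item FROM purchases ORDER BY basket_id"
    for basket_id, item in conn.execute(query):
        baskets[basket_id].add(item)
    # Return baskets in basket_id order, items alphabetically within each.
    return [sorted(items) for _, items in sorted(baskets.items())]


if __name__ == "__main__":
    connection = make_demo_db()
    for basket in load_transactions(connection):
        print(basket)
    connection.close()
```

Against a real campus database, the connection call and the query would change, but the grouping step would stay the same. Documenting those two pieces is the kind of instruction that lets classmates reuse the work.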
Electronic submission is preferred. Please use the turnin command (on mentor.ics.purdue.edu, turnin -c cs490d -p proj1 directoryname). If that doesn't work, you can tar/zip and email to . PDF is the safest format for capturing non-text content. Hard copy is acceptable; please hand it in at the beginning of class.