The goal of this project is to choose and evaluate clustering mechanisms. I would suggest using the mechanisms available in Weka, although you may implement your own if you wish.
Use datasets from the UCI Machine Learning Repository. Choose one from each column of the following table:
One from this column | One from this column |
---|---|
Arrhythmia | Abalone |
Image Segmentation | Auto-MPG |
Isolet | Housing |
Nursery | |
If you wish to use other datasets in place of these, please give me a pointer to or description of the datasets and I'll let you know if that is okay (and which column it would count as).
What you need to do for this project is:
"The data has continuous values, the algorithm only applies to nominal values" isn't good enough; you should instead discretize the continuous attributes. "Not applicable" is only valid if there is no reasonable way of preprocessing the data to make the algorithm apply. Each data set and algorithm you choose must be used at least once.
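To illustrate the kind of preprocessing meant by discretizing, here is a minimal sketch of equal-width binning, which turns one continuous attribute into a nominal one. The function name `equal_width_bins`, the `bin0`, `bin1`, ... labels, and the sample values are all made up for illustration; equal-width binning is only one possible scheme, and Weka provides its own discretization filters that you would normally use instead.

```python
from typing import List


def equal_width_bins(values: List[float], n_bins: int) -> List[str]:
    """Map each continuous value to a nominal label such as 'bin0', 'bin1', ..."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has no spread, so everything goes in one bin.
        return ["bin0"] * len(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        # The maximum value would fall into bin n_bins, so clamp it to the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        labels.append(f"bin{index}")
    return labels


# Hypothetical values for one continuous attribute (not taken from any UCI data set).
sample = [0.0, 1.0, 2.5, 4.9, 5.0, 7.5, 10.0]
nominal = equal_width_bins(sample, n_bins=4)
print(nominal)
```

Each continuous column would be converted this way before running an algorithm that accepts only nominal values. The number of bins is a choice you make, and it is worth saying in the report how you picked it.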
The project report should contain the following:
You should also include the output from your sample runs.
Scoring will be based on:
Particularly good discussions of any of the experiments may result in more than 1 point.
Electronic submission is preferred. Please use the turnin command (on mentor.ics.purdue.edu, turnin -c cs490d -p proj3 directoryname). If that doesn't work, you can tar/zip and email to . PDF is the safest format for capturing non-text content. Hard copy is acceptable; please hand it in at the beginning of class.