Assignment 1: Data Set Selection/Preparation
Start date: 17 January. Due: 24 January, at the beginning of class.
Your task for this assignment is to identify and characterize
a data set. Ideally you should have some domain experience with
it, as this will help with data preparation.
Answer the following questions about the data:
- What the data is about.
- What type of benefit you might hope to get from data mining.
- What type of data mining (classification, clustering, etc.) you
think would be relevant.
 For each, illustrate with an example, e.g., if you think clustering
is relevant, describe what you think a likely cluster might contain
and what the real-world meaning would be.
- Name one type of data mining that you think would not
be relevant, and describe briefly why not.
- Discuss data quality issues. For each attribute:
  - Are there problems with the data?
  - What might be an appropriate response to the quality issues?
 
- For at least two attributes, discuss data preprocessing, and give
an example of how it would be done and its outcome on a small subset
of the data:
  - What would be an appropriate smoothing or generalization technique?
  - What would be an appropriate normalization or data reduction technique?
 The goal of this question is to do the equivalent of Han
questions 3.3 and 3.5, but on a data set of your own choice.
Show something quantitative, but keep it easy to grade.
 You should be able to figure out why I gave you a choice of smoothing or
generalization for the first, and normalization or data reduction for
the second...
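To make the data quality question concrete, here is a minimal Python sketch that counts missing and out-of-range values for each attribute. The records, the attribute names (age, income, state), and the valid ranges are all hypothetical, invented for this example; on a real data set the plausible ranges would come from your domain knowledge, and other problems (inconsistent codes, duplicates, noise) need their own checks.

```python
# Hypothetical records: attribute names and values are invented for illustration.
records = [
    {"age": 34, "income": 52000, "state": "IN"},
    {"age": None, "income": 48000, "state": "IN"},
    {"age": 212, "income": 61000, "state": "IL"},
    {"age": 27, "income": -5, "state": None},
]

# Assumed plausible ranges for numeric attributes; domain knowledge supplies these.
VALID_RANGES = {"age": (0, 120), "income": (0, 10_000_000)}


def quality_report(rows):
    """Count missing (None) and out-of-range values for each attribute."""
    report = {}
    attributes = sorted({key for row in rows for key in row})
    for attr in attributes:
        present = [row.get(attr) for row in rows if row.get(attr) is not None]
        missing = len(rows) - len(present)
        out_of_range = 0
        if attr in VALID_RANGES:
            low, high = VALID_RANGES[attr]
            out_of_range = sum(1 for v in present if not low <= v <= high)
        report[attr] = {"missing": missing, "out_of_range": out_of_range}
    return report


for attr, counts in quality_report(records).items():
    print(attr, counts)
```

In this toy data, age has one missing value and one implausible value (212), income has one negative value, and state has one missing value. Each finding then suggests a response, such as imputing, discarding, or correcting the record.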
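As an illustration of the kind of small quantitative result the preprocessing question asks for, here is a minimal Python sketch of smoothing by bin means (equal-depth binning) and of min-max and z-score normalization on one numeric attribute. The values are hypothetical, and these are only some of the possible techniques; generalization or data reduction may suit your own attributes better.

```python
import statistics

# Hypothetical sorted values of one numeric attribute, invented for illustration.
values = [4, 8, 9, 15, 21, 21, 24, 25, 26]


def smooth_by_bin_means(sorted_values, depth):
    """Equal-depth binning: replace each value by the mean of its bin.

    The last bin may hold fewer than `depth` values."""
    smoothed = []
    for start in range(0, len(sorted_values), depth):
        bin_values = sorted_values[start:start + depth]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


def min_max_normalize(xs, new_min=0.0, new_max=1.0):
    """Linearly rescale xs so the minimum maps to new_min and the maximum to new_max."""
    low, high = min(xs), max(xs)
    if high == low:
        return [new_min] * len(xs)
    return [(x - low) / (high - low) * (new_max - new_min) + new_min for x in xs]


def z_score_normalize(xs):
    """Center on the mean and scale by the population standard deviation."""
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)
    if sigma == 0:
        return [0.0] * len(xs)
    return [(x - mu) / sigma for x in xs]


print(smooth_by_bin_means(values, 3))
# prints [7.0, 7.0, 7.0, 19.0, 19.0, 19.0, 25.0, 25.0, 25.0]
print(min_max_normalize(values))
print(z_score_normalize(values))
```

With a bin depth of 3, the nine values fall into three bins whose means are 7, 19, and 25. Min-max normalization maps 4 to 0.0, 15 to 0.5, and 26 to 1.0.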
Turning in the assignment
Electronic submission is preferred.
Please email to
 .
Or (preferably) use the turnin command on mentor.ics.purdue.edu:
turnin -c cs490d -p asn1 filename
PDF is the safest format for capturing non-text content.
Hard copy is also acceptable; please hand it in at the beginning of class.
