Security Issues in Data Mining

Tuesdays and Thursdays, 9:00-10:15
Heavilon Hall 123

Chris Clifton

Email: clifton_nospam@cs_nojunk.purdue.edu

Data mining, the discovery of new and interesting patterns in large datasets, is a rapidly growing field. Recently there has been a growing recognition that data mining has an impact on security (reflected in, for example, a workshop on Data Mining for Security Applications). One aspect is the use of data mining to improve security, e.g., for intrusion detection. A second aspect is the potential security hazard posed when an adversary has data mining capabilities.

This seminar will explore the field of data mining from a security perspective. My goal is that on completing the course you will have a solid background in the area, such that you will be ready to pursue research on some aspect of data mining security.

Course Methodology

The course will begin with a tutorial on data mining. The contents and scope of this tutorial will depend on the background and preparation of the students. The bulk of the course will concentrate on exploring recent advances in the field through investigation of the research literature.

The workload in the course will be as follows:

There may be additional/alternative work assigned, especially during the tutorial portion of the class.

Prerequisites

Ideally, students in the course would have a good background in data mining, some database experience, a knowledge of probability and statistics, and a good background in computer security. However, I doubt many students will have such a background. What I consider a reasonable set of prerequisites is two of the following three:

Permission of instructor is of course a sufficient prerequisite. If you are interested in the course, but do not have two of the above three (or you are unsure if you have sufficient background), please email me with why you are interested and what you consider to be your relevant background.

Policy on Intellectual Honesty

Please read the policy linked above, written by Professor Spafford. It will be followed unless I provide written documentation of exceptions.

Late work will be accepted only in the case of a documented emergency (e.g., a medical emergency), or by prior arrangement if doing the work in advance is impossible due to the fault of the instructor (e.g., you can't do a review early because the paper hasn't been assigned yet).

Reviews should be an independent analysis of the paper; collusion between reviewers is poor practice. I therefore ask that reviewers of a paper not discuss it with the other reviewers before writing their own reviews. This will help bring a healthy difference of opinion into classroom discussions. There is one exception: if you are presenting a paper and have difficulty understanding it, you are encouraged to talk to the people reviewing the paper to see if they have insights that may help you in your presentation.

Evaluation/Grading:

Evaluation will be a subjective process; however, it will be based primarily on your understanding of the material as evidenced in:

I will evaluate presentations and reviews on a five point scale:

5: Exceptional work. So good that it makes up for substandard work elsewhere in the course. These will be rare.
4: What I'd expect of a Ph.D. candidate. This corresponds to an A grade.
3: Good enough for a Master's degree, but not what I'd like to see for a Ph.D. candidate. This corresponds to a B grade.
2: Okay for a Master's candidate who does extremely well in other courses. This corresponds to a C grade.
1: Not good enough for a graduate student. But something.
0: Missing work, or so bad that you needn't have bothered.

If the number of students is in the right range (allowing between two and three classroom presentations for each student), you will have the option of doing a final project: a research proposal for work in this area. This will be done instead of presentations and reviews in the final two weeks of the course. Students opting not to write a proposal will have one additional presentation and additional review (during the final two weeks) giving equal opportunity to demonstrate knowledge of the material.

Note: The time may be changed to 7:30-8:45 or 4:30-5:45 if there are no conflicts. This would be done to get a room where the lectures can be videotaped, since I'd like you to have a chance to see yourself presenting. If necessary, the change will be made later in the term. For now, the 9:00-10:15 slot is the one we will use.

Please add yourself to the course mailing list. Send mail to mailer@cs.purdue.edu containing the line:

add your email to cs590m

List of Papers to be covered

You may want to use the Purdue Libraries proxy server to get on-line access to more papers.

Syllabus:

