The semester project is a significant undertaking that will allow you to experience the entire process of data mining. You will choose a dataset and a task, apply one or more models/algorithms to the data, and evaluate the modeling results.
Choose an area (data, model, or algorithm) that is interesting to you, with a project scope that is likely to be doable in a semester. The only broad restriction on topic choice is that it must be a data mining application. It is not necessary to design/code your own data mining algorithm, but you can if you choose. If you choose to use existing software and modeling techniques, you will need to compare more than one model in your evaluation and explore reasons why one model performs best.
You must formulate and test at least two specific hypotheses in your project. If you are investigating a new algorithm, the hypotheses can involve comparing different versions of your system (e.g., with a component turned off). If you are comparing existing algorithms, the hypotheses should involve aspects of both the data and the algorithms (e.g., as data characteristics vary, how does relative performance change?).
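As a rough illustration of the second kind of hypothesis, the sketch below compares two simple classifiers while one data characteristic (the fraction of flipped training labels) is varied. Everything in it is an assumption made for the example and not a course requirement: the synthetic two-class Gaussian data, the choice of nearest-centroid and 1-nearest-neighbor classifiers, the noise levels, and all function names. A hypothesis it could test is "1-nearest-neighbor degrades faster than nearest-centroid as label noise increases."

```python
import random


def make_data(n, noise, rng):
    """Generate n points from two 2-D Gaussian classes.

    `noise` is the fraction of labels flipped after sampling (hypothetical setup).
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label == 1 else 0.0
        x = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_centroid(train):
    """Return a predictor that assigns the class of the closest class mean."""
    sums = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for (x0, x1), y in train:
        s = sums[y]
        s[0] += x0
        s[1] += x1
        s[2] += 1
    centroids = {y: (s[0] / s[2], s[1] / s[2]) for y, s in sums.items() if s[2] > 0}
    return lambda x: min(centroids, key=lambda y: sq_dist(x, centroids[y]))


def one_nn(train):
    """Return a predictor that copies the label of the closest training point."""
    return lambda x: min(train, key=lambda ex: sq_dist(x, ex[0]))[1]


def accuracy(predict, test):
    """Fraction of test examples classified correctly."""
    return sum(predict(x) == y for x, y in test) / len(test)


def run_experiment(noise_levels=(0.0, 0.1, 0.2, 0.3),
                   n_train=200, n_test=200, trials=5, seed=0):
    """Average test accuracy of both models at each training label-noise level."""
    rng = random.Random(seed)
    results = {}
    for noise in noise_levels:
        totals = {"nearest_centroid": 0.0, "one_nn": 0.0}
        for _ in range(trials):
            train = make_data(n_train, noise, rng)
            test = make_data(n_test, 0.0, rng)  # test labels are kept clean
            totals["nearest_centroid"] += accuracy(nearest_centroid(train), test)
            totals["one_nn"] += accuracy(one_nn(train), test)
        results[noise] = {name: total / trials for name, total in totals.items()}
    return results


if __name__ == "__main__":
    for noise, acc in run_experiment().items():
        print(f"noise={noise:.1f}  "
              f"nearest_centroid={acc['nearest_centroid']:.3f}  "
              f"one_nn={acc['one_nn']:.3f}")
```

In a real project the synthetic data would be replaced by your chosen dataset, and the averaged accuracies would be accompanied by some measure of variability across trials before drawing a conclusion about the hypothesis.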
Here are a couple of ideas for projects. They need to be fleshed out with more detail; the list should be viewed as ideas for inspiration. If none of these interests you, feel free to propose your own topic.
Proposal: Due Oct 5
Before a project is undertaken, the key idea must be approved by the instructor. You must submit a brief proposal (a few paragraphs) that includes the following:
Final report: Due Dec 15th (by 4pm to CS mailroom)
The final report should be 6-8 pages long and include: