CS590N Statistical Relational Learning
Spring 2007: Project information
Your project is a significant undertaking that allows you to explore one or more issues within statistical relational learning (SRL). Choose a topic that interests you, whose scope is feasible within a semester, and that is preferably new enough to form the basis of further research. The only broad restriction on topic choice is that it must deal with statistical relational learning. The strongest projects may be preliminary investigations of something that could ultimately become a conference submission; that level of originality is encouraged but not required.
Here are some ideas for projects. They are meant as inspiration and need to be fleshed out in more detail. If none of these interests you, feel free to propose your own topic. Before you begin a project, the instructor must approve its key idea. Approval can be oral and can be based on an oral description of the project's goals. You are also welcome to send the instructor a few paragraphs by email describing what you want to do.
- Analyze a relational dataset using existing algorithms. If you have some relational data that provides an interesting classification or clustering problem, modeling those data to discover patterns and/or knowledge might be a good topic. Here are some possible examples of datasets:
  - Bioinformatics data
  - Social network data
  - Citation data
  - Blogs or other Web data
  - Marketing data
  - ...
- Extend an SRL model/algorithm to handle a novel relational learning task. Here are some possible examples:
  - Classification of streaming data
  - Classification of temporal-relational data
  - Link prediction
  - Group prediction
  - Collaborative classification of two relational concepts
  - ...
- Rigorously compare the performance of SRL algorithms on real-world data while varying the properties of the data (e.g., size), the task (e.g., linkage between the training and test sets), and/or the algorithm settings (e.g., number of clusters).
- Compare different inference techniques (probably within a single modeling framework). What relationship is there between the characteristics of the data and the accuracy/efficiency of inference?
- Analyze feature construction and its impact on model performance.
- ...
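To make the feature-construction idea more concrete, here is a minimal sketch of one common kind of relational feature: for each object, the fraction of its linked neighbors that carry each class label. Everything in it is an assumption made for illustration only: the toy citation pairs, the topic labels, and the helper names `build_neighbors` and `neighbor_label_fractions` are hypothetical and do not come from any particular SRL system or course dataset.

```python
from collections import Counter, defaultdict

# Hypothetical toy citation data: (citing paper, cited paper) pairs.
citations = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p4", "p3")]

# Hypothetical topic labels; p4 is deliberately left unlabeled.
labels = {"p1": "ML", "p2": "ML", "p3": "DB"}


def build_neighbors(edges):
    """Treat citations as undirected links and map each node to its neighbors."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    return neighbors


def neighbor_label_fractions(node, neighbors, labels, classes):
    """Return, for each class, the fraction of the node's labeled neighbors in it."""
    counts = Counter(labels[n] for n in neighbors[node] if n in labels)
    total = sum(counts.values())
    # Nodes with no labeled neighbors get 0.0 for every class.
    return {c: (counts[c] / total if total else 0.0) for c in classes}


neighbors = build_neighbors(citations)
classes = sorted(set(labels.values()))
features = {
    node: neighbor_label_fractions(node, neighbors, labels, classes)
    for node in sorted(neighbors)
}

for node, feats in features.items():
    print(node, feats)
```

A project on feature construction might compare aggregations like this one (proportions, counts, most common value) and measure how each choice affects the accuracy of a model trained on the resulting features.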
You may work on this project individually or in small groups. A group's project is expected to be larger in scope than an individual's project, and will be graded on that assumption.
Proposal: Due Feb 18
The project proposal should be a 2-3 page report that includes:
- An overview of the project's goals and/or hypotheses,
- A description of the data and/or algorithms that you will investigate,
- A brief review of related work.
Preliminary report: Due Mar 27
The preliminary report should be a 2-3 page report that includes:
- A recap of the project's goals and/or hypotheses,
- An outline of the methodology (i.e., data, algorithm(s), experimental evaluation) that you will employ,
- A description of your implementation and initial data exploration or experiments,
- A timeline for the remaining work.
If you have run into any problems, the report should include ideas for how to address them and/or specific questions for the instructor about how to address them.
Presentation: Due Apr 17-19
The oral presentation to the class should provide an overview of the project's goals, necessary background information (including work done by others), a description of any experiments, and a summary of your results and findings.
Final report: Due Apr 26
The final report should be a 6-8 page report that includes:
- Introduction
- Algorithm/data description
- Methodology
- Results
- Discussion
- Related Work
The final report should resemble a conference or workshop submission regardless of the quality of the final results (i.e., even if the project did not succeed). The final report can (and should) include material from the proposal and preliminary report.