CS 54701: Information Retrieval

Project 2: Collaborative Recommendation Algorithm

Due 07:00 AM EST Thursday, 24 March 2016

Begin now. We estimate that this project will take two weeks, so if you don't start now, you may have to work during Spring break. Also, don't expect much response from the instructors in the last eight hours before the deadline.
Late Policy: Late work will be penalized 10% per day (24-hour period). This penalty applies except in cases of documented emergency (e.g., medical emergency) or by prior arrangement.


In this assignment, you will develop several algorithms that recommend movies. You are free to use any programming language you like, such as C/C++ or Java; however, Matlab is highly recommended. For more information about Matlab, see the following tutorial: http://www.math.mtu.edu/~msgocken/intro/intro.html. On the lab computers, you can start Matlab with: /p/matlab/bin/matlab

Movie Recommendation System

The Training Data

The training data is a set of movie ratings by 200 users (userid: 1-200) on 1000 movies (movieid: 1-1000). The data is stored in a table with 200 rows and 1000 columns. Each row represents one user, and each column represents one movie. A rating is a value in the range of 1 to 5, where 1 is "least favored" and 5 is "most favored". Please NOTE that a value of 0 means that the user has not explicitly rated the movie.

Please download the training data here: train.txt.
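As a rough illustration of the table layout described above, the following Python sketch reads the ratings into a list of rows. It assumes that train.txt stores one user per line with whitespace-separated integer ratings; the handout does not state the exact file format, so adjust the parsing if the file differs. The function names parse_ratings, rated_items, and load_ratings are hypothetical helpers, and the indices are 0-based (userid 1 is row 0, movieid 1 is column 0).

```python
def parse_ratings(lines):
    """Parse rows of whitespace-separated integer ratings into a list of lists.

    Assumption: one user per line, one rating per column, 0 = not rated.
    """
    matrix = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        matrix.append([int(x) for x in fields])
    return matrix


def rated_items(row):
    """Return {movie_index: rating} for the movies this user actually rated."""
    return {j: r for j, r in enumerate(row) if r != 0}


def load_ratings(path):
    """Read a ratings file such as train.txt (format assumed, see above)."""
    with open(path) as f:
        return parse_ratings(f)
```

Treating 0 as "not rated" rather than as a very low rating matters for every later step: averages and similarities should be computed only over the entries that rated_items returns.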


Your task is to design and develop collaborative filtering algorithms that predict the unknown ratings in the test data by learning users' preferences from the training data.

Please implement the following tasks:

1. Implement a memory-based Collaborative Filtering Algorithm (30 pts)

Please implement a memory-based (user-based) collaborative filtering algorithm based on the vector similarity method.

For more detailed information you can refer to Breese J. S., Heckerman D., Kadie C. (1998). Empirical Analysis of Predictive Algorithms for Collaborative Filtering. (pdf)
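A minimal Python sketch of user-based prediction with vector (cosine) similarity, in the spirit of Breese et al., may help to fix ideas. It is not a reference solution. It assumes ratings are stored as lists with 0 for "not rated"; the function names are hypothetical; and the normalization by the sum of absolute weights is one common choice among several.

```python
import math


def mean_rating(row):
    """Average of a user's explicit (non-zero) ratings; 0.0 if there are none."""
    rated = [r for r in row if r != 0]
    return sum(rated) / len(rated) if rated else 0.0


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors.

    Unrated entries are 0, so they drop out of the dot product; each norm
    runs over that user's own rated movies.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_user_based(train, active, movie):
    """Predict the active user's rating for the 0-based column index `movie`."""
    base = mean_rating(active)
    num = 0.0
    den = 0.0
    for other in train:
        if other[movie] == 0:
            continue  # this user gives no evidence about the movie
        w = cosine_similarity(active, other)
        num += w * (other[movie] - mean_rating(other))
        den += abs(w)
    if den == 0:
        return base  # no usable neighbors: fall back to the user's own mean
    return base + num / den
```

The prediction is the active user's mean rating plus a similarity-weighted average of how far each neighbor's rating of the movie lies from that neighbor's own mean. Clamping the result to the 1-5 range, rounding, and restricting to the top-k neighbors are left as design choices.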

2. Implement a model-based Collaborative Filtering Algorithm (30 pts)

Please implement a model-based (item-based) collaborative filtering algorithm based on correlation-based similarity.

For more detailed information you can refer to Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. (2001) Item-Based Collaborative Filtering Recommendation Algorithms (pdf)
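The item-based approach can be sketched in Python as follows, under stated assumptions: ratings are lists with 0 for "not rated", the correlation between two movies is computed only over users who rated both, and only positively correlated movies contribute to the weighted sum. That last restriction is a simplification chosen for this sketch; the paper instead selects the most similar items. All function names are hypothetical.

```python
import math


def column(train, j):
    """All users' ratings for movie index j."""
    return [row[j] for row in train]


def pearson_similarity(col_i, col_j):
    """Pearson correlation of two movies over the users who rated both."""
    pairs = [(x, y) for x, y in zip(col_i, col_j) if x != 0 and y != 0]
    if len(pairs) < 2:
        return 0.0  # too few co-ratings to estimate a correlation
    mean_i = sum(x for x, _ in pairs) / len(pairs)
    mean_j = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_i) * (y - mean_j) for x, y in pairs)
    var_i = sum((x - mean_i) ** 2 for x, _ in pairs)
    var_j = sum((y - mean_j) ** 2 for _, y in pairs)
    if var_i == 0 or var_j == 0:
        return 0.0  # constant ratings carry no correlation information
    return cov / math.sqrt(var_i * var_j)


def predict_item_based(train, user_row, movie):
    """Weighted sum of the user's own ratings on movies similar to `movie`.

    Returns None when no positively correlated rated movie exists, so the
    caller can choose a fallback.
    """
    target = column(train, movie)
    num = 0.0
    den = 0.0
    for j, r in enumerate(user_row):
        if r == 0 or j == movie:
            continue
        s = pearson_similarity(target, column(train, j))
        if s <= 0:
            continue  # simplification: ignore non-positive similarities
        num += s * r
        den += s
    if den == 0:
        return None
    return num / den
```

This version recomputes similarities on every call, which is slow for 1000 movies. Because the item-item similarities depend only on the training data, they can be computed once and stored, which is what makes the approach model-based.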

3. Implement your own algorithm (20 pts)

You can try different extensions of the memory-based or model-based algorithm (e.g., algorithms in the above paper).

You can also try other model-based methods. Some references:

Hofmann, T., & Puzicha, J. (1999). Latent Class Models for Collaborative Filtering. In the Proceedings of International Joint Conference on Artificial Intelligence. (pdf)

Pennock, D. M., Horvitz, E., Lawrence, S., & Giles, C. L. (2000). Collaborative Filtering by Personality Diagnosis: A Hybrid Memory- and Model-Based Approach. In the Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence. (pdf)

Si, L., & Jin, R. (2003). Flexible Mixture Model for Collaborative Filtering. In the Proceedings of the International Conference on Machine Learning. (pdf)

4. Results Discussion (20 pts)

Please provide the following information

Report the accuracy of each algorithm. Do you think the values are reasonable? Justify the results by analyzing the advantages and disadvantages of the algorithms. How long does each algorithm take to complete the prediction? Discuss the efficiency of the algorithms.
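The handout does not name an accuracy metric. One common choice for rating prediction is the mean absolute error (MAE), which the Python sketch below assumes, together with a small timing helper for the efficiency discussion. Both function names are hypothetical, and the known ratings to compare against are assumed to come from a held-out portion of the data.

```python
import time


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

For example, predictions of 4, 2, and 5 against true ratings of 5, 2, and 3 give an MAE of (1 + 0 + 2) / 3 = 1.0, meaning the predictions are off by one rating point on average.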

What to submit

You will need to turn in your code and a report on your evaluation (2-4 pages).

How to turn in your project

SSH to a Purdue CS machine and run the following command to turn in your project.

turnin -v -c cs547 -p collab name_of_directory

where name_of_directory is the directory that you want to submit.
