CS473 Fall 2016
Time: TTh 1:30-2:45pm
Location: EE 117
Professor Jennifer Neville
Lawson 2142D neville[at]cs.purdue.edu 6-9387
Office hours: Fri 1-2pm, LWSN 2142D
Teaching assistants: Jihwan Lee, Mahak Goindani
Office hours: Wed 5-6pm, HAAS G50
Questions: We will use Piazza for class questions and discussion. Please post your questions on Piazza instead of emailing the TA list.
Email: cs473-ta [at] cs.purdue.edu
The explosive growth of data available on the Internet demands intelligent information systems that can sift through that data and find the most valuable and relevant information. This course studies the basic principles and practical algorithms behind these Web information systems. Topics include Web search, recommender systems, and Web information extraction. The course emphasizes both these applications and solid modeling techniques that can be extended to other applications.
Prerequisites: CS251. A reasonable background in Java programming is preferred.
The text below is recommended. Additional reading materials will be distributed as necessary. Reading assignments will be posted on the schedule; please check it regularly.
There will be five homework/programming assignments, which will be posted on the schedule. Homework assignments should be submitted in class, unless otherwise noted. Programming assignments should be written in Java, unless otherwise noted, and should be submitted on data.cs.purdue.edu using Turnin. Details will be provided in the assignments.
In general, questions about the details of homework assignments should be directed to the TA on Piazza, though you should feel free to mail the instructor whenever you have a question. Example solutions, when applicable, will be made available after homework is returned to students.
There will be several in-class quizzes as well as a midterm and comprehensive final exam. Exams will be closed book and closed notes.
Assignments are to be submitted by the due date listed. Each person will be allowed four extension days, which can be applied to any combination of assignments during the semester without penalty. After those are used up, a late penalty of 15% per day will be assigned. Use of a partial day will be counted as a full day. Use of extension days must be stated explicitly in the late submission (either directly in the submission header or in an accompanying email to the TA); otherwise, late penalties will apply. Extensions cannot be used after the final day of classes (i.e., midnight on Dec 10). Extension days cannot be rearranged after they are applied to a submission. Use them wisely!
Assignments will NOT be accepted if they are more than five days late. Additional extensions will be granted only for serious and documented medical or family emergencies.
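To make the arithmetic of the late policy concrete, here is a minimal sketch in Java. It is an illustration only, not an official grading tool. The class and method names are hypothetical. It also assumes that the five-day cutoff counts total days late, including days covered by extensions, which the policy above does not spell out. The end-of-classes cutoff for extensions is not modeled.

```java
/** Illustrative sketch of the late policy; all names here are hypothetical. */
final class LatePolicy {
    static final int PENALTY_PER_DAY = 15; // percent deducted per penalized day
    static final int MAX_DAYS_LATE = 5;    // anything later is not accepted

    /** A partial day counts as a full day, so hours are rounded up to whole days. */
    static int daysLate(double hoursLate) {
        if (hoursLate <= 0) {
            return 0;
        }
        return (int) Math.ceil(hoursLate / 24.0);
    }

    /**
     * Returns the late penalty in percent, or -1 if the submission is not accepted.
     *
     * @param hoursLate         hours past the deadline
     * @param extensionDeclared extension days explicitly stated in the submission
     * @param extensionLeft     unused extension days remaining (at most four per semester)
     */
    static int penaltyPercent(double hoursLate, int extensionDeclared, int extensionLeft) {
        int days = daysLate(hoursLate);
        // Assumption: the cutoff applies to total days late, extensions included.
        if (days > MAX_DAYS_LATE) {
            return -1;
        }
        // Only declared days count, and never more than remain or are needed.
        int applied = Math.max(0, Math.min(Math.min(extensionDeclared, extensionLeft), days));
        return Math.min(100, (days - applied) * PENALTY_PER_DAY);
    }

    public static void main(String[] args) {
        // 30 hours late is two days; declaring two extension days avoids any penalty.
        System.out.println(penaltyPercent(30, 2, 4));
        // The same submission with no extension days declared loses 2 x 15 percent.
        System.out.println(penaltyPercent(30, 0, 4));
    }
}
```

Under these assumptions, a submission 30 hours late counts as two days late: it costs nothing if two extension days are declared, and 30% if none are declared.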
Introduction (1 week)
High-level review of the field of information retrieval, its relationship to search engines, and basic architectures of search engines.
Background and basics (2 weeks)
Types of data, how to acquire data (e.g., crawling), how to process data, and how to prepare it for indexing.
Indexing and queries (2 weeks)
How to create indexes for efficient search, how those indexes are used to process queries, and techniques to transform and process queries.
Retrieval and ranking (2 weeks)
Boolean and vector space retrieval models. Probabilistic relevance methods. Basic ranking algorithms.
Evaluation measures (1 week)
Overview of evaluation and performance metrics that are used to compare and tune search engines.
Machine learning for text (2 weeks)
Text categorization. Document clustering. Topic detection.
Collaborative filtering and link analysis (2 weeks)
Recommender systems, collaborative filtering, content-based filtering, link discovery, social recommendations.
Extra topics (2 weeks)
Social search, spam detection, information extraction, computational advertising.