CS590N • Spring 2007 • Tuesday-Thursday 12:00-1:15 • HAAS G66


Professor Jennifer Neville

Lawson 2142D • neville[at]cs.purdue.edu • 496-9387

Office hours: By appointment (arrange by email)

Statistical relational learning (SRL) is revolutionizing the field of automated learning and discovery by moving beyond the conventional analysis of entities in isolation to analyze networks of interconnected entities. In relational domains such as bioinformatics, citation analysis, epidemiology, fraud detection, intelligence analysis, and web analytics, there is often limited information about any one entity in isolation; instead, it is the connections among entities that are crucial for pattern discovery. Conventional machine learning techniques make two primary assumptions that limit their application in relational domains. First, algorithms for propositional data assume that data instances are recorded in homogeneous structures (i.e., a fixed number of attributes for each entity), but relational data instances are usually more varied and complex (e.g., molecules have different numbers of atoms and bonds). Second, the algorithms assume that data instances are independent, but relational data often violate this assumption: dependencies may arise either from direct relations or from chaining multiple relations together. For example, scientific papers can depend on one another directly through citations and indirectly through shared authors.

This course will provide an introduction to recent research in statistical relational learning. It will survey approaches that combine probabilistic and logical representations to model relational and network datasets, focusing on fundamental challenges in representation, learning, and inference. We will review conventional graphical models and inductive logic programming approaches as needed for background.

Classes will consist of instructor presentations, student presentations, and group discussions. Students will be required to (1) read, discuss, and present research papers, and (2) complete a semester-long class project. Potential projects include investigating the performance of SRL algorithms, analyzing data with SRL models, and designing and implementing extensions to SRL models or algorithms.

Mathematical maturity, a basic course in statistics (e.g., STAT511, STAT516), and basic programming skills (e.g., CS180/CS381, STAT598G) are required. Students without this background should discuss their preparation with the instructor.

*Introduction to Statistical Relational Learning*, L. Getoor and B. Taskar, editors, MIT Press, 2007. (Preprint copies will be distributed in class.)

Other readings will be drawn from the current research literature; see the course schedule.

- Response papers (Example)

Students are required to write a response to one of the papers that we read in each class. The response papers should be emailed to the instructor and the class discussant by **12:00 noon** the day before class (Mon/Wed). Papers should be at least half a page long and include:
- A brief summary of the main contribution of the work,
- Two or more primary points that critique, praise, or question the findings of the work.

- Paper presentations

Students are required to give a 10-15 minute presentation of the details of the paper(s) in two different class periods.

- Leading class discussion (Example)

Students are required to give a 5-10 minute presentation to start the discussion in two different class periods. This will include:
- A top 10 list summarizing the class response papers,
- Posing 2-3 initial discussion questions.

- Research project

See the project page for more details.

- Class participation: 10%
- Paper presentations: 10%
- Leading class discussion: 10%
- Response papers: 20%

Each response paper will be graded on a scale of 0 to 3 (3: excellent, 2: good, 1: fair, 0: poor/not submitted). The lowest two grades will be dropped.

- Research project: 50%
- Project proposal: 10%
- Preliminary report: 10%
- Class presentation: 10%
- Final report: 20%