Statistical Relational Learning

CS590N • Spring 2007 • Tuesday-Thursday 12:00-1:15 • HAAS G66



Instructor

Professor Jennifer Neville
Lawson 2142D • neville[at]cs.purdue.edu • 496-9387
Office hours: By appointment (arrange by email)

Description

Statistical relational learning (SRL) is revolutionizing the field of automated learning and discovery by moving beyond the conventional analysis of entities in isolation to analyze networks of interconnected entities. In relational domains such as bioinformatics, citation analysis, epidemiology, fraud detection, intelligence analysis, and web analytics, there is often limited information about any one entity in isolation; instead, it is the connections among entities that are of crucial importance to pattern discovery. Conventional machine learning techniques make two primary assumptions that limit their application in relational domains. First, algorithms for propositional data assume that data instances are recorded in homogeneous structures (i.e., a fixed number of attributes for each entity), but relational data instances are usually more varied and complex (e.g., molecules have different numbers of atoms and bonds). Second, the algorithms assume that data instances are independent, but relational data often violate this assumption: dependencies may occur either as a result of direct relations or through chaining multiple relations together. For example, scientific papers have dependencies through both citations (direct) and authors (indirect).
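The paper example can be made concrete with a minimal sketch. The toy data and the helper names (`authors`, `cites`, `direct_pairs`, `indirect_pairs`) are hypothetical and chosen only for illustration; they do not come from any SRL library or from the course materials. The sketch shows that papers have differing numbers of authors (no fixed attribute count) and that two papers can be linked either directly by a citation or indirectly through a shared author.

```python
from itertools import combinations

# Hypothetical toy data: each paper maps to its set of authors.
# Papers have different numbers of authors, so there is no fixed-width
# attribute vector per paper.
authors = {
    "p1": {"alice", "bob"},
    "p2": {"bob"},
    "p3": {"carol"},
}

# Hypothetical citation links, as (citing, cited) pairs.
cites = {("p3", "p1")}


def direct_pairs(cites):
    """Unordered pairs of papers linked directly by a citation."""
    return {frozenset(pair) for pair in cites}


def indirect_pairs(authors):
    """Unordered pairs of papers linked indirectly by a shared author."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(authors), 2)
        if authors[a] & authors[b]
    }


direct = direct_pairs(cites)        # p1 and p3 (citation)
indirect = indirect_pairs(authors)  # p1 and p2 (shared author bob)
```

In this toy setting, treating p1, p2, and p3 as independent instances would ignore both kinds of links, which is the situation relational models are meant to address.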

This course will provide an introduction to recent research in statistical relational learning. The course will survey recent approaches that combine probabilistic and logical representations to model relational and network datasets, focusing on fundamental challenges in representation, learning, and inference. We will review conventional graphical models and inductive logic programming approaches as needed for background.

Classes will consist of instructor presentations, student presentations, and group discussions. Students will be required to (1) read, discuss, and present research papers, and (2) complete a semester-long class project. Potential projects include investigating the performance of SRL algorithms, analyzing data with SRL models, and designing and implementing extensions to SRL models or algorithms.

Prerequisites

Mathematical maturity, a basic course in statistics (e.g., STAT511, STAT516), and basic programming skills (e.g., CS180/CS381, STAT598G) are required. Students without this background should discuss their preparation with the instructor.

Text

Introduction to Statistical Relational Learning, L. Getoor and B. Taskar, editors, MIT Press, 2007. (Preprint copies will be distributed in class.)

Other readings from the current research literature: see course schedule.

Assignments

Grading