STAT59800-016/CS59000-059 • Spring 2016 • Monday-Wednesday 3:00-4:15 • UNIV 201


Professor Jennifer Neville

Lawson 2142D • neville[at]cs.purdue.edu • 496-9387

Office hours: By appointment (arrange by email)

Many modern data analysis problems involve large data sets of artificial, social, and biological networks that can be represented as graphs. In these settings, traditional IID assumptions are inappropriate; the analyses must take into account the structure of relationships between the data instances. As a result, there has been an increasing amount of research on techniques for incorporating network and graph structures into machine learning and statistics.

Network modeling is an active area of research in several domains. Statisticians have mostly concentrated on models of static networks, which focus on predicting the existence of edges between individual nodes, and do not attempt to model aggregate properties of the graph. In contrast, physicists have developed techniques to model global properties of large complex networks. Their models describe average statistics of the network and focus less on the individual links between particular nodes.

This course will provide an introduction to probabilistic methods for network analysis, paying special attention to model design and computational issues of learning and inference. We will survey statistical network modeling research in multiple communities, including statistics, computer science, and physics.

Classes will consist of instructor presentations, student presentations, and group discussions. Students will be required to (1) read, discuss, and present research papers, and (2) complete a semester-long class project. Potential projects include a survey paper on research in a subtopic of interest, an empirical investigation of the performance of graph generation algorithms, an analysis of real-world data to determine local and global network characteristics, or the design and implementation of a new network model or algorithm.

Prerequisites: Mathematical maturity and a graduate-level introductory statistics course (e.g., STAT516/519).

Readings from the current research literature, 1-2 papers per class. See course schedule.

- Response papers

Students are required to write a summary of one paper per class.

See response page for more details.

- Class presentation/discussion leading

Students are required to participate in paper presentations several times during the semester. Two students will lead each class, one in each of the following roles:

- Summarizer: Presents a 5-10 min summary of the paper from each of the three points of view (Message Box, Abstract Summary, Precise Formulation).
- Technical presenter: Presents a 20-30 min overview of the paper(s), covering the technical formulation, analysis, and findings.

- Research project

See project page for more details.

- Class participation: 10%
- Class presentations: 30%
- Response papers: 20%

Each response paper will be graded on a 5-point scale. The lowest three grades will be dropped.

- Research project: 40%
- Preliminary report: 10%
- Class presentation: 10%
- Final report: 20%
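
As a rough illustration of how the weights above might combine, the sketch below computes a final percentage. It assumes that every component is scored as a fraction between 0 and 1, and that the response-paper component is the mean of the kept scores divided by 5 after the three lowest are dropped. The syllabus does not state these aggregation details, and all function and variable names here are hypothetical.

```python
# Hypothetical sketch of the grading arithmetic. The aggregation rules
# below are assumptions for illustration, not stated course policy.

# Component weights taken from the grading breakdown (they sum to 1.0).
WEIGHTS = {
    "participation": 0.10,
    "presentations": 0.30,
    "response_papers": 0.20,
    "preliminary_report": 0.10,    # part of the 40% research project
    "project_presentation": 0.10,  # part of the 40% research project
    "final_report": 0.20,          # part of the 40% research project
}


def response_paper_fraction(scores, n_dropped=3, max_score=5):
    """Mean of the kept scores as a fraction of max_score (assumed rule)."""
    # Sort ascending and discard the n_dropped lowest scores.
    kept = sorted(scores)[n_dropped:]
    if not kept:
        return 0.0
    return sum(kept) / (max_score * len(kept))


def final_percentage(fractions, response_scores):
    """Weighted total in percent; `fractions` maps component name to [0, 1]."""
    parts = dict(fractions)
    parts["response_papers"] = response_paper_fraction(response_scores)
    return 100 * sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)
```

Under these assumptions, `response_paper_fraction([0, 0, 0, 4, 5])` drops the three zeros, keeps the scores 4 and 5, and yields 0.9.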