STAT59800-JN1/CS59000-030 • Spring 2010 • Tuesday-Thursday 3:00-4:15 • REC 309


Professor Jennifer Neville

Lawson 2142D • neville[at]cs.purdue.edu • 496-9387

Office hours: By appointment (arrange by email)

Many modern data analysis problems involve large data sets of artificial, social, and biological networks that can be represented as graphs. In these settings, traditional IID assumptions are inappropriate; the analyses must take into account the structure of relationships between the data instances. As a result, there has been an increasing amount of research developing techniques for incorporating network and graph structures into machine learning and statistics.

Network modeling is an active area of research in several domains. Statisticians have mostly concentrated on models of static networks, which focus on predicting the existence of edges between individual nodes, and do not attempt to model aggregate properties of the graph. In contrast, physicists have developed techniques to model global properties of large complex networks. Their models describe average statistics of the network and focus less on the individual links between particular nodes.

This course will provide an introduction to probabilistic methods for network analysis, paying special attention to model design and computational issues of learning and inference. We will survey statistical network modeling research in multiple communities, including statistics, computer science, and physics.

Classes will consist of instructor presentations, student presentations, and group discussions. Students will be required to (1) read, discuss, and present research papers, and (2) complete a semester-long class project. Potential projects include: a survey paper on research in a subtopic of interest, an empirical investigation of the performance of graph generation algorithms, an analysis of real-world data to determine local and global network characteristics, or the design and implementation of a new network model/algorithm.

Mathematical maturity and an introductory Statistics course (e.g., STAT416/511/516).

Readings from the current research literature, 1-2 papers per class. See course schedule.

Kolaczyk, E. D. (2009). *Statistical Analysis of Network Data*. Springer. (Available for download from the Purdue library.)

Easley, D. and J. Kleinberg (2010). *Networks, Crowds, and Markets*. Cambridge University Press. (Available online.)

Jackson, M. (2008). *Social and Economic Networks*. Princeton University Press.

- Response papers (Example)

Students are required to write a response to one of the papers that we read in each class. See this example for illustration. Papers should be approximately a half-page long (500 words max) and include:

- A brief summary of the main contribution of the work,
- One "negative" discussion point that critiques the work, outlining a weakness or questioning the findings of the work,
- One "positive" discussion point that praises the work, outlining a strength or discussing the broader implications of the work.

Response papers are due by **5:00pm** the day before class (Mon/Wed). This will allow the discussants to view the responses and summarize them by class time.

In addition, after the response submission deadline, each student will be assigned another student's response to read, rate, and provide feedback on. This will be counted towards the class participation grade.

- Class presentation/discussion leading

Students are required to participate in leading 2-3 times during the semester. There will be two students assigned to lead each class, one for each of the following roles:

- Technical presenter: Starts class with a 20-30 minute presentation of the details of the paper(s).
- Discussant: Presents a 10-15 minute summary of the **positive** and **negative** discussion points raised in the response papers.

- Research project

See project page for more details.

- Class participation: 10% (5% for online comments, 5% for in-class participation)
- Class presentations: 30%
- Response papers: 20%

Each response paper will be graded on a 3-point scale (1 pt for the summary, 2 pts for the discussion points). The lowest four grades will be dropped.

- Research project: 40%
  - Preliminary report: 10%
  - Class presentation: 10%
  - Final report: 20%
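
The weights above can be combined as in the following rough sketch. It is an illustration only, not the official grading procedure: the function names are hypothetical, each component is assumed to be expressed as a fraction between 0 and 1, and the response-paper score is assumed to be the average of the grades that remain after the lowest four are dropped.

```python
# Component weights in percent, taken from the grading breakdown above.
WEIGHTS = {
    "participation": 10,
    "presentations": 30,
    "response_papers": 20,
    "project": 40,
}


def response_paper_score(grades, n_dropped=4, max_points=3):
    """Fraction of response-paper credit earned (assumed averaging rule).

    grades: list of per-paper scores on the 3-point scale.
    The n_dropped lowest scores are discarded before averaging.
    """
    kept = sorted(grades)[n_dropped:]
    if not kept:
        # Assumption: with no grades left after dropping, no credit is computed.
        return 0.0
    return sum(kept) / (max_points * len(kept))


def final_grade(fractions):
    """Weighted total out of 100, given a fraction in [0, 1] per component."""
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical scores for one student.
    papers = [3, 3, 2, 0, 0, 1, 3, 2, 0, 3]
    fractions = {
        "participation": 0.9,
        "presentations": 0.85,
        "response_papers": response_paper_score(papers),
        "project": 0.8,
    }
    print(f"Final grade: {final_grade(fractions):.1f} / 100")
```

Under this assumed rule, a missed or low-scoring response paper does not affect the response-paper component as long as no more than four are affected.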