Predictive Modeling with Social Networks

ICWSM 2009 Tutorial, San Jose, CA, USA

Jennifer Neville, Purdue University
Foster Provost, New York University, Stern School of Business


Recently there has been a surge of interest in methods for analyzing complex social networks: from communication networks, to friendship networks, to professional and organizational networks.  For predictive modeling, the dependencies among linked entities in the networks present an opportunity to improve inference about properties of individuals, as birds of a feather do indeed flock together. For example, when deciding whom to target with a product offer, it may be helpful to consider whether a person's MySpace or Facebook friends have expressed interest in the product.
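
As a rough illustration of the targeting example above, the sketch below scores each person by the fraction of their friends who have already expressed interest in a product. This is only a minimal neighbor-based heuristic under assumed inputs, not a method taken from the tutorial: the function name `neighbor_interest_scores`, the toy friendship edges, and the `interested` set are all hypothetical.

```python
from collections import defaultdict


def neighbor_interest_scores(edges, interested):
    """Return, for each person, the fraction of their friends in `interested`.

    `edges` is an iterable of undirected friendship pairs (hypothetical data);
    `interested` is the set of people known to have expressed interest.
    """
    friends = defaultdict(set)
    for a, b in edges:
        friends[a].add(b)
        friends[b].add(a)

    scores = {}
    for person, nbrs in friends.items():
        # Everyone in `friends` has at least one neighbor, so len(nbrs) > 0.
        scores[person] = sum(1 for n in nbrs if n in interested) / len(nbrs)
    return scores


# Toy friendship network (assumed for illustration only).
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
         ("cat", "dan"), ("dan", "eve")]
interested = {"bob", "cat"}

scores = neighbor_interest_scores(edges, interested)

# Rank people who have not yet expressed interest, highest score first.
candidates = sorted((p for p in scores if p not in interested),
                    key=lambda p: (-scores[p], p))
print(candidates)
```

Here the person whose friends have all expressed interest is ranked first. A realistic model would combine such network features with individual attributes and would treat the unknown labels of neighbors jointly rather than one person at a time.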

This tutorial will explore the unique opportunities and challenges for predictive modeling with social network data. We will begin with a description of the problem setting, including examples of various applications of social network mining (e.g., targeted marketing, on-line advertising, fraud detection).  We will then present a number of characteristics of social network data that differentiate it from the traditional settings for inference and learning, and outline the resulting opportunities for significantly improved inference and learning. We will discuss specific techniques for capitalizing on each of the opportunities in statistical models, and outline both methodological issues and potential modeling pathologies that are unique to network data.

This tutorial focuses on covering the basics, discussing real applications and results where possible, and providing a framework for understanding the more advanced concepts. We will also provide supplemental material on more advanced concepts, including links to the recent literature.

Prerequisites: The tutorial assumes a basic knowledge of AI-style inference and machine learning, equivalent to an introductory graduate or advanced undergraduate class.


Outline


  1. Introduction to social network mining.
    Illustration of various social network mining tasks with real-world examples. Discussion of data characteristics unique to these settings.
  2. Collective inference.
    General description of problem, overview of current techniques, and discussion of experimental results.
  3. Relational learning.
    General description of problem, overview of current techniques, and discussion of experimental results.
  4. Methodologies and pathologies.
    Discussion of how to evaluate relational learning and collective inference techniques. Outline of potential biases due to the unique characteristics of network data.
  5. Open questions and future work.


Jennifer Neville is an assistant professor at Purdue University. She received her PhD from the University of Massachusetts Amherst in 2006. She received a DARPA IPTO Young Investigator Award in 2003 and was selected as a member of the DARPA Computer Science Study Group in 2007. Recently she was chosen by IEEE as one of "AI's 10 to watch" for 2008. Her research focuses on data mining techniques for relational and network domains.

Foster Provost is Professor, NEC Faculty Fellow, and Paduano Fellow in Business Ethics at New York University's Stern School.  He is Editor-in-Chief of the journal Machine Learning, a founding board member of the International Machine Learning Society, and was program chair of the ACM SIGKDD Conference in 2001.  He has received Faculty Awards from IBM and a President's Award from NYNEX Science and Technology.  His recent research has focused on inference and learning with network data, utility-based data mining, and on-line advertising.