Predictive Modeling with Social Networks

KDD 2008 Tutorial, Las Vegas, NV, USA

Jennifer Neville, Purdue University
Foster Provost, New York University, Stern School of Business


Recently there has been a surge of interest in methods for analyzing complex social networks: from communication networks, to friendship networks, to professional and organizational networks. The dependencies among linked entities in the networks present an opportunity to improve inference about properties of individuals, as birds of a feather do indeed flock together. For example, when deciding how to market a product to people in MySpace or Facebook, it may be helpful to consider whether a person's friends are likely to purchase the product.

This tutorial will explore the unique opportunities and challenges for modeling social network data. We will begin with a description of the problem setting, including examples of various applications of social network mining (e.g., marketing, fraud detection). We will then present a number of characteristics of social network data that distinguish it from the data assumed in traditional inference and learning settings, and outline the resulting opportunities for significantly improved inference and learning. We will discuss specific techniques for capitalizing on each of these opportunities in statistical models, and outline both methodological issues and potential modeling pathologies that are unique to network data. We will point to the recent literature to guide further study, and present results demonstrating the effectiveness of the techniques.


OUTLINE (preliminary, subject to change)

  1. Introduction to social network mining.
    Illustration of various social network mining tasks with real-world examples. Discussion of data characteristics unique to these settings.
  2. Collective inference.
    General description of problem, overview of current techniques, and discussion of experimental results.
  3. Relational learning.
    General description of problem, overview of current techniques, and discussion of experimental results.
  4. Methodologies and pathologies.
    Discussion of how to evaluate relational learning and collective inference techniques. Outline of potential biases due to the unique characteristics of network data.
  5. Open questions and future work.
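
As a rough illustration of the collective inference idea in item 2, the sketch below estimates an unknown label for each person (for example, likelihood of purchasing a product) by repeatedly averaging the current estimates of that person's friends. This is only a minimal neighbor-averaging scheme written for this outline, not the tutorial's own method; the function name `collective_inference`, its parameters, and the toy network are all hypothetical.

```python
def collective_inference(edges, known, prior=0.5, iterations=50):
    """Estimate scores for unlabeled nodes from their neighbors' scores.

    edges: iterable of (a, b) pairs, treated as undirected friendships.
    known: dict mapping node -> observed label (1 or 0); these stay fixed.
    prior: starting score for every unlabeled node (an assumed default).
    """
    # Build an undirected adjacency map.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Known nodes start at their observed label, unknown nodes at the prior.
    scores = {node: float(known.get(node, prior)) for node in neighbors}

    for _ in range(iterations):
        updated = dict(scores)
        for node, friends in neighbors.items():
            if node in known:
                continue  # observed labels are never overwritten
            # New estimate: mean of the friends' estimates from the last pass.
            updated[node] = sum(scores[f] for f in friends) / len(friends)
        scores = updated
    return scores


# Toy chain: ann - bob - cat - dan, where ann bought the product and dan did not.
scores = collective_inference(
    edges=[("ann", "bob"), ("bob", "cat"), ("cat", "dan")],
    known={"ann": 1, "dan": 0},
)
print(scores)
```

In this toy chain, bob (adjacent to a known purchaser) ends up with a higher score than cat (adjacent to a known non-purchaser), even though neither has an observed label. The estimates for unlabeled nodes depend on each other, which is what distinguishes collective inference from scoring each person independently.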


The target audience for this tutorial includes KDD researchers interested in studying problems with social network data, practitioners interested in taking advantage of social network data, and researchers generally interested in the state of the art of this timely topic.

The tutorial assumes a basic knowledge of AI-style inference and machine learning, equivalent to an introductory graduate or advanced undergraduate class.


Jennifer Neville is an assistant professor at Purdue University. She received her PhD from the University of Massachusetts Amherst in 2006. She received a DARPA IPTO Young Investigator Award in 2003 and was selected as a member of the DARPA Computer Science Study Group in 2007. Recently she was chosen by IEEE as one of "AI's 10 to watch" for 2008. Her research focuses on data mining techniques for relational and network domains.

Foster Provost is an associate professor, NEC Faculty Fellow, and Paduano Fellow of Business Ethics at New York University's Stern School. He is Editor-in-Chief of the journal Machine Learning, a founding board member of the International Machine Learning Society, and was program chair of the ACM SIGKDD Conference in 2001. He has received a KDD best paper award, Faculty Awards from IBM, and a President's Award from NYNEX Science and Technology. His recent research has focused on inference and learning with network data and "active" data acquisition strategies, such as repeated low-cost labeling to improve data quality and data mining.