Jennifer Neville
NSF Career Grant
Machine Learning Methods and Statistical Analysis Tools for Single Network Domains
Machine learning researchers focus on two distinct learning scenarios for structured network data (i.e., where there are statistical dependencies among the attributes of linked nodes). In the first scenario, the domain consists of a population of structured examples (e.g., chemical compounds) and we can reason about learning algorithms asymptotically, as the number of structured examples increases. In the second scenario, the domain consists of a single, potentially infinite-sized network (e.g., the World Wide Web). In these "single network" domains, an increase in data corresponds to acquiring a larger portion of the underlying network. Even when there are a set of network samples available for learning and prediction, they correspond to subnetworks drawn from the same underlying network and thus may be dependent.
Although estimation and inference methods from the field of statistical relational learning have been successfully applied in single-network domains, the algorithms were initially developed for populations of networks, and thus the theoretical foundation for learning and inference in single networks is scant. This work focuses on the development of robust statistical methods for single network domains---since many large network datasets about complex systems rarely have more than a few subnetworks available for model estimation and evaluation. Specifically, the aims of the project include (1) strengthening the theoretical foundation for learning in single network domains, (2) creating accurate methods for determining the significance of discovered patterns and features, (3) formulating novel model selection and evaluation methods, and (4) developing improved approaches for network learning and prediction based on the unique characteristics of single network domains.
The research will enhance our understanding of the mechanisms that influence the performance of network analysis methods and drive the development novel methods for complex network domains. Expanding the applicability of machine learning techniques for single network domains could have a transformational impact across a broad range of areas (e.g., psychology, communications, education, political science) where current methods limit research to the investigation of processes in dyad or small group settings. Also, the project results will serve as an example application of computer science in broader network science context, which will attract and retain students that might not otherwise be engaged by conventional CS topics.