Office: Suite 220, Felix Haas Hall Email: jithinks (at) purdue (dot) edu
Academic Background
PhD in Computer Science from INRIA, France (Dec 2016) MS in Electrical Communication Engineering from IISc Bangalore, India (Aug 2012)
Research Interests
My central research interest is in solving real-world problems in data science with a network perspective. I mostly follow a multifaceted approach of problem solving: theoretical modeling, designing and analyzing low complexity algorithms, and verifying it on real-world data. My research focus on data mining algorithms for large networks with probabilistic guarantees, statistical modeling and inference on networks, and distributed techniques for analyzing big network matrices. I am also interested in application of machine learning to various inference problems on networks, and, in general, stochastic modeling of complex systems.
Publications
*This paper has an alphabetical author-list and I am a primary contributer.
Preprints
Temporal Ordered Clustering in Dynamic Networks
Krzysztof Turowski (co-primary author), Jithin K. Sreedharan (co-primary author), and Wojciech Szpankowski arXiv:1905.00672, 2019 [paper][data and code]
In temporal ordered clustering, given a single snapshot of a dynamic network, we aim at partitioning its nodes into $K$ ordered clusters $C_1 \prec \cdots \prec C_K$ such that for $i <j$ nodes in cluster $C_i$ arrive to the dynamic graph before nodes in cluster $C_j$. The problem of inferring evolution of a dynamic network is of considerable significance in many applications ranging from predicting the age of proteins to track the expansion of fake news in online social networks. We first formulate our problem for a general dynamic graph, and propose an integer programming framework that finds the optimal partial order, which describes the clusters achieving the best precision (i.e., fraction of successfully ordered node pairs in the partial order) for a particular density (i.e., fraction of comparable node pairs in the partial order). We provide a method to solve a linear programming relaxation of the original optimization using importance sampling on a Markov chain corresponding to the graph evolution. Inspired by this solution, we design unsupervised and supervised algorithms to find temporal ordered clusters. Finally, we instantiate our model to the duplication-divergence model (also known as the vertex copying model) which turns out to present a real challenge when compared to other network models, as explained in the paper. We validate the proposed algorithms tailored to the duplication-divergence model on synthetic data and various real-world networks.
Peer-reviewed papers
Revisiting Parameter Estimation in Biological Networks: Influence of Symmetries
Jithin K. Sreedharan (co-primary author), Krzysztof Turowski (co-primary author), and Wojciech Szpankowski BioKDD - in conjunction with KDD 2019 [paper (extended version in bioRxiv)][data and code]
Graph models often give us a deeper understanding of real-world networks. In the case of biological networks they help in predicting the evolution and history of biomolecule interactions, provided we map properly real networks into the corresponding graph models. In this paper, we show that for biological graph models many of the existing parameter estimation techniques overlook the critical property of graph symmetry (also known formally as graph automorphisms), thus the estimated parameters give statistically insignificant results with respect to the observed network. To demonstrate it and to develop accurate estimation procedures, we focus on the biologically inspired duplication-divergence model, and the up-to-date data of protein-protein interactions of six species including human and yeast. Using exact recurrence relations of some prominent graph properties, we devise a parameter estimation technique that provides right order of number of symmetries, and use phylogenetically old proteins as the choice of seed graph nodes. We also find that our results are consistent with the ones obtained from maximum likelihood estimation (MLE). However, the MLE approach is significantly slower than our methods in practice.
Inferring Temporal Information from a Snapshot of a Dynamic Network Jithin K. Sreedharan (co-primary author), Abram Magner (co-primary author), Ananth Grama, and Wojciech Szpankowski Nature Scientific Reports 2019 [paper (with Supplementary Material)][code][brain network dataset]
The problem of reverse-engineering the evolution of a dynamic network, known broadly as network archaeology, is one of profound importance in diverse application domains. In analysis of infection spread, it reveals the spatial and temporal processes underlying infection. In analysis of biomolecular interaction networks (e.g., protein interaction networks), it reveals early molecules that are known to be differentially implicated in diseases. In economic networks, it reveals flow of capital and associated actors. Beyond these recognized applications, it provides analytical substrates for novel studies - for instance, on the structural and functional evolution of the human brain connectome. In this paper, we model, formulate, and rigorously analyze the problem of inferring the arrival order of nodes in a dynamic network from a single snapshot. We derive limits on solutions to the problem, present methods that approach this limit, and demonstrate the methods on a range of applications, from inferring the evolution of the human brain connectome to conventional citation and social networks, where ground truth is known.
TIMES: Temporal Information Maximally Extracted from Structures Abram Magner (co-primary author), Jithin K. Sreedharan (co-primary author), Ananth Grama, and Wojciech Szpankowski ACM International Conference on World Wide Web (WWW) 2018, (acceptance rate: 14.8%) [paper][slides][code]
Inferring the node arrival sequence from a snapshot of a dynamic network is an important problem, with applications ranging from identifying sources of contagion to flow of capital in financial transaction networks. Variants of this problem have received
significant recent research attention, including results on infeasibility of solution for prior formulations. We present a new formulation of the problem that admits probabilistic solutions for broad classes of dynamic network
models. Instantiating our framework for a preferential attachment model, we present effectively computable and practically tight bounds on the tradeoff curve between optimal achievable precision and density/recall. We also present
efficient algorithms for partial recovery of node arrival orders and derive theoretical and empirical performance bounds on the precision and density/recall of our methods in comparison to the best possible. We validate our methods
through experiments on both synthetic and real networks to show that their performance is robust to model changes, and that they yield excellent results in practice. We also demonstrate their utility in the context of a novel application
in analysis of the human brain connectome to draw new insights into the functional and structural organization and evolution of the human brain.
Revisiting Random Walk based Sampling in Networks: Evasion of Burn-in Period and Frequent Regenerations, Konstantin Avrachenkov, Vivek S. Borkar, Arun Kadavankandy and Jithin K. Sreedharan* Computational Social Networks, Springer, 2018 [paper]
In the framework of network sampling, random walk (RW) based estimation techniques provide many pragmatic solutions while uncovering the unknown network as little as possible. Despite several theoretical advances in this area, RW based sampling techniques
usually make a strong assumption that the samples are in stationary regime, and hence are impelled to leave out the samples collected before the burn-in period. This work proposes two sampling schemes without burn-in constraint
to estimate the average of an arbitrary function defined on the network nodes, for e.g. the average age of users in a social network. The central idea of the algorithms lies in exploiting regeneration of RWs at revisits to an aggregated
super-node or to a set of nodes and in strategies to enhance the frequency of such regenerations either by contracting the graph or by making the hitting set larger. Our first algorithm, which is based on Reinforcement Learning
(RL), takes advantage of the regeneration of RWs, and it uses stochastic approximation to derive an estimator. This method can be seen as intermediate between purely stochastic Markov Chain Monte Carlo iterations and deterministic
relative value iterations. We study this method via simulations on real networks and observe that its trajectories are much more stable than those of standard random walk based estimation procedures, and its error performance is
comparable to that of respondent driven sampling (RDS) which has a smaller asymptotic variance than many other estimators. The second algorithm, which we call the RT estimator, is a modified form of RDS that accommodates the idea
of regeneration. Simulation studies show that the mean squared error of RT estimator decays much faster than that of RDS with time.
Hamiltonian System Approach to Eigenvalue-Eigenvector Problem in Networks Konstantin Avrachenkov, Philippe Jacquet and Jithin K. Sreedharan* IEEE International Workshop on Multidimensional (nD) Systems (nDS) 2017 [paper]
Because of the significant increase in size and complexity of the networks, the distributed computation of eigenvalues and eigenvectors of graph matrices has become very challenging and yet it remains as important as before. In this paper we develop efficient
distributed algorithms to detect, with higher resolution, closely situated eigenvalues and corresponding eigenvectors of symmetric graph matrices. We model the system of graph spectral computation as physical systems with Lagrangian
and Hamiltonian dynamics. The spectrum of Laplacian matrix, in particular, is framed as a classical spring-mass system with Lagrangian dynamics. The spectrum of any general symmetric graph matrix turns out to have a simple connection
with quantum systems and it can be thus formulated as a solution to a Schr\"odinger-type differential equation. Taking into account the higher resolution requirement in the spectrum computation and the related stability issues
in the numerical solution of the underlying differential equation, we propose the application of symplectic integrators to the calculation of eigenspectrum. The effectiveness of the proposed techniques is demonstrated with numerical
simulations on real-world networks of different sizes and complexities.
Recovery of Vertex Orderings in Dynamic Graphs Abram Magner, Ananth Grama, Jithin K. Sreedharan, and Wojciech Szpankowski IEEE International Symposium on Information Theory (ISIT) 2017 [paper]
Many networks in the real world are dynamic in nature: nodes enter, exit, and make and break connections with one another as time passes. Several random graph models of these networks are such that nodes have well-defined arrival times. It is natural
to ask if, for a given random graph model, we can recover the arrival order of nodes, given information about the structure of the graph. In this work, we give a rigorous formulation of the problem in a statistical learning framework
and tie its feasibility, for a broad class of models, to several sets of permutations associated with the symmetries of the random graph model and graphs generated by it. Moreover, we show how the same quantities are fundamental
to the study of the information content of graph structures. We then apply our general results to the special cases of the Erd˝os-R´enyi and preferential attachment models to derive strong inapproximability results.
Inference in OSNs via Lightweight Partial Crawls Konstantin Avrachenkov, Bruno Ribeiro and Jithin K. Sreedharan* ACM SIGMETRICS / PERFORMANCE 2016, (acceptance rate: 13.5%) [paper][slides] [code]
Are Online Social Network (OSN) A users more likely to form friendships with those with similar attributes? Do users at an OSN B score content more favorably than OSN C users? Such questions frequently arise in the context of Social Network Analysis (SNA)
but often crawling an OSN network via its Application Programming Interface (API) is the only way to gather data from a third party. To date, these partial API crawls are the majority of public datasets and the synonym of lack
of statistical guarantees in incomplete-data comparisons, severely limiting SNA research progress. Using regenerative properties of the random walks, we propose estimation techniques based on short crawls that have proven statistical
guarantees. Moreover, our short crawls can be implemented in massively distributed algorithms. We also provide an adaptive crawler that makes our method parameter-free, significantly improving our statistical guarantees. We then
derive the Bayesian approximation of the posterior of the estimates, and in addition, obtain an estimator for the expected value of node and edge statistics in an equivalent configuration model or Chung-Lu random graph model of
the given network (where nodes are connected randomly) and use it as a basis for testing null hypotheses. The theoretical results are supported with simulations on a variety of real-world networks.
Distributed Spectral Decomposition in Networks by Complex Diffusion and Quantum Random Walk Konstantin Avrachenkov, Philippe Jacquet and Jithin K. Sreedharan* IEEE International Conference on Computer Communication (INFOCOM) 2016, (acceptance rate: 18.25%) [paper][slides][poster (Bell Labs Future days, Paris)]
In this paper we address the problem of finding top $k$ eigenvalues and corresponding eigenvectors of symmetric graph matrices in networks in a distributed way. We propose a novel idea called complex power iterations in order to decompose the eigenvalues
and eigenvectors at node level, analogous to time-frequency analysis in signal processing. At each node, eigenvalues correspond to the frequencies of spectral peaks and respective eigenvector components are the amplitudes at those
points. Based on complex power iterations and motivated from fluid diffusion processes in networks, we devise distributed algorithms with different orders of approximation. We also introduce a Monte Carlo technique with gossiping
which substantially reduces the computational overhead. An equivalent parallel random walk algorithm is also presented. We validate the algorithms with simulations on real-world networks. Our formulation of the spectral decomposition
can be easily adapted to a simple algorithm based on quantum random walks. With the advent of quantum computing, the proposed quantum algorithm will be extremely useful.
Comparison of random walk based techniques for estimating network averages Konstantin Avrachenkov, Vivek S. Borkar, Arun Kadavankandy and and Jithin K. Sreedharan* 5th International Conference on Computational Social Networks (CSoNet) 2016 [paper][slides][code]
Function estimation on Online Social Networks (OSN) is an important field of study in complex network analysis. An efficient way to do function estimation on large networks is to use random walks. We can then defer to the extensive theory of Markov chains
to do error analysis of these estimators. In this work we compare two existing techniques, Metropolis-Hastings MCMC and Respondent-Driven Sampling, that use random walks to do function estimation and compare them with a new reinforcement
learning based technique. We provide both theoretical and empirical analyses for the estimators we consider.
Distribution and Dependence of Extremes in Network Sampling Processes Konstantin Avrachenkov, Natalia M. Markovich and Jithin K. Sreedharan* [paper][slides]
Computational Social Networks, Springer, 2015
Third International IEEE Workshop on Complex Networks and their Applications, Nov 2014
We explore the dependence structure in the sampled sequence of complex networks. We consider randomized algorithms to sample the nodes and study extremal properties in any associated stationary sequence of characteristics of interest like node degrees,
number of followers or income of the nodes in Online Social Networks etc, which satisfy two mixing conditions. Several useful extremes of the sampled sequence like $k$th largest value, clusters of exceedances over a threshold,
first hitting time of a large value etc are investigated. We abstract the dependence and the statistics of extremes into a single parameter that appears in Extreme Value Theory, called Extremal Index (EI). In this work, we derive
this parameter analytically and also estimate it empirically. We propose the use of EI as a parameter to compare different sampling procedures. As a specific example, degree correlations between neighboring nodes are studied in
detail with three prominent random walks as sampling techniques.
Spectrum sensing using distributed sequential detection via noisy reporting MAC Jithin K. Sreedharan and Vinod Sharma Signal Processing (Elsevier), Jan 2015 [paper]
This paper considers cooperative spectrum sensing algorithms for Cognitive Radios which focus on reducing the number of samples to make a reliable detection. We propose algorithms based on decentralized sequential hypothesis testing in which the Cognitive
Radios sequentially collect the observations, make local decisions and send them to the fusion center for further processing to make a final decision on spectrum usage. The reporting channel between the Cognitive Radios and the
fusion center is assumed more realistically as a Multiple Access Channel (MAC) with receiver noise. Furthermore the communication for reporting is limited, thereby reducing the communication cost. We start with an algorithm where
the fusion center uses an SPRT-like (Sequential Probability Ratio Test) procedure and theoretically analyze its performance. Asymptotically, its performance is close to the optimal centralized test without fusion center noise.
We further modify this algorithm to improve its performance at practical operating points. Later we generalize these algorithms to handle uncertainties in SNR and fading.
Nonparametric distributed sequential detection via universal source coding Jithin K. Sreedharan and Vinod Sharma Information Theory and Applications Workshop (ITA), Feb 2013 [paper]
We consider nonparametric or universal sequential hypothesis testing when the distribution under the null hypothesis is fully known but the alternate hypothesis corresponds to some other unknown distribution. These algorithms are primarily motivated from
spectrum sensing in Cognitive Radios and intruder detection in wireless sensor networks. We use easily implementable universal lossless source codes to propose simple algorithms for such a setup. The algorithms are first proposed
for discrete alphabet. Their performance and asymptotic properties are studied theoretically. Later these are extended to continuous alphabets. Their performance with two well known universal source codes, Lempel-Ziv code and KT-estimator
with Arithmetic Encoder are compared. These algorithms are also compared with the tests using various other nonparametric estimators. Finally a decentralized version utilizing spatial diversity is also proposed and analysed.
Spectrum Sensing via Universal Source Coding Jithin K. Sreedharan and Vinod Sharma IEEE Global Communications Conference (GLOBECOM), Dec 2012 [paper]
We consider nonparametric sequential hypothesis testing when the distribution under null hypothesis is fully known and the alternate hypothesis corresponds to some other unknown distribution. We use easily implementable universal lossless source codes
to propose simple algorithms for such a setup. These algorithms are motivated from spectrum sensing application in Cognitive Radios. Universal sequential hypothesis testing using Lempel Ziv codes and Krichevsky-Trofimov estimator
with Arithmetic Encoder are considered and compared for different distributions. Cooperative spectrum sensing with multiple Cognitive Radios using universal codes is also considered.
Novel algorithms for distributed sequential hypothesis testing K. S. Jithin and Vinod Sharma 49th Annual Allerton Conference on Communication, Control and Computing, Sep 2011 [paper]
This paper considers sequential hypothesis testing in a decentralized framework. We start with two simple decentralized sequential hypothesis testing algorithms. One of which is later proved to be asymptotically Bayes optimal. We also consider composite
versions of decentralized sequential hypothesis testing. A novel nonparametric version for decentralized sequential hypothesis testing using universal source coding theory is developed. Finally we design a simple decentralized
multihypothesis sequential detection algorithm.
A novel algorithm for cooperative distributed sequential spectrum sensing in Cognitive Radio Jithin K. Sreedharan and Vinod Sharma IEEE Wireless Communications and Networking Conference (WCNC), Mar 2011. [paper]
This paper considers cooperative spectrum sensing in Cognitive Radios. In our previous work we have developed DualSPRT, a distributed algorithm for cooperative spectrum sensing using Sequential Probability Ratio Test (SPRT) at the Cognitive Radios as
well as at the fusion center. This algorithm works well, but is not optimal. In this paper we propose an improved algorithm- SPRT-CSPRT, which is motivated from Cumulative Sum Procedures (CUSUM). We analyse it theoretically. We
also modify this algorithm to handle uncertainties in SNR’s and fading.
Cooperative distributed sequential spectrum sensing K. S. Jithin, Vinod Sharma, and Raghav Gopalarathnam IEEE National Conference on Communication (NCC), Jan 2011 [paper]
We consider cooperative spectrum sensing for cognitive radios. We develop an energy efficient detector with low detection delay using sequential hypothesis testing. Sequential Probability Ratio Test (SPRT) is used at both the local nodes and the fusion
center. We also analyse the performance of this algorithm and compare with the simulations. Modelling uncertainties in the distribution parameters are considered. Slow fading with and without perfect channel state information at
the cognitive radios is taken into account.
Theses
Ph.D. in Computer Science Institut national de recherche en informatique et en automatique (INRIA), INRIA-Bell Labs joint lab, and Univ. of Nice Sophia Antipolis, France Thesis title: Sampling and Inference in Complex Networks Thesis supervisor: Dr. Konstantin Avrachenkov (INRIA, France) Thesis jury: Reviewers - Prof. Don Towsley and Prof. Nelly Litvak, Examinators -
Dr. Alain Jean-Marie and Dr. Philippe Jacquet [abstract][thesis][slides][report of reviewers]
M.Sc. (Engg.) Dept. of Electrical Communication Engineering, Indian Institute of Science (IISc), Bangalore, India. Thesis title: Spectrum Sensing in Cognitive Radios using Distributed Sequential Detection Thesis
supervisor: Prof. Vinod Sharma External reviewer: Prof. Shankar Prakriya [abstract][thesis][slides][report of reviewer]
(Best MS thesis medal from the Dept. of Electrical Communication Engineering, IISc)