Download
You can download the source code of the Metabolic Pathway Mining software, written in the C programming language, from here.
Related Publication
Algorithms for detecting frequent subgraphs in metabolic pathways are described in the paper "An Efficient Algorithm for Detecting Frequent Subgraphs in Biological Networks" by Mehmet Koyuturk, Ananth Grama, and Wojciech Szpankowski, published in the ISMB 2004 special issue of Bioinformatics. You can download this paper in PDF and PS formats.
Description of Software
The software has two main components, p2g and pwmine.
p2g converts metabolic pathways from KEGG's XML format to uniquely-labeled graphs. It can convert either a single file or a directory of XML files. It takes two arguments: the pathway file name and the graph file name. Both are interpreted as directory names if the first argument (the pathway file name) does not have an ".xml" extension. A file that describes a uniquely-labeled graph (extension ".gr") looks like the following:
name
3 4
a
1 2
b

c
0 2
"name" is simply a description of what the graph corresponds to. The second line gives the number of nodes and the number of edges, respectively. This sample graph contains three nodes labeled "a", "b", and "c". The rest of the file contains two lines for each node: the first holds the node label, and the second lists the indices of the nodes that its out-edges point to. For instance, "a" is linked to nodes 1 and 2, which are labeled "b" and "c", while the node labeled "b" has no outgoing edges, so its edge line is empty. Observe that the node labeled "c" has an outgoing edge to "a" (index 0) and one to itself (index 2); self-loops are allowed. Node indices start from 0.
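To make the ".gr" layout concrete, here is a minimal sketch of a reader for it. It is not taken from the p2g or pwmine sources: the names GrNode, GrGraph, gr_read, and gr_free are hypothetical, as are the fixed buffer sizes. It assumes that a node without out-edges is followed by an empty line, that labels fit on one line, and that the edge count on the second line equals the total number of indices listed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define GR_MAX_LABEL 128   /* assumed limit, not from the original code */
#define GR_MAX_LINE  4096  /* assumed limit, not from the original code */

typedef struct {
    char label[GR_MAX_LABEL];
    int *out;    /* indices of the nodes this node points to */
    int n_out;   /* number of out-edges */
} GrNode;

typedef struct {
    char name[GR_MAX_LABEL];
    int n_nodes, n_edges;
    GrNode *nodes;
} GrGraph;

/* Strip a trailing newline (and carriage return) in place. */
static void chomp(char *s) { s[strcspn(s, "\r\n")] = '\0'; }

/* Release everything allocated by gr_read. */
void gr_free(GrGraph *g) {
    if (g->nodes)
        for (int i = 0; i < g->n_nodes; i++)
            free(g->nodes[i].out);
    free(g->nodes);
    g->nodes = NULL;
    g->n_nodes = 0;
}

/* Read one graph from fp. Returns 0 on success, -1 on malformed input. */
int gr_read(FILE *fp, GrGraph *g) {
    char line[GR_MAX_LINE];
    memset(g, 0, sizeof *g);

    /* Line 1: free-form description of the graph. */
    if (!fgets(line, sizeof line, fp)) return -1;
    chomp(line);
    snprintf(g->name, sizeof g->name, "%s", line);

    /* Line 2: number of nodes, then number of edges. */
    if (!fgets(line, sizeof line, fp)) return -1;
    if (sscanf(line, "%d %d", &g->n_nodes, &g->n_edges) != 2) return -1;
    if (g->n_nodes < 0 || g->n_edges < 0) return -1;

    g->nodes = calloc(g->n_nodes ? (size_t)g->n_nodes : 1, sizeof *g->nodes);
    if (!g->nodes) return -1;

    int total = 0;
    for (int i = 0; i < g->n_nodes; i++) {
        GrNode *nd = &g->nodes[i];

        /* First line of the pair: the node label. */
        if (!fgets(line, sizeof line, fp)) goto fail;
        chomp(line);
        snprintf(nd->label, sizeof nd->label, "%s", line);

        /* Second line: target indices; empty (or absent at EOF) means none. */
        if (!fgets(line, sizeof line, fp)) line[0] = '\0';
        char *p = line;
        for (;;) {
            char *end;
            long v = strtol(p, &end, 10);
            if (end == p) break;                    /* no more numbers */
            if (v < 0 || v >= g->n_nodes) goto fail; /* index out of range */
            int *tmp = realloc(nd->out, (size_t)(nd->n_out + 1) * sizeof *tmp);
            if (!tmp) goto fail;
            nd->out = tmp;
            nd->out[nd->n_out++] = (int)v;
            p = end;
        }
        total += nd->n_out;
    }
    if (total != g->n_edges) goto fail; /* header disagrees with edge lists */
    return 0;

fail:
    gr_free(g);
    return -1;
}
```

A caller would open a ".gr" file with fopen, pass the stream to gr_read, and call gr_free when done. For the sample above, the result has three nodes, with "a" pointing to indices 1 and 2, "b" pointing nowhere, and "c" pointing to indices 0 and 2.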
pwmine takes a directory of uniquely-labeled graphs as input, along with two integer arguments: the "minimum frequency threshold" and the "maximum subgraph size", measured in number of edges. It writes the resulting set of "maximal frequent subgraphs" to a file named "fsg.[mft]", where [mft] is the minimum frequency threshold.
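The role of the minimum frequency threshold can be illustrated with a small sketch. It is not pwmine's algorithm and does not enumerate or filter maximal subgraphs; it only shows one way to test a single candidate. It assumes that the frequency of a subgraph is the number of input graphs that contain all of its edges, and that the threshold is an absolute count of graphs. Because node labels are unique, an edge can be identified by its pair of labels, so containment reduces to looking up label pairs. The names LabeledEdge, LabeledGraph, subgraph_frequency, and is_frequent are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* An edge identified by the labels of its endpoints (labels are unique). */
typedef struct { const char *from, *to; } LabeledEdge;

/* A graph viewed as a plain set of labeled edges. */
typedef struct { const LabeledEdge *edges; size_t n_edges; } LabeledGraph;

/* Return 1 if graph g contains edge e, 0 otherwise (linear scan). */
static int graph_has_edge(const LabeledGraph *g, const LabeledEdge *e) {
    for (size_t i = 0; i < g->n_edges; i++)
        if (strcmp(g->edges[i].from, e->from) == 0 &&
            strcmp(g->edges[i].to, e->to) == 0)
            return 1;
    return 0;
}

/* Count the graphs that contain every edge of the candidate subgraph. */
size_t subgraph_frequency(const LabeledGraph *graphs, size_t n_graphs,
                          const LabeledEdge *pattern, size_t n_pattern) {
    size_t freq = 0;
    for (size_t g = 0; g < n_graphs; g++) {
        size_t k = 0;
        while (k < n_pattern && graph_has_edge(&graphs[g], &pattern[k]))
            k++;
        if (k == n_pattern)  /* all candidate edges were found */
            freq++;
    }
    return freq;
}

/* A candidate is frequent if it occurs in at least min_freq graphs. */
int is_frequent(const LabeledGraph *graphs, size_t n_graphs,
                const LabeledEdge *pattern, size_t n_pattern,
                size_t min_freq) {
    return subgraph_frequency(graphs, n_graphs, pattern, n_pattern) >= min_freq;
}
```

The sketch does not check that the candidate is connected or that it respects the maximum subgraph size; a real miner would enforce both while generating candidates.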
p2g can be compiled using the command "make p2g". Similarly, "make pwmine" compiles pwmine, and "make" compiles both.
A comprehensive collection of metabolic pathways to be mined using this software can be found at the KEGG website, which is publicly accessible.
Questions & Bugs
Please direct any questions you have about this software to koyuturk@cs.purdue.edu. We would also be glad to hear about any bugs you discover or any improvements you make to the code. We hope to provide a web interface for online mining of metabolic pathways in the near future.