Programs and data used for the ICSE'09 paper:

Listening to Programmers: Taxonomies and Characteristics of Comments in Operating System Code


We make our programs and data publicly available so that other people can:

We have chosen to publish only the binary versions of our programs, mainly because the source code does not yet meet the quality expected of open source software. The source code may still be requested from the authors by email. In that case you will need to install an OCaml programming environment to compile it.

The installation procedure is currently rather cumbersome. In addition, parts of the program may still contain hard-coded constants specific to our OS study. We apologize for any inconvenience and welcome bug reports.


Taxonomy file, in original format and in HTML-friendly format

Samples file, in original format and in HTML-friendly format (with the expanded code and taxonomy headers)

Taxonomy overview with a few examples of comments (PDF)


There are two x86 Linux binaries. The first, CComment-builder, builds meta-data and indexing information about the comments in the source code. The second, CComment-browser, uses this meta-data to navigate among the comments along multiple dimensions and, optionally, to use the taxonomy and samples files mentioned above. Here is a screenshot of the CComment browser: screenshot1.


After downloading the two binaries, you first need to set some environment variables that tell the binaries where to find their configuration files. Here is the tarfile containing those files. Untar it, go into the CComment directory, and then execute under bash source to set the appropriate environment variables. You should now be able to run the two binaries.

To benefit from all the features of the CComment browser, you may also need to install the R statistical tool and ghostview (both optional).

We also assume that the Linux environment provides standard Unix command-line utilities such as find.
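As a quick sanity check before running the binaries, the following bash sketch reports which of the tools named above are on your PATH. It is not part of the CComment distribution, and the executable names R, gv and ghostview are assumptions that may differ on your distribution.

```shell
#!/usr/bin/env bash
# Minimal sketch (not part of CComment): report whether the tools mentioned
# on this page are available on PATH. The executable names R, gv and
# ghostview are assumptions; adjust them for your distribution.

check_tool() {
  # Prints "<name>: found" or "<name>: missing" for one command name.
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check_prereqs() {
  check_tool find   # assumed present by the CComment binaries
  check_tool R      # optional: R statistical tool
  # Optional: ghostview, often packaged today as its successor "gv".
  if command -v gv >/dev/null 2>&1 || command -v ghostview >/dev/null 2>&1; then
    echo "ghostview: found"
  else
    echo "ghostview: missing"
  fi
}

check_prereqs
```

A "missing" line for R or ghostview only means that some browser features may be unavailable; a missing find is more likely to cause failures.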


To build the meta-data for a C project contained in the /home/pad/foo/ directory and store the resulting meta-data in /tmp/foo_metadata, run:

    ./CComment-builder /home/pad/foo/ -lfs_metapath /tmp/foo_metadata

Then, to navigate among the comments of this C project, run:

    ./CComment-browser /tmp/foo_metadata

You can then use the menu, for instance, to load a samples file.
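The two steps above can be chained in a small bash function. This is a hypothetical helper, not part of the distribution: it assumes the environment variables from the tarfile are already set and that the binaries live in the directory named by CCOMMENT_BIN (an illustrative variable that defaults to the current directory). It only uses the two command lines documented on this page and stops if the builder fails.

```shell
# Hypothetical helper (not part of CComment): build the comment meta-data
# for a C project, then open it in the browser.
# CCOMMENT_BIN is an assumed variable naming the directory holding the binaries.
build_and_browse() {
  local src_dir=$1 meta_dir=$2
  local bin=${CCOMMENT_BIN:-.}
  if [ ! -d "$src_dir" ]; then
    echo "no such source directory: $src_dir" >&2
    return 1
  fi
  # Step 1: index the comments of the project; stop with the builder's status if it fails.
  "$bin/CComment-builder" "$src_dir" -lfs_metapath "$meta_dir" || return
  # Step 2: navigate among the comments using the stored meta-data.
  "$bin/CComment-browser" "$meta_dir"
}
```

For example, after sourcing the file: build_and_browse /home/pad/foo/ /tmp/foo_metadata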

Reproducing our OS study numbers

To reproduce the numbers in the paper based on our taxonomy and samples file, you first need to place the Linux, FreeBSD, and OpenSolaris source code, as retrieved on 12 February 2008, in the hard-coded directory /home/yyzhou/pad/software-os-src, arranged as specified in the samples file mentioned above. You will also need the version control information from Linux, FreeBSD, and OpenSolaris to benefit from the time-tracking ability of the CComment browser. We plan to make available a (huge) tar-file containing this information as soon as our web-server bandwidth problem is solved. In the meantime, you can contact us directly to get this tar-file.

Then you need to build the meta-data as explained above. This can take more than 10 hours for the three operating systems, mainly because the time-tracking extraction requires communicating with the FreeBSD CVS server.

You can then navigate among the comments and load our OS samples file mentioned above.

If you run into graphical problems, you can also print some of our statistics in textual form from the command line:

    ./CComment-builder -lfs_metapath /tmp/os_metadata -print_stat data/samples/comments.sample

Other Publications

Taxonomy overview with a few examples of comments (PDF)

Technical report (draft) on the taxonomy and samples (PDF)


Please email the authors for more complete data sets.


The binaries may depend on dynamic libraries that are incompatible with some Linux distributions.

License and Disclaimer