Journal of Proteomics & Bioinformatics

Open Access

ISSN: 0974-276X


GeneNarrator: Mining the Literaturome for Relations Among Genes

Jing Ding, Daniel Berleant, Jun Xu, Kenton Juhlin, Eve Wurtele and Andy Fulmer

The rapid development of microarray and other genomic technologies now enables biologists to monitor the expression of hundreds, even thousands, of genes in a single experiment. Interpreting the biological meaning of the expression patterns still relies largely on biologists’ domain knowledge, as well as on information collected from the literature and various public databases. Yet individual experts’ domain knowledge is insufficient for large data sets, and collecting and analyzing this information manually from the literature and/or public databases is tedious and time-consuming. Computer-aided functional analysis tools are therefore highly desirable.

We describe the architecture of GeneNarrator, a text-mining system for functional analysis of microarray data. The system’s primary purpose is to test the feasibility of a more general system architecture based on a two-stage clustering strategy, which is explained in detail. Given a list of genes, GeneNarrator collects abstracts about them from PubMed, then clusters the abstracts into functional topics in a first clustering stage. In the second clustering stage, the genes are clustered into groups based on similarities in their distributions of occurrence across topics. This novel two-stage architecture, the primary contribution of this project, has benefits not easily provided by one-stage clustering.
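The two-stage idea can be illustrated with a small sketch. It is not GeneNarrator's implementation: the abstract does not say which clustering algorithm or text representation either stage uses, so this example assumes bag-of-words vectors, cosine similarity, and a simple greedy threshold ("leader") clustering for both stages. The function names, the thresholds `topic_threshold` and `gene_threshold`, and the toy abstracts and gene names (`GA` to `GD`) are all hypothetical. Stage 1 groups abstracts into topics; stage 2 builds, for each gene, a count of how its abstracts are distributed across those topics and clusters genes by the similarity of those profiles.

```python
from collections import Counter
import math


def tokenize(text):
    """Lower-case words, punctuation stripped, words of <= 2 letters dropped."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    return [w for w in words if len(w) > 2]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts/Counters."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def threshold_clusters(vectors, threshold):
    """Greedy leader clustering (an assumed stand-in): each vector joins the
    first cluster whose seed is at least `threshold` similar, else it seeds
    a new cluster. Returns one cluster label per input vector."""
    seeds, labels = [], []
    for v in vectors:
        for idx, seed in enumerate(seeds):
            if cosine(v, seed) >= threshold:
                labels.append(idx)
                break
        else:
            seeds.append(v)
            labels.append(len(seeds) - 1)
    return labels


def two_stage(abstracts, topic_threshold=0.3, gene_threshold=0.8):
    """abstracts: list of (text, set_of_genes) pairs (hypothetical input shape)."""
    # Stage 1: cluster abstracts into topics by word overlap.
    vectors = [Counter(tokenize(text)) for text, _ in abstracts]
    topics = threshold_clusters(vectors, topic_threshold)

    # Stage 2: per-gene distribution of its abstracts across topics.
    profiles = {}
    for (_, genes), topic in zip(abstracts, topics):
        for gene in genes:
            profiles.setdefault(gene, Counter())[topic] += 1

    # Cluster genes by similarity of their topic profiles.
    gene_names = sorted(profiles)
    labels = threshold_clusters([profiles[g] for g in gene_names], gene_threshold)
    groups = {}
    for gene, label in zip(gene_names, labels):
        groups.setdefault(label, []).append(gene)
    return topics, list(groups.values())


# Toy, invented abstracts; gene names are placeholders.
toy_abstracts = [
    ("starch synthesis enzyme activity in leaf chloroplast", {"GA"}),
    ("starch synthesis enzyme expression in leaf chloroplast", {"GB"}),
    ("cell wall cellulose deposition during root growth", {"GC"}),
    ("cell wall cellulose deposition in root hairs", {"GD"}),
]

if __name__ == "__main__":
    topics, gene_groups = two_stage(toy_abstracts)
    print(topics)       # [0, 0, 1, 1]
    print(gene_groups)  # [['GA', 'GB'], ['GC', 'GD']]
```

Because genes are compared through their topic profiles rather than their raw text in this sketch, two genes land in the same group whenever their abstracts fall into similar topics, even if the abstracts themselves share few words with each other.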