Nam Ninh Nguyen, Wanwipa Vongsangnak, Bairong Shen, Phi-Vu Nguyen and Hon Wai Leong
Background: A bottleneck in investigating the cellular metabolism and physiology of organisms is the presence of metabolic gaps in genome-scale metabolic networks. Metabolic gaps are reactions in the network for which the corresponding genes have not yet been identified. Previous gap-filling methods generally identify a protein family in related organisms and then use this family to help find the target gene in a given genome. However, these methods fail when the protein family is not well defined, so many gaps remain in current metabolic networks. Here, we attempt to fill these gaps indirectly, by retrofitting protein function predictors and post-processing their results to identify candidate genes.
Results: We developed a novel method for metabolic gap filling, called MeGaFiller, that uses an ensemble of multiple retrofitted state-of-the-art protein function predictors. The ensemble scheme was adopted to boost prediction performance. MeGaFiller can propose candidate genes for 35% of the metabolic gaps in different metabolic networks (i.e. those of yeast, three filamentous fungi and a bacterium). MeGaFiller can predict up to hundreds of novel candidate genes for previously annotated functions in the metabolic networks. It can also provide novel candidate genes for putative novel reactions throughout the metabolic networks.
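The abstract does not state how the outputs of the individual predictors are combined. The following is only a minimal sketch of one common ensemble scheme, not MeGaFiller's actual method. It assumes that each predictor returns, per gene, a score in [0, 1] for each function label (EC numbers are used here as labels). It also assumes that a gene becomes a candidate for a gap reaction only when at least `min_votes` predictors assign it that label, and is then ranked by its mean score. The function name `ensemble_candidates`, the predictor names, and the gene and EC identifiers are hypothetical.

```python
from collections import defaultdict

def ensemble_candidates(predictions, ec_number, min_votes=2):
    """Rank candidate genes for one gap reaction (hypothetical sketch).

    predictions: {predictor_name: {gene_id: {ec_number: score}}}
    Assumption: scores lie in [0, 1]; a gene is kept only if at least
    `min_votes` predictors assign it `ec_number`, and its ensemble score
    is the mean over those predictors.
    """
    votes = defaultdict(list)
    for per_gene in predictions.values():
        for gene, ec_scores in per_gene.items():
            if ec_number in ec_scores:
                votes[gene].append(ec_scores[ec_number])
    ranked = [
        (gene, sum(scores) / len(scores))
        for gene, scores in votes.items()
        if len(scores) >= min_votes
    ]
    # Highest mean score first; ties broken by gene ID for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked

# Hypothetical outputs of three retrofitted predictors.
predictions = {
    "predictor_A": {"g1": {"2.7.1.1": 0.875}, "g2": {"2.7.1.1": 0.5}},
    "predictor_B": {"g1": {"2.7.1.1": 0.625}, "g3": {"2.7.1.1": 0.5}},
    "predictor_C": {"g2": {"2.7.1.1": 0.25}, "g3": {"1.1.1.1": 0.9}},
}

print(ensemble_candidates(predictions, "2.7.1.1"))
```

Here `g3` is supported by only one predictor for the target label, so the default vote threshold drops it. Requiring agreement among predictors trades recall for precision. Lowering `min_votes` recovers more candidates at the cost of more false positives.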
Conclusions: MeGaFiller represents our first effort to fill gaps in metabolic networks using retrofitted protein function predictors. It serves as a bioinformatics tool that supports improved annotation through genome-scale metabolic network reconstruction.