Abstract

MoDa: A Data Warehouse for Multi-“Omics” Data

Sudeshna Guha Neogi, Pauls Vasilis, Maria Krestyaninova, Misha Kapushesky, Ibrahim Emam, Alvis Brazma and Ugis Sarkans

The range of “omics” technologies for high-throughput measurement of the properties of biomolecular entities (e.g. transcripts, proteins, metabolites) in biological samples continues to grow, and information systems that enable integrative exploration of the results of such experiments are needed. We have developed a system, MoDa (Molecular Data warehouse), that provides a unified framework for finding and visualizing results obtained with various experimental techniques of molecular biology. The warehouse architecture is optimized for various types of filtering and querying of annotations of samples, experimental results, and properties of genes and other molecular entities. The implementation is based on the BioMart technology, extended with enhanced means for manipulating multidimensional data. The user interface is a web-based application. An important consideration for every data warehousing project is data acquisition and cleaning. To ensure that the data uploaded into the warehouse are consistent and sufficiently well annotated for further statistical analyses, we implemented a repository for sample and research subject data, experimental metadata, and experimental results. A gene re-annotation pipeline was used to provide a uniform reference system for the collected data along the bioentity (“gene”) dimension. We expect that the developed data warehousing infrastructure will be useful for collaborative projects employing high-throughput molecular biology technologies.