Ahsan Huda and Pierre R Bushel
Background: Transposable elements (TEs) have long been regarded as selfish or junk DNA with little or no role in the regulation or functioning of the human genome. Over the past several years, however, this view has been challenged by studies providing both anecdotal and global evidence that TEs contribute to the regulatory and coding needs of human genes. In this study, we explored the incorporation and epigenetic regulation of coding sequences donated by TEs using gene expression and other ancillary genomics data from two human hematopoietic cell lines: GM12878 (a lymphoblastoid cell line) and K562 (a chronic myelogenous leukemia cell line). In each cell line, we found several thousand instances of TEs donating coding sequences to human genes. We compared transcriptome assemblies of the RNA sequencing (RNA-Seq) reads built with and without the aid of a reference transcriptome and found that the percentage of genes that incorporate TEs in their coding sequences is significantly greater in the assembly built without a reference than in the reference-guided assemblies using RefSeq and GENCODE gene models. We also used histone modification chromatin immunoprecipitation sequencing (ChIP-Seq) data, Cap Analysis of Gene Expression (CAGE) data and DNase I hypersensitivity site (DHS) data to demonstrate the epigenetic regulation of the TE-derived coding sequences. Our results suggest that TEs form a significantly higher percentage of coding sequences than is represented in gene annotation databases, and that these TE-derived sequences are epigenetically regulated in accordance with their expression in the two cell types.
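The central quantity compared above, the percentage of genes that incorporate TEs in their coding sequences, can be illustrated with a minimal interval-overlap sketch. This is not the pipeline used in the study; the function names, the tuple layouts, the half-open coordinate convention, and the rule that any shared base counts as incorporation are all assumptions made for illustration, and the toy coordinates are invented.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def fraction_of_genes_with_te(coding_exons, te_intervals):
    """Estimate the fraction of genes with at least one coding exon overlapping a TE.

    coding_exons: iterable of (gene, chrom, start, end) tuples (hypothetical layout).
    te_intervals: iterable of (chrom, start, end) tuples (hypothetical layout).
    Returns (fraction, set_of_genes_with_te_overlap).
    """
    # Group TE intervals by chromosome so each exon is only compared
    # against TEs on the same chromosome.
    tes_by_chrom = defaultdict(list)
    for chrom, start, end in te_intervals:
        tes_by_chrom[chrom].append((start, end))

    all_genes = set()
    te_genes = set()
    for gene, chrom, start, end in coding_exons:
        all_genes.add(gene)
        # Assumption: a single overlapping base is enough to count the gene.
        if any(overlaps(start, end, s, e) for s, e in tes_by_chrom.get(chrom, ())):
            te_genes.add(gene)

    fraction = len(te_genes) / len(all_genes) if all_genes else 0.0
    return fraction, te_genes


# Toy data (invented coordinates, not taken from the study).
toy_exons = [
    ("geneA", "chr1", 100, 200),
    ("geneA", "chr1", 300, 400),
    ("geneB", "chr1", 500, 600),
    ("geneC", "chr2", 100, 200),
]
toy_tes = [
    ("chr1", 350, 450),  # overlaps the second exon of geneA
    ("chr2", 200, 300),  # starts where the geneC exon ends, so no overlap
]

toy_fraction, toy_te_genes = fraction_of_genes_with_te(toy_exons, toy_tes)
print(f"{toy_fraction:.1%} of genes overlap a TE: {sorted(toy_te_genes)}")
```

Running the same function once on coding exons from a reference-guided assembly and once on those from an assembly built without a reference would give the two percentages being compared. A linear scan per exon is adequate for a toy example; genome-scale annotations would call for an interval tree or a dedicated tool.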