Abstract

Whole-Proteome Tree of Insects: An Information-Theory-Based “Alignment-Free” Phylogeny and Grouping of “Proteome Books”

JaeJin Choi, Byung-Ju Kim and Sung-Hou Kim*

Background: An “organism tree” of insects, the largest and most species-diverse group of all living animals, can be considered a metaphorical and conceptual tree that captures a simplified narrative of the complex and unpredictable evolutionary courses of the extant insects. Currently, the most common approach is to construct a “gene tree” as a surrogate for the organism tree, by selecting a group of highly alignable regions from each of a set of selected genes/proteins to represent each organism. However, such selected regions account for only a small fraction of all genes/proteins, and an even smaller fraction of the whole genome, of an organism. Over the last few decades, whole-genome sequences of many extant insects have become available, providing an opportunity to construct a “whole-genome or whole-proteome tree” of insects using Information Theory without sequence alignment (an alignment-free method).

Results: A whole-proteome tree of the insects shows that (a) the demographic grouping pattern is similar to that in the gene trees, but there are notable differences in the branching order of the groups and thus in the sisterhood relationships between pairs of groups; and (b) the founders of all the major groups emerged in an “explosive burst” near the root of the tree.

Conclusion: Since the whole-proteome sequence of an organism can be considered a “book” written in an alphabet of amino acids, a tree of such books can be constructed, without sequence alignment, using a text-analysis method of Information Theory. Such a tree provides an alternative viewpoint for constructing a narrative of evolution and kinship among the extant insects.
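
To make the “book” analogy concrete, the following is a minimal sketch of one alignment-free comparison of two proteomes. The abstract does not name the specific text-analysis method, so the choices here are assumptions made for illustration only: each proteome is summarized by the relative frequencies of overlapping k-mers (short “words” of amino acids), and two profiles are compared with the Jensen-Shannon divergence. The function names, the value of k, and the toy sequences are all hypothetical.

```python
from collections import Counter
from math import log2


def kmer_profile(proteome, k=3):
    """Relative frequencies of overlapping k-mers over all proteins in a proteome.

    `proteome` is a list of amino-acid strings. k=3 is an arbitrary choice for
    this toy example, not a value taken from the paper.
    """
    counts = Counter()
    for protein in proteome:
        for i in range(len(protein) - k + 1):
            counts[protein[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no k-mers found; sequences are shorter than k")
    return {kmer: n / total for kmer, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two frequency profiles.

    Returns 0 for identical profiles and 1 for profiles sharing no k-mers.
    """
    div = 0.0
    for kmer in set(p) | set(q):
        pi, qi = p.get(kmer, 0.0), q.get(kmer, 0.0)
        mi = 0.5 * (pi + qi)  # mixture of the two profiles
        if pi > 0:
            div += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            div += 0.5 * qi * log2(qi / mi)
    return div


# Hypothetical toy "books"; real proteomes contain thousands of proteins.
books = {
    "species_A": ["MKVLAAGIV", "MKTAYIAKQR"],
    "species_B": ["MKVLAAGIV", "MKTAYIAKQK"],
    "species_C": ["GGSGGSGGS", "PPGPPGPPG"],
}
profiles = {name: kmer_profile(seqs) for name, seqs in books.items()}

# Pairwise divergences form a distance matrix from which a tree could be built.
names = sorted(profiles)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(js_divergence(profiles[a], profiles[b]), 4))
```

The resulting pairwise divergences could then be passed to a distance-based tree-building method such as neighbor joining; that step is not shown here, and the abstract does not state which one the authors used.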

Published Date: 2021-10-12; Received Date: 2021-09-21