ISSN: 0974-276X
+44 1223 790975
Sima Naghizadeh, Vahid Rezaeitabar, Hamid Pezeshk and David Matthews
One of the important tools in analyzing and modeling biological data is the Hidden Markov Model (HMM), which is used for gene prediction, protein secondary structure and other essential tasks. An HMM is a stochastic process in which a hidden Markov chain called; the chain of states, emits a sequence of observations. Using this sequence, various questions about the underlying emission generation scheme can be addressed. Applying an HMM to any particular situation is an attempt to infer which state in the chain emits an observation. This is usually called posterior decoding. In general, the emissions are assumed to be conditionally independent from each other. In this work we consider some dependencies among the states and emissions. The aim of our research is to study a certain relationship among emissions, with a focus on the Markov property. We assume that the probability of observing an emission depends not only on the current state but also on the previous state and one of the previous emissions. We also use additional environmental information, and classify amino acids into three groups, using the Relative Solvent Accessibility (RSA). We also investigate how this modification might change the current algorithms for ordinary HMMs, and introduce modified Viterbi and Forward-Backward algorithms for the new model. We apply our proposed model to an actual dataset concerning prediction of the protein secondary structure and demonstrate improved accuracy compared to the ordinary HMM. In particular, the overall accuracy of our modified HMM, which uses the RSA information, is 63.95%. This is 5.9% higher than the prediction accuracy realized by using an ordinary HMM on the same dataset, and 4% higher than the corresponding prediction accuracy of a modified HMM that simply accounts for the dependencies among the emissions.