Sequence Homology Search Based on Database Indexing Using the Profile Hidden Markov Model
Qiang Xue, James Cole, Sakti Pramanik
Department of Computer Science and Engineering; Department of Microbiology
Michigan State University, East Lansing, Michigan 48824, USA
Email: xueqiang@, pramanik@, colej@
Abstract— The Profile Hidden Markov Model (PHMM) has received increasing attention in the field of protein homology detection, since profile-based methods are much more sensitive than pairwise methods in detecting distant homologous relationships. Pure dynamic-programming-based systems are often used for PHMM searches. However, these dynamic-programming-based systems are very time-consuming for a large database. For instance, it may take approximately 15 minutes to search a short model of length 12 against the GenBank protein sequence database. Instead of searching the database sequentially, we search it using a tree-structured database index called the HD-tree. The HD-tree is able to reduce the PHMM search time significantly with

pairwise methods (e.g., BLAST or FASTA) that use a position-independent scoring system.

B. The Profile Hidden Markov Model

Functional biological sequences typically come in families. Just as a pairwise alignment captures the relationship between two sequences, a multi-sequence alignment can show how the sequences in a family relate to each other. It is desirable to provide a consensus model for a multi-sequence alignment, so that the relationship between a new sequence and the family
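The dynamic-programming PHMM search described in the abstract can be illustrated with a minimal Viterbi sketch. It scores a new sequence against a family model. This is neither the authors' implementation nor HMMER's.

The sketch assumes the following:

* a textbook profile HMM with match (M), insert (I), and delete (D) states;
* global alignment of the whole model to the whole sequence;
* one set of transition log-probabilities shared by every model position;
* insert emissions equal to the background distribution.

The function name `viterbi_score`, the 4-letter toy alphabet, and all probabilities are hypothetical.

```python
import math

NEG_INF = float("-inf")


def viterbi_score(match_logodds, seq, t):
    """Best (Viterbi) log-odds score of `seq` aligned globally to a toy profile HMM.

    match_logodds[j-1] maps a residue x to log(e_Mj(x) / q_x) for match state M_j.
    t maps transition names ("MM", "MI", "MD", "IM", "II", "ID", "DM", "DI", "DD")
    to log-probabilities; the same values are assumed at every model position.
    Insert states are assumed to emit with the background distribution, so their
    emission log-odds are 0 and do not appear below.
    """
    n, L = len(match_logodds), len(seq)
    # VM/VI/VD[j][i]: best score of a path ending in M_j / I_j / D_j
    # after emitting the first i residues of seq.
    VM = [[NEG_INF] * (L + 1) for _ in range(n + 1)]
    VI = [[NEG_INF] * (L + 1) for _ in range(n + 1)]
    VD = [[NEG_INF] * (L + 1) for _ in range(n + 1)]
    VM[0][0] = 0.0  # M_0 plays the role of the Begin state

    for j in range(n + 1):
        for i in range(L + 1):
            if j > 0 and i > 0:
                # Match: consume residue i at model column j (diagonal move).
                emit = match_logodds[j - 1].get(seq[i - 1], NEG_INF)
                VM[j][i] = emit + max(VM[j - 1][i - 1] + t["MM"],
                                      VI[j - 1][i - 1] + t["IM"],
                                      VD[j - 1][i - 1] + t["DM"])
            if i > 0:
                # Insert: consume residue i without advancing the model column.
                VI[j][i] = max(VM[j][i - 1] + t["MI"],
                               VI[j][i - 1] + t["II"],
                               VD[j][i - 1] + t["DI"])
            if j > 0:
                # Delete: skip model column j without consuming a residue.
                VD[j][i] = max(VM[j - 1][i] + t["MD"],
                               VI[j - 1][i] + t["ID"],
                               VD[j - 1][i] + t["DD"])

    # Enter End from the last column; reusing the *M transitions here is a simplification.
    return max(VM[n][L] + t["MM"], VI[n][L] + t["IM"], VD[n][L] + t["DM"])


# Toy 3-column model over a hypothetical 4-letter alphabet (not the 20 amino acids).
ALPHABET = "ACDE"
BACKGROUND = 1 / len(ALPHABET)
columns = [
    {"A": 0.7, "C": 0.1, "D": 0.1, "E": 0.1},
    {"A": 0.1, "C": 0.7, "D": 0.1, "E": 0.1},
    {"A": 0.1, "C": 0.1, "D": 0.7, "E": 0.1},
]
match_logodds = [{x: math.log(p / BACKGROUND) for x, p in col.items()} for col in columns]
t = {name: math.log(p) for name, p in {
    "MM": 0.90, "MI": 0.05, "MD": 0.05,
    "IM": 0.50, "II": 0.40, "ID": 0.10,
    "DM": 0.50, "DI": 0.10, "DD": 0.40,
}.items()}

print(round(viterbi_score(match_logodds, "ACD", t), 4))  # 2.6674
print(viterbi_score(match_logodds, "DCA", t) < viterbi_score(match_logodds, "ACD", t))  # True
```

Each of the three tables has (n+1)(L+1) cells, and each cell looks at a constant number of predecessors. Scoring one sequence of length L against a model of length n therefore costs O(n·L) time. Repeating this for every sequence in a large database is what makes a sequential PHMM search slow.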