Journal of Theoretical Biology 281 (2011) 107–112
Contents lists available at ScienceDirect
journal homepage: /locate/yjtbi

Alignment-free comparison of genome sequences by a new numerical characterization

Guohua Huang a,*, Houqing Zhou a, Yongfan Li b, Lixin Xu a
a Department of Mathematics, Shaoyang University, Shaoyang, Hunan 422000, China
b Hunan First Normal College, Changsha, Hunan 410002, China
* Corresponding author.
E-mail address: guohuahhn@163.com (G. Huang)
0022-5193/$ - see front matter
doi:10.1016/j.jtbi.2011.04.003

Article info
Article history:
Received 4 December 2010
Received in revised form 1 April 2011
Accepted 2 April 2011
Available online 28 April 2011
Keywords:
Alignment-free comparison
Graphical representation
DNA sequence
Numerical characterization
Phylogenetic tree

Abstract
To compare different genome sequences, an alignment-free method is proposed. First, we
present a new graphical representation of DNA sequences without degeneracy, which is conducive to
intuitive comparison of sequences. Then, a new numerical characterization based on this representation
is introduced to quantitatively depict the intrinsic nature of genome sequences; it is treated as a
10-dimensional vector. Alignment-free comparison of sequences is performed by computing the
distances between the vectors of the corresponding numerical characterizations, and these distances
define the evolutionary relationship. Two data sets of DNA sequences were constructed to
assess the performance on sequence comparison. The results demonstrate the validity of the method. The
new numerical characterization provides a powerful tool for genome comparison.
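
As a rough illustration of the comparison step only, the following minimal sketch computes pairwise distances between 10-dimensional vectors. It assumes a Euclidean distance, which the abstract does not specify, and the vectors and the names `seq_A`, `seq_B`, `seq_C`, `euclidean`, and `distance_matrix` are hypothetical placeholders rather than values or functions from the paper. The numerical characterization itself is not reproduced here.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(vectors):
    """Return a dict mapping (name1, name2) to the distance between their vectors."""
    names = list(vectors)
    dist = {(n, n): 0.0 for n in names}
    for p, q in combinations(names, 2):
        d = euclidean(vectors[p], vectors[q])
        dist[(p, q)] = dist[(q, p)] = d  # distances are symmetric
    return dist


# Hypothetical 10-dimensional characterization vectors (not data from the paper).
vectors = {
    "seq_A": [0.0] * 10,
    "seq_B": [3.0, 4.0] + [0.0] * 8,
    "seq_C": [0.0] * 9 + [12.0],
}

dist = distance_matrix(vectors)
for (p, q), d in sorted(dist.items()):
    if p < q:
        print(f"{p} vs {q}: {d:.3f}")
```

A distance table of this kind could then be passed to a tree-building procedure to obtain a phylogenetic tree; which procedure is used is not stated in the text above.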
© 2011 Elsevier Ltd. All rights reserved.

1. Introduction
Nucleotide molecules are the basic data for exploring the
origin of life and the metabolism of tissues, and the comparative method
is an important strategy for investigating sequences. Conventional