BIOINFORMATICS
Vol. 20 no. 18 2004, pages 3455–3461
doi:10.1093/bioinformatics/bth426
A probabilistic measure for alignment-free
sequence comparison
Tuan D. Pham1,* and Johannes Zuegg2
1School of Computing and Information Technology, Griffith University,
Nathan Campus, QLD 4111, Australia and 2Alchemia Ltd, PO Box 6242,
Upper Mount Gravatt, QLD 4122, Australia
Received on March 1, 2004; revised on June 28, 2004; accepted on July 26, 2004
Advance Access publication July 22, 2004
ABSTRACT
Motivation: Alignment-free sequence comparison methods
are still in the early stages of development compared to those
of alignment-based sequence analysis. In this paper, we
introduce a probabilistic measure of similarity between two
biological sequences without alignment. The method is based on
the concept of comparing the similarity/dissimilarity between
two constructed Markov models.
Results: The method was tested against six DNA sequences,
which are the thrA, thrB and thrC genes of the threonine
operons from Escherichia coli K-12 and from Shigella flexneri;
and one random sequence having the same base composition
as thrA from E.coli. These results were compared with
those obtained from the CLUSTAL W algorithm (alignment-based)
and the chaos game representation (alignment-free). The
method was further tested against a more complex set of 40
DNA sequences and compared with other existing sequence
similarity measures (alignment-free).
Availability: All datasets and computer codes written in
MATLAB are available upon request from the first author.
Contact: t.pham@.au
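
The idea stated in the abstract, comparing two Markov models constructed from the sequences, can be illustrated with a minimal sketch. The excerpt here does not give the authors' actual formulation, so everything below is an assumption made for illustration: first-order chains over the DNA alphabet, add-one pseudocounts, and a symmetrised average log-likelihood difference as the dissimilarity. The names `transition_matrix`, `log_likelihood` and `markov_dissimilarity` are hypothetical and are not taken from the paper or its MATLAB code.

```python
import math

ALPHABET = "ACGT"


def transition_matrix(seq, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a) from seq.

    Assumption: add-`pseudocount` smoothing so no probability is zero.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for a, b in zip(seq, seq[1:]):
        # Skip transitions that involve symbols outside the alphabet (e.g. N).
        if a in counts and b in counts[a]:
            counts[a][b] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in ALPHABET}
    return probs


def log_likelihood(seq, model):
    """Average log-probability per transition of seq under model."""
    pairs = [(a, b) for a, b in zip(seq, seq[1:])
             if a in model and b in model[a]]
    if not pairs:
        raise ValueError("sequence has no usable transitions")
    return sum(math.log(model[a][b]) for a, b in pairs) / len(pairs)


def markov_dissimilarity(seq_x, seq_y):
    """Symmetrised log-likelihood difference between two sequence models.

    Illustrative choice only: how much worse each sequence is explained
    by the other sequence's model than by its own, averaged over both
    directions. Identical sequences give exactly 0.
    """
    model_x = transition_matrix(seq_x)
    model_y = transition_matrix(seq_y)
    d_xy = log_likelihood(seq_x, model_x) - log_likelihood(seq_x, model_y)
    d_yx = log_likelihood(seq_y, model_y) - log_likelihood(seq_y, model_x)
    return 0.5 * (d_xy + d_yx)


if __name__ == "__main__":
    x = "ACGT" * 10
    y = "AACCGGTT" * 5
    print(markov_dissimilarity(x, x))
    print(markov_dissimilarity(x, y))
```

No alignment is computed at any point: each sequence is summarised by its transition statistics, and only those summaries are compared. A higher-order chain or a different divergence could be substituted without changing the overall structure.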
INTRODUCTION
A number of computational and statistical methods for the
comparison of biological sequences have been developed over
the past decade, yet sequence comparison remains a challenging
problem for the computational biology research community
(Ewens and Grant, 2001; Miller, 2001). Two distinct
bioinformatic methodologies for studying the similarity/
dissimilarity of sequences are known as alignment-based
and ali