

BIOINFORMATICS Vol. 20 no. 18 2004, pages 3455–3461
doi:10.1093/bioinformatics/bth426

A probabilistic measure for alignment-free sequence comparison

Tuan D. Pham1,* and Johannes Zuegg2

1School of Computing and Information Technology, Griffith University, Nathan Campus, QLD 4111, Australia and 2Alchemia Ltd, PO Box 6242, Upper Mount Gravatt, QLD 4122, Australia

Received on March 1, 2004; revised on June 28, 2004; accepted on July 26, 2004
Advance Access publication July 22, 2004

ABSTRACT

Motivation: Alignment-free sequence comparison methods are still in the early stages of development compared to those of alignment-based sequence analysis. In this paper, we introduce a probabilistic measure of similarity between two biological sequences without alignment. The method is based on the concept of comparing the similarity/dissimilarity between two constructed Markov models.

Results: The method was tested against six DNA sequences, which are the thrA, thrB and thrC genes of the threonine operons from Escherichia coli K-12 and from Shigella flexneri; and one random sequence having the same base composition as thrA from E. coli. These results were compared with those obtained from the CLUSTAL W algorithm (alignment-based) and the chaos game representation (alignment-free). The method was further tested against a more complex set of 40 DNA sequences and compared with other existing sequence similarity measures (alignment-free).

Availability: All datasets and computer codes written in MATLAB are available upon request from the first author.

Contact: t.pham@.au

INTRODUCTION

There have been a number of computational and statistical methods for the comparison of biological sequences developed over the past decade. It still remains a challenging problem for the research community of computational biology (Ewens and Grant, 2001; Miller, 2001). Two distinct bioinformatic methodologies for studying the similarity/dissimilarity of sequences are known as alignment-based and ali