Giancarlo and Utro Algorithms for Molecular Biology 2011, 6:1
/content/6/1/1
RESEARCH Open Access
Speeding up the Consensus Clustering methodology for microarray data analysis
Raffaele Giancarlo1*, Filippo Utro2
Abstract
Background: The inference of the number of clusters in a dataset, a fundamental problem in Statistics, Data Analysis and Classification, is usually addressed via internal validation measures. The problem is particularly difficult for microarrays, since the inferred prediction must be sensitive enough to capture the inherent biological structure in a dataset, e.g., functionally related genes. Despite the rich literature in this area, an internal validation measure that is both fast and precise has proved elusive. To partially fill this gap, we propose a speed-up of Consensus (Consensus Clustering), a methodology that predicts the number of clusters in a dataset and also produces a dissimilarity matrix (the consensus matrix) that can be used by clustering algorithms. As detailed in the remainder of the paper, Consensus is a natural candidate for a speed-up.
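To make the notion of a consensus matrix concrete, here is a minimal Python sketch of a common formulation of Consensus Clustering: repeatedly cluster a random subsample of the items and record, for each pair i, j, the fraction of runs (among those in which both were sampled) that place them in the same cluster. It is an illustration under stated assumptions, not the authors' implementation. The stand-in clusterer (k-means with farthest-first seeding), the function names `kmeans_labels` and `consensus_matrix`, and the knobs `n_resamples` and `frac` are all hypothetical, and they are not necessarily the two parameters mentioned in the Results.

```python
import random
from typing import Callable, List, Sequence

Point = Sequence[float]
# Hypothetical clusterer interface: (points, k, seed) -> one cluster label per point.
ClusterFn = Callable[[List[Point], int, int], List[int]]


def kmeans_labels(points: List[Point], k: int, seed: int, iters: int = 20) -> List[int]:
    """Minimal Lloyd's k-means with farthest-first seeding (illustrative stand-in)."""

    def sqdist(p: Point, q: Point) -> float:
        return sum((a - b) ** 2 for a, b in zip(p, q))

    rng = random.Random(seed)
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # Farthest-first seeding: next center is the point farthest from all current centers.
        centers.append(list(max(points, key=lambda p: min(sqdist(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(
    data: List[Point],
    k: int,
    n_resamples: int = 50,
    frac: float = 0.8,
    cluster_fn: ClusterFn = kmeans_labels,
    seed: int = 0,
) -> List[List[float]]:
    """M[i][j] = (# runs with i, j in the same cluster) / (# runs with both i and j sampled)."""
    n = len(data)
    rng = random.Random(seed)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    m = min(n, max(k, int(frac * n)))  # subsample size for each run
    for h in range(n_resamples):
        idx = rng.sample(range(n), m)
        labels = cluster_fn([data[i] for i in idx], k, h)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    # Pairs never sampled together get 0.0 by convention in this sketch.
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
```

For example, `consensus_matrix([[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]], k=2, n_resamples=20)` yields entries close to 1 for pairs within each group and 0 across groups, so `1 - M[i][j]` can serve as the dissimilarity passed to a downstream clustering algorithm. In this sketch each candidate k requires its own `n_resamples` clustering runs, so the cost grows with both the number of resamplings and the range of k explored.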
Results: Since the time-precision performance of Consensus depends on two parameters, our first task is to show that simply adjusting these parameters is not enough to obtain a good precision-time trade-off. Our second task is to provide a fast approximation algorithm for Consensus, namely FC (Fast Consensus), a closely related algorithm designed to have the same precision as Consensus with substantially better time performance. The performance of FC has been assessed via extensive experiments on twelve benchmark datasets that summarize key features of microarray applications, such as cancer studies, gene expression with up and down p