A Comparison of Different Strategies for Computing Confidence Intervals of the Linkage
Disequilibrium Measure D′
S.K. Kim, K. Zhang, and F. Sun
Pacific Symposium on Biocomputing 9:128-139(2004)
A COMPARISON OF DIFFERENT STRATEGIES FOR COMPUTING
CONFIDENCE INTERVALS OF THE LINKAGE DISEQUILIBRIUM
MEASURE D′
S.K. KIM, K. ZHANG, F. SUN*
Molecular and Computational Biology Program, Department of Biological Sciences,
University of Southern California, Los Angeles, CA, 90089, USA
Email: {sungkkim, kuizhang, fsun}@usc.edu
Many linkage disequilibrium (LD) measures have been used to study LD patterns and for
haplotype block partitioning. We examine the properties of one of these measures, Lewontin's
D′, in order to understand the dependence of its confidence interval (CI) on allele frequency
and sample size, as well as its applications in defining haplotype blocks. This measure and its
CIs were used to partition haplotypes into blocks by Gabriel et al. [1] as well as in many other
applications. Gabriel et al. [1] utilized a bootstrap approach to calculate the CI for D′. Under
this method, over 1,000 bootstrap samples may be needed to obtain an accurate estimate of the
CI for each pair of single nucleotide polymorphism (SNP) markers, which can be very
computationally intensive, particularly when many SNP markers are involved. We develop two
alternative methods for calculating the CI for D′ without bootstrap: one based on the
approximate variance of D′ given by Zapata et al. [2] and the other based on a maximum
likelihood estimate (MLE) of D′ together with Fisher information theory. Both methods
depend on a normal approximation for the estimates of D′ for large sample sizes. We assess and
compare the coverage of the CIs using the three methods through extensive simulations. We
define the coverage as the fraction of times the estimated CI contains the true value of D′. In
general, the average coverage of the bootstrap method is less than the pre-specified coverage.
When the sample
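
To make the bootstrap approach discussed in the abstract concrete, the following is a minimal sketch of a percentile bootstrap CI for D′ at a single SNP pair. It is an illustration under stated assumptions, not necessarily the exact procedure of Gabriel et al. [1]: it assumes phased haplotype counts given in the order (AB, Ab, aB, ab), resamples them from a multinomial distribution, and reports the signed D′ rather than its absolute value. The function names `d_prime` and `bootstrap_ci` and all parameter names are hypothetical.

```python
import numpy as np


def d_prime(counts):
    """Signed Lewontin's D' from haplotype counts ordered (AB, Ab, aB, ab)."""
    freqs = np.asarray(counts, dtype=float)
    freqs = freqs / freqs.sum()
    p_AB, p_Ab, p_aB, _ = freqs
    p_A = p_AB + p_Ab      # frequency of allele A at the first locus
    p_B = p_AB + p_aB      # frequency of allele B at the second locus
    d = p_AB - p_A * p_B   # raw disequilibrium coefficient D
    # D' rescales D by its largest attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    if d_max == 0:
        return float("nan")  # undefined when a locus is monomorphic
    return d / d_max


def bootstrap_ci(counts, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for D' (assumed scheme: multinomial resampling)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    n = int(counts.sum())
    # Each row is one bootstrap sample of n haplotypes drawn with replacement.
    resampled = rng.multinomial(n, counts / n, size=n_boot)
    stats = np.array([d_prime(c) for c in resampled])
    stats = stats[~np.isnan(stats)]  # drop replicates where D' is undefined
    alpha = 1 - level
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


if __name__ == "__main__":
    example = [40, 10, 10, 40]  # made-up counts, for illustration only
    print("D' =", d_prime(example))
    print("95% bootstrap CI =", bootstrap_ci(example))
```

The Python-level loop over `n_boot` replicates runs once per SNP pair, which illustrates why the bootstrap becomes expensive when many markers are involved and why closed-form, normal-approximation intervals are attractive.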