Constructing Endophenotypes of Complex Diseases
Using Non-Negative Matrix Factorization and Adjusted
Rand Index
Hui-Min Wang¹, Ching-Lin Hsiao², Ai-Ru Hsieh², Ying-Chao Lin², Cathy S. J. Fann²*
¹ Institute of Public Health, Yang-Ming University, Taipei, Taiwan; ² Institute of BioMedical Science, Academia Sinica, Nankang, Taipei, Taiwan
Abstract
Complex diseases are typically caused by combinations of molecular disturbances that vary widely among different
patients. Endophenotypes, combinations of genetic factors associated with a disease, offer a simplified approach to dissecting
complex traits by reducing genetic heterogeneity. Because molecular dissimilarities often exist between patients with
indistinguishable disease symptoms, these unique molecular features may reflect pathogenic heterogeneity. To detect
molecular dissimilarities among patients and reduce the complexity of high-dimensional data, we have explored an
endophenotype-identification analytical procedure that combines non-negative matrix factorization (NMF) and the adjusted
Rand index (ARI), a measure of the similarity of two clusterings of a data set. To evaluate this procedure, we compared it with
a commonly used method, principal component analysis with k-means clustering (PCA-K). A simulation study with a gene
expression dataset and genotype information was conducted to examine the performance of our procedure and PCA-K. The
results showed that NMF mostly outperformed PCA-K. Additionally, we applied our endophenotype-identification analytical
procedure to a publicly available dataset containing data derived from patients with late-onset Alzheimer’s disease (LOAD).
NMF distilled informati
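
The abstract describes the ARI only as a measure of the similarity of two clusterings. As an illustration, the sketch below computes the standard Hubert-Arabie form of the index from a contingency table of two label vectors. The function name `adjusted_rand_index` and the toy label vectors are assumptions made for this example; they are not taken from the paper, and this is not presented as the authors' implementation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two clusterings of the same samples.

    Hypothetical helper for illustration; uses the standard
    Hubert-Arabie chance-corrected form of the Rand index.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same samples")
    n = len(labels_a)

    # Contingency counts: samples falling in cluster i of A and cluster j of B.
    pair_counts = Counter(zip(labels_a, labels_b))
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)

    # Number of sample pairs placed together in both, in A, and in B.
    together_both = sum(comb(c, 2) for c in pair_counts.values())
    together_a = sum(comb(c, 2) for c in counts_a.values())
    together_b = sum(comb(c, 2) for c in counts_b.values())

    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one sample: the clusterings trivially agree

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both clusterings are a single cluster
    return (together_both - expected) / (maximum - expected)


# Toy label vectors (made up for this example): the same partition with
# relabelled clusters, and a partition that cuts across the first one.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_partition, crossed_partition)
```

The index depends only on which samples are grouped together, not on the cluster names, so relabelled but identical partitions score 1, while partitions that agree less than chance would predict score below 0.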