Building a Statistical Model for Predicting Cancer Genes
Ivan P. Gorlov1*, Christopher J. Logothetis1, Shenying Fang2, Olga Y. Gorlova3, Christopher Amos4
1 Department of Genitourinary Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
2 Department of Cancer Genetics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
3 Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
4 Department of Community and Family Medicine, Geisel School of Medicine, Dartmouth College, Lebanon, New Hampshire, United States of America
Abstract
More than 400 cancer genes have been identified in the human genome. The list is not yet complete. Statistical models
predicting cancer genes may help with identification of novel cancer gene candidates. We used known prostate cancer
(PCa) genes (identified through KnowledgeNet) as a training set to build a binary logistic regression model identifying PCa
genes. Internal and external validation of the model was conducted using a validation set (also from KnowledgeNet),
permutations, and external data on genes with recurrent prostate tumor mutations. We evaluated a set of 33 gene
characteristics as predictors. Sixteen of the original 33 predictors were significant in the model. We found that a typical PCa
gene is a prostate-specific transcription factor, kinase, or phosphatase with high interindividual variance of the expression
level in adjacent normal prostate tissue and differential expression between normal prostate tissue and primary tumor. PCa
genes are likely to have an antiapoptotic effect and to play a rol
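
The workflow summarized in the abstract (fit a binary logistic regression on a training set of known genes, score a held-out validation set, and compare against label permutations) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the data are synthetic, the two predictors are hypothetical stand-ins for the 33 gene characteristics, the model is fitted by plain gradient descent, and AUC is used as the performance measure. None of this is taken from the authors' implementation.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit binary logistic regression by batch gradient descent.

    Returns [intercept, w1, ..., wp]. Gradient descent is an illustrative
    choice, not necessarily the fitting procedure used in the paper.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def predict(w, X):
    # Predicted probability of being a "cancer gene" for each row of X.
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def make_toy_genes(n, rng):
    """Synthetic genes with two hypothetical predictors, e.g. expression
    variance in normal tissue and tumor-vs-normal expression difference."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        X.append([rng.gauss(1.0 * label, 1.0), rng.gauss(0.8 * label, 1.0)])
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_toy_genes(300, rng)   # stands in for the training set
X_valid, y_valid = make_toy_genes(150, rng)   # stands in for the validation set

weights = fit_logistic(X_train, y_train)
valid_auc = auc(predict(weights, X_valid), y_valid)

# Permutation check: refit on shuffled training labels; a model that has
# learned real signal should outperform these label-shuffled refits.
perm_aucs = []
for _ in range(10):
    y_perm = y_train[:]
    rng.shuffle(y_perm)
    perm_w = fit_logistic(X_train, y_perm)
    perm_aucs.append(auc(predict(perm_w, X_valid), y_valid))
mean_perm_auc = sum(perm_aucs) / len(perm_aucs)

print("validation AUC:", round(valid_auc, 3))
print("mean permuted AUC:", round(mean_perm_auc, 3))
```

In a real analysis each row would be a gene, the label would mark membership in the known PCa gene set, and the columns would be the measured gene characteristics. The permutation loop gives a baseline for how well a model performs when the gene labels carry no information.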