
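Pretreatment Method Based on Transductive Learning for Protein Subcellular Localization Prediction (基于直推学习的蛋白质亚细胞定位预测预处理方法.doc)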

Published: 2017-09-10; approx. 11,200 characters; 7 pages
Pretreatment Method Based on Transductive Learning for Protein Subcellular Localization Prediction

CAO Junzhe, GU Hong
(School of Control Science and Engineering, Dalian University of Technology, Dalian 116023)

Abstract: This paper proposes a new pretreatment method for protein subcellular localization prediction. The method identifies in advance whether a query protein is singleplex (resides in a single subcellular location) or multiplex (resides in several locations). It is based on transductive learning and uses information from both the query proteins and the known proteins to estimate how many subcellular locations each query protein has. This estimate lets singleplex and multiplex proteins be recognized and separated. Each query protein is then handled by the corresponding single-label or multi-label predictor to obtain high-accuracy predictions. The method was evaluated on three groups of protein sequence datasets. Simulation experiments show that it effectively distinguishes singleplex from multiplex proteins and improves the overall accuracy of protein subcellular localization prediction.

Key words: control theory and control engineering; bioinformatics; protein subcellular localization; singleplex protein; multiplex protein; transductive learning

CLC number: TP3-05

0 Introduction
Protein subcellular localization prediction is an important research topic in bioinformatics. Many methods based on computational techniques and protein sequence information have been proposed for predicting the subcellular locations of proteins [1]. Nearly all related studies rest on the traditional biological view that each protein exists in only one subcellular location; such proteins are called singleplex proteins. However, recent research [2] has shown that many proteins with special biological functions can exist in several subcellular locations at the same time; such proteins are called multiplex proteins.
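The abstract describes a two-stage pipeline. The first stage transductively estimates each query protein's number of locations. The second stage routes the query to a single-label or multi-label predictor. This excerpt does not give the authors' actual algorithm.

The Python sketch below illustrates the general idea under explicit assumptions. In place of the authors' method, it uses standard harmonic-function label propagation, a well-known transductive technique, not necessarily the one used in the paper. The propagation runs on a k-nearest-neighbour graph built jointly over known and query proteins. Known proteins are clamped to their true location count, and the smoothed counts spread to the query proteins. Queries whose estimate falls below a threshold go to the single-label branch; the rest go to the multi-label branch. The following are all hypothetical: the 2-D feature vectors, `k`, `sigma`, the 1.5 threshold, the function names, and the 1-NN placeholder predictors.

```python
import math
from typing import Callable, Iterable, List, Sequence, Set

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_graph(points: List[Vector], k: int, sigma: float) -> List[List[float]]:
    """Symmetric k-nearest-neighbour graph with Gaussian (RBF) edge weights."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ranked = sorted((sq_dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d2, j in ranked[:k]:
            w[i][j] = w[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def estimate_location_counts(known_x: List[Vector], known_labels: List[Set[str]],
                             query_x: List[Vector], k: int = 2, sigma: float = 1.0,
                             max_iter: int = 500, tol: float = 1e-9) -> List[float]:
    """Transductively estimate the number of locations of each query protein.

    Known and query proteins share one graph (hence transductive). Known nodes
    stay clamped to their true location count; each query node is repeatedly
    replaced by the weighted mean of its neighbours (harmonic-function label
    propagation, used here as an assumed stand-in for the paper's method).
    """
    points = list(known_x) + list(query_x)
    w = knn_graph(points, k, sigma)
    n_known = len(known_x)
    counts = [float(len(labels)) for labels in known_labels]
    start = sum(counts) / len(counts)  # neutral starting value for queries
    f = counts + [start] * len(query_x)
    for _ in range(max_iter):
        change = 0.0
        for i in range(n_known, len(points)):  # only query nodes are updated
            total = sum(w[i])
            if total == 0.0:
                continue  # isolated query: keep the neutral starting value
            new = sum(wij * fj for wij, fj in zip(w[i], f)) / total
            change = max(change, abs(new - f[i]))
            f[i] = new
        if change < tol:
            break
    return f[n_known:]


def route_and_predict(query_x: List[Vector], estimates: List[float],
                      single_label: Callable[[Vector], str],
                      multi_label: Callable[[Vector], Iterable[str]],
                      threshold: float = 1.5) -> List[Set[str]]:
    """Send each query to the single- or multi-label predictor by its estimate."""
    predictions = []
    for x, est in zip(query_x, estimates):
        if est < threshold:
            predictions.append({single_label(x)})    # singleplex: one location
        else:
            predictions.append(set(multi_label(x)))  # multiplex: a set of locations
    return predictions


if __name__ == "__main__":
    # Hypothetical toy features; real systems would use sequence-derived vectors.
    known_x = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
    known_labels = [{"cytoplasm"}] * 3 + [{"cytoplasm", "nucleus"}] * 3
    query_x = [(0.15, 0.15), (5.1, 5.1)]

    def nearest(x: Vector) -> int:
        return min(range(len(known_x)), key=lambda i: sq_dist(x, known_x[i]))

    def single(x: Vector) -> str:  # placeholder single-label predictor (1-NN)
        return sorted(known_labels[nearest(x)])[0]

    def multi(x: Vector) -> Set[str]:  # placeholder multi-label predictor (1-NN)
        return known_labels[nearest(x)]

    est = estimate_location_counts(known_x, known_labels, query_x)
    print(est)
    print(route_and_predict(query_x, est, single, multi))
```

In the toy data, the first query sits among singleplex proteins and the second among multiplex ones. The first is therefore routed to the single-label branch and the second to the multi-label branch. Because the graph contains query nodes as well as known ones, neighbouring queries can also influence each other's estimates. This joint use of query information is what distinguishes a transductive estimate from fitting a classifier on the known proteins alone.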