Improved SMOTE Algorithm for Imbalanced Datasets
ISSN 1673-9418, CODEN JKYTA8
Journal of Frontiers of Computer Science and Technology
1673-9418/2014/08(06)-0727-08
doi: 10.3778/j.issn.1673-9418.1403003
E-mail: fcst@ Tel: +86-10

Improved SMOTE Algorithm for Imbalanced Datasets

WANG Chaoxue (王超学)1, ZHANG Tao (张涛)1+, MA Chunsen (马春森)2
1. School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China
2. Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
+ Corresponding author: E-mail: 200445043@163.com

WANG Chaoxue, ZHANG Tao, MA Chunsen. Improved SMOTE algorithm for imbalanced datasets. Journal of Frontiers of Computer Science and Technology, 2014, 8(6): 727-734.

Abstract: Based on an analysis of the shortcomings of SMOTE (synthetic minority over-sampling technique) in synthesizing minority class samples, this paper presents an improved SMOTE algorithm, GA-SMOTE. The key to GA-SMOTE lies in introducing the three basic genetic operators of the genetic algorithm (GA) into SMOTE: the selection operator is used to select different samples from the minority class, and the crossover and mutation operators are used to achieve fine control over the quality of the synthesized minority class samples. GA-SMOTE and SVM (support vector machine) are combined to handle the classification problem on imbalanced datasets. A large number of experiments on UCI datasets show that GA-SMOTE achieves a prominent synthesis effect for minority class samples and, together with SVM, yields better classification performance on imbalanced datasets.

Keywords: imbalanced dataset; classification; genetic operator; synthetic minority over-sampling technique (SMOTE)

Abstract (translated from the Chinese version): To address the shortcomings of SMOTE (synthetic minority over-sampling technique) in synthesizing new minority class samples, an improved SMOTE algorithm, GA-SMOTE, is proposed. The key to the algorithm is introducing the three basic operators of the genetic algorithm into SMOTE, using [truncated]

* The National Natural Science Foundation of China under Grant No. [missing]; the Natural Science Foundation of Shaanxi Province of China under Grant No. 2012JM8023; the Natural Science Fund of Education Department of Shaanxi Province under Grant No. 12JK0726.
Received 2014-02, Accepted 2014-04.
CNKI online-first publication: 2014-04-24, /kcms/doi/10.37
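
The abstract presents GA-SMOTE as a refinement of SMOTE, so it may help to see the baseline it starts from. The sketch below shows only classic SMOTE interpolation: a new sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbors. It is not GA-SMOTE; the selection, crossover and mutation operators are not specified in this preview and are not reproduced here. The function name `smote_baseline` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np

def smote_baseline(X_min, n_new, k=5, seed=0):
    """Classic SMOTE interpolation over minority samples (baseline only, not GA-SMOTE)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # a sample cannot have more than n - 1 neighbors

    # Pairwise Euclidean distances among minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Indices of the k nearest minority neighbors of each sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbors for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]

    # Interpolate: x_new = x + gap * (x_neighbor - x), with gap drawn from [0, 1).
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])
```

Because each synthetic point lies between two existing minority samples, the output stays inside the convex hull of the minority class. The uniform random choice of base sample and neighbor in this baseline is the part the abstract says GA-SMOTE replaces with genetic operators.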