文档详情

基于贝叶斯算法分类的反垃圾邮件系统的改进-学士学位论文.doc

发布:2017-08-13约3.88万字共46页下载文档
文本预览下载声明
学位论文题目:基于贝叶斯算法分类的反垃圾邮件系统的改进 摘  要 电子邮件成为一种快捷、经济的现代通信技术手段,极大地方便了人们的通信与交流。然而,垃圾邮件的产生,影响了正常的电子邮件通信,占用了传输带宽,对系统安全造成了严重的威胁。因此,研究反垃圾邮件问题已经成为全球性的具有重大现实意义的课题。 目前,应对垃圾邮件的主要方法和手段是通过反垃圾邮件立法和使用邮件过滤技术进行处理,现已相继出现了多种邮件过滤技术。常用的包括黑/白名单技术、基于内容的分析方法以及基于规则的方法等。基于内容分析的技术正逐步进入邮件过滤技术当中,并成为当前研究热点,其中,基于内容分析的邮件过滤方法中的典型方法是基于贝叶斯算法的垃圾邮件过滤模型。 本论文对中文垃圾邮件的特点进行了比较系统的分析和研究,结合贝叶斯(Bayes)(CERNET) 95.8%和 %。结果表明基于贝叶斯算法的垃圾邮件过滤系统对拦截垃圾邮件有很好的作用。 关键词:电子邮件,垃圾邮件,邮件过滤,贝叶斯理论 Abstract The e-mail has become a quick and economical means of modern communication technology, which enormously facilitates peoples communication and exchanges. However, the emergence of spam has affected the normal email correspondence, and taken the transmission band width, even posed the serious threat to the system safety. Therefore, the study of anti-spam has become a global problem of great practical significance of the topic. At present, the main ways and means of the response to spam are the anti-spam legislation and the use of mail filtering technology. But now a variety of mail filtering technologies have appeared in succession, which are usually used including black / white list technologies, content-based analysis methods, and rule-based methods. Content-based analysis techniques are gradually entering the mail filtering technology which has become hot spots of current research. The typical method of content-based analysis mail filtering methods is based on Bayesian algorithm for spam filtering model. In this paper, the Chinese characteristics of spam has been studied and analyzed systematically. Combining with Bayesian (Bayes) theory, this paper constructs the spam filtering model which is based on Bayesian classification. In feature extraction, mutual information values are used. In the classification method, a classification method is introduced which is suitable in this article, and a more suitable expression in the Bayesian calculation method is adopted; the standard sample data sets of a large number of Chinese spam and r
显示全部
相似文档