Design and Implementation of a Web Search Engine
Abstract
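With the rapid development of the Internet, the network has become an extremely important source of information. More and more people obtain the information they need online, which has made general-purpose search engines such as Google [40] and Baidu [39] indispensable tools for finding information.
Building on an in-depth study of the basic principles, architecture, and core technologies of general-purpose search engines, and taking into account the requirements of a small search engine, this thesis draws on the principles of search engines such as Tianwang and Lucene to build a small search engine system that runs stably, performs well, and is extensible. The thesis covers not only the design of the whole system but also all of the coding work.
This thesis discusses the background of search engine development and the history and trends of search engines, analyzes the requirements of a small search engine, and proposes solutions to the problems encountered during system development, each carried through detailed design and implementation. The main work and contributions are as follows: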
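1. Based on a thorough understanding of how web crawlers work, the crawler component is implemented using a database.
2. Based on a thorough understanding of Chinese word segmentation, the Lucene segmentation algorithm is improved and a new algorithm is designed from it. The improved algorithm is implemented and tested for accuracy and efficiency; the tests show that efficiency does improve.
3. Based on an understanding of ranking and indexing, the structure of the indexing and ranking component is designed, detailed flowcharts and the code are completed, and the finished code is tested.
4. After the search component was designed, its efficiency still fell short of the system's requirements. To improve search efficiency, a two-level caching scheme is adopted: caching search result pages, and caching results for frequently searched terms.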
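The two-level caching idea in item 4 can be illustrated with a minimal sketch. The abstract does not describe the actual implementation, so everything below is an assumption made for illustration: the class names `LRUCache` and `TwoLevelSearchCache`, the `search_index` callable, the LRU eviction policy, the `hot_threshold` rule for deciding when a term counts as frequently searched, and the restriction to single-term queries. Level 1 holds result pages keyed by query and page number; level 2 holds the full result list for terms that have been requested often.

```python
from collections import Counter, OrderedDict


class LRUCache:
    """A small least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


class TwoLevelSearchCache:
    """Hypothetical two-level cache placed in front of a search index."""

    def __init__(self, search_index, page_capacity=100, term_capacity=1000,
                 hot_threshold=3, page_size=10):
        # Assumption: search_index is a callable mapping a term to a list of results.
        self.search_index = search_index
        self.pages = LRUCache(page_capacity)  # level 1: (query, page) -> result page
        self.terms = LRUCache(term_capacity)  # level 2: term -> full result list
        self.requests = Counter()             # term-level request counts
        self.hot_threshold = hot_threshold
        self.page_size = page_size

    def _results_for(self, term):
        self.requests[term] += 1
        cached = self.terms.get(term)
        if cached is not None:
            return cached
        results = self.search_index(term)
        # Only terms requested often enough are kept in the level-2 cache.
        if self.requests[term] >= self.hot_threshold:
            self.terms.put(term, results)
        return results

    def search(self, query, page=0):
        key = (query, page)
        cached = self.pages.get(key)
        if cached is not None:
            return cached  # level-1 hit: the page was served before
        results = self._results_for(query)
        start = page * self.page_size
        result_page = results[start:start + self.page_size]
        self.pages.put(key, result_page)
        return result_page
```

In this sketch a repeated request for the same page never reaches the index, and once a term passes the threshold, requests for its other pages are served from the cached result list. A real system would also need to invalidate both levels when the index is updated, which is omitted here.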
Keywords: search engine, web crawler, Chinese word segmentation, ranking and indexing
ABSTRACT
With the rapid development of the network, it has become a vital source of information. More and more people obtain the information they need from the network, which has made web search engines an essential tool for finding information on the Internet.
In this paper, based on an in-depth study of the basic principles, architecture design, and core technologies of general search engines, and combining the needs of a small search engine with the principles of the Tianwang and Lucene search engines, I build a small-scale search engine system that is stable, performs well, and can be extended. This article not only completes the design of the entire system but also basically completes all the coding work.
This article describes the background of search engines as well as their history and development trends, analyzes the needs of small search engines, gives solutions to the problems found during the development of the system, and carries these solutions through detailed design and implementation.
The main work and innovations of this thesis are as follows:
1. With a deep understanding of the working principle of the network spider, I implemented the spider using a database system.
2. With a deep understanding of Chinese word segmentation and the segmentation algorithm of the Lucene system, I m