Yancheng Teachers University Graduation Project
Research and Implementation of a News Vertical Search Engine Based on Lucene and Heritrix
Abstract
Since the start of the Web 2.0 era, the volume of information on the Internet has grown geometrically, making search engines the preferred way for most Internet users to find and browse online information quickly. The most influential and widely used search engines today include Google, Baidu, and Yahoo, which serve as navigation hubs for links. These general-purpose search engines have limitations, however: because they cover so much information, their search depth is insufficient and their query results are not accurate enough. Vertical search engines emerged against this background. This thesis studies and analyzes vertical search engines and the emerging technologies related to them. The main research content is as follows: 1. It discusses the research background and practical applications of vertical search engines. 2. It studies the technologies underlying search engines in some depth. 3. It describes the basic principles and usage of Lucene and Heritrix. 4. It integrates Lucene and Heritrix with Web technologies to implement a vertical search engine system for the news domain.
[Key words] Lucene, web crawler, vertical search engine, Chinese word segmentation
Contents
1 Introduction
1.1 Research Background and Application Prospects
1.2 Main Work of This Thesis
1.3 Structure of the Thesis