


Spidering and Filtering Web Pages for Vertical Search Engines

Michael Chau
The University of Arizona
mchau@

1 Introduction

The size of the Web is growing exponentially. The number of indexable pages on the Web has exceeded 2 billion (Lyman & Varian, 2000). It is more difficult for search engines to keep an up-to-date and comprehensive search index, resulting in low precision and low recall rates. Users often find it difficult to search for useful and high-quality information on the Web.

Domain-specific search engines (or vertical search engines) alleviate the problem to some extent, by allowing users to perform searches in a particular domain and providing customized features. However, it is not easy to build these search engines. There are two major challenges to building vertical search engines: (1) Locating relevant documents from the Web; (2) Filtering irrelevant documents from a collection. This study tries to address these issues and propose new approaches to the problems.

2 Research Background

2.1 Creating Vertical Search Engines

Web search engines such as Google and AltaVista provide the most popular way to look for information on the Web. Many users begin their Web activities by submitting a query to a search engine. However, because of the enormous size of the Web, general-purpose search engines can no longer satisfy the needs of most users searching for specific information on a given topic. Many vertical search engines, or domain-specific search engines, have been built to facilitate more efficient search in various domains. These sea
