Chau – Spidering and Filtering Web Pages for Vertical Search Engines
Spidering and Filtering Web Pages
for Vertical Search Engines
Michael Chau
The University of Arizona
mchau@
1 Introduction
The size of the Web is growing exponentially. The number of indexable pages on the Web has exceeded 2 billion (Lyman &
Varian, 2000). It has become increasingly difficult for search engines to maintain an up-to-date and comprehensive search
index, resulting in low precision and low recall rates. Users often find it difficult to search for useful and high-quality
information on the Web.
Domain-specific search engines (or vertical search engines) alleviate the problem to some extent, by allowing users to
perform searches in a particular domain and providing customized features. However, it is not easy to build these search
engines. There are two major challenges to building vertical search engines: (1) Locating relevant documents from the Web;
(2) Filtering irrelevant documents from a collection. This study tries to address these issues and proposes new approaches to
the problems.
2 Research Background
2.1 Creating Vertical Search Engines
Web search engines such as Google and AltaVista provide the most popular way to look for information on the Web. Many
users begin their Web activities by submitting a query to a search engine. However, because of the enormous size of the Web,
general-purpose search engines can no longer satisfy the needs of most users searching for specific information on a given
topic. Many vertical search engines, or domain-specific search engines, have been built to facilitate more efficient search in
various domains. These sea