To search and summarize on Internet
with Human Language Technology
Hercules DALIANIS
Department of Computer and System Sciences
KTH and Stockholm University, Forum 100, 164 40 Kista, Sweden
Email: hercules@kth.se
Abstract. More and more text is available on the Internet, and we need tools to tame
this flow. Automatic text summarization is one solution: a text is given to the computer,
which returns a shorter, non-redundant text. Automatic text summarization can also be
used in search engines to reduce the time spent finding documents. To further improve search
engines, one can use human language technology in the form of word analysis, such as stemming
and spell checking. Other applicable methods are multilingual or cross-language
information retrieval, for searching and finding documents written in languages other
than those one knows. To understand foreign languages, one can
use machine translation techniques, which have today become good enough for practical
use. Machine translation (MT) is the technique in which the computer translates
automatically between natural languages. MT techniques have been under development
since the early 1950s.
1. Introduction
The rapid change of our environment, in the form of more and more information available on the
Internet, has increased the speed of development of highly advanced tools to extract, filter,
retrieve and translate documents. Three research areas addressing this are automatic text
summarization, information retrieval tools and machine translation.
In automatic text summarization, the most relevant parts of a document are extracted
and put together into a non-redundant summary that is shorter than the original document. A
good overview of the area can be found in [1]. A more advanced form of summarization is
multi-text summarization where several documents are condensed into one summary.
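The extraction idea described above can be illustrated with a minimal sketch. It is not the method of any particular system; it assumes that a sentence's relevance can be approximated by the average frequency of its content words, and that redundancy can be approximated by word overlap (Jaccard similarity) between sentences. The names `STOP_WORDS`, `split_sentences`, `tokenize` and `summarize`, the tiny stop-word list and the `max_overlap` threshold are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "are",
              "it", "on", "for", "that", "this", "with"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOP_WORDS]


def summarize(text, max_sentences=2, max_overlap=0.5):
    """Pick the highest-scoring sentences, skipping near-duplicates.

    Score = average document-wide frequency of the sentence's content words.
    A candidate is skipped when its Jaccard word overlap with an already
    chosen sentence exceeds max_overlap.
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    scored = []
    for idx, sentence in enumerate(sentences):
        words = tokenize(sentence)
        if words:
            score = sum(freq[w] for w in words) / len(words)
            scored.append((score, idx, set(words)))

    # Highest score first; ties broken by original position.
    scored.sort(key=lambda item: (-item[0], item[1]))

    chosen = []  # list of (original index, word set)
    for _score, idx, words in scored:
        if len(chosen) == max_sentences:
            break
        redundant = any(
            len(words & other) / len(words | other) > max_overlap
            for _, other in chosen
        )
        if not redundant:
            chosen.append((idx, words))

    # Restore original document order so the summary reads naturally.
    chosen.sort(key=lambda item: item[0])
    return " ".join(sentences[idx] for idx, _ in chosen)
```

Calling `summarize(document_text, max_sentences=3)` on a plain-text string returns up to three extracted sentences in their original order. A practical system would use a proper sentence splitter, a full stop-word list and richer relevance cues than word frequency alone.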
2. Application areas of automatic text summarization
The application areas for automatic text summarization are extensive. As the amount of
information on the