Automatic Summarization of
English Broadcast News Speech
Chiori Hori†, Sadaoki Furui†, Rob Malkin‡, Hua Yu‡ and Alex Waibel‡
†Department of Computer Science, Tokyo Institute of Technology,
2-12-1, O-okayama, Meguro-ku, Tokyo, 152-8552 Japan
{chiori,furui}@furui.cs.titech.ac.jp
‡Interactive Systems Labs, Carnegie Mellon University, Pittsburgh, PA 15213, USA
{malkin,hua,ahw}@
ABSTRACT
This paper proposes an automatic speech summarization technique
for English. In the proposed method, the set of words that maximizes
a summarization score, which indicates how appropriate the summary is,
is extracted from automatically transcribed speech and concatenated
to create a summary. The extraction is performed using a Dynamic
Programming (DP) technique according to a target compression ratio.
In this paper, English broadcast news speech transcribed by a speech
recognizer is automatically summarized.
To apply our method, originally proposed for Japanese, to English,
we modify the model that estimates word concatenation probabilities
from the dependency structure of the original speech given by a
Stochastic Dependency Context Free Grammar (SDCFG). A summarization
method for multiple utterances using a two-level DP technique is
also proposed. The automatically summarized sentences are evaluated
by summarization accuracy, based on comparison with manual
summaries of the correctly transcribed speech produced by human
subjects. Experimental results show
that our proposed method effectively extracts relatively important
information and removes redundant and irrelevant information from
English news speech.
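
The word-extraction step described in this abstract can be illustrated with a minimal sketch. It assumes the summarization score decomposes into a per-word term and a term for each pair of words that become adjacent in the summary; the abstract does not state the exact decomposition, so the names `summarize`, `word_score` and `link_score` are hypothetical stand-ins and not the authors' implementation. The SDCFG-based concatenation model and the two-level DP for multiple utterances are not reproduced here.

```python
from typing import Callable, List, Sequence


def summarize(
    words: Sequence[str],
    word_score: Sequence[float],             # hypothetical per-word importance scores
    link_score: Callable[[int, int], float], # hypothetical score for joining word i to word j
    ratio: float,                            # target compression ratio, 0 < ratio <= 1
) -> List[str]:
    """Pick an in-order subsequence of words that maximizes the total score.

    Assumes all scores are finite. The summary length m is derived from the
    target compression ratio.
    """
    n = len(words)
    if n == 0:
        return []
    m = min(n, max(1, round(n * ratio)))
    neg_inf = float("-inf")

    # best[k][j]: best score of a k-word summary whose last word is words[j]
    best = [[neg_inf] * n for _ in range(m + 1)]
    back = [[-1] * n for _ in range(m + 1)]
    for j in range(n):
        best[1][j] = word_score[j]

    for k in range(2, m + 1):
        for j in range(k - 1, n):
            for i in range(k - 2, j):
                if best[k - 1][i] == neg_inf:
                    continue
                cand = best[k - 1][i] + link_score(i, j) + word_score[j]
                if cand > best[k][j]:
                    best[k][j] = cand
                    back[k][j] = i

    # Backtrack from the best final word of an m-word summary.
    j = max(range(m - 1, n), key=lambda idx: best[m][idx])
    k = m
    picked = []
    while j != -1:
        picked.append(j)
        j = back[k][j]
        k -= 1
    picked.reverse()
    return [words[i] for i in picked]


# Toy usage with made-up scores: keep 2 of 5 words (ratio 0.4).
toy = summarize(["a", "b", "c", "d", "e"], [1, 5, 1, 4, 1], lambda i, j: 0.0, 0.4)
```

With these toy scores and a zero link score, the two highest-scoring words are kept in their original order. The table has m × n entries and each is filled by scanning earlier positions, so the sketch runs in O(m·n²) time for an n-word utterance.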
Keywords
Speech summarization, Summarization scores, Two-level Dynamic Programming,
Stochastic Dependency Context Free Grammar, Summarization accuracy
1. INTRODUCTION
Recently, large-vocabulary continuous-speech recognition (LVCSR)
technology has made significant advances. Real-time systems
can now achieve word accuracy of 90% and above for speech dictated