Blind Extraction of Dominant Target Sources Using ICA and Time-Frequency Masking.pdf

Posted: 2017-04-08; approx. 37,300 characters; 9 pages
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 6, NOVEMBER 2006 2165

Blind Extraction of Dominant Target Sources Using ICA and Time-Frequency Masking

Hiroshi Sawada, Senior Member, IEEE, Shoko Araki, Member, IEEE, Ryo Mukai, Senior Member, IEEE, and Shoji Makino, Fellow, IEEE

Abstract: This paper presents a method for enhancing target sources of interest and suppressing other interference sources. The target sources are assumed to be close to sensors, to have dominant powers at these sensors, and to have non-Gaussianity. The enhancement is performed blindly, i.e., without knowing the position and active time of each source. We consider a general case where the total number of sources is larger than the number of sensors, and neither the number of target sources nor the total number of sources is known. The method is based on a two-stage process where independent component analysis (ICA) is first employed in each frequency bin and then time-frequency masking is used to improve the performance further. We propose a new sophisticated method for deciding the number of target sources and then selecting their frequency components. We also propose a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail party situations in a room, whose reverberation time was 130 ms, are presented to show the effectiveness and characteristics of the proposed method.

Index Terms: Blind source extraction, blind source separation (BSS), convolutive mixture, frequency domain, independent component analysis, permutation problem, time-frequency masking.

I. INTRODUCTION

The technique for estimating individual source components from their mixtures at sensors is known as blind source separation (BSS) [1]–[4]. With some applications such as brain imaging or wireless communications, it makes sense to extract as many source components as possible, because many sources are equally important. However, with audio applications s