
terms and document representation with generalized latent semantic analysis.pdf

Published: 2017-08-13, about 45,200 words, 8 pages
Terms and Document Representation with Generalized Latent Semantic Analysis

Abstract

Document indexing and representation of term-document relations are very important issues for document clustering and retrieval. In this paper, we present Generalized Latent Semantic Analysis as a framework for computing semantically motivated term and document vectors. Our focus on term vectors is motivated by recent success of co-occurrence based measures of semantic similarity obtained from very large corpora. Our experiments demonstrate that GLSA term vectors efficiently capture semantic relations between terms and outperform related approaches on the synonymy test. We also show that term-

[...] represent orthogonal dimensions which makes an unrealistic assumption about the independence of terms within documents.

Modifications of the representation space, such as representing dimensions with distributional term clusters (Bekkerman et al., 2003) and expanding the document and query vectors with synonyms and related terms as discussed in (Levow et al., 2005), improve the performance on average. However, they also introduce some instability and thus increased variance (Levow et al., 2005). The language modelling approach (Ponte and Croft, 1998; Berger and Lafferty, 1999) used in information retrieval uses bag-of-words document vectors to model document and collection based term distributions.

Since the document vectors are constructed in a