
Modeling documents by combining semantic concepts with unsupervised statistical learning.ppt

Published: 2017-08-13 · about 6,410 words · 24 pages
Modeling Documents by Combining Semantic Concepts with Unsupervised Statistical Learning
Authors: Chaitanya Chemudugunta, America Holloway, Padhraic Smyth, Mark Steyvers
Source: ISWC 2008
Reporter: Yong-Xiang Chen

Abstract
- Problem: mapping an entire document or Web page to concepts in a given ontology
- Proposes a probabilistic modeling framework
  - Based on statistical topic models (also known as LDA models)
- The methodology combines:
  - human-defined concepts
  - data-driven topics
- The methodology can be used to automatically tag Web pages with concepts
  - From a known set of concepts
  - Without any need for labeled documents
  - Maps all words in a document, not just entities

Concepts from ontologies
- An ontology includes:
  - Ontological concepts
  - Associated vocabulary
  - The hierarchical relations between concepts
- Topics from statistical models and concepts from ontologies both represent “focused” sets of words that relate to some abstract notion
- Example: words that relate to “FAMILY”

Statistical Topic Models
- LDA is a state-of-the-art unsupervised learning technique for extracting thematic information
- Topic model
  - Words in a document arise via a two-stage process:
    - Word-topic distribution: p(w|z)
    - Topic-document distribution: p(z|d)
  - Both are learned in a completely unsupervised manner
  - Gibbs sampling assigns a topic to each word in the corpus
- The topic variable z plays the role of a low-dimensional representation of the semantic content of a document

Topic
- A topic takes the form of a multinomial distribution over a vocabulary of words
- A topic can, in a loose sense, be viewed as a probabilistic representation of a semantic concept

How can statistical topic modeling techniques “overlay” probabilities on concepts within an ontology?
- C: the number of human-defined concepts
- Each concept c_j consists of a finite set of N_j unique words
- A corpus of documents such as Web pages
- How can these two sources of information be merged on the basis of a topic model?
- “tagging” documents with concepts from the on
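
The two-stage process from the Statistical Topic Models slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: the vocabulary, the topic names, and every probability below are hypothetical toy values chosen only to show how p(w|z) and p(z|d) combine into p(w|d) = Σ_z p(w|z) p(z|d), and how a word is generated by first drawing a topic and then drawing a word from that topic. The sketch does not perform inference (the slides mention Gibbs sampling for that); it only shows the generative direction.

```python
import random

# Hypothetical toy vocabulary and distributions (not taken from the slides).
vocab = ["mother", "father", "child", "market", "price"]

# p(w|z): one multinomial distribution over the vocabulary per topic.
p_w_given_z = {
    "family": [0.4, 0.3, 0.3, 0.0, 0.0],
    "economy": [0.0, 0.0, 0.1, 0.5, 0.4],
}

# p(z|d): the topic mixture of a single document d.
p_z_given_d = {"family": 0.7, "economy": 0.3}


def word_distribution(p_w_given_z, p_z_given_d):
    """Marginal word distribution: p(w|d) = sum over z of p(w|z) * p(z|d)."""
    n_words = len(next(iter(p_w_given_z.values())))
    return [
        sum(p_w_given_z[z][i] * p_z_given_d[z] for z in p_z_given_d)
        for i in range(n_words)
    ]


def generate_document(n_words, rng):
    """Two-stage process: draw a topic z from p(z|d), then a word from p(w|z)."""
    topics = list(p_z_given_d)
    topic_weights = [p_z_given_d[z] for z in topics]
    words = []
    for _ in range(n_words):
        z = rng.choices(topics, weights=topic_weights)[0]
        w = rng.choices(vocab, weights=p_w_given_z[z])[0]
        words.append((w, z))
    return words


p_w_given_d = word_distribution(p_w_given_z, p_z_given_d)
sample = generate_document(10, random.Random(0))
```

Each entry of `sample` pairs a generated word with the topic that produced it, which mirrors the per-word topic assignments that a sampler would have to infer when only the words are observed.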