
L-ISA Learning Domain Specific ISA relations.ppt

Published: 2017-03-27, about 7,690 characters, 22 pages
L-ISA: Learning Domain Specific ISA Relations from the Web
Alessandra Potrich and Emanuele Pianta
Fondazione Bruno Kessler - IRST, Trento, Italy

Overview
  - Learning ISA relations in the patent processing domain (the PATExpert project)
  - The L-ISA algorithm
  - Evaluation
  - Future work

Ontology Learning/Population
  - Ontology Learning: acquisition of new concepts and of the relations between them
      e.g. a device is an artifact
  - Ontology Population: acquisition of factual knowledge about specific instances
      e.g. Einstein is an instance of a scientist
      e.g. Einstein was born in 1879

PATExpert
  - Funded by the European Union
  - Aim: improving patent retrieval, summarization, paraphrasing, classification and valuing through shallow and deep semantic analysis
  - Main semantic analysis task: recognizing occurrences of KB concepts and relations
  - Proof of concept on two domains
      Optical Recording
      Machine Tools
  - Focus of the presentation: Ontology Learning in the Optical Recording domain

Optical Recording Domain Ontology (ORDO)
  - Based on the OWL formalism
  - Built in three stages
      1. 200 manually crafted concepts, starting from a list of the most frequent terms in a reference corpus
      2. Pro-ISA: ontology learning algorithm based on the projection of WordNet fragments onto ORDO
      3. L-ISA: ontology learning algorithm based on the acquisition of ISA templates from the Web

Patent Concept Annotation
  - Given a target word: disambiguate it by assigning a WordNet (WN) synset whose domain is compatible with the optical recording domain (exploiting WordNet Domains)
  - If the synset is linked to an ORDO concept: annotate the target word with the ORDO concept
  - Otherwise: apply Pro-ISA
  - Otherwise: apply L-ISA

Choosing the right sense
  Senses for the word "CD":
  1. cadmium, Cd, atomic_number_48 (CHEMISTRY)
  2. candle, candela, cd, standard candle (PHYSICS)
  3. certificate of deposit, CD (MONEY)
  4. compact disk, compact disc, CD (COMPUTER, MUSIC)

Direct Concept Annotation

Pro-ISA 1: Looking for
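
The annotation cascade and the sense choice for "CD" described in the slides can be illustrated with a minimal Python sketch. It is not the authors' implementation. The sense list and domain labels are copied from the slide; everything else is an assumption made for illustration: the set of domains treated as compatible with optical recording, the ORDO concept name `CompactDisc`, the function names, and the reading that L-ISA is tried only when Pro-ISA yields nothing. Pro-ISA and L-ISA themselves are passed in as placeholder callables.

```python
from typing import Callable, Optional

# Senses of "CD" with their WordNet Domains labels, as listed on the slide.
CD_SENSES = [
    ("cadmium", {"CHEMISTRY"}),
    ("candela", {"PHYSICS"}),
    ("certificate of deposit", {"MONEY"}),
    ("compact disk", {"COMPUTER", "MUSIC"}),
]

# ASSUMPTION: domain labels treated as compatible with optical recording.
COMPATIBLE_DOMAINS = {"COMPUTER", "MUSIC"}

# ASSUMPTION: toy synset-to-ORDO links; the concept name is hypothetical.
ORDO_LINKS = {"compact disk": "CompactDisc"}


def choose_sense(senses, compatible) -> Optional[str]:
    """Return the first synset that shares a domain label with `compatible`."""
    for synset, domains in senses:
        if domains & compatible:
            return synset
    return None


def annotate(
    word: str,
    senses,
    pro_isa: Callable[[str], Optional[str]],
    l_isa: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Direct annotation first, then Pro-ISA, then L-ISA."""
    synset = choose_sense(senses, COMPATIBLE_DOMAINS)
    # Direct concept annotation: the synset is already linked to ORDO.
    if synset is not None and synset in ORDO_LINKS:
        return ORDO_LINKS[synset]
    # Pro-ISA works from a WordNet synset, so it needs a chosen sense.
    if synset is not None:
        concept = pro_isa(synset)
        if concept is not None:
            return concept
    # Last resort: L-ISA, which works from the word itself.
    return l_isa(word)


if __name__ == "__main__":
    no_result = lambda _: None  # placeholder learners that find nothing
    print(annotate("CD", CD_SENSES, no_result, no_result))
```

In this sketch "CD" resolves to the compact disk sense because that is the only sense whose labels overlap the assumed compatible set, and the word is then annotated directly without invoking either learner.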