中文分词应用研究现状.ppt (Current State of Research on Chinese Word Segmentation Applications)

Published: 2016-11-11, 54 pages in total
* Bakeoff 2007 – France Telecom R&D Beijing (法国电信北京研发中心)

Problems of NER with only local information

"Many empirical approaches…make decision only on local context for extract inference, which is based on the data independent assumption. But often this assumption does not hold because non-local dependencies are prevalent in natural language."

Observations from experiments:
  Many seen named entities are missed.
  At least 10% of the unseen and missed named entities were labeled correctly at least once.

"If the context surrounding one occurrence of a token sequence is very indicative of it being an entity, then this should also influence the labeling of another occurrence of the same token sequence in a different context that is not indicative of entity."

Local Features

  Unigram: Cn (n = -2, -1, 0, 1, 2)
  Bigram: CnCn+1 (n = -2, -1, 0, 1) and C-1C1

0/1 Features

Assign 1 to every character labeled as an entity and 0 to every character labeled as NONE in the training data. This greatly alleviates the class imbalance. Taking the Bakeoff 2006 MSRA NER training data as an example, labeling the corpus with 10 classes gives the following class distribution (in %):

  B-PER 0.81, B-LOC 1.70, B-ORG 0.95
  I-PER 0.81, I-LOC 0.88, I-ORG 2.87
  E-PER 0.76, E-LOC 1.42, E-ORG 0.94
  NONE 88.86

Changing the label scheme to 2 labels (0/1) gives:

  entity 11.14, NONE 88.86

Non-local Features

Token-position features (NF1)
These refer to the position information (start, middle and last) assigned to a token sequence that exactly matches an entry in the entity list. These features enable us to capture the dependencies between the identical candidate entities and their boundaries.

Entity-majority features (NF2)
These refer to the majority label assigned to a token sequence that exactly matches an entry in the entity list. These features enable us to capture the dependencies between the identical e
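
The local feature templates and the 0/1 relabeling described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the `<PAD>` symbol used for positions outside the sentence, and the feature key format are all assumptions introduced here, and the slides do not say how sentence boundaries were handled.

```python
from collections import Counter

PAD = "<PAD>"  # hypothetical padding symbol; the slides do not specify one


def char_at(chars, i):
    """Return the character at index i, or PAD when i is outside the sentence."""
    return chars[i] if 0 <= i < len(chars) else PAD


def local_features(chars, i):
    """Build the unigram and bigram templates listed on the slide for position i."""
    feats = {}
    # Unigrams: Cn for n = -2..2
    for n in range(-2, 3):
        feats[f"C{n}"] = char_at(chars, i + n)
    # Adjacent bigrams: CnCn+1 for n = -2..1
    for n in range(-2, 2):
        feats[f"C{n}C{n + 1}"] = char_at(chars, i + n) + char_at(chars, i + n + 1)
    # Skip bigram: C-1C1
    feats["C-1C1"] = char_at(chars, i - 1) + char_at(chars, i + 1)
    return feats


def to_binary(tags):
    """Collapse the 10-class tags to 0/1: NONE becomes 0, any entity tag becomes 1."""
    return [0 if tag == "NONE" else 1 for tag in tags]


def distribution(labels):
    """Return the share of each label in percent, rounded to two decimals."""
    counts = Counter(labels)
    total = len(labels)
    return {label: round(100.0 * count / total, 2) for label, count in counts.items()}


# Toy example (made-up sentence, not taken from the MSRA corpus).
chars = list("张三在北京")
tags = ["B-PER", "E-PER", "NONE", "B-LOC", "E-LOC"]
features_at_start = local_features(chars, 0)
binary_tags = to_binary(tags)
binary_share = distribution(binary_tags)
```

Each position yields ten features (five unigrams, four adjacent bigrams, one skip bigram). Running `distribution` on the original tags and on `to_binary(tags)` is a quick way to see how the 0/1 scheme merges the nine sparse entity classes into a single larger one.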