An Evaluation Exercise for Word Alignment

Rada Mihalcea
Department of Computer Science
University of North Texas
Denton, TX 76203
rada@

Ted Pedersen
Department of Computer Science
University of Minnesota
Duluth, MN 55812
tpederse@

Abstract

This paper presents the task definition, resources, participating systems, and comparative results for the shared task on word alignment, which was organized as part of the HLT/NAACL 2003 Workshop on Building and Using Parallel Texts. The shared task included Romanian-English and English-French sub-tasks, and drew the participation of seven teams from around the world.

1 Defining a Word Alignment Shared Task

The task of word alignment consists of finding correspondences between words and phrases in parallel texts. Assuming a sentence-aligned bilingual corpus in languages L1 and L2, the task of a word alignment system is to indicate which word token in the corpus of language L1 corresponds to which word token in the corpus of language L2.

As part of the HLT/NAACL 2003 workshop on "Building and Using Parallel Texts: Data Driven Machine Translation and Beyond", we organized a shared task on word alignment, where participating teams were provided with training and test data, consisting of sentence-aligned parallel texts, and were asked to provide automatically derived word alignments for all the words in the test set. Data for two language pairs were provided: (1) English-French, representing languages with rich resources (20 million word parallel texts), and (2) Romanian-English, representing languages with scarce resources (1 million word parallel texts). Similar to the Machine Translation evaluation exercise organized by NIST¹, two subtasks were defined, with teams being encouraged to participate in both subtasks.

¹ /speech/tests/mt/

1. Limited resources, where systems are allowed to use only the resources provided.
2. Unlimited resources, where systems are allowed to use any resources in addition to th