Table Structure Analysis Method Based on Graph Theory and Pattern Recognition (thesis, Computer Application Technology)
Table Structure Analysis Based on Graph Theory and Pattern Recognition
Written by: He Yan (Computer Application)
Directed by: Cui Zhe
ABSTRACT
Over the last three decades, research on document image analysis has produced many successful methods for character recognition and page segmentation of text-based documents. However, most of these methods were not designed to handle documents containing complex objects such as tables.
Table structure analysis (TSA) is comparatively difficult among the branches of document recognition, mainly because of the complexity of table images and the diversity of table content. TSA has been under research since 1981, and some practical and remarkable solutions now exist. However, few, if any, of these solutions take time complexity into account. They are designed for a small number of table forms and are therefore unsuited to processing forms in large volumes.
To recognize votes, which are classified as a specific type of document, CICA (Chengdu Institute of Computer Application, Chinese Academy of Sciences) has devised a new vote system. Many national and regional congresses have witnessed its high efficiency and precision.
On the basis of the vote system, we set out to build a form-reading system that can recognize various forms, including votes. Moreover, it must be able to process a large number of forms within an acceptable time. These aims lead us to focus on robustness, precision, and, last but not least, real-time processing.
I am honored to have taken part in the research on the form-reading system, and to have finally solved the TSA problem with a method based on graph theory and pattern recognition. This method is the main content of this dissertation.
Keywords: Table Recognition, Form Reading, Layout Analysis, Structure Analysis, Pattern Recognition
Chapter 1 Introduction
1.1 Document Recognition and Table Recognition
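In the mid-1970s, research on document recognition was just beginning. Attention at the time focused mainly on all-text documents, and on how to segment the page layout and recognize characters. Results from that period include layout segmentation based on the whitespace gaps in a document and, based on the contour lines of characters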