文档详情

Text compression using recency rank with context and relation to context sorting, block sor.pdf

发布:2017-04-08约4.09万字共10页下载文档
文本预览下载声明
Text Compression using Recency Rank with Context and Relation to Context Sorting, Block Sorting and PPM* Kunihiko Sadakane  Department of Information Science, University of Tokyo Abstract Recently block sorting compression scheme was devel- oped and relation to statistical scheme was studied, but theoretical analysis of performance has not been studied well. Context sorting is a compression scheme based on context similarity and it is regarded as an on- line version of the block sorting and it is asymptotically optimal. However, the compression speed is slower and the real performance is not so better. In this paper, we propose a compression scheme us- ing recency rank code with context (RRC), which is based on the context similarity. The proposed method encodes characters to recency ranks according to their contexts. It can be implemented using sux tree and the recency rank code is realized by move-to-front transformation of edges in the sux tree. It is faster than the context sorting and is also asymptotically op- timal. The performance is improved by changing mod- els according to the length of the context and by com- bining some characters into a code. However, it is still inferior to the block sorting in both performance and speed. We investigate the reason of bad performance and we also prove asymptotical optimality of a varia- tion of the block sorting and make relation among the RRC, the context sorting, the block sorting and PPM* clear. 1 Introduction Text data compression scheme can be divided into two schemes: dictionary method and statistical method. Recently a new compression scheme, called block sort- ing [3], was developed by Burrows and Wheeler. It seems not to belong to both methods, but in reality it is related to the statistical method [7, 6]. Compression speed of it is fast and decompression speed is much faster than compression. Moreover, the required mem- ory is small, therefore the block sorting is a promising scheme. The practical performa
显示全部
相似文档