Text compression using recency rank with context and relation to context sorting, block sor.pdf
文本预览下载声明
Text Compression using Recency Rank with Context and Relation to
Context Sorting, Block Sorting and PPM*
Kunihiko Sadakane
Department of Information Science, University of Tokyo
Abstract
Recently block sorting compression scheme was devel-
oped and relation to statistical scheme was studied,
but theoretical analysis of performance has not been
studied well. Context sorting is a compression scheme
based on context similarity and it is regarded as an on-
line version of the block sorting and it is asymptotically
optimal. However, the compression speed is slower and
the real performance is not so better.
In this paper, we propose a compression scheme us-
ing recency rank code with context (RRC), which is
based on the context similarity. The proposed method
encodes characters to recency ranks according to their
contexts. It can be implemented using sux tree
and the recency rank code is realized by move-to-front
transformation of edges in the sux tree. It is faster
than the context sorting and is also asymptotically op-
timal. The performance is improved by changing mod-
els according to the length of the context and by com-
bining some characters into a code. However, it is still
inferior to the block sorting in both performance and
speed. We investigate the reason of bad performance
and we also prove asymptotical optimality of a varia-
tion of the block sorting and make relation among the
RRC, the context sorting, the block sorting and PPM*
clear.
1 Introduction
Text data compression scheme can be divided into two
schemes: dictionary method and statistical method.
Recently a new compression scheme, called block sort-
ing [3], was developed by Burrows and Wheeler. It
seems not to belong to both methods, but in reality it
is related to the statistical method [7, 6]. Compression
speed of it is fast and decompression speed is much
faster than compression. Moreover, the required mem-
ory is small, therefore the block sorting is a promising
scheme. The practical performa
显示全部