We propose to design and analyze algorithms for data compression problems. We use results springing from new analytical and probabilistic analyses of digital trees and their generalizations, approximate pattern matching, and algorithms on strings. The problems we investigate range from the off-line Lempel-Ziv compression scheme, lossy extensions of the Lempel-Ziv algorithm, to the shortest superstring problem (lossless and lossy versions). Our main goal is to explore second-order properties (e.g., variances, limiting distributions, large deviations, etc.) of several existing and new data compression schemes in order to obtain more refined information about typical behavior of these schemes. In deriving our results we use a technique that belongs to the toolkit of the "analytical analysis of algorithms" which is novel in the context of data compression. We have already applied it to solve some open problems in the area.
CS Annual Report - 19 APR 1996