New Method to Reduce the Size of Codebook in Vector Quantization of Images

The vector quantization method for image compression inherently requires the generation of a codebook, which has to be made available for both the encoding and decoding processes. This necessitates attaching the codebook whenever a compressed image is stored or sent. To improve the overall efficiency of vector quantization, a means of reducing the codebook size is needed. In this paper, a new method is presented that reduces the size of the codebook generated in vector quantization. The reduction is performed by sorting the codewords of the codebook and then computing the differences between adjacent codewords. Huffman coding (lossless compression) is then applied to the differences in order to reduce the size of the codebook.


Introduction
A fundamental goal of data compression is to reduce the bit rate for transmission or data storage while maintaining an acceptable fidelity or image quality [3]. Image data compression has been pushed to the forefront of the image processing field. This is largely a result of the rapid growth in computer power, the corresponding growth in the multimedia market, and the advent of the World Wide Web, which makes the Internet easily accessible to everyone. Additionally, advances in video technology, including high-definition television, are creating a demand for new, better, and faster image compression technology [5]. Another application area for efficient coding is where pictures are stored in a database, such as archives of medical images, multispectral images, fingerprints, and drawings [3].
There are two primary types of image compression methods. The first type is called lossless methods because no data are lost, and the original image can be retrieved exactly from the compressed data. The second type is called lossy methods because they allow a loss in the actual image data, so the original uncompressed image cannot be retrieved exactly from the compressed file [5].
Image data compression using vector quantization (lossy compression) has received considerable attention because of its simplicity and adaptability [4]. The first empirical design scheme was suggested by Linde, Buzo, and Gray [2] and is thus named the LBG algorithm. This scheme is a generalization of Lloyd's PCM design technique [1]. Vector quantization (VQ) requires the input image to be processed as vectors and finds the best or closest match, based on some distortion criterion, from its stored codebook. The address of the best match is then transmitted to the decoder. The decoder accesses an entry from an identical codebook, thus obtaining the reconstructed vector. Data compression is achieved in this process because transmitting the address requires fewer bits than transmitting the vector itself [4]. A review of vector quantization techniques used for image coding is presented by Nasrabadi and King [3].
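The saving from transmitting addresses, and the cost of attaching the codebook, can be illustrated with a small calculation. The sketch below assumes fixed-length addresses and non-overlapping blocks, and uses the parameters reported in the Results section (512 × 512 image, 4 × 4 blocks, 1024 codewords, 8 bits per pixel). The function name `vq_bit_budget` is hypothetical.

```python
def vq_bit_budget(image_side, block_side, codebook_size, bits_per_pixel=8):
    """Return (index_bits, codebook_bits, raw_bits) for one VQ-coded image.

    Assumes a square image tiled by non-overlapping square blocks and
    fixed-length codeword addresses.
    """
    blocks = (image_side // block_side) ** 2
    # Bits needed to address any of `codebook_size` codewords.
    bits_per_address = (codebook_size - 1).bit_length()
    index_bits = blocks * bits_per_address
    # Uncompressed codebook: every codeword element stored at full precision.
    codebook_bits = codebook_size * block_side * block_side * bits_per_pixel
    raw_bits = image_side * image_side * bits_per_pixel
    return index_bits, codebook_bits, raw_bits


index_bits, codebook_bits, raw_bits = vq_bit_budget(512, 4, 1024)
print("addresses:", index_bits, "bits")      # 16384 blocks * 10 bits
print("codebook: ", codebook_bits, "bits")   # 1024 codewords * 16 elements * 8 bits
print("raw image:", raw_bits, "bits")
```

Under these assumptions the uncompressed codebook (131072 bits) is of the same order as the addresses themselves (163840 bits), which is why reducing the codebook size matters for the overall rate.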
The codebook is needed in both the encoding and decoding processes. In this paper, the codebook size is reduced by taking the differences between the closest codewords. Small differences occur more frequently than large differences, so Huffman coding is applied to the differences.
The Huffman code, developed by D. Huffman in 1952, is a minimum-length code. This means that, given the statistical distribution of the gray levels (the histogram), the Huffman algorithm generates a code whose average length is as close as possible to the minimum bound, the entropy. The result is a variable-length code, a property useful in reducing the size of the codebook: short codes are assigned to the differences that occur frequently.
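The relation between the Huffman code and the entropy bound can be written out explicitly. In the sketch below, $p_i$ is the probability of the $i$-th symbol and $\ell_i$ the length of its codeword; the five-symbol distribution in the example is hypothetical and chosen only for illustration.

```latex
\begin{align*}
H &= -\sum_i p_i \log_2 p_i, \qquad
\bar{L} = \sum_i p_i\,\ell_i, \qquad
H \le \bar{L} < H + 1 \\[4pt]
% Hypothetical example: p = (1/2, 1/4, 1/8, 1/16, 1/16),
% Huffman code lengths: l = (1, 2, 3, 4, 4)
\bar{L} &= \tfrac{1}{2}(1) + \tfrac{1}{4}(2) + \tfrac{1}{8}(3)
         + \tfrac{1}{16}(4) + \tfrac{1}{16}(4) = 1.875 \text{ bits/symbol} \\
H &= \tfrac{1}{2}(1) + \tfrac{1}{4}(2) + \tfrac{1}{8}(3)
   + 2 \cdot \tfrac{1}{16}(4) = 1.875 \text{ bits/symbol}
\end{align*}
```

Because every probability in this example is a power of 1/2, the Huffman code meets the entropy exactly. For other distributions the average length lies within one bit of the entropy.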

Vector Quantization
A vector quantizer can be defined as a mapping $Q$ of the $k$-dimensional Euclidean space $R^k$ into a finite subset $Y$ of $R^k$:

$$Q: R^k \rightarrow Y$$

where $Y = \{\hat{X}_i;\ i = 1, 2, \ldots, N\}$ is the set of reproduction vectors (codewords) and $N$ is the number of vectors in $Y$ [9]. This finite set $Y$ is called a VQ codebook or VQ table. By choosing the size of the codebook $Y$, we can control the transmission rate of a VQ coding process. Our goal is to select an optimal codebook $Y$ of size $N$ that results in the lowest possible distortion among all possible codebooks of the same size [1]. VQ requires the input image to be processed as vectors or blocks of image pixels [4]. The encoder views the input vector $X$ and generates the address of the codeword specified by $Q(X)$. The address is then transmitted to the decoder. The decoder uses this address to generate the reconstructed vector $\hat{X}$; see Figure 1. If a distortion measure $d(X, \hat{X})$, which represents the penalty or cost associated with reproducing the vector $X$ by $\hat{X}$, is defined, then the best mapping $Q$ is the one that minimizes $d(X, \hat{X})$ [3]. The most common measure of distortion between $X$ and $\hat{X}$ is the squared Euclidean distance, given by [8]:

$$d(X, \hat{X}) = \|X - \hat{X}\|^2 = \sum_{l=1}^{k} (x_l - \hat{x}_l)^2 \qquad (2.2)$$

where $x_l$ and $\hat{x}_l$ are the $l$-th components of $X$ and $\hat{X}$, respectively.
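The encoder and decoder described above can be sketched in a few lines. This is a minimal illustration under the squared Euclidean distortion, with vectors represented as tuples; the names `squared_euclidean`, `encode`, and `decode`, and the three-codeword codebook, are hypothetical.

```python
def squared_euclidean(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def encode(vector, codebook):
    """Return the address (index) of the codeword closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_euclidean(vector, codebook[i]))


def decode(address, codebook):
    """Look up the reconstructed vector in an identical codebook."""
    return codebook[address]


# Hypothetical codebook of three 4-element codewords (k = 4, N = 3).
codebook = [(0, 0, 0, 0), (100, 100, 100, 100), (200, 200, 200, 200)]
x = (90, 110, 95, 105)

address = encode(x, codebook)      # only this index is transmitted
x_hat = decode(address, codebook)  # decoder side
print(address, x_hat, squared_euclidean(x, x_hat))
```

Only `address` crosses the channel, so the decoder must hold the same codebook as the encoder.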

Codebook Design
The goal of designing an optimal vector quantizer is to obtain a codebook consisting of N codewords, such that it minimizes the expected distortion.
The LBG algorithm generates a codebook $Y$ containing $N$ codewords, $Y = \{\hat{X}_i\}$, where $i = 1, 2, 3, \ldots, N$ and $N$ is the number of codewords. The algorithm for LBG is given in [3] as follows:
1. Let $N$ be the number of levels (codewords) and $\epsilon \ge 0$ the distortion threshold. Assume an initial $N$-level reproduction alphabet (codebook) $Y_0$ and a training sequence $\{X_j;\ j = 0, 1, \ldots, n-1\}$. Set the iteration counter $m$ to zero and $D_{-1} = \infty$.
2. Given $Y_m$, find the minimum-distortion partition of the training sequence by assigning each $X_j$ to its nearest codeword in $Y_m$, and compute the resulting average distortion $D_m$.
3. If $(D_{m-1} - D_m)/D_m \le \epsilon$, halt with $Y_m$ as the final codebook; otherwise continue.
4. Replace each codeword by the centroid of the training vectors assigned to it to obtain $Y_{m+1}$, increment $m$ to $m+1$, and go to (2).
In the above iterative algorithm, an initial codebook must be assumed. As reported by Equitz, the performance of his algorithm is better than that of the LBG algorithm with a randomly selected initial codebook. As Equitz suggests, an even better codebook can be constructed by first using the Equitz algorithm to obtain the initial codebook and then using the LBG algorithm to refine it.
At the beginning of the Equitz nearest-neighbor (NN) algorithm, all the training vectors are viewed as initial codewords. Then the two nearest codewords are merged and replaced by a new codeword that is a weighted average of the two merged codewords. Hence, the number of codewords is reduced by one after each merge. This process is repeated until the desired number of codewords is reached. If all possible pairs are compared at each merging step, the total computation becomes prohibitive [1].
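The LBG iteration can be sketched as follows. This is a minimal pure-Python illustration, not the implementation used in this work: it assumes the squared Euclidean distortion and a centroid update. The `max_iter` guard and the rule that an empty cell keeps its old codeword are assumptions added for robustness, and all function names are hypothetical.

```python
def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest(vector, codebook):
    """Index of the codeword with minimum distortion to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_euclidean(vector, codebook[i]))


def centroid(vectors):
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def lbg(training, initial_codebook, epsilon=1e-3, max_iter=100):
    """Refine `initial_codebook` on `training`; return (codebook, distortion)."""
    codebook = [tuple(c) for c in initial_codebook]
    previous = float("inf")
    distortion = previous
    for _ in range(max_iter):
        # Minimum-distortion partition of the training sequence.
        cells = [[] for _ in codebook]
        total = 0.0
        for x in training:
            i = nearest(x, codebook)
            cells[i].append(x)
            total += squared_euclidean(x, codebook[i])
        distortion = total / len(training)
        # Stop when the relative improvement falls below the threshold.
        if distortion == 0 or (previous - distortion) / distortion <= epsilon:
            break
        previous = distortion
        # Centroid update (assumption: an empty cell keeps its old codeword).
        codebook = [centroid(cell) if cell else old
                    for cell, old in zip(cells, codebook)]
    return codebook, distortion


# Hypothetical 2-D training set with two obvious clusters.
training = [(0, 0), (1, 1), (0, 1), (10, 10), (11, 11), (10, 11)]
codebook, distortion = lbg(training, initial_codebook=[(0, 0), (5, 5)])
print(codebook, distortion)
```

The result depends on the initial codebook, which is why the choice of initialization is discussed next.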
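The merging procedure described above can be sketched with an exhaustive pair search, which also makes its cost visible. This is an illustration of the description in the text only: it assumes the pair to merge is chosen by plain squared Euclidean distance, whereas the published algorithm may use a different merge cost. The name `pairwise_merge` is hypothetical.

```python
def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def pairwise_merge(training, target_size):
    """Merge the two nearest codewords repeatedly until `target_size` remain."""
    # Each cluster is (codeword, weight); weight counts the merged training vectors.
    clusters = [(tuple(float(v) for v in x), 1) for x in training]
    while len(clusters) > target_size:
        # Exhaustive search over all pairs: quadratic work per merge step.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = squared_euclidean(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (ci, wi), (cj, wj) = clusters[i], clusters[j]
        # New codeword: weighted average of the two merged codewords.
        merged = tuple((wi * a + wj * b) / (wi + wj) for a, b in zip(ci, cj))
        del clusters[j]  # j > i, so index i is unaffected
        clusters[i] = (merged, wi + wj)
    return [codeword for codeword, _ in clusters]


# Hypothetical 1-D training set.
result = pairwise_merge([(0,), (1,), (10,), (12,), (13,)], target_size=2)
print(result)
```

Starting from n training vectors, each merge step in this sketch compares on the order of n² pairs, which is the computational burden the text refers to.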
In this work, to ease the computation, the initial codebook is specified as follows. At the beginning, all the training vectors are viewed as initial codewords. These codewords are sorted according to the squared Euclidean distance criterion. Then each pair of adjacent codewords is merged and replaced by the average of the two codewords, so that each pass halves the number of codewords. The merging process is repeated until the desired codebook size is reached.
After the design of the codebook is completed, the codebook size is reduced as follows:
1. Sort the codewords of the codebook according to the squared Euclidean distance criterion (2.2), so that successive codewords are as close as possible.
2. Compute the differences between successive codewords. These differences are found to be frequently of small value.
3. Apply Huffman coding (a variable-length code) to the differences obtained in step 2.
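The sort-and-merge initialization can be sketched as below. The text does not state the sort key, so this sketch assumes codewords are ordered by their squared Euclidean distance from the origin (their squared norm). It also assumes that an unpaired last codeword is carried over unchanged when the count is odd. The names `halve` and `initial_codebook` are hypothetical.

```python
def squared_norm(v):
    """Assumed sort key: squared Euclidean distance from the origin."""
    return sum(a * a for a in v)


def halve(codewords):
    """Sort, then replace each adjacent pair by its average."""
    ordered = sorted(codewords, key=squared_norm)
    merged = [tuple((a + b) / 2 for a, b in zip(ordered[i], ordered[i + 1]))
              for i in range(0, len(ordered) - 1, 2)]
    if len(ordered) % 2:  # assumption: an unpaired codeword is kept as-is
        merged.append(ordered[-1])
    return merged


def initial_codebook(training, target_size):
    """Halve the set of codewords until at most `target_size` remain."""
    codewords = [tuple(float(v) for v in x) for x in training]
    while len(codewords) > target_size:
        codewords = halve(codewords)
    return codewords


# Hypothetical training vectors.
training = [(4, 4), (0, 0), (1, 1), (5, 5), (9, 9), (8, 8), (2, 2), (3, 3)]
print(initial_codebook(training, target_size=2))
```

Each pass costs one sort plus a linear scan, instead of an exhaustive pair search per merge. If the 4 × 4 scanning window is non-overlapping, a 512 × 512 image yields 16384 training vectors, so four passes reach a 1024-codeword codebook.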
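The three steps above can be sketched end to end. This is an illustration under stated assumptions rather than the implementation used in this work: codewords are sorted by squared norm (one reading of the squared Euclidean criterion), the first codeword is stored as-is, the Huffman table overhead is not counted, and all function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code on `symbols`."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def difference_codebook(codebook):
    """Steps 1 and 2: sort, then take element-wise successive differences."""
    ordered = sorted(codebook, key=lambda c: sum(v * v for v in c))
    diffs = [tuple(b - a for a, b in zip(prev, cur))
             for prev, cur in zip(ordered, ordered[1:])]
    return ordered[0], diffs


def rebuild_codebook(first, diffs):
    """Decoder side: a running sum restores the sorted codebook exactly."""
    codebook = [first]
    for d in diffs:
        codebook.append(tuple(a + b for a, b in zip(codebook[-1], d)))
    return codebook


def bits_per_element(diffs):
    """Step 3: average Huffman code length per difference element."""
    symbols = [v for d in diffs for v in d]
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols) / len(symbols)


# Hypothetical codebook of five 2-element codewords.
codebook = [(12, 13), (10, 10), (11, 12), (13, 13), (10, 11)]
first, diffs = difference_codebook(codebook)
print(first, diffs, bits_per_element(diffs))
```

Because sorting permutes the codebook, the addresses produced by the encoder must refer to the sorted order for the decoder to reconstruct the image correctly.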

Results and Discussion
VQ is performed on different images of size 512 × 512 pixels with 256 gray levels (8 bits per pixel). The initial codebook is specified as suggested in this paper: all training vectors are sorted, and successive vectors are then merged pairwise. The merging process is repeated until the desired codebook size is reached. The codebook size is 1024 codewords. The training vectors are obtained by scanning the image with a fixed-size window (4 × 4). The fidelity criterion used to measure the amount of error in the reconstructed (decompressed) image is the peak signal-to-noise ratio (PSNR), which is defined for an N × N decompressed image as follows (Umbaugh, 1998):

$$\mathrm{PSNR} = 10 \log_{10} \frac{(L-1)^2}{\dfrac{1}{N^2} \sum_{r=0}^{N-1} \sum_{c=0}^{N-1} \left[\hat{I}(r,c) - I(r,c)\right]^2}$$

where $L$ is the number of gray levels, $I(r,c)$ is the original image, $\hat{I}(r,c)$ is the decompressed image, and $N$ here denotes the image side length.

The codebook must be stored or transmitted along with the codebook addresses. We obtain the reduction in the codebook size by ordering the codewords and then taking the differences between successive ones. After that, Huffman coding (lossless compression) is performed. The number of bits required for each codeword pixel element is tabulated in Table 2. The decoder accesses the codewords specified by the addresses. Each codeword pixel element represents a pixel value of the decoded image. The images have 256 gray levels, so 8 bits per codeword pixel element are needed in the LBG codebook. The number of bits per codeword pixel element obtained with the enhanced codebook, shown in Table 2, is less than that of the LBG codebook.
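The peak signal-to-noise ratio used as the fidelity criterion can be computed as in the sketch below. Images are represented as nested lists of pixel values, and the peak value is taken as `gray_levels - 1`, which is 255 for the 8-bit images used here. The function name `psnr` and the tiny 2 × 2 images are hypothetical.

```python
import math


def psnr(original, decoded, gray_levels=256):
    """Peak signal-to-noise ratio in dB between two equal-size images."""
    n_pixels = sum(len(row) for row in original)
    squared_error = sum((d - o) ** 2
                        for row_o, row_d in zip(original, decoded)
                        for o, d in zip(row_o, row_d))
    mse = squared_error / n_pixels
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10((gray_levels - 1) ** 2 / mse)


# Hypothetical 2 x 2 example with two pixels off by one gray level.
original = [[100, 100], [100, 100]]
decoded = [[101, 99], [100, 100]]
print(round(psnr(original, decoded), 2), "dB")
```

A higher value indicates a reconstruction closer to the original. Since the codebook reduction step is lossless, it changes the stored size of the codebook but not the PSNR of the decoded image.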

Conclusion
Vector quantization is one of the most important methods in the field of image compression. In this work the method was applied to three images, and the results provide a high compression ratio with good image quality. It has been shown that the size of the codebook can be reduced by sorting the codewords in the codebook according to the squared Euclidean distance criterion and then computing the differences between successive codewords. Huffman coding is then applied to the differences.