Simulation of Real Time 2D DWT Structure

This research reviews early work in this field and describes the Discrete Wavelet Transform (DWT), the only kind of wavelet transform that can be implemented on a digital computer. It lists the main structures used to implement digital filters, which are the heart of the DWT. It then describes two problems encountered in my M.Sc thesis when implementing a DWT image processor structure: zero padding, and the column processor having to wait until the row processor finishes. The thesis solved the zero padding problem, but it addressed the second problem only partially, by using a pipelining technique. This research solves the second problem completely by proposing a new structure that lets the Two-Dimensional DWT (2D-DWT) structure process video in real time without waiting for the row processor. The result is a fast structure that decomposes images using the DWT. The speed of the structure was tested using Simulink in Matlab 7. The simulation results for the proposed 2D-DWT structure are compared with the M.Sc thesis structure and the traditional structure to show the improvement in processing speed.


Introduction
Vishiwanath proposed a structure that decomposes a signal into multiple analysis levels (DWT levels). The structure analyzes the input signal and then feeds only the approximation signal back into the same structure, repeating until the requested level is reached. This is called the recursive structure. It keeps the implementation cost low, but it cannot analyze a second signal until processing of the first is complete, because the first signal must pass through the structure repeatedly to reach the requested DWT level. For this reason Seung-kwon proposed the semi-recursive structure. It consists of two direct form structures: the first analyzes the signal once, and the second performs the remaining analyses needed to reach the requested DWT level. This structure is more expensive, but it processes consecutive signals in real time (16).
Fan Wenbing and Gao Yingmin suggested a 2D-DWT structure. It mainly consists of two one-dimensional DWT units (1D-DWT) for the horizontal and vertical transforms, a control unit realized as a finite state machine, and an internal memory block. For illustration see Fig 1. To process a sub-image, all rows are transferred to the FPGA over the PCI bus and transformed on the fly in the horizontal 1D-LWT unit using pipelining. The coefficients computed in this way are stored in internal memory of different types. The coefficients corresponding to the rows of the sub-image itself are stored in single-port RAM. The vertical transform levels can then take place; this is done by the vertical 1D-LWT unit. The control unit coordinates these steps in order to process a whole sub-image and is responsible for generating enable signals, address lines, and so on (6).

Type of Wavelet Transform
There are three types of wavelet transform, according to the type of basis used in the transform and the type of signal processed (4)(11):
1. Continuous wavelet transform (CWT)
The important property of the DWT is that it contains only multiplication and addition operations, which suits digital computers well (4). For this reason we concentrate on this type, and more specifically on the fast one.

Fast Wavelet Transform (FWT)
In 1989 Mallat proved that digital filters can be used instead of the two functions (the Wavelet and Scaling functions) in the DWT to increase the processing speed of the transform (14)(5). Mallat replaced the Scaling function with a low-pass filter and the Wavelet function with a high-pass filter (5).

Analysis Stage of DWT
The following two equations represent the analysis stage of the FWT: Where ha represents the low-pass filter and ga represents the high-pass filter. The following figure represents the analysis stage. We can compute the number of Detail or Scaling coefficients by the following equation: Where floor represents the integer quotient of the division, L is the number of samples in the discrete signal, N is the length of the impulse response of the digital filter, and M is the number of Scaling or Detail coefficients (9)(3).
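As a minimal sketch of one analysis stage, the code below filters a signal with ha and ga and keeps every second sample. It assumes full linear convolution, keeps the odd-indexed outputs, and uses Haar filters purely for illustration; none of these choices is confirmed by the text. Under these assumptions the coefficient count is M = floor((L + N - 1) / 2), which is one common form of the count relation and is assumed here rather than taken from the source.

```python
def convolve(x, h):
    """Full linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def analysis_stage(x, ha, ga):
    """One FWT analysis level: filter with ha and ga, then downsample by 2.

    Keeping the odd-indexed samples is an assumed phase convention.
    """
    approx = convolve(x, ha)[1::2]   # Scaling (approximation) coefficients
    detail = convolve(x, ga)[1::2]   # Detail coefficients
    return approx, detail


# Haar filters, chosen only as an illustrative assumption.
s = 2 ** -0.5
ha = [s, s]     # low-pass
ga = [-s, s]    # high-pass

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
approx, detail = analysis_stage(x, ha, ga)

L, N = len(x), len(ha)
M = (L + N - 1) // 2    # assumed count formula, floor((L + N - 1) / 2)
print(len(approx), len(detail), M)
```

Feeding approx back into analysis_stage gives the next DWT level, which is the recursion described in the Introduction.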

2D Discrete Wavelet Transform (2D-DWT)
This transform processes each row of the picture as a one-dimensional signal. After all rows of the picture have been processed, the same process is applied to each column of the two pictures that result from the row pass. This process is illustrated in Figure (3).
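The row-then-column procedure can be sketched as follows. This is an illustration under assumptions (full convolution, odd-phase downsampling, Haar filters, and the hypothetical names dwt2 and columns_pass), not the structure simulated in this paper.

```python
def convolve(x, h):
    """Full linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def analyze(x, ha, ga):
    """1D analysis stage: filter, then keep every second sample."""
    return convolve(x, ha)[1::2], convolve(x, ga)[1::2]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dwt2(image, ha, ga):
    """One 2D-DWT level: rows first, then the columns of both results."""
    low_rows, high_rows = [], []
    for row in image:                      # row pass
        a, d = analyze(row, ha, ga)
        low_rows.append(a)
        high_rows.append(d)

    def columns_pass(picture):             # column pass on one picture
        lo, hi = [], []
        for col in transpose(picture):
            a, d = analyze(col, ha, ga)
            lo.append(a)
            hi.append(d)
        return transpose(lo), transpose(hi)

    ll, lh = columns_pass(low_rows)
    hl, hh = columns_pass(high_rows)
    return ll, lh, hl, hh


s = 2 ** -0.5
ha, ga = [s, s], [s, -s]                   # Haar, illustrative assumption
image = [[1.0] * 4 for _ in range(4)]      # constant 4x4 test picture
ll, lh, hl, hh = dwt2(image, ha, ga)
print(ll)
```

For a constant picture all of the energy lands in the first sub-picture (ll) and the other three are zero, which is a quick sanity check on the filters.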

1-D FIR Filter Structures
The DWT uses only FIR filters, so we need to know some structures for implementing these filters in hardware. In this paper, N represents the number of samples in the impulse response of the filter and h is the impulse response of the filter.
We can implement the FIR filter using one of the following structures:

Direct Form Structure
An implementation can be directly derived from the definition of convolution in the time domain (12).
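A minimal software sketch of the direct form, assuming a delay line of N samples and one multiplier per tap (the class name DirectFormFIR is hypothetical):

```python
class DirectFormFIR:
    """Direct form FIR filter: y[n] = sum over k of h[k] * x[n - k]."""

    def __init__(self, h):
        self.h = list(h)
        self.delay = [0.0] * len(self.h)   # holds x[n], x[n-1], ..., x[n-N+1]

    def step(self, sample):
        # Shift the delay line by one position and insert the new sample.
        self.delay = [sample] + self.delay[:-1]
        # N multiplications and N - 1 additions per output sample.
        return sum(hk * xk for hk, xk in zip(self.h, self.delay))


fir = DirectFormFIR([1.0, 2.0, 3.0])
impulse_response = [fir.step(v) for v in [1.0, 0.0, 0.0, 0.0]]
print(impulse_response)
```

Driving the filter with a unit impulse returns the coefficients h followed by zeros, which confirms that the structure implements the convolution sum.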

Linear Phase Structure
A variation of the direct form structure is the linear phase structure, which takes advantage of the symmetry in the impulse response coefficients for linear phase FIR filters to reduce the computational complexity of the filter implementation (7).
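The saving can be illustrated with a short sketch, assuming a symmetric impulse response h[k] = h[N-1-k]. Samples that share a coefficient are added first, so only about N/2 multiplications are needed per output. The function names are hypothetical.

```python
def fir_direct(x, h):
    """Reference direct form: N multiplications per output sample."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_linear_phase(x, h):
    """Symmetric FIR filter using ceil(N / 2) multiplications per output."""
    N = len(h)
    if any(h[k] != h[N - 1 - k] for k in range(N // 2)):
        raise ValueError("impulse response is not symmetric")
    delay = [0.0] * N
    out = []
    for sample in x:
        delay = [sample] + delay[:-1]
        acc = 0.0
        for k in range(N // 2):
            # Pre-add the two samples that share coefficient h[k].
            acc += h[k] * (delay[k] + delay[N - 1 - k])
        if N % 2:
            # The middle tap of an odd-length filter has no partner.
            acc += h[N // 2] * delay[N // 2]
        out.append(acc)
    return out


x = [1, 2, 3, 4, 5, 6]
h = [1, 2, 3, 2, 1]
print(fir_linear_phase(x, h) == fir_direct(x, h))
```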

Polyphase Filter Structure
If the filter coefficients are split into several individual filters through sampling of the impulse response, the derived filters are termed polyphase filters (12).
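As an illustration, the sketch below splits h into two polyphase filters and uses them to compute a 2:1 decimated output directly, so that no discarded sample is ever computed. The two-branch arrangement and the even output phase are assumptions made for the example, and the function names are hypothetical.

```python
def convolve(x, h):
    """Full linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def polyphase_split(h, M):
    """Sub-filter p holds h[p], h[p + M], h[p + 2M], ..."""
    return [h[p::M] for p in range(M)]


def decimate_polyphase_2(x, h):
    """Compute convolve(x, h)[0::2] using two half-rate branches."""
    e0, e1 = polyphase_split(h, 2)
    x_even = x[0::2]                    # x[2n]
    x_odd_delayed = [0.0] + x[1::2]     # x[2n - 1], with x[-1] taken as 0
    b0 = convolve(x_even, e0)
    b1 = convolve(x_odd_delayed, e1)
    n = max(len(b0), len(b1))
    b0 += [0.0] * (n - len(b0))
    b1 += [0.0] * (n - len(b1))
    return [u + v for u, v in zip(b0, b1)]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
h = [1.0, 2.0, 3.0, 4.0]
print(decimate_polyphase_2(x, h))
print(convolve(x, h)[0::2])
```

Both printed lists agree, while each branch runs at half the input rate.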

Decimation Filters
The principle configuration for decimation is shown in the figure below: an input sequence x is filtered, and every M-th filtered value is used for the output sequence. The symbol (↓M) represents sampling with a ratio of M:1 (12).
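A minimal sketch of the decimation filter follows. The phase argument, which selects which of every M filtered values is kept, is an assumption added for illustration.

```python
def convolve(x, h):
    """Full linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def decimate(x, h, M, phase=0):
    """Filter x with h, then keep one filtered value out of every M."""
    return convolve(x, h)[phase::M]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
h = [1.0, 1.0]
print(decimate(x, h, 2))
```

The filtered sequence here is [1, 3, 5, 7, 9, 11, 6], and a 2:1 decimation keeps [1, 5, 9, 6].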

Interpolation Filter
The structure of the interpolation filter is shown below, where the symbol (↑M) represents upsampling with a ratio of 1:M (15).
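A minimal sketch of the interpolation filter, assuming the usual arrangement in which M - 1 zeros are inserted after each input sample and the result is then filtered:

```python
def convolve(x, h):
    """Full linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def upsample(x, M):
    """Insert M - 1 zeros after every sample (ratio 1:M)."""
    out = []
    for v in x:
        out.append(v)
        out.extend([0.0] * (M - 1))
    return out


def interpolate(x, h, M):
    """Upsample by M, then filter with h to fill in the inserted zeros."""
    return convolve(upsample(x, M), h)


x = [1.0, 2.0, 3.0]
h = [1.0, 1.0]      # a simple hold filter, chosen only for illustration
print(interpolate(x, h, 2))
```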

The M.Sc 2D structure
My M.Sc thesis used two processors (two 1D DWT structures) to construct a 2D DWT structure for processing an image. Simulation revealed two problems: the first is known as the zero padding problem, and the second is that the 2nd processor cannot operate until the 1st processor finishes its job. Zero padding was needed to separate successive rows or columns, and the number of padded zeros was equal to the length of the filter's impulse response minus one; the processing time therefore grows as the impulse response gets longer, and vice versa. The two problems prevented the 2D DWT structure from decomposing a movie. The first problem was completely solved by using a proposed  (10). The pipeline technique used the hardware efficiently by making the 1st processor operate on the current image while the 2nd processor operates on the previous image; this means the pipeline technique hides the second problem rather than solving it (10).
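The role of the N - 1 padded zeros can be illustrated with a small sketch. Successive rows are sent through a single filter as one stream, and the zeros let the tail of each row flush out before the next row begins, so the rows do not mix. The function name and test data are hypothetical.

```python
def convolve(x, h):
    """Full linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def filter_rows_as_stream(rows, h):
    """Filter all rows in one pass, separated by N - 1 zeros each."""
    pad = len(h) - 1
    stream = []
    for row in rows:
        stream.extend(row)
        stream.extend([0.0] * pad)      # zero padding between rows
    y = convolve(stream, h)
    chunk = len(rows[0]) + pad          # each row grows to L + N - 1 samples
    return [y[i * chunk:(i + 1) * chunk] for i in range(len(rows))]


rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
h = [1.0, 2.0, 1.0]
streamed = filter_rows_as_stream(rows, h)
print(streamed)
```

Each chunk equals the convolution of its own row with h. The cost is N - 1 extra time units per row, which is why the processing time grows with the length of the impulse response.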

Proposed 2D DWT Structure
The 2D-DWT structure consists of two 1D structures: the first is responsible for processing the rows of the image and the second for processing the columns. The first processor is the same processor proposed in the M.Sc thesis. Our work focuses on the second processor (the proposed structure), which processes the partial results coming from the first processor without waiting for whole columns to be complete (that is, for the whole image to be complete).
The image grows during convolution. Fig (6) illustrates the growth caused by applying convolution to each row and column of an image. Section A represents the original image, and sections B, C and D represent the samples added to the original image by the convolution. These added samples make it very difficult to process a movie on the same processor using traditional structures, so the M.Sc 1D structure is split into two parts: one processes the current row of the image and the other processes, in parallel, the tail of the previous row.
The proposed structure consists of two parts. The first part receives its input from the row processor and works on each row sample one after another, without waiting for the row to complete, and produces two outputs: part A of the current image and part C of the previous image. The second part receives its input from the row tail processor and works on each tail sample without waiting for the tail to complete, and also has two outputs: part B of the current image and part D of the previous image. The four sections sometimes operate in parallel (when one image ends and the structure receives a new image).
The proposed processor can process each sample without waiting for the column to complete because it performs only the first step of the convolution and stores the sample until the next sample arrives. In other words, the first step of the convolution is applied to the whole first row, the second step is applied to the second row together with the stored first row, and so on. The number of stored rows is equal to the length of the filter impulse response minus one, so there is no need to store all rows of the image.
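The idea of column filtering with only N - 1 stored rows can be sketched in software as follows. This is a simplified analogy, not the simulated hardware structure: downsampling is omitted, the class name StreamingColumnFilter is hypothetical, and the tail rows are flushed here by pushing zeros, whereas the proposed structure produces them in parallel with the next image.

```python
class StreamingColumnFilter:
    """Column-wise FIR filter that emits one output per incoming sample."""

    def __init__(self, h, width):
        self.h = list(h)
        self.width = width
        # The N - 1 most recent complete rows, newest first.
        self.stored = [[0.0] * width for _ in range(len(self.h) - 1)]
        self.current = []               # samples of the row being received

    def push(self, sample):
        col = len(self.current)
        # First convolution step on the new sample ...
        out = self.h[0] * sample
        # ... plus the remaining steps on the stored rows, same column.
        for k in range(1, len(self.h)):
            out += self.h[k] * self.stored[k - 1][col]
        self.current.append(sample)
        if len(self.current) == self.width:
            # Row complete: it becomes the newest stored row.
            self.stored = ([self.current] + self.stored)[:len(self.h) - 1]
            self.current = []
        return out


image = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]    # 3 rows, 2 columns
h = [1.0, 2.0]
f = StreamingColumnFilter(h, width=2)
outputs = [f.push(s) for row in image for s in row]
tail = [f.push(0.0) for _ in range((len(h) - 1) * 2)]
print(outputs, tail)
```

Every output appears as soon as its input sample arrives, and the memory holds (N - 1) * width samples regardless of the image height.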

Results
We simulated the 2D DWT processor on 256x256 images. The results show that the proposed structure decreases the response time of the 2D DWT processor to only the propagation delay of two multipliers and two adders, instead of the whole image receive time (65536 time units) plus the propagation delay of two multipliers and two adders.
Comparing the two response times shows that the proposed 2D DWT processor saves the image receive time, which is a large saving. Figure (7) illustrates the response time of the two 2D processors.
The processing time of the image is also decreased. Figure (8) illustrates the processing time of three processors (the proposed processor, the M.Sc processor and the traditional 2D processor) processing four images. The results in Figure (8) show that the processing time of the proposed 2D DWT processor seems worse than that of the M.Sc processor for the second and subsequent images. The pipeline used in the M.Sc processor creates the illusion that the processing time has decreased, but in truth it is still equal to the processing time of the first image. The M.Sc processor still consists of two stages, one for rows and the other for columns, and the column stage still waits for the row stage to complete. Because of this waiting, the M.Sc processor outputs the decomposition of the previous image while the proposed processor outputs the decomposition of the current image, which makes the proposed processor useful for processing real-time movies. Table (1) shows the processing times of the three processors and which of them can process movies. Table (1) shows that the processing time is equal to the receive time (image size) plus 1534 time units. The additional delay does not affect the processing time of the next image because the proposed processor uses parallel processing. The proposed processor reduces the storage unit to ((N-1)*row size) instead of the (2*decomposed row size*column size) in my M.Sc processor.

Conclusions
Parallel processing was used in the proposed processor so that it can process real-time movies with the same clock used in the M.Sc processor and the traditional processor. Using parallel processing reduces the amount of storage needed, and there is no need for high-speed components. The proposed processor was constructed from components whose propagation delay is approximately equal to the sampling time, which means the processor can use the cheapest components.