

### International Journal of Innovative Research in Computer and Communication Engineering

(A High Impact Factor, Monthly, Peer Reviewed Journal)

Website: <u>www.ijircce.com</u> Vol. 6, Issue 10, October 2018

#### High Speed 2-D DWT using Modified Distributed Arithmetic and Brent Kung Adder Technique

Asmita Gupta, Prof. Rahul Shrivastava, Dr. Paresh Rawat

M. Tech. Scholar, Department of EC, SISTec, Gandhi Nagar, Bhopal, India

Assistant Professor, Department of EC, SISTec, Gandhi Nagar, Bhopal, India

Head of Dept, Department of EC, SISTec, Gandhi Nagar, Bhopal, India

**ABSTRACT:** The DWT is expressed in a generalized form known as the discrete wavelet transform, which analyzes both the low and high sub-bands with equal priority at every decomposition level. The DWT is a mathematical technique that provides a new method for signal processing. Owing to useful features such as an adaptive time-frequency window, lower aliasing distortion and efficient computational complexity, it is widely used in many signal and image processing applications. The 2-D DWT is widely used in image and video compression. However, the flipping scheme introduces some design complexities in selected DWT structures.

In the proposed work, we have therefore implemented the Brent Kung (BK) adder and the modified distributed arithmetic (MDA) technique, which provide a multiplier-less implementation that also works for every bit.

The proposed MDA and BK adder based 1-D and 2-D DWT algorithm shows good performance compared with the previous algorithm. The proposed architecture for DWT implementation reduces the chip area and the computation time, and also minimizes the maximum combinational path delay.

KEYWORDS: 2-D DWT, MDA, Low-pass Sub-band (LPSB), High-pass Sub-band (HPSB), VHDL Simulation

#### I. INTRODUCTION

The Multi-Resolution Analysis (MRA) capability and time-scale localization properties of the DWT have established it as a powerful tool for numerous applications, such as signal analysis, image compression and numerical analysis, as stated by Mallat (1989). This has led numerous research groups to develop algorithms and hardware architectures to implement the DWT. DWTs are being increasingly used for image coding. This is because the DWT supports features such as progressive image transmission, ease of compressed image manipulation, and region of interest coding [1].

The inherent advantages of the Discrete Wavelet Transform over other transforms, such as the DCT, DST and DHT, make it suitable for the JPEG2000 compression standard [2]. The multiresolution feature of wavelets overcomes the blocking artifact problem of the DCT [3]. The convolutional DWT uses Finite Impulse Response (FIR) filter banks to implement sub-band decomposition. This requires higher computational complexity and more hardware, making it unsuitable for real-time image/video processing applications.

In the conventional convolution method for the DWT, a pair of FIR filters is applied in parallel to derive the HPSB and LPSB coefficients. The architectures are mostly folded and can be broadly classified into serial and parallel architectures, as discussed in [4].

The characteristics of VLSI systems are that they offer greater potential for a large amount of concurrency and an enormous amount of computing power within a small area. Computation is very cheap, as hardware is not an obstacle for VLSI systems [5]. However, non-localized global communication is not only expensive but also demands high power dissipation. Therefore, a high degree of parallelism and nearest-neighbor communication are crucial for the realization of high-performance VLSI systems [6]. Keeping this in view, high-performance application-specific VLSI systems have been evolving rapidly in recent years. Special-purpose VLSI systems maximize processing concurrency through parallel/pipelined processing and provide a cost-effective alternative for real-time applications. Consequently, the 2-D DWT is currently implemented in VLSI systems to meet the temporal requirements of real-time applications. Keeping this fact in view, several design schemes have been suggested over the last decade for efficient implementation of the 2-D DWT in VLSI systems. Researchers have adopted different algorithm formulations, mapping schemes, and architectural design methods to reduce the computational time, arithmetic complexity or memory complexity of 2-D DWT structures. Nevertheless, the area-delay performance of the existing structures changes only marginally.

#### II. DISCRETE WAVELET TRANSFORM

The LL sub-band represents an approximation of the original image; the LPSB can be selected as one fourth of the image. This process is repeated for as many levels of decomposition as desired. The JPEG2000 format specifies four levels of decomposition, although three are usually considered acceptable in hardware. In order to extend the 1-D filter to compute the 2-D DWT in JPEG2000, two points must be considered [7].

$$Y_{l}^{j}(n) = \sum_{i=0}^{k-1} h(i)\,Y_{l}^{j-1}(2n-i) \tag{1}$$

$$Y_{h}^{j}(n) = \sum_{i=0}^{k-1} g(i)\,Y_{h}^{j-1}(2n-i) \tag{2}$$
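Equations (1) and (2) both describe filtering followed by down-sampling by two. The following minimal Python sketch illustrates that operation. It assumes that $h$ and $g$ are the low-pass and high-pass filter coefficients of length $k$ and that samples outside the signal are treated as zero; neither the boundary handling nor the function name `analysis_step` comes from the paper, and the two-tap filters are hypothetical values chosen only to keep the arithmetic easy to follow.

```python
def analysis_step(coeffs, prev):
    """Return y(n) = sum_i coeffs[i] * prev[2n - i] for n = 0 .. len(prev)//2 - 1.

    Samples with an index outside `prev` are treated as zero (an assumption).
    """
    out = []
    for n in range(len(prev) // 2):
        acc = 0
        for i, c in enumerate(coeffs):
            idx = 2 * n - i
            if 0 <= idx < len(prev):
                acc += c * prev[idx]
        out.append(acc)
    return out


# Hypothetical 2-tap sum and difference filters, for illustration only.
h = [1, 1]
g = [1, -1]
x = [4, 6, 10, 12]

low = analysis_step(h, x)   # form of equation (1)
high = analysis_step(g, x)  # form of equation (2)
print(low, high)
```

Each output sequence has half the length of its input, which is why the sub-bands shrink at every decomposition level.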

Furthermore, the intermediate results generated by the 2-D column filter must be stored. The amount of external memory access and the area occupied by the embedded internal buffer are considered the most critical issues for the implementation of the 2-D DWT. However, the internal buffer would occupy a large area and consume a lot of power. In the separable method, the desired filter coefficients at each level depend on the previous output level, and this introduces delay or latency in the DWT decomposition.



Figure 1: Two Level Diagram of Discrete Wavelet Transform

#### III. PROPOSED ARCHITECTURE

The flow chart of the proposed algorithm is shown in Figure 2. In this flow chart, the binary input is applied to the serial-in serial-out register. All integers are applied to the DWT architecture in binary form. The binary input depends on the word length; for example, a word length of four bits (3 down to 0) means that the input range is 0 to 15.



Figure 2: Block Diagram of 2-D DWT

If the LPS coefficients $h_0$, $h_1$, $h_2$, $h_3$ and $h_4$ are multiplied by $u_1$, $u_2$, $u_3$, $u_4$ and $u_5$, then the multiplier-less 1-D DWT LPS output is

$$Y_{LPS} = \begin{bmatrix} h_0 & h_1 & h_2 & h_3 & h_4 \end{bmatrix} \bullet \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix}$$

Where,

$$u_1 = X(n) + X(n-8)$$

$$u_2 = X(n-1) + X(n-7)$$

$$u_3 = X(n-2) + X(n-6)$$

$$u_4 = X(n-3) + X(n-5)$$




$$u_5 = X(n-4)$$

$$Y_{LPS} = \begin{bmatrix} 77 & 34 & -10 & -2 & 3 \end{bmatrix} \bullet \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix}$$
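The symmetric pre-addition and the resulting inner product can be sketched in Python as a plain reference model. This version still uses multiplications, so it is not the multiplier-less architecture itself; it is only a way to check values produced by the multiplier-less form. The function names `pre_add` and `lps_direct` are hypothetical, and the all-twos input window is an assumed example.

```python
LPS_COEFFS = [77, 34, -10, -2, 3]  # h_0 .. h_4 as given in the text


def pre_add(window):
    """window[k] holds X(n-k) for k = 0..8; returns [u_1, u_2, u_3, u_4, u_5]."""
    if len(window) != 9:
        raise ValueError("expected nine samples X(n) .. X(n-8)")
    u = [window[k] + window[8 - k] for k in range(4)]  # u_1 .. u_4
    u.append(window[4])                                # u_5 = X(n-4)
    return u


def lps_direct(u):
    """Reference inner product [h_0 .. h_4] . [u_1 .. u_5] using multipliers."""
    return sum(h * ui for h, ui in zip(LPS_COEFFS, u))


window = [2] * 9          # assumed example input: every sample equals 2
u = pre_add(window)
y = lps_direct(u)
print(u, y)
```

Folding the nine taps into five sums halves the number of coefficient terms before any distributed-arithmetic step is applied.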

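So, the 8-bit 2's complement binary representations of the LPS coefficients, arranged with the least significant bit in the top row and the most significant (sign) bit in the bottom row, are: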

$$Y_{H} = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix} \bullet \begin{bmatrix} u_{1} \\ u_{2} \\ u_{3} \\ u_{4} \\ u_{5} \end{bmatrix}$$

Each row is passed through the look-up table, which replaces the LPS coefficient bits with the corresponding sum of inputs:

$$Y_{H} = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix} \bullet \begin{bmatrix} u_{1} \\ u_{2} \\ u_{3} \\ u_{4} \\ u_{5} \end{bmatrix} = \begin{bmatrix} u_{1} + u_{5} \\ u_{2} + u_{3} + u_{4} + u_{5} \\ u_{1} + u_{3} + u_{4} \\ u_{1} + u_{4} \\ u_{3} + u_{4} \\ u_{2} + u_{3} + u_{4} \\ u_{1} + u_{3} + u_{4} \\ u_{3} + u_{4} \end{bmatrix}$$
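The bit matrix above can be derived mechanically from the coefficient values. The sketch below assumes 8-bit 2's complement coefficients with the least significant bit in the first row, which reproduces the matrix shown; the helper names `bit_planes` and `plane_sums` are hypothetical and are not taken from the paper.

```python
LPS_COEFFS = [77, 34, -10, -2, 3]  # coefficient values from the text
BITS = 8                           # assumed coefficient word length


def bit_planes(coeffs, bits=BITS):
    """Row r holds bit r (LSB first) of every coefficient in 2's complement."""
    mask = (1 << bits) - 1
    return [[((c & mask) >> r) & 1 for c in coeffs] for r in range(bits)]


def plane_sums(planes, u):
    """For each row, add the inputs u_j whose coefficient bit is 1."""
    return [sum(ui for bit, ui in zip(row, u) if bit) for row in planes]


planes = bit_planes(LPS_COEFFS)
k = plane_sums(planes, [4, 4, 4, 4, 2])  # example inputs used in the text
for row, total in zip(planes, k):
    print(row, total)
```

In hardware each row sum would come from a look-up table addressed by the coefficient bits, so no multiplier is needed at this stage.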

Let $u_1 = 4$, $u_2 = 4$, $u_3 = 4$, $u_4 = 4$ and $u_5 = 2$. Substituting these values into the above equation, with the last row represented by its 2's complement value, gives

$$K_1 = u_1 + u_5 = 0110, \quad K_2 = u_2 + u_3 + u_4 + u_5 = 1110, \quad K_3 = u_1 + u_3 + u_4 = 1100,$$

$$K_4 = u_1 + u_4 = 1000, \quad K_5 = u_3 + u_4 = 1000, \quad K_6 = u_2 + u_3 + u_4 = 1100,$$

$$K_7 = u_1 + u_3 + u_4 = 1100, \quad K_8 = u_3 + u_4 = \text{not}(1000) + 0001 = 1000$$




All values $K_1$ to $K_8$ are passed through the sign extension block, giving

$$K_1 = u_1 + u_5 = 00110, \quad K_2 = u_2 + u_3 + u_4 + u_5 = 01110, \quad K_3 = u_1 + u_3 + u_4 = 01100,$$

$$K_4 = u_1 + u_4 = 01000, \quad K_5 = u_3 + u_4 = 01000, \quad K_6 = u_2 + u_3 + u_4 = 01100,$$

$$K_7 = u_1 + u_3 + u_4 = 01100, \quad K_8 = u_3 + u_4 = \text{not}(01000) + 00001 = 11000$$

$K_2$ is shifted left by one bit and added to $K_1$, and the result is stored as $Y_1$:

$$Y_1 = 00110 + 011100 = 100010$$

$K_3$ is shifted left by two bits and added to $Y_1$, and the result is stored as $Y_2$:

$$Y_2 = 0100010 + 0110000 = 1010010$$

$K_4$ is shifted left by three bits and added to $Y_2$, and the result is stored as $Y_3$:

$$Y_3 = 01010010 + 01000000 = 10010010$$

$K_5$ is shifted left by four bits and added to $Y_3$, and the result is stored as $Y_4$:

$$Y_4 = 010010010 + 010000000 = 100010010$$

$K_6$ is shifted left by five bits and added to $Y_4$, and the result is stored as $Y_5$:

$$Y_5 = 0100010010 + 0110000000 = 1010010010$$

$K_7$ is shifted left by six bits and added to $Y_5$, and the result is stored as $Y_6$:

$$Y_6 = 01010010010 + 01100000000 = 10110010010$$

$K_8$, sign-extended to the 13-bit accumulator width, is shifted left by seven bits and added to $Y_6$, and the result is stored as $Y_7$:

$$Y_7 = 0010110010010 + 1110000000000 = 1\,0000110010010$$

Final output $Y_{LPS} = Y_7 = (0000110010010)_2$ (carry rejected). This equals 402 in decimal, which agrees with $77 \cdot 4 + 34 \cdot 4 - 10 \cdot 4 - 2 \cdot 4 + 3 \cdot 2 = 402$.
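The shift-and-accumulate steps of this worked example can be checked with a few lines of Python. This is only a behavioural sketch: it assumes that $K_1$ belongs to the least significant bit plane and that the last plane carries the negative sign weight of 2's complement, and it subtracts that plane directly where the hardware would add its 2's complement and reject the carry. The function name `da_accumulate` is hypothetical.

```python
def da_accumulate(k):
    """Combine bit-plane sums K_1 .. K_n, least significant plane first.

    Plane r is weighted by 2**r; the last plane is the sign plane and is
    subtracted, which is equivalent to adding its 2's complement.
    """
    acc = 0
    for r, kr in enumerate(k[:-1]):
        acc += kr << r
    acc -= k[-1] << (len(k) - 1)
    return acc


k = [6, 14, 12, 8, 8, 12, 12, 8]  # K_1 .. K_8 from the worked example
y_lps = da_accumulate(k)
print(y_lps, format(y_lps, "013b"))
```

The printed 13-bit pattern can be compared directly with the final output $Y_{LPS}$ of the worked example.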






Figure 3: Block Diagram of 9/7 Wavelet Coefficient based Discrete Wavelet Transform

#### IV. BRENT KUNG ADDER

In the previous design, a Kogge Stone (KS) adder was used for the DWT, but it had the drawback that it did not work for every bit [8, 9]. To remove this drawback, we have used the Brent Kung (BK) adder, which is an advanced parallel-prefix binary adder. Its advantage is that it reduces the cost and the wiring complexity and is much faster than a ripple carry adder. It therefore provides better performance and requires less area than the KS adder. The main advantage of the BK adder is that it works on every bit and consumes less space, which was the main problem of the previous design.



Figure 4: Block Diagram of Brent Kung Adder
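The carry computation of a Brent Kung adder can be illustrated with a small behavioural model. The sketch below follows the textbook structure (per-bit generate and propagate signals, an up-sweep tree, then a down-sweep that fills in the remaining carries) and is not the authors' VHDL design; the function name `brent_kung_add` and the restriction to power-of-two widths are assumptions made to keep the example short.

```python
def brent_kung_add(a, b, width=8):
    """Add two unsigned width-bit integers; returns (sum, carry_out)."""
    if width < 1 or width & (width - 1):
        raise ValueError("width must be a power of two in this sketch")
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate bits
    G, P = g[:], p[:]

    # Up-sweep: combine blocks of size 1, 2, 4, ... into larger blocks.
    d = 1
    while d < width:
        for i in range(2 * d - 1, width, 2 * d):
            G[i] |= P[i] & G[i - d]
            P[i] &= P[i - d]
        d *= 2

    # Down-sweep: complete the prefixes that the up-sweep skipped.
    d = width // 4
    while d >= 1:
        for i in range(3 * d - 1, width, 2 * d):
            G[i] |= P[i] & G[i - d]
            P[i] &= P[i - d]
        d //= 2

    # G[i] is now the carry out of bit i; sum bit i is p[i] XOR carry into i.
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i else 0
        total |= (p[i] ^ carry_in) << i
    return total, G[width - 1]


print(brent_kung_add(6, 28))  # K_1 + (K_2 << 1) from the worked example
```

The network needs roughly $2\log_2 n - 1$ prefix levels, more than a Kogge Stone adder, but it uses far fewer prefix cells and wires, which is the area advantage referred to in the text.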

#### V. SIMULATION RESULT

Synthesis results of the 1-D DWT and 2-D DWT using the MDA and BKA techniques are presented in this section. It covers the RTL view, hardware utilization, VHDL test bench, and a comparison of the 1-D and 2-D DWT architectures with the existing architecture. The 1-D DWT architecture consists of shift registers, adders of different bit widths and wavelet coefficients, i.e. HPSB and LPSB.




Figure 5 shows the RTL view of the first-level DWT. This view includes the shift registers, BK adder, D flip-flops and all other components. The input first passes through the D flip-flops, then the symmetric addition is performed by the BK adder, and finally the MDA technique produces the output.



Figure 5: RTL for 1-D DWT

Figure 6 shows the waveform of the first-level DWT. Here 'e' carries the four-bit input '0011'. After all operations are performed, a 12-bit output is produced. The output of the high-pass filter is '000011000000' and the output of the low-pass filter is '101001011011'.



Figure 6: VHDL Test-bench in 1-D DWT




Table I: Device Utilization of 1-D DWT using MDA and Brent Kung Adder Technique

| Parameter (device 4vfx12sf363-12) | Used         | Available |
|-----------------------------------|--------------|-----------|
| Number of Slices                  | 150          | 5472      |
| Number of Slice Flip Flops        | 32           | 10944     |
| Number of 4-input LUTs            | 254          | 10944     |
| Minimum Period                    | 0.678 ns     |           |
| Maximum Frequency                 | 1474.274 MHz |           |
| Maximum Combinational Path Delay  | 17.316 ns    |           |

Figure 7 shows the RTL view of the second-level DWT. It contains all the components of the 2-D DWT, including the shift registers, D flip-flops and BK adder. This RTL schematic depends on the view technology.



Figure 7: RTL for 2-D DWT




Figure 8 shows the waveform of the second-level DWT. Here the input is '0011', and outputs are produced for both filters: 'yh' is '1111101000000000000' for the high-pass filter and 'yl' is '000001100000000000' for the low-pass filter.



Figure 8: VHDL Test-bench in 2-D DWT

Table II: Device Utilization of 2-D DWT using MDA Technique and Brent Kung Adder Technique

| Parameter (device 4vfx12sf363-12) | Used        | Available |
|-----------------------------------|-------------|-----------|
| Number of Slices                  | 514         | 5472      |
| Number of Slice Flip Flops        | 224         | 10944     |
| Number of 4-input LUTs            | 899         | 10944     |
| Minimum Period                    | 8.741 ns    |           |
| Maximum Frequency                 | 114.409 MHz |           |
| Maximum Combinational Path Delay  | 17.905 ns   |           |

As Table III shows, for the 1-D DWT the proposed design uses 56.6% fewer slices, 33.3% fewer slice flip-flops and 29.2% fewer LUTs than the previous design, and its maximum combinational path delay is 25.19% lower. Similarly, for the 2-D DWT the values in Table III correspond to 31.6% fewer slices, 50.7% fewer slice flip-flops, 35.5% fewer LUTs and a 38.3% lower maximum combinational path delay than the previous design.

Table III: Comparison of Existing Algorithm and Proposed Algorithm

| Parameter                        | 1-D DWT, Previous Design | 1-D DWT, Proposed Design | 2-D DWT, Previous Design | 2-D DWT, Proposed Design |
|----------------------------------|--------------------------|--------------------------|--------------------------|--------------------------|
| Number of Slices                 | 346                      | 150                      | 752                      | 514                      |
| Number of Slice Flip Flops       | 48                       | 32                       | 454                      | 224                      |
| Number of LUTs                   | 359                      | 254                      | 1393                     | 899                      |
| Maximum Combinational Path Delay | 23.146 ns                | 17.316 ns                | 28.998 ns                | 17.905 ns                |
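The percentage improvements quoted for Table III follow from a simple relative-reduction formula. The short Python sketch below recomputes them from the table values; the dictionary layout and the function name `reduction_percent` are hypothetical and serve only as a check on the arithmetic.

```python
# (previous design, proposed design) pairs copied from Table III
TABLE_III = {
    "1-D DWT": {
        "slices": (346, 150),
        "slice_flip_flops": (48, 32),
        "luts": (359, 254),
        "max_path_delay_ns": (23.146, 17.316),
    },
    "2-D DWT": {
        "slices": (752, 514),
        "slice_flip_flops": (454, 224),
        "luts": (1393, 899),
        "max_path_delay_ns": (28.998, 17.905),
    },
}


def reduction_percent(previous, proposed):
    """Relative reduction of the proposed value with respect to the previous one."""
    return 100.0 * (previous - proposed) / previous


for design, rows in TABLE_III.items():
    for name, (previous, proposed) in rows.items():
        print(f"{design:8s} {name:18s} {reduction_percent(previous, proposed):6.2f} %")
```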




#### VI. CONCLUSION

The DWT is applied in image processing, signal coding, audio and video processing, pattern recognition and other areas. The DWT can be implemented with various techniques; here a multiplier-less technique is used, as in the previous design. For this purpose the MDA technique is used, which provides an approach to multiplier-less implementation. It consists of adders and shift registers and is free of multipliers.

In the previous design, a Kogge Stone (KS) adder was used for the DWT, but it had the drawback that it did not work for every bit. To remove this drawback, we have used the BK adder, which reduces the cost and the wiring complexity, is much faster than a ripple carry adder, and requires less area than the KS adder.

Finally, we have designed the 1-D and 2-D DWT using the BK adder and the MDA technique, which provide better efficiency and show better results than the previous design.

#### REFERENCES

- [1] Samit Kumar Dubey, Arvind Kumar Kourav and Shilpi Sharma, "High Speed 2-D Discrete Wavelet Transform using Distributed Arithmetic and Kogge Stone Adder Technique", International Conference on Communication and Signal Processing, April 6-8, 2017, India
- [2] Maurizio Martina, Guido Masera, Massimo Ruo Roch, and Gianluca Piccinini, "Result-Biased Distributed-Arithmetic-Based Filter Architectures for Approximately Computing the DWT", IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 62, No. 8, August 2015.
- [3] S.G. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, July 1989, pp. 674-693.
- [4] M. Alam, C. A. Rahman, and G. Jullian, "Efficient distributed arithmetic based DWT architectures for multimedia applications," in Proc. IEEE Workshop on SoC for real-time applications, pp. 333-336, 2003.
- [5] X. Cao, Q. Xie, C. Peng, Q. Wang and D. Yu, "An efficient VLSI implementation of distributed architecture for DWT," in Proc. IEEE Workshop on Multimedia and Signal Process., pp. 364-367, 2006.
- [6] Archana Chidanandan and Magdy Bayoumi, "Area-Efficient MDA Architecture for the 1-D DCT/IDCT," ICASSP 2006.
- [7] M. Martina, and G. Masera, "Low-complexity, efficient 9/7 wavelet filters VLSI implementation", IEEE Trans. on Circuits and Syst. II, Express Brief vol. 53, no. 11, pp. 1289-1293, Nov. 2006.
- [8] M. Martina, and G. Masera, "Multiplierless, folded 9/7-5/3 wavelet VLSI architecture," IEEE Trans. on Circuits and syst. II, Express Brief vol. 54, no. 9, pp. 770-774, Sep. 2007.
- [9] Gaurav Tewari, Santu Sardar, K. A. Babu, "High-Speed & Memory Efficient 2-D DWT on Xilinx Spartan3A DSP using scalable Polyphase Structure with DA for JPEG2000 Standard", 2011 IEEE.