# International Journal of Innovative Research in Computer and Communication Engineering 

(An ISO 3297: 2007 Certified Organization)

Vol. 4, Issue 8, August 2016

# Area and Power Efficient Multiplier Design Using BZ-FAD

Shabbir Hassan Shaikh ${ }^{1}$, N.Manikanda Devarajan ${ }^{2}$, G.Ramachandran ${ }^{3}$, Dr.T.Muthumanickam ${ }^{4}$<br>II year, M.E.(EST), VMKVEC, Salem, India ${ }^{1}$<br>Associate Professor, Dept of ECE, VMKVEC, Salem, India ${ }^{2}$<br>Assistant Professor, Dept of ECE, VMKVEC, Salem, India ${ }^{3}$<br>Professor \& Head, Dept of ECE, VMKVEC, Salem, India ${ }^{4}$


#### Abstract

A low-power structure for shift-and-add multipliers called BZ-FAD (Bypass Zero, Feed A Directly) is proposed. The architecture considerably lowers the switching activity of conventional multipliers. The modifications to the conventional multiplier, which multiplies A by B, include the removal of the shifting of the B register, direct feeding of A to the adder, bypassing the adder whenever possible, using a ring counter instead of a binary counter, and removal of the partial product shift. The architecture makes use of a low-power ring counter. The proposed multiplier can be used for low-power applications where speed is not a primary design parameter. The proposed architecture is described using HDL, simulated in the ISE simulator, and synthesized using Xilinx ISE 10.1. The power simulation is done using Cadence RTL Compiler.


KEYWORDS: Low-power multiplier; shift-and-add multiplier; BZ-FAD; hot block ring counter; switching activity

## I. INTRODUCTION

Advances in microelectronic technology have led to more effective encoding of data, more reliable transmission of information, and more embedded intelligence in systems. In particular, to meet the increasing market demand for portable applications, these microelectronic devices must consume very little power. Consequently, various digital signal processing chips are now designed for low power dissipation. In such systems, a multiplier is a fundamental arithmetic unit. Multiplication can be considered as a series of repeated additions. The number to be added is the multiplicand, the number of times that it is added is the multiplier, and the result is the product. Each step of addition generates a partial product. For portable applications where power consumption is the most important parameter, one should reduce the power dissipation as much as possible. One of the best ways to reduce the dynamic power dissipation, henceforth referred to as power dissipation in this work, is to minimize the total switching activity, i.e., the total number of signal transitions of the system. Lowering the power consumption and enhancing the processing performance of circuit designs are undoubtedly the two important design challenges of wireless multimedia and digital signal processor (DSP) applications. Because of the high circuit complexity, power consumption and layout area are two further design considerations for the multiplier.

Multiplication is an essential arithmetic operation for common DSP applications, such as filtering and fast Fourier transform (FFT). To achieve high execution speed, parallel array multipliers are widely used. These multipliers tend to consume most of the power in DSP computations, and thus power-efficient multipliers are very important for the design of low-power DSP systems. CMOS is currently the dominant technology in digital VLSI. Two components contribute to the power dissipation in CMOS circuits. The static dissipation is due to leakage current, while dynamic power dissipation is due to switching transient current as well as charging and discharging of load capacitances. Since the amount of leakage current is usually small, the major source of power dissipation in CMOS circuits is the dynamic power dissipation. Dynamic power dissipation appears only when a CMOS gate switches from one stable state to another. Thus, the power consumption can be reduced if one can reduce the switching activity of a given logic circuit without changing its function.


## LITERATURE SURVEY:

The aim of this project is to design a low-power binary multiplier, implement it on an FPGA kit, estimate its power consumption, and compare it with existing designs.

The major modifications to be made to the shift-and-add multiplier are: removal of the multiplier shifting, direct feeding of the multiplicand to the adder, reduction in partial product shifting, bypassing the adder whenever possible, and use of a hot block ring counter, which reduces switching activity compared to a binary counter.

## II. OBJECTIVES AND BINARY MULTIPLIER

- To design and implement an area- and power-efficient multiplier on an FPGA and to estimate its power and area requirements. The estimated power and area are compared with those of existing multiplier designs. A Spartan3 FPGA kit is chosen for the project implementation, and power and area are estimated using Cadence RTL Compiler.
- To minimize switching activity in the multiplier by reducing the switching activities in the adder, multiplexer, counter, etc.

A binary multiplier is an electronic hardware device used in digital electronics, computers, and other electronic devices to perform rapid multiplication of two binary numbers [3]. It is built using binary adders. The rules for binary multiplication can be stated as follows:

- If the multiplier digit is 1, the multiplicand is simply copied down and represents the partial product.
- If the multiplier digit is 0, the partial product is also 0.

A multiplier circuit therefore needs circuitry to do the following:

- Identify whether a bit is 0 or 1.
- Shift partial products left.
- Add all the partial products to give the product as the sum of the partial products.
- Examine the sign bits. If they are alike, the sign of the product is positive; if they are opposite, the product is negative. The sign bit of the product determined by this criterion should be displayed along with the product.

From the above discussion we observe that it is not necessary to wait until all the partial products have been formed before summing them. In fact, the addition of a partial product can be carried out as soon as that partial product is formed. In conventional arithmetic, the trivial multiplication algorithm is the transposition into binary representation of the multiplication that we do by hand.
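The rules above can be illustrated with a short software sketch. This is only a behavioural illustration in Python, not the HDL used in this work; the function names `shift_and_add` and `signed_multiply` and the `width` parameter are hypothetical, and the operands are assumed to fit in `width` bits.

```python
def shift_and_add(multiplicand: int, multiplier: int, width: int = 16) -> int:
    """Multiply two unsigned `width`-bit integers by summing shifted partial products."""
    product = 0
    for i in range(width):
        if (multiplier >> i) & 1:
            # Multiplier bit is 1: copy the multiplicand, shifted left to its bit weight.
            partial_product = multiplicand << i
        else:
            # Multiplier bit is 0: the partial product is 0.
            partial_product = 0
        # Accumulate each partial product as soon as it is formed.
        product += partial_product
    return product


def signed_multiply(a: int, b: int, width: int = 16) -> int:
    """Sign rule: like signs give a positive product, unlike signs a negative one."""
    negative = (a < 0) != (b < 0)
    magnitude = shift_and_add(abs(a), abs(b), width)
    return -magnitude if negative else magnitude


print(shift_and_add(13, 11))    # 143
print(signed_multiply(-7, 6))   # -42
```

The loop accumulates each partial product immediately, which mirrors the observation that the summation need not wait until all partial products are available.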

## III. MULTIPLIER ARCHITECTURES

## A. ARRAY MULTIPLIER

An array multiplier is a parallel multiplier [7] that performs the shifts and additions all at once. It is called an array multiplier because it consists of an array of adders. Like the basic binary multiplier, it uses shift-and-add operations, but it adds the partial products in parallel.

## B. WALLACE TREE MULTIPLIER

The Wallace tree multiplier is considerably faster than a simple array multiplier because its height is logarithmic in the word size, not linear. However, in addition to the large number of adders required, the Wallace tree's wiring is much less regular and more complicated. As a result, Wallace trees are often avoided by designers when design complexity is a concern.

Although Wallace tree multipliers are faster than the traditional carry-save method [5], they are very irregular and hence complicated to lay out. As the operand width grows beyond 32 bits, a large number of logic gates is required, and hence more interconnecting wires, which makes the chip design large and slows down the operating speed. A Spartan3 FPGA is used as the target device for implementation. Cadence's RTL Compiler is used for power and area estimation.


A general $M \times N$ parallel multiplier operates by computing the partial products in parallel and then shifting and accumulating them. Switching activity is poorly correlated with the input coefficient. Reducing the switching activity of the components used in the design reduces the power dissipation.

Among multipliers, tree multipliers are used in high-speed applications such as filters, but these require a large area. The carry-select-adder (CSA)-based radix multipliers, which have lower area overhead, employ a greater number of active transistors for the multiplication operation and hence consume more power. Among other multipliers, shift-and-add multipliers [2] have been used in many applications for their simplicity and relatively small area requirement.

The architecture of a conventional shift-and-add multiplier, which multiplies $A$ by $B$, is shown in Figure 3.2.

Figure 3.2: Basic Shift and Add Multiplier

There are six major sources of switching activity in the multiplier. These sources, which are marked with dashed ovals in the figure, are:

- Shifts of the B register,
- Activity in the counter,
- Activity in the adder,
- Switching between '0' and $A$ in the multiplexer,
- Activity in the mux-select controlled by $B(0)$, and
- Shifts of the partial product ($PP$) register.

By removing or minimizing any of these switching activity sources, we can lower the power consumption. Since some of the nodes have higher capacitance, reducing their switching will lead to more power reduction.
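To make these sources concrete, the following Python sketch counts bit toggles in a behavioural model of the conventional shift-and-add multiplier. It is a rough illustration under stated assumptions, not the power simulation used in this work: the function names and the dictionary keys are hypothetical, a toggle is counted as one bit changing value between consecutive cycles, and node capacitances are ignored.

```python
def toggles(old: int, new: int) -> int:
    """Number of bit positions that change between two register values."""
    return bin(old ^ new).count("1")


def conventional_activity(a: int, b: int, n: int = 16):
    """Behavioural model of a conventional n-bit shift-and-add multiplier.

    Returns the product and a rough toggle count for the B register,
    the binary counter, and the partial product (PP) register.
    """
    counts = {"b_register": 0, "counter": 0, "pp_register": 0, "mux_selects_a": 0}
    pp, b_reg, counter = 0, b, 0
    for _ in range(n):
        new_pp = pp
        if b_reg & 1:                 # B(0) drives the mux select
            new_pp += a << n          # add A to the upper half of PP
            counts["mux_selects_a"] += 1
        new_pp >>= 1                  # shift of the PP register
        counts["pp_register"] += toggles(pp, new_pp)
        pp = new_pp

        new_b = b_reg >> 1            # shift of the B register
        counts["b_register"] += toggles(b_reg, new_b)
        b_reg = new_b

        counts["counter"] += toggles(counter, counter + 1)  # binary counter
        counter += 1
    return pp, counts


product, activity = conventional_activity(1234, 5678)
print(product == 1234 * 5678, activity)
```

Even in this simplified model, the register shifts and the binary counter toggle on every cycle regardless of the operand values, which is why they are attractive targets for elimination.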

## C. PROPOSED ARCHITECTURE

In the proposed architecture, we concentrate on eliminating or reducing the sources of switching activity discussed above in order to derive a low-power architecture.
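A minimal behavioural sketch of these ideas is given below in Python. It is an assumption-laden illustration rather than the HDL of the proposed design: the function name `bzfad_multiply`, the variables `p_high`, `p_low` and `ring`, and the returned adder-use count are all hypothetical. The sketch keeps B fixed and selects its bits with a one-hot ring counter, feeds A directly to the adder, bypasses the adder when the selected bit is 0, and stores the low-order product bits in place instead of shifting them.

```python
def bzfad_multiply(a: int, b: int, n: int = 16):
    """Behavioural sketch of a BZ-FAD style multiply for unsigned n-bit operands.

    Returns (product, adder_uses), where adder_uses counts the cycles in
    which the adder is not bypassed.
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    p_high = 0      # upper part of the partial product (n + 1 bits with carry)
    p_low = 0       # low-order product bits, written once and never shifted
    ring = 1        # one-hot ring counter; selects the current bit of B
    adder_uses = 0
    for _ in range(n):
        if b & ring:            # B is not shifted; the ring counter picks the bit
            p_high += a         # A is fed directly to the adder
            adder_uses += 1
        # else: bit is 0, so the adder is bypassed and p_high is unchanged
        if p_high & 1:
            p_low |= ring       # store the finished low-order bit in place
        p_high >>= 1            # models the fixed (wired) one-bit alignment
        ring <<= 1              # advance the ring counter
    return (p_high << n) | p_low, adder_uses


product, adder_uses = bzfad_multiply(1234, 5678)
print(product == 1234 * 5678, adder_uses)
```

In this model the adder is used only once per 1 bit in B, so operands with many zero bits exercise the adder less often.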

## IV. RESULTS AND DISCUSSION

In this project, a low-power architecture for shift-and-add multipliers was proposed. The modifications to the conventional architecture included the removal of the shift of the $B$ register (in $A \times B$), direct feeding of $A$ to the adder, bypassing the adder whenever possible, use of a ring counter instead of the binary counter, and removal of the partial product shift. The results showed an average power reduction of $30 \%$ by the proposed architecture. We also compared our multiplier with SPST, a low-power tree-based array multiplier. The comparison showed that the power saving of BZ-FAD was only $6 \%$ lower than that of SPST, whereas the SPST area was five times higher than that of BZ-FAD. Thus, for applications where small area and high speed are important concerns, BZ-FAD is an excellent choice. Additionally, we proposed a low-power architecture for ring counters based on partitioning the counter into blocks of flip-flops that are clock gated with a special clock gating structure, the complexity of which is independent of the block sizes.


The simulation results show that, in comparison with the conventional architecture, the proposed architecture in 45 nm technology reduces the area by $65 \%$, with a power consumption of $97.24\ \mu\mathrm{W}$.

The low-power architecture avoids unwanted switching activity and thus minimizes the switching power dissipation. The conventional and proposed designs of the 16-bit multiplier are verified using Xilinx ISE 10.1 with VHDL code, and the power simulation is done using Cadence.

In this section, we present experimental results for the proposed ring counter and multiplier. We used Xilinx ISE 10.1 for synthesis and the Cadence RTL Compiler tool for the power simulation with the TSMC 45 nm CMOS technology. Since we have used a standard cell library for this technology, all pass-transistors have been replaced with buffers during the synthesis of the implementations.

## A. RING COUNTER

In Figure 4.1, we can see the power consumption of the conventional and Hot Block ring counters [4] of 16, 32, 48 and 64 bits. As seen in this diagram, the efficiency of the Hot Block architecture is more pronounced as the width of the ring counter increases; e.g., with a width of 64 bits, the conventional and Hot Block counters consume $1591\ \mu\mathrm{W}$ and $389\ \mu\mathrm{W}$ respectively. The maximum power reduction is achieved for a 64-bit ring counter with blocks of 4 flip-flops, where a power reduction of $75 \%$ is achieved. Now, we estimate the area overhead of the proposed ring counter. Note that the hot block clock gating structure (Figure 4.7) can be implemented using 18 transistors, which include 10 for the resettable latch, 4 for the multiplexer, and 4 for the NAND gate. As mentioned earlier, the multiplexers were implemented with transmission gates. Each flip-flop needs 18 transistors, and hence for a block size of $f$ ($f$ flip-flops) the area overhead of the hot block clock gating structure, in terms of the number of transistors, is

$$\text{Area overhead} = \frac{18}{18 f} = \frac{1}{f}.$$

Fig 4.1: (a) Power consumption of the conventional ring counter versus that of the Hot-Block ring counter with different block sizes (b) The area overhead for different block sizes.

The area overhead shown in Fig 4.1(b) depends on the block size: as the block size increases, the area overhead decreases. However, the larger the block size, the higher the power consumption. The critical path of the Hot Block architecture is the same as that of the conventional architecture except that the clock signal in the Hot Block passes through a NAND gate.
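The trade-off between block size, clocking activity and area overhead can be sketched as follows. This Python model is only illustrative and rests on assumptions not taken from the measurements: the function names are hypothetical, power is approximated by the number of flip-flops that receive a clock edge, and the next block is assumed to be clocked for one extra cycle so that it can receive the '1'. The transistor counts (18 per gating structure, 18 per flip-flop) are the ones stated in the text.

```python
def ring_counter_clock_events(width: int, block_size: int, cycles: int):
    """Count flip-flop clock events for a one-hot ring counter.

    Returns (conventional, gated). In the conventional counter every
    flip-flop is clocked each cycle; in the gated model only the block
    holding the '1' is clocked.
    """
    if width % block_size != 0:
        raise ValueError("width must be a multiple of block_size")
    conventional = width * cycles
    gated = 0
    hot = 0  # index of the flip-flop currently holding the '1'
    for _ in range(cycles):
        gated += block_size
        if hot % block_size == block_size - 1:
            # Assumption: the next block is also clocked to receive the '1'.
            gated += block_size
        hot = (hot + 1) % width
    return conventional, gated


def area_overhead(block_size: int, gate_transistors: int = 18, ff_transistors: int = 18) -> float:
    """Gating-structure transistors relative to the flip-flop transistors of one block."""
    return gate_transistors / (ff_transistors * block_size)


for f in (2, 4, 8, 16):
    conv, gated = ring_counter_clock_events(64, f, 64)
    print(f, conv, gated, area_overhead(f))
```

Under these assumptions, larger blocks clock more flip-flops per cycle while spreading the fixed cost of the gating structure over more flip-flops, which matches the qualitative trend described above.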

## B. MULTIPLIER

To evaluate the efficiency of the proposed architecture, we implemented three different Radix-2 16-bit multipliers corresponding to the conventional, BZ-FAD and SPST [6] architectures. The SPST (Spurious Power Suppression Technique) architecture is a very low-power tree-based array multiplier. In general, array multipliers offer high speed and low power consumption. However, they occupy a lot of silicon area. To determine the effectiveness of the power reduction techniques discussed, we report in Table 3 the switching activities of the major common blocks of the BZ-FAD and conventional multipliers. As an example, the adder in BZ-FAD has $38.16 \%$ less switching activity than that of the conventional architecture.

Table 3: Comparison of the transition count of the BZ-FAD and conventional multiplier for common components (When applying a subset with 100 operand pairs).

| Component | BZ-FAD | Conventional | Reduction |
| :--- | :--- | :--- | :--- |
| Lower-order partial products | 6564 (latches) | 82208 (register B) | $92.02 \%$ |
| Adder | 46301 | 74870 | $38.16 \%$ |
| Multiplexer | 56722 | 10013 (mux) | $-82.35 \%$ |
| Counter | 20965 | 22937 | $8.60 \%$ |
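The reduction column can be checked with a short calculation. The Python sketch below assumes that the reduction is defined as the difference in transition counts divided by the conventional count; the function name `reduction_percent` is hypothetical. Under this assumption the lower-order partial product, adder and counter rows reproduce the tabulated values, while the multiplexer row evaluates to roughly $-466 \%$; the tabulated $-82.35 \%$ instead matches the same difference divided by the BZ-FAD count, which suggests that row may be normalised differently.

```python
def reduction_percent(bz_fad: int, conventional: int) -> float:
    """Reduction in transition count relative to the conventional multiplier."""
    return 100.0 * (conventional - bz_fad) / conventional


# Transition counts taken from Table 3: (BZ-FAD, conventional).
rows = {
    "Lower-order partial products": (6564, 82208),
    "Adder": (46301, 74870),
    "Multiplexer": (56722, 10013),
    "Counter": (20965, 22937),
}

for name, (bz, conv) in rows.items():
    print(f"{name}: {reduction_percent(bz, conv):.2f}%")
```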

ISSN(Online): 2320-9801
ISSN (Print): 2320-9798

Fig 4.2: Comparison of the multipliers in terms of (a) power consumption (b) area.


Fig 4.3: Power reduction and area overhead of BZ-FAD and SPST in comparison with the conventional shift-and-add multiplier.

## V. CONCLUSIONS

The area overhead of the proposed architecture can be minimized by using a serial-to-parallel converter instead of the Plow latch, but with an area overhead slightly greater than that of the proposed architecture.


## REFERENCES

[1] M. Mottaghi-Dastjerdi, A. Afzali-Kusha, and M. Pedram, "BZ-FAD: A Low-Power Low-Area Multiplier Based on Shift-and-Add Architecture," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 2, February 2009.
[2] C. N. Marimuthu, P. Thangaraj, and Aswathy Ramesan, "Low Power Shift and Add Multiplier Design," International Journal of Computer Science and Information Technology, vol. 2, no. 3, June 2010.
[3] Moumita Ghosh, "Design and Implementation of Different Multipliers Using VHDL," Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela, 2007.
[4] Mohammad Dastjerdi-Mottaghi, Anahita Naghilou, and Masoud Daneshtalab, "Hot Block Ring Counter: A Low Power Synchronous Ring Counter," The 18th International Conference on Microelectronics (ICM), 2006.
[5] B. Parhami, "Computer Arithmetic Algorithms and Hardware Designs," 1st ed. Oxford, U.K.: Oxford Univ. Press, 2000.
[6] K.-H. Chen and Y.-S. Chu, "A low-power multiplier with the spurious power suppression technique," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 7, pp. 846-850, Jul. 2007.
[7] J. S. Wang, C. N. Kuo, and T. H. Yang, "Low-power fixed-width array multipliers," in Proc. IEEE Symp. Low Power Electron. Des., 2004, pp. 307-312.
[8] Charles H. Roth, "Digital System Design using VHDL," The University of Texas at Austin.

