# Low power and area efficient design of FIR filter using enhanced clock gating technique

DOI:10.36909/jer.11307

# L Mohana Kannan<sup>\*</sup>, D Deepa<sup>\*\*</sup>

\*Department of Electronics and Communication Engineering, RVS Technical Campus (Anna University) Coimbatore, India.

\*\*Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology Coimbatore, India.

<sup>\*</sup>Email: <u>mohancalls2018@gmail.com</u>

## ABSTRACT

The aim of this work is to improve the filter design model for optimal circuit design, and in particular to improve the area, power, and delay of the VLSI circuit. Filters are now widely used in DSP, medical diagnosis, and arithmetic computation, and the FIR filter plays an important role in digital signal processing and communication applications. A Finite Impulse Response (FIR) filter is built from adders, multipliers, subtraction units, transfer functions, and delay elements. VLSI circuits are used in many applications, but the adders and multipliers dominate the design space and increase both area and delay. The goal is therefore to reduce the number of adders and multipliers through suitable computational algorithms. Existing work uses a carry save accumulator with a ripple carry adder and a binary multiplier. The proposed method improves on this with enhanced Vedic multiplication logic and improved carry look-ahead adder logic. In the Vedic multiplication algorithm, the amount of adder logic is minimized by using speculative Brent-Kung adder logic. The carry look-ahead adder (CLA), the fastest adder in VLSI circuits, is further improved to reduce power consumption and delay. Power is optimized using an enhanced clock gating technique. Area, power, and delay are measured and compared with a conventional FIR filter design, and the proposed method improves all three. The complete FIR filter structure is designed and power-optimized by connecting it to the enhanced clock gating circuit. The proposed design is simulated and synthesized using Xilinx ISE 14.5 and ModelSim.

**Keywords**: FIR filter structure, Enhanced Vedic multiplication logic, Improved carry look ahead adder, Enhanced clock gating technique, latch selection by active clock, usage of sub adder blocks.

## INTRODUCTION

VLSI circuits are now used in many real-time applications such as medical research, signal processing, and image processing. Filters are a widely used module in image processing and DSP applications. A digital filter is designed by calculating the input response, impulse response, and output response; the design then realizes the filter structure and accounts for finite word length effects in implementation. The design is characterized by specific parameters: filter type, design methodology, filter order, and frequency range. There are two types of filter, IIR and FIR. FIR filters are stable and are widely used in applications such as wireless communications and audio, image, and video processing. A gate-level FIR filter uses a genotype gate array of logic cells. Finite impulse response filters are used in the medical industry for image processing and filtering applications. Genetic algorithms are also used in gate-level implementations to mutate the genes in genotype cells, and the filter is designed with effective logic cells based on a probability function. In this design, the adders, multipliers, and delay elements play a vital role, and a large number of adders makes a VLSI circuit inefficient. FIR filters have been designed with the help of the EDBNS system [1], where multiple constant multiplication (MCM) is used with a power-of-b generator and a power-of-b selector. A parallel linear-phase FIR filter designed using a fast FIR algorithm with an odd-length approach is given in [2]. The parallel structure reduces delay and power consumption. In this design, the CSA is applied to all the logic blocks used in the VLSI circuit. Selecting efficient adder and multiplier blocks increases the system's efficiency. Jeong-Ho Han et al. [3] modelled the FIR filter design using a synthesis algorithm that maps the coefficient graph. This dependency graph algorithm reduces the delay by considering the coefficient values of each stage. Multiple-driven techniques are used for reconfigurable FIR filters in [4]. Here a 28 nm FD-SOI test chip is used to test the CMOS circuits and measures the threshold value near a supply voltage of 600 mV. This work uses two multiplier topologies: radix-2 Baugh-Wooley (BW2) and radix-4 Booth-recoded (BR4). The MAC-based FIR filter uses a power optimization technique. Digital multipliers are designed in various topologies depending on the representation and requirements. A vector-merging adder (VMA) is used in both the BW2 and BR4 multipliers. The FIR filter designed by BK Mohanty and PK Meher [5] is used for both reconfigurable and fixed applications. The transpose form of the FIR filter is constructed using a pipelined structure and the MCM technique. The transpose form does not directly support the filter and its coefficient selection unit; therefore, the design adopts the direct-form FIR filter. Performance is reported in terms of area-delay product and energy per sample. Many applications prefer a large-order FIR filter to meet the desired performance over a given frequency range. Adder circuits play an important role in the FIR filter structure because they consume more area, power, and delay.

The objective is to reduce these factors through advanced techniques and algorithms in VLSI design, since an optimized circuit consumes less power, area, and delay and therefore makes the system more efficient. The Booth multiplication technique is used in a MAC-based FIR filter structure [6], where the adder block uses the shift-and-add method to reduce the number of stages needed to calculate the result. A low-power MAC design is employed for FIR filter design in [7]. Optimized adder logic for the FIR filter is designed in [8]. A clock gating technique with an AND logic implementation is presented in [9], and stochastic computing in FIR filters is presented in [10].

The FIR filter is a linear time-invariant system. The input p(n) and the output q(n) are related through the impulse response l(n), and the filter is expressed by the convolution

$$q(n) = p(n) * l(n) \tag{1}$$
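As an illustration of equation (1), the following minimal Python sketch computes the linear convolution of a finite input sequence with a finite impulse response. The function name `convolve` and the sample values are assumptions chosen for the example and do not come from the proposed design.

```python
def convolve(p, l):
    """Linear convolution q(n) = sum_k l(k) * p(n - k) of two finite sequences."""
    q = [0] * (len(p) + len(l) - 1)
    for n in range(len(q)):
        for k in range(len(l)):
            # Only accumulate terms whose input index falls inside p.
            if 0 <= n - k < len(p):
                q[n] += l[k] * p[n - k]
    return q


# Hypothetical input samples and a two-tap impulse response.
q = convolve([1, 2, 3], [1, 1])
print(q)
```

Each output sample is a sum of products, which is why the multiplier and adder blocks dominate the cost of a hardware FIR filter.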

The FIR filter output is realized from the input samples as given below.

$$y(t) = \sum_{k=0}^{N-1} u(k) \cdot x(k-t) \tag{2}$$

where u(k) is the impulse response with weight index k. The speed of the FIR filter depends on the multiplier logic, and the number of adders used determines the performance of the multiplier and of the filter. Carry look-ahead logic is high-speed adder logic that improves adder performance by computing the carry bits in advance [11]-[13]. Various parallel prefix adders are reviewed in [14], [15]. The parallel prefix adder is used for error detection and correction in [16]. A carry speculative adder with a VLSI design model is presented in [17], [18]. For the multiplier block, Vedic multiplication logic is used because it reduces the partial products in the processing flow. The Urdhva Tiryagbhyam sutra is the most commonly used technique for designing an efficient Vedic multiplier architecture [19]-[25]. Power dissipation is conventionally reduced using clock gating techniques: based on the clock and enable inputs to the latch circuit, the gated clock reduces the power dissipation at that point. The flexibility of an ASIC design is improved by adding efficient adders and multipliers. VLSI circuits are implemented on FPGAs to broaden their applications, and both gate-level and behavioural-level descriptions are used to design them. This work focuses on improving performance by reducing the power and area of the circuit, using an efficient adder and multiplier along with enhanced clock gating circuits. The carry look-ahead adder is built with partial full adders (PFA) to improve performance and reduce delay. The Vedic multiplier is designed to operate with speculative Brent-Kung adder logic, which reduces the number of stages needed to complete the operation. The proposed design is therefore intended to provide better results.

In previous work, the CSA is designed with RCA logic. CMOS adder logic is studied and the adder with the better analysis is selected by comparison [26]-[29]. A recoding process is used with the Booth multiplier, but it increases the area because of the number of selection steps. A floating-point multiplier is designed with the Dadda algorithm, which uses the maximum reference point for floating-point multiplication. The FIR filter coefficients are multiplied with the sample data in the multiplication block and the products are summed in the adder unit. Finally, the DFF provides the registered data as output. A Han-Carlson adder implementation is presented in [30].

The rest of the paper is organized as follows. Section II reviews related work. Section III describes the existing method, which uses the traditional clock gating technique and RCA-based CSA logic. Section IV presents the proposed methodology and design models and describes the enhancements and advantages of this work. Section V presents the results of the proposed work and compares them with the existing method. Section VI concludes with the advantages of the proposed VLSI design and discusses future work.

## LITERATURE SURVEY

Jiajia Chen et al. (2015) presented a design algorithm for FIR filters using EDBNS. The programmable FIR filter is represented in the extended double base number system (EDBNS). A common subexpression block is needed to use this memory unit. Direct mapping is used with the EDBNS to shift the partial products, which improves the performance of the system, and the mapping technique reduces the logical complexity of the path delay. A quasi-minimum EDBNS is used in the coefficient register block to map the ports and balance the weights. The MCM block searches for the minimum-value representations in the EDBNS. The power-of-b set, with its selector and generator, assigns the coefficients to the FIR filter before the multiplications are performed, followed by the shifting and delay operations. Results are evaluated for 8-, 12-, and 16-bit coefficients.

Yu Chi et al. (2012) proposed a VLSI implementation of an FIR filter using an odd-length-based algorithm. An area-efficient FIR filter is designed for DSP using convolution techniques. The post-processing block performs the addition, which increases the performance of the sub-filtering module. A 3x3 FFA algorithm is designed to set the odd length of the filter, and the parallel structure enables the subfilters to be shared. Exchanging multipliers for adders improves the circuit because a multiplier costs more silicon area than an adder. Symmetric convolution is used in the FIR structure with a three-parallel 591-tap filter block. The FFA and the proposed system are compared for filters with various numbers of taps, and area, power, and path delay are measured to obtain the performance results.

Jeong-Ho Han et al. (2008) designed an FIR filter modelled using a nondeterministic synthesis algorithm. The coefficients are represented by an adder graph, and the dependence graph algorithm implements the coefficients using a reordering technique. The temporary set and the graph model of the filter coefficients determine the adder cost. Multiple adder graphs are generated from the filter coefficients, which reduces the computation overhead. Initially, the filter coefficients are arranged in increasing order by the synthesis algorithm, which selects the distance in the graph model. The BHM algorithm is used to reduce the adder cost and update the sum value. The hybrid dependence graph improves the performance of the FIR filter on the coefficient registers.

Andrea Bonetti et al. (2017) presented a reconfigurable FIR filter design based on perturbation of the filter coefficients with a multiple-driven coefficient model. They propose to reduce power through approximate computing techniques, and the design is suited to IoT applications. The coefficients of the baseline filters are perturbed, which reduces the trade-off between the power factors. A 28 nm technology is used to address the threshold model, and the filter coefficients are chosen by perturbation configurations. A radix-4 Booth multiplier is used in the filter block with a carry save adder, and the Baugh-Wooley multiplier operates on the nonzero partial product operands. The filtering quality is improved by reducing the overhead in the FIR accelerator mode of operation. The magnitude response is calculated over the frequency range of the FIR filter for the baseline, optimized, and intermediate filters.

Kazi J Ahmed et al. (2018) presented an FIR filter design using stochastic computing (SC), based on a probability model. The error is reduced by a large SC probability factor. Numerical analysis is performed to generate the reordering parameters. The SC model is computed with a lower radix value and produces higher generated values. A fast stochastic system for the FIR filter is computed to reduce the number of stages using a probability function. A 12-tap band-pass FIR filter is implemented to construct the 8-band system. The reordering scheme swaps the parameters to obtain the higher-order values.

Pravin Y Kadul et al. (2014) described an FIR filter design model with multiple adder and multiplier blocks. A MAC-unit-based FIR filter is constructed on a Xilinx FPGA. The Vedic multiplier block uses the linear device to reduce the partial products, and the CSA avoids errors from the addition block. Switching power dissipation is reduced by speeding up the FIR filter. The implementation is analyzed using timing, power, and resource utilization reports. A signed array multiplier and a Booth multiplier with and without DPDT are analyzed to determine the performance of the system.

N C Sendhilkumar (2017) proposed a digital FIR filter design using an enhanced Wallace tree multiplication approach with CLA logic. Here the MAC unit is employed to operate with the FIR filter. A 16-bit CLA is designed with a reduced number of multiplication blocks, which use the Wallace tree multiplier. OR gates are connected in sequence to generate the carry value from the adder block. The parallel multiplier is designed with full adder blocks that operate on the partial products. Each multiplier block is assigned 8-bit coefficients, which reduces the complexity of assigning bits to the nodes.

Swetha Kumari et al. (2014) proposed an FIR filter design model using a MAC unit, which reduces power dissipation. A latch-based clock gating technique is employed to optimize the power dissipation of the FIR filter structure. The MAC unit uses a pipelined architecture to reduce power dissipation, and glitches are reduced with the clock gating circuit. The digital FIR filter is implemented on Spartan 3E and Cyclone FPGAs. Results are analyzed for the original FIR filter, the latch-based model, the pipelined structure, and the MAC-based filter.

T Radha and M Velmurugan (2015) proposed an FIR filter design model with optimized adder and multiplier circuits. Hybrid adders and multipliers are employed for efficient FIR filtering. The adders operate with the carry input and the multiplication is performed hierarchically. The number of taps selected for the FIR filter determines the stop-band attenuation. A windowing function is used for the frequency response analysis, and the PDP is used to optimize the power and delay.

A Ranga N et al. (2016) presented a clock gating technique for reducing power dissipation in various VLSI designs. The technique uses a sub-word-based signal range matching approach. A SPICE power analyzer is used to implement the logic, which is applicable to various VLSI circuits. Most low-power VLSI technology uses clock gating. In real time, signal correlation is performed in the FSM to realize the performance. In the results section, a power analysis report for the various techniques is given to compare their performance in FPGA-level implementations.

Bhawana Datwan and Himanshu Joshi (2016) proposed an FIR design with high speed and low area. The approach is implemented for DSP applications and optimizes the circuit to reduce area utilization. The FIR filter is constructed using RCA, CSA, and CLA adders, and the multiplier block uses array, Wallace tree, and radix-2 Booth multipliers. The multiplier and multiplicand generate the partial products through the Booth encoder, which are then reduced by the compressor blocks of the Wallace structure. Finally, the CLA produces the output. Area and delay are reduced in the radix-4 Booth multiplier logic. Both radix-2 and radix-4 multipliers are implemented with the Wallace structure.

Rose Ann Mathew et al. (2016) presented a study of the Vedic multiplier using different adder circuits. 2-bit and 4-bit multipliers are designed to study the performance of Vedic multiplication based on the Urdhva Tiryagbhyam sutra. The partial products are reduced compared with other methods, and the multiplier's speed is calculated to improve the VLSI design. The most popular sutras in Vedic mathematics are Urdhva Tiryagbhyam and Nikhilam. The Vedic multiplier uses half adder logic after generating the partial products. The authors carry out various analyses, and a BEC-based CSA is designed to optimize the result.

#### Journal of Engg. Research Online First Article

Subha Jeyamala and Aswathy B.S (2016) proposed the Han-Carlson method and its enhancement for adder blocks in VLSI circuits. 16x16 parallel prefix adders are designed with carry generate and carry propagate logic. The parallel prefix adder is compared with a simple arithmetic adder, and the proposed adder logic provides fast carry generation. The logic levels of the Kogge-Stone adder are structured as a prefix-adder tree. The Han-Carlson and Brent-Kung adders are designed to compare performance and the error correction stage. The prefix processing stage of the adder is used for error detection and correction, which increases the fan-out. The Kogge-Stone adder is implemented on a Spartan 3E. Compared with the RCA, the Kogge-Stone and Han-Carlson adders achieve improved results.

B Naga Jyothi et al. (2016) proposed a prefix adder for error detection and correction in VLSI circuits. The parallel prefix adder is subdivided into sections that perform the different operations. This adder logic reduces design complexity and delay. 8-bit and 16-bit adders are designed for these implementations. The variable latency adder improves the overall performance of the parallel prefix logic. The pre-processing stage generates and propagates the carry bits, which are computed in different stages, and the post-processing stage computes the sum.

## EXISTING METHOD

Based on the survey above, the FIR filter has been designed with various techniques, and implementations have also been done on FPGA modules. In previous work, an RCA-based carry save accumulator is designed for the adder logic of the FIR filter, with full adders used as sub-blocks in most of the adder logic. This section describes the RCA-based CSA and the clock gating technique for power optimization.

## A. RCA based CSA

The carry save accumulator is arithmetic adder logic used for addition in VLSI circuits. It uses ripple carry addition in the middle stage of processing. The FIR structure uses RCA-based CSA logic to improve performance, although parallel prefix adders are now used to improve efficiency. The adder adds two numbers bit by bit from the least significant bit to the most significant bit. The RCA ripples the carry bit to the CSA block for the addition: a set of bits is taken as input, the carry is sent through the RCA, and finally the sum is produced by the CSA. Hybrid adder logic gives a better outcome, but its speed is low compared with other adders, and hybrid and prefix adder designs increase the area. Carry-based adder performance needs further review for application suitability. The novel technique is therefore designed with power, area, delay, and the overall FIR result in mind.



Fig1. Conventional RCA based CSA block

The existing RCA-based CSA block is shown in Fig. 1. The output for two bits in the same position is the sequential sum of the partial sum bits. The DFF is used as an input register and stores the carry bits sequentially. This digital adder has a ladder of sub-adder blocks; the RCA is evaluated for a carry of 0 or 1 and the result is sent to the MUX block. It takes its input from the D-FF, performs the RCA to generate and propagate the carry bit, and then performs the CSA operation. The proposed method improves on the result of this logic.
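The interplay between ripple carry and carry save addition can be sketched in Python at the bit level. This is a minimal behavioural model under stated assumptions, not the circuit of Fig. 1: the function names, the 16-bit width, and the choice to resolve the carry-save pair with a single final ripple carry addition are all assumptions made for illustration.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, width):
    """Add two width-bit numbers by rippling the carry from LSB to MSB."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


def carry_save_step(x, y, z, width):
    """Reduce three operands to a sum vector and a carry vector (no carry ripple)."""
    mask = (1 << width) - 1
    sum_vec = (x ^ y ^ z) & mask
    carry_vec = (((x & y) | (y & z) | (x & z)) << 1) & mask
    return sum_vec, carry_vec


def accumulate(values, width=16):
    """Accumulate values in carry-save form, then resolve once with the RCA."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_step(s, c, v, width)
    total, _ = ripple_carry_add(s, c, width)
    return total


print(accumulate([3, 5, 7, 100]))
```

The carry only ripples once, in the final addition, which is the usual motivation for keeping intermediate results in carry-save form.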

## B. Power dissipation reduction in Clock gating

Clock gating is a power optimization technique used in many VLSI circuits. The clock is enabled through an interconnected gating circuit to reduce power dissipation during operation. The gated clock of the D-FF is connected to the controlling unit, which has enable and clock terminals.



Fig2. Traditional Clock gating circuit

The clock gating circuit is shown in Fig. 2. This integrated clock gating circuit avoids a large number of MUXes and provides gated clock logic that reduces design complexity. After receiving the enable signal from the controlling block, the D-FF provides the gated signal. If enable is high, the gated clock equals the clock. The DFF is triggered on the positive edge of the clock and changes state based on the enable signal of the controlling block. The controlling block uses an OR gate, an AND gate, or other logic. In the existing work, the FIR filter circuit reduces power dissipation through the interconnected gated clock.

## PROPOSED METHODOLOGY

The proposed FIR filter design uses enhanced clock gating circuits to reduce power dissipation. The CLA logic block, the fastest adder in synchronous circuits, is enhanced by adding partial full adder (PFA) blocks, which makes the system efficient and improves its reliability. The predefined filter coefficient values are assigned in the RTL code. A 16-bit FIR filter is designed for comparison with the previous work. In the proposed method, the enhanced Vedic multiplier is improved by adding speculative Brent-Kung adder logic, and the adder block of the FIR filter is designed with PFA-based CLA logic. Overall, the proposed FIR filter structure is enhanced by reducing power dissipation through the enhanced clock gating circuit. The RTL view of the FIR filter with its terminals is given in Fig. 3.



Fig3. RTL block of proposed 16-bit FIR filter design

Here the FIR filter is constructed with a 16-bit order block and is controlled by the clock and enable signals of the enhanced clock gating technique. The gated clock reduces power dissipation, and the filter output is labelled 'dataout'. Between these, the enhanced Vedic multiplier and the PFA-based CLA perform the FIR filter operation. The internal block of the proposed FIR filter is designed with the enhanced adders and multiplier and is improved using the adaptive clock gating technique (see Fig. 4).



Fig4. Internal structure of proposed FIR filter using Enhanced clock gating technique

The FIR design is checked with random data, and the adder and multiplier logic in the FIR filter determine the overall result. With the Brent-Kung adder and the speculative adder, the Vedic multiplier improves performance by removing much of the sub-adder logic. The CLA also uses the PFA for effective design utilization. The RTL technology schematic shows the internal blocks of the FIR design and its processing flow. Compared with the existing FIR filter design, the proposed 16-bit FIR filter using the E-CGT technique improves area utilization, power consumption, and delay.

## C. FIR Filter structure

The FIR filter is designed for 16x16-bit processing and is enhanced with the additional enhanced clock gating block. The input, the impulse response, and the clock signal enable the FIR filter to operate. The gated clock signal and the filter output are the secondary block terminals. The direct-form FIR filter is constructed as a straightforward structure. The transfer function of the FIR filter is given as,

$$H(z) = k \sum_{n=0}^{N} h(n) z^{-n} \tag{3}$$

The gain k is multiplied by the sum, for n from 0 to N, of the products of the impulse response (filter coefficient) values h(n) and the delay terms. The 16-bit direct-form FIR filter uses the enhanced Vedic multiplier and PFA-based CLA logic, and its structure is given in Fig. 5.



Fig5. Direct form FIR filter structure

Here the 16<sup>th</sup>-order FIR filter is constructed from a number of multiplier and adder blocks, which occupy a large area and introduce a large delay. To address this, the proposed work uses the PFA-based CLA for the adder block and the enhanced Vedic multiplier for the multiplication block. The enhanced multiplication block contains a number of adders, which use the speculative Brent-Kung adder. The input alignment of the linear-phase FIR filter is given as,

$$y(n) = k_0 x_0(n) + k_1 x_1(n) + k_2 x_2(n) + \dots + k_{15} x_{15}(n) \tag{4}$$
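Equation (4) can be illustrated with a small behavioural model of a 16-tap direct-form filter. The sketch below assumes that the i-th term of equation (4) is the input delayed by i samples; the class name `DirectFormFIR` and the coefficient values 1 to 16 are hypothetical and chosen only so that the impulse response is easy to check.

```python
from collections import deque


class DirectFormFIR:
    """Direct-form FIR filter: a tapped delay line followed by multiply-accumulate."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        # Delay line holding the most recent samples, newest first.
        self.delay = deque([0] * len(self.coeffs), maxlen=len(self.coeffs))

    def step(self, sample):
        """Shift in one sample and return one output sample."""
        self.delay.appendleft(sample)  # the oldest sample drops off the far end
        return sum(k * x for k, x in zip(self.coeffs, self.delay))


# Hypothetical coefficients k0..k15; a unit impulse should read them back in order.
coeffs = list(range(1, 17))
fir = DirectFormFIR(coeffs)
impulse = [1] + [0] * 15
response = [fir.step(x) for x in impulse]
print(response)
```

Each call to `step` performs 16 multiplications and 15 additions, which is the workload that the multiplier and adder blocks of the hardware design have to carry.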

The zeros and poles are constructed from complex values. The linear-phase structure is simplified by the multiplication block, since it removes the unwanted processing stages. The frequency response is given as,

$$H(f) = \sum_{n=0}^{N-1} h(n) e^{-ifn} \tag{5}$$

The frequency 'f' is a function of the sum of 'k' and 'a'. The sampled frequency response can be expressed as,

$$H(kt+at) = H\left(\frac{2\pi}{N}(kt+at)\right) \tag{6}$$

From equation (6), equation (5) is expressed as,

$$H(kt+at) = \sum_{n=0}^{N-1} h(n) e^{-\frac{i2\pi(kt+at)n}{N}} \tag{7}$$

where k = 0 to N-1. The impulse response is recovered from these frequency samples as,

$$h(n) = \frac{1}{N} \sum_{k=0}^{N-1} H(kt + at) e^{\frac{i2\pi(kt + at)n}{N}} \tag{8}$$

By taking the z-transform, the above equation can be derived as follows,

$$H(z) = \sum_{n=0}^{N-1} \left[ \frac{1}{N} \sum_{k=0}^{N-1} H(kt + at) e^{\frac{i2\pi(kt+at)n}{N}} \right] z^{-n} \tag{9}$$

$$H(z) = \sum_{k=0}^{N-1} H(kt + at) \left[\frac{1}{N} \sum_{n=0}^{N-1} \left(e^{\frac{i2\pi(kt+at)}{N}} z^{-1}\right)^n\right] \tag{10}$$

The frequency sampling technique realizes the desired response of the FIR filter, modelled by the impulse response h(n). The positive coefficients contribute to the zeros of the system and the negative-order filter coefficients contribute to the poles. The filter order n corresponds to a single delayed response of the tap coefficients (k-a). The D-FF is used in the register block and assigns the data bits to the filter.
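The frequency sampling relations in equations (7) and (8) can be checked numerically. The sketch below is an illustration under assumptions: it reads the sampling index written as kt+at in the text as k plus a fixed offset `alpha`, treats equation (8) as the inverse relation that recovers the impulse response from the frequency samples, and uses hypothetical function names and sample values.

```python
import cmath


def samples_from_impulse(h, alpha=0.0):
    """Frequency samples H(k + alpha) of an N-point impulse response h(n)."""
    N = len(h)
    return [
        sum(h[n] * cmath.exp(-2j * cmath.pi * (k + alpha) * n / N) for n in range(N))
        for k in range(N)
    ]


def impulse_from_samples(H, alpha=0.0):
    """Recover h(n) from N frequency samples taken at k + alpha."""
    N = len(H)
    return [
        sum(H[k] * cmath.exp(2j * cmath.pi * (k + alpha) * n / N) for k in range(N)) / N
        for n in range(N)
    ]


# Hypothetical 4-tap impulse response, sampled with a half-bin offset.
h = [1.0, 2.0, 3.0, 4.0]
H = samples_from_impulse(h, alpha=0.5)
h_back = impulse_from_samples(H, alpha=0.5)
print([round(v.real, 6) for v in h_back])
```

The round trip returns the original taps for any offset, because the N sampled exponentials remain orthogonal when shifted by a common `alpha`.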

## D. Enhanced Vedic Multiplier

The Vedic multiplier is based on Vedic mathematics and is implemented using the Urdhva Tiryagbhyam sutra. The enhanced Vedic multiplier uses speculative Brent-Kung adder logic. The first input bits are processed directly by the multiplier block, and the remaining bits are delayed before entering it. After the multiplication, the adder block is used to produce the output of the FIR filter. In the enhanced Vedic multiplier, the 8-bit data is processed in two sets of 8-bit Vedic blocks, followed by the speculative Brent-Kung adder logic. The RTL schematic of the proposed speculative Brent-Kung adder-based Vedic multiplier is given in Fig. 6.



Fig6. RTL view of Enhanced Vedic Multiplier

As Fig. 6 shows, the enhanced Vedic multiplier operates together with the speculative Brent-Kung adder logic. The Vedic multiplier sub-module takes two 8-bit data sequences and processes them with the adder block, which uses the speculative Brent-Kung adder. The block labelled "Madd" is the proposed adder, which is connected to the Vedic multiplier sub-block.

a) Vedic Multiplication Technique

Vedic mathematics is an ancient technique used for many complex calculations. Traditional Vedic multiplication is applied in many VLSI circuits and mathematical calculations. The Urdhva Tiryagbhyam sutra is used to construct the Vedic multiplier. The process runs from left to right: vertical multiplication is done for the first digits, followed by crosswise multiplication, and finally vertical multiplication for the last digits. In between, the crosswise products of the middle digits are multiplied and added. The Vedic multiplication process is built on the 16 sutras.



Fig7. Block diagram for vedic multiplier using S-BK adder

The process flow of the Vedic multiplier using speculative Brent-Kung adder logic is given in Fig. 7. The Urdhva Tiryagbhyam sutra is more efficient than the other sutras and is the main one used in this process, in which multiplication is performed vertically and crosswise. This reduces the partial products and so improves performance by reducing area and delay. Overall, the enhanced Vedic multiplier is faster than the other multipliers.
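The vertical and crosswise scheme can be sketched for binary operands as follows. This is a minimal software model, assuming the usual column-wise reading of the Urdhva Tiryagbhyam sutra in which column j collects every product of bit i of one operand with bit j-i of the other; the function name and the default 8-bit width are assumptions, and the carry handling here is a plain sequential sum rather than the speculative Brent-Kung adder of the proposed design.

```python
def urdhva_multiply(a, b, width=8):
    """Multiply two width-bit numbers with vertical and crosswise column sums."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]

    # Column j holds the count of one-bit products a_i * b_(j - i).
    # The first and last columns are the purely "vertical" products.
    columns = [0] * (2 * width - 1)
    for i in range(width):
        for j in range(width):
            columns[i + j] += a_bits[i] & b_bits[j]

    # Resolve the columns from least to most significant, passing carries along.
    result, carry = 0, 0
    for pos, col in enumerate(columns):
        total = col + carry
        result |= (total & 1) << pos
        carry = total >> 1
    result |= carry << len(columns)
    return result


print(urdhva_multiply(13, 11, width=4))
```

All column products can be formed in parallel in hardware, so the remaining cost is the addition of the column sums, which is where the choice of adder matters.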

b) Speculative Brent-Kung Adder logic

The parallel prefix adder used here is a Brent-Kung adder with a speculative stage. The Brent-Kung adder has fewer intermediate-stage connections than other adders, so its wiring is less, its cost is lower, and it occupies less area. The Brent-Kung adder minimizes the delay of the Vedic multiplication process. The speculative stage targets the timing (speed) of the adder logic. The Brent-Kung adder thus gives good results in area, cost, and delay.

Here the speculative based adder enables the accuracy control to the processor. The RTL technological schematic view of speculative Brent Kung adder is shown in the fig8.



Fig8. Technological schematic view of speculative Brent-Kung adder

The speculation process is used to detect and correct errors while the operation is performed, and the Brent-Kung adder uses the speculation stage for this purpose. With the speculation stage, the path delay is improved to $O(\log n)$.
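To make the prefix computation and the speculation idea concrete, the sketch below models both in software. It is an illustration under stated assumptions, not the proposed circuit: the function names, the `window` parameter, and the choice to speculate each carry from a fixed number of lower bits are all hypothetical. The error flag is raised when a speculated carry differs from the exact Brent-Kung carry, and the returned sum always uses the exact carries.

```python
def brent_kung_carries(g, p):
    """Carry out of every bit position (carry-in 0) via a Brent-Kung network."""
    n = len(g)
    assert n > 0 and (n & (n - 1)) == 0, "width must be a power of two"
    G, P = list(g), list(p)
    d = 1
    while d < n:                              # up-sweep: build group terms
        for i in range(2 * d - 1, n, 2 * d):
            G[i] |= P[i] & G[i - d]
            P[i] &= P[i - d]
        d *= 2
    d = n // 4
    while d >= 1:                             # down-sweep: fill in the rest
        for i in range(3 * d - 1, n, 2 * d):
            G[i] |= P[i] & G[i - d]
            P[i] &= P[i - d]
        d //= 2
    return G


def speculative_carries(g, p, window):
    """Guess each carry from only the `window` bits at and below it."""
    guesses = []
    for i in range(len(g)):
        c = 0
        for j in range(max(0, i - window + 1), i + 1):
            c = g[j] | (p[j] & c)
        guesses.append(c)
    return guesses


def speculative_bk_add(a, b, width=8, window=4):
    """Return (exact sum, True if the speculative carries were wrong)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    exact = brent_kung_carries(g, p)
    error = speculative_carries(g, p, window) != exact
    carry_in = [0] + exact[:-1]
    total = sum((p[i] ^ carry_in[i]) << i for i in range(width))
    total |= exact[-1] << width               # carry out of the top bit
    return total, error


print(speculative_bk_add(15, 1))    # (16, False)
print(speculative_bk_add(127, 1))   # (128, True)
```

In the second call a carry has to travel through six propagate positions, which is longer than the assumed four-bit window, so the speculation is flagged as wrong and the exact carries are used instead.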

## E. PFA based Carry Lookahead Adder

The CLA is one of the fastest adders used in synchronous circuits. Here the FIR filter is designed with the PFA based CLA logic at the end of the process (see fig9). The partial full adder (PFA) logic is used to reduce the delay.



Fig9. RTL schematic view of PFA based CLA logic

In this adder, the carry propagation and carry generation signals of the CLA are given by the following expressions.

$$P_i = a_i \oplus b_i \tag{11}$$

$$G_i = a_i \cdot b_i \tag{12}$$

The PFA supplies the generate "Gi" and propagate "Pi" signals to the CLA network. After the PFA stage, the lookahead modules of the adder compute the carry bits from these signals.
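The role of the propagate and generate signals can be sketched as follows. This is a minimal software model, assuming the standard definitions $P_i = a_i \oplus b_i$ and $G_i = a_i \cdot b_i$ and the usual expanded carry expression $c_{i+1} = G_i + P_i G_{i-1} + \dots + P_i \cdots P_0 c_0$; the function names and the default `width` are hypothetical.

```python
def partial_full_adder(a_bit, b_bit):
    """Return the propagate and generate signals for one bit position."""
    return a_bit ^ b_bit, a_bit & b_bit


def lookahead_carries(p, g, c0=0):
    """Compute every carry directly from P, G and c0, without rippling."""
    carries = [c0]
    for i in range(len(p)):
        carry = 0
        prop = 1                      # running product P_i ... P_(k+1)
        for k in range(i, -1, -1):
            carry |= g[k] & prop      # term G_k * P_(k+1) ... P_i
            prop &= p[k]
        carry |= c0 & prop            # term c0 * P_0 ... P_i
        carries.append(carry)
    return carries


def cla_add(a, b, width=16, c0=0):
    """Add two unsigned `width`-bit integers with lookahead carries."""
    signals = [partial_full_adder((a >> i) & 1, (b >> i) & 1)
               for i in range(width)]
    p = [s[0] for s in signals]
    g = [s[1] for s in signals]
    carries = lookahead_carries(p, g, c0)
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total | (carries[width] << width)


print(cla_add(12345, 54321))  # 66666
```

Each carry depends only on the P and G signals and the input carry, so in hardware all carries can be formed in parallel once the PFA outputs are available.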

## F. Enhanced Clock-gating technique

Clock gating is used to reduce power in many synchronous circuits. Most VLSI logic circuits use this approach to improve performance by reducing power dissipation. In this work, the adaptive clock gating technique helps to reduce the power that is dissipated as heat. The functional block of the enhanced clock gating circuit is shown in fig10.



Fig10. Enhanced clock gating technique

Power reduction is carried out effectively by the enhanced clock gating technique. The D-latch operates according to the enable port. The clock is inverted before it is applied to the latch block, while the non-inverted clock is fed directly to the OR gate. The output of the latch is inverted and applied to the other input of the OR gate. The resulting gated clock signal drives the FIR filter design and reduces its power, which improves efficiency.
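The behaviour of the latch and OR gate arrangement described above can be checked with a small sample-based model. The sketch assumes that the latch is transparent while the inverted clock is high (that is, while the clock is low) and ignores gate delays and glitches; the function names and the example waveforms are hypothetical.

```python
def gated_clock(clk, enable):
    """Model a latch + OR clock gate on sampled 0/1 waveforms."""
    q = 0                          # latch state
    out = []
    for c, en in zip(clk, enable):
        if c == 0:                 # inverted clock high: latch is transparent
            q = en
        out.append(c | (1 - q))    # OR of the clock and the inverted latch output
    return out


def rising_edges(wave):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)


clk = [0, 1, 0, 1, 0, 1, 0, 1]
enable = [1, 1, 1, 1, 0, 0, 0, 0]
gclk = gated_clock(clk, enable)

print(gclk)                                   # [0, 1, 0, 1, 1, 1, 1, 1]
print(rising_edges(clk), rising_edges(gclk))  # 4 2
```

While the enable is high the gated clock follows the input clock. Once the enable goes low the gated clock is held high, so the registers behind it see no further edges and stop switching.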

#### **RESULT AND DISCUSSION**

Thus, the 16-bit finite impulse response filter is constructed with novel adder and multiplier logic circuits. Additionally, the enhanced clock gating approach improves the design by reducing the dissipated power. This section describes the performance of the proposed methodology and compares it with the existing method.

## G. Simulated result of proposed methodology

In this FIR filter, after the delay function, multiplication is performed by the proposed Vedic multiplier circuit, and the adder block uses the partial full adder based CLA logic. The Vedic multiplier is enhanced with the speculative Brent-Kung adder, which improves speed by optimizing the path delay. This block needs fewer wiring connections than other adders, which reduces area utilization. The overall function of the FIR filter (see fig11) is optimized by the enhanced clock gating technique, which is improved by an adaptive logic based model. Inverting both the input clock and the gated clock improves performance for faster operation. The clock gating reduces power and so yields a low power FIR filter design. The FIR filter block and the enhanced clock gating technique are shown together with the output. Accuracy is controlled by the speculative Brent-Kung adder in the Vedic multiplication block.
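The order of operations in this datapath (delay, then multiply, then add) corresponds to the usual FIR relation $y[n] = \sum_k h[k]\,x[n-k]$. The sketch below is a plain software model of that relation, assuming integer samples and coefficients; the function name and the example coefficient values are hypothetical, and ordinary integer arithmetic stands in for the Vedic multiplier and the CLA.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: shift the delay line, multiply each tap, accumulate."""
    delay_line = [0] * len(coeffs)
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]    # delay elements
        acc = 0
        for h, tap in zip(coeffs, delay_line):
            acc += h * tap                    # multiplier, then adder
        outputs.append(acc)
    return outputs


# Feeding a unit impulse returns the coefficients themselves.
print(fir_filter([1, 0, 0, 0], [3, 2, 1]))  # [3, 2, 1, 0]
```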



Fig11. Simulated result of proposed 16-bit FIR filter using E-Clock gating technique

Here the speculative Brent-Kung adder logic enhances the performance of the Vedic multiplier in terms of cost, area, and path delay. The proposed parallel prefix adder needs less wiring and therefore achieves lower area utilization. Compared with other adders such as the Kogge-Stone and Han-Carlson adders, the proposed speculative Brent-Kung adder achieves the better result in terms of wiring and area.



Fig12. Simulated result of enhanced Vedic multiplier

The simulated result of the enhanced Vedic multiplier is given in fig12; accuracy is improved by controlling the Brent-Kung adder logic. The internal block of the speculative Brent-Kung adder is simulated along with the Vedic multiplier, as shown in figure 13. The data input is taken from the Vedic multiplier block and processed with the carry generation "cg" and carry propagation "cp" signals.



Fig13. Result of Speculative Brent Kung adder for Vedic multiplier

## H. Performance result

After these results are analyzed, performance is evaluated in terms of area, delay, and power. The device utilization, which measures the area occupied, is shown in table1. In the proposed method, the optimized FIR filter is designed with novel techniques to improve performance over the existing work.

## Table1: Area occupation analysis report

**Device Utilization Summary** 

| Slice Logic Utilization                          | Used | Available | Utilization |
|--------------------------------------------------|------|-----------|-------------|
| No. of slices                                    | 107  | 4,800     | 2%          |
| No. of used memory                               | 3    | 1,200     | 1%          |
| No. of Slice LUTs                                | 100  | 2,400     | 4%          |
| No. used as logic                                | 65   | 2,400     | 2%          |
| No. of occupied Slices                           | 31   | 600       | 5%          |
| No. of MUXCYs used                               | 60   | 1,200     | 5%          |
| No. of used LUT-FF pairs                         | 59   | 112       | 52%         |
| No. of slice register sites lost to control sets | 14   | 4,800     | 1%          |
| No. of bonded I/O blocks (IOBs)                  | 38   | 102       | 37%         |
| No. of BUFG/BUFGMUXs                             | 1    | 16        | 6%          |

## Table2: Delay analysis report

| Parameter                   | Delay (ns) |
|-----------------------------|------------|
| Path delay                  | 9.313      |
| Gate delay (clk)            | 1.436      |
| Worst case slack (S/H)      | 0.429      |
| Best case slack (S/H)       | 1.763      |
| **Timing Summary**          |            |
| Min. delay                  | 1.436      |
| Input time delay before clk | 2.886      |
| Output time delay after clk | 9.843      |

## Table3: Power analysis report

**Supply Power**

| Parameter     | Power (mW) |
|---------------|------------|
| Dynamic Power | 0.45       |
| Static power  | 13.69      |

**Supply Current (mA)**

| Supply Source      | Vcc_int | Vcc_aux | Vcc_out |
|--------------------|---------|---------|---------|
| Supply Voltage (V) | 1.20    | 2.500   | 2.500   |
| Total Current      | 4.45    | 2.52    | 1.00    |
| Dynamic Current    | 0.38    | 0.00    | 0.00    |
| Quiescent Current  | 4.07    | 2.52    | 1.00    |
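The power figures and the current figures in table 3 can be cross-checked with $P = V \cdot I$ summed over the three supply rails. The short sketch below does this with the tabulated values only; it assumes that the voltages are in volts and the currents in milliamperes, so each product is in milliwatts.

```python
# (supply voltage in V, dynamic current in mA, quiescent current in mA)
rails = {
    "Vcc_int": (1.20, 0.38, 4.07),
    "Vcc_aux": (2.50, 0.00, 2.52),
    "Vcc_out": (2.50, 0.00, 1.00),
}

dynamic_mw = sum(v * i_dyn for v, i_dyn, _ in rails.values())
static_mw = sum(v * i_q for v, _, i_q in rails.values())

print(f"dynamic power: {dynamic_mw:.3f} mW")  # about 0.456 mW
print(f"static power:  {static_mw:.3f} mW")   # about 13.684 mW
```

Both values agree with the reported 0.45 mW and 13.69 mW to within about 0.01 mW, which is consistent with rounding in the report.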



Fig14. Report of Xilinx Power analyzer

# **Discussion:**

The proposed 16-bit FIR filter is designed with the enhanced Vedic multiplier and the PFA based CLA logic, and the circuit is optimized with an enhanced clock gating technique to reduce power. The Xilinx power analyzer tool is used to produce the power summary report given in figure 14. The delay analysis and power analysis of this low power, fast design are reported in table 2 and table 3 respectively. The comparison of the existing methods with the enhanced VLSI design is shown in table 4.

## Table4: Comparison of existing methods with the proposed design

| Parameter        | Existing FIR with MCM [5] | Existing FIR with MCSA | Proposed FIR with E-G_CLK |
|------------------|---------------------------|------------------------|---------------------------|
| Power (mW)       | 90                        | 14                     | 13.69                     |
| Min. Delay (ns)  | 1.690                     | 1.436                  | 1.436                     |
| Clock delay (ns) | -                         | 9.843                  | 8.436                     |
| Path Delay (ns)  | -                         | 9.703                  | 9.313                     |
| No. of Slice     | 44                        | 30                     | 14                        |
| PDP              | 1521                      | 201                    | 196                       |

Thus, the finite impulse response filter is constructed with efficient adder and multiplier logic. The speculative Brent-Kung adder logic helps to increase speed by reducing the path delay. This comparison shows the result achieved by the proposed FIR filter with the E-G\_CLK method. The enhanced clock gating technique reduces the power, and the speculative Brent-Kung adder based Vedic multiplier reduces the delay, so the proposed structure is more efficient than the existing work.
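The size of each improvement can be expressed as a relative reduction, computed directly from the tabulated comparison values. The sketch below is only arithmetic on those reported numbers; the helper name `reduction_percent` is hypothetical.

```python
def reduction_percent(existing: float, proposed: float) -> float:
    """Relative reduction of the proposed value with respect to the existing one."""
    return 100.0 * (existing - proposed) / existing


# Reported values: (existing method, proposed FIR with E-G_CLK)
comparisons = {
    "Power vs MCM [5]": (90, 13.69),
    "Power vs MCSA": (14, 13.69),
    "Path delay vs MCSA": (9.703, 9.313),
    "Slices vs MCSA": (30, 14),
}

for name, (existing, proposed) in comparisons.items():
    print(f"{name}: {reduction_percent(existing, proposed):.1f}% lower")
```

On these figures the power is roughly 85% lower than the MCM design and about 2% lower than the MCSA design, while the path delay is about 4% lower than the MCSA design.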

### **CONCLUSION AND FUTURE SCOPE**

The FIR filter design is enhanced with improved techniques, namely a speculative Brent-Kung adder based Vedic multiplier and the enhanced clock gating technique. The proposed method improves area, power, and delay. The factor improved most is delay, which is achieved by the speculative Brent-Kung adder based Vedic multiplier. A 16-bit FIR filter is designed and compared with the conventional approach. This analysis shows that the proposed 16-bit FIR filter using the enhanced clock gating technique improves power, area, and delay. It is therefore concluded that the proposed methodology can be applied to various DSP applications and that the design can be used in novel VLSI circuits. The results achieved in this work are compared and analyzed for use in future designs. In future, the design can be enhanced by analyzing further factors using the speculation process and parallel prefix adders, and the number of bits supported can be increased. Both the adder and the multiplier in the FIR filter can be enhanced by adding a novel parallel prefix adder.

#### REFERENCES

- [1] Jiajia C, Chip-Hong C, Feng F, Weiao D, and Jiatao D., (2015) "A Novel Design Algorithm for Low Complexity Programmable FIR filters based on Extended Double base number system," IEEE Transactions on Circuits and Systems I: Regular Papers. October 2015; 62(1):224-233.

- [2] Y-C Tsao, and Ken C., (2012) "Area-efficient VLSI implementation for parallel linear-phase FIR Digital filters of Odd length based on fast FIR Algorithm," IEEE Transactions on Circuits and Systems II: Express Briefs. June 2012; 59(6): 371-375.
- [3] Jeong-Ho H, and In-Cheol P., (2008) "FIR Filter Synthesis Considering multiple adder graphs for a coefficient," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems May 2008; 27(5): 958-962.
- [4] Andrea B, Adam T, Phillippe F, and Andreas B., (2017) "Multiplier-Driven Perturbation of coefficient for low-power operation in reconfigurable FIR filters," IEEE Trans. on Circuits and Syst. I: Regular Papers, May 2017; 64(9): 2388-2400.
- [5] Mohanty B.K, and Meher P.K., (2013) "A high performance FIR filter for fixed and reconfigurable applications," IEEE Transactions on VLSI systems, April 2013; 24(2):444-452.
- [6] Pravin Y. Kadu, Ku. S Dhengre. (2014) "High speed and low power FIR filter Implementation using optimized adder and multiplier based on xilinx FPGA," IORD Journal of Science & Technology, vol. 1, Issue 3, pp. 46-52, April 2014.
- [7] Sweta kumari, Sangita kumari, and Mansi waghela. (2014) VLSI Design and Implementation of FIR Digital Filter using Low Power MAC. Intern. Jour. of Science, Engg. and Tech. Vol. 2, Issue 6, pp. 743-749, July 2014.
- [8] T Radha, and M Velmurugan. (2015) Low Power Digital FIR Filter Design using optimized adder and multiplier. Inter.l jour. of advances in engineering. 2015; 1(4): 538-544.
- [9] A Ranga Nayakulu and K Satya prasad. (2016) Low power clock gating method with subword based signal range matching technique. Indian Journal of Sci. and Tech. August 2016; 9(30): 1-13.
- [10] Kazi J Ahmed, Bo Yuan, Myung J Lee. (2018) High-Accuracy Stochastic Computing based FIR Filter Design. Proceedings in the Inter. Conf. on Acoustics, Speech and Signal Processing, Canada. pp. 15-20 April 2018.

- [11] N C Sendhilkumar., (2017) Reduced Complexity Wallace Tree Multiplier and Enhanced Carry Look-Ahead Adder for Digital FIR Filter. International Journal of MC Square scientific research. November 2017; 9(2): 166-175.
- [12] Bhawana Datwani and Himanshu Joshi. (2016) Implementation of High Performance FIR filter using High Speed & Low Area Multiplier. International Journal of Engineering and technology research. June 2016; 5(2): 209-213.
- [13] Yu-Ting Pai, and Yu-Kung Chen. (2004) The fastest carry lookahead adder. Proceedings of the DELTA 2004. 2<sup>nd</sup> Inter. Workshop on Electronics Design, test, and applications. Perth, WA, Australia, 28-30 January 2004.
- [14] Rose Ann Mathew, Shruthi H.Shetty, Ashwath Rao, Deepthi Dayanand, Megha N, and Savidhan shetty. (2016) Study on performance of vedic multiplier based on the adders used. Inter. Jour. for research in applied science & Engineering technology. May 2016; 4(5): 813-818.
- [15] Subha Jeyamala K, and Aswathy B S. (2016) Performance enhancement of Han-Carlson Adder. Inter. Jour. of advanced research in electronics and communication engineering. Feb 2016; 5(2): 226-230.
- [16] B Naga jothi, KSN Murthy, and K Srinivasarao. (2016) A Novel Approach for error detection and correction using prefix-adders. Inter. Jour. of Engineering and technology. Jul 2016. 8(3): 1420-1425.
- [17] BV Pavan kumar, M Lalitha Bhavani, and Y Himanth. (2018) Optimized carry speculative adder. International journal of trends in scientific research and development. April 2018; 2(3): 1128-1131.
- [18] Subhashinee A, and Rajasekaran C., (2016) Carry Speculative Adder with variable latency for low power VLSI. Inter. Jour. of computer applications. 2016; pp. 16-18.

- [19] P.S.Sreenivasa Reddy, and B Suneetha. (2019) Design and Implementation of 64Bit Vedic Multiplier using Adders (Verilog HDL). Inter. Jour. of research in advent technology, Special Issue, RAECE-2K19.
- [20] Rakesh Raju Nadimetla, and Lakshmi Bhavani. (2018) Design and Implementation of 64 Bit Vedic Multiplier based on Different Adder Structures on Verilog HDL. International Journal of Engineering Science and Generic Research. August 2018; 4(4): 22-25.
- [21] Neha Tyagi, and Neerak K Sharma. (2017) Implementation of High Speed Vedic Multiplier for Digital Signal Processing using Multiplexer based Adder. Inter. Jour. for Research in Applied science & Engineering Technology. June 2017; 5(6): 186-191.
- [22] Syed S Hussain, Muhannad N Majoka, and Gulistan Raja. (2014) Design and Implementation of 32-Bit Vedic Multiplier on FPGA. 1<sup>st</sup> International conference on modern communication & Computing Technologies. 26-28 February 2014. Nawabshah, Pakistan.
- [23] Ila Chaudhary, and Deepika Kularia. (2016) Design of 64-Bit high speed vedic multiplier. International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering. May 2016; 5(5): 4090-4096.
- [24] Syed Z Hassan-Naqvi., (2017) Design and simulation of enhanced 64-bit Vedic multiplier. IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies. 11-13 October 2017, Aqaba, Jordan. IEEE 2017.
- [25] Gaurav Raj, and Depanjan De. (2016) Design and Implementation of high speed vedic multiplier using brent kung adder on FPGA. International journal of Science Technology & Engineering. September 2016; 3(3): 84-87.
- [26] Julian F Miller. (1999) Digital FIR filter design at gate-level using Evolutionary Algorithms. Proceedings of the 1<sup>st</sup> annual conf. on genetic and Evolutionary Computation. 13-17 July 1999, Orlando, Florida. July 1999; Vol. 2:1127-1134.

- [27] Kuldeep S Shekhawat, and Gajendra Sujediya. (2017) Design and Analysis of RCA and CLA using CMOS, GDI, TG and ECRL Technology. International Journal of Advanced Engineering Research and Science. November 2017. 4(11): 126-129.
- [28] Maroju S, and P Samundiswary. (2013) Design and Performance Analysis of various adders using verilog. A Monthly Journal of Computer Science and Information Technology. September 2013; 2(9): 128-138.
- [29] Syed A Alam, and Oscar G, (2016) "On the Implementation of Time-Multiplexed frequency response masking," IEEE Transactions on Signal Processing, August 2016, 64(15): 3933-3944.
- [30] C Dhanalakshmi, and C Manjula, (2016) "An Area efficient, low power and High speed speculative Han-Carlson Adder," Intern. Jou. of Innov. Research in Scie., Engg and Techn., 5(2):110-117, March 2016.