

(An ISO 3297: 2007 Certified Organization) Vol. 4, Issue 2, February 2016

# Design of Low Power NATURE Architecture by Using SRAM

R.Dineshkumar, P.N.Palanisamy

Assistant Professor, Dept. of ECE, Dhirajlal Gandhi College of Technology, Salem, India

Associate Professor, Dept. of ECE, Mahendra College of Engineering, Salem, India

**ABSTRACT:** This paper presents a NAnoTUbe REconfigurable (NATURE) architecture based on static RAM (SRAM), which serves as the basic memory element in the field-programmable gate array (FPGA) unit. The memory element of the original NATURE architecture is nano random access memory (NRAM). Using SRAM instead provides fast read and write operation at the output. To match the function of the NRAM memory element, the transistor count of the SRAM cell is increased and its circuit is modified. The transistor-level performance is evaluated using different technologies to identify the better-performing design.

**KEYWORDS:** NATURE Architecture, FPGA design, memory element.

### **I. INTRODUCTION**

Early designs relied on microcontrollers for all tasks, but a microcontroller is not suitable for every complex problem because it needs many clock cycles to perform an operation. The field-programmable gate array (FPGA) was introduced to overcome this. Compared with a microcontroller, in which software controls the hardware, an FPGA eliminates the fetch/decode/execute/store sequence.

An FPGA has building blocks such as combinational, sequential, memory, register, and arithmetic blocks. It offers a short design time, consumes little power in standby mode, and has a low design cost. Delay can also be reduced in an application-specific integrated circuit (ASIC). The NATURE architecture is based on hybrid CMOS/nanotechnology. A design that uses NanoRAM faces two main problems: logic density and run-time configuration. Changing to CMOS logic overcomes these problems: the memory is built from a CMOS device, static random access memory (SRAM). It consumes less power for read and write operations than other logic devices, and it uses pipelining to improve the operation of the device.

#### a. NATURE Architecture

The NATURE architecture contains logic blocks (LBs) connected by an interconnect that supports local and global communication between them [2]. The interconnect has connection blocks and switch boxes; the switch boxes connect wire segments. A logic block contains a super-macroblock (SMB) and a local switch matrix. The inputs and outputs of an SMB are connected to the interconnection network through the switch matrix. Neighbouring SMBs are connected via direct links.

A logic block has two levels of logic clusters. This facilitates temporal logic folding of circuits and keeps most inter-block communication local. At the first level, a macroblock (MB) contains n1 m-input reconfigurable logic elements. At the second level, n2 MBs make up an SMB, and the components communicate through a local crossbar. Crossbars are used instead of multiplexers to speed up local communication, and they require SRAM control bits.

A logic element contains an m-input LUT and a flip-flop. The m-input LUT can implement any m-variable Boolean function. The flip-flop stores internal results for later use, and a pass transistor decides whether an internal result is stored. A NanoRAM (NRAM) is associated with each block and stores its run-time reconfiguration bits. If k configuration sets are stored in an NRAM, the associated components can be reconfigured k times during execution.
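The behaviour of a logic element with k stored configuration sets can be illustrated with a small software model. This is only a behavioural sketch: the class `LogicElement`, the helper `truth_table`, and the `store` flag (standing in for the pass transistor) are hypothetical names introduced here, and the model ignores timing and circuit-level effects.

```python
from itertools import product


class LogicElement:
    """Behavioural sketch of one logic element: an m-input LUT plus a flip-flop."""

    def __init__(self, m, config_sets):
        # config_sets models the k configuration sets held in the local
        # configuration memory; each set is a truth table of 2**m bits.
        for table in config_sets:
            if len(table) != 2 ** m:
                raise ValueError("each truth table needs 2**m entries")
        self.m = m
        self.config_sets = config_sets
        self.active = 0      # index of the configuration currently loaded
        self.flip_flop = 0   # stored internal result

    def reconfigure(self, index):
        """Load one of the k stored configuration sets."""
        self.active = index

    def evaluate(self, inputs, store=False):
        """Look up the output; `store` plays the role of the pass transistor."""
        address = 0
        for bit in inputs:          # inputs[0] is the most significant bit
            address = (address << 1) | (bit & 1)
        out = self.config_sets[self.active][address]
        if store:
            self.flip_flop = out
        return out


def truth_table(fn, m):
    """Build the 2**m-entry table for any m-variable Boolean function."""
    return [int(fn(*bits)) for bits in product((0, 1), repeat=m)]


# Two configuration sets (k = 2): AND in the first cycle, XOR in the second.
le = LogicElement(2, [truth_table(lambda a, b: a and b, 2),
                      truth_table(lambda a, b: a != b, 2)])
and_result = le.evaluate((1, 1), store=True)   # result kept in the flip-flop
le.reconfigure(1)
xor_result = le.evaluate((le.flip_flop, 0))    # reuse the stored result
```

The same element computes two different functions in successive cycles, which is the reuse that the k configuration sets make possible.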




#### b. SRAM BASED MEMORY

The SRAM memory is based on the 9-transistor (9T) structure [4] shown in Fig. 1. Its sub-circuit is a 6-transistor SRAM cell (P1, P2, N1, N2, N3, N4). The write access transistors (N3, N4) are controlled by the write signal (WR), and the read access transistor (N6) is controlled by the read signal (RD). During a write operation, WR is held high and RD is held low. BL and BLB carry complementary values, and the write access transistors pass them into the cell so that either a 0 or a 1 is stored.

During a read operation, RD is held high and WR is held low, and both BL and BLB are at the high level. Transistors N3 and N4 are in the cut-off region, and the 6T SRAM cell is maintained at ground level during the read. The read access transistor (N6) is activated. The read and write operations of the CMOS SRAM are evaluated using HSPICE, and the output, which is based on the bit lines of the memory unit, is shown in Fig. 3.
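The read and write conditions described above can be summarised in a signal-level sketch. The class `Sram9TCell` is hypothetical and models only the logic of the control signals; it assumes that the value on BL is the bit that ends up stored, and it says nothing about the analogue behaviour that the HSPICE simulation captures.

```python
class Sram9TCell:
    """Signal-level sketch of the cell; it ignores all analogue behaviour."""

    def __init__(self):
        self.q = 0  # value held by the cross-coupled 6T sub-circuit

    def write(self, wr, rd, bl, blb):
        # Write: WR high, RD low, BL and BLB complementary.
        if wr == 1 and rd == 0 and bl != blb:
            self.q = bl          # N3/N4 pass the bit-line value into the cell
            return True
        return False             # conditions not met, stored value unchanged

    def read(self, wr, rd):
        # Read: RD high, WR low; N3/N4 are off, N6 gives access to the cell.
        if rd == 1 and wr == 0:
            return self.q
        return None              # no read access in this model


cell = Sram9TCell()
cell.write(wr=1, rd=0, bl=1, blb=0)   # store a 1
value = cell.read(wr=0, rd=1)         # read it back
```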



Fig. 1. 9T structure

The read and write operations of the new 9T SRAM memory cell were measured, and the cell provides a better read access time than the 10-transistor (10T) SRAM cell. Table 1 shows the timing comparison between the two cells.

| Transistor type | Read access time |
|-----------------|------------------|
| 10T             | 6.01 µs          |
| 9T              | 5.56 µs          |

Table 1. Timing analysis

### **II. NATURE BY USING SRAM**

The CMOS SRAM-based NATURE contains logic blocks (LBs) connected by an interconnect, including wires and long wires, that supports local and global communication. An LB contains logic elements (LEs) and a local switch matrix. To improve the speed and density of the FPGA, some of the programmable connections between basic logic blocks are replaced with hard-wired connections, which are simple metal wires. Using hard-wired links to build a more coarse-grained logic block from several basic blocks reduces the circuit size and delay. The local switch matrix in the LB is placed close to the CB for efficient local communication.

The logic element contains a look-up table (LUT), a D flip-flop (DFF), and several crossbar connections, as shown in Fig. 2. The crossbars select the output signals of the logic element from the LUT or the DFFs. At the rising edge of the clock signal CLK, reconfiguration commences, followed by computation. The prior computation result is stored in a register to support computation in the next cycle. Because different sections of the chip may use different folding levels, their clock frequencies also differ, and each section can be controlled by a different clock signal. The LUT implements an n-variable Boolean function, and the flip-flop stores internal results for later use. A pass transistor decides whether a result is stored.



Fig. 2. Combinational and sequential read/write memory



Fig 3. Read and write performance using Hspice

#### a. Interconnect

The programmable interconnect consists of routing channels, anti-fuses, and programming transistors. The routing channels contain routing tracks made of predefined wiring segments of various lengths to enhance routing. This segmentation is based on statistics from a large number of designs. Anti-fuse elements are located at the intersections of the horizontal and vertical wire segments, and between adjacent horizontal and vertical segments. The routing architecture maintains good area efficiency while easing the communication needed for logic folding. The interconnect contains switch blocks, connection blocks, and wires. Fewer global wires are needed because temporal logic folding significantly reduces global communication.

The switch box (SB) connects the length-1 tracks and one-third of the length-3 tracks to and from the four directions, and is composed of 32 switches. Connections between pairs of tracks are based on transmission gates, and an output buffer is placed at each SB output. Six connections share four buffers for better area efficiency. There are typically three types of switches: pass transistors, multiplexers, and tri-state buffers. Pass transistors give the fastest implementation of small crossbars, so they are chosen for the local crossbars within the MB and SMB. A multiplexer has a longer delay but needs fewer reconfiguration bits, so it is used in the switch matrix to connect to the inputs of an SMB. The outputs of an SMB are connected to long interconnects through tri-state buffers in the switch matrix. The switch box contains transmission gates and SRAM bits; the transmission gates make efficient connections between nodes.
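The difference in reconfiguration bits between a pass-transistor crossbar and a multiplexer can be made concrete with a short count. The sketch below assumes a fully populated crossbar with one control bit per crosspoint and a multiplexer with a binary-encoded select word per output; these formulas and the example sizes are illustrative assumptions, not figures taken from the design.

```python
from math import ceil, log2


def crossbar_config_bits(n_inputs, n_outputs):
    """One control bit per crosspoint pass transistor (full crossbar)."""
    return n_inputs * n_outputs


def mux_config_bits(n_inputs, n_outputs):
    """One binary-encoded select word per output multiplexer."""
    return n_outputs * ceil(log2(n_inputs))


# Hypothetical sizes, chosen only for illustration.
sizes = [(4, 4), (16, 8), (32, 8)]
comparison = {
    size: (crossbar_config_bits(*size), mux_config_bits(*size))
    for size in sizes
}
for (n_in, n_out), (xbar, mux) in comparison.items():
    print(f"{n_in}x{n_out}: crossbar {xbar} bits, multiplexer {mux} bits")
```

Under these assumptions the gap widens as the number of inputs grows, which is consistent with reserving pass-transistor crossbars for small local connections.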

#### b. Logic folding

Temporal logic folding uses on-chip RAM to realize different Boolean functions in the same logic over successive cycles. It can be performed at different levels of granularity, which provides flexibility in the trade-off between area and performance. When the folding level is large, the cycle period increases; when more logic elements are used, the number of cycles decreases. Level-2 folding involves rearranging the logic elements.

Logic folding allows a larger circuit to be mapped into the same chip area of the architecture. As the folding level decreases, the delay introduced by logic folding increases, so the total delay depends on the folding level chosen.
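The area and delay trade-off of logic folding can be sketched with a simple model. It assumes that level-p folding groups p consecutive LUT levels into one folding stage and that each cycle pays a fixed reconfiguration delay; the function `fold`, the circuit, and the delay values are hypothetical and serve only to illustrate the trend.

```python
def fold(luts_per_level, folding_level, lut_delay, reconfig_delay):
    """Group consecutive LUT levels into stages of `folding_level` levels."""
    depth = len(luts_per_level)
    stages = [luts_per_level[i:i + folding_level]
              for i in range(0, depth, folding_level)]
    # The same logic elements are reused by every stage, so the largest
    # stage determines how many are needed.
    logic_elements = max(sum(stage) for stage in stages)
    cycle_period = folding_level * lut_delay + reconfig_delay
    return {
        "stages": len(stages),
        "logic_elements": logic_elements,
        "cycle_period": cycle_period,
        "total_delay": len(stages) * cycle_period,
    }


# Hypothetical circuit: 8 LUT levels with 4 LUTs each, arbitrary delay units.
levels = [4] * 8
level1 = fold(levels, 1, lut_delay=1.0, reconfig_delay=0.5)
level4 = fold(levels, 4, lut_delay=1.0, reconfig_delay=0.5)
```

In this model, level-1 folding needs the fewest logic elements but the most cycles, while level-4 folding has a longer cycle period and needs more logic elements but finishes in fewer cycles.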

Clustering is the process of mapping LUTs onto the logic clusters of the FPGA, and it reduces the number of clusters the device requires. The algorithm determines the LUTs assigned to every cluster, which makes the overall LUT usage of the device apparent. Greedy clustering leads to fully populated clusters. The placement method then divides and distributes a portion of the empty clusters across the device to minimize routing.
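A minimal sketch of greedy clustering is given below. It is a generic greedy packing that grows each cluster by adding the LUT sharing the most nets with it, and it is not claimed to be the exact algorithm used in this work; the function `greedy_cluster` and the example netlist are hypothetical.

```python
def greedy_cluster(lut_nets, capacity):
    """Pack LUTs into clusters of at most `capacity`, favouring shared nets.

    lut_nets maps a LUT name to the set of net names it touches.
    """
    unplaced = sorted(lut_nets)          # deterministic order
    clusters = []
    while unplaced:
        seed = unplaced.pop(0)
        cluster, nets = [seed], set(lut_nets[seed])
        while len(cluster) < capacity and unplaced:
            # Pick the LUT sharing the most nets with the cluster so far.
            best = max(unplaced, key=lambda name: len(nets & lut_nets[name]))
            unplaced.remove(best)
            cluster.append(best)
            nets |= lut_nets[best]
        clusters.append(cluster)
    return clusters


# Hypothetical netlist: five LUTs and the nets each one connects to.
netlist = {
    "lut0": {"a", "b", "n1"},
    "lut1": {"n1", "c", "n2"},
    "lut2": {"d", "e", "n3"},
    "lut3": {"n3", "f", "n4"},
    "lut4": {"n2", "n4", "out"},
}
clusters = greedy_cluster(netlist, capacity=2)
```

Every cluster except possibly the last is filled to capacity, which reflects the fully populated clusters that greedy clustering tends to produce.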

### **III. LAYOUT BASED DESIGN**

The transistor schematic serves as the memory that stores the input values, and the NATURE architecture is designed using Mentor Graphics IC Station. The layout is completed with Mentor Graphics Calibre, and the design rule check (DRC) is also performed. The run-time configuration is stored in the 9T SRAM in the logic block. The configurable switches are distributed across the layout and connected to the multiplexer outputs. The layout of the switch box is divided into two blocks: switches and transmission gates. The area and delay of each component are calculated from the layout.

After RC extraction, the layout provides the parasitic capacitance and parasitic resistance. Timing analysis is performed both before and after layout and simulation, and it is classified into dynamic and static timing analysis. Dynamic timing analysis takes a long time to complete, so static timing analysis is used for the pre-layout and post-layout simulations. The design is converted into register transfer level (RTL) format for timing analysis. At the design entry level it is converted into a netlist file; this stage is called the pre-layout design. The timing analysis performed after placement and routing is called the post-layout design.
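The idea behind static timing analysis can be shown with a longest-path computation over a combinational netlist. The sketch below is a simplified illustration, not the flow of any particular tool: the function `critical_path_delay`, the example netlist, and all delay values are hypothetical, and the extracted RC parasitics are reduced to a single wire delay per connection.

```python
from graphlib import TopologicalSorter


def critical_path_delay(gate_delay, fanin, wire_delay=None):
    """Latest arrival time over all nodes of a combinational netlist.

    gate_delay: node -> intrinsic delay
    fanin:      node -> list of driver nodes (absent or empty for inputs)
    wire_delay: (driver, node) -> extracted RC delay; None models pre-layout
    """
    wire_delay = wire_delay or {}
    arrival = {}
    order = TopologicalSorter({n: fanin.get(n, []) for n in gate_delay})
    for node in order.static_order():    # drivers come before their loads
        latest_input = max(
            (arrival[d] + wire_delay.get((d, node), 0.0)
             for d in fanin.get(node, [])),
            default=0.0,
        )
        arrival[node] = latest_input + gate_delay[node]
    return max(arrival.values())


# Hypothetical netlist and delays in arbitrary units.
gate_delay = {"in_a": 0.0, "in_b": 0.0, "g1": 1.0, "g2": 1.5, "out": 0.5}
fanin = {"g1": ["in_a", "in_b"], "g2": ["g1", "in_b"], "out": ["g2"]}
wires = {("g1", "g2"): 0.25, ("g2", "out"): 0.25}

pre_layout = critical_path_delay(gate_delay, fanin)
post_layout = critical_path_delay(gate_delay, fanin, wires)
```

Because each node is visited once, the analysis avoids simulating input vectors, which is why static analysis completes much faster than dynamic analysis.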

| Structure | Power analysis |
|-----------|----------------|
| 9T        | 19.664 µW      |
| 10T       | 92.677 µW      |

Table 2. Comparison table

### **IV. RESULTS AND ANALYSIS**

The modified 9-transistor structure consumes less power than the 10T SRAM structure, as shown in Table 2. Based on this analysis, the modified 9T structure can be used as the memory in the FPGA unit.
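The relative savings implied by Tables 1 and 2 can be worked out directly. The short calculation below uses only the tabulated values; the helper `percent_reduction` is a name introduced here. These are cell-level figures, and savings at the architecture level depend on how much of the total the memory cells account for.

```python
def percent_reduction(baseline, improved):
    """Relative saving of `improved` against `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Values copied from Table 1 (read access time, us) and Table 2 (power, uW).
read_time_saving = percent_reduction(6.01, 5.56)
power_saving = percent_reduction(92.677, 19.664)

print(f"read access time: {read_time_saving:.1f}% lower")
print(f"cell power:       {power_saving:.1f}% lower")
```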

### **V. CONCLUSION**

Using the 9T CMOS SRAM in the NATURE architecture reduces power consumption by approximately 60%, and the area is also reduced. The architecture provides good efficiency and lower design complexity.




#### REFERENCES

[1]. T.-J. Lin, W. Zhang, and N. K. Jha, "SRAM-based NATURE: A dynamically reconfigurable FPGA based on 10T low-power SRAMs," vol. 20, no. 11, Nov. 2012.

[2]. "A hybrid nano/CMOS dynamically reconfigurable system—Part I: Architecture," W. Zhang, N. K. Jha, and L. Shang, ACM J. Emerg. Technol. Comput. Syst., vol. 5, no. 4, pp. 16.1–16.30, Nov. 2009.

[3]. S. Wilton, "Architectures and Algorithms for Field-Programmable Gate Arrays with Embedded Memories," Ph.D. Dissertation, University of Toronto, 1997.

[4]. "A novel design of a 9T SRAM cell with reduced leakage for embedded cache memory application," European Journal of Scientific Research, ISSN 1450-216X, vol. 81, no. 1, pp. 93-102, 2012.

[5]. H. Noguchi, Y. Iguchi, H. Fujiwara, Y. Morita, K. Nii, H. Kawaguchi, and M. Yoshimoto, "A 10T non-precharge two-port SRAM for 74% power.

[6]. W. Wang, V. Reddy, A. T. Krishnan, R. Vattikonda, S. Krishnan, and Y. Cao, "Compact modeling and simulation of circuit reliability for 65-nm CMOS technology," IEEE Transactions on Device and Materials Reliability, 2007.

[7]. Adres: An Architecture with tightly coupled VLIW processor and coarse-Grained reconfigurable matrix.

[8]. V. Betz and J. Rose, "VPR: A new packing, placement and routing tool for FPGA research."

[9]. G. Lemieux and D. Lewis, "Circuit design of routing switches," in Proc.Int. Symp. FPGA, 2002, pp. 19-28.

[10]. I. Kuon and J. Rose, "Measuring the gap between FPGAs and ASICs," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 2, pp. 203–215, Feb. 2007.

[11]. H. Noguchi, Y. Iguchi, H. Fujiwara, Y. Morita, K. Nii, H. Kawaguchi, and M. Yoshimoto, "A 10T non-precharge two-port SRAM for 74% power reduction in video processing," in Proc. IEEE Comput. Soc. Annu. Symp. VLSI, 2007, pp. 107–112.

[12]. T. Fujii, K.-I. Furuta, M. Motomura, M. Nomura, M. Mizuno, K.-I.Anjo, K.Wakabayashi, Y. Hirota, Y.-E. Nakazawa, H. Ito, and M. Yamashina, "A dynamically reconfigurable logic engine with a multi-context/multi-mode unified-cell architecture," in Proc. IEEE Int. Solid-State Circuits Conf., 1999, pp. 364–365

[13]. Robert Francis. Technology Mapping for Lookup-TableBased Field-Programmable Gate Arrays. PhD thesis, University of Toronto, 1992.

[14]. M. Khellah, S. Brown, and Z. Vranesic, "Modelling routing delays in SRAM-based FPGAs," in Canadian Conference on VLSI, pp. 6B.13-18, November 1993.

[15]. E. Ahmed and J. Rose, "The effect of LUT and cluster size on deep-submicron FPGA performance and density," in ACM/SIGDA Int. Symp. on FPGAs, pp. 3–12, 2000.

[16]. S. Brown, J. Rose, and Z. G. Vranesic, "A detailed router for field-programmable gate arrays," IEEE Trans. on CAD, May 1992, pp. 620-628.

[17]. V. De and S. Borkar, "Technology and design challenges for low power and high performance," in Proc. ISLPED, 1999, pp. 163-168.

[18]. Mentor Graphics, OR, "IC Verification and Signoff Using Calibre," 2009. [Online]. Available: http://www.mentor.com/products/ic\_nanometer\_design/verification-signoff.