

ISSN(Online) : 2320-9801 ISSN (Print) : 2320-9798

# International Journal of Innovative Research in Computer and Communication Engineering

(An ISO 3297: 2007 Certified Organization) Vol. 4, Issue 4, April 2016

# Review Paper on Coarse Grain Reconfigurable Architectures for Multimedia Application

Pooja D. Pantawane, Vaishali A. Tehre

Research Scholar, Dept. of ETC, G. H. Raisoni College of Engineering, Nagpur, India

Assistant Professor, Dept. of ETC, G. H. Raisoni College of Engineering, Nagpur, India

**ABSTRACT:** Multimedia applications are now in widespread use, and user demand for higher quality and higher speed in images, music, and video grows day by day. Meeting this demand requires processors capable of intensive data computation, which has led researchers to investigate new technologies for increasing computational throughput. Many systems already run multimedia applications, including DSPs, FPGAs, and ASICs, but none of them delivers high performance together with flexibility and low power consumption. Researchers have therefore turned to Coarse Grain Reconfigurable Architectures (CGRAs), which have the potential to meet this growing demand. CGRAs can perform the computation-intensive operations that multimedia applications require. Their key characteristics are reconfigurability and deep pipelining. This paper briefly reviews the available coarse grain reconfigurable architectures that target multimedia applications.

**KEYWORDS**: Coarse grain reconfigurable architecture; Application specific integrated circuits; Field programmable gate array; Low power; Multimedia application

### I. INTRODUCTION

Nowadays, multimedia applications are inseparable from daily life. They serve many purposes but are used mostly for entertainment. Users expect high-quality images, and high-quality images require high-frequency operation. Applications must perform a very large number of arithmetic operations on the data flowing from server to user. Researchers are therefore working to improve performance and speed. Many systems can run multimedia applications, such as DSPs (Digital Signal Processors), FPGAs (Field Programmable Gate Arrays), and ASICs (Application Specific Integrated Circuits). These systems, however, cannot keep up with growing customer demand.

An ASIC is well suited to a specific task in a small area. ASICs offer low power consumption and better performance for a particular application, but they lack flexibility. An ASIC performs only one application, for example a music player: it can play MP3 songs and nothing more. Users now expect many functions in one device, a demand that ASICs cannot meet.

In an FPGA, programmable elements are connected in such a way that the device can be reprogrammed to execute different tasks. The main advantages of FPGAs [1] are flexibility and short development time for a variety of applications in an SoC (System on Chip). For computation-oriented applications such as multimedia, which require a large number of arithmetic operations, however, FPGAs suffer from large routing overhead, higher power consumption, lower speed, and longer delay. Compared with an ASIC, an FPGA has higher cost, larger area, and higher power consumption.

To meet these requirements, the coarse grain reconfigurable architecture (CGRA) was introduced. CGRAs are more flexible than ASICs and consume less power than FPGAs, so they have the potential to bridge the performance and power gap between the two. There are many types of coarse grain architecture, and their design depends on the application domain and on factors such as power, routing, and timing. In this paper we study coarse grain reconfigurable architectures for multimedia applications. Among the many types of coarse grain architecture, reconfigurable architectures are the most popular because they offer high performance, efficiency, and flexibility.

The rest of this paper is organized as follows. Section II describes the coarse grain reconfigurable architecture. Section III gives the classification of coarse grain architectures. Section IV gives a brief view of coarse grain reconfigurable architectures for multimedia applications. Section V concludes the paper.

### II. COARSE GRAIN RECONFIGURABLE ARCHITECTURE

A coarse grain reconfigurable architecture delivers performance together with flexibility within an application domain. The goal of CGRAs is to combine the performance and power advantages of an ASIC with the cost and flexibility of an FPGA. Figure 1 shows an example CGRA in which programmable elements are connected in a mesh-style structure. Each programmable element has a function unit, a MUX, and an output register. The architecture performs word-level arithmetic and logical operations such as addition, subtraction, and shift. Memory is attached to the CGRA so that data can be stored and retrieved, which means an operation on the same values does not need to be performed again.
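The programmable element described above can be illustrated with a minimal Python sketch. It is only a behavioural model under stated assumptions: the 16-bit word width, the port names (`north`, `west`), and the class name `ProcessingElement` are hypothetical and are not taken from Figure 1 or from any specific CGRA.

```python
from dataclasses import dataclass

WORD_BITS = 16                 # assumed word width; the text does not fix one
MASK = (1 << WORD_BITS) - 1

# Word-level operations named in the text: addition, subtraction, shift.
OPS = {
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "shl": lambda a, b: (a << (b % WORD_BITS)) & MASK,
}


@dataclass
class ProcessingElement:
    """Hypothetical model of one element: input MUXes, function unit, output register."""
    op: str = "add"        # configuration: operation performed by the function unit
    sel_a: str = "north"   # configuration: MUX selection for operand A
    sel_b: str = "west"    # configuration: MUX selection for operand B
    out_reg: int = 0       # output register, read by neighbours on the next cycle

    def step(self, inputs: dict) -> int:
        """One cycle: the MUXes pick two operands, the FU computes, the register latches."""
        a = inputs[self.sel_a]
        b = inputs[self.sel_b]
        self.out_reg = OPS[self.op](a, b)
        return self.out_reg


pe = ProcessingElement(op="sub")
value = pe.step({"north": 5, "west": 7})   # (5 - 7) wraps around to 65534 in 16 bits
```

Changing `op`, `sel_a`, or `sel_b` corresponds to reconfiguring the element: the hardware model stays the same and only its configuration fields change.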

Figure 2 compares the systems on the basis of performance and power consumption. Multimedia applications consume considerable power, which users do not tolerate. Figure 2 shows that FPGAs consume high power, whereas DSPs and ASICs consume less power with high performance but lack flexibility, which makes both unsuitable. This is why CGRAs are preferred: within the application domain, CGRAs deliver high performance at low power consumption, as Figure 2 shows.



Figure 1. Basic coarse grain reconfigurable architecture

Figure 2. Comparison of systems

### III. CLASSIFICATION OF COARSE GRAIN ARCHITECTURE

Coarse grain architectures are classified according to two patterns: the interconnection pattern and the reconfiguration pattern. There are three interconnection styles for connecting the function units (FUs) in an architecture: mesh based, linear array based, and crossbar based. In the mesh-style interconnection pattern, function units are arranged in a rectangular structure, and identical FUs are connected to each other in the horizontal and vertical directions. This arrangement makes direct connections to the nearest FUs convenient and still provides good connectivity when a longer connection is needed.

In linear array based architectures, processing elements are arranged in a continuous line, so that neighbors are connected directly. These architectures are generally designed for implementing pipelined processes, and many linear array architectures exist for pipelined application domains. In a crossbar-based architecture, all function units are connected to a crossbar, which performs simple routing. The crossbar behaves like a switch in which many inputs are connected to many outputs in a matrix style. PADDI and SmartCell are examples of the crossbar interconnection structure.
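The three interconnection patterns can be compared with a short Python sketch that builds, for each pattern, the set of units every unit can reach directly. The function names and the 16-unit example size are illustrative assumptions, not details of any architecture reviewed here.

```python
def mesh_neighbours(rows, cols):
    """2-D mesh: each FU links to its horizontal and vertical neighbours."""
    links = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            links[(r, c)] = [(i, j) for i, j in candidates
                             if 0 <= i < rows and 0 <= j < cols]
    return links


def linear_neighbours(n):
    """Linear array: each PE links only to the previous and the next PE."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def crossbar_neighbours(n):
    """Crossbar: any FU output can be switched to any other FU input."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def link_count(links):
    """Number of directed point-to-point links in a topology."""
    return sum(len(targets) for targets in links.values())


# Sixteen units arranged in each of the three styles.
mesh_links = link_count(mesh_neighbours(4, 4))       # 48
linear_links = link_count(linear_neighbours(16))     # 30
crossbar_links = link_count(crossbar_neighbours(16)) # 240
```

The counts make the trade-off visible: the linear array needs the fewest links and suits pipelines, the crossbar offers full connectivity at the highest wiring cost, and the mesh sits between the two.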




Reconfigurable systems are an emerging standard for SoC technology and an improved form of coarse grain architecture. The key features of a reconfigurable system are specialization, performance, and flexibility, where specialization covers power consumption, performance, flexibility, and programming. Compared with an ASIC, a reconfigurable system gives better performance and is more flexible. There are two reconfiguration methods, static and dynamic, and the method is selected according to the application domain and the operations required. Coarse grain reconfigurable architectures have thus become a reality.

Static reconfiguration is the simplest and most common approach for implementing applications with reconfigurable logic. It involves hardware changes at a relatively slow rate and consists of a single system-wide configuration. Prior to the execution of an application, the reconfigurable resources are loaded with their respective configurations, and during execution they remain in the same configuration (i.e. remain static) until the application finishes. Its advantages are higher performance than a pure software implementation and lower cost than application-specific hardware.

A dynamically reconfigurable architecture performs parallel processing and is dedicated to stream-based processing. Its configuration network is distributed, and configuration data is multicast or broadcast inside the chip while the system is running (on-line configuration), which gives high speed. Table 1 lists different coarse grain architectures with their interconnection pattern, reconfiguration method, and application domain. In this paper we focus on the architectures for multimedia applications.

| Architecture name | Interconnection pattern | Reconfiguration method | Application domain                |
|-------------------|-------------------------|------------------------|-----------------------------------|
| MATRIX            | 2-D Mesh                | Static                 | General purpose                   |
| Chess             | Mesh                    | Static                 | Multimedia                        |
| MorphoSys         | Mesh                    | Dynamic                | Multimedia                        |
| SYSCORE           | 1-D Linear array        | Static                 | Biomedical monitoring application |
| RaPiD             | Linear array            | Static                 | DSP application                   |
| MORA              | Linear array            | Dynamic                | Multimedia                        |
| FloRA             | 2-D Linear array        | Dynamic                | Multimedia, DSP                   |
| DRAA              | Linear array            | Dynamic                | Multimedia                        |
| PADDI             | Crossbar                | Static                 | DSP application                   |
| SmartCell         | Crossbar                | Dynamic                | Multimedia                        |

Table 1. Coarse grain architectures with their interconnection pattern, reconfiguration method, and application domain
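The difference between the static and dynamic reconfiguration methods described in this section can be sketched in a few lines of Python. This is a simplified behavioural illustration, not the mechanism of any listed architecture: the operation names, the per-cycle context switch, and the functions `run_static` and `run_dynamic` are assumptions made for the example.

```python
OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def run_static(config, stream):
    """Static: one configuration is loaded before execution and never changes."""
    op = OPS[config]                      # loaded once, before the run starts
    return [op(a, b) for a, b in stream]


def run_dynamic(contexts, stream):
    """Dynamic: the active configuration is switched while the stream is processed."""
    results = []
    for cycle, (a, b) in enumerate(stream):
        # Assumed policy: rotate through the stored contexts, one per cycle.
        op = OPS[contexts[cycle % len(contexts)]]
        results.append(op(a, b))
    return results


stream = [(1, 2), (3, 4), (5, 6)]
static_out = run_static("add", stream)             # [3, 7, 11]
dynamic_out = run_dynamic(["add", "mul"], stream)  # [3, 12, 11]
```

In the static case the same hardware function is applied to the whole stream, whereas in the dynamic case the function changes during execution without stopping the stream.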

### IV. MULTIMEDIA APPLICATION BASED CGRAs

As Table 1 shows, there are many coarse grain reconfigurable architectures for multimedia applications. A great deal of research has been done in this field, and work continues toward higher performance. This section examines a few multimedia-oriented architectures in detail.

The MorphoSys [4] system targets high throughput. When computing word-level operations, MorphoSys performs faster than FPGAs without adding propagation delay. MorphoSys targets multimedia applications such as video compression, with the goal of providing low-cost frame data processing. MorphoSys is a combination of CGRAs and FPGAs. Figure 3 shows the architecture of MorphoSys, which consists of a core processor, a reconfigurable cell array, context memory, and main memory. MorphoSys follows the SIMD data flow pattern and operates at a frequency of 100 MHz. The reconfigurable cell array is a two-dimensional mesh-style structure with a three-layer interconnection network, and operations are performed in a four-stage pipeline. The core processor performs 32-bit operations. Context memory plays an important role in making the system dynamic: it stores data without disturbing the operation of the reconfigurable cell array.
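The SIMD style of operation with a context memory can be illustrated by the following Python sketch. It is a loose behavioural model under stated assumptions: the context format `(operation, constant)`, the 2x2 array size, and the functions `simd_step` and `run_contexts` are hypothetical and do not reproduce the actual MorphoSys context word or array dimensions.

```python
OPS = {
    "add": lambda x, k: x + k,
    "shl": lambda x, k: x << k,
}


def simd_step(context, data):
    """SIMD: one context is broadcast, and every cell applies it to its own datum."""
    op_name, constant = context
    op = OPS[op_name]
    return [[op(x, constant) for x in row] for row in data]


def run_contexts(context_memory, data):
    """Apply the stored contexts in order; each switch changes the array's function."""
    for context in context_memory:
        data = simd_step(context, data)
    return data


frame = [[1, 2], [3, 4]]                 # one data word per reconfigurable cell
contexts = [("add", 1), ("shl", 2)]      # contents of the assumed context memory
out = run_contexts(contexts, frame)      # [[8, 12], [16, 20]]
```

Because the contexts are held in their own memory, the next configuration is already available when the current one finishes, which is what lets the array change function between steps.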



Figure 3. MorphoSys architecture overview [4]

Figure 4. SmartCell architecture [12]

The SmartCell [12] architecture targets stream-based applications and can perform a large number of computation-intensive operations. SmartCell consists of cells arranged in a 4x4 matrix. Each cell has four programmable units placed at its four edges. Each programmable unit has an ALU, registers, input and output MUXes, instruction memory, and a controller, and performs various logical and arithmetic operations. SmartCell operates with a four-stage pipeline on word-level data. The architecture has three types of interconnection: a crossbar connection, a C mesh, and nearest-neighbour intercell connections. The crossbar is a type of shared register memory block that lets the units within a cell share inputs and outputs, which speeds up data flow. The nearest-neighbour intercell connections provide direct sharing over short wires, so they do not introduce the delay seen in FPGAs. The C mesh is a network that moves data dynamically from one cell to another diagonally. This combination of networks raises system performance and allows several applications to be mapped onto the architecture at the same time. SmartCell is a strong architecture for real-time operation, and its authors also introduced a software environment for it. Figure 4 shows an overview of the SmartCell architecture.

Figure 5 shows the performance-oriented DRAA [8] coarse grain architecture. The dynamically reconfigurable ALU array (DRAA) can handle heavy data traffic, which makes it suitable for multimedia applications. DRAA has two main parts: programmable elements (PEs) and memory. The PEs are arranged in a 2-D linear array pattern, and the data path is 8 bits wide. The local memory sends data to the PE array, where the PEs operate on it simultaneously, and the outputs of the PE array are written back to local memory. From local memory the results pass to main memory and the processor for further operations; this scheme makes the system faster and more reliable for multimedia applications. Local memory also stores the opcode for the next operation, so that after one task completes another is loaded into the PE array. DRAA focuses on memory in hardware in order to complete operations sooner than other architectures. Many applications are mapped onto this architecture by mapping the loops in their operations.






Figure 5. DRAA Architecture [8]



Figure 6. MORA architecture [15]

Figure 7. ADRES core architecture [11]

MORA (Multimedia Oriented Reconfigurable Array) [15] is a coarse grain reconfigurable architecture for multimedia applications. It provides a low-power, high-throughput, and area-efficient design and operates on a MIMD basis. Its designers also developed a low-level MORA assembly language for programming multimedia operations and tested it successfully on an image compression algorithm. Many algorithms can be run on this architecture with high performance, high throughput, and flexibility. MORA consists of reconfigurable cells, an input/output controller, and external memory, as shown in Figure 6. Each reconfigurable cell contains a programmable element with internal (local) memory, and the programmable elements are arranged in a 2-D array. The ALU performs 8-bit operations, so each cell behaves like a small DSP processor. Because the cells do not depend on external memory, they run faster: memory access time and delay are low and memory bandwidth is high. The controller manages all input and output data and handles the connection to external memory. Compared with other systems, MORA is a reliable and promising architecture for multimedia applications.
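The idea of a cell that computes on its own local memory with an 8-bit ALU can be sketched as follows. This is an illustrative Python model only: the class `ReconfigurableCell`, its method names, the 256-word memory size, and the operation set are assumptions for the example and are not taken from the MORA assembly language or hardware description.

```python
WORD_MASK = 0xFF   # the text states that the ALU performs 8-bit operations

ALU_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
}


class ReconfigurableCell:
    """Hypothetical cell: an 8-bit ALU working directly on its own local memory."""

    def __init__(self, memory_words=256):   # memory size is an assumption
        self.local_memory = [0] * memory_words

    def load(self, base, values):
        """Controller side: copy input data into local memory once."""
        for offset, value in enumerate(values):
            self.local_memory[base + offset] = value & WORD_MASK

    def execute(self, op, src_a, src_b, dst, count):
        """Run one operation over `count` words without any external memory access."""
        alu = ALU_OPS[op]
        for i in range(count):
            a = self.local_memory[src_a + i]
            b = self.local_memory[src_b + i]
            self.local_memory[dst + i] = alu(a, b) & WORD_MASK

    def read(self, base, count):
        """Controller side: copy results out of local memory."""
        return self.local_memory[base:base + count]


cell = ReconfigurableCell()
cell.load(0, [250, 3])
cell.load(2, [10, 4])
cell.execute("add", src_a=0, src_b=2, dst=4, count=2)
result = cell.read(4, 2)   # [4, 7]: 250 + 10 wraps around in 8 bits
```

External memory is touched only in `load` and `read`; everything inside `execute` stays within the cell, which is the property the text credits for the low memory access delay.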




ADRES (Architecture for Dynamically Reconfigurable Embedded System) [11] is a coarse grain architecture designed in 2005. It gives high performance and flexibility for embedded applications, including multimedia applications such as video compression, and it has been tested successfully with an H.264/AVC decoder. Figure 7 shows the ADRES core architecture in two views: the VLIW view and the reconfigurable array (accelerator) view. The VLIW view controls the data flow. A central register file is used to store the operation opcodes, and the architecture contains eight functional units. The accelerator view is a 2-D array of functional units, each with a local register file. An unlimited number of operations can be stored in VLIW mode, whereas the reconfigurable array can perform only a finite number of operations. Each functional unit has an ALU, MUXes, and local memory, and the ALU performs 32-bit operations such as move and logical operations. The reconfigurable array view has an orthogonal interconnection network, so data flows horizontally and vertically through the architecture. Functional units share data over a global bus or by direct exchange with neighbours. The reconfigurable array exchanges data with external memory through the VLIW processor. The architecture performs well in a small area at low cost.

### V. CONCLUSION

This study of existing architectures concludes that the coarse grain reconfigurable architecture is the most suitable for multimedia applications, because FPGAs, ASICs, and DSPs each fall short in some respect, whereas coarse grain reconfigurable architectures offer high performance, low power consumption, low delay, high throughput, and area efficiency. The review also shows that most architectures use programmable elements arranged in a 2-D array, which gives a faster response, and that local memory is used to reduce memory access time.

### REFERENCES

- [1] Y. Shibata, H. Funatsu, Y. Ishida and J. Yoshida "A development system for an SRAM-based user-reprogrammable gate array", P3/2.1-P3/2.4, IEEE 1990.
- [2] Ethan Mirsky and André DeHon. "MATRIX: A Reconfigurable Computing Architecture with Configurable Instruction Distribution and Deployable Resources." Pages 157-166, IEEE 1996.
- [3] Carl Ebeling, Darren C. Cronquist, Paul Franklin, Jason Secosky, and Stefan G. Berg. "Mapping Applications to the RaPiD Configurable Architecture." Pages 106-115, IEEE 1997.
- [4] Hartej Singh, Ming-Hau Lee, Guangming Lu, Fadi J. Kurdahi, Nader Bagherzadeh, and Eliseu M. C. Filho. "MorphoSys: An Integrated Reconfigurable System for Data-Parallel Computation-Intensive Applications." SBCCI, pp. 134, XI Brazilian Symposium on Integrated Circuit Design, 1998.
- [5] Seth Copen Goldstein, Herman Schmit, Mihai Budiu, Srihari Cadambi, Matt Moe and R. Reed Taylor. "PipeRench: A Reconfigurable Architecture and Compiler." Pages 70-77, IEEE April-2000.
- [6] W. J. Dally and A. Chang, "The role of custom designs in ASIC chips," In Proceedings of 37th Design Automation Conference, pp. 643–647, 2000.
- [7] Benini, A. Bogliolo, and G. De Micheli, "A survey of design techniques for system-level dynamic power management," in Proceedings of IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp. 299–316, 2000.
- [8] Jong-eun Lee, Kiyoung Choi and Nikil D. Dutt. "Evaluating Memory Architectures for Media Applications on Coarse-Grained Reconfigurable Architectures." Proceedings of the Application-Specific Systems, Architectures, and Processors (ASAP'03), IEEE 2003.
- [9] Becker and M. Vorbach, "Architecture, memory and interface technology integration of an industrial/academic configurable system-on-chip (CSoC)," In Proceedings of the IEEE Computer Society Annual Symposium on VLSI, pp. 107–112, 2003.
- [10] Tuan and B. Lai, "Leakage power analysis of a 90 nm FPGA," in Proceedings of the IEEE Custom Integrated Circuits Conference, pp. 57–60, 2003.
- [11] Francisco-Javier Veredas, Michael Scheppler, Will Moffat and Bingfeng Mei. "Custom Implementation of the Coarse-Grained Reconfigurable ADRES Architecture for Multimedia Purposes." IEEE 2005.
- [12] Cao Liang and Xinming Huang. "SmartCell: An Energy Efficient Coarse-Grained Reconfigurable Architecture for Stream-Based Applications." Hindawi Publishing Corporation EURASIP Journal on Embedded Systems Volume 2009, Article ID 518659.
- [13] Chenxin Zhang, Thomas Lenart, Henrik Svensson and Viktor Öwall. "Design of Coarse-Grained Dynamically Reconfigurable Architecture for DSP Applications." In proceedings of IEEE symposium on International Conference on Reconfigurable Computing and FPGAs Pages 338-343, IEEE 2009.
- [14] Dongwook Lee, Manhwee Jo, Kyuseung Han and Kiyoung Choi. "FloRA: Coarse-Grained Reconfigurable Architecture with Floating-Point Operation Capability." Pages 376-379, IEEE 2009.
- [15] Sai Rahul Chalamalasetti, Sohan Purohit, Martin Margala and Wim Vanderbauwhede. "MORA: An Architecture and Programming Model for a Resource Efficient Coarse Grained Reconfigurable Processor." Pages 389-396, IEEE 2009.
- [16] Kunjan Patel, Seamas McGettrick and Chris J. Bleakley. "SYSCORE: A Coarse Grained Reconfigurable Array Architecture for Low Energy Biosignal Processing." The 19th Annual IEEE International Symposium on Field-Programmable Custom Computing Machines, 2011.