Abstract—This paper presents the synthesis and simulation of a proposed enhanced buffer. The design combines the advantages of buffered and bufferless networks by using two crossbar switches. The virtual channel (VC) concept of the previous design is eliminated by an efficient flow-control scheme that uses the storage already present in pipelined channels in place of explicit input virtual channel buffers (VCBs). This is achieved by providing enhanced buffers on the bufferless link and creating two virtual networks. With this approach, the channels themselves act as distributed FIFO buffers. Without VCBs or VCs, deadlock prevention is achieved by duplicating physical channels. An enhanced buffer provides a ready-valid handshake signal and two storage locations. With this design, power is reduced by 15.65% and delay by 97.88% with respect to a virtual channel router.

Keywords—Enhanced buffer, Gate delay, NoC, VCs, VCB.

I. INTRODUCTION

With the advancement of IC technology, gate delay decreases, which makes wire delay relatively more significant. Wire delay then determines the overall performance of the system. Many VLSI designers try to solve this long global wire delay problem through buffer insertion. Many current Systems-on-Chip (SoCs) use a system bus to connect several functional units. These SoC system buses can support only a limited number of functional units and thus face scaling problems in heterogeneous Multiprocessor Systems-on-Chip (MPSoCs) or large-scale Chip Multiprocessors (CMPs). To solve these long global wire delay and scalability issues, many studies have suggested a packet-based communication network known as a Network-on-Chip (NoC), which connects many functional units through a universal communication network [1]-[3]. In an NoC, a router sends packets from a source to a destination router through several intermediate nodes. If the head of a packet is blocked during transmission, the router cannot transfer the rest of the packet. To address this blocking problem, the wormhole routing method was proposed in [4]. A wormhole router splits a packet into several flits, each of which can be transferred in a single transmission. Because wormhole routing does not allocate buffer space to a whole packet, buffer allocation and flow control are performed at the flit level [5]. Wormhole routing can therefore minimize overall latency and may reduce buffer size compared with other methods. In addition, VCs are used to avoid deadlock and thus increase throughput. The main purpose of VCs is to decouple the allocation of buffer space from the physical channel, so that flits can compete with one another for a single physical channel. The existing VC-based router technique is discussed below.

A. Virtual Channel Regulator (ViChaR)

The router implemented using virtual channels is the virtual channel regulator, ViChaR [6], [7]. To allocate buffers efficiently to all virtual channels, it replaces separate buffers with a unified buffer structure (UBS) that shares the internal flit buffers, and it uses unified control logic (UCL) to control the UBS and assign buffers to VCs dynamically according to the network traffic [11]. Fig. 1 shows the ViChaR router buffer architecture.

In the UBS shown in Fig. 1, the v independent k-flit buffers are logically grouped into a single vk-flit buffer that, together with the UCL, forms a logically unified structure. To avoid large components, the UBS keeps the same number of MUX/DEMUX units, i.e., one MUX/DEMUX per k flits.

![Diagram of Virtual Channel Allocation for ViChaR](image1.png)

**Fig. 2 Virtual Channel Allocation for ViChaR**

Fig. 2 shows virtual channel allocation (VA) for ViChaR. The first arbitration stage reduces the number of requests for a particular output port to one request per input port, which in ViChaR requires a \(vk:1\) arbiter. The second stage selects a winner for each output port among all the competing input ports. ViChaR therefore simply allocates VCs instead of accepting requests for specific VCs. Similarly, in the switch allocation (SA) shown in Fig. 3, the first stage handles the sharing of a single port by a number of VCs, for which ViChaR needs larger \(vk:1\) arbiters. The second stage arbitrates between the winning requests from each input port (i.e., \(P\) ports) for each output port.

![Diagram of Switch Allocation for ViChaR](image2.png)

**Fig. 3 Switch Allocation for ViChaR**
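The two-stage arbitration described for Figs. 2 and 3 can be illustrated with a short Python sketch. It is only a behavioral illustration under stated assumptions: the arbitration policy is not specified in the text, so fixed priority (lowest index wins) is assumed, and the names `stage1`, `stage2`, and `switch_allocate` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical request format: requests[p] holds one entry per buffer slot
# of input port p (v*k slots in ViChaR). Each entry is the requested
# output port, or None if the slot has no requesting flit.
Requests = List[List[Optional[int]]]


def stage1(requests: Requests) -> List[Optional[Tuple[int, int]]]:
    """One (v*k):1 arbiter per input port.

    Assumption: fixed priority, the lowest slot index wins.
    Returns, per input port, (slot, output_port) or None.
    """
    winners: List[Optional[Tuple[int, int]]] = []
    for slots in requests:
        choice = None
        for slot, out in enumerate(slots):
            if out is not None:
                choice = (slot, out)
                break
        winners.append(choice)
    return winners


def stage2(
    winners: List[Optional[Tuple[int, int]]]
) -> Dict[int, Tuple[int, int]]:
    """One P:1 arbiter per output port.

    Assumption: fixed priority, the lowest input-port index wins.
    Returns {output_port: (input_port, slot)}.
    """
    grants: Dict[int, Tuple[int, int]] = {}
    for in_port, choice in enumerate(winners):
        if choice is None:
            continue
        slot, out = choice
        if out not in grants:  # output port still free in this cycle
            grants[out] = (in_port, slot)
    return grants


def switch_allocate(requests: Requests) -> Dict[int, Tuple[int, int]]:
    """Run both arbitration stages for one cycle."""
    return stage2(stage1(requests))
```

With four input ports and three slots per port, the first stage forwards at most one request per input port, and the second stage grants each output port to at most one of those requests.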


II. PIPELINE STAGES

NoC routers are pipelined at the flit level to better utilize all the control units and improve throughput [8]. Fig. 5 (a) shows the pipeline of the virtual channel router and the timing of the different flits in a packet. The stages are buffer write (BW), routing computation (RC), virtual-channel allocation (VA), switch allocation (SA), switch traversal (ST), and link traversal (LT). RC takes place when the head flit of a packet is stored in a virtual channel: the routing information carried by the head flit is used to determine the output port of the packet. Once the result is calculated, all flits of the same packet must use the same output port. VA takes place once the output port is determined: the result is passed to the virtual-channel allocator, which assigns a single output virtual channel on the corresponding output port. If the allocation fails, the head flit must wait until the output port has a free VC to assign. The allocation is performed for the head flit only. SA takes place once the output VC is assigned: the per-packet operations are then complete, and switch allocation is performed flit by flit. All flits of a packet consecutively bid for a single-flit time slot to traverse the switch. ST takes place once the switch is allocated to a flit: the flit uses one cycle to traverse the switch to the desired output port. LT takes place once the flit has traversed the switch: the flit uses another cycle to traverse the channel and reach its downstream router. Each pipeline stage requires one cycle, so a flit needs six cycles to traverse the router. In ViChaR, BW and RC are performed simultaneously, so the number of stages needed is five. Two problems follow from this technique: first, delay, because the processing time is too long; second, because buffers are allotted to all VCs, buffer management becomes more complex, which increases power consumption [7], [10].

To overcome these delay and power consumption problems, a new router design is proposed in which the virtual channels are eliminated and the number of pipeline stages is reduced to three.
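The effect of the stage counts quoted above on packet latency can be estimated with a small Python sketch. It is a rough zero-load estimate, not a result from the paper: it assumes one cycle per stage (as stated in the text), no contention, and body flits that follow the head flit back to back. The function name `zero_load_latency` and the dictionary `STAGES` are hypothetical.

```python
# Stage counts taken from the text: six one-cycle stages for the generic
# virtual channel router, five for ViChaR (BW and RC overlap), and three
# for the proposed router.
STAGES = {"generic_vc": 6, "vichar": 5, "proposed": 3}


def zero_load_latency(stages: int, hops: int, flits: int) -> int:
    """Cycles until the tail flit of a packet has passed the last router.

    Assumptions (not from the paper): no contention, and body flits
    stream behind the head flit at one flit per cycle.
    """
    if stages < 1 or hops < 1 or flits < 1:
        raise ValueError("stages, hops and flits must all be positive")
    head_latency = stages * hops       # head flit pays every stage per hop
    return head_latency + (flits - 1)  # remaining flits follow in a stream


if __name__ == "__main__":
    for name, stages in STAGES.items():
        print(name, zero_load_latency(stages, hops=4, flits=5))
```

For a five-flit packet crossing four routers, this estimate gives 28, 24, and 16 cycles for the six-, five-, and three-stage pipelines respectively.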

III. PROPOSED METHODOLOGY

The proposed design combines the advantages of bufferless and buffered networks by using two crossbar switches [9]: a bufferless primary crossbar and a buffered secondary crossbar. Under low load, all packets traverse only the primary crossbar, follow the minimum path, and experience minimum delay, so the router behaves as a bufferless network. Under high load, packets traverse the secondary crossbar through an enhanced buffer, which provides a ready-valid handshake.
Fig. 4 shows the proposed router with enhanced buffers. It has four input ports. Under low traffic, data flits traverse the primary switch; under heavy load, flits are stored in the enhanced buffer and traverse the secondary crossbar. The processing element provides feedback from output to input to indicate whether the flit is valid. Buffers are placed in front of the secondary crossbar, through which data moves serially; because the virtual channels are eliminated, the virtual-channel allocator stage is eliminated as well. The switch arbiter (SA) is modified to control the Demux and Mux so that the correct packet flow is maintained in both crossbars. Eliminating VCs removes the VA stage and simplifies the SA stage, so that SA and ST can be performed in the same cycle and the number of stages is reduced to three, as shown in Fig. 5 (b).
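The choice between the two crossbars can be sketched in Python as follows. This is a simplified illustration, not the authors' arbiter: the text does not state the exact switching criterion, so the sketch assumes that "heavy load" means contention for the same output port within a cycle, and that the lowest-numbered input port wins the primary crossbar. The function name `route_cycle` is hypothetical.

```python
from typing import Dict, List, Tuple


def route_cycle(requests: Dict[int, int]) -> Tuple[Dict[int, int], List[int]]:
    """Decide, for one cycle, which flits use which crossbar.

    requests maps input_port -> requested output_port.

    Returns (primary, buffered):
      primary  maps output_port -> input_port for flits that cross the
               bufferless primary crossbar directly;
      buffered lists the input ports whose flits are stored in an enhanced
               buffer and later traverse the secondary crossbar.

    Assumptions (not from the paper): contention for an output port is the
    trigger for buffering, and the lowest input-port index has priority.
    """
    primary: Dict[int, int] = {}
    buffered: List[int] = []
    for in_port in sorted(requests):
        out_port = requests[in_port]
        if out_port not in primary:
            primary[out_port] = in_port   # no conflict: bufferless path
        else:
            buffered.append(in_port)      # conflict: enhanced buffer path
    return primary, buffered
```

When no two inputs request the same output, `buffered` is empty and the router behaves as a bufferless network; a conflicting request is diverted to the buffered path instead of being dropped or deflected.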

The enhanced buffer implements a flow-control scheme that uses the storage already present in pipelined channels in place of explicit input VCBs. With this approach, the channels themselves act as distributed FIFO buffers under congestion. Without VCBs, and hence VCs, deadlock prevention is achieved by duplicating physical channels. Duplicate physical channels must be used in the same way as VCs to prevent deadlocks and to differentiate between traffic classes. Fig. 6 (a) shows a D flip-flop (DFF) implemented using master and slave latches. By adding control logic to drive the latch enable pins independently, each latch can be used as an independent storage location. Thus, the flip-flop (FF) becomes an EB, a FIFO with two storage locations, as illustrated in Fig. 6 (b).

**Fig. 4 Proposed router with enhanced buffer**

**Fig. 5 (a) 5 stage ViChaR router (b) 3 stage proposed router with enhanced buffer**

**Fig. 6 (a) A DFF with master and slave latches (b) An EB is an FF whose latches are controlled by the EB control logic**

EBs use a ready (R) and valid (V) handshake to advance data. An asserted ready output indicates that the enhanced buffer has at least one empty storage slot to latch more data. An asserted valid output indicates that the data currently being driven is valid. The control logic samples incoming control signals at rising clock edges to determine whether data has advanced to or from the enhanced buffer. Asserted ready and valid signals between two EBs indicate that data has advanced. Data is latched in the master latch at the end of the cycle and advances to the slave latch at the beginning of the next cycle. If during the previous cycle the incoming ready was de-asserted and the state remains at 1, data is already in the slave latch and its enable input needs to be disabled. In the buffered secondary crossbar, incoming flits are first stored at the cross point selected for their output. In the second stage, outputs arbitrate among all the valid cross points and store flits into the output. The ready-valid handshake is used in both stages.
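The ready-valid behavior of an EB can be sketched as a cycle-level Python model. This is a behavioral sketch under stated assumptions, not the latch-level circuit of Fig. 6: the master and slave latches are abstracted as a two-entry FIFO, and the class name `EnhancedBuffer` and its `clock` method are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class EnhancedBuffer:
    """Behavioral model of a two-slot FIFO with a ready/valid handshake."""

    CAPACITY = 2  # master and slave latch, each used as one storage slot

    def __init__(self) -> None:
        self._slots: Deque[int] = deque()

    @property
    def ready(self) -> bool:
        """Asserted while at least one storage slot is empty."""
        return len(self._slots) < self.CAPACITY

    @property
    def valid(self) -> bool:
        """Asserted while the buffer is driving valid data."""
        return len(self._slots) > 0

    def clock(
        self, in_valid: bool, in_data: Optional[int], out_ready: bool
    ) -> Optional[int]:
        """Model one rising clock edge.

        Handshake signals are sampled before the state changes. Data
        advances on a link only when both ready and valid are asserted.
        Returns the flit that advanced downstream, or None.
        """
        accept = in_valid and self.ready   # upstream -> this buffer
        send = out_ready and self.valid    # this buffer -> downstream
        out = self._slots.popleft() if send else None
        if accept and in_data is not None:
            self._slots.append(in_data)
        return out
```

If the downstream side keeps `out_ready` de-asserted, the buffer accepts two flits and then de-asserts `ready`, so the upstream stage stalls without losing data; once `out_ready` is asserted again, the flits leave in FIFO order.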

IV. IMPLEMENTATION AND RESULTS

The design is implemented in Cadence Virtuoso at 180 nm. The design parameters are selected at design time. A virtual channel router is also designed to allow a comparative study of the two techniques. Table I compares the two designs in terms of delay and total average power.

<table>
<thead>
<tr>
<th>Design</th>
<th>Delay (nsec)</th>
<th>Total average power (µW)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Virtual channel router</td>
<td>0.877</td>
<td>445.1</td>
</tr>
<tr>
<td>Proposed enhanced buffer router</td>
<td>0.358</td>
<td>427.5</td>
</tr>
</tbody>
</table>

The proposed enhanced buffer router performs better than the virtual channel router in both parameters: power is reduced by 15.65% and delay by 97.88% with respect to the virtual channel router. Fig. 7 shows the circuit diagram of the proposed enhanced buffer router. Fig. 8 shows the delay of the proposed enhanced buffer router. Fig. 9 shows the power of the proposed enhanced buffer router.
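A relative reduction with respect to the virtual channel router can be computed from the Table I entries as sketched below. The sketch assumes the conventional definition, (baseline − proposed) / baseline; the text does not state how its quoted percentages were derived, so values computed this way need not coincide with them. The function name `reduction_percent` and the dictionary `TABLE_I` are hypothetical.

```python
def reduction_percent(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in %.

    Assumed definition: (baseline - proposed) / baseline * 100.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline * 100.0


# Values copied from Table I: (virtual channel router, proposed router).
TABLE_I = {
    "delay_ns": (0.877, 0.358),
    "power_uW": (445.1, 427.5),
}

if __name__ == "__main__":
    for metric, (vc_router, eb_router) in TABLE_I.items():
        print(f"{metric}: {reduction_percent(vc_router, eb_router):.2f}%")
```

Applied to the Table I entries as printed, this definition gives about 59.18% for delay and about 3.95% for total average power.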

V. CONCLUSION

The proposed design achieves the advantages of both buffered and bufferless networks. Under low load, packets traverse a bufferless network; under high load, they traverse the enhanced buffer, i.e., a buffered network. The design provides better results than a generic NoC router as well as a virtual channel router. The proposed router achieves a 15.65% power reduction compared with the virtual channel router. In terms of delay, the proposed router provides 97.88% less delay than the virtual channel router. Reducing the number of pipeline stages reduces the delay of packets in each router, which in turn decreases the average latency. Because packets spend less time traveling in the network, the now-unoccupied cycles enable more packets to be transmitted, which increases the throughput of the network.

REFERENCES


Bhavana Prakash Shrivastava received her degree in Electronics and Communication Engineering in 2003 and her M.Tech. degree in Digital Communication Systems in 2007. She is now pursuing her Ph.D. degree in VLSI design under the guidance of Dr. Kavita Khare. She is working as an Assistant Professor of Electronics and Communication Engineering at MANIT, Bhopal. Her fields of interest are VLSI design, communication systems, and networking. She is a fellow of IEEE (India).

Kavita Khare received her B.Tech. degree in Electronics and Communication Engineering in 1989, her M.Tech. degree in Digital Communication Systems in 1993, and her Ph.D. degree in the field of VLSI Design in 2005. She has nearly 100 publications in various international conferences and journals. Currently, she is working as an Associate Professor of Electronics and Communication Engineering at MANIT, Bhopal. Her fields of interest are VLSI design and communication systems. Her research mainly includes the design of arithmetic circuits and various communication algorithms related to synchronization, estimation, and routing. Dr. Khare is a Fellow of IETE (India) and a Life Member of ISTE.