# FPGA Implementation of Generalized Maximal Ratio Combining Receiver Diversity

Rafic Ayoubi, Jean-Pierre Dubois, Rania Minkara

Abstract— In this paper, we study the FPGA implementation of a novel supra-optimal receiver diversity combining technique, generalized maximal ratio combining (GMRC), for wireless transmission over fading channels in SIMO systems. Previously published results for ML-detected GMRC driven by BPSK showed bit error rate performance superior to that of the widely used MRC scheme in an imperfect channel estimation (ICE) environment. Under perfect channel estimation conditions, the performance of GMRC and MRC was identical. The main drawback of the earlier GMRC study was that it was purely theoretical; a successful FPGA implementation using pipelining techniques is therefore needed as a wireless communication test-bed for practical, real-life situations. Simulation results show that the hardware implementation is efficient in terms of both speed and area. Since diversity combining is especially effective in small femto- and picocells, internet-associated wireless peripheral systems stand to benefit most from GMRC. As a result, many spin-off applications can be made to the hardware of IP-based 4<sup>th</sup> generation networks.

Keywords— Femto-internet cells, field-programmable gate array, generalized maximal-ratio combining, Lyapunov fractal dimension, pipelining technique, wireless SIMO channels.

## I. INTRODUCTION

The performance of wireless networks is limited by the effect of multipath fading on the communication link's effective capacity and signal-to-noise ratio (SNR). However, multipath scattering can be exploited by processing multiple signal paths between the transmitter and the receiver in order to improve the SNR and channel capacity. This type of signal processing is referred to as diversity combining, and the corresponding wireless channel is termed single-input multiple-output (SIMO). Maximal Ratio Combining (MRC) is a commonly used receiver diversity technique, and it is known to be the optimal multichannel linear combiner in the sense that it maximizes the output SNR. Since this technique requires an SNR (channel) estimation algorithm, two cases can be explored: perfect channel state information and imperfect channel state information. The performance of MRC under imperfect channel state information degrades as the estimation error increases. In a separate study [1], we developed a novel supra-optimal technique termed Generalized Maximal Ratio Combining (GMRC) and analysed its structure. We showed that GMRC and MRC yield identical bit error rate (BER) with perfect channel state information, and that GMRC performs better as errors are introduced into the estimation of the channel coefficients. Until an efficient hardware implementation is devised for GMRC, however, it remains only a theoretical scheme.

R. Ayoubi, J. Dubois (J. Daba), and R. Minkara are with the Electrical Engineering Department, University of Balamand, Kalhat El Koura, Lebanon (e-mails: rafic.ayoubi@balamand.edu.lb, jay.daba@balamand.edu.lb, rania.minkara@balamand.edu.lb).

The purpose of this research is to study the feasibility of implementing GMRC on an FPGA. Since its introduction in the late eighties, FPGA (field-programmable gate array) technology has steadily improved in performance, density, power consumption, and overall system cost. This is due to two main factors. First, advances in VLSI technology allow a few million logic gates to be fabricated on a single die; FPGAs include a relatively large number of programmable logic elements, typically ranging from tens of thousands to several million. Second, powerful software tools are available that simplify the conception, simulation, and implementation of complex designs. Other advantages, such as a shorter time to market, lower non-recurring engineering costs, and the ability to be reprogrammed in the field to fix bugs, make the FPGA an attractive technology.

FPGA architectures [2] are dominated by interconnect, which makes them very flexible in terms of the range of designs that can practically be implemented on them. These architectures contain embedded memories and higher-level embedded functions such as adders and multipliers. Many modern FPGAs also support full or partial in-system reconfiguration, allowing them to be reprogrammed at run time. FPGA architectures have been further modernized by combining logic blocks and interconnects with embedded microprocessors to form a complete "system on a programmable chip" (e.g., Xilinx Virtex-II PRO and Virtex-4 devices, and the Atmel FPSLIC [3]).

FPGAs offer architectures that are well suited to algorithms requiring massive parallelism. The inherent parallelism of the logic resources, together with the complex hard cores available on the FPGA (such as multipliers, large RAM blocks, and DSP slices), allows considerable computational throughput even at sub-500 MHz clock rates. In fact, today's generation of FPGAs can implement around 100 single-precision floating-point units, each of which can compute a result every clock cycle.

Applications of FPGAs include DSP, image processing, computer vision, speech recognition, cryptography, telecommunications, computer hardware emulation, and a growing range of other areas. High-performance DSP applications with FFT or convolution computational kernels are increasingly being performed on FPGAs instead of microprocessors [4].

Because logic fabric and state elements are abundant in the FPGA, pipelining is one of the main techniques for improving performance. In this paper, an efficient FPGA implementation of the novel GMRC combiner [1] for wireless single-input multiple-output (SIMO) systems is proposed, in which pipelining plays an important role in achieving very high throughput.

The rest of this paper is organized as follows. In Section II, we describe our previously published GMRC diversity combining technique, whose performance was investigated for BPSK transmission over Rayleigh fading channels in SIMO systems (a sub-component of MIMO). In Section III, we propose an FPGA implementation of GMRC using efficient pipelining techniques. Section IV concludes the paper; future extensions of the design include implementing a detector and interfacing a wireless BPSK transceiver to the FPGA.

## II. GENERALIZED MAXIMAL RATIO COMBINING

GMRC is a generalized form of MRC. In MRC, the signal received from each diversity path is weighted by a coefficient $\alpha_i$, and the weighted signals are then summed [1]. The same process (illustrated in Fig. 1) applies to GMRC, but with different weighting coefficients $\alpha_i$.



Fig. 1 GMRC combiner

Assuming that the original transmitted signal is x(t), the output signal after applying GMRC is

$$r_{GMRC}(t) = \sum_{i=1}^{L} \alpha_i r_i(t) = \sum_{i=1}^{L} \alpha_i \left( \gamma_i x(t) + n_i(t) \right) \tag{1}$$

where  $\gamma_i$  and  $n_i(t)$  are respectively the fading envelope and the additive noise of the  $i^{th}$  path. The weighting coefficients are:

$$\alpha_{i} = \frac{g(\gamma_{i})}{\sqrt{\sum_{j=1}^{L} g^{2}(\gamma_{j})}} \tag{2}$$

Using the polynomial kernel  $g(\gamma_i) = \gamma_i^{n/2}$ , the weights become

$$\alpha_i = \frac{\sqrt{\gamma_i^n}}{\sqrt{\sum_{j=1}^L \gamma_j^n}},\tag{3}$$

$$\sum_{i=1}^{L} \alpha_i^2 = 1. \tag{4}$$

For n = 2, the coefficients reduce to

$$\alpha_i = \frac{\gamma_i}{\sqrt{\sum_{j=1}^{L} \gamma_j^2}},\tag{5}$$

which are the MRC weights; MRC is thus the special case n = 2 of GMRC.
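A minimal floating-point sketch of (1)-(5) follows; the function names and the example envelope values are illustrative assumptions, not part of the original design. It computes the normalized weights for a given n, so one can check the unit-energy constraint (4) and that n = 2 reproduces the MRC weights (5).

```python
import math

def gmrc_weights(gammas, n):
    """Normalized GMRC weights of eq. (3): alpha_i = gamma_i^(n/2) / sqrt(sum_j gamma_j^n)."""
    kernel = [g ** (n / 2) for g in gammas]          # polynomial kernel g(gamma) = gamma^(n/2)
    norm = math.sqrt(sum(k * k for k in kernel))     # sqrt(sum_j g^2(gamma_j)), eq. (2)
    return [k / norm for k in kernel]

def combine(weights, received):
    """Weighted sum of eq. (1): r_GMRC = sum_i alpha_i * r_i (one sample per branch)."""
    return sum(a * r for a, r in zip(weights, received))

# Arbitrary example fading envelopes for L = 4 branches (hypothetical values).
gammas = [0.4, 1.1, 2.3, 0.7]
w13 = gmrc_weights(gammas, 1.3)   # GMRC with n = 1.3
w2 = gmrc_weights(gammas, 2.0)    # n = 2 gives the MRC weights of eq. (5)

print("n = 1.3:", [round(a, 4) for a in w13], "sum of squares:", sum(a * a for a in w13))
print("n = 2.0:", [round(a, 4) for a in w2])
```

Lowering n compresses the ratios between branch weights (each weight scales as $\gamma_i^{0.65}$ rather than $\gamma_i$), so the strongest branch is emphasized less than under MRC.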

We now introduce a new performance metric, the signal-to-scattering-noise ratio (SSNR), that captures the "amount" or "degree" of fading. The SSNR is defined as

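$$SSNR = \frac{E(\beta^2)}{\sigma_{\beta^2}} \tag{6}$$

where $\beta^2$ denotes the fading power at the combiner output and $\sigma_{\beta^2}$ is its standard deviation.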

SSNR is the reciprocal of the speckle contrast in active radar imagery [5-12]. A large SSNR signifies that the fluctuations of the signal power are small relative to the mean signal power, indicating "reliable" communication.

For GMRC, the fading power is

$$\gamma_{\text{GMRC}}^{2} = \frac{\left(\sum_{i=1}^{L} \gamma_{i}^{1+n/2}\right)^{2}}{\sum_{i=1}^{L} \gamma_{i}^{n}} \tag{7}$$

In order to maximize the SSNR over n, the SSNR was plotted versus the parameter n for different fading statistics. The SSNR reached its maximum for values of n in the range [1.3, 3] for Rayleigh, Rician, and Nakagami-m fading. We later found that this result appears to apply to any type of fading statistics [5-12].

A Conjecture: Since no proof of this result has been established, we leave it as a conjecture for researchers to prove in the future. We believe, though, that a rigorous theoretical proof is unlikely to exist, and that the conjecture could only be disproved by a simulated counter-example, in the unlikely case that someone exhibits a fading stochastic model for which it fails. For now, we claim that the same performance is achieved for all values of n between 1.3 and 3. Fig. 2 shows a sample plot of SSNR versus n for Rayleigh fading with average diffuse power equal to 10 and 4 diversity antennas.



Fig. 2 SSNR versus n
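A curve like the one in Fig. 2 can be estimated numerically. The following Monte Carlo sketch evaluates (6) with $\beta^2$ taken as the GMRC output fading power (7). The assumptions are ours: L = 4 independent Rayleigh branches, each with $E[\gamma_i^2] = 10$ as our reading of "average diffuse power equal to 10", 20 000 trials, and a fixed seed; all names are hypothetical. Because (7) is homogeneous of degree 2 in the $\gamma_i$, the SSNR does not depend on the power scale. For n = 2 the output power $\sum_i \gamma_i^2$ is Gamma-distributed with shape 4, so the exact SSNR is 2, which serves as a convenient sanity check.

```python
import math
import random
import statistics

def rayleigh_envelope(rng, mean_power):
    """One Rayleigh fading envelope gamma with E[gamma^2] = mean_power."""
    sigma = math.sqrt(mean_power / 2.0)          # per-quadrature standard deviation
    return math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

def gmrc_fading_power(gammas, n):
    """GMRC output fading power, eq. (7)."""
    numerator = sum(g ** (1 + n / 2) for g in gammas) ** 2
    denominator = sum(g ** n for g in gammas)
    return numerator / denominator

def ssnr(n, branches=4, mean_power=10.0, trials=20000, seed=1):
    """Monte Carlo estimate of eq. (6): mean(beta^2) / std(beta^2), beta^2 from eq. (7)."""
    rng = random.Random(seed)
    powers = [
        gmrc_fading_power([rayleigh_envelope(rng, mean_power) for _ in range(branches)], n)
        for _ in range(trials)
    ]
    return statistics.fmean(powers) / statistics.pstdev(powers)

for n in (1.0, 1.3, 2.0, 3.0, 4.0):
    print(f"n = {n:.1f}: SSNR ~ {ssnr(n):.3f}")
```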

Since MRC is sensitive to channel estimation errors, the cases where $n < 2$ are of special interest. With imperfect channel state information, errors occur in the estimates of the channel coefficients. When applying MRC, we effectively square the estimation error terms, whereas if we tune n to a value less than 2 (but greater than 1), the estimation error has a less significant effect.

Accordingly, with perfect channel state information, n = 1.3 (GMRC) and n = 2 (MRC) produce the same BER, whereas with imperfect channel state information, n = 1.3 outperforms n = 2.
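A rough way to see the intuition behind the preceding argument: in (3), $\alpha_i^2$ is proportional to $\gamma_i^n$, so if the estimate is $\hat{\gamma}_i = \gamma_i(1+\varepsilon)$ (a multiplicative error model we assume purely for illustration), that branch's unnormalized power weight is distorted by the factor $(1+\varepsilon)^n$. The sketch below compares that factor for n = 2 and n = 1.3; it illustrates weight sensitivity only and does not by itself reproduce the BER results of [1]. The function name is hypothetical.

```python
def power_weight_distortion(rel_error, n):
    """Factor (1 + rel_error)^n by which a branch's unnormalized power weight gamma^n
    is distorted when gamma is estimated as gamma * (1 + rel_error)."""
    return (1.0 + rel_error) ** n

for eps in (0.1, 0.2, -0.2):
    mrc = power_weight_distortion(eps, 2.0)    # MRC: error enters squared
    gmrc = power_weight_distortion(eps, 1.3)   # GMRC with n = 1.3
    print(f"error {eps:+.0%}: MRC factor {mrc:.3f}, GMRC (n = 1.3) factor {gmrc:.3f}")
```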

A Challenge: Consider the term $\gamma^{\sigma}$ with $\sigma = 1.3$. It is remarkable that the exponent value 1.3 appears more than once in our study. This leads us to believe that $\sigma$ is a Lyapunov characteristic exponent. Although we cannot prove it (yet another conjecture), we expect the dimension of our model to be a Lyapunov fractal dimension of order 1.3. By comparison, the dimension of the MRC model is 2, which is a physical representation of the fading power. A challenging task is to find the physical meaning of the Lyapunov fractal dimension 1.3 in the GMRC model.

In prior research, we also suggested using a near-optimal diversity technique termed RMSGC (root-mean-square gain combining) [13-18] when an ideal channel environment is present. This is because RMSGC does not require estimation of the channel coefficients, at the cost of greater implementation complexity; a subsequent study showed this complexity to be manageable through an efficient FPGA implementation [13].

## III. FPGA IMPLEMENTATION

In the proposed design, reference is made to the functional block diagram of the SIMO-GMRC processor in Fig. 1. After the signals are received, analog-to-digital conversion takes place on all channels, and the data are then processed digitally on the FPGA. As explained previously, the optimal exponent is around 1.3. Therefore, a unit that raises its input to the power 1.3 is implemented on an Altera Cyclone EP1C6F256 FPGA, chosen because of its availability in our labs. An 8-bit input is fed into the exponent unit and a 10-bit output is generated, as shown in Figure 3. The hardware is described in detail below.



Fig. 3 Input/output interface of the exponent unit

The proposed implementation calculates $x^{1.3125}$ rather than $x^{1.3}$, because 1.3125 can be written as a fraction with a power of two in the denominator, i.e., 1.3125 = 21/16. Since 21/16 = 1 + 1/4 + 1/16, $x^{1.3125}$ can be rewritten as $x^{1+1/4+1/16} = x \cdot x^{1/4} \cdot x^{1/16}$. Therefore, the final result is obtained by taking the fourth root and the sixteenth root of x, multiplying the two roots, and then multiplying by x.
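As a worked check of this decomposition (the numbers are our own arithmetic, not measured hardware results), the exponent splits into power-of-two fractions, each root is a chain of square roots, and the exact $x^{1.3125}$ exceeds $x^{1.3}$ by a factor $x^{0.0125}$, which over the 8-bit input range is largest at $x = 255$, about 7%:

$$\frac{21}{16} = \frac{16}{16} + \frac{4}{16} + \frac{1}{16} = 1 + \frac{1}{4} + \frac{1}{16}$$

$$x^{1/4} = \sqrt{\sqrt{x}}, \qquad x^{1/16} = \sqrt{\sqrt{x^{1/4}}}$$

$$\frac{x^{1.3125}}{x^{1.3}} = x^{0.0125}, \qquad 255^{0.0125} = e^{0.0125 \ln 255} \approx e^{0.0693} \approx 1.072$$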

An important feature of our implementation is that, even though we calculate $x^{1.3125}$, the actual result is much closer to $x^{1.3}$. This is due to careful zero padding of the initial input and of the intermediate results, followed by truncation of the square root units' outputs. The amount of zero padding was determined by trial and error until a satisfactory result was achieved.

To avoid cluttering a single figure, the FPGA implementation of the exponent is shown in two figures, in sequential order. Figure 4 shows the intermediate calculation of $y = x\sqrt{\sqrt{x}}$. First, x is padded with 8 zeros and fed into a square root unit; the result is then padded with 4 zeros and fed into another square root unit. This output is then multiplied by x. Figure 5 shows the second part of the design, where $\sqrt{\sqrt{x}}$ is fed through two further square root units, each with 4 zeros of padding, to obtain $\sqrt[16]{x}$. Finally, $y$ and $\sqrt[16]{x}$ are multiplied to produce $x^{1.3125}$.



Fig. 4 Partial result of the exponent calculation



Fig. 5 Second part of the exponent design
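The datapath of Figures 4 and 5 can be modelled bit by bit in Python under a few assumptions of ours: "padding with k zeros" is read as appending k zero bits below the LSB (a left shift by k), each square root unit returns the truncated integer root (`math.isqrt`), and the multipliers are exact. The final scaling or truncation to the 10-bit output is not described, so this hypothetical `exponent_model` returns the full product rescaled to a real value. Under these assumptions every intermediate root carries 4 fractional bits and the final product carries 8.

```python
import math

def sqrt_unit(value, pad_bits):
    """Square root unit: append pad_bits zero bits, then take the truncated integer root."""
    return math.isqrt(value << pad_bits)

def exponent_model(x):
    """Hypothetical bit-level model of the x^1.3125 datapath for an 8-bit unsigned x."""
    if not 0 <= x <= 255:
        raise ValueError("x must be an 8-bit unsigned integer")
    r2 = sqrt_unit(x, 8)      # ~ 16 * x^(1/2)   (4 fractional bits)
    r4 = sqrt_unit(r2, 4)     # ~ 16 * x^(1/4)
    y = x * r4                # ~ 16 * x^(5/4)   end of Fig. 4
    r8 = sqrt_unit(r4, 4)     # ~ 16 * x^(1/8)
    r16 = sqrt_unit(r8, 4)    # ~ 16 * x^(1/16)
    product = y * r16         # ~ 256 * x^(21/16), end of Fig. 5
    return product / 256      # remove the 8 fractional bits

for x in (16, 100, 255):
    print(x, exponent_model(x), round(x ** 1.3, 2), round(x ** 1.3125, 2))
```

For x = 16, for example, the model gives 36.0, versus $16^{1.3} \approx 36.76$ and $16^{1.3125} \approx 38.05$, so under these assumptions the truncation pulls the result toward $x^{1.3}$, as the text describes.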

To achieve high throughput, the exponent unit is pipelined. The overall design consists of sixteen pipeline stages, with each of the four square root units divided into a 4-stage pipeline. Moreover, the intermediate multiplication is performed in parallel with the operation of the third square root unit. After an initial latency of sixteen cycles, an output is generated every clock cycle, as shown in Figure 6. The result is a throughput of one sample per clock cycle at a high clock rate.
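The latency and throughput behaviour described above can be illustrated with a cycle-level sketch that models the exponent unit abstractly as a chain of 16 registers. This is our simplification: it ignores what each stage computes and how the multipliers are balanced across stages, and the names are hypothetical. Each sample leaves the pipeline 16 clocks after it enters, and once the pipeline is full one result emerges per clock.

```python
from collections import deque

STAGES = 16  # 4 square root units x 4 pipeline stages each, as in the text

def simulate_pipeline(samples, stages=STAGES):
    """Return (clock_cycle, sample) for each output of a linear pipeline.

    One sample enters stage 1 on every clock; on the same clock the value held
    in the last stage register leaves the pipeline.
    """
    regs = deque([None] * stages, maxlen=stages)   # regs[0] = stage 1, regs[-1] = last stage
    outputs = []
    stream = list(samples) + [None] * stages        # trailing bubbles flush the pipeline
    for cycle, sample in enumerate(stream, start=1):
        leaving = regs[-1]
        regs.appendleft(sample)                     # shift; maxlen discards the old last stage
        if leaving is not None:
            outputs.append((cycle, leaving))
    return outputs

out = simulate_pipeline(range(5))
print(out)
```

With this model, N samples need N + 16 clock cycles in total, i.e., roughly (N + 16) × 6.411 ns at the period reported in Table I.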



Fig. 6 Simulation results

As shown in Figures 3-6, the main components of our design are the square root units and the multipliers. Since current FPGA technology provides multipliers as hardwired components inside the chip, they can run at very high clock speeds. The performance bottleneck of the implementation is therefore the square root unit: analysis of the timing reports generated by Altera's Quartus II software confirmed, as expected, that the timing limitation was due to the square root unit, which initially limited the clock speed to around 50 MHz. After experimenting with different numbers of pipeline stages for the square root unit, we found that four stages per square root unit is optimal, which leads to 16 pipeline stages overall. As shown in Table I, this raised the clock speed to more than 150 MHz. The table also shows that only 426 of the 5980 logic elements (7%) were consumed, a very small fraction of the total resources of this medium-size chip.

TABLE I IMPLEMENTATION RESULTS

| Metric                 | Value                          |
|------------------------|--------------------------------|
| Total logic elements   | 426 out of 5980 (7%)           |
| Max. clock frequency   | 155.98 MHz (period = 6.411 ns) |
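The entries of Table I can be cross-checked with a little arithmetic. The latency line is our own estimate, assuming each of the 16 pipeline stages takes one clock period at the reported minimum period, and the last line compares against the roughly 50 MHz obtained before the square root units were pipelined:

$$f_{\max} = \frac{1}{6.411\ \text{ns}} \approx 155.98\ \text{MHz}$$

$$\frac{426}{5980} \approx 0.071 \;(\approx 7\%)$$

$$16 \times 6.411\ \text{ns} \approx 102.6\ \text{ns (initial latency)}$$

$$\frac{155.98\ \text{MHz}}{\approx 50\ \text{MHz}} \approx 3.1$$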

## IV. CONCLUSIONS

In this paper, we proposed an FPGA implementation of a novel GMRC receiver diversity combining scheme for wireless SIMO channels. GMRC was previously shown to be superior to MRC [1], which has been widely accepted as "the" optimal combiner ever since Gans' well-known paper in 1971 [19]. Under all fading statistics and channel conditions we examined, GMRC either outperforms MRC or yields identical performance. GMRC adds no complication, as it simply transforms the estimated fading amplitudes using a polynomial kernel with a lower exponent (n = 1.3) than MRC (n = 2). The only other work we know of that aims to improve the performance of MRC is that of Zhu and Gunawan in 2000, who proposed a modified controlled MRC scheme incorporating power control for multi-carrier CDMA systems [20]. Their work did not establish a generalized supra-optimal diversity combiner and remained within the limited simulation scope of their specific MC-CDMA system.

The practical feasibility of GMRC was demonstrated in this paper by verifying that the proposed hardware implementation is efficient in terms of both speed and area, with a clock frequency above 150 MHz. This high speed was made possible by sub-dividing the complex operations into smaller computation stages and pipelining all of the stages. Without pipelining, each sample would pass through all of the computation stages before the final output is produced, whereas with pipelining the computations overlap, yielding one output sample per clock cycle (6.411 ns per sample). This rate is more than sufficient for practical signal processing with sampling rates set at the Nyquist rate. Future extensions of the design include implementing a detector and interfacing a wireless BPSK transceiver to the FPGA.

The results of this paper and of our work in [1] lead us to recommend GMRC as a replacement for MRC in SIMO diversity receivers. The new GMRC architecture and these recommendations have not yet been tested in the real world. However, with widely available wireless communication components, a feasibility test should not be difficult to conduct; it is left as future work.

## ACKNOWLEDGMENT

The authors acknowledge the financial support provided by a research grant from C.N.R.S., Bir Hassan, Beirut, Lebanon.

## REFERENCES

- [1] J. Daba, J. Dubois, and R. Minkara, "Generalized Maximal Ratio Combining as a Supra-optimal Receiver Diversity Scheme," *ICCESSE 2010: International Conference on Computer, Electrical, and Systems Science, and Engineering*, Paris, France, July 2010.
- [2] S. Brown and J. Rose, "Architecture of FPGAs and CPLDs: Tutorial," *IEEE Design & Test of Computers*, vol. 13, no. 2, pp. 42-57, 1996.
- [3] http://www.fpga-faq.org/FPGA_Boards.shtml

### World Academy of Science, Engineering and Technology International Journal of Electronics and Communication Engineering Vol:4, No:8, 2010

- [4] L. Bylanger, "Combining DSPs and FPGAs in Next-Generation Multimode Wireless Handset Designs," DSP-FPGA.com Magazine, May 2007.
- [5] J. Daba and M. Bell, "Statistics of the Scattering Cross Section of a Small Number of Random Scatterers," *IEEE Trans. on Antennas and Propagation*, August 1995.
- [6] J. S. Daba and M. R. Bell, "Statistical Distributions of Partially Developed Speckle Based on a Small Number of Constant Scatterers with Random Phase," *IEEE International Geoscience and Remote Sensing Symposium*, California Institute of Technology, Pasadena, CA, USA, August 8-12, 1994.
- [7] A. Abdi, S. Nader-Esfahani, J. Daba and M. R. Bell, "Comments on Statistics of the Scattering Cross Section of a Small Number of Random Scatterers," *IEEE Transactions on Antennas and Propagation*, vol. 48, no. 5, pp. 844-845, May 2000.
- [8] J. Dubois and P. Jreije, "Advanced Stochastic Models for Partially Developed Speckle," 5th International Conference on Computer, Electrical, and Systems Science, and Engineering (CESSE 2008), organized by the World Congress on Science, Engineering, and Technology, Vienna, Austria, August 2008.
- [9] J. Dubois, "Poisson Modulated Stochastic Model for Partially Developed Multi-Look Speckle," American Conference on Applied Mathematics, Harvard University, Cambridge, MA, USA, March 2008.
- [10] J. Dubois, "Scattering Statistics of Doppler Faded Acoustic Signals Using Speckle Noise Models," *IEEE International Conference on Direct* and Inverse Problems of Electromagnetic and Acoustic Wave Theory, Lviv, Ukraine, Sept. 2003.
- [11] J. Daba and M. R. Bell, "Statistics of the Scattering Cross Section of a Collection of Constant Amplitude Scatterers with Random Phase," West Lafayette, IN, Purdue University, School of Electrical Engineering Technical Report, TR-EE 94-25, July 1994.
- [12] J. Daba and P. Jreije, "Probability Distributions for Multiplicative Noise in Internet Associated Wireless Cells and in Speckled Images," Conference on Applied Mathematical Problems, Ukrainian Mathematical Congress, Kiev, Ukraine, August, 2009.
- [13] R. Ayoubi, J. Dubois and O. Abdul-latif, "FPGA Implementation of a Novel Receiver Diversity Combining Technique," *IEEE International Conference on Signal Processing and Communications*, Dubai, Nov. 2007.
- [14] O. Abdul-Latif and J. Dubois, "LS-SVM Detector for RMSGC Diversity in SIMO Channels," *IEEE International Conference on Information Sciences, Signal Processing and its Applications*, Dubai, Feb. 2007.
- [15] J. Dubois and O. Abdul-Latif, "Novel Diversity Combining in OFDM-Based MIMO Systems," American Conference on Applied Mathematics, Harvard, Cambridge, MA, USA, 2008.
- [16] O. Abdul-Latif and J. Dubois, "Performance of UWB System in a Partially Developed Fading Channel with CCI," 5<sup>th</sup> IEEE GCC Communication and Signal Processing Conference, Kuwait, March 2009.
- [17] J. P. Dubois, "Recent Advances in Wireless MIMO Systems with Improved Receiver Diversity Combiners," in *Annual Review of Communications: Volume* 60, A. Sulluchuco, Editor, International Engineering Consortium (IEC), Chicago, USA, ISBN: 978-1-931695-59-6, pp. 513 – 527, 2007.
- [18] J. Daba and P. Jreije, "A Novel Receiver Diversity Combining Technique for Internet-Based 4G Wireless Communication", The 9<sup>th</sup> International Conference on Signal Processing, Robotic and Automation (ISPRA'10), University of Cambridge, Cambridge, United Kingdom, February, 2010.
- [19] M. J. Gans, "The effect of Gaussian error in maximal ratio combiners," IEEE Transactions on Communication Technology, Vol. COM-19, pp. 492-500, Aug. 1971.
- [20] Y. Zhu and E. Gunawan, "Performance of MC-CDMA system using controlled MRC with power control in Rayleigh fading channel," *Electronics Letters*, Vol. 36, No. 8, pp. 752-753, April 2000.