High Datarate Solutions for Next Generation Wireless Communication

ZHONGXIA (SIMON) HE

Microwave Electronics Laboratory

Microtechnology and Nanoscience (MC2)

CHALMERS UNIVERSITY OF TECHNOLOGY

Göteborg, Sweden 2013
Thesis for the degree of Doctor of Philosophy

High Datarate Solutions for Next Generation Wireless Communication

by

Zhongxia (Simon) He

何仲夏

Microwave Electronics Laboratory
Microtechnology and Nanoscience (MC2)
Chalmers University of Technology
Göteborg, Sweden 2013

Copyright © ZHONGXIA (SIMON) HE, 2013. All rights reserved. E-mail: zhongxia@chalmers.se


Doktorsavhandlingar vid Chalmers Tekniska Högskola
Ny serie nr 3639
ISSN 0346-718X

Technical Report MC2-268
ISSN 1652-0769
Microwave Electronics Laboratory

Microtechnology and Nanoscience (MC2)
Chalmers University of Technology
SE-412 96 Göteborg, Sweden
Phone: +46 (0) 31 772 1000

Printed by Chalmers Reproservice
Göteborg, Sweden, Dec. 2013
To my family
Abstract

Next generation wireless communication systems are essential for increasing the capacity of existing digital networks. Channel bandwidths as wide as several gigahertz are generally available at millimeter wave frequencies. However, it is a challenge to design and implement millimeter wave transceivers that can utilize such wide bandwidth effectively for data transmission.

In this thesis, the feasibility of high data rate baseband modulators and demodulators (modems) that support various digital modulation formats is investigated. Novel modem structures for OOK, D-QPSK/QPSK, 8-PSK and 16-QAM modulation are presented, and six proof-of-concept modems are designed and verified using these new structures.

The work presented in this thesis constitutes an efficient portfolio of baseband modem solutions for different wireless communication applications. These modem solutions are optimized for high capacity based on the hardware components that exist today. In principle, they may support data rates of up to 100 Gbps, given enough bandwidth and more advanced semiconductor devices.

Keywords: OOK, D-QPSK, QPSK, 8-PSK, 16-QAM, Wireless Data Center, High Definition Video Transmission, Mobile Backhaul, Modem, MMIC, HBT, mHEMT, Differential Encoder, FPGA, E-band, Point-to-point Radio, Injection Locking, Harmonic Generation, Carrier Recovery.
List of Appended Papers

Appended Publications

This thesis is based on work contained in the following papers:


Other Publications

The following papers are not included in this thesis. The content partially overlaps with the appended papers or is out of the scope of the thesis.


Patents


Abbreviations and Acronyms

<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>Modem</td>
<td>Modulator and demodulator</td>
</tr>
<tr>
<td>QoS</td>
<td>Quality of service</td>
</tr>
<tr>
<td>UE</td>
<td>User equipment</td>
</tr>
<tr>
<td>RAN</td>
<td>Radio access network</td>
</tr>
<tr>
<td>CN</td>
<td>Core network</td>
</tr>
<tr>
<td>BS</td>
<td>Base station</td>
</tr>
<tr>
<td>HetNet</td>
<td>Heterogeneous Network</td>
</tr>
<tr>
<td>RRU</td>
<td>Remote radio unit</td>
</tr>
<tr>
<td>DCN</td>
<td>Data center network</td>
</tr>
<tr>
<td>OOK</td>
<td>On off keying</td>
</tr>
<tr>
<td>D-QPSK</td>
<td>Differential quadrature phase shift keying</td>
</tr>
<tr>
<td>SNR</td>
<td>Signal-to-noise ratio</td>
</tr>
<tr>
<td>QPSK</td>
<td>Quadrature phase shift keying</td>
</tr>
<tr>
<td>QAM</td>
<td>Quadrature amplitude modulation</td>
</tr>
<tr>
<td>ADC</td>
<td>Analog to digital converter</td>
</tr>
<tr>
<td>DSP</td>
<td>Digital signal processor</td>
</tr>
<tr>
<td>FoM</td>
<td>Figure of merit</td>
</tr>
<tr>
<td>MMIC</td>
<td>Monolithic microwave integrated circuit</td>
</tr>
<tr>
<td>FDD</td>
<td>Frequency division duplexing</td>
</tr>
<tr>
<td>DVI</td>
<td>Digital video interface</td>
</tr>
<tr>
<td>RF</td>
<td>Radio frequency</td>
</tr>
<tr>
<td>IF</td>
<td>Intermediate frequency</td>
</tr>
<tr>
<td>BPF</td>
<td>Band pass filter</td>
</tr>
<tr>
<td>LNA</td>
<td>Low noise amplifier</td>
</tr>
<tr>
<td>LO</td>
<td>Local oscillator</td>
</tr>
<tr>
<td>PLL</td>
<td>Phase lock loop</td>
</tr>
<tr>
<td>AWGN</td>
<td>Additive white Gaussian noise</td>
</tr>
<tr>
<td>EVM</td>
<td>Error Vector Magnitude</td>
</tr>
<tr>
<td>RMS</td>
<td>Root mean square</td>
</tr>
<tr>
<td>CW</td>
<td>Continuous wave</td>
</tr>
<tr>
<td>TWA</td>
<td>Travelling wave amplifier</td>
</tr>
<tr>
<td>FET</td>
<td>Field effect transistor</td>
</tr>
<tr>
<td>ECP</td>
<td>Emitter coupled pair</td>
</tr>
<tr>
<td>LPF</td>
<td>Low-pass filter</td>
</tr>
<tr>
<td>BER</td>
<td>Bit-error rate</td>
</tr>
<tr>
<td>ROM</td>
<td>Read only memory</td>
</tr>
<tr>
<td>HBT</td>
<td>Heterojunction bipolar transistor</td>
</tr>
<tr>
<td>FPGA</td>
<td>Field programmable gate array</td>
</tr>
<tr>
<td>PPL</td>
<td>Parallel prefix layer</td>
</tr>
<tr>
<td>ILO</td>
<td>Injection locking oscillator</td>
</tr>
<tr>
<td>SH-ILO</td>
<td>Second harmonic injection locking oscillator</td>
</tr>
<tr>
<td>ILVCO</td>
<td>Injection locking voltage controlled oscillator</td>
</tr>
<tr>
<td>DRIVO</td>
<td>Dielectric resonator injection locking oscillator</td>
</tr>
<tr>
<td>PRBS</td>
<td>Pseudo random binary sequence</td>
</tr>
<tr>
<td>STR</td>
<td>Symbol time recovery</td>
</tr>
<tr>
<td>CDR</td>
<td>Clock data recovery</td>
</tr>
<tr>
<td>CR</td>
<td>Carrier recovery</td>
</tr>
<tr>
<td>LUT</td>
<td>Look-up table</td>
</tr>
<tr>
<td>AWG</td>
<td>Arbitrary waveform generator</td>
</tr>
<tr>
<td>BERT</td>
<td>Bit error rate tester</td>
</tr>
<tr>
<td>mHEMT</td>
<td>Metamorphic high electron mobility transistor</td>
</tr>
</tbody>
</table>
Contents

Abstract

List of Publications

Abbreviations and Acronyms

Contents

1 Introduction
  1.1 The Expanding Digital Universe
  1.2 Enhancement of Current Digital Communication Networks using Next Generation Wireless Solutions
    1.2.1 Improvement of the Mobile Network Capacity using High Data Rate Radios
    1.2.2 Capacity-enhanced Wireless Data Center
    1.2.3 Future Video Broadcasting Networks with High Data Rate Radios
  1.3 Thesis Motivation
    1.3.1 Wireless Transceiver Requirements for Various Applications
    1.3.2 Motivation: Advanced Modem Solutions for Future Communication Systems
  1.4 Thesis Contribution
  1.5 Thesis Outline

2 Theoretical Background
  2.1 Single Carrier Wireless Data Transmission
  2.2 Front-end Architecture
    2.2.1 Heterodyne Receiver
    2.2.2 Homodyne Receiver
    2.2.3 Interface between Front-end and Baseband
  2.3 Baseband Architecture and Modulation Techniques
    2.3.1 Different Formats of Baseband Signals
    2.3.2 Digital Modulation Schemes
  2.4 Figure of Merit (FoM)
    2.4.1 Spectrum Efficiency and Bit Error Probabilities
    2.4.2 Error Vector Magnitude (EVM) of a Received Constellation
    2.4.3 Input Sensitivity of Demodulator
    2.4.4 Energy Efficiency of the Modem

3 Implementation of OOK Modulator and Demodulator
  3.1 MMIC-based OOK Modulator
    3.1.1 Different Approaches of OOK Implementation
    3.1.2 Overview of State-of-the-art OOK Modulator Implementations
    3.1.3 Latch-based High Datarate OOK Modulator
  3.2 MMIC-based OOK Demodulator
    3.2.1 OOK Demodulation Solutions
    3.2.2 mHEMT Active Envelope Detector
    3.2.3 Detector Measurement

4 High Data Rate QPSK Modem Solutions
  4.1 QPSK Modulation and Detection Techniques
    4.1.1 QPSK Modulation and Coherent Detection
    4.1.2 D-QPSK Modulation and Non-coherent Detection
  4.2 Overview of Reported QPSK Baseband Transceivers
  4.3 Fixed Rate D-QPSK Modem Implementation
    4.3.1 D-QPSK Modulator Implementation
    4.3.2 D-QPSK Demodulator Implementation
    4.3.3 5 Gbps D-QPSK E-band Radio Implementation
  4.4 Multiple Data Rate D-QPSK Modem Implementation
    4.4.1 Data Rate Limitation in a D-QPSK Demodulator
    4.4.2 Proposed Multirate D-QPSK Modem
    4.4.3 Experimental Result of Multirate D-QPSK Modem
  4.5 Coherent QPSK Receiver Implementation
    4.5.1 Overview of Injection Locked Oscillator-based Receivers
    4.5.2 Injection Locking Principle

5 High Data Rate 8-PSK Demodulator Solutions
  5.1 Different Methods for 8-PSK Carrier Recovery
    5.1.1 Multi-channel Digital Demodulation
    5.1.2 Frequency Multiplication based Demodulation
    5.1.3 Proposed Selective Harmonic Generator-based 8-PSK Demodulation
  5.2 The Proposed 8-PSK Carrier Recovery
    5.2.1 Comparator-based Harmonic Generator
    5.2.2 Harmonic Generation from a Modulated Signal
    5.2.3 Static Frequency Divider
    5.2.4 Carrier Recovery Principle
  5.3 8-PSK Demodulator Implementation
  5.4 8-PSK Demodulator Measurement Result
    5.4.1 Harmonic Generator Test
    5.4.2 Phase Noise Measurement
    5.4.3 Demodulated 8-PSK Signals

6 Hardware Efficient 16-QAM Demodulator
  6.1 High Spectrum Efficiency Transmission System Overview
  6.2 Sampling Rate and Symbol Time Recovery
  6.3 The Proposed “Single Sample per Symbol” Baseband Receiver
    6.3.1 The Proposed Analog STR
    6.3.2 The FPGA-based Carrier Recovery (CR)
  6.4 16-QAM Baseband Receiver Measurement
    6.4.1 Analog STR Block Test
    6.4.2 FPGA based CR Block Test

7 Conclusion and Future Work
  7.1 Conclusion
  7.2 Future Work
    7.2.1 Improving the Energy Efficiency of 8-PSK and 16-QAM Modems
    7.2.2 Towards a Complete System Solution
    7.2.3 Analog Signal Processing in other Applications

Acknowledgments

Bibliography
Chapter 1

Introduction

1.1 The Expanding Digital Universe

We all live in a “digital universe”. The digital universe is made up of images and videos on mobile phones uploaded to YouTube, digital movies populating the pixels of our high-definition TVs, banking data swiped in an ATM, security footage at airports and major events such as the Olympic Games, subatomic collisions recorded by the Large Hadron Collider (LHC) at CERN (based on these records, the theories of the 2013 Nobel laureates Higgs and Englert on how particles acquire mass have been verified), transponders recording highway tolls, voice calls through digital phone lines and texting as a widespread means of communication.

International Data Corp (IDC) has carried out a digital universe study by measuring all digital data created and replicated, and has made predictions concerning the size of that universe up to the end of the decade. Citing their result, Fig. 1.1 shows that the digital universe accounted for 1,200 exabytes in 2010 alone. This universe is expanding dramatically and is expected to have a 50-fold growth within this decade. To explain such expansion, Eric Schmidt, former CEO of Google, announced at a conference in August 2010 that between the beginning of time and 2003, humanity generated roughly five exabytes of data, whereas we now produce the same volume of bits every two days. “The information explosion is so profoundly larger than anyone ever thought,” said Schmidt. “Five exabytes is more than 200,000 years of DVD-quality video” [1]. By 2020, the digital universe is expected to exceed 40,000 exabytes.

This digital universe exists in the worldwide digital communication networks. Only a fraction of the data is stored in a stationary manner and the majority of it is constantly transferred through the network. By 2020, only 13% (5,208 exabytes) of the digital universe could be stored in “the cloud”, which consists of the worldwide distributed computing networks that have the ability to run programs on many connected computers at the same time, including but not limited to the internet. On the other hand, the remaining 87% of the digital universe is transient: phone calls that are not recorded, digital TV images that are watched and not saved, packets temporarily stored in routers, digital surveillance images purged from memory when new images come in, and so on.

The throughput capacity of the digital communication network plays a critical role in supporting the expanding digital universe. It is therefore important to increase the capacity of digital communication networks, measured by their maximum data rate.

1.2 Enhancement of Current Digital Communication Networks using Next Generation Wireless Solutions

This massive transient data challenges the capacity of all digital communication networks: mobile networks, distributed computer networks, sensor networks, video broadcasting networks, etc. One solution for increasing the data rate of these networks is to enhance the current network structures with various high data rate wireless links. Several examples are given in the following sections.

1.2.1 Improvement of the Mobile Network Capacity using High Data Rate Radios

The Demand of Mobile Network Capacity

As the digital universe expands, the data traffic in mobile networks is growing exponentially. Mobile data traffic is expected to grow by a factor of 12 between 2012 and 2018 [2]. This increase puts a huge demand on the communication capacity and quality of service (QoS) of each node in the mobile networks. This is especially important for nodes in dense urban areas, where the required mobile capacity is estimated to be 25 Gbps/km², assuming an average bit rate of 1 Mbps per user during busy hours and a typical user density of 25,000 users/km² in dense urban regions [3].
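
The capacity estimate above follows directly from the two quoted figures. A minimal sketch of the arithmetic is given below; the function and variable names are illustrative only and do not come from [3].

```python
# Required area capacity in a dense urban region, using the two figures quoted in the text.
avg_rate_per_user_mbps = 1.0   # average bit rate per user during busy hours
user_density_per_km2 = 25_000  # typical dense-urban user density


def area_capacity_gbps_per_km2(rate_mbps: float, users_per_km2: float) -> float:
    """Return the aggregate traffic density in Gbps per square kilometre."""
    return rate_mbps * users_per_km2 / 1000.0  # 1 Gbps = 1000 Mbps


capacity = area_capacity_gbps_per_km2(avg_rate_per_user_mbps, user_density_per_km2)
print(f"{capacity:.0f} Gbps/km^2")
```

With the quoted inputs, this reproduces the 25 Gbps/km² figure.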

Approaches to Improve Network Capacity

The radio access network (RAN) is the network that connects user equipment (UE) to the core network (CN). The mobile base station (BS) is an important part of the RAN since it provides direct access for UEs. Another important component in the RAN is the mobile backhaul, which connects different BSs to form the RAN. However, the traditional RAN architecture is not able to efficiently support the flexibility required by capacity deployments, as described in [4]. One solution for meeting the capacity requirement is to deploy a Heterogeneous Network (HetNet), in which one or more miniaturized BSs (so-called small cells) are embedded in a conventional cellular network to enhance the capacity. This trend is expected to intensify during the next 3–5 years, paving the way for next generation (5G) ultra-dense RAN, where the number of BSs would become 10–100 times larger than today. A key enabler for this RAN expansion is a low-cost backhaul solution. Currently, microwave and fiber optical communication links are the main solutions for backhauling. The main limitation of a microwave solution is its limited capacity compared to the fiber solution. However, deployment flexibility and ease of relocation often make the microwave solution more attractive than fiber.

Evolutions Towards Heterogeneous Network (HetNet)

Fig. 1.2 illustrates a future mobile network in an urban scenario. The location of base station A has direct access to a fiber connection, thus fiber backhaul is used to support the mobile traffic through this BS. When the capacity requirement increases, base station B needs to be set up to share the traffic load. A fiber connection is not available at that location, but a microwave radio link can form a wireless backhaul to support the need for BS densification. Both base stations A and B are macro cell BSs that can cover a large area. However, the performance of these macro cell BSs degrades dramatically for indoor UEs or UEs blocked by buildings. For certain locations such as offices and bus stations, adding small cell BSs can be a good complement to the macro cells. These small cell BSs can be categorized by size into micro cells, pico cells or low-power remote radio units (RRUs). Though these small cell BSs cover a smaller area than macro cell BSs, they deliver high per-user capacity and consume less power than traditional BSs. Small cell BSs can use fiber backhaul if the location allows, but most small cell BSs will rely on microwave for their backhaul.

1.2.2 Capacity-enhanced Wireless Data Center

Capacity Limitation of Current Data Centers
Data centers play a key role in the expansion of “the cloud”, but the efficiency of the data center networks (DCNs) is limited. A typical data center architecture is shown in Fig. 1.3(a). The network servers are stacked up in server racks. On top of these racks, there are switches connecting all the servers in the rack. These racks are connected together by aggregate switches, and the core switches are used for aggregate-switch interconnections. This pyramid-like data center structure faces a bottleneck in scaling up its capacity, for three reasons:

Figure 1.3: Revolution from conventional data center to a wireless data center

- Data exchange between racks is aggregated in high-level switches, such as the aggregate switch and the core switch. Thus the core switch capacity becomes the bottleneck when upgrading the DCN exchange rate.

- Massive numbers of wires are needed to provide interconnection between servers and racks. The data center is full of these wires and switches, so the capacity is limited simply by the lack of space.

- The DCN interconnection is based on large, fairly expensive and power-hungry switches. Vast amounts of energy are used by these switches to route signals in and out of the servers and send them off via wire to other servers based on their electronic addresses. Increasing capacity in a pyramid-like DCN means increasing the number of servers and switches, which increases the overall power consumption of the DCN as well as its heat dissipation. At a certain point, the capacity would be limited by either the available power or the cooling ability.

Mitigating the Capacity Limitation with Wireless Data Centers

Wireless networking, as a complementary technology to Ethernet, has the flexibility and capability to provide feasible approaches to handle the problem. Wireless data center networking was first introduced in [6], in which data exchange is established by adding wireless links between servers to alleviate the congestion problem of hot racks and to minimize the maximum transmission time. A feasible architecture of a wireless data center is shown in Fig. 1.3(b) [7]. Servers are mounted vertically in cylindrical racks several tiers high, and one wedge-shaped slice of any tier represents a server. On each server module there are two wireless transceivers, one for intra-rack interconnection and the other for inter-rack interconnection. With the wireless links formed by these transceivers, data exchange and signal routing can occur directly at the server level. This eases the data rate aggregation problem and thus makes it possible to increase the DCN capacity.

1.2.3 Future Video Broadcasting Networks with High Data Rate Radios

Video Broadcasting Networks Need More Flexibility

Limited by the capacity of traditional wireless communication systems, video broadcasting networks are mainly built over cable and fiber networks. The deployment of video broadcasting equipment is thus limited to locations where fiber or cable infrastructure is available. However, this limitation is becoming less and less acceptable in current broadcasting applications. As an example, Fig. 1.4 illustrates a proposed infrastructure solution for video broadcasting when a music concert is held in a stadium. Video broadcasting at such events can be classified into several types, based on scale. The stage scale is shown in the top left of Fig. 1.4. In the center of the stadium there is a stage, and several large display screens may be set up around it. These screens are used to display live video streams from the field cameras to the audience. To prepare for this deployment, fibers need to be installed between the cameras and the display screens, which is time-consuming. This also limits the selection of camera locations to wherever the fiber can reach.

At the stadium scale, modern stadiums have built-in fiber or cable ring networks throughout the construction. This ring network is designed to support video data transmission for live video broadcasting, as the green line illustrates in Fig. 1.4. There are only a limited number of access points to this internal ring network, so the cameras can only be located around these access points. For some events, this constraint on camera positions limits the broadcasting results.
Figure 1.4: Video streaming for event broadcasting

At the scale of an entire broadcasting network, the video signals from the various cameras in the stadium are aggregated in the stadium ring network and then passed to a media center. The media center, normally located outside the stadium, is where the video signals are processed and sent to a nearby metro network. Via metro networks, the broadcasting signal is transferred to TV stations and the program can be broadcast to entire countries or even worldwide. It is possible that there are no available metro network access points near some stadiums. In that case, the broadcasting companies would have to pay a high price to set up a temporary link to access the metro network.

Improve Broadcasting Network Flexibility with Wireless Radio Links

Recently, several solutions have been proposed for wireless video streaming. Some solutions focus on reducing the data rate with video compression, while others propose using several low data rate wireless transceivers simultaneously to obtain enough capacity. The drawback of these solutions is that they introduce extra latency during transmission, so that the video is no longer synchronized with the action on the stage. A microwave and millimeter wave-based radio link for video streaming has been demonstrated in [8]. This shows that it is possible to transfer uncompressed video using a low latency modulation format over the wide bandwidth available in these bands.
Such technology can be applied at different scales in the broadcasting network: at the stage scale, a wireless link can be used for video streaming between cameras and the display screen; at the stadium scale, wireless radios can provide access for cameras at a distance to the stadium ring network; at the scale of an entire broadcasting network, a media center can obtain access to the metro network over a pair of microwave links.

1.3 Thesis Motivation

1.3.1 Wireless Transceiver Requirements for Various Applications

As mentioned in previous sections, high data rate wireless communication systems are needed in various applications. The nature of these applications should be taken into account when designing these wireless systems. In particular, several key factors should be considered. Table 1.1 summarizes eight design parameters of a transceiver system for three application examples. Several transceiver design constraints are illustrated in Fig. 1.5. Each axis in the figure represents one design constraint, and there is a point marked for each application on each axis. By connecting the points of an application on the different axes, one obtains a closed curve that reflects the features of that application. Three application curves are drawn in the figure; it is clear that different application areas prioritize these design parameters in different ways.

<table>
<thead>
<tr>
<th>Constraint</th>
<th>Microwave Backhaul</th>
<th>Wireless DCN</th>
<th>Video Streaming</th>
</tr>
</thead>
<tbody>
<tr>
<td>Hop Length</td>
<td>1-3 km</td>
<td>0.1-3 m</td>
<td>500-800 m</td>
</tr>
<tr>
<td>Bandwidth</td>
<td>5-10 GHz</td>
<td>10-50 GHz</td>
<td>3-8 GHz</td>
</tr>
<tr>
<td>Latency</td>
<td>5 µs</td>
<td>≤ 0.5 ms</td>
<td>≤ 10 ms</td>
</tr>
<tr>
<td>Size</td>
<td>≤ 0.1 m³</td>
<td>≤ 5 cm³</td>
<td>≤ 0.1 m³</td>
</tr>
<tr>
<td>Data Rate</td>
<td>≤ 50 Gbps</td>
<td>≤ 100 Gbps</td>
<td>≤ 10 Gbps</td>
</tr>
<tr>
<td>Noise Tolerance</td>
<td>High</td>
<td>Low</td>
<td>Moderate</td>
</tr>
<tr>
<td>Cost</td>
<td>Moderate</td>
<td>Low</td>
<td>Moderate</td>
</tr>
<tr>
<td>Power</td>
<td>≤ 50 W</td>
<td>≤ 1 mW</td>
<td>≤ 100 W</td>
</tr>
</tbody>
</table>

Table 1.1: Constraints for different wireless link applications

Figure 1.5: Wireless transceiver constraints for different applications
For microwave-based mobile backhaul applications, represented by the red curve, a long hop length is required together with a strict latency constraint. The link must also operate in an outdoor environment and be able to tolerate different climate conditions. To fulfill these requirements, compromises can be made in transceiver size and power consumption. Cost has the lowest priority compared with the other aspects.

For wireless DCN applications, a massive number of transceivers are required to equip a DCN. Thus the transceivers must be low-cost, compact and operate within the power budget. Fiber-comparable capacity is required, but the hop length is short and a wide operational bandwidth may be available.

Compared with the other applications mentioned above, the transceivers for video streaming/broadcasting have moderate requirements in all design parameters. This implies that the design for such applications emphasizes a balance among these design parameters.
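
One way to compare the three applications is to ask what spectral efficiency would be needed to carry the highest data rate listed in Table 1.1 within the listed bandwidth range. The sketch below is only an illustration derived from the table values: the dictionary layout and function name are assumptions, and the table gives the data rates as upper bounds, so the results are worst-case figures rather than design targets stated in the thesis.

```python
# Upper data rate (Gbps) and bandwidth range (GHz) per application, copied from Table 1.1.
applications = {
    "Microwave Backhaul": {"rate_gbps": 50.0, "bw_ghz": (5.0, 10.0)},
    "Wireless DCN": {"rate_gbps": 100.0, "bw_ghz": (10.0, 50.0)},
    "Video Streaming": {"rate_gbps": 10.0, "bw_ghz": (3.0, 8.0)},
}


def spectral_efficiency_range(rate_gbps, bw_ghz):
    """Return (min, max) bit/s/Hz needed to carry rate_gbps within the bandwidth range."""
    bw_min, bw_max = bw_ghz
    # The widest bandwidth gives the lowest required efficiency, and vice versa.
    return rate_gbps / bw_max, rate_gbps / bw_min


efficiency = {name: spectral_efficiency_range(**p) for name, p in applications.items()}
for name, (low, high) in efficiency.items():
    print(f"{name}: {low:.2f} to {high:.2f} bit/s/Hz")
```

Under these assumptions, the backhaul case needs the highest spectral efficiency at its narrowest bandwidth, while the wide bandwidth available to a wireless DCN allows a simpler modulation format for a comparable data rate.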

1.3.2 Motivation: Advanced Modem Solutions for Future Communication Systems

As mentioned above, high data rate communication systems need to be customized in order to meet the different constraints and requirements of each application. However, it is not straightforward to upgrade a traditional wireless communication system to a high data rate. The reason is that the components available on the market have limited performance, which becomes the bottleneck for capacity improvement.

To achieve the high capacity required by the applications mentioned above, structural innovation is needed at the system level. In this thesis, the focus is on designing and implementing baseband modulators and demodulators (modems). The modem is an important part of a communication system, since it provides the conversion between binary data and analog waveforms. The analog waveforms are physical signals carrying information, which are wirelessly transferred between transceivers. An important motivation of this thesis has been to look for modem design solutions that are useful for various applications. A number of different modems with various features therefore need to be studied.

1.4 Thesis Contribution

This thesis consists of eight publications that describe modulator and/or demodulator implementations with various modulation formats. These publications are summarized in Fig. 1.6, where they are sorted by modulation format and maximum data rate. On the left of the figure is the modem design with the on-off keying (OOK) modulation format, which has the lowest spectrum efficiency compared to the other modulations. Paper [A] proposed a novel circuit structure for designing an OOK modulator in a bipolar process, which supports arbitrary data rates up to 14 Gbps. This result was state-of-the-art when the paper was published. An MMIC-based OOK demodulator is also presented in [H].

[Figure: papers [A] to [H] arranged by modulation format and capacity (Gbps)]

Figure 1.6: Several high data rate modem designs discussed in this thesis

However, the spectrum efficiency of OOK modulation is so low that it is not practical for telecommunication applications. In the center of Fig. 1.6, several designs are presented that use differential quadrature phase shift keying (D-QPSK) modulation, which gives higher spectrum efficiency than OOK. An FPGA-based D-QPSK modem is presented in [G], which supports a fixed data rate of 2.5 Gbps. An E-band radio link based on a fixed data rate 5 Gbps D-QPSK modem is described in [F]. These D-QPSK modems can only operate at a single data rate, a limitation imposed by the structure of their non-coherent receivers. This limitation is overcome in [B], where a novel modulator encoding scheme is proposed. With this improvement, a millimeter-wave radio link can support multi-rate D-QPSK transmission without any additional modification of the receiver hardware.

Non-coherent detection is easier to implement, but it suffers from a 3 dB penalty in signal-to-noise ratio (SNR) compared to a coherent receiver operating at the same BER threshold. In paper [D], the theory of injection locking is studied, and a coherent quadrature phase shift keying (QPSK) receiver based on the injection locking principle is presented. An injection locked VCO is used to recover the QPSK carrier signal, and it can work at any data rate up to at least 12 Gbps.

Improving spectrum efficiency improves the capacity without requiring extra bandwidth. It is thus important to explore the possibility of implementing a modulation with higher spectrum efficiency than QPSK. A quadrature amplitude modulation (QAM) receiver is presented in paper [C], which can receive a 5 Gbps 16-QAM signal. The receiver uses a traditional analog-to-digital converter (ADC) and digital signal processor (DSP) platform for carrier recovery. This work differs from other reported work, where the ADC sampling rate is higher than the symbol rate. With a novel analog symbol time recovery circuit, the ADC in our receiver works at exactly the symbol rate. The maximum receiver capacity is usually limited by the ADC and/or DSP, so a solution that requires a lower sampling rate relative to the symbol rate increases the maximum capacity of a given ADC and DSP based receiver.
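
To make the sampling-rate argument concrete, the sketch below computes the symbol rate of the 5 Gbps 16-QAM signal and the ADC sampling rate at one sample per symbol. The comparison case of two samples per symbol is an assumed oversampling factor chosen only for illustration; it is not a value taken from the thesis or from the other reported work.

```python
import math


def symbol_rate_gbaud(bit_rate_gbps: float, constellation_size: int) -> float:
    """Symbol rate of an uncoded M-ary modulation: bit rate / log2(M)."""
    bits_per_symbol = math.log2(constellation_size)
    return bit_rate_gbps / bits_per_symbol


def adc_rate_gsps(symbol_rate: float, samples_per_symbol: float) -> float:
    """ADC sampling rate in GS/s for a given number of samples per symbol."""
    return symbol_rate * samples_per_symbol


baud = symbol_rate_gbaud(5.0, 16)  # 5 Gbps 16-QAM carries 4 bits per symbol
one_sps = adc_rate_gsps(baud, 1)   # single sample per symbol, as in the text
two_sps = adc_rate_gsps(baud, 2)   # assumed oversampling factor, for comparison only

print(f"symbol rate: {baud} GBaud")
print(f"ADC rate at 1 sample/symbol: {one_sps} GS/s")
print(f"ADC rate at 2 samples/symbol: {two_sps} GS/s")
```

For a fixed ADC sampling rate, halving the number of samples per symbol doubles the symbol rate, and hence the data rate, that the same converter can support.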

1.5 Thesis Outline

The thesis is organized as follows. The need for increasing capacity in current digital communication networks is addressed in chapter 1. Several capacity enhancement solutions using next generation wireless transceivers are proposed, which motivates the search for high data rate modem solutions. In chapter 2, the theoretical background is described, including front-end architectures, different digital modulation schemes and their figures of merit (FoM), such as spectrum efficiency. In chapter 3, a series of different monolithic microwave integrated circuit (MMIC) based OOK modulator structures are reviewed and a novel OOK modulator structure is then introduced. An OOK modulator is implemented with this structure and its test results are presented. An OOK demodulator is also discussed in this chapter. In chapter 4, both coherent and non-coherent QPSK modems are presented and discussed. First, 2.5 Gbps and 5 Gbps fixed data rate FPGA-based D-QPSK E-band links are presented. Then a multi-rate D-QPSK radio link using a novel encoding scheme is described. For coherent QPSK, an injection locking based QPSK receiver is presented which can support an arbitrary data rate. In chapter 5, a 15 Gbps 8-PSK receiver is described, where a novel carrier recovery block design is discussed in detail. In chapter 6, a 5 Gbps 16-QAM receiver design is demonstrated, where a novel analog symbol time recovery structure is described in detail. Finally, chapter 7 summarizes the thesis and discusses future work.
Theoretical Background

In this chapter, the theoretical background of this thesis is described. Firstly, an overview of the structure of a single carrier wireless communication system is given. In this overview, the modem and frequency conversion parts are described in detail, which are the focus of this thesis. Secondly, different frequency conversion structures are presented, and their interfaces with the modem part are described. Thirdly, different modulation schemes are reviewed. Finally, the figures of merit for modem designs are introduced.

2.1 Single Carrier Wireless Data Transmission

A digital communication system transmits and receives binary data wirelessly over frequency channels. Single carrier means that there is only one carrier frequency in each channel. A single carrier wireless transceiver normally uses one frequency channel for transmitting and another channel for receiving, so bidirectional communication can take place simultaneously. This is known as frequency division duplexing (FDD).

A typical block diagram of a single carrier transceiver is shown in Fig. 2.1. For different applications, wireless transceivers may take data from various data interfaces, such as optical fiber, Ethernet, or digital video interface (DVI). At the bottom of Fig. 2.1, a data interface block provides the connectivity and ensures the data traffic transfer. To receive and transmit data over channels at microwave frequency bands, several blocks operating in the digital and/or analog domain are required. The blocks on the left side of Fig. 2.1 form a transmitter and the blocks on the right side form a receiver.

On the transmitter side, source coding is first applied to the input data stream. The purpose of source coding is to reduce the data rate of the data stream by compressing it, which increases the capacity of the transmission system but makes it more vulnerable to noise and interference. The binary stream then passes through the channel encoder, where redundancy bits are added for error correction in the receiver. Source decoding and channel decoding are used correspondingly to recover the data stream on the receiver side. Various source codes and channel codes can be used depending on the system requirements. However, the transmission systems discussed in this thesis contain neither source coding nor channel coding: the data interface blocks are connected directly to the modulator and the demodulator. Such systems can be optimized for capacity or noise tolerance by adding source coding or channel coding, respectively.

This thesis focuses on the baseband modulator and demodulator blocks, as well as their interfaces with the front-end blocks. As shown in Fig. 2.1, the baseband modulator and demodulator are mixed-signal blocks: they have interfaces to both digital and analog signals. The modulator takes groups of digital bits as input and generates analog waveforms that represent them, forming the so-called modulated signal. On the receiver side, a demodulator recovers the transmitted bits by classifying the received signal.

The modem may not work directly at the radio frequency (RF), thus front-end blocks are used for frequency conversion. RF front-end is a generic term for all the circuitry between the antenna and intermediate frequency (IF)/baseband. The architectures of these front-end blocks define the interfaces to the baseband, thus the baseband modem structure may change depending on the choice of front-end.

A transmitter front-end takes the analog waveform generated by the modulator and converts it up to an RF signal. A receiver front-end converts the RF signal down to a lower IF, which can be processed by the demodulator. The transmitter and receiver front-ends can each use a separate antenna or share a single antenna. In the latter case, a duplexer block is needed in FDD systems.

2.2 Front-end Architecture

There are two types of front-end architectures: the homodyne structure and the heterodyne structure. The homodyne structure provides direct conversion between the baseband signal and the RF signal; the heterodyne structure provides an indirect conversion via an IF. Two examples of front-end receivers are given to illustrate different front-end architectures.

2.2.1 Heterodyne Receiver

The block diagram of a heterodyne receiver is shown in Fig. 2.2(a). Its main feature is the frequency conversion from RF to IF. The RF signal is first filtered by a band pass filter (BPF), which removes unwanted out-of-band signals and reduces the noise floor. A low noise amplifier (LNA) then amplifies the signal before it is input to the down-converting mixer. The mixer uses the local oscillator (LO) signal as a reference to convert the RF signal down to an IF. For a high frequency front-end, it is not easy to generate a low phase noise frequency reference directly at high frequency. Thus the LO source is normally obtained by frequency multiplication of a low phase noise reference signal at low frequency, and a phase locked loop (PLL) is normally used to generate a good quality frequency reference. Another BPF, centered at the wanted IF, is used at the output of the mixer. The heterodyne architecture is often used in receiver front-ends. However, the selection of the IF and LO frequencies requires careful consideration. In Fig. 2.2(b), the frequency translations in a heterodyne receiver are illustrated. This figure reveals two typical problems: the creation of an image signal, and the leakage of the LO signal.

(a) Block diagram of a heterodyne receiver

(b) Frequency translation in a heterodyne receiver

Figure 2.2: Architecture of a heterodyne receiver

- Problem 1: Image Signal
  Given an RF signal at $f_{RF}$ and an IF set at $f_{IF}$, the LO shall be set to $f_{LO} = f_{RF} - f_{IF}$. The down-converting mixer takes the RF and LO signals as inputs and generates mixing products at the frequencies $\pm mf_{RF} \pm nf_{LO}$ ($m$ and $n$ are integers). $BPF_2$ passes only the signal at $f_{IF} = f_{RF} - f_{LO}$ and filters out the products at other frequencies. Assuming the desired signal is located in the upper sideband ($f_{RF} > f_{LO}$), any signal located at the frequency $f_{IM} = 2f_{LO} - f_{RF} = f_{LO} - f_{IF}$ is an image signal. In Fig. 2.2(b), the dotted shape represents an image signal and the striped shape represents the wanted RF signal. These signals are located symmetrically on either side of $f_{LO}$. If $BPF_1$ cannot filter out the image signal, it is also down-converted by the mixer to the IF, since $f_{IF} = f_{LO} - f_{IM}$, and passes through $BPF_2$ as interference to the desired signal.

- Problem 2: LO-to-IF Leakage
  As shown in Fig. 2.2(a), there is a low-frequency PLL operating at $f_{LO}/N$. The LO of the front-end is obtained from this PLL using a frequency multiplier. In order to drive the frequency multiplier, the PLL shall deliver a relatively high output power. It is important to ensure that $f_{LO}/N$ and $f_{IF}$ are not too close; otherwise, the PLL output can become a strong interferer to the received IF signal.
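
The frequency relations behind both problems can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `heterodyne_plan` and the numbers (73.5 GHz RF, 3.5 GHz IF, multiplication factor 8) are hypothetical and are not taken from the designs in this thesis.

```python
def heterodyne_plan(f_rf_ghz, f_if_ghz, n_mult):
    """Return (f_LO, f_IM, f_PLL) in GHz for a wanted signal in the upper sideband."""
    f_lo = f_rf_ghz - f_if_ghz   # f_LO = f_RF - f_IF
    f_im = 2 * f_lo - f_rf_ghz   # f_IM = 2 f_LO - f_RF = f_LO - f_IF
    f_pll = f_lo / n_mult        # the PLL runs at f_LO / N
    return f_lo, f_im, f_pll


# Hypothetical example values, chosen only to make the arithmetic easy to follow.
f_rf, f_if, n = 73.5, 3.5, 8
f_lo, f_im, f_pll = heterodyne_plan(f_rf, f_if, n)

print(f"LO    = {f_lo} GHz")
print(f"image = {f_im} GHz (down-converts to {f_lo - f_im} GHz, the same IF)")
print(f"PLL   = {f_pll} GHz, spacing to the IF = {abs(f_pll - f_if)} GHz")
```

The image lands at the same IF as the wanted signal, which is why it must be rejected by $BPF_1$ before the mixer, and the last line shows the spacing between the PLL output and the IF that Problem 2 is concerned with.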

### 2.2.2 Homodyne Receiver

![Diagram of a homodyne I/Q receiver](image)

(a) Block diagram of a homodyne I/Q receiver

![Frequency translation in a homodyne receiver](image)

(b) Frequency translation in a homodyne receiver

Figure 2.3: Architecture of a homodyne receiver

The block diagram of a homodyne receiver is shown in Fig. 2.3(a). The RF signal is filtered by a BPF and then amplified by an LNA. In this kind of receiver front-end, the RF signal is down-converted directly to baseband (zero IF) in one step by mixing it with the LO signal. Thus the LO frequency $f_{LO}$ is equal to the RF signal frequency $f_{RF}$. The baseband signal is then filtered by a low pass filter (LPF).

For quadrature phase-modulated signals, the down-conversion must provide quadrature output in order to avoid loss of information. Two mixers are used to convert in-phase (I) component and quadrature (Q) component of the baseband signal.

The main advantage of a homodyne receiver is that it does not have the image problem of a heterodyne receiver, and the frequency translation is straightforward, as shown in Fig. 2.3(b). However, from an implementation point of view, there are challenges in designing such a receiver, notably LO leakage and baseband coupling.

- **Problem of LO leakage**
  If the LO signal leaks through the mixer, the leakage mixes with the LO itself and produces a severe DC offset, which can saturate the later stages. The flicker noise of the mixer is also critical, since it may appear directly at the baseband output.

- **Problem of Baseband Signal Coupling**
  To prevent the baseband DC offset from saturating the later baseband stages, a DC block can be used to AC-couple the baseband output. However, such a DC block also removes the close-to-DC spectral components of the baseband signal. This effect limits the receiver’s ability to receive baseband signals with rich low-frequency content.

### 2.2.3 Interface between Front-end and Baseband

In a heterodyne architecture receiver, the baseband block shall provide an IF interface for a front-end connection. This implies that the baseband modem shall include an IF-to-baseband conversion. In a homodyne architecture receiver, the baseband modules are connected with front-end modules via two identical interfaces which supply the I and Q channels, respectively.

### 2.3 Baseband Architecture and Modulation Techniques

#### 2.3.1 Different Formats of Baseband Signals

![Signal translation in a baseband modem](image)

Figure 2.4: Signal translation in a baseband modem
Baseband signals may be represented using different formats. Three commonly used signal formats, binary data, I/Q vector and IF signal, are illustrated in Fig. 2.4. Baseband modems translate signals between these formats. Here, the baseband modulation process is used as an example to show how the different formats are used. A binary data stream is input to the modulator. For an M-order digital modulation, M bits \([b_0, b_1, \ldots b_{M-1}]\) are grouped together to form a symbol. The modulator then maps this symbol onto a complex vector \(I_k + jQ_k\) from a set of \(2^M\) vectors in the I/Q plane. It is this mapping from M-bit data to a vector that defines the digital modulation scheme. For a modulator with an I/Q interface, the in-phase (I) and quadrature (Q) components are the output signals. For a modulator with an IF interface, the modulator makes another translation from the I/Q vector to an IF signal \(S_{IF}(t)\).

\[
I_k + jQ_k \iff S_{IF}(t) = A_k \sin(2\pi f_{IF} t + \varphi_k) \mid t \in [kT_{sym}, (k + 1)T_{sym}] \quad (2.1)
\]

The duration of such an IF signal is equal to the symbol time \(T_{sym}\) (the time that one symbol occupies in the binary stream), and its initial phase is equal to the phase \(\varphi_k\) of the I/Q vector.
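
As a minimal sketch of the translation in Eq. (2.1), the snippet below converts one I/Q vector into a sampled IF segment of one symbol duration. The function name `if_segment`, the sampling density and the example values (4 GHz IF, 1 ns symbol time) are assumptions made for illustration; they are chosen so that the IF completes an integer number of cycles per symbol, which makes the phase at the start of every symbol equal to \(\varphi_k\).

```python
import cmath
import math


def if_segment(i_k, q_k, f_if_hz, t_sym_s, k, samples_per_symbol=32):
    """Sample S_IF(t) = A_k sin(2 pi f_IF t + phi_k) over [k T_sym, (k+1) T_sym)."""
    # A_k and phi_k are the magnitude and phase of the vector I_k + jQ_k.
    amplitude, phase = cmath.polar(complex(i_k, q_k))
    segment = []
    for n in range(samples_per_symbol):
        t = (k + n / samples_per_symbol) * t_sym_s
        segment.append(amplitude * math.sin(2 * math.pi * f_if_hz * t + phase))
    return segment


# Hypothetical example: the vector 1 + j1 (A_k = sqrt(2), phi_k = pi/4).
segment = if_segment(1.0, 1.0, f_if_hz=4e9, t_sym_s=1e-9, k=0)
print(len(segment), segment[0])
```

The first sample equals \(A_k \sin(\varphi_k)\), which for this vector is the Q component, and the peak value of the segment never exceeds \(A_k\).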

### 2.3.2 Digital Modulation Schemes

As mentioned above, a digital modulation scheme defines a mapping between M-bit data and a vector in an I/Q-plane:

\[
[b_0, b_1, \ldots b_{M-1}] \iff I_k + jQ_k = A_k e^{j\varphi_k} \quad (2.2)
\]

Several commonly used digital modulation formats are illustrated in Fig. 2.5:

![Digital Modulation Schemes](image)

**Figure 2.5: Digital Modulation Schemes**

OOK modulation is shown in Fig. 2.5(a). There are two possible constellation points, thus one symbol contains only one bit. OOK modulation is an amplitude only modulation; it can be modulated and demodulated without quadrature interface.
Three PSK modulation schemes are presented in Fig. 2.5(b), (c) and (d). For BPSK, each symbol contains only one bit, and the two symbols have the same amplitude but are 180 degrees out of phase. For QPSK, each symbol contains two bits; for 8-PSK, each symbol contains three bits. All these PSK-modulated signals have a constant amplitude (envelope).

A 16-QAM constellation is shown in Fig. 2.5(e), where 4-bit symbols are mapped into 16 different vectors; such a constellation has vectors with different amplitudes and/or phases.
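
The difference between the constant-envelope PSK constellations and 16-QAM can be illustrated numerically. The sketch below generates the sets of I/Q vectors and counts their distinct amplitudes. It makes two assumptions that are not stated in the text: the 16-QAM points are placed on a square grid with levels ±1 and ±3, and no particular bit-to-vector labelling is implied.

```python
import cmath
import math


def psk_constellation(bits_per_symbol):
    """Return the 2^M unit-amplitude vectors of an M-bit PSK constellation."""
    n_points = 2 ** bits_per_symbol
    return [cmath.exp(2j * math.pi * k / n_points) for k in range(n_points)]


def qam16_constellation():
    """Return 16 vectors on a square grid (assumed levels: -3, -1, 1, 3)."""
    levels = (-3, -1, 1, 3)
    return [complex(i, q) for i in levels for q in levels]


def distinct_amplitudes(points):
    """Return the sorted set of vector magnitudes, rounded to suppress float noise."""
    return sorted({round(abs(p), 9) for p in points})


for name, points in [("BPSK", psk_constellation(1)),
                     ("QPSK", psk_constellation(2)),
                     ("8-PSK", psk_constellation(3)),
                     ("16-QAM", qam16_constellation())]:
    print(name, len(points), "points, amplitudes:", distinct_amplitudes(points))
```

Each PSK constellation has a single amplitude, whereas the assumed 16-QAM grid has three, which is why 16-QAM cannot be treated as a constant-envelope signal.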

2.4 Figure of Merit (FoM)

Five figures of merit related to the modem are introduced in this section. Choosing a modulation scheme is an important task in wireless system design. To compare different modulation schemes, spectrum efficiency and bit error probability are important figures that describe the characteristic of a modulation scheme from a theoretical point of view.

To examine the performance of a modem implementation, error vector magnitude, sensitivity and energy efficiency are the three useful figures of merit.

2.4.1 Spectrum Efficiency and Bit Error Probabilities

<table>
<thead>
<tr>
<th>Modulation Scheme</th>
<th>Data Rate (BW=5 GHz)</th>
<th>Spectrum Efficiency $\eta$</th>
<th>Bit Error Probabilities</th>
</tr>
</thead>
<tbody>
<tr>
<td>OOK</td>
<td>2.5 Gbps</td>
<td>0.5 bps/Hz</td>
<td>$Q(\sqrt{E_b/N_0})$</td>
</tr>
<tr>
<td>BPSK</td>
<td>2.5 Gbps</td>
<td>0.5 bps/Hz</td>
<td>$Q(\sqrt{2E_b/N_0})$</td>
</tr>
<tr>
<td>D-BPSK</td>
<td>2.5 Gbps</td>
<td>0.5 bps/Hz</td>
<td>$0.5e^{-E_b/N_0}$</td>
</tr>
<tr>
<td>QPSK</td>
<td>5 Gbps</td>
<td>1 bps/Hz</td>
<td>$Q(\sqrt{2E_b/N_0})$</td>
</tr>
<tr>
<td>D-QPSK</td>
<td>5 Gbps</td>
<td>1 bps/Hz</td>
<td>$0.5e^{-E_b/N_0}$</td>
</tr>
<tr>
<td>8-PSK</td>
<td>7.5 Gbps</td>
<td>1.5 bps/Hz</td>
<td>$\frac{3}{5}Q(\sqrt{6E_b/N_0}\sin(\frac{\pi}{8}))$</td>
</tr>
<tr>
<td>16-QAM</td>
<td>10 Gbps</td>
<td>2 bps/Hz</td>
<td>0.375$Q(\sqrt{2E_b/5N_0})$</td>
</tr>
</tbody>
</table>

Table 2.1: Spectrum Efficiency and Bit Error Probability for common modulation schemes

Spectrum efficiency is defined as the ratio (data bits transferred per second)/(carrier bandwidth in Hertz), so the unit is bps/Hz. The theoretical spectrum efficiencies $\eta$ of several commonly used modulation schemes are listed in Table 2.1. As an example, the maximum data rate that each modulation scheme can support, assuming that 5 GHz of RF bandwidth is available, is also listed in the table. It can be seen that a modulation scheme with higher spectrum efficiency $\eta$ can support a higher data rate.

It is important to be aware that, under the same conditions, a modulation scheme with high spectrum efficiency can support a higher data rate but is also more vulnerable to noise and interference.
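
The data rate column of Table 2.1 follows directly from the definition of spectrum efficiency. A short sketch (the dictionary and function names are illustrative, the efficiency values are those of Table 2.1):

```python
# Spectrum efficiency in bps/Hz, copied from Table 2.1.
SPECTRUM_EFFICIENCY = {
    "OOK": 0.5, "BPSK": 0.5, "D-BPSK": 0.5,
    "QPSK": 1.0, "D-QPSK": 1.0, "8-PSK": 1.5, "16-QAM": 2.0,
}


def max_data_rate_gbps(scheme, bandwidth_ghz):
    """Maximum data rate = spectrum efficiency x available RF bandwidth."""
    return SPECTRUM_EFFICIENCY[scheme] * bandwidth_ghz


for scheme in SPECTRUM_EFFICIENCY:
    print(f"{scheme:7s} {max_data_rate_gbps(scheme, 5.0)} Gbps in 5 GHz")
```

Changing the bandwidth argument shows how the same modulation schemes scale to other channel allocations.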

Bit error probability gives a numerical description of the performance degradation in an additive white Gaussian noise (AWGN) environment. The bit error probabilities of various modulation schemes are listed in Table 2.1, where

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-y^2/2} \, dy$$  \hspace{1cm} (2.3)

and $E_b/N_0$ is a normalized signal-to-noise ratio (SNR) measure, also known as the “SNR per bit”. $E_b$ is the signal energy received per bit and $N_0$ is the spectral density of AWGN. Assuming the bandwidth is $W$, the SNR can be written as:

$$SNR = \frac{P_{Signal}}{P_{Noise}} = \frac{E_b \cdot R_b}{N_0 W} = \frac{R_b E_b}{W N_0} = \eta E_b/N_0$$  \hspace{1cm} (2.4)

where $\eta$ is the spectrum efficiency and $R_b$ is the data rate.

The theoretical BER curves of the different schemes are plotted in Fig. 2.6. The best-performing schemes are BPSK and QPSK with coherent detection, followed by D-BPSK and D-QPSK, 16-QAM, and non-coherent OOK [9].
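
The expressions in Table 2.1 and Eqs. (2.3) and (2.4) can be evaluated numerically. The sketch below is a minimal illustration using only three of the table entries; the function names and the bisection search for the required $E_b/N_0$ are assumptions of this example, not part of the thesis work.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) of Eq. (2.3), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


# Bit error probability as a function of linear Eb/N0, copied from Table 2.1.
BER = {
    "OOK": lambda g: q_func(math.sqrt(g)),
    "BPSK/QPSK": lambda g: q_func(math.sqrt(2.0 * g)),
    "D-BPSK/D-QPSK": lambda g: 0.5 * math.exp(-g),
}


def required_ebn0_db(scheme, target_ber, lo_db=0.0, hi_db=30.0):
    """Bisection search for the Eb/N0 (in dB) at which the BER equals target_ber."""
    ber = BER[scheme]
    for _ in range(60):
        mid_db = 0.5 * (lo_db + hi_db)
        if ber(10.0 ** (mid_db / 10.0)) > target_ber:
            lo_db = mid_db   # BER still too high: more Eb/N0 is needed
        else:
            hi_db = mid_db
    return 0.5 * (lo_db + hi_db)


def snr_db(ebn0_db, eta):
    """Eq. (2.4) in decibels: SNR = eta * Eb/N0."""
    return ebn0_db + 10.0 * math.log10(eta)


for name in BER:
    print(name, round(required_ebn0_db(name, 1e-8), 2), "dB for BER = 1e-8")
```

The search relies on the BER decreasing monotonically with $E_b/N_0$, which holds for all three expressions used here.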

### 2.4.2 Error Vector Magnitude (EVM) of a Received Constellation

The error vector magnitude or EVM is a measure used to quantify the performance of the modem in a wireless communication system. With an ideal modem, the received constellation points are located at the ideal coordinates. However, various imperfections in the implementation (e.g., carrier leakage, low image rejection ratio, phase noise, nonlinearity) cause the actual constellation points to deviate from the ideal locations.

EVM represents how much the constellation points of the actual modulated signal deviate from their ideal positions. An error vector is a vector in the I/Q plane between the ideal constellation point and the actual point received by the receiver. The EVM is derived from the average power of the error vector, normalized to the signal power.

When the constellation points have been normalized, EVM is defined as the root-mean-square (RMS) value of the difference between a collection of measured constellation points and the ideal points. These differences are averaged over a given, typically large, number of symbols and are often expressed as a percentage relative to the average power per symbol of the constellation. It is defined as:

\[
EVM_{RMS} = \sqrt{\frac{\frac{1}{N} \sum_{k=1}^{N} |S_{ideal,k} - S_{meas,k}|^2}{\frac{1}{N} \sum_{k=1}^{N} |S_{ideal,k}|^2}}
\]

(2.5)

where \( S_{meas,k} \) is the normalized \( k^{th} \) symbol in a stream of measured symbols, \( S_{ideal,k} \) is the ideal normalized constellation point for the \( k^{th} \) symbol, and \( N \) is the number of symbols in the measured stream.
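
A minimal sketch of the EVM calculation in Eq. (2.5) is given below. The function name `evm_rms` and the test data (four QPSK points with a hypothetical constant offset of 0.1 on the I component) are illustrative assumptions.

```python
import math


def evm_rms(ideal, measured):
    """RMS error vector magnitude: error power normalized to ideal signal power."""
    if not ideal or len(ideal) != len(measured):
        raise ValueError("ideal and measured must be non-empty and of equal length")
    error_power = sum(abs(s_i - s_m) ** 2 for s_i, s_m in zip(ideal, measured))
    signal_power = sum(abs(s_i) ** 2 for s_i in ideal)
    # The 1/N factors of numerator and denominator cancel.
    return math.sqrt(error_power / signal_power)


# Hypothetical data: ideal QPSK points and measurements shifted by 0.1 along I.
ideal_points = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
measured_points = [p + 0.1 for p in ideal_points]

evm = evm_rms(ideal_points, measured_points)
print(f"EVM = {100 * evm:.2f} %")
```

In practice the ideal point for each measured symbol is the decided (or known transmitted) constellation point, and a much larger number of symbols is averaged.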
2.4.3 Input Sensitivity of Demodulator

The interface between baseband receiver and front-end receiver shall be specified in order to optimize system performance. The demodulator sensitivity is defined as the minimum power level of input signal required to produce a specified output signal having a specified signal-to-noise ratio:

\[
S_i = k_B(T_a + T_{RX})B\frac{S_o}{N_o}
\]  

(2.6)

where

\(S_i\) = the sensitivity in [W]

\(k_B\) = Boltzmann’s constant

\(T_a\) = equivalent noise temperature in [K] of the source (e.g. antenna) at the input of the receiver

\(T_{RX}\) = equivalent noise temperature in [K] of the receiver referred to the input of the receiver

\(B\) = the bandwidth in [Hz]

\(\frac{S_o}{N_o}\) = required signal-to-noise ratio at output
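
Eq. (2.6) is usually evaluated in dBm. The sketch below does this for a set of hypothetical numbers (290 K source temperature, 870 K receiver noise temperature, 5 GHz bandwidth and a required output SNR of 12 dB); none of these values are measured results from this thesis.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant in J/K


def sensitivity_dbm(t_a_kelvin, t_rx_kelvin, bandwidth_hz, snr_out_db):
    """Evaluate S_i = k_B (T_a + T_RX) B (S_o/N_o) and return it in dBm."""
    snr_linear = 10.0 ** (snr_out_db / 10.0)
    s_i_watt = K_B * (t_a_kelvin + t_rx_kelvin) * bandwidth_hz * snr_linear
    return 10.0 * math.log10(s_i_watt / 1e-3)


# Hypothetical example values.
print(f"{sensitivity_dbm(290.0, 870.0, 5e9, 12.0):.1f} dBm")
```

As a sanity check, a 290 K source with a noiseless receiver, 1 Hz bandwidth and 0 dB required SNR gives the familiar thermal noise density of about -174 dBm/Hz.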

2.4.4 Energy Efficiency of the Modem

The spectrum efficiency is the traditional FoM for measuring the efficiency of communication systems. It measures how efficiently a limited frequency resource (spectrum) is utilized; however, it fails to give any insight into how efficiently energy is utilized. Another FoM, energy efficiency, expressed in bits-per-Joule (bits/J), provides this insight and was introduced in [10].

In the context of high data rate communication, it is more appropriate to define energy efficiency as the ratio (data bits transferred per second)/(energy consumed per second), so the unit is bps/W. It is easy to notice that this definition is equivalent to bits/J:

\[
1 \frac{bps}{W} = 1 \frac{bit/s}{J/s} = 1 \frac{bit}{J}
\]

(2.7)
Chapter 3

Implementation of OOK Modulator and Demodulator

This chapter is based on papers [A] and [H].

The OOK modulation has a low spectrum efficiency and requires a high SNR to achieve a certain BER. The main advantage of this modulation scheme is the simplicity of implementation. In this chapter, different approaches for OOK modulator and demodulator implementation are introduced and discussed.

3.1 MMIC-based OOK Modulator

3.1.1 Different Approaches of OOK Implementation

An OOK modulator is, in principle, a device which turns a carrier signal on and off depending on the input data. There are two major structures for building such a functional block: an impulse-radio-type modulator and an RF-switch-type modulator.

Impulse-radio-type OOK Modulator
The concept of an impulse radio provides a method for implementing an OOK modulator without having a carrier generator. A block diagram and operating waveform of an impulse-radio-type OOK modulator are depicted in Fig. 3.1(a). This modulator is comprised of a pulse generator and a BPF. The time domain waveform and frequency domain spectrum of the pulse generator output and the BPF output are presented below the block diagram.

(a) Impulse radio type OOK modulator

(b) RF switch type OOK modulator

Figure 3.1: The block diagrams and waveforms of different types of OOK modulators

The binary data stream (e.g., “110101”) is input directly to the pulse generator, which generates a narrow pulse when the input data is “1”. This narrow pulse is normally referred to as a monocycle [11]. After the BPF, an OOK modulated signal is generated. The time domain waveform of the modulated signal shows high frequency waves (the carrier). This can be explained in the frequency domain. The monocycle occupies a large band in the spectrum [12]. By applying a BPF, the energy outside the specified frequency band is filtered out, and the monocycle signal is turned into a relatively narrowband signal whose center frequency is defined by the BPF.

The advantage of this structure is that there is no need for a local oscillator. However, in order to transmit a signal at a high frequency (e.g., 60 GHz), the pulse generator must be capable of generating a pulse narrower than 20 ps.

RF-Switch-type OOK Modulator

Another approach to implement an OOK modulator is to use a CW (continuous wave) signal source and an RF switch, which is controlled by the data. The block diagram of this type of OOK modulator is shown in Fig. 3.1(b). The important figures of merit of this modulator are the maximum data rate, the insertion loss, the off-state isolation, and the frequency range of the carrier. RF switches can be implemented in different technologies.

### 3.1.2 Overview of State-of-the-art OOK Modulator Implementations

The recently-reported OOK modulators are summarized in Table 3.1. There are several approaches to implement the switching function. An amplifier can be used as an RF switch, though the range of carrier frequency is limited. By switching the DC-supply, the power amplifier is turned on and off, and the data can be modulated onto the carrier [13] [14] [15]. For applications which require a wide range of carrier frequencies, a traveling-wave amplifier (TWA) structure can be used [16] [17] [18]. The problem with these solutions is that the off-state isolation is often not sufficient. To improve the isolation, another approach is to switch the carrier oscillator on and off [19]. The data rate, however, is limited due to the time it takes to start up an oscillation. A method of improving isolation is presented in [20], in which a differential amplifier is used. At the off-state, the differential output is added to cancel the leakage of the carrier.

In [A], a new OOK modulator structure is proposed: an emitter-coupled latch is adopted to improve the isolation in the off-state. A higher data rate is achieved with this circuit than with the other reported approaches. The OOK modulator presented in [A] represented the state-of-the-art data rate until [21] was published, in which a 20 Gbps OOK modulator was reported using a more advanced BiCMOS process and the same structure as proposed in [A].

### 3.1.3 Latch-based High Datarate OOK Modulator

Referring to Table 3.1, an OOK modulator is normally implemented by a field effect transistor (FET), since a bipolar transistor is not as efficient as a FET device when it is used as a switch. The proposed OOK modulator structure is depicted in Fig. 3.2, which can be implemented in either bipolar or FET technology. The modulator contains an amplifier and a latch block. The bias current passed through them is controlled by an emitter-coupled pair (ECP) which, in turn, is controlled by the input data. An RF carrier signal is applied to the input of the amplifier and the data is applied to the ECP. When the current supply of the amplifier part is switched on, the modulator is in its “on” state and the carrier is amplified and passed to the output ports. When the current supply of the latch is switched on, the amplifier block is turned off and a constant voltage is delivered from the latch to the output port. The switching function is thus realized by the ECP together with the latch, eliminating the need for an RF switch.
A proof-of-concept OOK modulator is designed and fabricated in a commercial GaAs HBT process [22]. The HBT devices used have a transit frequency $f_t = 55$ GHz and a maximum oscillation frequency $f_{Max} = 63$ GHz. All HBTs used in this design are single-emitter devices with an emitter size of $1 \mu m \times 10 \mu m$. The schematic of the design is shown in Fig. 3.3. $Q_1$-$Q_4$ form the amplifier block, and $Q_5$-$Q_8$ form the latch block. $Q_9$-$Q_{14}$ provide bias according to the input data. The design requires only a negative 5 V supply; emitter followers are used to feed the signal and provide bias for the next stages.

To verify the performance of the OOK modulator, both time domain and frequency domain measurements are carried out. The time domain measurement determines the maximum data rate that the OOK modulator can support, and the frequency domain measurement determines the maximum carrier frequency that it can support. Fig. 3.4 shows the measured time domain waveform of an 18 GHz carrier signal modulated by a 14 Gbps data signal. The upper waveform is the input data, and the lower waveform is the modulated signal. The peak-to-peak voltage levels of the modulated signal in the “on” and “off” states are 270 mV and 49 mV, respectively.

**Table 3.1:** Comparison of recently reported RF-switch-type OOK modulators

<table>
<thead>
<tr>
<th>Ref</th>
<th>Technology</th>
<th>Frequency (GHz)</th>
<th>Data rate (Gbps)</th>
<th>Isolation (dB)</th>
<th>Approach</th>
</tr>
</thead>
<tbody>
<tr>
<td>[13]</td>
<td>90 nm CMOS</td>
<td>60</td>
<td>2</td>
<td>28.4</td>
<td>Switching PA</td>
</tr>
<tr>
<td>[16]</td>
<td>90 nm CMOS</td>
<td>60</td>
<td>8</td>
<td>26.6</td>
<td>Switching TWA</td>
</tr>
<tr>
<td>[17]</td>
<td>0.4 $\mu$m FET</td>
<td>DC-110</td>
<td>1</td>
<td>26.5</td>
<td>Switching TWA</td>
</tr>
<tr>
<td>[18]</td>
<td>0.1 $\mu$m InP HEMT</td>
<td>120</td>
<td>10</td>
<td>20</td>
<td>Switching TWA</td>
</tr>
<tr>
<td>[14]</td>
<td>90 nm CMOS</td>
<td>60</td>
<td>2</td>
<td>28.4</td>
<td>Switching Amplifier</td>
</tr>
<tr>
<td>[15]</td>
<td>90 nm CMOS</td>
<td>60</td>
<td>2.5</td>
<td>16</td>
<td>Switching Amplifier</td>
</tr>
<tr>
<td>[19]</td>
<td>130 nm CMOS</td>
<td>45-46</td>
<td>0.15</td>
<td>50</td>
<td>Switching LO</td>
</tr>
<tr>
<td>[20]</td>
<td>90 nm CMOS</td>
<td>60</td>
<td>3.5</td>
<td></td>
<td>Differential Cancelation</td>
</tr>
<tr>
<td>[A]</td>
<td>1.4 $\mu$m GaAs HBT</td>
<td>DC-28</td>
<td>14</td>
<td>27</td>
<td>ECL+ latch</td>
</tr>
<tr>
<td>[21]</td>
<td>0.25 $\mu$m SiGe BiCMOS</td>
<td>60</td>
<td>20</td>
<td>36</td>
<td>ECL+ latch</td>
</tr>
</tbody>
</table>
These levels correspond to -7.4 dBm and -22.2 dBm, so a 14.8 dB on/off ratio is achieved. In the frequency domain measurement, a 0 dBm carrier is input, and the output is measured while the data input is kept at logic “1” or logic “0”. By sweeping the carrier frequency, the insertion loss (in the “on” state) and the isolation (in the “off” state) are obtained. The comparison between measurement and simulation is presented in Fig. 3.5. With an 18 GHz carrier, the measured insertion loss is 12 dB and the isolation is 27 dB. This indicates a 15 dB on/off ratio, which agrees well with the time domain measurement. The frequency domain measurement shows that the isolation is better than 27 dB over the operating band. However, a high insertion loss is observed when the carrier frequency is higher than 15 GHz. The insertion loss is related to the gain of the differential amplifier. Using inductors as collector loads instead of the resistors $R_1$ and $R_2$ may reduce the insertion loss and increase the bandwidth. Compared with the previously reported results in Table 3.1, this design supports a higher data rate while maintaining good isolation.
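
The conversion from the measured peak-to-peak voltages to power levels can be reproduced as follows. The sketch assumes a sinusoidal carrier and a 50 Ω load; the load impedance is not stated explicitly in the text, but the quoted dBm values are consistent with it. The function name `vpp_to_dbm` is illustrative.

```python
import math


def vpp_to_dbm(v_pp, r_load=50.0):
    """Power in dBm of a sine wave with peak-to-peak voltage v_pp across r_load."""
    v_rms = v_pp / (2.0 * math.sqrt(2.0))   # peak-to-peak -> RMS for a sinusoid
    p_watt = v_rms ** 2 / r_load
    return 10.0 * math.log10(p_watt / 1e-3)


p_on = vpp_to_dbm(0.270)    # "on" state, 270 mV peak-to-peak
p_off = vpp_to_dbm(0.049)   # "off" state, 49 mV peak-to-peak
print(f"on: {p_on:.1f} dBm, off: {p_off:.1f} dBm, ratio: {p_on - p_off:.1f} dB")
```

The on/off ratio itself does not depend on the assumed load, since it reduces to $20\log_{10}(270/49)$.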
3.2 MMIC-based OOK Demodulator

3.2.1 OOK Demodulation Solutions

An OOK modulated signal can be demodulated easily using an envelope detector, which estimates the RF power of the input signal. An envelope detector can be implemented using bipolar [23], Schottky-diode [24], or FET [25] devices. In [H], we presented two envelope detector designs in a 0.15 \( \mu \)m mHEMT process: a passive Schottky diode detector and an active FET detector. The Schottky diode detector has a detection bandwidth of 40 to 60 GHz and achieves a sensitivity of 500 V/W at 60 GHz. The active detector has a wider detection bandwidth of 10 to 60 GHz.

3.2.2 mHEMT Active Envelope Detector

![Figure 3.6: The schematic of the mHEMT based active envelope detector](image)

The schematic of the mHEMT-based active envelope detector is shown in Fig. 3.6. The modulated signal is input at the gate of a 2 \( \times \) 20 \( \mu \)m mHEMT device through a capacitor \( C_1 \) in series with a resistor \( R_2 \) (37 \( \Omega \)). The drain quiescent current \( I_{DD} \) is controlled by the gate bias voltage \( V_{GG} \). The drain is connected through \( R_4 \) (750 \( \Omega \)) to \( V_{DD} \). The capacitor \( C_2 \) (0.66 pF) is part of the low pass filter (LPF). \( V_{GG} \) is normally set near the pinch-off voltage of the device. When an RF signal is input to the detector, the FET conducts during half of each RF cycle, and the LPF slowly follows the envelope of the RF signal, which gives a voltage drop from $V_{DD}$. When there is no RF input, the device is close to pinch-off and the output is nearly $V_{DD}$. The characteristics of the active detector were simulated as a function of frequency and bias conditions. The optimum bias is $V_{DD} = 2 \text{ V}$ and $I_{DD} = 200 \mu\text{A}$.

### 3.2.3 Detector Measurement

The transfer function of the detector can be characterized by Eq. 3.1,

$$V_{out} = V_{DD} - f(P_{in})$$

(3.1)

where $f(P_{in})$ is a quasi-linear monotonically increasing function. The most straightforward approach to measuring $f(P_{in})$ is to measure the output DC with a certain RF input signal. For low RF-powers, the output of the detector can be characterized by a lock-in amplifier.

![Diagram of measurement setup](image)

**Figure 3.7:** The measurement setup for the envelope detector

The measurement setup is shown in Fig. 3.7. An RF carrier with a certain frequency and power $P_{in}$ is input to an RF switch, which switches on and off under the control of a synchronization clock (around 1 kHz, which is the maximum synchronization rate of the instrument). Thus, the output of the detector is a square wave whose highest voltage is $V_{DD}$ and whose lowest voltage is related to the input power. Instead of measuring $V_{out}$, we now measure $\Delta V_{out}$ and its relationship with $\Delta P_{in}$. The lock-in amplifier measures the detector output with the synchronization clock as an additional input. By averaging the output according to the timing indicated by the synchronization clock, it can measure the peak-to-peak voltage of a signal down to the µV level.

Figure 3.8: Measured output incremental voltage as a function of input power and frequency

The measurement result of the envelope detector is plotted in Fig. 3.8. The measurement shows linear operation from the lowest power up to -5 dBm, and the detector operates from 10 GHz up to 60 GHz. The maximum sensitivity at 20 GHz is 2800 V/W.
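
To give a feeling for the signal levels involved, the sketch below estimates the detector output swing from the sensitivity figure. It assumes that the 2800 V/W sensitivity reported at 20 GHz applies as a constant slope over the whole linear range (up to -5 dBm), which is a simplification; the function names are illustrative.

```python
def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)


def detector_delta_v(p_in_dbm, sensitivity_v_per_w=2800.0, linear_limit_dbm=-5.0):
    """Estimated output swing, assuming a constant V/W slope in the linear range."""
    if p_in_dbm > linear_limit_dbm:
        raise ValueError("input power is above the reported linear range")
    return sensitivity_v_per_w * dbm_to_watt(p_in_dbm)


for p_dbm in (-50.0, -30.0, -10.0):
    print(f"{p_dbm:6.1f} dBm -> {1e3 * detector_delta_v(p_dbm):.4f} mV")
```

Under this assumption, an input of -50 dBm produces a swing of only a few tens of µV, which illustrates why a lock-in amplifier is used for the low-power part of the characterization.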
This chapter is based on papers [B], [D], [F] and [G].

The OOK modulation mentioned in the previous chapter has two disadvantages. First, a high bandwidth is required to support high data rate transmission due to OOK’s low spectrum efficiency. Second, OOK modulation is vulnerable to environmental noise and interference. These drawbacks limit the application of OOK modulation to short-range wireless applications such as wireless data centers.

On the other hand, referring to Table 2.1 and Fig. 2.6, non-coherent D-QPSK modulation has twice the spectrum efficiency of OOK. Its implementation is more straightforward than QPSK and QAM modulation, since non-coherent detection does not require carrier recovery. The $E_b/N_0$ (to achieve $BER < 10^{-8}$) that D-QPSK requires is also 2.5 dB lower than that required by OOK. With coherent QPSK modulation, the $E_b/N_0$ required to achieve the same bit error probability is 3 dB less than for non-coherent QPSK.

In this chapter, modulation and detection techniques for QPSK are introduced in section 4.1, and several prior QPSK baseband transceiver designs are reviewed in section 4.2. In section 4.3, two fixed data-rate modem implementations, operating at 2.5 Gbps and 5 Gbps respectively, are described. These focus on solutions that increase the data rate, but each can only operate at a single data rate. In section 4.4, a 10 Gbps D-QPSK modem implementation is described, which can operate at multiple data rates without hardware modification. In section 4.5, a novel coherent detection method based on the injection locking principle is proposed and a 12 Gbps QPSK receiver implementation is presented.
4.1 QPSK Modulation and Detection Techniques

As discussed in previous sections, an OOK modulated signal, being an amplitude-only modulation scheme, can be detected by an envelope detector. For phase-shift keying, the information is modulated on the phase. To extract the information, the receiver needs to perform either coherent detection or non-coherent detection. The principles of both modulation and detection techniques are illustrated in Fig. 4.1.
4.1.1 QPSK Modulation and Coherent Detection

In Fig. 4.1(a), QPSK modulation and coherent detection are described. At the modulator (shown on the left), given a binary data input stream, every two binary bits are combined into a symbol. The symbol is then translated into a phase \( \varphi_k \in \{0, \pi/2, \pi, 3\pi/2\} \). By manipulating the phase of the carrier signal, the modulated signal \( S_{IF} \) is generated:

\[
S_{IF}(t) = A\sin(2\pi f_{IF}t + \varphi_k)
\] (4.1)

At the demodulator (shown on the right), the received IF signal can be expressed as:

\[
S_{IFr}(t) = A_{path}(t)A\sin(2\pi f_{IF}t + \varphi_k + \varphi_{path}) + N(t)
\] (4.2)

where \( A_{path}(t) \) is the amplitude attenuation and fading as a result of propagation through a certain channel, \( \varphi_{path} \) is the additional phase due to the propagation delay, and \( N(t) \) is the noise related to the channel and the transceiver. \( \varphi_k \) is the phase modulated with information. To estimate this phase at the receiver, a carrier recovery block is used to recover the carrier signal \( S_{ref}(t) = \sin(2\pi f_{IF}t + \varphi_{path}) \).

A phase detector can be used to recover the phase information \( \varphi_k \).

There are two structures to recover the carrier: a feed-forward structure and a feedback structure. In a feed-forward structure, the carrier recovery block takes the modulated signal \( S_{IFr}(t) \) as an input and generates the carrier signal \( S_{ref}(t) \). In a feedback structure, the carrier recovery block has an internal frequency source which operates at \( f_{IF} + \Delta f \). This internal frequency source drives the phase detector, which generates a phase output \( \varphi_k + \Delta ft \). The carrier recovery block takes this phase output and feeds it back to the internal frequency source, so that \( \Delta f \) is gradually reduced. When \( \Delta f = 0 \), the carrier is recovered and the phase detector outputs \( \varphi_k \) as expected.
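The role of the recovered carrier can be illustrated with a minimal sketch. It is not the thesis hardware: it assumes a noiseless complex-baseband model (phasors instead of the real IF sine wave), a constant path attenuation, and hypothetical helper names `modulate`, `channel` and `coherent_detect`.

```python
import cmath
import math

PHASES = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]  # phi_k for symbols 0..3


def modulate(symbols):
    """Map 2-bit symbol indices (0..3) to unit phasors exp(j*phi_k)."""
    return [cmath.exp(1j * PHASES[s]) for s in symbols]


def channel(phasors, phi_path, a_path=0.5):
    """Apply a constant attenuation and path phase (noise is omitted)."""
    return [a_path * p * cmath.exp(1j * phi_path) for p in phasors]


def coherent_detect(received, phi_ref):
    """Remove the reference carrier phase, then pick the nearest phi_k."""
    decisions = []
    for r in received:
        phi = cmath.phase(r * cmath.exp(-1j * phi_ref)) % (2 * math.pi)
        decisions.append(round(phi / (math.pi / 2)) % 4)
    return decisions


symbols = [0, 1, 2, 3, 2, 0]
rx = channel(modulate(symbols), phi_path=1.0)
print(coherent_detect(rx, phi_ref=1.0))  # reference carrier includes phi_path
print(coherent_detect(rx, phi_ref=0.0))  # reference carrier ignores phi_path
```

In this model the decisions match the transmitted symbols only when the reference phase equals \( \varphi_{path} \); with a wrong reference the whole constellation is rotated, which is why coherent detection needs carrier recovery.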

4.1.2 D-QPSK Modulation and Non-coherent Detection

D-QPSK modulation and non-coherent detection are described in Fig. 4.1(b). At the modulator (shown on the left), two binary bits from the input data stream are combined into a symbol. For D-QPSK, the symbol is mapped to a phase difference \( \Delta\varphi_k \), instead of being mapped directly to the carrier phase \( \varphi_k \). The carrier phase \( \varphi_k \) is then generated by \( \varphi_k = \Delta\varphi_k + \varphi_{k-1} \). At the demodulator, non-coherent detection is carried out by taking the received modulated signal \( S_{IFr}(t) \) and generating a copy delayed by one symbol period, \( S_{IFr}(t - T_{sym}) \). A phase detector can be used to recover the phase difference \( \Delta\varphi_k \) by comparing \( S_{IFr}(t) \) and \( S_{IFr}(t - T_{sym}) \). In D-QPSK, the information is modulated on the phase difference \( \Delta\varphi_k \); thus, there is no need to recover the true carrier phase \( \varphi_k \).
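The following sketch illustrates why the unknown path phase drops out in non-coherent detection. It again assumes a noiseless complex-baseband model; phase steps are expressed as integer multiples of \( \pi/2 \), and the function names are hypothetical.

```python
import cmath
import math

HALF_PI = math.pi / 2


def dqpsk_modulate(delta_steps, phi_start=0.0):
    """Accumulate phase steps: phi_k = delta_phi_k + phi_{k-1}."""
    phi = phi_start
    phasors = [cmath.exp(1j * phi)]  # reference symbol sent first
    for d in delta_steps:
        phi += d * HALF_PI
        phasors.append(cmath.exp(1j * phi))
    return phasors


def dqpsk_detect(received):
    """Compare each symbol with the one received a symbol period earlier."""
    steps = []
    for prev, cur in zip(received, received[1:]):
        dphi = cmath.phase(cur * prev.conjugate()) % (2 * math.pi)
        steps.append(round(dphi / HALF_PI) % 4)
    return steps


delta_steps = [1, 3, 0, 2, 2, 1]
tx = dqpsk_modulate(delta_steps)
# Unknown attenuation and path phase; the receiver never estimates them.
rx = [0.3 * s * cmath.exp(1j * 2.1) for s in tx]
print(dqpsk_detect(rx))
```

Because both the current and the delayed symbol carry the same \( \varphi_{path} \), the product of one with the conjugate of the other depends only on \( \Delta\varphi_k \).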
4.2 Overview of Reported QPSK Baseband Transceivers

The data rates of QPSK baseband modems reported in recent years are shown in Fig. 4.2, and more detailed information on these modems is summarized in Table 4.1. In 2007, a 2.2 Gbps D-QPSK modem was reported in [26], in which the received baseband I and Q signals are sampled by analog track-and-hold amplifiers and stored as voltage levels in capacitors. By comparing the voltage levels in these capacitors, the phase difference $\Delta \varphi_k$ is extracted for demodulation. The structure of this non-coherent receiver is complicated because many auxiliary blocks are needed to operate the track-and-hold amplifiers, which also limits the data rate of this work. In 2010, a 2.5-Gbps receiver was demonstrated in [27], where two analog-to-digital converters (ADCs) with a 4-GS/s sampling rate were designed to sample the I and Q baseband signals, and a digital processor was used to perform feedback coherent detection. In the same year, we published a 2.5-Gbps FPGA-based D-QPSK modem, which used commercially available components and a simpler structure [G]. In 2011, a 5-Gbps QPSK receiver was published in [28], where feed-forward carrier recovery is performed using a frequency quadrupler and a 4:1 frequency divider. In the same year, in cooperation with Ericsson, we presented a 5-Gbps D-QPSK E-band radio link with real data traffic from two optical fibers [F]. In 2012, a 10-Gbps QPSK receiver was demonstrated in [29], where both the ADC and the DSP were designed and implemented in a 65-nm CMOS technology to perform feedback coherent detection. In 2013, we presented a D-QPSK modem that can operate at multiple data rates up to 10 Gbps [B] and a feed-forward coherent detection QPSK modem supporting up to 12 Gbps [D]. Another feed-forward coherent detection receiver is proposed in [E], which supports QPSK and 8-PSK with data rates up to 15 Gbps.

4.3 Fixed Rate D-QPSK Modem Implementation

4.3.1 D-QPSK Modulator Implementation

The coding rule of the differential QPSK is shown in Fig. 4.3(a). A D-QPSK symbol contains two data bits $[b_0, b_1]$. These two bits define how the baseband signals $I_k$ and $Q_k$ are generated from the previous baseband signals $I_{k-1}$ and $Q_{k-1}$, where $I_k, Q_k = \pm 1$. An example of this differential encoding process is shown on the right-hand side: given the previous symbol $I_{k-1} + jQ_{k-1} = 1 + j$ and the data input $[b_0, b_1] = [0, 0]$, the baseband outputs are $I_k = \overline{I_{k-1}}$ and $Q_k = \overline{Q_{k-1}}$, i.e. $I_k + jQ_k = -1 - j$. Thus the symbol has a 180 degree phase shift compared to the previous symbol. With other binary data inputs, the differential encoding generates other outputs according to the rule given in Fig. 4.3(a).

Figure 4.2: Reported QPSK baseband modems over years (data rate in Gbps versus year, for coherent and non-coherent detection)

<table>
<thead>
<tr>
<th>Ref</th>
<th>Technology</th>
<th>Method</th>
<th>Data rate (Gbps)</th>
<th>BER</th>
<th>Year</th>
</tr>
</thead>
<tbody>
<tr>
<td>[26]</td>
<td>90 nm CMOS</td>
<td>Sampling &amp; Compare</td>
<td>2.2</td>
<td>$10^{-9}$</td>
<td>2007</td>
</tr>
<tr>
<td>[27]</td>
<td>90 nm CMOS</td>
<td>Mixed Signal</td>
<td>2.5</td>
<td>$10^{-12}$</td>
<td>2010</td>
</tr>
<tr>
<td>[G]</td>
<td>Hybrid</td>
<td>Analog PD</td>
<td>2.5</td>
<td>$10^{-9}$</td>
<td>2010</td>
</tr>
<tr>
<td>[F]</td>
<td>Hybrid</td>
<td>Analog PD</td>
<td>5</td>
<td>$10^{-9}$</td>
<td>2011</td>
</tr>
<tr>
<td>[28]</td>
<td>0.8 µm SiGe</td>
<td>Multiple divide CR</td>
<td>5</td>
<td>$10^{-5}$</td>
<td>2011</td>
</tr>
<tr>
<td>[30]</td>
<td>65 nm CMOS</td>
<td>Costas Loop</td>
<td>2.5</td>
<td>$10^{-9}$</td>
<td>2011</td>
</tr>
<tr>
<td>[31]</td>
<td>0.8 µm SiGe HBT</td>
<td>Multiple divide CR</td>
<td>6.248</td>
<td>$10^{-11}$</td>
<td>2012</td>
</tr>
<tr>
<td>[29]</td>
<td>65 nm CMOS</td>
<td>Mixed Signal</td>
<td>10</td>
<td>$10^{-12}$</td>
<td>2012</td>
</tr>
<tr>
<td>[B]</td>
<td>Hybrid</td>
<td>Analog PD</td>
<td>up to 10</td>
<td>$10^{-9}$</td>
<td>2013</td>
</tr>
<tr>
<td>[D]</td>
<td>HBT</td>
<td>Injection Locking</td>
<td>up to 12</td>
<td>$10^{-11}$</td>
<td>2013</td>
</tr>
<tr>
<td>[E]</td>
<td>250-nm DHBT</td>
<td>Divider CR</td>
<td>up to 15</td>
<td>$10^{-8}$</td>
<td>2013</td>
</tr>
</tbody>
</table>

Table 4.1: Reported Multi-Gbps QPSK Baseband Receivers
Figure 4.3: Differential encoding rules and D-QPSK modulator implementations
In Fig. 4.3(b), a straightforward way to implement the differential encoder is shown. The input data are packed into 2-bit groups \([b_0, b_1]\) by a serial-to-parallel converter. The baseband outputs \(I_k\) and \(Q_k\) are generated by a binary encoder, which takes \([b_0, b_1]\) and previous baseband \(I_{k-1}\) and \(Q_{k-1}\) as inputs. A delay element or a storage element is used for keeping the baseband signal \(I_k\) and \(Q_k\) for a symbol period, so they would act as previous symbol \(I_{k-1}\) and \(Q_{k-1}\) when new bits are input. The problem with such a structure is that the encoder needs to operate at a speed of multi-Gbps, which is very challenging for components available on the market. To overcome this problem, a modified encoder structure is proposed in [F], which supports a data rate of up to 2.5 Gbps.

A 2.5 Gbps differential encoder is implemented by programming an FPGA, which has internal multi-Gbps serial-to-parallel converters. The structure of this differential encoder is shown in Fig. 4.3(c). In this topology, a 1:16 serial-to-parallel converter is used to split the high speed serial stream into a lower speed 16-bit data group \([b_0, b_1, ..., b_{15}]\). The differential encoder is implemented by two ROMs (read-only memories), which store all possible encoded outputs for \(I_{k-1} + jQ_{k-1} = \pm 1 + j\) and \(I_{k-1} + jQ_{k-1} = \pm 1 - j\), respectively. A two-bit memory is used to save the previous baseband status. Based on this memory, the correct ROM output is selected and sent out through a 16:1 parallel-to-serial converter. This topology is proven to work at a data rate of 2.5 Gbps. However, at higher data rates, a serial-to-parallel converter with a higher parallel ratio is needed (e.g. 1:20). This means the input bit-width (number of parallel bits) of the ROM would increase correspondingly. Thus this topology is not scalable in terms of the data rate, due to the limited ROM resources available in an FPGA.

To increase the data rate, a new method for differential encoding needs to be developed. The function of the encoder can be described mathematically as follows: assuming a 1:20 serial-to-parallel converter is used in the high speed differential encoder, a group of 20-bit input data \([b_0, b_1, ..., b_{19}]\) is divided into 10 two-bit symbols \([sym_0, sym_1, ..., sym_9]\). According to the differential coding rule, the symbol information is converted to the carrier phase differences between adjacent symbols \([\Delta \varphi_0, \Delta \varphi_1, ..., \Delta \varphi_9]\). Assuming that the initial carrier phase is \(\varphi_{int}\), the phase of the \(k^{th}\) symbol is given by:

\[
\varphi_k = \varphi_{int} + \sum_{i=0}^{k} \Delta \varphi_i
\] (4.3)

where \(\Delta \varphi_i \in \{\pi/2, \pi, 3\pi/2, 0\}\) and \(e^{j\varphi_k} = I_k + jQ_k\).

In [F], an improved algorithm called parallel prefix layer (PPL) is proposed that can increase the speed of calculation, thereby saving operation time and hardware resources. The principle of the PPL is that the calculation is divided into several steps. In the earlier calculation steps, some “intermediate” calculation results are generated. These results are reused in later calculation steps. The results that are used frequently are calculated with high priority. By sharing such intermediate results, repetitive calculation is avoided, and the encoding process becomes more efficient.

The structure of the PPL is depicted in Fig. 4.3(d). The serial-to-parallel converter takes the input data stream and groups it into a 20-bit input to a “data-to-phase” block, where the phase difference data \([\Delta \varphi_0, \Delta \varphi_1, ..., \Delta \varphi_9]\) is generated. To perform the calculation in Eq. 4.3, four PPL layers are used: layer 0, layer 1, layer 2 and layer 3. Layer 0 takes the phase differences \([\Delta \varphi_0, \Delta \varphi_1, ..., \Delta \varphi_9]\) and produces the intermediate output:

\[
\begin{bmatrix}
L_0(0) \\
L_0(1) \\
L_0(2) \\
L_0(3) \\
L_0(4) \\
L_0(5) \\
L_0(6) \\
L_0(7) \\
L_0(8) \\
L_0(9)
\end{bmatrix} =
\begin{bmatrix}
\Delta \varphi_0 \\
\Delta \varphi_0 + \Delta \varphi_1 \\
\Delta \varphi_2 \\
\Delta \varphi_2 + \Delta \varphi_3 \\
\Delta \varphi_4 \\
\Delta \varphi_4 + \Delta \varphi_5 \\
\Delta \varphi_6 \\
\Delta \varphi_6 + \Delta \varphi_7 \\
\Delta \varphi_8 \\
\Delta \varphi_8 + \Delta \varphi_9
\end{bmatrix}
\] (4.4)

Then layer 1 takes the output from layer 0 and generates:

\[
\begin{bmatrix}
L_1(0) \\
L_1(1) \\
L_1(2) \\
L_1(3) \\
L_1(4) \\
L_1(5) \\
L_1(6) \\
L_1(7) \\
L_1(8) \\
L_1(9)
\end{bmatrix} =
\begin{bmatrix}
L_0(0) \\
L_0(1) \\
L_0(1) + L_0(2) \\
L_0(1) + L_0(3) \\
L_0(3) + L_0(4) \\
L_0(3) + L_0(5) \\
L_0(5) + L_0(6) \\
L_0(5) + L_0(7) \\
L_0(7) + L_0(8) \\
L_0(7) + L_0(9)
\end{bmatrix} =
\begin{bmatrix}
\Delta \varphi_0 \\
\sum_{i=0}^1 \Delta \varphi_i \\
\sum_{i=0}^2 \Delta \varphi_i \\
\sum_{i=0}^3 \Delta \varphi_i \\
\sum_{i=2}^4 \Delta \varphi_i \\
\sum_{i=2}^5 \Delta \varphi_i \\
\sum_{i=4}^6 \Delta \varphi_i \\
\sum_{i=4}^7 \Delta \varphi_i \\
\sum_{i=6}^8 \Delta \varphi_i \\
\sum_{i=6}^9 \Delta \varphi_i
\end{bmatrix}
\] (4.5)
Similarly, layer 2 takes the output from layer 1 and generates:

\[
\begin{bmatrix}
L_2(0) \\
L_2(1) \\
L_2(2) \\
L_2(3) \\
L_2(4) \\
L_2(5) \\
L_2(6) \\
L_2(7) \\
L_2(8) \\
L_2(9)
\end{bmatrix}
=
\begin{bmatrix}
\Delta \varphi_0 \\
\sum_{i=0}^1 \Delta \varphi_i \\
\sum_{i=0}^2 \Delta \varphi_i \\
\sum_{i=0}^3 \Delta \varphi_i \\
\sum_{i=0}^4 \Delta \varphi_i \\
\sum_{i=0}^5 \Delta \varphi_i \\
\sum_{i=0}^6 \Delta \varphi_i \\
\sum_{i=0}^7 \Delta \varphi_i \\
\sum_{i=0}^8 \Delta \varphi_i \\
\sum_{i=0}^9 \Delta \varphi_i
\end{bmatrix}
\] (4.6)

Finally, layer 3 takes the output from layer 2 and the initial phase \( \varphi_{int} \), and generates:

\[
\begin{bmatrix}
L_3(0) \\
L_3(1) \\
L_3(2) \\
L_3(3) \\
L_3(4) \\
L_3(5) \\
L_3(6) \\
L_3(7) \\
L_3(8) \\
L_3(9)
\end{bmatrix}
= 
\begin{bmatrix}
L_2(0) + \varphi_{int} \\
L_2(1) + \varphi_{int} \\
L_2(2) + \varphi_{int} \\
L_2(3) + \varphi_{int} \\
L_2(4) + \varphi_{int} \\
L_2(5) + \varphi_{int} \\
L_2(6) + \varphi_{int} \\
L_2(7) + \varphi_{int} \\
L_2(8) + \varphi_{int} \\
L_2(9) + \varphi_{int}
\end{bmatrix}
\]  
(4.7)

where \( \varphi_{int} \) is the phase of the last symbol before the data group being processed. Compared with Eq. 4.3, the PPL output gives the same result, as expected. Each layer in the PPL has independent registers to store the intermediate results; the registers are updated synchronously using a clock whose frequency is 1/20 of the data rate. The advantage of this structure is that the data rate can be scaled up as long as the serial-to-parallel converter inside the FPGA can support it.
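The idea of sharing intermediate sums between layers can be sketched in software. The example below is an assumption-laden illustration, not the layer wiring of Fig. 4.3(d): it uses a generic stride-doubling prefix sum (Hillis-Steele style), represents phases as integers in units of \( \pi/2 \) modulo 4, and checks the result against a direct evaluation of Eq. 4.3. The function names are hypothetical.

```python
def sequential_phases(delta, phi_int):
    """Reference: phi_k = phi_int + sum_{i=0..k} delta_i, in units of pi/2 (mod 4)."""
    out, acc = [], phi_int
    for d in delta:
        acc = (acc + d) % 4
        out.append(acc)
    return out


def layered_prefix_phases(delta, phi_int):
    """Layered prefix sum: every layer reads only the previous layer's values."""
    layer = [d % 4 for d in delta]
    stride = 1
    while stride < len(layer):
        # All additions within one layer are independent of each other,
        # so hardware can evaluate them in parallel and register the result.
        layer = [
            (layer[k] + layer[k - stride]) % 4 if k >= stride else layer[k]
            for k in range(len(layer))
        ]
        stride *= 2
    # The final layer adds the phase of the last symbol of the previous group.
    return [(v + phi_int) % 4 for v in layer]


delta = [1, 2, 3, 0, 1, 1, 2, 3, 3, 0]  # ten phase steps from one 20-bit group
result = layered_prefix_phases(delta, phi_int=2)
assert result == sequential_phases(delta, phi_int=2)
print(result)
```

For ten symbols this generic scheme needs four combining layers (strides 1, 2, 4 and 8) plus the final addition of \( \varphi_{int} \), whereas a purely sequential encoder needs ten dependent additions per group.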

### 4.3.2 D-QPSK Demodulator Implementation

As discussed in section 4.1.2, the data information can be detected by comparing phase differences of two adjacent symbols of the received signal. In the case of D-QPSK modulation, two data bits need to be recovered from this detection. The demodulator structure proposed in [F] is presented in Fig. 4.4(a). First, the received signal is split into two branches. In the upper branch, the signal is delayed by one symbol period and is compared with its replica with a 45 degree phase shift.
A mixer and an LPF (low-pass filter) are used as a phase detector. The upper branch gives a high output when two adjacent symbols are 0-90 degrees out of phase, and a low output when the phase difference is 180-270 degrees. The lower branch has the same structure except that a -45 degree phase shift is applied instead of 45 degrees; it gives a high output when two adjacent symbols are 270-360 degrees out of phase, and a low output when the phase difference is 90-180 degrees. The upper branch recovers the second data bit as in Fig. 4.3(a), and the lower branch recovers the first data bit.
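The decision regions described above can be checked with a small idealized model. The sketch assumes that the mixer followed by the LPF behaves as the cosine of the phase between its two inputs, and it takes the bit-to-phase mapping from the table in Fig. 4.6(d) with \(N = 1\); the function names are hypothetical.

```python
import math

# Differential coding rule: data bits (b0, b1) -> phase step in degrees
RULE = {(0, 0): 180, (0, 1): 90, (1, 0): 270, (1, 1): 0}


def branch_output(delta_deg, shift_deg):
    """Ideal mixer + LPF: cosine of the phase between the two mixer inputs."""
    return math.cos(math.radians(delta_deg - shift_deg))


def demodulate(delta_deg):
    upper = branch_output(delta_deg, +45)  # high for steps of 0 and 90 degrees
    lower = branch_output(delta_deg, -45)  # high for steps of 270 and 0 degrees
    b1 = 1 if upper > 0 else 0             # upper branch -> second data bit
    b0 = 1 if lower > 0 else 0             # lower branch -> first data bit
    return (b0, b1)


for bits, delta in RULE.items():
    print(bits, delta, demodulate(delta))
```

In this model each of the four phase steps sits 45 degrees away from the nearest decision boundary of both branches, so each branch only needs a sign decision.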

The structure requires two delay elements, which should be identical, and a 45 degree phase shifter. None of these are standard components, which results in difficulty setting up such a demodulator. Another demodulator structure is proposed in [F], as shown in Fig. 4.4(b). The differences are: firstly, the 45 and -45 degree phase shifters are replaced by a 90 degree coupler, which is a standard component; secondly, two symbol delay elements are combined into one delay element, which eliminates a potential mismatch problem between the two delay elements. In this demodulator, the delay element is tuned to provide a symbol period time delay and 45 degree phase shift at the IF frequency. Thus, the structure is equivalent to the one shown in Fig. 4.4(a).
4.3.3 5 Gbps D-QPSK E-band Radio Implementation

The structure of a full duplex D-QPSK E-band radio is illustrated in Fig. 4.5(a); the upper part is the transmitter and the lower part is the receiver. A single fiber operating at STM-16 (Synchronous Transport Module level-16) transmission standard gives 2.488 Gbps data throughput. In the transmitter, two of these fibers are connected to the fiber interface on the FPGA; inside the FPGA these two data streams are interleaved, forming a 5 Gbps data stream. The differential encoder converts this data stream into I and Q signals, which are modulated onto a 10 GHz IF by an I/Q modulator. A commercial E-band module is used to convert the 10 GHz IF signal to the upper (81-86 GHz) or lower (71-76 GHz) sub-bands of the E-band. In the receiver, the E-band module down-converts the RF to an IF signal, and an analog demodulator is used to recover the transmitted data. A photo of the lab test setup of this radio is shown in Fig. 4.5(b). The radios are connected through an E-band attenuator. The test shows that the radio can achieve error-free transmission.

4.4 Multiple Data Rate D-QPSK Modem Implementation

4.4.1 Data Rate Limitation in a D-QPSK Demodulator

The demodulators mentioned in previous sections require symbol delay blocks. These delay blocks are designed to provide a true-time delay equal to the symbol period ($T_{\text{sym}}$), and are implemented by a fixed-length transmission line. Because the delay of such a block is difficult to adjust, the demodulator can only receive the data rate it was built for, and hardware modification is required to support any other data rate.

4.4.2 Proposed Multirate D-QPSK Modem

In paper [B], a solution that enables multi-rate transmission is proposed. This solution uses a novel differential encoding scheme for D-QPSK modulation, which enables multi-rate transmission without any modification to the receiver described above.

**Traditional D-QPSK Encoding at Different Data Rates**

Before introducing the proposed encoding scheme, a traditional D-QPSK modulation scheme that operates at different data rates is reviewed. For base rate transmission, the symbol period is $T_{sym}$. Data is modulated on the phase difference $\Delta \varphi_k$ between the $(k-1)^{th}$ symbol and the $k^{th}$ symbol, as shown in Fig. 4.6(a). At the demodulator, a $T_{sym}$ delay element is used to compare the phases of adjacent symbols.

When the transmission rate is doubled, the modulation is performed in the same manner, but the symbol period becomes $T_{sym}/2$, as shown in Fig. 4.6(b). At the demodulator, in order to compare the phases of adjacent symbols, the delay must also be reduced to $T_{sym}/2$.

**Proposed D-QPSK Encoding Rule**

A novel multi-rate differential encoding scheme is proposed, in which multi-rate data can be detected using a constant real-time delay provided by a delay element. The proposed encoding rule is shown in Fig. 4.6(d), where $N$ is the multirate factor. When $N=1$, the base rate, the proposed scheme is identical to the conventional scheme.

(a) Conventional differential encoding at base rate

(b) Conventional differential encoding at 2x base rate

(c) Proposed differential encoding at 2x base rate

<table>
<thead>
<tr>
<th>Binary Input</th>
<th>$I_k$</th>
<th>$Q_k$</th>
<th>$\Delta \varphi_k$</th>
</tr>
</thead>
<tbody>
<tr>
<td>$[0, 0]$</td>
<td>$\overline{I_{k-N}}$</td>
<td>$\overline{Q_{k-N}}$</td>
<td>180°</td>
</tr>
<tr>
<td>$[0, 1]$</td>
<td>$\overline{Q_{k-N}}$</td>
<td>$I_{k-N}$</td>
<td>90°</td>
</tr>
<tr>
<td>$[1, 0]$</td>
<td>$Q_{k-N}$</td>
<td>$\overline{I_{k-N}}$</td>
<td>270°</td>
</tr>
<tr>
<td>$[1, 1]$</td>
<td>$I_{k-N}$</td>
<td>$Q_{k-N}$</td>
<td>0°</td>
</tr>
</tbody>
</table>

(d) Proposed differential encoding rule

Figure 4.6: Proposed differential encoding scheme

The operational waveform of the proposed encoding rule at double the base rate is shown in Fig. 4.6(c). The basic concept is that when the data rate is doubled, data is modulated onto the carrier by applying a phase difference $\Delta \varphi_k$ between the $(k-2)^{th}$ and the $k^{th}$ symbols, instead of between two adjacent symbols.

By doing this, the necessary real-time delay becomes two symbol periods. Since the symbol period for $N=2$ is $T_{sym}/2$, half of the symbol period at the base rate, the required real-time delay is $2 \times T_{sym}/2 = T_{sym}$, which is the same as that for $N=1$, the base rate.
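The rule in Fig. 4.6(d) can be exercised with a short sketch. The update rules follow the table, with inversion of a ±1 level written as negation; the start state of $N$ reference symbols, the value of `T_SYM` and the function names are assumptions made only for illustration.

```python
# (I, Q) update rule from Fig. 4.6(d); levels are +1/-1, so inversion is negation.
RULE = {
    (0, 0): lambda i, q: (-i, -q),  # 180 degrees
    (0, 1): lambda i, q: (-q, i),   # 90 degrees
    (1, 0): lambda i, q: (q, -i),   # 270 degrees
    (1, 1): lambda i, q: (i, q),    # 0 degrees
}


def encode(bit_pairs, n):
    """Encode each symbol relative to the symbol n positions earlier."""
    symbols = [(1, 1)] * n  # n reference symbols (assumed start state)
    for pair in bit_pairs:
        symbols.append(RULE[pair](*symbols[-n]))
    return symbols


def decode(symbols, n):
    """Fixed-delay detection: compare symbol k with symbol k - n."""
    out = []
    for k in range(n, len(symbols)):
        prev, cur = symbols[k - n], symbols[k]
        out.append(next(p for p, rule in RULE.items() if rule(*prev) == cur))
    return out


T_SYM = 400e-12  # base-rate symbol period in seconds (hypothetical value)
data = [(0, 1), (1, 1), (0, 0), (1, 0), (0, 1), (0, 0)]
for n in (1, 2, 4):
    assert decode(encode(data, n), n) == data
    # n symbols of duration T_SYM / n always span the same real-time delay.
    print(n, n * (T_SYM / n))
```

For every multirate factor the comparison spans $N$ symbols of duration $T_{sym}/N$, so the real-time delay seen by the demodulator stays at $T_{sym}$.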

4.4.3 Experimental Result of Multirate D-QPSK Modem

Figure 4.7: (a) Measured eye diagram of received signal at different data rates; (b) measured demodulator sensitivity

The proposed multi-rate D-QPSK encoding scheme is implemented on the same hardware platform as illustrated in Fig. 4.5(a). With only a software modification, D-QPSK transmission at different data rates is achieved using a fixed delay element at the demodulator.

To evaluate the performance of the modem, the eye diagram of the received signal at different data rates is measured by a sampling oscilloscope. The eye diagrams are shown in Fig. 4.7(a). Due to the bandwidth limitation of the demodulator mixer, the SNR decreases and the jitter increases as the data rate increases.

The demodulator sensitivity is measured at different data rates using the traditional encoding scheme as well as the proposed encoding scheme. The result is shown in Fig. 4.7(b): the modem can reach a BER of less than $10^{-12}$ at all the tested data rates. Measurements show that there is no performance difference between the proposed encoding scheme and the traditional encoding scheme.

### 4.5 Coherent QPSK Receiver Implementation

As shown in Fig. 4.1(a), coherent receivers can be built using either a feed-forward or a feedback structure. The Costas loop is a common feedback structure for coherent receivers, which can be implemented using analog circuits [30] or DSPs [27] [29]. On the other hand, a feed-forward structure is simpler than a feedback structure, since it does not require feedback information. Feed-forward receivers can be implemented using frequency multipliers and dividers [28] [31], or injection locked oscillators.

In this thesis, two feed-forward coherent receivers are presented, using the “multipliers-dividers” and “injection locking oscillator” approaches, respectively. In this chapter, an “injection locking oscillator” based feed-forward coherent QPSK receiver is discussed; a “multipliers-dividers” based 8-PSK receiver is discussed in the next chapter.

#### 4.5.1 Overview of Injection Locked Oscillator-based Receivers

<table>
<thead>
<tr>
<th>Ref</th>
<th>Technology</th>
<th>Topology</th>
<th>Modulation</th>
<th>Frequency (GHz)</th>
<th>Data Rate (Gbps)</th>
</tr>
</thead>
<tbody>
<tr>
<td>[32]</td>
<td>65 nm CMOS</td>
<td>Dual SH-ILO</td>
<td>BPSK</td>
<td>0.75-0.9</td>
<td>0.005</td>
</tr>
<tr>
<td>[33]</td>
<td>Hybrid and 180 nm CMOS</td>
<td>ILO+Mixer</td>
<td>QPSK</td>
<td>2.36-2.5</td>
<td>0.004</td>
</tr>
<tr>
<td>[34]</td>
<td>Hybrid and InP HEMT</td>
<td>DRILO</td>
<td>QPSK</td>
<td>3.58-3.69</td>
<td>0.1-0.126</td>
</tr>
<tr>
<td>[35]</td>
<td>0.13 μm CMOS Hybrid</td>
<td>DRILO</td>
<td>QPSK</td>
<td>2.78</td>
<td>0.32</td>
</tr>
<tr>
<td>[D]</td>
<td>Hybrid and HBT</td>
<td>ILVCO+Mixer</td>
<td>QPSK</td>
<td>6-8</td>
<td>up to 12</td>
</tr>
</table>

Table 4.2: Reported Injection Locked Oscillator based QPSK Baseband Receivers

Recently reported injection locked oscillator based receivers are summarized in Table 4.2. In [32], a 5-Mbps BPSK receiver is presented, using two second-harmonic injection locked oscillators (SH-ILO). In [33], a 4 Mbps single ILO based QPSK receiver is reported. To enable multi-channel operation, a selectable dielectric-resonator injection locked oscillator (DRILO) is proposed in [34], where the oscillator’s operational band can be tuned by selecting different dielectric-resonator tanks. The highest previously reported ILO-based receiver data rate is 320 Mbps [35].

In [D], an injection locked voltage controlled oscillator (ILVCO)-based QPSK receiver is presented, which can demodulate a 12 Gbps QPSK modulated signal. The theory of oscillator injection locking to a modulated injection signal is also studied.

4.5.2 Injection Locking Principle

**Oscillator Injection Locked to a Sinusoidal Signal**
When a sinusoidal signal is injected to an oscillator, the oscillator would change its oscillation frequency to the injection signal frequency. This effect is called injection locking, and it would happen when the injection signal falls into a frequency range, the so-called lock-in range of the oscillator [36]. The lock-in range of an oscillator can be expressed as an interval \([f_0 - \Delta f_{lock}, f_0 + \Delta f_{lock}]\), where \(\Delta f_{lock}\) is:

\[
\Delta f_{lock} = \frac{f_0}{2Q} \sqrt{\frac{P_i}{P_{out}}} \tag{4.8}
\]

where \(Q\) is the quality-factor of the VCO resonator circuit, \(f_0\) is the center frequency of the resonator, \(P_i\) and \(P_{out}\) are the power level of the injection signal and the oscillator output signal, respectively.

Eq. 4.8 suggests that an oscillator changes its oscillation frequency to that of an injected sinusoidal signal as long as the signal falls within the lock-in range \([f_0 - \Delta f_{lock}, f_0 + \Delta f_{lock}]\) and its power level is at least \(P_i\). The lock-in range shrinks as the \(Q\)-factor and/or the oscillator output power increases.

**Oscillator Injection Locked to a Modulated Signal**
The theory in [36] cannot explain how an oscillator would operate when a wideband modulated signal is injected. In [D], a theory about how an oscillator can be injection locked to a wideband signal is proposed and verified with experiments.

An example is given to explain the principle of injection locking to a modulated signal. The spectrum of a wideband modulated IF signal is shown in the upper part of Fig. 4.8. The peak power level of the modulated signal at the IF frequency is \(P_i = -20\) dBm. Consider an oscillator with quality factor \(Q = 15\) that oscillates at \(f_0 = 6.98\) GHz with an output signal level of \(P_{out} = 8\) dBm, so that \(P_i/P_{out}\) corresponds to \(-28\) dB. The range of the lock-in window for such an oscillator can be calculated from Eq. 4.8:

\[ \Delta f_{\text{lock}} = \frac{f_0}{2Q} \sqrt{\frac{P_i}{P_{\text{out}}}} \approx 9 \text{ MHz} \] (4.9)
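The arithmetic behind Eq. 4.9 can be reproduced with a few lines. The sketch assumes that the power ratio in Eq. 4.8 is a linear (not dB) ratio, so the dBm values are converted first; the function names are hypothetical.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


def lock_in_half_range(f0_hz, q, p_inj_dbm, p_out_dbm):
    """Eq. 4.8: delta_f_lock = f0 / (2Q) * sqrt(P_i / P_out), linear power ratio."""
    ratio = dbm_to_mw(p_inj_dbm) / dbm_to_mw(p_out_dbm)
    return f0_hz / (2 * q) * math.sqrt(ratio)


# Example values from the text: f0 = 6.98 GHz, Q = 15, P_i = -20 dBm, P_out = 8 dBm
delta_f = lock_in_half_range(6.98e9, 15, -20, 8)
print(f"{delta_f / 1e6:.1f} MHz")
```

The same function shows the trend stated after Eq. 4.8: doubling \(Q\) halves \(\Delta f_{lock}\), and every 6 dB increase in injected power roughly doubles it.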

Figure 4.8: A QPSK-modulated signal used as injection input

In the left part of Fig. 4.8, part of the modulated signal spectrum is plotted within a 100 MHz frequency range centered at the IF frequency. The zoomed spectrum is shown on the right-hand side of Fig. 4.8. In the center of this spectrum, a 9 MHz wide white box is marked, which indicates the lock-in window given by Eq. 4.8. Experimental results show that the oscillator always locks to the frequency within this lock-in window at which the spectral density of the injected signal reaches its maximum. In this example, the maximum spectral density of the injected modulated signal is located at \( f_{IF} = 6.98 \text{ GHz} \). Thus the oscillator operates at \( f_{IF} \) instead of \( f_0 \) due to the injection locking effect.

Based on the discussion above, to ensure an oscillator can lock to the IF frequency, these conditions shall be fulfilled:

a). The lock-in range of the oscillator shall cover the IF frequency \( f_{IF} \)

b). The spectrum density of the modulated signal at \( f_{IF} \) shall be higher than that at other frequencies within the lock-in range.

To meet condition a), an oscillator with a lower Q-factor is preferred, since this increases the lock-in range \( \Delta f_{\text{lock}} \). However, to meet condition b), a high-Q oscillator is preferred, since it limits the lock-in range so that it includes no frequencies whose spectral density is higher than that at \( f_{IF} \). For this reason, an ILVCO structure is proposed in [D]. By tuning the control voltage of the ILVCO, the oscillator center frequency \( f_0 \) can be adjusted, and thus the lock-in window can be moved. Condition a) can therefore be met even if a high Q-factor oscillator is used.
To verify the proposed theory about the locking range, sinusoidal and modulated signals are injected into the ILVCO, and the frequency ranges over which the VCO can lock to the injection signals are measured. The simulated and measured lock-in ranges for both signals are plotted in Fig. 4.9. The measurement results agree with simulation, which confirms that the injection locking principle for a sinusoidal signal can be extended to the case of modulated signals.

**ILVCO based QPSK Receiver**

The structure of the ILVCO-based carrier recovery block is illustrated in Fig. 4.10. It comprises a differential Colpitts VCO, which is implemented in a GaAs HBT process, and a commercially available passive I/Q down-converting mixer. The received IF signal $S_{IF}(t)$ is injected into the VCO via one of the VCO differential output ports. An attenuator is used to adjust the power level of the injected signal. The other VCO output connects to the LO port of the mixer via a phase shifter. Two LPFs are used at the I and Q outputs, so the received data bits are obtained after the LPFs. The schematic of the VCO MMIC is shown in Fig. 4.10, and more details of the design are described in [37]. Its operational frequency is 6–8 GHz, and it gives an 8 dBm output signal. The Q-factor of the VCO tank is 31. The MMIC VCO is packaged on a circuit board, as shown in the lower part of Fig. 4.10.

Figure 4.10: QPSK receiver based on an ILVCO

A modulated signal as shown in Fig. 4.8 is input to the proposed receiver. When the ILVCO is powered on, it generates a single tone at its free-running frequency, which is much stronger than the injected IF signal, as shown in Fig. 4.11(a). By tuning the control voltage of the VCO, the oscillation frequency can be adjusted towards the IF frequency. When the oscillation frequency is close to the IF, the ILVCO enters a quasi-lock state. In the quasi-lock state, the oscillation frequency changes between the free-running and locking frequencies, which makes the spectrum appear as shown in Fig. 4.11(b). When the ILVCO frequency is adjusted still closer to the IF and the conditions discussed in the previous section are met, the ILVCO locks to the IF; the spectrum in the locked state is shown in Fig. 4.11(c). By tuning the frequency of the ILVCO, it can lock to any IF signal within its oscillation range, as long as these conditions are met.

**The Performance of an ILVCO based QPSK Receiver**

A commercial passive I/Q mixer is used to demodulate data from the received IF signal. The signal-to-noise ratio (SNR) of the demodulated signal is measured at different data rates. In Fig. 4.12, the measured SNR at different data rates is plotted when the transmitter and the receiver use the same LO source (no carrier recovery is needed), as well as when the proposed carrier recovery block is used at the receiver side. Compared with the case in which the transmitter and receiver are fully synchronized, using the proposed carrier recovery block reduces the SNR by less than 0.7 dB. The phase noise of the recovered carrier signal does not change with the data rate; the decrease of the received SNR is due to the performance of the mixer used in the test.

Figure 4.11: ILVCO spectrum during injection locking

The proposed baseband receiver is tested at different data rates. A transmitter modulates a pseudo-random binary sequence (PRBS) stream and sends it over the IF; the proposed receiver demodulates the signal, and a bit error rate tester measures the bit error rate (BER). The measurement results are plotted in Fig. 4.12. The receiver achieves a BER of less than $10^{-12}$ for data rates below 10 Gbps. At 12 Gbps, the received data has a BER of less than $10^{-9}$, due to the I/Q mixer bandwidth limitation. The eye diagrams of the I-channel at 8 Gbps (4 GBaud) and 12 Gbps (6 GBaud) are shown on the right-hand side of Fig. 4.12. It can be seen that at higher data rates, the quality of the output signal is degraded due to the bandwidth limitation.

Figure 4.12: Measured BER of an ILVCO based QPSK receiver, measured SNR and eye diagram of received signal
Chapter 5

High Data Rate 8-PSK Demodulator Solutions

This chapter is based on paper [E].

Referring to Table 2.1 and Fig. 2.6, the 8-PSK modulation requires 1-dB higher $E_b/N_0$ than a D-QPSK system does (for BER $\leq 10^{-8}$), and it supports 50% more capacity than QPSK does under the same condition.

In this chapter, an overview of different 8-PSK carrier recovery methods is given, and a novel PSK carrier recovery method is introduced. A proof-of-concept 8-PSK demodulator is designed and tested, which can support data rates of up to 15 Gbps.

5.1 Different Methods for 8-PSK Carrier Recovery

Coherent detection is used for 8-PSK demodulation and therefore the carrier must be recovered at the receiver. In this section, different 8-PSK carrier recovery methods are reviewed and a newly proposed method is introduced.

5.1.1 Multi-channel Digital Demodulation

Due to the performance limitation of commercially available ADCs, it is difficult to digitize multi-GHz wideband signals. One possible solution is to divide such wideband channels into several narrow band channels that can be processed by conventional digital receivers.

A 6-Gbps 8-PSK digital transceiver has been reported in which a 2.5 GHz channel is divided into four 625 MHz channels [38]. Four sets of digital transceivers
are used for the 8-PSK transmissions. In the receivers, ADCs are used to digitize the received signals, and the 8-PSK demodulation is performed in DSPs.

5.1.2 Frequency Multiplication based Demodulation

The frequency multiplication-based carrier recovery method is commonly used for demodulating PSK signals. It generates a modulation free harmonic of the desired carrier. For $2^M$-PSK the modulated signal can be expressed as:

$$s_{IF}(t) = \cos(2\pi f_{IF} t + \varphi_k)$$ (5.1)

where the modulated phase $\varphi_k = 2m\pi/2^M$ and $0 \leq m < 2^M$. After $2^M$-times frequency multiplication, $s_{IF}(t)$ becomes

$$s'_{IF}(t) = \cos(2^M \times 2\pi f_{IF} t + 2^M \varphi_k) = \cos(2^M \times 2\pi f_{IF} t)$$ (5.2)

where the phase term $2^M \varphi_k = 2m\pi$ is an integer multiple of $2\pi$, so the signal becomes “modulation free” and $s'_{IF}(t)$ is a sinusoidal signal at a frequency of $2^M f_{IF}$. A divide-by-$2^M$ frequency divider is then used to recover the carrier signal at $f_{IF}$.

This method has been used in [28][31] for the QPSK demodulation. However, for an 8-PSK modulation, it requires the generation of an $8^{th}$ harmonic of the received signal. High order harmonic generation consumes more power due to the limited efficiency of the high order frequency multipliers, thus this method becomes impractical for 8-PSK.
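The phase argument behind Eq. 5.1 and Eq. 5.2 can be checked numerically. The following is a minimal sketch, not part of the original work: it only models the symbol phases (the function name `multiplied_phasors` is hypothetical) and confirms that every $2^M$-PSK phase maps onto the same unmodulated phasor after $2^M$-times multiplication.

```python
import cmath
import math

def multiplied_phasors(M):
    """Return the phasor of every 2^M-PSK symbol after 2^M-times frequency multiplication."""
    order = 2 ** M
    phasors = []
    for m in range(order):
        phi = 2 * math.pi * m / order                # modulated phase of symbol m, as in Eq. 5.1
        phasors.append(cmath.exp(1j * order * phi))  # phase term after multiplication, Eq. 5.2
    return phasors

for M in (2, 3):  # QPSK and 8-PSK
    residual = max(abs(p - 1) for p in multiplied_phasors(M))
    print(f"2^{M}-PSK: largest deviation from an unmodulated phasor = {residual:.1e}")
```

The deviations are at the level of floating-point rounding, which is the numerical counterpart of the “modulation free” statement above.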

5.1.3 Proposed Selective Harmonic Generator-based 8-PSK Demodulation

In [E], a novel PSK demodulation method is proposed, where the second-order harmonic of the modulated signal is generated and a divide-by-2 frequency divider is used to recover the carrier signal.

In the conventional frequency multiplication approach, the spectrum of the entire modulated signal is frequency multiplied. In the proposed method, the selective harmonic generator produces only the second harmonic of a selected part of the modulated signal spectrum. By doing so, the generated second harmonic contains little modulation-related information, and the carrier can be extracted by using a divide-by-2 frequency divider.

A proof-of-concept 8-PSK receiver is designed and measured to verify the proposed method. The detailed description is presented in the sections below.
5.2 The Proposed 8-PSK Carrier Recovery

A functional diagram of the proposed 8-PSK carrier recovery is shown in Fig. 5.1. It consists of a comparator and two latches. The comparator has a signal input and a reference voltage. When the reference is set at 0, the circuit shown in this diagram works as a divide-by-2 frequency divider; when an appropriate reference voltage is supplied, the comparator acts as a selective harmonic generator.

5.2.1 Comparator-based Harmonic Generator

The principle of comparator based harmonic generation is illustrated in Fig. 5.2(a). When an amplitude-normalized sinusoidal signal \( \sin(2\pi ft) \) is input to the comparator, the signal level is compared with a reference voltage. The comparator gives a high/low voltage if the input is higher/lower than the reference voltage. On the left-hand side of Fig. 5.2(a), an output waveform is shown for reference \( V_{ref} = 0 \). The comparator gives a 50%-50% duty-cycle square wave output \( S_o(t) \); the \( k^{th} \) harmonic power of this output signal can be calculated by:

\[
H[k] = \frac{1}{2f} \int_{-1/2f}^{1/2f} |S_o(t) \times \sin(2\pi kf t)| \, dt \tag{5.3}
\]

This shows the power of each harmonic can be calculated from mathematical integration of \( S_o(t) \) and \( \sin(2\pi kf t) \). When reference voltage \( V_{ref} = 0 \), \( H[k] = 0 \) (when \( k \) is an even number), resulting in an output spectrum that contains only odd harmonic tones, as is shown on the left of Fig. 5.2(a).
Figure 5.2: Comparator based harmonic generator. (a) Output spectrum with different reference voltages. (b) Conversion gain of different harmonics for different input amplitudes.
When the reference voltage is \( V_{ref} = 0.33 \), for example, the comparator generates a 33%-67% duty-cycle square wave output \( S_o(t) \). Due to the symmetry \( S_o(t) = S_o(-t) \), \( H[k] = 0 \) when \( k \) is an odd number, resulting in an output spectrum that contains only even harmonic tones, as shown on the right of Fig. 5.2(a).

This example shows that the comparator generates different harmonics at different reference voltage conditions, when a constant-amplitude sinusoidal signal is input. However, under realistic operation conditions, the amplitude of the input signal may vary while the reference voltage would be kept constant.
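The dependence of the harmonic content on the reference voltage can be explored with a short numerical sketch. It rests on assumptions that are not taken from the thesis: the comparator is ideal with output levels of ±1, and the harmonic strength is measured as the ordinary Fourier-series magnitude, which is not necessarily the same measure as Eq. 5.3. The function name `comparator_harmonic` is hypothetical.

```python
import cmath
import math

def comparator_harmonic(k, v_ref, a_in=1.0, n=4096):
    """Magnitude of the k-th Fourier harmonic of an ideal comparator output.

    The output is modelled as +1 when a_in*sin(2*pi*t) exceeds v_ref and -1
    otherwise, evaluated over one period of a unit-frequency input.
    """
    acc = 0j
    for i in range(n):
        t = (i + 0.5) / n   # mid-point samples avoid the exact zero crossings
        s_o = 1.0 if a_in * math.sin(2 * math.pi * t) > v_ref else -1.0
        acc += s_o * cmath.exp(-2j * math.pi * k * t)
    return 2 * abs(acc) / n

for v_ref in (0.0, 0.33):
    h = [comparator_harmonic(k, v_ref) for k in (1, 2, 3, 4)]
    print(f"V_ref = {v_ref}: harmonics 1..4 =", [round(x, 3) for x in h])
```

Under these assumptions the even harmonics vanish for a zero reference, a second-harmonic component appears once the reference is moved away from zero, and no harmonic is produced when the input amplitude stays below the reference.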

To understand the behavior of the comparator-based harmonic generator, a conversion gain \( CG[k] \) between the input signal \( A_{in} \sin(2\pi ft) \) and its \( k^{th} \) harmonic is defined as follows, with the reference voltage fixed at 1:

\[
CG[k] = \frac{\int_{-1/2}^{1/2} |A_{in} \sin(2\pi ft) \times \sin(2\pi kft)| dt}{A_{in}} \tag{5.4}
\]

where \( A_{in} > 1 \) is the amplitude of the input signal, which must be greater than the reference voltage. The conversion gains of different harmonics are plotted for different input signal amplitudes \( A_{in} \) in Fig. 5.2(b), while the reference voltage is kept at 1. The horizontal axis of the figure represents \( A_{in}/V_{ref} \), where \( V_{ref} = 1 \) and \( A_{in} > 1 \). When \( A_{in} < 1 \), the reference voltage is always higher than the input signal and the comparator gives a constant low-voltage output, thus no harmonic tone is generated.

As shown in Fig. 5.2(b), for \( A_{in}/V_{ref} = 1.1 \), the conversion gain of the 3rd harmonic \( CG[3] \) and the 4th harmonic \( CG[4] \) reach their maximum. When \( A_{in}/V_{ref} = 1.25 \), \( CG[2] \) reaches maximum, \( CG[3] \) and \( CG[4] \) start to decrease.

### 5.2.2 Harmonic Generation from a Modulated Signal

An example is given to demonstrate the harmonic generation from a modulated signal input. The spectrum of a modulated signal is shown in the left part of Fig. 5.3, which is centered at \( f \). The spectrum shape of the modulated signal is defined by the pulse response of the transceiver. The spectrum density often reaches maximum at the carrier frequency \( f \), and reduces at frequencies away from center \( f + \Delta \).

A wideband signal can be considered as the sum of a large number of single-tone sinusoidal signals with different amplitudes. Each of these signals represents the spectrum density of the modulated signal at a certain frequency.
To illustrate the harmonic generation process, the spectrum density value is marked at three frequencies: \( f \), \( f - \Delta \) and \( f + 2\Delta \). The reference voltage is selected such that \( A_{in}(f)/V_{ref} = 1.25 \), where \( A_{in}(f) \) is the amplitude of the sinusoidal signal at \( f \), so \( CG[2](f) \) reaches its maximum for the carrier tone. Thus, the spectrum density at \( 2f \) equals \( S(2f) = CG[2](f)A_{in}(f) \). For the tone located at \( f - \Delta \), \( CG[2] \) is much lower than at \( f \), because \( A_{in}(f - \Delta)/V_{ref} < A_{in}(f)/V_{ref} \). For the tone located at \( f + 2\Delta \), the amplitude is smaller than \( V_{ref} \), thus no harmonic is generated at all. It can be seen that the comparator generates harmonics selectively, so that the carrier tone is enhanced at its 2nd harmonic \( 2f \), but tones at other frequencies are suppressed. By doing so, the modulated information is removed and what is left is a stronger tone, which is the second harmonic of the carrier.

![Harmonic generation from a modulated signal](image)

**Figure 5.3:** Harmonic generation from a modulated signal

### 5.2.3 Static Frequency Divider

Referring to Fig. 5.1, the comparator provides differential outputs “clkp” and “clkn”, which are used to drive two differential latches respectively. The latch block has a signal input port, a signal output port and a clock port. When the clock port is driven high, the signal at the input port is replicated at the output port; when
the clock port is driven low, the output would remain unchanged regardless of the signal at the input port.

The differential outputs of the second latch are linked back to the first one in an inverted manner, so that the two latches are linked as a ring with negative feedback. The output would flip once during each cycle of “clkp” and “clkn”, thus the output operates at half the frequency of “clkp” and “clkn”.

The comparator outputs “clkp” and “clkn” are 180 degrees out of phase; the latch outputs have a 90 degree phase difference due to the frequency division.
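The ring of two latches can be illustrated with a small behavioural model. This is only a sketch under simplifying assumptions: the latches are ideal and level-sensitive, both start from a low output, and time advances in clock half-periods. The function name `divide_by_two` is hypothetical.

```python
def divide_by_two(half_periods):
    """Simulate two level-sensitive latches connected as a ring with inverted feedback.

    Latch 1 is transparent while clkp is high, latch 2 while clkn is high.
    Returns the latch outputs sampled at the end of every clock half-period.
    """
    q1 = q2 = 0                      # assumed initial latch states
    q1_trace, q2_trace = [], []
    for n in range(half_periods):
        clkp = 1 if n % 2 == 0 else 0
        if clkp:
            q1 = 1 - q2              # latch 1 follows the inverted output of latch 2
        else:
            q2 = q1                  # latch 2 follows the output of latch 1
        q1_trace.append(q1)
        q2_trace.append(q2)
    return q1_trace, q2_trace

q1, q2 = divide_by_two(8)
print("latch 1:", q1)
print("latch 2:", q2)
```

Each latch output repeats every four half-periods, i.e. every two clock cycles, and the second output lags the first by one half-period of the clock, which corresponds to a quarter of the output period (90 degrees).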

### 5.2.4 Carrier Recovery Principle

As mentioned in Section 5.2.2, the comparator block generates a “modulation free” tone at the 2nd harmonic $2f$, and its outputs “clkp” and “clkn” drive two latches, forming a frequency divider. Thus, the output of the latch operates at the carrier frequency $f$.

### 5.3 8-PSK Demodulator Implementation

The block diagram of the proposed 8-PSK receiver is presented in Fig. 5.4. The received IF signal is input to the comparator, and the differential output of the comparator is used to drive two latch blocks. The latch blocks are connected to form a divide-by-2 frequency divider. The carrier is recovered from this structure when an appropriate reference voltage is applied at the comparator.

When the latch blocks work as frequency dividers, there is a 90 degree phase difference between the latch outputs. BPFs are used at both latch outputs and the recovered carriers are fed to the mixers from which I and Q channel outputs are generated after LPFs.

A photo of the carrier recovery MMIC is shown in the lower part of Fig. 5.4. The size is $500 \mu m \times 650 \mu m$. It requires a single -3.3 V supply and an adjustable reference voltage. A test pad is reserved for measuring the output of the comparator.

The schematic of the latch block is presented in Fig. 5.5(a). The emitter-coupled transistor pair $Q_{11}$ and $Q_{12}$ are controlled by the differential clock inputs “clkp” and “clkn”. When “clkp” is high, current passes through $Q_{11}$, and emitter-coupled transistor pair $Q_2$ and $Q_3$ transfer latch input signal to the differential latch output ports; when “clkp” is low, current passes through $Q_{12}$, and emitter-coupled transistor pair $Q_6$ and $Q_7$ keep the latch output unchanged until “clkp” becomes high.

The schematic of the comparator block is shown in Fig. 5.5(b). Two stages of differential amplifiers are used. Both stages are designed to have high differential gain, so the two-stage amplifier can be used as a comparator. When the input signal is higher than the reference voltage, the “clkp” port gives a low voltage output.

5.4  8-PSK Demodulator Measurement Result

5.4.1 Harmonic Generator Test

The comparator-based harmonic generator is tested. When a 15-Gbps 8-PSK IF signal is provided to the comparator, the spectrum of the “clkp” port output is measured. By adjusting $V_{ref}$, the carrier’s second harmonic is enhanced, and the resulting spectrum is shown in Fig. 5.6(a). A 15 Gbps 8-PSK signal occupies 10 GHz of bandwidth centred at $f_{IF} = 6$ GHz, and this IF signal is also visible at the output of the comparator. In line with the operation principle discussed in the previous section, only the second harmonic of the carrier tone is enhanced when $V_{ref}$ is adjusted. As shown in the spectrum, there is a narrow tone at 12 GHz, which indicates that the recovered carrier is “modulation free”.
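As a quick check of these numbers, assuming that the quoted bandwidth refers to the null-to-null main lobe of the modulated spectrum (an assumption, since the pulse shape is not restated here):

$$R_s = \frac{15\ \text{Gbps}}{\log_2 8} = 5\ \text{GBaud}, \qquad B \approx 2R_s = 10\ \text{GHz}, \qquad 2f_{IF} = 2 \times 6\ \text{GHz} = 12\ \text{GHz}$$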

5.4.2 Phase Noise Measurement

The static frequency divider derives a tone at the carrier frequency from the generated second harmonic, which can thus be used as the recovered carrier for the IF signal demodulation. BPFs are used to filter out the unwanted harmonics. The phase noise of the recovered carrier is measured and plotted in Fig. 5.6(b). The phase noise at 100 kHz offset is -93 dBc/Hz.

5.4.3 Demodulated 8-PSK Signals

The 8-PSK demodulator shown in Fig. 5.4 is tested with a 15 Gbps 8-PSK IF signal as input. Two passive mixers are used for the IF signal demodulation. The received 8-PSK constellation is shown on the left of Fig. 5.6(c), and the eye diagram of the I-channel is shown on the right. An oscilloscope (Tektronix DSA72004) samples the received constellation, and the BER is calculated from the oscilloscope’s stored samples. The oscilloscope can store a maximum of $10^6$ samples, which limits the resolution of the BER measurement. No error was detected during the test, which indicates that the measured BER is lower than $10^{-6}$.
Figure 5.4: 8-PSK receiver implementation
Figure 5.5: The schematic of the latch and comparator

Figure 5.6: 8-PSK demodulator measurement

(a) Generated second harmonic of the carrier

(b) Phase noise of the recovered carrier

(c) Constellation diagram and eye diagram of the demodulated signal
Chapter 6

This chapter is based on paper [C].

Even though high bandwidth is available at the E-band and V-band, modulations with high spectrum efficiency are always desired in order to achieve higher capacity. For example, with only 1 dB extra $E_b/N_0$, a 16-QAM modem can support four times the capacity of OOK.

In a 16-QAM modulation, the information is modulated onto both the amplitude and the phase of the carrier. This makes the 16-QAM modem more difficult to design than modems with lower spectrum efficiency such as OOK and QPSK. The 16-QAM baseband signal has four levels, thus an ADC is normally used by the demodulator. The maximum symbol rate of a digital receiver is therefore limited by the maximum sampling rate of commercially available ADCs.

In this chapter, recently reported high capacity millimeter-wave wireless transmission systems are reviewed. A novel hardware efficient implementation of 16-QAM receiver baseband is presented. The receiver comprises a novel analog symbol time recovery block and a digital carrier recovery block. The receiver requires only one sample per symbol for demodulation, so the hardware requirement for implementing this receiver is lower than previously reported works. A proof-of-concept test shows that the receiver is capable of demodulating data with a rate of up to 5 Gbps.
Table 6.1: Recently reported high capacity millimeter wave wireless transmission systems

<table>
<thead>
<tr>
<th>Ref</th>
<th>Modulation</th>
<th>Data rate</th>
<th>Channels</th>
<th>Samples/Symbol</th>
<th>Band</th>
</tr>
</thead>
<tbody>
<tr>
<td>[38]</td>
<td>8-PSK</td>
<td>6 Gbps</td>
<td>4</td>
<td>3.2</td>
<td>E-band</td>
</tr>
<tr>
<td>[39]</td>
<td>16-QAM</td>
<td>2 Gbps</td>
<td>1</td>
<td>10</td>
<td>140 GHz</td>
</tr>
<tr>
<td>[40]</td>
<td>16-QAM</td>
<td>10 Gbps</td>
<td>8</td>
<td>3</td>
<td>E-band</td>
</tr>
<tr>
<td>[41]</td>
<td>16-QAM</td>
<td>6.3 Gbps</td>
<td>1</td>
<td>1.33</td>
<td>V-band</td>
</tr>
<tr>
<td>[C]</td>
<td>16-QAM</td>
<td>5 Gbps</td>
<td>1</td>
<td>1</td>
<td>E-band</td>
</tr>
</tbody>
</table>

6.1 High Spectrum Efficiency Transmission System Overview

To further improve spectrum efficiency, higher order modulation formats are required. Recently reported high capacity wireless transmission demonstrations in the millimeter-wave bands are summarized in Table 6.1.

A digital baseband receiver is normally used for demodulating a higher-order modulated signal. A critical component in these digital receivers is the ADC. Commercially available ADCs have sampling rates of up to 5 GSa/s today. These ADCs normally operate at a rate several times higher than the receiver bandwidth (oversampling). This limits the maximum bandwidth each digital receiver can handle.

A common solution is to split a wideband channel into multiple narrow-band channels, and use a single digital receiver for each channel. For example, a 6 Gbps 8-PSK receiver based on four-channel frequency multiplexing is demonstrated in [38], where the ADCs take 3.2 samples per symbol. A 16-QAM system operating at 140 GHz is described in [39], which supports 2 Gbps in real time. A 10 Gbps transmission over E-band based on eight-channel frequency multiplexing is demonstrated in [40], where the oversampling factor is 3. The multiple-channel solution requires more hardware and consumes more energy. A single channel 60-GHz 6.3 Gbps 16-QAM receiver is reported in [41], which utilizes a 3.5-GSa/s ADC and reduces the oversampling rate to 1.33.

In [C], a novel hardware-efficient 16-QAM receiver is presented. Unlike existing solutions, data is recovered based on a ‘single sample per symbol’ (oversampling=1) scheme. Thus we are able to directly demodulate a single channel wideband modulated signal with commercially available components. A 5 Gbps 16-QAM E-band radio link is implemented based on a 1.25 GSa/s dual channel ADC.
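The relation between data rate, modulation order and ADC sampling rate used in this comparison can be written as a small helper. This is an illustrative sketch; the function name `required_adc_rate` and the oversampling factor of 3 in the second call are hypothetical and only serve to show the effect of oversampling on the same link.

```python
def required_adc_rate(bit_rate, bits_per_symbol, samples_per_symbol, channels=1):
    """Per-channel ADC sampling rate in samples/s for a given link configuration."""
    symbol_rate = bit_rate / bits_per_symbol / channels
    return symbol_rate * samples_per_symbol

# 5 Gbps 16-QAM (4 bits/symbol), single channel, one sample per symbol, as in [C]
print(required_adc_rate(5e9, 4, 1) / 1e9, "GSa/s")
# The same link with a hypothetical oversampling factor of 3
print(required_adc_rate(5e9, 4, 3) / 1e9, "GSa/s")
```

With one sample per symbol the 5 Gbps link needs 1.25 GSa/s, which matches the dual channel ADC rate quoted above.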
6.2 Sampling Rate and Symbol Time Recovery

The general structure of an I/Q sampling digital QAM receiver is shown in Fig. 6.1. An RF signal is down-converted to baseband I and Q signals using a mixer and an LO reference. To extract the phase and amplitude of the received RF signal, the I and Q signals are sampled by a dual-channel ADC. It is important that the ADC sampling rate $f_s$ shall be no less than $f_{sym}$, otherwise information will be lost in the sampling process. When $f_s = f_{sym}$, the samples must be taken at the optimum sampling position, which is in the middle of each symbol period as indicated by the dotted lines. A sample taken at an instant shifted away from the optimum sampling position may not represent the actual phase/amplitude of the symbol. The receiver can obtain samples at the optimum position only if the symbol timing information $f_{sym}$ is known to the receiver. The procedure of extracting the symbol timing information is called symbol time recovery (STR). STR can be performed by adjusting the ADC sampling clock $f_s$ in different manners:

- **Feed-forward STR**
  A feed-forward STR block takes either an RF signal or a baseband signal as input and generates a clock signal at the symbol rate $f_{sym}$. With a feed-forward STR, the symbol time information is known, so the ADC can operate at the rate $f_s = f_{sym}$, and no oversampling is needed.

- **Feedback STR**
  A feedback STR takes the ADC digitized data and processes it using a digital signal processor (DSP). The DSP adjusts the ADC sampling clock \( f_s \) until \( f_s = f_{\text{sym}} \). For the DSP to judge whether a sample is taken at the optimal position, it may require multiple samples for each symbol. Thus \( f_s \) must be higher than \( f_{\text{sym}} \) at an initial stage.

- **STR with constant \( f_s \)**
  The ADC sampling clock may be difficult to control in practice; an alternative is to run the ADC at a constant sampling frequency \( f_s \). In this case, oversampling, i.e. \( f_s > f_{\text{sym}} \), is normally required, so several samples are taken in each symbol period and the sample that is closest to the optimal sampling position is eventually used for demodulation. To improve the resolution of this approach, more samples can be generated by interpolation between the samples taken by the ADC.

For millimeter wave applications, symbol rates up to several Gsym/s may be used to utilize the available bandwidth. It is difficult to find high speed ADCs that can match the high symbol rates required by this kind of application. The first two approaches are preferred because there is no need for oversampling, so \( f_s = f_{\text{sym}} \).

6.3 The proposed “Single Sample per Symbol” Baseband Receiver

The structure of the proposed single sample per symbol 16-QAM baseband receiver is shown in Fig. 6.2. The receiver takes an IF signal as input and a mixer is
used to convert the IF down to baseband signals \( I'(t) \) and \( Q'(t) \). An analog symbol time recovery block is used to generate the recovered symbol clock \( f_{sym} \) and this drives a dual channel ADC, as mentioned in the previous section. The ADC takes a single sample at each symbol and the sampled data is processed in an FPGA, where the carrier recovery and demodulation are performed. In this section, both the analog STR and the FPGA-based demodulation are described in detail.

### 6.3.1 The Proposed Analog STR

![Proposed STR circuit](image)

The structure of the analog STR block used in the baseband receiver is illustrated in Fig. 6.3. This topology takes the received IF signal \( r_{IF}(t) \) as a reference input and generates a symbol clock as output. It contains five different parts: a mixer, an LPF, a limiting amplifier, a CDR and a phase shifter.

The STR block takes an IF signal input \( r_{IF}(t) \), which can be expressed as:

\[
r_{IF}(t) = \sum_{k=-\infty}^{+\infty} A_k g(t - kT_s) \times \cos(2\pi f_{IF}t + \varphi_k)
\]

where \( A_k e^{j\varphi_k} = I_k + jQ_k \). For 16-QAM constellation, \( A_k \in \{ \sqrt{2}, 3\sqrt{2}, \sqrt{10} \} \) and \( \varphi_k \in [\tan^{-1}(\pm3), \tan^{-1}(\pm1/3), \tan^{-1}(\pm1)] \). A mixer takes \( r_{IF}(t) \) and its delayed replica \( r_{IF}(t - T_s) \) as inputs and gives output:

\[
\begin{aligned}
S_1(t) &= r_{IF}(t) \times r_{IF}(t - T_s) \\
&= \sum_{k=-\infty}^{+\infty} A_k A_{k-1} g(t - kT_s) g[t - (k - 1)T_s] \cos(2\pi f_{IF}t + \varphi_k) \cos(2\pi f_{IF}(t - T_s) + \varphi_{k-1}) \\
&= \sum_{k=-\infty}^{+\infty} A_k A_{k-1} g^2(t - kT_s) \cos(2\pi f_{IF}t + \varphi_k) \cos(2\pi f_{IF}t + \varphi_{k-1}) \\
&= \sum_{k=-\infty}^{+\infty} 0.5A_k A_{k-1} g^2(t - kT_s) [\cos(\varphi_k - \varphi_{k-1}) + \cos(4\pi f_{IF}t + \varphi_k + \varphi_{k-1})]
\end{aligned}
\]
The mixer output passes through an LPF, which removes the high-frequency part of the signal, and its output can be expressed as:

$$S_2(t) = \sum_{k=-\infty}^{+\infty} 0.5A_k A_{k-1} g^2(t - kT_s) \cos (\varphi_k - \varphi_{k-1}) \tag{6.2}$$

As the signal $S_2(t)$ is input to the limiting amplifier, the output becomes a binary waveform $S_3(t)$ as:

$$S_3(t) = \begin{cases}
1, & \text{if } S_2(t) > 0; \\
-1, & \text{if } S_2(t) \leq 0;
\end{cases} \tag{6.3}$$

$S_3(t)$ crosses zero when $S_2(t)$ crosses zero, which occurs at $t = kT_s$. From $S_3(t)$ the symbol timing information $T_s$ can be extracted by using a commercially available clock data recovery (CDR) module, the Si5023. This module generates a clock from an approximate frequency reference, and then phase-aligns the clock signal to the transitions of the input data stream using a phase-locked loop. When the CDR locks to the input data, the symbol time clock is recovered without frequency offset. A phase shifter is then used to cancel the remaining constant phase offset of the recovered symbol clock.
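The sign pattern that the limiting amplifier hands to the CDR can be illustrated per symbol. The sketch below ignores the pulse shape $g(t)$ and only evaluates the sign of the Eq. 6.2 term for each pair of consecutive symbols; the symbol sequence and the function name `limiter_output` are hypothetical.

```python
def limiter_output(symbols):
    """Sign of S_2 in each symbol period for a sequence of complex symbols I_k + jQ_k."""
    signs = []
    for prev, cur in zip(symbols, symbols[1:]):
        # 0.5*A_k*A_{k-1}*cos(phi_k - phi_{k-1}) equals 0.5*Re(s_k * conj(s_{k-1}))
        s2 = 0.5 * (cur * prev.conjugate()).real
        signs.append(1 if s2 > 0 else -1)   # limiting amplifier, Eq. 6.3
    return signs

# A short, arbitrary 16-QAM symbol sequence (illustrative values only)
symbols = [1 + 1j, -1 - 1j, 3 + 1j, 3 + 3j, -3 + 1j]
print(limiter_output(symbols))
```

Whenever consecutive entries differ, the binary waveform has a transition at a symbol boundary, and these transitions are what the CDR aligns its clock to.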

The measured waveform $S_3(t)$ and the recovered symbol clock signal are shown in Fig. 6.4. This measurement shows that the proposed STR structure is able to recover a symbol clock. The proposed structure can recover symbol time clock frequencies between 1.245-1.255 GHz, which is related to the phase lock loop bandwidth of the CDR.

6.3.2 The FPGA-based Carrier Recovery (CR)

The carrier recovery module is one of the most important functional blocks in the receiver baseband. The structure of this FPGA-based CR subsystem is illustrated in Fig. 6.2. The received IF signal is down-converted by a mixer and the baseband signals after the LPF can be written as:

\[
\begin{aligned}
I'(t) &= \sum_{k=-\infty}^{+\infty} A_k \times g_{LPF}(t - kT_s) \times \cos(\Delta f t + \varphi_k) \\
Q'(t) &= \sum_{k=-\infty}^{+\infty} A_k \times g_{LPF}(t - kT_s) \times \sin(\Delta f t + \varphi_k)
\end{aligned}
\tag{6.4}
\]

where \(g_{LPF}(t)\) is the impulse response of the LPF, and \(\Delta f\) is the frequency difference between the transmitter LO and the receiver LO. In order to correctly extract the phase information \(\varphi_k\), the frequency difference \(\Delta f\) must be estimated through a so-called carrier recovery.
The CR signal processing structure is shown in Fig. 6.5. The baseband signals $I'(t)$ and $Q'(t)$ are sampled by a dual channel ADC at a rate of $f_s$, which generates two 12-bit sampled data streams $I[k]$ and $Q[k]$.

Assuming these samples are taken at optimized positions, this can be expressed as:

$$
\begin{aligned}
I[k] &= I'(t)|_{t=kT_s} = A_k \times \cos (\Delta f k T_s + \varphi_k) \\
Q[k] &= Q'(t)|_{t=kT_s} = A_k \times \sin (\Delta f k T_s + \varphi_k)
\end{aligned}
\tag{6.5}
$$

Limited by the maximum rate of the FPGA input ports, these sampled data are combined into four sample groups (i.e. $I[4k], I[4k+1], I[4k+2], I[4k+3]$) and transferred to the FPGA at a rate of $f_s/4$. In the FPGA, these samples are processed in four parallel tracks. In each track, I and Q samples are combined into a complex sample and sent to a complex-number multiplier. The multipliers work as de-rotators; given an input sample data $I[k] + jQ[k]$, the multiplier generates de-rotated data $I_r[k] + jQ_r[k]$ as:

$$
I_r[k] + jQ_r[k] = (I[k] + jQ[k])e^{j\theta_m} \tag{6.6}
$$

where $e^{j\theta_m}$ is the output of a LUT (look up table). The de-rotation as shown in Eq. 6.6 is performed at each FPGA clock cycle.

The goal of carrier recovery is to estimate the phase term $\Delta f k T_s$ in Eq. 6.5, and to generate a de-rotation signal $e^{j\theta_m}$ that satisfies:

$$
e^{j\theta_m} \approx e^{-j\Delta f k T_s} \quad \text{when} \quad 4m \leq k \leq 4m + 3 \tag{6.7}
$$

When the condition Eq. 6.7 is met, the phase term $\Delta f k T_s$ in Eq. 6.5 is removed, so that the de-rotated constellation $I_r[k] + jQ_r[k]$ in Eq. 6.6 can be written as:

$$
\begin{aligned}
I_r[k] + jQ_r[k] &= (I[k] + jQ[k])e^{j\theta_m} \\
&= A_k \times [\cos (\varphi_k) + j \sin (\varphi_k)] \\
&= A_k e^{j\varphi_k}
\end{aligned}
\tag{6.8}
$$
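The effect of the de-rotation can be reproduced with a few lines of complex arithmetic. This is a sketch under stated assumptions: the frequency offset is given in Hz, so the accumulated phase is written as $2\pi \Delta f k T_s$, and the de-rotation phase is taken as the exact negative of that phase rather than a loop estimate. The function names and the numerical values are hypothetical.

```python
import cmath
import math

def received_sample(symbol, k, delta_f, t_s):
    """Baseband sample of symbol k rotated by the LO frequency offset (cf. Eq. 6.5)."""
    return symbol * cmath.exp(1j * 2 * math.pi * delta_f * k * t_s)

def derotate(sample, theta):
    """Complex multiplication by the de-rotation signal exp(j*theta) (cf. Eq. 6.6)."""
    return sample * cmath.exp(1j * theta)

symbol = 3 + 1j                            # one 16-QAM constellation point
delta_f, t_s, k = 0.1e6, 800e-12, 1000     # offset in Hz, symbol period in s, symbol index
rx = received_sample(symbol, k, delta_f, t_s)
theta = -2 * math.pi * delta_f * k * t_s   # ideal de-rotation phase
print("received  :", rx)
print("de-rotated:", derotate(rx, theta))
```

The de-rotated value coincides with the transmitted constellation point up to rounding, whereas the received sample is visibly rotated away from it.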

A “phase detector and slicer block” is used to extract data from $I_r[k] + jQ_r[k]$, as shown at the top right of Fig. 6.5. The slicer takes the ideal constellation point (the solid point in the figure) which is closest to $I_r[k] + jQ_r[k]$, and determines the demodulated data output based on this solid point. The phase detector outputs the phase difference ($\varepsilon$) between $I_r[k] + jQ_r[k]$ and the solid constellation point.
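A minimal sketch of such a slicer and phase detector is given below. It assumes that the samples are scaled so that the ideal 16-QAM points lie on the levels ±1 and ±3; the scaling, the function name `slice_and_detect` and the test sample are assumptions made for illustration only.

```python
import cmath

LEVELS = (-3, -1, 1, 3)   # ideal 16-QAM levels on the I and Q axes (assumed scaling)

def slice_and_detect(sample):
    """Return the nearest ideal 16-QAM point and the phase error towards it."""
    ideal = complex(min(LEVELS, key=lambda v: abs(sample.real - v)),
                    min(LEVELS, key=lambda v: abs(sample.imag - v)))
    epsilon = cmath.phase(sample * ideal.conjugate())  # angle of sample minus angle of ideal point
    return ideal, epsilon

ideal, epsilon = slice_and_detect(2.8 + 1.1j)
print(ideal, epsilon)
```

On a square grid, choosing the nearest level on each axis independently gives the nearest constellation point, so no search over all 16 points is needed.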

The sum of the phase differences $\varepsilon'[m]$ from four tracks is input to a second-order CR loop, as illustrated to the right of Fig. 6.5. The loop reduces the phase differences $\varepsilon'[m]$ by producing a loop output $N_L[m]$ each FPGA clock cycle (FPGA clock rate is $f_s/4$). An LUT is a memory block that stores $2^8$ different pre-calculated values of the de-rotation signal $e^{j\theta_m}$. By using the loop output $N_L[m]$ as an address, a corresponding de-rotation signal $e^{j\theta_m}$ is selected and fed back to the multipliers.
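The look-up step can be sketched as follows. The table size of $2^8$ entries is taken from the text, while the address mapping (entry $n$ holds a phase of $2\pi n/2^8$, and the loop output is interpreted as a phase in radians) is an assumption made for this example and is not stated in the thesis.

```python
import cmath
import math

LUT_SIZE = 2 ** 8   # number of stored de-rotation values, as stated in the text

# Assumed address mapping: entry n holds exp(j*2*pi*n/LUT_SIZE)
LUT = [cmath.exp(2j * math.pi * n / LUT_SIZE) for n in range(LUT_SIZE)]

def lookup(loop_output):
    """Map a loop output (interpreted here as a phase in radians) to a LUT entry."""
    address = round(loop_output / (2 * math.pi) * LUT_SIZE) % LUT_SIZE
    return LUT[address]

print(lookup(math.pi / 2))
```

With 256 entries the phase resolution of the de-rotation signal is 360/256 ≈ 1.4 degrees under this mapping.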

At an initial stage, the value $\varepsilon'[m]$ is high and the loop iterates an output $N_L[m]$ each FPGA clock cycle, so that $\varepsilon'[m]$ is reduced. When $\varepsilon'[m]$ is smaller than a certain threshold value, the condition Eq. 6.7 is met. This indicates the loop has converged and the carrier is recovered.

The loop contains three accumulators (Acc1, Acc2 and Acc3) and their outputs $N_1[m], N_2[m]$ and $N_3[m]$ are generated by iterating the following expressions:

$$
\begin{aligned}
N_1[m] &= N_1[m-1] + K_p \varepsilon'[m] \\
N_2[m] &= N_2[m-1] + K_f \varepsilon'[m] \\
N_3[m] &= N_3[m-1] + N_2[m]
\end{aligned}
\tag{6.9}
$$

Combining $N_2[m]$ and $N_3[m]$, the loop output can be written as:

$$
N_L[m] = N_1[m] + N_3[m] = N_1[m-1] + K_p \varepsilon'[m] + N_3[m-1] + N_2[m-1] + K_f \varepsilon'[m] \tag{6.10}
$$

From this expression, it can be seen that the loop can be adjusted by its characteristic parameters $K_p$ and $K_f$. 
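The accumulator recursion of Eq. 6.9 and the loop output of Eq. 6.10 translate directly into code. The sketch below runs the accumulators open loop on a given error sequence; it does not model the feedback through the LUT and multipliers, and the zero initial state, the function name `cr_loop` and the parameter values in the example are assumptions chosen for easy hand verification.

```python
def cr_loop(errors, k_p, k_f):
    """Iterate the three accumulators of Eq. 6.9 and return the loop output N_L[m]."""
    n1 = n2 = n3 = 0.0               # accumulators Acc1, Acc2, Acc3 start empty (assumption)
    outputs = []
    for eps in errors:               # eps is the summed phase error of one FPGA clock cycle
        n1 += k_p * eps
        n2 += k_f * eps
        n3 += n2
        outputs.append(n1 + n3)      # N_L[m] = N_1[m] + N_3[m], Eq. 6.10
    return outputs

print(cr_loop([1.0, 1.0, 1.0], k_p=0.5, k_f=0.25))
```

For a constant error input the $K_p$ path grows linearly while the $K_f$ path, which passes through two accumulators, grows quadratically, which is why a larger $K_f$ makes the loop react more strongly to a persistent error.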

![Figure 6.6: Simulated carrier recovery loop with different parameters](image)
The CR loop is simulated using three sets of loop parameters \( K_p \) and \( K_f \), and the output of one phase detector \( \varepsilon \), the loop input \( \varepsilon'[m] \), the de-rotation signal \( e^{j\theta_m} \) and the frequency offset \( \Delta f/kT_s \) are plotted in Fig. 6.6.

The three sets of loop parameters used are:

- \( K_p = 2^{-8}, K_f = 10^{-4} \)
  As shown in Fig. 6.6(a), with this setting the phase error \( \varepsilon'[m] \) is reduced and the loop converges after 4 \( \mu s \).

- \( K_p = 2^{-8}, K_f = 10^{-3} \)
  As shown in Fig. 6.6(b), compared with the case above, \( K_f \) is higher and this makes the accumulators Acc2 and Acc3 more sensitive to the loop input \( \varepsilon \). The result of this setting is that the loop has a tendency to over-compensate the phase difference, which changes the polarity of \( \varepsilon'[m] \) rather than reduces its absolute value. The result is that it takes longer for the loop to converge.

- \( K_p = 2^{-4}, K_f = 10^{-4} \)
  As shown in Fig. 6.6(c), \( K_p \) is set at a high value. Unlike the case mentioned above, only accumulator Acc1 is affected by \( K_p \). This makes the loop very sensitive and it becomes unstable. The result is that the loop cannot converge.

### 6.4 16-QAM Baseband Receiver Measurement

#### 6.4.1 Analog STR Block Test

The proposed analog STR block is tested with a setup as shown in Fig. 6.7(a). An arbitrary waveform generator (AWG) is used to generate baseband signals from a repeated PRBS data sequence. The baseband signals are converted to IF with a mixer. The proposed analog STR takes the received IF signal as input and generates a recovered symbol time clock \( f_{sym} \). A phase shifter is used to adjust the phase of this recovered symbol clock, so the phase of ADC sampling clock is tuned. Another mixer is used to convert the IF signal back to the baseband signals, and its outputs are filtered by LPF before being sampled by the ADC. In this setup, the up-convert and down-convert mixers share the same LO source, so there is no need to perform a carrier recovery. An FPGA is used for analyzing the ADC sampled data and calculating the bit-error rate (BER) of the received data.
Figure 6.7: Analog STR block test
Waveforms from different parts of the receiver are plotted in Fig. 6.7(b). The baseband signal is obtained by down-converting the received IF signal with a mixer and filtering it with an LPF. The ADC samples the received baseband signal using the recovered sampling clock. The optimum sampling position is at the center of each symbol, marked by a dot on the received baseband signal in Fig. 6.7(b). The ADC samples at the rising edge of the recovered sampling clock, and the offset between the rising edge and the optimum sampling position is denoted as $\theta$. This sampling position offset can be tuned by adjusting the phase shifter in Fig. 6.7(a). When $\theta = 0$, the ADC samples at the optimum position; when $\theta = T_{\text{sym}}/2$, the ADC samples between the symbols. To study the system tolerance to the sampling position offset, the BER is measured at different values of $\theta$, and the result is shown in Fig. 6.7(c). When $-200 \text{ ps} \leq \theta \leq 200 \text{ ps}$, no error is detected, which means that the BER is below the resolution of the FPGA based bit-error rate tester (BERT) ($10^{-10}$). When $|\theta| > 200 \text{ ps}$, the BER increases to more than $10^{-8}$. Compared with the symbol period $T_{\text{sym}} = 800 \text{ ps}$, the sampling position can therefore vary within half of the symbol period without triggering errors.
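
The timing figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the test signal is 16-QAM with 4 bits per symbol and that a BERT resolution of $10^{-10}$ corresponds to one error in $10^{10}$ observed bits; neither assumption is stated explicitly for this measurement, and the variable names are illustrative.

```python
T_SYM_PS = 800.0               # symbol period from the measurement, in ps
WINDOW_PS = (-200.0, 200.0)    # error-free range of the sampling offset
BITS_PER_SYMBOL = 4            # assumption: 16-QAM test signal
BERT_RESOLUTION = 1e-10        # lowest BER the FPGA based BERT resolves

# 1 / (800 ps) expressed in GBd
symbol_rate_gbaud = 1e3 / T_SYM_PS
bit_rate_gbps = symbol_rate_gbaud * BITS_PER_SYMBOL

# Share of the symbol period in which no errors were detected
window_fraction = (WINDOW_PS[1] - WINDOW_PS[0]) / T_SYM_PS

# Assumption: one error in 1/resolution bits is the smallest detectable BER
min_bits = 1.0 / BERT_RESOLUTION
min_seconds = min_bits / (bit_rate_gbps * 1e9)

print(f"symbol rate: {symbol_rate_gbaud:.2f} GBd")
print(f"bit rate: {bit_rate_gbps:.1f} Gbps")
print(f"error-free window: {window_fraction:.0%} of the symbol period")
print(f"observation time for 1/resolution bits: {min_seconds:.1f} s")
```

Under these assumptions the symbol rate is 1.25 GBd, which corresponds to 5 Gbps, the error-free window covers half of the symbol period, and about 2 s of error-free reception is needed to reach the BERT resolution.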

#### 6.4.2 FPGA based CR Block Test

A similar setup, illustrated in Fig. 6.8(a), is used to test the FPGA based CR block. Unlike the setup in the previous section, the up-convert and down-convert mixers use different LOs, with a controlled frequency difference $\Delta f$. The FPGA based CR block is used for demodulation and for calculating the BER of the received data. To test the CR block, the modulator local oscillator is set at 7000 MHz, and the demodulator local oscillator is swept between 6999 MHz and 7001 MHz in 0.1 MHz increments. The BER is monitored at each frequency offset $\Delta f$. The CR loop uses the parameters $K_p = 2^{-8}, K_f = 10^{-3}$. The measurement result is plotted in Fig. 6.8(b). When $-0.3 \text{ MHz} \leq \Delta f \leq 0.3 \text{ MHz}$, error-free transmission is obtained (BER less than $10^{-10}$). The CR loop converges provided that the frequency offset is less than 1 MHz. Fig. 6.8(c) shows the constellation diagrams of the samples received by the FPGA and of the carrier-recovered signal, both taken at $\Delta f = 0.1 \text{ MHz}$.

(a) Setup for the carrier recovery block test

(b) Measured BERs where different offsets are applied between transmitter and receiver

(c) Measured constellation diagrams before and after carrier recovery

Figure 6.8: FPGA based CR block test
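
The frequency sweep of the CR block test can be written out as a short sketch, which also shows how much the constellation rotates from one symbol to the next for a given offset. It assumes the same symbol period of 800 ps as in the STR test, which is not restated for this measurement; the names are illustrative.

```python
import math

MOD_LO_MHZ = 7000.0   # modulator LO frequency
STEP_MHZ = 0.1        # sweep increment of the demodulator LO
T_SYM_S = 800e-12     # assumption: same symbol period as in the STR test

# Build the sweep from an integer index to avoid accumulated rounding error
offsets_mhz = [round(k * STEP_MHZ, 1) for k in range(-10, 11)]
demod_lo_mhz = [MOD_LO_MHZ + df for df in offsets_mhz]


def rotation_per_symbol(delta_f_mhz):
    """Phase rotation between consecutive symbols, in radians."""
    return 2 * math.pi * delta_f_mhz * 1e6 * T_SYM_S


# Offsets for which error-free transmission was measured
error_free = [df for df in offsets_mhz if abs(df) <= 0.3]

print(len(offsets_mhz), demod_lo_mhz[0], demod_lo_mhz[-1])
print(error_free)
print(rotation_per_symbol(0.1))
```

Under this assumption, an offset of 0.1 MHz rotates the constellation by roughly $5 \times 10^{-4}$ rad per symbol, so the rotation seen in the received samples builds up over thousands of symbols rather than from one symbol to the next.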
Chapter 7

Conclusion and Future Work

### 7.1 Conclusion

In this thesis, high data rate baseband modem solutions for wireless communication using OOK, D-QPSK/QPSK, 8-PSK and 16-QAM modulation are presented. Six proof-of-concept modem implementations are described in detail. These modem solutions are optimized for high capacity based on hardware components currently available on the market. They could in principle support data rates of up to 100 Gbps, given enough bandwidth and more advanced semiconductor devices. The designs presented in this thesis constitute an efficient portfolio of baseband modem solutions for different wireless communication applications.

From a system perspective, a high data rate is not the only requirement to fulfill; normally there are also limitations on available bandwidth and/or available power. These limitations can be translated into modem requirements in terms of spectrum efficiency and power efficiency. However, there is a trade-off between these two requirements. High spectrum efficiency requires the system to process a more complicated signal, so more hardware resources are needed, which consume more power and reduce the energy efficiency.

To compare the modem solutions presented in the thesis, their energy efficiency is listed in Fig. 7.1. The OOK modem solution has a high energy efficiency because it is implemented in an MMIC process, and the structure is made with fewer than 20 transistors. On the other hand, the spectrum efficiency of this OOK solution is relatively low. Several D-QPSK modem solutions that use FPGAs to build the modulator have been discussed in this thesis. An FPGA consumes more than 3 W; thus the energy efficiency of the proposed D-QPSK solution is only 2.43 Gbps/W. The coherent QPSK modem solution requires no differential encoding and can reach a data rate of 12 Gbps, consuming only 0.4 W. This makes the QPSK solution more attractive than the D-QPSK modem solution. The 8-PSK modem solution presented in this thesis uses more than 1 W. The reason for this is the extra blocks that are needed on the demodulator side to extract the binary data from the multi-level baseband signal. Its energy efficiency is 16.3 Gbps/W. The 16-QAM modem presented in the thesis uses an analog STR that reduces the hardware resource requirement. However, the FPGA and ADC used in the demodulator still require a 19 W power supply. This means this method’s energy efficiency is only 0.26 Gbps/W.

In principle, modulation formats with high spectrum efficiency make modulation and demodulation more complicated. More hardware resources and/or more energy are needed for these transmission systems, and this leads to limited energy efficiency as indicated in Fig. 7.1. On the other hand, the D-QPSK and QPSK work demonstrated in this thesis shows that it is possible to improve energy efficiency with a novel structure. For the 8-PSK and 16-QAM modulations, the energy efficiency of the current solutions has great potential for improvement.

<table>
<thead>
<tr>
<th>Modulation Scheme</th>
<th>Max Datarate (Gbps)</th>
<th>Power Consumption (W)</th>
<th>Energy Efficiency (Gbps/W)</th>
</tr>
</thead>
<tbody>
<tr>
<td>OOK</td>
<td>14</td>
<td>Mod: 0.25 Demod: 3e-4</td>
<td>56</td>
</tr>
<tr>
<td>D-QPSK</td>
<td>10</td>
<td>Mod: 3.93 Demod: 0.18</td>
<td>2.43</td>
</tr>
<tr>
<td>QPSK</td>
<td>12</td>
<td>Mod: 0.18 Demod: 0.22</td>
<td>30</td>
</tr>
<tr>
<td>8-PSK</td>
<td>15</td>
<td>Mod: 0.18 CR: 0.43 + Demod: 0.5</td>
<td>16.3</td>
</tr>
<tr>
<td>16-QAM</td>
<td>5</td>
<td>DAC: 5 Mod: 0.18 Demod: 19</td>
<td>0.26</td>
</tr>
</tbody>
</table>

Figure 7.1: Comparison between different modem implementations in this thesis
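
As a worked check of how the efficiency column relates to the other columns, the sketch below divides the maximum data rate by the sum of the listed power terms. This total-power definition is an assumption; the thesis does not state which terms enter each figure. It reproduces the OOK, D-QPSK and QPSK rows. The 8-PSK and 16-QAM rows are not reproduced by the total and are therefore omitted; the 16-QAM value, for instance, corresponds to 5 Gbps over the 19 W demodulator alone.

```python
# (max data rate in Gbps, listed power terms in W), taken from Fig. 7.1
modems = {
    "OOK": (14.0, [0.25, 3e-4]),
    "D-QPSK": (10.0, [3.93, 0.18]),
    "QPSK": (12.0, [0.18, 0.22]),
}


def energy_efficiency(rate_gbps, powers_w):
    """Energy efficiency in Gbps/W, assuming all listed power terms count."""
    return rate_gbps / sum(powers_w)


efficiency = {
    name: energy_efficiency(rate, powers)
    for name, (rate, powers) in modems.items()
}

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} Gbps/W")
```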

### 7.2 Future Work

#### 7.2.1 Improving the Energy Efficiency of 8-PSK and 16-QAM Modems

The 8-PSK and 16-QAM modems presented in this thesis have lower energy efficiencies than the other modem designs. One reason is that both 8-PSK and 16-QAM require data interfaces that convert a binary data stream into a multi-level signal. High power ADCs and DACs are required to construct such data interfaces. More work is needed on improving the performance and energy efficiency of data interfacing for high order modulation modems. The results would help to complete the modem portfolio presented in this thesis.

#### 7.2.2 Towards a Complete System Solution

The solutions presented in this thesis are mainly focused on the baseband modem. Meeting the application requirements calls for careful system design, which requires not only knowledge about modulation and interfaces as discussed in this thesis, but also good insight into the entire wireless system, including antenna, power amplifier, power supply, radio link deployment, etc.

For example, the link budget of a wireless communication system is highly dependent on the choice of antenna. The use of high gain antennas would dramatically reduce the sensitivity requirement of the radio front-end, and thus affect the choice of the modem. However, a high gain antenna has a smaller radiation angle and makes it more difficult to align the point-to-point link during link deployment.

During my PhD study period, I have taken initial steps to investigate topics in radio communication that go beyond the theme of this thesis (modem design). Although it is outside the scope of this thesis, I would like to mention that I have conducted research and development of a 77 GHz antenna, which led to paper [a] and the patents [c][d][e]. Furthermore, I have carried out innovative work in automatic point-to-point radio alignment, which led to the patent [c]. I will continue research with such a broad scope in order to develop proposals for completely new system solutions for various applications.

#### 7.2.3 Analog Signal Processing in other Applications

The modem solutions discussed in this thesis share the common concept of using analog circuits to perform the complicated signal processing traditionally handled by digital circuits. As mentioned in this thesis, the limited performance of analog-to-digital converters and DSPs is the main hurdle to pursuing high data rate wireless communication. The solutions discussed in this thesis replace such traditional structures with analog signal processing structures. The same concept can be used in applications in other areas, including radar signal processing, microwave tomography (take paper [d] as an example), etc. I will stay open-minded and continue to investigate problems in various areas, and apply my knowledge to search for solutions.
Acknowledgements

For me it has been a journey of more than five years towards this PhD. I could not have reached this far without help and support from several people. I would therefore like to express my gratitude to those who have supported and encouraged me over the years.

The foremost person to acknowledge is my supervisor Prof. Herbert Zirath. I met Herbert at RFIT on Dec. 10, 2007 in Singapore, exactly 6 years before the printing date of this thesis. At that time, I had no idea how lucky I was to encounter such a great scientist. He offered me an opportunity to explore the fascinating world of high frequency circuit design freely, even though my background is in an entirely different area. Over these last few years, he has supported me with his encouragement, patiently guided me with his profound multi-disciplinary knowledge, and pushed me forward with his enthusiasm. Also, I would like to thank my co-supervisor Docent Thomas Swahn and my industry mentor Dr. Yinggang Li for the fruitful discussions, patient explanations, countless hours of proofreading and valuable suggestions. I also want to thank Prof. Christer Svensson for his guidance in algorithm simulation and his supervision, and Prof. Thomas Eriksson and Dr. Paul Hallbjörner for the proofreading and technical discussions.

I owe my thanks to my colleagues at Microwave Electronics Laboratory, especially my office mates: Mattias, Yu, Yogesh, Lai, Dan and Vessen for creating such a friendly working environment. I would like to thank Marcus, Sten, Bing, Klas and Olle for being good friends.

In addition, I want to express my sincere gratitude to my colleagues in the industry: Jens and Džana in wyberry Technologies, who work so hard to help me commercialize my patents. I also want to thank my colleagues at TLU, Ericsson Research: Jingjing, Mingquan, Ola, Bengt-Erik and Lei for the inspiring discussions, as well as Thomas Lewin and Jonas Hansryd as the team leaders.

Last but most importantly, I want to acknowledge my beloved wife Wen, who shares my joy and sorrow with all her love and faith in me. I am also thankful to my parents, who always allowed me to explore my interests freely and supported me with their patience. I want to say thanks to My, Joachim and the Binz family for being wonderful friends.

This work has been financed by the Swedish Foundation for Strategic Research (SSF, project RE2007-0076:1) and Vinnova (projects 2007-02955 and 2009-02992).
Bibliography


Errata for PhD Thesis:
HIGH DATARATE SOLUTIONS FOR NEXT GENERATION
WIRELESS COMMUNICATION

Zhongxia (Simon) He
Jan. 7, 2014

• Page 1, Chapter 1, Section 1.1, Paragraph 1:
The digital universe is made up of images and videos on mobile phones uploaded to YouTube, ... and texting as a widespread means of communications.
Correction:
The digital universe, as is defined in [E01], “is made up of images and videos on mobile phones uploaded to YouTube, ... and texting as a widespread means of communications.”

• Page 1, Chapter 1, Section 1.1, Paragraph 2, Line 3:
“Citing their result, Fig. 1.1 shows ...”
Correction:
“Citing their result [E01], Fig. 1.1 shows ...”

• Page 2, Chapter 1, Section 1.1, Paragraph 3, Last sentence:
On the other hand, the remaining 87% of the digital universe is transient-phone calls that are not recorded, digital TV images that are watched and not saved, packets temporarily stored in routers, digital surveillance images purged from memory when new images come in, and so on.
Correction:
On the other hand, the remaining 87% of the digital universe is transient-phone calls that are not recorded, ... and so on [E01].

Reference