Integrated Wideband Self-Interference Cancellation Techniques for
FDD and Full-Duplex Wireless Communication

Tong Zhang

A dissertation
submitted in partial fulfillment of the
requirements for the degree of

Doctor of Philosophy

University of Washington
2017

Reading Committee:
Jacques C. Rudell
David J. Allstot
Sumit Roy

Program Authorized to Offer Degree:
Department of Electrical Engineering
Abstract

Integrated Wideband Self-Interference Cancellation Techniques for FDD and Full-Duplex Wireless Communication

Tong Zhang

Chair of the Supervisory Committee:
Professor Jacques C. Rudell
Department of Electrical Engineering

The growing demand for wider wireless access and higher data rates, in applications ranging from mobile smartphones to point-to-point backhaul links, continues to drive research that enables new spectrum opportunities, reduces form factor, and lowers the cost of hardware solutions. The conventional RF spectrum, a term that often refers to the frequency band from 1 to 6 GHz, has become increasingly crowded, with only a limited amount of unused and unlicensed spectrum remaining. This dissertation explores and implements single-chip hardware front-end solutions with the specific aim of increasing the data rate available to each user through two techniques: (a) in-band full-duplex radio with self-interference cancellation; and (b) high-speed communication using the large bandwidths available at mmWave frequencies. The advantages, challenges and achievements associated with the two proposed techniques are described in the following paragraphs.

First, in-band full-duplex communication potentially increases spectral efficiency within existing RF standards. It allows the dedicated transmitting and receiving bands to be combined into a single band, which would more than double the spectral efficiency. However, this leads to an extremely challenging problem: cancellation of the transmitter self-interference. To date, two full-duplex chips have been designed, fabricated and tested, each with a self-interference cancellation function. The first IC is an analog full-duplex front-end for Bluetooth Low Energy (BLE) applications. Inside the proposed chip, self-interference cancellation (SIC) circuitry, a low-power receiver, and a harmonic-rejection power amplifier (HRPA) are implemented to reduce the transmitter-to-receiver self-interference and enable full-duplex operation. These techniques were applied towards the realization of a prototype silicon device which implements a tunable self-interference canceler with a current-mode low-noise amplifier (LNA) and passive-mixer-based front-end, together with a power amplifier topology that reduces out-of-band emissions. This chip was fabricated in a 40nm, 6-metal-stack CMOS process and achieves more than 30dB of measured self-interference cancellation over a 4MHz bandwidth; its integrated power amplifier (PA) suppresses the 3rd and 5th harmonics by 30dB and 15dB, respectively. The PA delivers a maximum output power of +14dBm with a drain efficiency of 33%. The self-interference cancellation circuitry occupies an active area of 131×112.5 µm², consumes 0.25mW, and degrades the receiver noise figure (NF) by less than 0.6dB. The second IC is a transceiver front-end that includes dual-injection-path self-interference (SI) cancellation circuitry to enable wideband full-duplex communication with a high-power transmitter. The proposed SI cancellation circuitry is implemented using: (1) one feedforward cancellation path containing a 5-tap analog adaptive filter (AF) between the transmitter (TX) output and the receiver (RX) input; (2) a second cancellation path containing a 14-tap low-frequency AF with a point of injection at the RX baseband (BB) output; (3) a phase noise cancellation method which reduces the phase noise (PN) associated with the down-conversion in the BB cancellation path for the TX SI signal; (4) an integrated noise-canceling power amplifier (PA). A prototype device fabricated in TSMC 40nm CMOS demonstrates more than 50dB of SI cancellation over a 42MHz bandwidth and a 10dB attenuation of TX SI PN in the RX signal path. The two canceling filters dissipate 11.5mW, with a measured $P_{\text{1dB}}$ and $IIP_3$ of 27/26.5dBm and 36/34.5dBm, respectively. The RX noise figure is degraded by less than 1.55dB when both cancelers are enabled. The PA has a measured $P_{\text{1dB}}/P_{\text{sat}}$ of 25.1/26.5dBm. The total chip die area is 3.5 mm$^2$, with an overall transceiver power consumption of 49mW excluding the integrated power amplifier.
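
To give a feel for why the cancellation figures quoted above are demanding, the following minimal sketch models a single-tone leakage signal from which an imperfect replica is subtracted. It is an illustration only and is not taken from the dissertation: the single-tap, narrowband model and the function names `cancellation_db` and `max_phase_error_deg` are assumptions made for this example, whereas the chips described above use multi-tap adaptive filters.

```python
import cmath
import math


def cancellation_db(gain_error_db: float, phase_error_deg: float) -> float:
    """Suppression in dB when a replica with the given gain and phase
    error is subtracted from a unit-amplitude leakage tone.
    (Hypothetical single-tap model, not the dissertation's circuit.)"""
    g = 10 ** (gain_error_db / 20)          # replica amplitude relative to leakage
    phi = math.radians(phase_error_deg)     # replica phase error in radians
    residual = abs(1 - g * cmath.exp(1j * phi))
    if residual == 0:
        return math.inf                     # perfect replica, nothing left over
    return -20 * math.log10(residual)


def max_phase_error_deg(target_db: float) -> float:
    """Largest phase error that still reaches target_db of suppression,
    assuming a perfect gain match: |1 - e^{j*phi}| = 2*sin(phi/2)."""
    residual = 10 ** (-target_db / 20)
    return math.degrees(2 * math.asin(residual / 2))


if __name__ == "__main__":
    for target in (30.0, 50.0):
        print(target, "dB needs phase error below",
              max_phase_error_deg(target), "degrees")
```

Under these assumptions, 30dB of cancellation tolerates a phase error of roughly 1.8°, while 50dB tolerates only about 0.18°, which suggests why wideband cancellation at the 50dB level calls for a multi-tap, adaptive approach.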

Second, although the lower-frequency RF band appears saturated, the vast spectrum available at mmWave frequencies (30 – 300 GHz) presents a potentially attractive solution for high-speed communication. However, communication at mmWave frequencies raises many new and challenging problems for designers, and realizing a wideband, high-accuracy quadrature generator is one of them. One mmWave IC, which includes an integrated two-stage polyphase filter (PPF) with feedback control, is proposed for quadrature local oscillator (LO) generation at millimeter-wave frequencies. To minimize the in-phase (I) and quadrature (Q) mismatch, the second stage of the PPF uses triode-region NMOS transistors as variable resistors, whose resistance is precisely controlled by modulating the shared bias voltage applied at the gates of the NMOS devices. The gate bias voltage of the triode-region devices is set by a feedback loop and changes in accordance with variations in process, voltage, and temperature (PVT). A prototype quadrature signal generator employing this PPF design is integrated in a 28nm LP CMOS process. A worst-case measured phase/amplitude imbalance of 2°/0.32dB (TT dies) and 2.2°/0.55dB (SS dies) is reported over a 7GHz bandwidth for a fixed control current ($I_{\text{Ctrl}}$). By retuning $I_{\text{Ctrl}}$ every 7GHz, this IQ generator would maintain the measured quadrature accuracy from 55 to 70 GHz. The core area occupied by the IQ generator circuitry is 20µm × 40µm and the device consumes less than 192µW, of which 120/72 µW comes from the feedback control loop / opamp, respectively. The proposed PPF has a simulated input impedance of 150Ω in parallel with 18fF.
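
As a rough aid to interpreting the imbalance figures above, the sketch below converts an amplitude/phase imbalance pair into an image-rejection ratio (IRR) using the standard textbook expression for a quadrature signal path. The IRR metric is not quoted in this abstract and is introduced here only as an assumption for illustration; the function name `image_rejection_ratio_db` is likewise hypothetical.

```python
import math


def image_rejection_ratio_db(amplitude_imbalance_db: float,
                             phase_imbalance_deg: float) -> float:
    """Image-rejection ratio in dB for a quadrature pair whose Q path has
    the given amplitude and phase imbalance relative to the I path.
    Uses IRR = (1 + 2g*cos(phi) + g^2) / (1 - 2g*cos(phi) + g^2)."""
    g = 10 ** (amplitude_imbalance_db / 20)   # linear amplitude ratio
    phi = math.radians(phase_imbalance_deg)   # phase error in radians
    wanted = 1 + 2 * g * math.cos(phi) + g * g
    image = 1 - 2 * g * math.cos(phi) + g * g
    if image == 0:
        return math.inf                       # perfectly balanced I/Q
    return 10 * math.log10(wanted / image)


if __name__ == "__main__":
    # Worst-case imbalance values quoted in the abstract.
    print("TT dies:", image_rejection_ratio_db(0.32, 2.0), "dB")
    print("SS dies:", image_rejection_ratio_db(0.55, 2.2), "dB")
```

With the worst-case values quoted above, this expression gives an image rejection on the order of 30dB for both the TT and SS dies, with the TT result a few dB better.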
Contents

Abstract

List of Figures

List of Tables

Acknowledgements

1. Introduction
   1.1. Research Objectives
   1.2. Overview and Organization of The Thesis

2. Self-Interference Cancellation (SIC) In FDD/Full-Duplex System And State-of-the-art SIC Techniques
   2.1 Issues with TX Self-Interference Cancellation in FDD/Full-Duplex Radios
   2.2 State-of-the-art Self-Interference Cancellation Techniques
   2.3 Four-Port-Transformer-Based Canceler

3. A Full-Duplex Front-End with a Low-Noise Self-Interference Cancellation And Harmonic Rejection Power Amplifier
   3.1 Introduction
   3.2 System Analysis Of An Example Full-Duplex Radio
   3.3 Proposed Self-Interference Mitigation and Harmonic Rejection Technique
      3.3.1 Proposed Self-Interference Mitigation Approach
      3.3.2 Harmonic-Rejection Power Amplifier
   3.4 Circuit Implementation of the Cartesian Canceler
   3.5 Measurement Results
      3.5.1 Standalone RX Measurement Results
      3.5.2 Self-interference Cancellation Measurement Results
      3.5.3 Harmonic Rejection PA Measurement Results
   3.6 Conclusions

4. A Wideband Dual-Injection Path Self-Interference Cancellation Architecture for Long-Range Cellular Full-Duplex Transceivers
   4.1 Introduction
   4.2 Dual-Injection Path Full Duplex Architecture
   4.3 Proposed System Design Consideration
      4.3.1 The First Coarse RF Canceler
      4.3.2 The second fine baseband canceler
   4.4 Circuit Implementation of the Dual-Injection Path Full Duplex transceiver Front-end
      4.4.1 RF Canceler and RX LNA
      4.4.2 Baseband canceler
      4.4.3 Power amplifier
      4.4.4 Other blocks
   4.5 Measurement Results
      4.5.1 Standard RF Measurements
      4.5.2 SI Cancellation Measurements
   4.6 Conclusion
   4.7 Appendix

5. A Precision Wideband Quadrature Generator
   5.1 Introduction
   5.2 State-Of-The-Art: Quadrature Generation
   5.3 Proposed PPF-Based Quadrature Generator
   5.4 Proposed PPF Design Considerations
      5.4.1 Input Impedance
      5.4.2 Insertion loss
      5.4.3 Parasitic capacitance
      5.4.4 Opamp Design
      5.4.5 Noise
      5.4.6 Layout Techniques
   5.5 Measurement Circuit Implementation
   5.6 Measurement Results
   5.7 Conclusion

6. Conclusions and Scope for Future Work and Applications
   6.1 Thesis Summary
      6.1.1 Full-Duplex Radio
      6.1.2 Communication at mmWave Frequencies
   6.2 Future Directions
      6.2.1 Single-Chip Full-Duplex Radio with Integrated Circulator
      6.2.2 Simultaneous Stimulation and Sensing for BCI System

Bibliography
List of Figures

Fig. 1-1 International technology roadmap for semiconductors (ITRS) wireless roadmap
Fig. 1-2 Current S-Band spectrum usage
Fig. 1-3 Two methods to address the problem of crowded RF spectrum. (a) in-band full-duplex communication, (b) communication at mmWave frequencies
Fig. 2-1 Potential sources of RX interference from TX self-interference in FDD radios
Fig. 2-2 Active TX leakage suppression using: (a) feed-forward (FF) techniques, (b) two-point FF technique, with cancellation at both the LNA input & output, (c) feedback (FB) loop that incorporates the RX down-converter, (d) a separate feedback loop between the LNA and the RX down-converter
Fig. 2-3 Active TX leakage filtering techniques using: (a) high-Q passive filter using bond wires, (b) active bandpass sink filter, (c) LMS adaptive filter
Fig. 2-4 Self-interference cancellation techniques proposed in full-duplex systems. (a) antenna cancellation technique, (b) passive mixer-first full-duplexing LNA, (c) RF frequency-domain equalization, (d) passive vector modulator down-mixer
Fig. 2-5 Transmitter self-interference cancellation using integrated duplexers
Fig. 2-6 TX-to-RX isolation using circulators
Fig. 2-7 Transformer-based passive self-interference mitigation system
Fig. 3-1 Conceptual diagram and system analysis of the proposed transmitter self-interference cancellation and harmonic rejection front-end
Fig. 3-2 Circuit schematic of the proposed Polyphase Filter (PPF) based Cartesian canceler
Fig. 3-3 Noise model and simulation results of the proposed canceler. (a) schematic model of the conceptual diagram, (b) simulation results of the receiver noise figure and canceler output equivalent noise resistance vs. canceler gain settings
Fig. 3-4 Effect of insertion loss using the proposed canceler. (a) schematic model of the PPF with loading capacitor $C_L$, (b) simulation results of the PA efficiency degradation versus PPF insertion loss
Fig. 3-5 Architecture of the proposed harmonic-rejection power amplifier
Fig. 3-6 Detailed block diagram of the full-duplex front-end with low-noise same-channel self-interference cancellation and harmonic-rejection power amplifier
Fig. 3-7 Proposed full-duplex TSMC 40nm chip micrograph
Fig. 3-8 Full-duplex radio (AFE) measurement setup
Fig. 3-9 Measured isolation of two leakage media using either: 1) a circulator or 2) two antennas
Fig. 3-10 Measured receiver performance. (a) input matching $S_{11}$, (b) gain $S_{21}$, (c) IIP3 versus offset frequency, (d) input-referred $P_{\text{1dB}}$ versus offset frequency
Fig. 3-11 Measured receiver noise figure performance with the canceler disabled (off)
Fig. 3-12 Measurement results of TX leakage suppression using a single CW signal with two leakage media, a circulator and two antennas
Fig. 3-13 Measurement results of TX leakage suppression versus bandwidth using both a circulator and two antennas
Fig. 3-14 Measured TX suppression using a modulated GFSK signal. Baseband output spectrum with cancellation enabled and disabled, (a) measurement with a circulator, (b) measurement with two antennas
Fig. 3-15 Measured RX noise figure with a 0 dBm blocker with the canceler enabled and disabled, relative to the baseband frequency. This is measured with a TX-to-RX coupling of -20dB and a TX-to-RX offset frequency of 100 kHz
Fig. 3-16 Measured PA output spectrum with harmonic rejection enabled and disabled
Fig. 3-17 Measured PA output power and drain efficiency
Fig. 3-18 Measured PA output spectrum at peak output power with a modulated Bluetooth GFSK input signal
Fig. 4-1 SI cancellation in FD communication. (a) SI cancellation requirement in an FD radio with TX output power of 30dBm and RX required sensitivity of -80dBm, (b) SI cancellation distributed in the receiver chain
Fig. 4-2 Channel response of the leakage path. (a) multiple time-delay paths of the leakage signal coupled from PA output to LNA input, (b) impulse response of a discrete circulator from Meca Electronics, (c) phase response of the measured discrete circulator, (d) frequency response for each time-delayed version of the leakage signal
Fig. 4-3 Proposed dual-path self-interference cancelling architecture
Fig. 4-4 Proposed dual-path self-interference cancelling architecture
Fig. 4-5 Design tradeoff between number of taps, canceler noise and cancellation BW; simulation performed in Matlab Simulink and Cadence SpectreRF
Fig. 4-6 Linearity of the RF canceler; an additional attenuator is introduced at the input of the RF canceler to improve linearity
Fig. 4-7 RF canceler linearity versus attenuator capacitor $C_1$. (a) conceptual diagram, (b) simulation results of capacitor value versus RX noise figure and canceler $IIP_3$
Fig. 4-8 Linearity of the RF canceler; an additional attenuator is introduced at the input of the RF canceler to improve linearity
Fig. 4-9 Baseband canceler top level block diagram
Fig. 4-10 Baseband canceler for attenuation of TX carrier signal
Fig. 4-11 Baseband canceler for attenuation of TX leakage signal reciprocal mixing with LO phase noise in RX signal path
Fig. 4-12 A system diagram of the 40nm CMOS full-duplex system
Fig. 4-13 Transistor level implementations of 5-tap FIR-based RF canceler and the low noise amplifier
Fig. 4-14 Transistor level implementations of 14-tap FIR-based BB canceler and the summing stage
Fig. 4-15 Transistor level implementations of a three-stage Class AB power amplifier
Fig. 4-16 Wideband full-duplex with dual-path SI cancellation TSMC 40nm chip micrograph
Fig. 4-17 FD chip measurement setup
Fig. 4-18 Measured PA performance. (a) PA output power and efficiency with a single tone input at 1.96GHz, (b) PA EVM testing with an input of a 40Mb/s 16QAM signal and an average PA output power of 20dBm at a center frequency of 1.96GHz
Fig. 4-19 Measurement results of TX SI suppression versus bandwidth
Fig. 4-20 Measured TX suppression using a modulated 40Mb/s 16QAM signal, RX baseband output spectrum with cancelers enabled and disabled
Fig. 4-21 Two-tone linearity testing for the canceler, measured fundamental tone and IM3 components of the TX SI signal at the RX baseband output with canceler enabled and disabled
Fig. 4-22 Measured RX noise figure with a +15 dBm blocker with the canceler enabled and disabled, relative to the baseband frequency
Fig. 4-23 Measured suppression of TX SI signal reciprocal mixing with RX LO phase noise in the RX signal path. (a) measurement setup, (b) measurement results at the RX baseband output with cancellation enabled and disabled
Fig. 4-24 Power breakdown for the proposed system without including the power amplifier
Fig. 4-25 Simplified mathematical model of TX SI cancellation in the baseband canceler
Fig. 5-1 Quadrature imbalance issues. (a) Block diagram of quadrature imbalance, (b) EVM variations with gain imbalance, (c) EVM variations with phase imbalance
Fig. 5-2 Illustration of phase imbalance at low and high frequency due to a mismatch-induced timing error
Fig. 5-3 Phase mismatch in two-stage PPFs plotted for several process and temperature corners. (a) Schematic of a two-stage PPF, (b) Simulation results of phase mismatch
Fig. 5-4 Phase imbalance of two-stage PPFs with only one stage's R, C values accurately controlled. (a) 1st stage with PVT variations, 2nd stage without PVT variations, (b) 1st stage without PVT variations, 2nd stage with PVT variations
Fig. 5-5 Proposed two-stage PPF with feedback control. (a) Schematic of proposed circuit, (b) Description of auxiliary bias resistors added for the triode-region transistors
Fig. 5-6 Feedback circuitry for the proposed two-stage PPF
Fig. 5-7 Optimal sizing of two-stage PPF considering the loading effect
Fig. 5-8 Simulation results of normalized insertion loss as a function of frequency with different values of m
Fig. 5-9 Triode region transistor model with parasitic capacitance. (a) Cross section of a MOS transistor, (b) Simplified schematic model of a MOS transistor
Fig. 5-10 Simulation results of normalized insertion loss of two-stage PPF as a function of frequency with different values of $C_D$ and $C_S$
Fig. 5-11 Simulation results of amplitude/phase mismatch and normalized insertion loss of two-stage PPF as a function of frequency with different values of $C_{DS}$. (a) amplitude mismatch, (b) phase mismatch, (c) normalized insertion loss
Fig. 5-12 The schematic of the opamp and the settling time of the proposed feedback loop. (a) Opamp schematic with a folded-cascode topology, (b) the proposed feedback loop settling time
Fig. 5-13 Phase noise simulation of the proposed quadrature generator versus traditional two-stage PPF with an ideal 60GHz LO and realistic 60GHz LO input. (a) Simulation taken with noiseless LO, (b) simulation taken with actual LO
Fig. 5-14 Block diagram of the proposed system for IQ balance measurement
Fig. 5-15 A 7-layer metal GF 28nm chip micrograph
Fig. 5-16 IQ generation measurement setup in the lab
Fig. 5-17 Measured amplitude/phase mismatch of the proposed tunable IQ generator vs frequency for several values of $I_{\text{Ctrl}}$. (a) amplitude mismatch, (b) phase mismatch
Fig. 5-18 Baseband quadrature output waveforms measured using oscilloscope. $f_{RF} = 65$ GHz, $f_{LO} = 64.9$ GHz, $f_{IF} = 100$ MHz, $V_{RBIAS} = 100$ mV, $I_{\text{Ctrl}} = 90$ µA
Fig. 5-19 Measured worst-case amplitude/phase mismatch for several available chips. (a) amplitude mismatch, (b) phase mismatch
Fig. 6-1 Single chip full-duplex radio with circulator
Fig. 6-2 Simultaneous stimulation and sensing in brain computer interface (BCI) system with artifact cancellation
List of Tables

Table 2-1 Comparison table for proposed four-port-transformer-based canceler
Table 3-1 Comparison and performance summary for polyphase-filter-based canceler
Table 4-1 Comparison and performance summary for wideband full-duplex chip
Table 5-1 Comparison table for mmWave quadrature generator
Acknowledgements

First, I would like to express my deep and sincere gratitude to my supervisor, Prof. Rudell, for his guidance, patience and encouragement. His technical guidance and support throughout the past five years have been invaluable. His commitment to excellence has helped me grow as a researcher and engineer, and he gave me the freedom to explore ideas and research directions within the broader project. The research project at UW would not have been possible without Professor Rudell’s insights into future directions in CMOS integrated circuit design. Professor Rudell always encouraged me to think innovatively from both the system and the circuit side, which really helped the project make a significant impact. Beyond the research project, Professor Rudell also spent countless hours helping me improve my English speaking and writing skills. Although correcting, polishing and improving text and presentations is a very painful process, I find that my English has improved substantially, both written and spoken, since I joined Professor Rudell’s group. There is no doubt that Professor Rudell is one of the best advisers in this department and in this field.

I would also like to thank Professor Allstot, not only for serving on the committee for my PhD general and final exams, but also for his sense of discovery and enjoyment in circuit design. Although I didn’t have a chance to take a class from Professor Allstot when he was at the University of Washington, I learned a lot by reading papers by him and his students. Professor Allstot always has the ability to explain a very complicated circuit theory in a simple and easily understood way. His great intuition for fundamental circuit theory helped me to establish a solid circuit background. I am also grateful to Dr. Dennis Yee, who served on my PhD supervisory committee and brought me to Google for an excellent internship. Although I have not had the chance to work closely with Dr. Yee, I have benefited immensely from a couple of conversations with him as well as from reading through the design documentation he completed. This PhD final project would never have been finished without the help of Professor Sumit Roy. His broad expertise in communication theory, signal processing and circuit theory really helped speed up the progress of my final project. Also, thank you to Professor Yasuo Kuga and Professor Shyam Gollakota for serving on my committee and providing so much valuable feedback for this project. Professor Gollakota is very knowledgeable on the topic of full-duplex communication and gave me a lot of useful insight while I was working towards my final project. Although Professor Visvesh Sathe is not on my PhD exam committee, his intelligence and broad knowledge of digital integrated circuits, especially of how to write top-level Verilog code for our adaptation algorithm, really helped accelerate my final project.

During my first few years at UW, many of the senior graduate students helped to show me the ropes and provided a solid foundation for the success of my future PhD. I would like to express my special thanks to Dr. Venumadhav Bhagavatula. Professor Chris Rudell always points me in the right direction on the big picture, while Venu is the person who helped me with the circuit details. When I first joined Professor Rudell’s group, I was completely new to the area of integrated circuit design, having no background in hardware design or software simulation. Venu was always patient with me, willing to spend hours explaining basic circuit theory to a somewhat slow student. His talented insights on my project helped me eventually succeed in obtaining my degree. I still remember when Venu and I spent countless hours in the lab, working together until 3 or 4 am to tape out our first mmWave receiver chip. As a friend, I wish him well at Samsung Electronics. I am also grateful to Apsara Ravish Suvarna. She and I worked together on my first chip tapeout. She provided a lot of useful feedback on my design and helped make it a successful chip. Jason Silver, Keping Wang, Chris Mandic and Julie Hu also gave me a lot of great advice during my initial years at UW.

I had the pleasure of working with a group of excellent people from Professor Rudell’s lab besides Venu and Apsara. Chenxi Huang and Yongdong Chen worked directly with me on our Bluetooth full-duplex chip tapeout. Ali Najafi and Chenxin Su contributed significantly to my final project, which was a wideband full-duplex chip. Eric Pepin is another person who is free-flowing with his advice and continues to advise me on both circuit design and language skills. Samrat Dey and I worked together on a few class projects. John P. Uehlin was always helpful when I needed a little, or a lot, of board soldering. I wish all of them enormous success, particularly with respect to publishing outstanding conference/journal papers, and a bright future. I also wish the very best to the new lab members Kun-Da Chu and Mohamad Katanbaf.

As I started my career in the area of RFIC design, I have had the luxury of being mentored by a lot of outstanding engineers within our industry. I am very grateful to Professor Rudell, who helped me find two great internships, one at Qualcomm Atheros, San Jose, CA, and the other at Google Incorporated, Mountain View, CA. I was fortunate to have some truly great managers/mentors at both companies. Mazi Taghivand hosted me for a nine-month internship at Qualcomm. He gave me a very interesting project on the design of a wideband mmWave quadrature generator. He also helped me tape out a chip in a very advanced 28nm process through Qualcomm. Ben Mossawir was my manager and host for a three-month internship at Google. I came to Google with almost no experience in programming software to design circuits. Ben taught me everything from the basics onward and helped me get up to speed so I could finish my internship tasks on time. I really enjoyed working at these two companies and appreciated the support structure provided by the managers/colleagues including: Beomsup Kim, Magnus Wiklund, Roger

Outside of the research field, I also benefited from the friends I made here during the past five to six years. I would like to acknowledge a number of my friends who gave me the best help, Yishen Wang, Xudong Li, Ce Zhang, Hao Wu, Zeyu Wang, Yuzong Liu, Ruizhi Sun, Xiang Chen, Zhen Li and others whose names escape me at the moment.

Finally, I would never have made it through this PhD program without the patience, support, love, and encouragement of my parents, Yulin Zhang and Jin Xue. They have done everything in their power to minimize the things I would otherwise have to worry about, so that I could focus on what I had to do to complete my degree. Their unconditional love and willingness to do anything for their child is beyond remarkable. I am so thankful to both of you.

This work was funded in part by NSF #1408575, CDADIC, Qualcomm Incorporated, Google Incorporated and Marvell Semiconductor. The author of this thesis would also like to acknowledge Taiwan Semiconductor Manufacturing Company Limited (TSMC) and GlobalFoundries (GF) for silicon fabrication.
To My Parents,

Yulin Zhang and Jin Xue
1. INTRODUCTION

Fig. 1-1 International technology roadmap for semiconductors (ITRS) wireless roadmap

Over the past two decades, there has been a drastic increase in the available bandwidth and data rates delivered to smart phones, notebook computers and other wireless applications. At present, existing commercial standards in the RF bands, such as WCDMA and Wi-Fi, provide as much as 80MHz bandwidth with data rates as high as 100-1000MB/s per user. The demand for more accessible bandwidth and higher data rates is predicted to continue with next-generation communication systems; 5G, for example, is expected to support 1000 times higher data volume per area and 10 to 100 times higher data rates per user (10-100 GB/s), with as much as a 10x extension in battery life compared to existing solutions. The international technology roadmap for semiconductors (ITRS) wireless roadmap shows both the history and a projection of a 5-10x increase in data rates every 5 years, in keeping with Moore’s law; see Fig. 1-1 [1]. As Fig. 1-1 shows, Terabit/s short-range links and Gb/s cellular communication will soon be required to satisfy the increasing demand for high-speed data rates from mobile consumer electronics.

With the rapid growth of wireless communication, significant engineering challenges remain to be resolved. One of the most significant technical and market barriers is the limited amount of available spectrum. The current RF spectrum, which is often referred to as the frequency band from 1-6GHz, has become increasingly crowded, with only a limited amount of unused and unlicensed spectrum; see Fig. 1-2.

![Current S-Band spectrum usage](image)

**Fig. 1-2.** Current S-Band spectrum usage.

There are two potential solutions to this problem. First, the spectral efficiency associated with existing RF standards could be increased by using in-band full-duplex communication (see Fig. 1-3(a)) [2]–[4] to combine the spectrum currently allocated separately for transmitting and receiving data. For example, one of the most popular wireless standards, Wideband Code Division Multiple Access (WCDMA), is a half-duplex FDD system with a guard band of approximately 100MHz; assuming the TX and RX bands occupy a bandwidth of 60MHz, the spectral efficiency of the WCDMA system is less than 30%. Removing the guard band or implementing in-band full-duplex radios could improve spectral efficiency up to 100% and enable up to a 2x increase in data rates within the WCDMA band. However, implementing an in-band full-duplex radio requires in excess of 100dB of self-interference cancellation between the full-duplex transmitter and receiver [2]–[4]. This leads to an extremely challenging engineering problem which delves into multiple disciplines including circuit design, electromagnetics, communications theory and signal processing.
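The "less than 30%" figure can be reproduced with a short calculation. The sketch below is one possible reading, not necessarily the definition used in this work: it assumes that the TX and RX bands are each 60MHz wide, and that spectral efficiency means the bandwidth carrying data in one direction divided by the total occupied span. The function name `band_utilization` and the constants are illustrative.

```python
def band_utilization(data_bw_mhz: float, total_span_mhz: float) -> float:
    """Fraction of the occupied span that carries data in one direction."""
    return data_bw_mhz / total_span_mhz


# Values quoted in the text. Treating 60 MHz as the width of EACH band
# (TX and RX) is an assumption made for this sketch.
TX_BW_MHZ = 60.0
RX_BW_MHZ = 60.0
GUARD_BAND_MHZ = 100.0

# FDD: TX band + guard band + RX band are all occupied.
fdd_span_mhz = TX_BW_MHZ + GUARD_BAND_MHZ + RX_BW_MHZ
fdd_efficiency = band_utilization(RX_BW_MHZ, fdd_span_mhz)

# In-band full duplex: TX and RX share a single 60 MHz band.
full_duplex_efficiency = band_utilization(RX_BW_MHZ, RX_BW_MHZ)

print(f"FDD span: {fdd_span_mhz:.0f} MHz")
print(f"FDD efficiency: {fdd_efficiency:.1%}")
print(f"Full-duplex efficiency: {full_duplex_efficiency:.1%}")
```

Under these assumptions the FDD case occupies 220MHz to move 60MHz of data in each direction, which is consistent with the sub-30% estimate above.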

The second approach to address the future demand for higher data rates is to utilize the available spectrum at mmWave frequencies; see Fig. 1-3(b). Although the lower-frequency RF band appears saturated, the vast available spectrum at mmWave frequencies (30 – 300 GHz) presents a potentially attractive solution for high-speed communication [5], [6]. To date, several successful CMOS implementations of mmWave transceivers have achieved data rates between 1-10 Gb/s [7]–[11]. However, the main propagation-related obstacle in realizing mmWave systems is the high free-space path loss at these extremely high carrier frequencies. The challenge here is the added complexity associated with systems that provide directionality to overcome the high path loss at such high frequencies.
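To give a feel for the path-loss penalty mentioned above, the following sketch evaluates the standard free-space path loss expression, FSPL = 20·log10(4πdf/c), for isotropic antennas. The 2.4GHz and 60GHz carriers and the 10m link distance are illustrative choices, not values taken from this work.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB between isotropic antennas (Friis)."""
    return 20.0 * math.log10(
        4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_PER_S
    )


# Illustrative comparison: same distance, RF carrier vs. mmWave carrier.
distance_m = 10.0
loss_rf_db = fspl_db(distance_m, 2.4e9)
loss_mmwave_db = fspl_db(distance_m, 60e9)
extra_loss_db = loss_mmwave_db - loss_rf_db

print(f"2.4 GHz: {loss_rf_db:.1f} dB")
print(f"60 GHz:  {loss_mmwave_db:.1f} dB")
print(f"Extra loss at 60 GHz: {extra_loss_db:.1f} dB")
```

Because the loss scales with 20·log10(f), the 25x increase in carrier frequency costs roughly 28dB at any fixed distance, which is the gap that directional (beamforming) systems are meant to recover.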

1.1. RESEARCH OBJECTIVES

This dissertation will focus on investigating, analyzing and implementing novel system/circuit-level techniques to provide higher data rates to each single user using the two aforementioned techniques:

1. Full-duplex communication using self-interference cancellation techniques to improve the spectral efficiency of the existing crowded RF spectrum. Implementing full-duplex radios will eventually double the spectral efficiency; the main challenge in realizing full-duplex radios is the implementation of the self-interference cancellation circuits. The objective of this full-duplex project is to analyze the design trade-offs between the power consumption, silicon area, noise, linearity and bandwidth of the self-interference cancellation circuitry and eventually build a full-duplex transceiver front-end.

2. High-speed communication using the large bandwidths available at mmWave frequencies. One of the major obstacles to communication at mmWave frequencies is how to design a wideband, high-accuracy quadrature generator that is robust to process, voltage and temperature (PVT) variation. The goal of this mmWave project is to investigate the fundamental limitations and find a better way to design mmWave IQ generators.

All the prototype chips proposed in this dissertation are implemented in advanced standard bulk CMOS technologies (TSMC 40nm LP and GF 28nm LP). The advantage of using advanced standard CMOS technologies is that they allow large-scale integration of high-performance, low-power, high-density digital circuits with the proposed analog/RF/mmWave circuits on a single die for high-volume, low-cost production. The ideas proposed in the prototype chips could also be applied to other technologies such as GaAs, SiGe, InP and any bipolar process.

1.2. Overview and Organization of the Thesis

The rest of this dissertation is organized as follows:

Chapter 2: This chapter provides a discussion of why self-interference cancellation (SIC) is needed for FDD/full-duplex radio and an overview of existing state-of-the-art SIC techniques that have been proposed/implemented by various groups around the world. My prior research projects pertaining to this area (transformer-based self-interference cancellation technique) will also be briefly covered in this chapter.

Chapter 3: This chapter describes an analog front-end which includes a self-interference mitigation (SIM) circuit, a low-power receiver circuitry, combined with a harmonic-rejection power amplifier (HRPA) proposed to reduce the transmitter-to-receiver self-interference, and enable full-duplex operation for Bluetooth applications. A 40nm CMOS prototype chip was fabricated and measurements will be described in this chapter.

Chapter 4: A transceiver front-end including a dual-injection path (RF & Baseband) low-noise TX self-interference cancellation circuit is introduced to enable wideband full-duplex communication with a high-power output transmitter. Two adaptive filters (a 5-tap RF adaptive filter and a 14-tap baseband adaptive filter) have been implemented to create an inverse time-domain response of the leakage path while enabling a wideband self-interference cancellation function. A prototype 40nm TSMC device has been tested with more than 50dB self-interference cancellation over 42MHz bandwidth. More measurement results are provided in Chapter 4.

Chapter 5: An integrated two-stage polyphase filter (PPF) with feedback control is proposed for local oscillator (LO) quadrature generation at millimeter-wave frequencies. A 55-70GHz prototype quadrature signal generator for use in a homodyne 60GHz receiver is integrated in a 28nm LP CMOS process. The design and measurement results of the prototype chip are described in Chapter 5.

Chapter 6: This chapter summarizes the dissertation and outlines the scope for further work and applications. One interesting extension of the full-duplex projects is applying the proposed self-interference cancellation techniques to neurological interfaces, where clinicians struggle to acquire neural recordings at one site while another site is being stimulated. This is similar to TX self-interference cancellation in a wireless system, in that the stimulator inherently interferes with the recording channels. Chapter 6 will briefly discuss applying the self-interference cancellation techniques to the cancellation of the “stimulation artifacts” found in neural interfaces.
2. **Self-Interference Cancellation (SIC) In FDD/Full-Duplex System And State-of-the-art SIC Techniques**

As integrated transceiver design enters the era of big data, the demands for both higher data rates and better utilization of existing spectrum become essential for future mobile applications. Frequency division duplexing (FDD) is one method of enhancing wireless network capacity by allowing a single user to simultaneously transmit and receive using different carrier frequencies. Typically, a discrete front-end duplex filter is used to prevent the transmitted signal from appearing at (“leaking” into) the RX input. However, these filters provide at most 50dB of isolation between the TX and RX. Thus, for applications requiring a high output power transmitter, such as cellular radios, a high-performance receiver is necessary, which incurs a power consumption penalty.

![Diagram](image)

**Fig. 2-1 Potential sources of RX interference from TX self-interference in FDD radios.**
2.1 **ISSUES WITH TX SELF-INTERFERENCE CANCELLATION IN FDD/ FULL-DUPLEX RADIOS**

For the purposes of illustration, a well-known wireless standard, WCDMA, is used to describe some of the challenges associated with FDD transceivers; see Fig. 2-1. The worst-case self-interference scenario for an FDD system occurs when transmitting at maximum output power; for WCDMA, this is +27 dBm. Assuming the duplex filter suppresses the transmitted signal by 50 dB, a -23dBm TX leakage signal will appear at the LNA input. Further assuming the LNA provides 15 dB of gain at the TX frequency, the leakage signal becomes -8dBm. This places a high linearity burden, mainly IIP\(_2\), on the subsequent components following the LNA (usually a mixer). In this example, a mixer IIP\(_2\) > 45dBm is necessary [12], [13] to ensure a sufficiently low-level of mixer intermodulation distortion. In addition, when a strong jammer appears in the vicinity of the RX band, the TX leakage signal will potentially cross-modulate as it passes through the nonlinearities in the LNA. Another mode of interference in the RX band arises from the transmit self-interference reciprocal mixing with the phase noise of the RX oscillator, lowering the carrier-to-interference (C/I) ratio at the output of the RX mixers. Other forms of interference attributed to TX self-interference relate to the effect of a large transmit carrier leaking into the receiver, and modulating the transconductance of individual devices in the RX signal path. This potentially up-converts low-frequency noise (including 1/f noise) from the bias circuitry into the band of interest, which has the effect of raising the noise floor in the RX signal path and further degrading the C/I ratio. Also, the presence of a large TX blocker will cause gain compression in the RX path. All of these effects act in concert to degrade the RX C/I ratio, thus, signifying the importance of mitigating, or cancelling, the TX self-interference as early as possible, in the RX chain; ideally cancellation would happen at the RX input.
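The leakage levels quoted in this example follow from simple dB bookkeeping, sketched below using only the numbers given in the text (+27dBm TX power, 50dB duplexer suppression, 15dB LNA gain at the TX frequency). The function name `tx_leakage_dbm` is illustrative.

```python
def tx_leakage_dbm(tx_power_dbm, duplexer_isolation_db, lna_gain_db):
    """Return the TX leakage level at the LNA input and at the LNA output."""
    at_lna_input_dbm = tx_power_dbm - duplexer_isolation_db
    at_lna_output_dbm = at_lna_input_dbm + lna_gain_db
    return at_lna_input_dbm, at_lna_output_dbm


# Worst-case WCDMA scenario described in the text.
lna_input_dbm, lna_output_dbm = tx_leakage_dbm(
    tx_power_dbm=27, duplexer_isolation_db=50, lna_gain_db=15
)

print(f"Leakage at LNA input:  {lna_input_dbm} dBm")
print(f"Leakage at LNA output: {lna_output_dbm} dBm")
```

The same function makes it easy to see how each additional dB of cancellation at the RX input relaxes the level presented to the mixer by the same amount.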
The challenges associated with TX leakage signals in FDD systems are exacerbated by future applications such as 5th generation wireless standards, cognitive, and software-defined radios where the duplex band would ideally be kept to a minimum, to improve spectral efficiency. To address these interference issues, commercial FDD transceivers [14], [15] often use an off-chip surface acoustic wave (SAW) filter between the LNA and down-converter to further suppress the TX leakage, reducing the possibility of second order intermodulation distortion in the mixer. However, these band-specific discrete SAW filters [16], [17] prohibit highly programmable broadband transceiver solutions and increase the radio cost and power consumption. Recent efforts have focused on improving RX selectivity in the presence of a self-interfering TX, without the use of off-chip filters [12], [13], [18]–[21]. Some have taken the concept of improving spectral efficiency to the extreme by simultaneously transmitting and receiving, using the same carrier frequency for both the RX and TX [2]–[4]; this is often referred to as a full-duplex system. However, depending on the commercial application, up to 120dB (cellular) of TX self-interference cancellation would be required to share the transmit and receive spectrum.
2.2 State-of-the-art Self-Interference Cancellation Techniques

Numerous efforts have explored techniques to attenuate the effect of TX leakage signals in the RX signal path. These efforts have been mainly focused on eliminating the need for the front-end RF SAW filter. Prior art can be categorized as either attempting to perform cancellation or filtering of the TX signal through the use of active circuits.

A feed-forward canceller [13], shown in Fig. 2-2(a), samples the reference signal from the TX output using a polyphase filter and injects an amplitude-adjusted and phase-rotated signal into the LNA output to cancel the leakage signal. A 30dB coupler is connected between the vector signal generator and the proposed TX leakage canceller (TXLC), which consists mainly of an active Gilbert-cell-based VGA and achieves a minimum cancellation of 22.5dB using a -25dBm blocker, with a 0.44dB RX noise figure (NF) degradation. In practice, a high output power PA (+27dBm) would be connected directly to the TXLC, which places an enormous linearity demand on the VGA. Also, in [13] the canceller is attached to the output of the LNA transconductance stage, which means the LNA must tolerate a strong self-interference signal. This places a high demand on the LNA linearity and P_{1dB} performance, which would likely require burning considerably more power in the LNA.
Fig. 2-2 Active TX leakage suppression using: (a) feed-forward (FF) techniques, (b) two-point FF technique, with cancellation at both the LNA input & output, (c) a feedback (FB) loop that incorporates the RX down-converter, (d) a separate feedback loop between the LNA and the RX down-converter.

In [19], [22], researchers have proposed an active low-noise two-point cancellation technique which attenuates a 0dBm TX blocker; see Fig. 2-2(b). In this technique, a current-mode noise-cancelling receiver architecture [18], [23] with a single-ended common-source (CS), common-gate (CG) LNA is applied to achieve a wideband input match and low-noise performance. A replica of the TX signal is injected at the gate of a CG device and the drain of another CS device in the LNA. This results in a TX leakage cancellation of greater than 30dB. This two-point cancellation technique utilizes a Cartesian phase rotator to match the phase and amplitude of the signal in the cancellation path to those of the leakage signal. However, similar to the feed-forward technique, an enormous linearity burden is placed on the active phase rotator by potentially high output power TX signals.

Alternate TX leakage suppression techniques apply feedback around a set of frequency translation elements to achieve cancellation [24]–[26]. Compared to the feed-forward methods, which are inherently open-loop systems, feedback TX cancellation techniques are less sensitive to device mismatch effects; however, this is achieved at the expense of introducing noisy elements that consume significant power. A set of alternate feedback architectures for leakage mitigation is shown in Fig. 2-2 (c) and (d). In [25], a feedback loop which incorporates the RX down-conversion path is proposed (Fig. 2-2 (c)), while Fig. 2-2 (d) has a separate rejection loop after the LNA and uses an additional down-converter [26]. In [25], the down-converter and baseband amplifier are re-used in the feedback loop and connected to a high-pass filter and an up-converter, effectively creating an RF channel-select band-pass filter centered around the LO to suppress all the interferers. However, the non-idealities of the baseband amplifier reduce the overall TX leakage suppression. Overall, the extra feedback circuitry (mixers and amplifiers) adds non-negligible power consumption.

Another class of TX leakage suppression techniques attempts to realize a combination of integrated, or in-package, filtering techniques. A front-end on-chip high-Q passive filter using bond wires is presented in [20]; see Fig. 2-3(a). Designing an on-chip bandpass filter at the RF front-end to filter the TX leakage requires extremely sharp filter skirts, which is challenging as an integrated solution due to the lack of high-Q integrated inductors in frequency bands up to 5 GHz. A three-pole differential bandpass filter at 2.14 GHz using bond-wire inductors is presented in [20]. Since the TX and RX are usually separated by less than a hundred megahertz, even with a third-order bandpass filter, the stop-band attenuation is no more than 10dB. Higher-order filters will improve the TX leakage suppression, but will result in a higher pass-band insertion loss due to the finite quality factor of the bond-wire inductors and on-chip capacitors. Another leakage suppression approach, using an active bandpass sink filter, is proposed in [12]. The sink filter is created after the down-conversion mixer, filtering out the down-converted TX signal without affecting the desired baseband signal. The bandpass sink filter, which is made up of another down-conversion passive mixer with a trans-impedance amplifier, creates a low-impedance node at the mixer output at the down-converted TX frequency. In [12], a CDMA-2000 receiver is targeted, operating at 1.96GHz, with the TX leakage offset by 80MHz from the receive band; see Fig. 2-3(b). To improve the receiver selectivity in the absence of the inter-stage RF filter, a sink filter with an impedance $Z_{\text{sink}}$ is desired at the mixer output. $Z_{\text{sink}}$ provides a large impedance of 400-450 Ohm at the down-converted RX frequency and a low impedance of approximately 50 Ohm at the down-converted TX frequency. The resulting system shows a cancellation of 6.5dB with 1.8dB noise figure degradation and an additional 48mW of power consumption. Due to the switch resistance and other parasitics, it becomes difficult to realize $Z_{\text{sink}}$ as an extremely low impedance (close to zero) at the TX down-converted frequency, which reduces the TX-leakage suppression. Moreover, the finite bandwidth of the operational amplifier limits this approach to a narrowband solution. Another approach, an integrated continuous-time LMS adaptive filter, injects a 180° out-of-phase copy of the TX leakage at the LNA output to perform cancellation [27], [28]; see Fig. 2-3(c). The proposed LMS adaptive filter acts as an equalizer and achieves a TX leakage suppression of 28dB, with a 1.3dB noise figure penalty, and requires an additional 0.5mA of current. The suppression is limited by the DC offset in the correlators, the reference signal coupling, and the duplexer group delay.
Fig. 2-3 Active TX leakage filtering techniques using: (a) high-Q passive filter using bond wires, (b) active bandpass sink filter, (c) LMS adaptive filter.
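The LMS principle behind the adaptive-filter canceller of Fig. 2-3(c) can be illustrated with a heavily simplified model. The sketch below is not the continuous-time analog circuit of [27], [28]; it is a discrete-time, single-complex-tap LMS loop under idealized assumptions (flat leakage channel, no desired signal, no noise, no DC offset). The leakage gain, step size `mu` and all names are hypothetical.

```python
import cmath
import math
import random


def lms_cancel(reference, received, mu):
    """Single-tap complex LMS: subtract w * reference from received."""
    w = 0j
    residual = []
    for x, d in zip(reference, received):
        e = d - w * x                  # output after subtracting the estimate
        w += mu * e * x.conjugate()    # LMS weight update (correlator)
        residual.append(e)
    return w, residual


def mean_power(samples):
    return sum(abs(s) ** 2 for s in samples) / len(samples)


random.seed(0)
num_samples = 5000
# Hypothetical leakage path: fixed attenuation and phase rotation.
leak_gain = 0.05 * cmath.exp(1j * 2.0)
# Unit-magnitude, random-phase TX reference samples.
tx_ref = [cmath.exp(1j * random.uniform(0.0, 2.0 * math.pi))
          for _ in range(num_samples)]
# Received signal contains only TX leakage in this idealized model.
rx_in = [leak_gain * x for x in tx_ref]

weight, rx_out = lms_cancel(tx_ref, rx_in, mu=0.05)

power_in = mean_power(rx_in)
power_out = max(mean_power(rx_out[-100:]), 1e-30)  # clamp to avoid log(0)
suppression_db = 10.0 * math.log10(power_in / power_out)

print(f"Converged weight: {weight:.4f} (leakage gain {leak_gain:.4f})")
print(f"Steady-state suppression in this ideal model: {suppression_db:.0f} dB")
```

In this noiseless model the weight converges to the leakage gain almost exactly, so the computed suppression is far larger than anything measured in hardware; the practical limits noted above (correlator DC offset, reference coupling and duplexer group delay) are precisely what this sketch leaves out.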

The aforementioned self-interference cancellation (SIC) techniques are mainly proposed for FDD systems, where the transmitter and receiver are usually separated by a guard band varying from 10-100MHz. Recently, more SIC techniques have been proposed to enable full-duplex radio transmission in [2]–[4], [29]–[33]. An antenna cancellation method is proposed to attenuate the TX self-interference signal before it enters the receiving path [2]; see Fig. 2-4(a). In [2], the transmitting antennas are placed at specific distances from the receiving antenna (d and d + λ/2, where λ is the wavelength) to produce an electromagnetic cancellation. However, the cancellation is extremely sensitive to the distance between the TX and RX antennas. The on-board electromagnetic environment can also significantly impact the amount of cancellation. Even the slightest change in the impedance of one antenna relative to the other will lead to a significant reduction in the amount of leakage cancellation. Moreover, with two or more antennas, the benefits of applying this antenna cancellation technique to FDD/full-duplex systems become unclear, since similar improvements in spectral efficiency could be achieved through the use of less complicated MIMO antennas in a half-duplex system, with less concern about coupling from the self-interfering transmitter. A passive-mixer-first baseband duplexing LNA approach is proposed to enable full-duplex radio in [33]; see Fig. 2-4(b). The concept of a duplexing LNA is to intrinsically copy the transmit signal to its antenna port while rejecting it at its receiver output. The bi-directional transparency of the passive mixer allows the duplexing function to be implemented at baseband rather than at the RF front-end, which saves power. However, the baseband duplexing LNA topology has limited capability in high-TX-power applications. In [33], a baseband full-duplexing LNA chip is demonstrated with a measured TX-RX isolation of 33dB and a maximum TX output power of -17.3dBm. RF frequency-domain equalization is another way to achieve TX self-interference cancellation for full-duplex radio [31], [34]; see Fig. 2-4(c). Wideband self-interference cancellation is achieved using a bank of tunable second-order RF bandpass filters and an N-path $G_m – C$ filter. The proposed canceler [31], [34] achieves 20dB cancellation over 25MHz bandwidth. However, this canceller occupies a large silicon area (more than 3mm$^2$, as estimated from the chip die photo) and consumes 44-91mW. In [35], [36], a passive vector-modulator down-converter is proposed to attenuate the TX leakage signal for full-duplex radio; see Fig. 2-4(d). This vector-modulator down-conversion self-interference cancellation method achieves a cancellation of 27dB over 24MHz bandwidth. However, the RX noise figure degradation due to the proposed self-interference cancellation circuitry is 4-6dB.
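The sensitivity of antenna cancellation to placement and frequency can be illustrated with a two-path phasor model. The sketch below is a simplified illustration, not the analysis of [2]: it assumes equal signal amplitudes from both transmit paths at the receive antenna, free-space propagation, and an extra path length of exactly λ/2 at the design frequency. The 2.4GHz carrier, 20MHz offset and 1mm placement error are illustrative values.

```python
import cmath
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def antenna_cancellation_db(freq_hz, design_freq_hz, placement_error_m=0.0):
    """Cancellation (dB, relative to one path) for paths of length d and d + lambda/2."""
    half_wavelength_m = SPEED_OF_LIGHT_M_PER_S / design_freq_hz / 2.0
    extra_path_m = half_wavelength_m + placement_error_m
    # Phase lag of the longer path relative to the shorter one.
    phase_rad = 2.0 * math.pi * freq_hz * extra_path_m / SPEED_OF_LIGHT_M_PER_S
    residual = abs(1.0 + cmath.exp(-1j * phase_rad))
    residual = max(residual, 1e-15)  # clamp: the ideal case cancels completely
    return -20.0 * math.log10(residual)


design_freq_hz = 2.4e9
at_offset_db = antenna_cancellation_db(design_freq_hz + 20e6, design_freq_hz)
with_error_db = antenna_cancellation_db(design_freq_hz, design_freq_hz,
                                        placement_error_m=1e-3)

print(f"20 MHz away from the design frequency: {at_offset_db:.1f} dB")
print(f"1 mm placement error at the design frequency: {with_error_db:.1f} dB")
```

Even in this idealized model, a 1mm placement error or a 20MHz frequency offset limits the cancellation to a few tens of dB, because the extra path is no longer exactly 180° long.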

Fig. 2-4 Self-Interference cancellation techniques proposed in full-duplex system. (a) antenna cancellation technique, (b) passive mixer first full-duplexing LNA, (c) RF frequency-domain equalization, (d) Passive vector modulator down-mixer.
Another option for TX leakage suppression uses integrated duplexers, which have been analyzed in [21], [37]–[39] as three-port devices that provide large isolation between the TX and RX along with concurrent matching for the antenna, TX and RX (Fig. 2-5). The duplexer shows a measured TX-RX isolation of more than 55dB, at the cost of a large silicon area (more than 300µm×300µm in [21]) and a fundamental 3dB TX/RX insertion loss/noise figure penalty.

A circulator is another option for providing TX-to-RX isolation. In [40], [41], integrated magnetic-free N-path-filter-based non-reciprocal circulators are proposed; see Fig. 2-6. Non-reciprocal wave transmission is realized using N-path filters, and the proposed circulator shows 20dB cancellation over 12MHz bandwidth. However, the circulator itself introduces 6.5dB of RX noise figure degradation and consumes 59mW.

Fig. 2-6 TX-to-RX isolation using circulators.

Virtually all of the aforementioned approaches for TX self-interference mitigation utilize an active cancellation path which is problematic from a noise, linearity, and power perspective. In summary, an ideal integrated TX leakage canceller would have the following characteristics:
• Introduce minimal RX noise, and additional power, while occupying minimal silicon area.
• Perform cancellation as close to the RX input as possible, to relax the required performance of subsequent blocks.
• Present negligible loading (high impedance) to the TX/PA output to minimize power loss and any efficiency degradation.
• Present high linearity to reduce any unwanted non-linear components generated by the canceller and coupled to the RX.
• Exhibit minimal sensitivity to packaging and EMI effects.

The remainder of this dissertation explores TX leakage cancellation methods which attempt to embody all of the aforementioned characteristics.

One prior-art TX self-interference cancellation technique, the four-port-transformer-based canceler, will be briefly described in the next section. This idea was developed while the author pursued a master’s degree in the same lab in the Electrical Engineering Department of the University of Washington. For more details about the four-port-transformer-based canceler, refer to [42]–[44].
2.3 Four-Port-Transformer-Based Canceler

Transformers have the added advantage of easily coupling in several signals through the use of additional primaries. This four-port-transformer-based canceler utilizes components from a transformer-based RX matching network to inject the cancellation signal. A conceptual diagram of the proposed transformer-based TX leakage canceller is shown in Fig. 2-7.

Integrated transceivers often use differential signal paths to increase the immunity to unwanted common-mode noise from the substrate and power supplies [45]–[47]. However, commercial antennas supply a single-ended input to the receiver, thus a single-to-differential conversion is
necessary between the antenna and receiver interface. A balun serves a dual purpose of performing a single-ended to differential conversion, and impedance matching at the RX input [48], [49]. The proposed canceller exploits the existing transformer topology to inject a component of the cancellation signal. A second, relatively small primary is added to the center of the transformer to couple a component of the TX signal into the receiver signal path. As such, the canceler becomes a component of the RX matching network with minimal additional area. The signal received on the main primary attached to the RX input (antenna side shown as port-1) travels from port-1 to port-2 with the TX leakage. The TX signal from the canceller network is intentionally coupled into the RX with 180° phase shift, through the use of a significantly smaller primary, shown as port-3. Since the discrete duplexer has a rejection of approximately 50dB at the TX frequency, any practical cancellation technique must accurately adjust to match the amplitude and have the opposite phase of the attenuated leakage signal. This is done with two techniques: First, port-3 and port-4 are weakly coupled with port-1 and port-2. Second, the amplitude of the coupled-TX-signal is precisely controlled by the capacitor values in the cancellation path, while the phase is modified by varying the termination reactance on port-3 and port-4 of the transformer.
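The need for accurate amplitude matching and phase inversion can be quantified with a standard two-phasor model: if the cancellation signal has an amplitude ratio a and a phase error φ (from the ideal 180°) relative to the leakage, the residual power relative to the leakage is 1 + a² − 2a·cos φ. The sketch below evaluates this expression; the function name and the example mismatch values are illustrative and are not measurements from this work.

```python
import math


def cancellation_depth_db(amplitude_error_db, phase_error_deg):
    """Achievable cancellation (dB) for a given amplitude and phase mismatch."""
    a = 10.0 ** (amplitude_error_db / 20.0)   # amplitude ratio, canceller/leakage
    phi = math.radians(phase_error_deg)       # deviation from ideal 180 degrees
    residual_power = 1.0 + a * a - 2.0 * a * math.cos(phi)
    if residual_power <= 0.0:
        return math.inf                       # perfect match cancels completely
    return -10.0 * math.log10(residual_power)


# Illustrative mismatch values (hypothetical, for intuition only).
phase_only_db = cancellation_depth_db(0.0, 1.0)
combined_db = cancellation_depth_db(0.5, 5.0)

print(f"0 dB amplitude error, 1 degree phase error: {phase_only_db:.1f} dB")
print(f"0.5 dB amplitude error, 5 degree phase error: {combined_db:.1f} dB")
```

Under this model, cancellation beyond roughly 20dB already calls for amplitude matching to a fraction of a dB and phase matching to within a few degrees, which is why fine control of both quantities in the cancellation path matters.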

This particular canceler explored methods for TX leakage cancellation in traditional frequency division duplexing (FDD) radios. The presence of a large TX blocker in FDD systems places stringent linearity (IIP₂ and IIP₃) performance demands on the receiver, which can be achieved at the expense of an increased power consumption. This effort uses a passive SIM feed-forward cancellation path with a four-port canceller, and has a minimal noise figure and power consumption penalty. The proposed device was implemented in 40nm TSMC CMOS technology for a WCDMA application. A measured cancellation of greater than 20dB over a 5MHz signal bandwidth is achieved with negligible impact on the overall power consumption and noise figure.
A comparison and performance summary for the proposed integrated canceller network is shown in Table 2-1.

### Table 2-1 Comparison table for proposed four-port-transformer-based canceler

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Architecture</td>
<td>Active filtering</td>
<td>Feed-forward filtering</td>
<td>Active two port cancellation</td>
<td>Transformer coupling</td>
</tr>
<tr>
<td>Technology/V_{DC}</td>
<td>65nm/2V</td>
<td>180nm/1.8V</td>
<td>65nm/?</td>
<td>40nm/1V</td>
</tr>
<tr>
<td>TX Suppres.(dB)</td>
<td>Single Tone</td>
<td>-NA-</td>
<td>25</td>
<td>&gt;30</td>
</tr>
<tr>
<td></td>
<td>Modulated Signal</td>
<td>-NA-</td>
<td>22.5</td>
<td>-NA-</td>
</tr>
<tr>
<td>NF with cancellation circuits active (dB)</td>
<td>4.9</td>
<td>2.84</td>
<td>5</td>
<td>5*</td>
</tr>
<tr>
<td>NF degradation due to leakage cancellation (dB)</td>
<td>1.7</td>
<td>0.44</td>
<td>0.8</td>
<td>&lt;0.1dB or 1.2dB</td>
</tr>
<tr>
<td>RX Gain(dB)</td>
<td>45</td>
<td>25.4</td>
<td>19-34</td>
<td>*18</td>
</tr>
<tr>
<td>RX Power Consumption(mW)</td>
<td>44^1a</td>
<td>16.38^1a</td>
<td>74.6-83^1b</td>
<td>10^1a</td>
</tr>
<tr>
<td>Canceller Power Consumption (mW)</td>
<td>48</td>
<td>18.9</td>
<td>13-72</td>
<td>≈0</td>
</tr>
</tbody>
</table>

^RX Gain/ NF measured with front-end duplexer, which has 1.5-2dB loss.
∧RX includes only the LNA.
^Measured with a 3.84MHz WCDMA signal.
The transformer used for single-ended to differential conversion has an insertion loss of 1.2dB and the added two primaries for TX leakage suppression introduces an additional loss of less than 0.1dB.
^1aPower consumption includes LNA only. ^1bPower consumption includes the entire RX.

Although this transformer-based canceller offers the aforementioned advantages with respect to noise and power consumption, it also has drawbacks. One of the biggest issues is the large silicon area occupied by the transformer. Recent research on more modern CMOS receivers has focused on inductor-less designs [18] to significantly reduce silicon area. With this approach, no single-ended-to-differential transformer is present at the RX input, so introducing an extra transformer solely for self-interference cancellation is not area efficient. The second issue is that the canceller has a narrow measured phase tuning range [43], which is due to the additional parasitic resistance from the interconnection of the tunable capacitors (see Fig. 2-7). Simulation and theory show that the phase tuning range is highly sensitive to the quality factor of the LC tank. Any additional series routing resistance reduces the phase tuning range and leads to reduced self-interference cancellation.

After analyzing the two major drawbacks of the four-port-transformer-based canceller proposed in the author's master's thesis, the author addresses these issues in the PhD work and proposes two new self-interference cancellation methods, which are described in the next two chapters.
3. A FULL-DUPLEX FRONT-END WITH A LOW-NOISE SELF-INTERFERENCE CANCELLATION AND HARMONIC REJECTION POWER AMPLIFIER

3.1 INTRODUCTION

The demand for increased wireless data rates in transceivers found in mobile smart phones and notebook computing applications continues to drive hardware research on low-cost integrated technology solutions which more efficiently use existing spectrum. The RF spectrum, which is commonly accepted as the 1-6GHz band, is completely occupied by existing commercial, military and public service wireless standards. As such, researchers are exploring numerous modulation methods, smart antennas, multiple input multiple output (MIMO) systems and other techniques for more efficient use of this highly congested spectrum [29], [50].

Frequency division duplexing (FDD) is one method to enhance wireless network capacity by allowing a single user to simultaneously transmit and receive using different carrier frequencies. However, a key challenge in designing FDD radios relates to the self-interference presented to the receiver (RX), by the transmitter (TX). As an example, consider wideband code division multiple access (WCDMA), a well-known half-duplex FDD standard with a guard band varying from 10MHz to 100MHz. Typically, a discrete front-end duplex filter provides up to 50dB of isolation that suppresses the transmitted signal which appears (“leaks”) to the RX input. However, even with a discrete duplex filter, some of the TX signal (particularly at maximum TX output power) will leak into the RX signal path, thus degrading the RX carrier-to-interference (C/I) ratio [42], [51].
Full-duplex communication allows a single user to both transmit and receive, simultaneously, using the same frequency. This has the primary advantage of freeing spectrum which would otherwise be exclusively dedicated to either transmitting or receiving. Allowing both TX and RX operations at the same time, using the same channel frequency, will to the first order double the spectral efficiency. However, the undesired effect of TX-to-RX coupling will be exacerbated when using a full-duplex communication protocol as compared to FDD systems. Further challenging full duplex radios is the fact that same channel operation removes the possibility of selectively filtering out the TX self-interference using an off-chip duplex filter which provides as much as 50dB suppression/isolation. A full-duplex system necessitates the use of either a circulator or two antennas, which gives a maximum TX-to-RX isolation of 20-30dB [2]–[4]. Although numerous efforts have introduced methods to perform feedforward cancellation, there are still challenges with respect to noise, bandwidth, and linearity performance to realize practical, full duplex transceivers [19], [30]–[34], [36], [42], [51], [52].

Numerous recent efforts have explored methods to attenuate a TX leakage signal that arises from FDD and full-duplex radios; such techniques can be broadly categorized as either active or passive. Active cancellers, which utilize transistors in the canceller signal path [13], [19], [22], [25], [26], [28] are typically problematic from a noise and power perspective, while cancellers based on passive components [42], [51] characteristically consume more area and are challenged by a narrow phase tuning range. Other self-interference cancellation methods which have been explored include an integrated duplexer [21], [37], antenna cancellation [2], active balun cancellation [3], a mixer-first duplexing LNA [33], and vector modulator based cancellation [30], [36].
Switch-mode PAs are widely used in RF systems to achieve high efficiency in wireless TX applications. The non-linear nature of a switch-mode PA inherently generates TX output harmonic spurious content, which requires discrete off-chip components to filter out unwanted spectral components. On-chip conduction angle calibration [53], [54] and passive filters [55] have been reported for harmonic suppression. However, these methods either significantly increase the TX insertion loss or require extra area to accommodate additional components.

This work explores an integrated low-noise, feedforward transmitter self-interference mitigation (SIM) technique, and a switch-mode power amplifier method which minimizes unwanted TX spurious output, see Fig. 3-1. The self-interference cancellation is performed at the RX input, which has the primary advantage of relaxing the required performance of the RX front-end with respect to linearity, dynamic range and the phase noise of any oscillator associated with the RX signal path. Similar to the proposed SIM technique, the HRPA method provides additional suppression of unwanted spectral components to relax the front-end filtering requirements and potentially free up valuable spectrum, ultimately improving the spectral efficiency. This chapter describes an example prototype device which was designed and fabricated for full-duplex operation in the 2.4GHz band. The aforementioned techniques are used to demonstrate interference-immune full-duplex systems with a low-spurious emission transmitter.

The chapter is organized as follows. Section 3.2 analyzes the system level requirements of a full-duplex radio applicable for Bluetooth applications. A detailed description of a polyphase filter (PPF) based SIM technique and a PA output harmonic rejection method is given in Section 3.3. Circuit implementation details of the low noise self-interference mitigation system with HRPA implemented in a 40nm CMOS process are given in Section 3.4 and measurement results of the prototype device are provided in Section 3.5. Lastly, a few concluding comments are given in Section 3.6.
3.2 **System Analysis of an Example Full-Duplex Radio**

To enable full-duplex operation, a transceiver must significantly attenuate the self-interference signal that results from the TX output coupling through a “leaky” media to the RX input. This leakage path can be described by a channel that contains either a circulator or two-antennas [4].

Throughout this manuscript, a low output power wireless system, such as the Bluetooth standard, will serve to provide context on the required performance of a full duplex transceiver. Thus, this design assumes a PA which delivers 0dBm output power, with a signal bandwidth of 1MHz and an RX sensitivity as low as -80dBm when same-channel full-duplex operation is employed. With this scenario, the TX leakage should be attenuated in the RX analog front end (AFE) and digital baseband by more than 90dB, in order to hold the NF degradation to less than 0.5dB, assuming an RX front-end with a 5dB noise figure, see Fig. 3-1.

Assuming a digital canceler at the radio back end provides a TX SI attenuation of 40dB [56] and further assuming a 20dB isolation is provided in the front-end configuration, which is either a circulator or two-antennas, an additional 30dB cancellation is required in the analog/RF domain, see Fig. 3-1.

In summary, any full-duplex transceiver design for a radio with Bluetooth-like performance with respect to power output, bandwidth, and sensitivity must have more than 90dB TX leakage suppression in total. This includes 20dB of isolation from the leakage path between the TX and RX, at least 30dB cancellation in the RF/analog domain, and no less than 40dB cancellation in the digital backend. A more thorough description of the proposed cancellation network is provided in the next section.
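The suppression budget above is simple decibel arithmetic, and a short script makes the split explicit. This is a minimal sketch using only the figures quoted in this section (0dBm TX power, -80dBm sensitivity, 90dB total, 20dB isolation, 40dB digital cancellation); the variable names are illustrative and do not come from the design itself.

```python
# Self-interference suppression budget for the Bluetooth-like example.
# All numbers are taken from the text; the variable names are illustrative.
total_required_db = 90.0      # total TX leakage suppression target
frontend_isolation_db = 20.0  # circulator or two-antenna isolation
digital_cancel_db = 40.0      # digital back-end cancellation

# What is left for the analog/RF canceler to provide.
analog_cancel_db = total_required_db - frontend_isolation_db - digital_cancel_db

tx_power_dbm = 0.0
rx_sensitivity_dbm = -80.0
residual_dbm = tx_power_dbm - total_required_db    # leakage left after all stages
margin_db = rx_sensitivity_dbm - residual_dbm      # how far the residual sits below sensitivity

print(f"analog/RF cancellation needed: {analog_cancel_db:.0f} dB")
print(f"residual leakage: {residual_dbm:.0f} dBm, {margin_db:.0f} dB below sensitivity")
```

With these inputs the residual leakage sits 10dB below the assumed sensitivity level, which is the margin implied by the 90dB target.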
3.3 Proposed Self-Interference Mitigation and Harmonic Rejection Technique

This section provides an architectural-level description with an analysis of the self-interference mitigation technique and Harmonic-Rejection PA (HRPA).

3.3.1 Proposed Self-Interference Mitigation Approach

Fig. 3-2 Circuit schematic of the proposed Polyphase Filter (PPF) based Cartesian canceler.

The cancellation circuitry utilized on the prototype chip is based on a polyphase filter (PPF) followed by an output tunable $G_m$ stage, to achieve a 360° phase rotation which facilitates tracking of the TX signal, independent of the leakage medium. The PPF-canceler input is connected directly to the differential PA output matching network interface (see Fig. 3-2), and generates a four-phase output with approximately equal amplitude and a phase difference of 0°, 90°, 180° and 270°. The canceler is digitally programmed by the baseband, to select two of the four phases at the PPF output. The gain of the two selected paths is modulated to provide the proper weighting of each of the two phases using digitally-tuned variable gain (\(G_m\)) stages. The two weighted signal paths are then combined at the output of the gain stage. This approach effectively realizes a Cartesian phase rotator in the cancellation signal path by generating a vector signal copy of the TX leakage signal. The phase rotator matches both the amplitude and phase of the TX leakage signal. The output of the phase rotator is injected into the RX input (at the LNA input).
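The phase-selection and weighting steps can be illustrated with a small phasor model. The sketch below is a behavioral illustration only, not the chip's control algorithm: it assumes ideal PPF phases, a linear 6-bit weight per branch, and a hypothetical leakage phasor (amplitude 0.6 of full scale at 200°). The function names `cartesian_weights` and `synthesize` are invented for this example.

```python
import cmath
import math

# Unit phasors for the four PPF outputs (0, 90, 180, 270 degrees), assumed ideal.
PPF_PHASES = [cmath.exp(1j * math.radians(d)) for d in (0, 90, 180, 270)]

def cartesian_weights(target: complex, bits: int = 6, full_scale: float = 1.0):
    """Pick two PPF phases and quantized weights that approximate `target`.

    The in-phase branch uses the 0 or 180 degree output and the quadrature
    branch uses the 90 or 270 degree output, chosen by the sign of each part.
    """
    levels = 2 ** bits - 1
    idx_i = 0 if target.real >= 0 else 2
    idx_q = 1 if target.imag >= 0 else 3
    code_i = min(levels, round(abs(target.real) / full_scale * levels))
    code_q = min(levels, round(abs(target.imag) / full_scale * levels))
    return idx_i, code_i, idx_q, code_q

def synthesize(idx_i, code_i, idx_q, code_q, bits: int = 6, full_scale: float = 1.0):
    """Rebuild the canceler output phasor from the selected phases and codes."""
    levels = 2 ** bits - 1
    return (code_i * PPF_PHASES[idx_i] + code_q * PPF_PHASES[idx_q]) * full_scale / levels

# Hypothetical leakage phasor: amplitude 0.6 of full scale at 200 degrees.
leak = 0.6 * cmath.exp(1j * math.radians(200))
settings = cartesian_weights(-leak)        # the canceler output must oppose the leakage
residual = leak + synthesize(*settings)
suppression_db = -20 * math.log10(abs(residual) / abs(leak))
print(settings, f"{suppression_db:.1f} dB")
```

In this idealized model the achievable suppression is set purely by weight quantization; in hardware, PPF phase and amplitude errors and path delay would limit it further.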

A challenge associated with any feedforward canceler coupled to the receiver input, is the need to minimize the injection of in-band noise generated by the canceler. As described in the subsequent section, the noise performance of the canceler will improve with lower gain in the canceler output \(G_m\) stage.

(1) Noise

An ideal canceler must have a negligible impact on the RX noise figure when enabled [42], [51]. Thus, the equivalent noise resistance, looking back into the canceler from the perspective of the RX signal path at the point of injection, must be much higher than the 50Ω antenna radiation impedance.
Fig. 3-3 Noise model and simulation results of the proposed canceler. (a) schematic model of the conceptual diagram, (b) simulation results of the receiver noise figure, canceler output equivalent noise resistance vs. canceler gain settings.
A potentially significant source of noise in the canceler signal path is the thermal noise generated by the resistors used in the PPF. However, the noise of these resistors is effectively attenuated by the canceler’s low-gain \( G_m \) output stage, by at least 20dB. It is somewhat counterintuitive that the noise performance of the cancellation path benefits from the need for low gain. This is due to the fact that the received signal of interest never passes along the canceler signal path. The dominant noise contribution of the canceler comes from the canceler’s output \( G_m \) stage which for this implementation is realized with CMOS inverters, similar to what is done in [57]–[60], see Fig. 3-3. A simplified noise model for the proposed canceller, which uses an NMOS device to represent the proposed \( G_m \) stage, is shown in Fig. 3-3 (a). At the frequency of interest, which is 2.4GHz, the dominant noise source of the combined NMOS and PMOS devices is the output thermal noise [61]. The thermal noise of a transistor operating in the saturation region [62] is given in (3.1),

\[
\overline{i_{n}^{2}} = 4kT\gamma g_m
\]

(3.1)

Here, \( \gamma \) and \( g_m \) are the excess noise coefficient and transconductance, respectively. Assuming the antenna and LNA are perfectly matched and have an impedance of \( R_{ANT} = R_{LNA} = R = 50 \ \Omega \), the \( G_m \) cell is loaded by the antenna impedance, \( R_{ANT} \), in parallel with the LNA input impedance, \( R_{LNA} \). The voltage gain of the canceler (A) is shown in (3.2), see Fig. 3-3.

\[
A = g_m \cdot R/2
\]  

(3.2)

The total RX NF including the contribution of the canceller is given in (3.3),

\[
NF_{\text{Total}} = 10 \cdot \log_{10}(NF_{RX} + 2\gamma A)
\]  

(3.3)

Here, \( NF_{RX} \) is the noise figure of just the receiver without any added noise from the canceler. Fig. 3-3(b) shows the RX NF simulation results including the canceller versus the canceler gain. For the purposes of simulation, a typical 4dB RX NF and a noise coefficient \( \gamma \) of 1 are assumed.
As can be inferred from Fig. 3-3(b), the RX NF degradation due to the addition of the canceler is a function of the canceler path gain. This is intuitive: as the gain of the cancellation path increases, the $g_{m}$ of its common-source output stage increases, and so does the drain current noise injected into the RX signal path. Since the TX-to-RX media (circulator or two-antennas) will attenuate the leakage signal by at least 20dB, the maximum gain of the canceler is no greater than -20dB by design. Under this worst-case condition where the isolation between the transmitter and receiver is only 20dB, the NF degradation due to the cancellation path will be 0.4dB, see Fig. 3-3(b). This is equivalent to a noise resistance of at least 250 $\Omega$, looking back into the canceler from the RX point of injection, which is much higher than the 50$\Omega$ radiation impedance of the antenna. Intuitively, this is similar to minimizing the $G_{m}$ of the active load current sources used in a differential pair [63]. In both the canceler and the active load, the device is outside of the main signal path, so its $G_{m}$ should be minimized. Non-minimum length devices (70nm) are used for the transistors in the $G_{m}$ stage, predominantly for two reasons. First, the excess noise coefficient $\gamma$ gets smaller for a longer channel-length device [62]. Second, a longer channel length leads to a higher device output impedance, which minimizes the impact on the RX input matching network; for this device, the output resistance is nominally 500$\Omega$, an order of magnitude higher than a 50$\Omega$ front-end.
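Equations (3.2) and (3.3) can be evaluated directly to see how the NF penalty scales with canceler gain. The following sketch uses the values stated in the text (4dB RX NF, $\gamma = 1$, $R = 50\Omega$). It treats $NF_{RX}$ in (3.3) as a linear noise factor, and it takes the equivalent noise resistance to be $1/(\gamma g_m)$, which is an assumption made here to reproduce the 250$\Omega$ figure. Function names are illustrative.

```python
import math

def total_nf_db(nf_rx_db: float, canceler_gain_db: float, gamma: float = 1.0) -> float:
    """Evaluate (3.3): RX noise factor plus the canceler term 2*gamma*A, in dB."""
    f_rx = 10 ** (nf_rx_db / 10)         # receiver noise factor (linear)
    a = 10 ** (canceler_gain_db / 20)    # canceler voltage gain A (linear)
    return 10 * math.log10(f_rx + 2 * gamma * a)

def noise_resistance_ohm(canceler_gain_db: float, gamma: float = 1.0, r: float = 50.0) -> float:
    """Assumed equivalent noise resistance 1/(gamma*g_m), with g_m = 2A/R from (3.2)."""
    a = 10 ** (canceler_gain_db / 20)
    g_m = 2 * a / r
    return 1 / (gamma * g_m)

NF_RX_DB = 4.0
for gain_db in (-40, -35, -30, -25, -20):
    degradation = total_nf_db(NF_RX_DB, gain_db) - NF_RX_DB
    print(f"{gain_db:>4} dB gain: +{degradation:.2f} dB NF, "
          f"R_n = {noise_resistance_ohm(gain_db):.0f} ohm")
```

At the maximum gain of -20dB the formula gives a degradation of roughly a third of a dB, in line with the approximately 0.4dB read from Fig. 3-3(b), and the degradation shrinks quickly as the gain is reduced.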

(2) PPF Sizing

The proposed canceler is made up of passive components (PPF) followed by active devices (a set of common source amplifiers to realize a $G_{m}$ stage), see Fig. 3-4(a). Any insertion loss introduced by the PPF in the cancellation path must be compensated for by applying more gain in the $G_{m}$ output stage, which has the adverse effect of injecting more noise into the RX signal path; hence the importance of minimizing the insertion loss of the PPF. Conversely, an ideal canceler should present as high an input impedance (maximize the PPF input impedance) to the transmitter signal path as possible, to help minimize any loading effects on the PA output [51]. Thus, there exists a tradeoff between the input impedance and the insertion loss of the PPF.

Fig. 3-4 Effect of insertion loss using the proposed canceler. (a) schematic model of the PPF with loading capacitor $C_L$, (b) simulation results of the PA efficiency degradation versus PPF insertion loss.
The input impedance of the PPF is defined by the resistors and capacitors which realize the PPF [64]. Noteworthy is the fact that the capacitance in the PPF can be absorbed, and resonated with the PA output matching network, see Fig. 3-4(a). Thus, using larger value resistors in the PPF will effectively present a higher real-part impedance looking into the canceler, at the resonant frequency of the PA output matching network, thus minimizing loss and improving the PA efficiency.

The insertion loss of the PPF is a function of the load impedance, which is mainly contributed by the gate-to-source capacitance ($C_{GS}$) and any layout-related parasitic capacitance. A simplified model of the PPF with the loading emulated using a capacitor, $C_L$, is shown in Fig. 3-4(a) and the transfer function is given in (3.4),

$$|H(j\omega)| = |V_0(j\omega) - V_{180}(j\omega)| = \frac{\sqrt{1 + (\frac{\omega}{\omega_0})^2}}{\sqrt{1 + (\frac{\omega}{\omega_0} \cdot (1 + \frac{C_L}{C}))^2}}$$

(3.4)

Here, $\omega_0$ is the carrier frequency, C is the PPF capacitor value, and $C_L$ is the load capacitance, see Fig. 3-4(a). At the frequency of interest, $\omega \approx \omega_0$, (3.4) can be simplified to,

$$|H(j\omega)| \approx \frac{\sqrt{2}}{\sqrt{1 + (1 + \frac{C_L}{C})^2}}$$

(3.5)

Since $\omega_0 = \frac{1}{RC}$, with R representing the resistor value of the PPF, (3.5) can be re-written as,

$$|H(j\omega)| \approx \frac{\sqrt{2}}{\sqrt{1 + (1 + \omega_0 R C_L)^2}}$$

(3.6)

As discussed above, larger resistors minimize any PA efficiency degradation due to the effects of loading the canceler. Assuming the load impedance presented to the PA by the canceller is $R_{L,PA}$, the PA efficiency degradation ($\eta'$) is described as,
\[ \eta' \approx \eta \cdot \left(1 - \left(\frac{R}{R + R_{L,PA}}\right)^2\right) \]  

(3.7)

Here, \( \eta \) is the standalone PA efficiency without the canceller. Solving equation (3.6) using (3.7) gives,

\[ |H(j\omega)| \approx \frac{\sqrt{2}}{\sqrt{1 + \left(1 + \omega_0 C_L \cdot \frac{R_{L,PA}}{\frac{1}{\sqrt{1 - \eta'/\eta}} - 1}\right)^2}} \]

(3.8)

The simulation results of the PA efficiency degradation as a function of PPF insertion loss are shown in Fig. 3-4(b), where \( \eta = 40\% \), \( C_L = 50\, fF \) and \( f_0 = 2.4\, GHz \). The tradeoff between the PA efficiency degradation and PPF insertion loss can be observed in Fig. 3-4(b), and is mainly influenced by the resistor and capacitor sizing of the PPF. A larger input impedance helps to reduce the PA efficiency degradation, but will introduce more insertion loss in the PPF. This particular chip targets a PA output power of 15dBm, which requires an optimal load impedance \( (r_{opt}) \) of approximately 40\( \Omega \) when using a 1.2V supply voltage. In the actual design, the PPF utilizes 900\( \Omega \) poly resistors and 74fF metal-to-metal capacitors. Thus, as desired, the real-part impedance looking into the canceller is significantly higher than the \( r_{opt} \) required for a 15dBm output power with a 1.2V \( V_{dd} \).
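The tradeoff between PPF insertion loss and PA efficiency degradation can be explored numerically from (3.5) and (3.7). The sketch below uses $\eta = 40\%$, $C_L = 50$fF and $f_0 = 2.4$GHz from the text. It substitutes $C_L/C = \omega_0 R C_L$ (from $\omega_0 = 1/RC$) and, as an assumption made only for this example, sets $R_{L,PA}$ to the 40$\Omega$ optimal load. It is not a reproduction of the simulation behind Fig. 3-4(b).

```python
import math

def ppf_gain(r_ohm: float, c_load_f: float, f0_hz: float) -> float:
    """|H| from (3.5), using C_L/C = omega_0 * R * C_L since omega_0 = 1/(R*C)."""
    x = 2 * math.pi * f0_hz * r_ohm * c_load_f
    return math.sqrt(2) / math.sqrt(1 + (1 + x) ** 2)

def efficiency_degradation(r_ohm: float, r_l_pa_ohm: float, eta: float) -> float:
    """eta' from (3.7): PA efficiency degradation caused by canceler loading."""
    return eta * (1 - (r_ohm / (r_ohm + r_l_pa_ohm)) ** 2)

ETA = 0.40        # standalone PA efficiency (from the text)
C_L = 50e-15      # PPF load capacitance (from the text)
F0 = 2.4e9        # operating frequency (from the text)
R_L_PA = 40.0     # assumption: R_L,PA taken equal to the 40 ohm optimal load

sweep = []
for r in (300.0, 600.0, 900.0, 1200.0):
    il_db = -20 * math.log10(ppf_gain(r, C_L, F0))   # PPF insertion loss in dB
    deg = efficiency_degradation(r, R_L_PA, ETA)     # efficiency degradation (fraction)
    sweep.append((r, il_db, deg))
    print(f"R = {r:6.0f} ohm: insertion loss {il_db:.2f} dB, degradation {100 * deg:.1f}%")
```

Under these assumptions, increasing the PPF resistance raises the insertion loss while lowering the efficiency penalty, which is the tradeoff described above.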

(3) Canceler Summary

The output of the canceler, which injects a current signal into the receiver input, uses an inverter-based CMOS transconductance \( (G_m) \) stage. The NMOS and PMOS devices realize two common source amplifiers. To first order, the second and third order distortion generated by the \( G_m \) stage will cancel because of the use of complementary devices [57], [58]. The canceler has a measured in-band input \( P_{-1dB} \) of +12dBm. A two-tone test was performed to measure the third order intermodulation product by applying CW tones at 100 kHz offset from a 2.4 GHz center frequency. The measured in-band IIP3 of the canceler is +15dBm. The linearity of the canceler could be further improved by 1) bootstrapping the switches between the polyphase filter and the variable Gm stages, see Fig. 3-2; 2) providing additional attenuation in the canceller signal chain prior to the Gm gain stages; this would be done at the expense of modestly degrading the RX noise figure. The canceller has simulated input and output impedances of 900Ω and 500Ω at 2.4GHz, which reduce the impact on the PA output and LNA input matching networks, respectively. The gain of the canceler is tunable using individual stages with 6-bits of resolution to cover a range from -40dB to -20dB. The maximum power consumption of the canceler is less than 0.3mW when the canceler is operated in the highest gain setting.

To reduce any layout-related asymmetry, an L-compensated approach was employed for the PPF similar to [65], [66].

3.3.2 Harmonic-Rejection Power Amplifier

Wireless transmitters employing digital power amplifiers (PA) usually require a filter at the output to suppress higher order harmonics. Such PA topologies require the use of discrete external components which utilize valuable platform (board) area and often introduce an insertion loss [53]–[55]. This work proposes implementing a harmonic-rejection function using the driver stages of the PA to effectively realize a stair-step like function, as was similarly done in [67] for a mixer, see Fig. 3-5. An alternate single-ended version of this HRPA, which cancels the second harmonic, was published in [68].

There are many embodiments of the harmonic rejection PA, with various combinations of the weighting associated with either the driver stages or main output stage. This chip targets the reduction of the 3rd and 5th harmonics produced by the PA. The PA outputs combine three parallel driver stages with a relative phase difference of 0°, 45°, 90°, and gain ratios of $1, \sqrt{2}, 1$, respectively. This approach obviates the need for discrete external components to filter the 3rd and 5th harmonics, for moderate output power applications (e.g. Bluetooth). The proposed topology is appropriate for any class of digital PA, see Fig. 3-5. The harmonic cancellation is mainly dependent on the phase and gain matching between three stages, as with any cancellation method including single-sideband suppression [69], [70].
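The cancellation of the 3rd and 5th harmonics by the $1, \sqrt{2}, 1$ weighting can be checked with a short phasor calculation. The sketch relies on the fact that shifting a square wave by a phase $\phi$ at the fundamental rotates its n-th harmonic by $n\phi$; it assumes ideal 50% duty-cycle square waves and perfect matching. The function name is illustrative.

```python
import cmath
import math

def harmonic_sum(n, phases_deg=(0, 45, 90), weights=(1, math.sqrt(2), 1)):
    """Phasor sum of the n-th harmonic of three phase-shifted, weighted square waves.

    A shift of phi at the fundamental rotates the n-th harmonic by n*phi; the
    common 1/n amplitude factor of a square wave is omitted since it cancels
    when comparing paths.
    """
    return sum(w * cmath.exp(-1j * n * math.radians(p))
               for w, p in zip(weights, phases_deg))

for n in (1, 3, 5, 7):
    print(f"harmonic {n}: |sum| = {abs(harmonic_sum(n)):.6f}")
```

The fundamental adds constructively while the 3rd and 5th harmonics sum to zero. The 7th harmonic is not rejected by this weighting, consistent with the chip targeting only the 3rd and 5th.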

![Fig. 3-5 Architecture of the proposed harmonic-rejection power amplifier](image)

The HRPA topology combines three parallel signal paths at the PA output, thus necessitating the use of a robust method for power combining between the outputs of different driver stages, while minimizing any efficiency degradation. A switched-capacitor PA (SCPA) is a convenient approach to efficiently combine the signal power using capacitors in the current domain. The SCPA also provides isolation between three parallel signal paths which helps minimize the effect of mutual loading between driver stages, as the drivers turn on and off with different phases, to achieve a harmonic-reject function.

3.4 **Circuit Implementation of the Cartesian Canceler**

As a demonstration of the proposed full-duplex canceller system with the HRPA, a transceiver front-end was designed to have performance similar to a Bluetooth radio with respect to PA power output, bandwidth, signal modulation, and sensitivity, with the additional feature of enabling full-duplex communication. A block diagram of all the components realized on this chip is shown in Fig. 3-6.

![Fig. 3-6 Detailed block diagram of the full-duplex front-end with low-noise same-channel self-interference cancellation and harmonic-rejection power amplifier.](image)
The receiver realizes a noise-cancelling current-mode architecture including an RF current mode LNA which drives a network of passive mixers, followed by a baseband trans-impedance amplifier (TIA) and re-combination circuitry as done in [19], [22], which helps to improve the out-of-band linearity. By properly weighting the re-combination circuitry, the entire RX front-end realizes a noise-cancelling function. The desired signal is first converted to the current domain at the LNA output, and remains in the current domain until it is converted to the voltage domain with a transimpedance amplifier (TIA) after the mixer. The TIA acts as a low pass filter to provide attenuation by as much as 20dB for a 10MHz out-of-band blocker, which helps to improve the out-of-band linearity.

Integrated on the same chip as the receiver is a differential-input-differential-output switched-capacitor PA (SCPA) which implements a harmonic-reject function. The PA was made fully differential to reduce any unwanted TX-to-RX coupling. The PA matching network consists of a differential-to-single-ended transformer, designed to drive a 50Ω load associated with either a circulator or antenna (can be configured for both), see Fig. 3-6. A nominal gain ratio of $1, \sqrt{2}, 1$ will achieve the maximum cancellation. For this implementation, three signal paths, which include a multi-stage pre-driver and output driver stage, were realized using multiple unit cells in each path with relative path-to-path weighting of 12/17/12 [59], [71]. A digitally controlled phase shifter generates the nominal phase difference of $0^\circ, 45^\circ$ and $90^\circ$. The phase shifter uses a current starving topology, which is similar to [72] and has a measured LSB of 0.6ps. This is equivalent to 0.52$^\circ$ resolution at 2.4GHz.
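Two numbers in this paragraph lend themselves to a quick check: the 12/17/12 unit-cell approximation of $1, \sqrt{2}, 1$, and the conversion of the 0.6ps phase-shifter LSB into degrees at 2.4GHz. The sketch below uses the same ideal square-wave phasor model as before. The one-LSB phase error applied to the middle path is a hypothetical illustration, not a measured condition, and the function names are invented for this example.

```python
import cmath
import math

def harmonic_sum(n, weights, phases_deg):
    """Phasor sum of the n-th harmonic of weighted, phase-shifted square waves."""
    return sum(w * cmath.exp(-1j * n * math.radians(p))
               for w, p in zip(weights, phases_deg))

def extra_rejection_db(n, weights, phases_deg=(0.0, 45.0, 90.0)):
    """Suppression of harmonic n relative to the fundamental, beyond a single path (dB)."""
    s1 = abs(harmonic_sum(1, weights, phases_deg))
    sn = abs(harmonic_sum(n, weights, phases_deg))
    return 20 * math.log10(s1 / sn)

UNIT_CELLS = (12, 17, 12)                 # path weighting from the text
lsb_deg = 0.6e-12 * 2.4e9 * 360.0         # 0.6 ps expressed in degrees at 2.4 GHz

ideal_phase_h3 = extra_rejection_db(3, UNIT_CELLS)
# Hypothetical: the 45 degree path is off by exactly one phase-shifter LSB.
one_lsb_h3 = extra_rejection_db(3, UNIT_CELLS, (0.0, 45.0 + lsb_deg, 90.0))

print(f"phase LSB: {lsb_deg:.4f} degrees")
print(f"3rd-harmonic rejection, ideal phases: {ideal_phase_h3:.1f} dB")
print(f"3rd-harmonic rejection, one-LSB phase error: {one_lsb_h3:.1f} dB")
```

In this model the integer weighting alone leaves ample rejection, while a single-LSB phase error already costs tens of dB, which illustrates why phase matching between the three paths dominates the achievable harmonic rejection.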

The canceller is implemented to achieve maximum self-interference mitigation using a one-stage PPF and tuning the gain weights using 6-bit binary $G_m$ cells. A low power integrated divide-by-two circuit generates a four phase 25% duty cycle LO to drive the passive mixers [73].
The chip was fabricated in a 40 nm 6-metal-layer TSMC CMOS process and occupies an area of 1.665 mm×1.17 mm which includes the bond pads (Fig. 3-7). The chip was wire-bonded directly to the test-board using chip-on-board packaging. The RX input and TX output air interface can be either a circulator or two antennas on the test board, with the option of changing between the two configurations for testing purposes.

Any unwanted coupling from the TX to the RX will degrade the self-interference cancellation performance. This chip utilizes four techniques to improve the TX and RX isolation: 1) The TX and RX have separate ground planes on chip and are shorted together through a ferrite bead on the test-board; 2) The layout of the TX and RX are perpendicular with respect to each other, to reduce any unwanted EM coupling; 3) Deep-N well was utilized wherever possible in the PA and LNA; 4) A >50µm substrate contact shield is used between the TX and RX. The measured isolation between the TX and RX is larger than 65dB.

3.5 Measurement Results

![Fig. 3-8 Full-duplex radio (AFE) measurement setup.](image)

To characterize the validity of the proposed self-interference cancellation and harmonic rejection PA, standard front-end measurements with respect to the RX gain, noise figure and TX output spectrum were taken with the cancellation/rejection systems both enabled and disabled; this measurement setup is shown in Fig. 3-8. All self-interference cancellation measurements were performed using two leakage media, a circulator (MECA Electronics: CS-2.500) and two antennas (Linx Technologies: ANT-2.4-CW-RCS). The loss associated with each of these two media was measured using an Agilent Network Analyzer (N5247A), and both were found to have an isolation of 20-30dB, see Fig. 3-9.

Fig. 3-9 Measured isolation of two leakage media using either: 1) a circulator or 2) two antennas.

The standalone RX gain, linearity, noise figure, and TX output spectrum were measured using an Agilent ESG vector signal generator (E4438C), an MXG microwave analog signal generator (N5183A) and a PSA spectrum analyzer (E4440A). The TX leakage cancellation was measured using a vector signal generator which supplied both a CW and a Bluetooth modulated signal to the PA input. The spectrum analyzer was connected to the RX output and measured the cancellation in the frequency domain with the canceller enabled/disabled. A similar test setup was used to measure the harmonic rejection at the PA output. A laptop with an Aardvark I2C/SPI host adapter (Total Phase, Inc.) provided digital control. All measurements were taken on eight separate chips.

3.5.1 Standalone RX Measurement Results

![Graphs showing receiver performance metrics](image)

**Fig. 3-10** Measured receiver performance. (a) input matching $S_{11}$, (b) gain $S_{21}$, (c) IIP3 versus offset frequency, (d) input-referred $P_{1\text{dB}}$ versus offset frequency.

The RX operates from 1 to 3 GHz with a wideband input matching ($S_{11}$ less than $-10$ dB, see Fig. 3-10 (a)), a gain of 45 dB, and a 3 dB channel bandwidth of 1.5 MHz, see Fig. 3-10 (b). Due to the high linearity associated with the current-mode RX architecture, the measured out-of-band IIP3 is $+18$ dBm with a $+8.5$ dBm out-of-band input $P_{1\text{dB}}$, see Fig. 3-10 (c) and (d). The measured in-band IIP3 and $P_{1\text{dB}}$ of the RX are -25dBm and -35dBm, respectively. The in-band linearity of the RX is limited by the RX gain settings and baseband voltage supply, and could be improved by reducing the RX gain. The remaining RX gain could be provided by highly-linear switched-capacitor gain stages, prior to an ADC, to complete the baseband. The RX consumes 10mA from a 1.1 V supply which includes 4mA for the LNA, 0.7mA for each of the four baseband operational amplifiers and 3mA for the LO dividers.

Fig. 3-11 Measured receiver noise figure performance with the canceller disabled (off).

The RX has a measured double-sideband NF of 4.5 dB across the RX band. A measured RX NF with a fixed LO frequency of 2.4GHz is shown in Fig. 3-11. To measure the noise figure in the presence of a strong out-of-band blocker signal, an RX blocking NF measurement is taken as outlined in the Bluetooth standard [62]. The worst-case blocking performance is measured when a -67dBm desired signal is applied to the frontend, which is 3dB higher than the required RX sensitivity at 2.4 GHz, while a blocker 1MHz away (2.399GHz) with an amplitude of -27dBm is also supplied. Under this condition, the measured RX NF degrades by less than 1dB. The total RX NF with the worst case out-of-band blocker signal (-67dBm, 1MHz offset from the carrier signal) is less than 6dB, which is sufficient for applications similar to the Bluetooth standard [62].

3.5.2 Self-interference Cancellation Measurement Results

To measure the functionality of the canceler, a CW signal was injected at the PA input. If the canceler is disabled, and the PA is set to 0dBm, the RX signal path will be saturated by the leakage signal. Thus, to accurately assess the amount of leakage suppression, the leakage was measured at the output of the receiver with a lower PA output power of -10 dBm. By properly selecting two of the four PPF outputs, and weighting the gain using the tunable $G_m$ stage at the canceler output, an average suppression of 35dB was observed from 2.4GHz to 2.5GHz using either the circulator or two antennas, see Fig. 3-12. These measurements were taken by changing the magnitude and phase of the proposed canceler at a given frequency, to maximize the cancellation.
Fig. 3-12 Measurement results of TX leakage suppression using a single CW signal with two leakage media, a circulator and two antennas.

Similar to [51], the cancellation bandwidth was measured in two steps. First, the cancellation was maximized at a particular frequency by properly adjusting the magnitude and phase of the canceller for maximum suppression. Then, holding the magnitude and phase settings constant, a CW signal was applied and swept in frequency while the relative cancellation at the output of the RX was measured. Fig. 3-13 shows a cancellation bandwidth of more than 4MHz in the case of the front-end configured with either a circulator or two antennas.
Fig. 3-13 Measurement results of TX leakage suppression versus bandwidth using both a circulator and two antennas.

This amount of same channel SIM suppression is sufficient to achieve a sensitivity for standards similar to Bluetooth, while in full-duplex operation. Again, holding the magnitude and phase settings constant, a modulated Bluetooth GFSK signal was applied while the RX output spectrum was measured with both the canceller enabled and disabled. A minimum 30dB of cancellation was observed over the signal bandwidth, see Fig. 3-14.
Fig. 3-14 Measured TX suppression using a modulated GFSK signal. Baseband output spectrum with both cancellation enabled and disabled, (a) measurement with a circulator, (b) measurement with two antennas.
This canceler is able to rotate 360° relative to the carrier frequency, thus a full period of the carrier can be tracked. However, the question arises as to how much cancellation can be achieved if the delay mismatch between the TX leakage signal and the canceler output, is greater than one period of the carrier. Intuitively, this is best understood by realizing there are several orders-of-magnitude differences between the carrier frequency (2.4GHz) and the BW of the modulated signal (1MHz). If the delay mismatch between the leakage and cancellation paths is greater than one period of the carrier (0.417ns), the canceller is still capable of substantially reducing the leakage signal. This remains true even if the delay mismatch represents several periods of the carrier. This is understood intuitively by considering the slow change in the modulation of the carrier signal from one carrier period to the next. This chip operates on a Bluetooth signal where the envelope of the modulated signal has a period of 1µs (1MHz BW). Thus, the change in the modulated signal relative to the potential delay mismatch is very small (less than several ns). If the cancellation is limited by the time delay mismatch between the two paths (typically phase and gain mismatch sets the limit), the maximum cancellation can be expressed with respect to the delay mismatch and the period of the modulated signal (3.9),

\[
\text{Maximum Cancellation} \propto \frac{\text{Delay Mismatch}}{\text{Envelope Period}} \quad (3.9)
\]

Thus, for a simulated time delay mismatch of 1.5ns, the cancellation is bounded at approximately 44 dB by the delay mismatch. This bound is well beyond the measured cancellation of 30dB, so the delay mismatch does not limit the measured cancellation.
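
The order of magnitude of this bound can be checked with a simple two-path model. The sketch below is an illustration only and is not the derivation behind (3.9): it assumes the carrier phase and gain are perfectly aligned by the canceler, so that the only residual comes from the delay mismatch acting on a baseband offset frequency. The function name and the choice of offset frequencies are hypothetical.

```python
import math

def cancellation_limit_db(delay_mismatch_s, offset_hz):
    """Suppression limit for two equal-amplitude paths that differ only in delay.

    Assumed model: residual = |1 - exp(-j*2*pi*f*dt)| = 2*|sin(pi*f*dt)|,
    where f is the offset of a modulation component from the carrier.
    """
    residual = 2.0 * abs(math.sin(math.pi * offset_hz * delay_mismatch_s))
    return -20.0 * math.log10(residual)

DELAY_MISMATCH_S = 1.5e-9  # simulated delay mismatch quoted in the text

# Offsets inside a 1MHz Bluetooth channel (illustrative choices).
for offset_hz in (0.25e6, 0.5e6, 1.0e6):
    limit = cancellation_limit_db(DELAY_MISMATCH_S, offset_hz)
    print(f"offset {offset_hz / 1e6:.2f} MHz -> limit {limit:.1f} dB")
```

Under these assumptions the limit for offsets between 0.5MHz and 1MHz brackets the 44 dB figure quoted above, which supports the conclusion that delay mismatch is not what limits the measured 30dB.
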
RX noise figure measurements are performed to characterize the canceler network. The measurement is conducted with a desired RX signal 100 kHz offset frequency away from the TX signal. After the canceler is enabled, the RX NF degradation is 0.6dB, see Fig. 3-15.
3.5.3 Harmonic Rejection PA Measurement Results

The proposed HR SCPA is measured with both single-tone and modulated signals and was found to reduce the 3rd and 5th harmonics by 30dB and 15dB, respectively (see Fig. 3-16). The PA has a measured 14 dBm maximum output power with a maximum 33% drain efficiency, see Fig. 3-17. Using a GFSK modulated input signal while the PA is set to maximum output power, a 0.6% EVM was measured. The PA output spectrum with a modulated GFSK signal input is shown in Fig. 3-18.
Fig. 3-17 Measured PA output power and drain efficiency.
Fig. 3-18 Measured PA output spectrum at peak output power with a modulated Bluetooth GFSK input signal.

A comparison and performance summary is shown in Table 3-1.
### Table 3-1 Comparison and performance summary for polyphase-filter-based canceler

<table>
<thead>
<tr>
<th>Self-interference Cancellation</th>
<th>T. Zhang' 2015 JSSC</th>
<th>J. Zhou' 2014 JSSC</th>
<th>J. Zhou’ 2015 ISSCC</th>
<th>This Work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Architecture</td>
<td>Transformer Coupling</td>
<td>Active Two Port</td>
<td>Frequency Domain</td>
<td>Polyphase Filter + Active Gm Stage</td>
</tr>
<tr>
<td>Technology/VDC</td>
<td>40nm/1V</td>
<td>65nm/7</td>
<td>65nm/7</td>
<td>40nm/1.1V</td>
</tr>
<tr>
<td>Tx-to-Rx Interface Isolation (dB)</td>
<td>&gt;50</td>
<td>30</td>
<td>30-50</td>
<td>20-30</td>
</tr>
<tr>
<td>Tx Suppression (dB)</td>
<td>&gt;23</td>
<td>&gt;30</td>
<td>&gt;35</td>
<td>&gt;35</td>
</tr>
<tr>
<td>Cancellation BW</td>
<td>Cancellation (dB)</td>
<td>20</td>
<td>20</td>
<td>30</td>
</tr>
<tr>
<td></td>
<td>BW (MHz)</td>
<td>5</td>
<td>1.7</td>
<td>4.8/4</td>
</tr>
<tr>
<td>NF degradation due to leakage cancellation (dB)</td>
<td>&lt;0.1 or 1.2</td>
<td>&lt;0.8</td>
<td>0.9-1.2</td>
<td>0-0.6</td>
</tr>
<tr>
<td>RX Out-of-the-Band IIP3 (dBm)</td>
<td>N/A</td>
<td>12</td>
<td>17</td>
<td>18</td>
</tr>
<tr>
<td>RX Effective Out-of-the-Band IIP3 with respect to SI cancellation (dBm)</td>
<td>N/A</td>
<td>33²</td>
<td>N/A</td>
<td>30⁵</td>
</tr>
<tr>
<td>RX In-Band IIP3 (dBm)</td>
<td>+3</td>
<td>N/A</td>
<td>-20</td>
<td>-25</td>
</tr>
<tr>
<td>RX Effective In-Band IIP3 with respect to SI cancellation (dBm)</td>
<td>N/A</td>
<td>N/A</td>
<td>+2⁵</td>
<td>5.5⁵</td>
</tr>
<tr>
<td>RX In-Band P1dB (dBm)</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>-35</td>
</tr>
<tr>
<td>RX Out-of-the-Band P1dB (dBm)</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>8.5</td>
</tr>
<tr>
<td>RX In-Band SC-FD P1dB with cancellation (dBm)</td>
<td>N/A</td>
<td>N/A</td>
<td>&gt;&gt;-8⁶</td>
<td>-6⁶</td>
</tr>
<tr>
<td>Canceller Input P1dB (dBm)</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>12</td>
</tr>
<tr>
<td>Canceller IIP3 (dBm)</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>15</td>
</tr>
<tr>
<td>Canceller Power Consumption (mW)</td>
<td>0</td>
<td>13-72</td>
<td>44-91³</td>
<td>0.25⁴</td>
</tr>
<tr>
<td>Canceller Area (µm²)</td>
<td>0 or 400*400¹</td>
<td>N/A</td>
<td>N/A</td>
<td>131*112.5</td>
</tr>
</tbody>
</table>

¹The FPC canceller is inside a single-ended to differential balun and has an area of approximately 400µm×400µm and an insertion loss of less than 1.2dB.
²Measurement taken with an antenna pair. 15MHz BW, 0.9-1.2dB NF degradation is with one filter. 25MHz, 1.1-1.5dB is with two filters.
³The power consumption includes 0.47mW Gm cells and 44mW LO for one filter.
⁴The NF degradation and power are related to the TX-RX isolation and offset. 0.6dB NF degradation and 0.25mW are measured with 20dB TX-RX isolation and 100kHz offset.
⁵Effective IIP3 under FDD SI cancellation from triple beat measurement.
⁶Under SC-FD SI cancellation.

### Table 3-2 Comparison of harmonic rejection

<table>
<thead>
<tr>
<th>Harmonic Rejection</th>
<th>T. Sano’ 2015 ISSCC</th>
<th>A. Ba’ 2014 RFIC</th>
<th>This Work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Architecture</td>
<td>Conduction Angle Calibration</td>
<td>Conduction Angle Calibration</td>
<td>HR SCPA</td>
</tr>
<tr>
<td>Technology/VDC</td>
<td>40nm/1.1V</td>
<td>40nm/1.0V</td>
<td>40nm/1.2V</td>
</tr>
<tr>
<td>Output Power (dBm)</td>
<td>0</td>
<td>N/A</td>
<td>14</td>
</tr>
<tr>
<td>Drain Efficiency (%)</td>
<td>N/A</td>
<td>39</td>
<td>33</td>
</tr>
<tr>
<td>Off-chip Matching Network</td>
<td>No</td>
<td>Yes</td>
<td>No</td>
</tr>
<tr>
<td>HD2 (dBc)</td>
<td>-52.3(average)/-45.8(worst)</td>
<td>-50</td>
<td>-44.3</td>
</tr>
<tr>
<td>HD3 (dBc)</td>
<td>-48</td>
<td>&lt;50</td>
<td>-22.6/-51.9¹</td>
</tr>
<tr>
<td>HD5 (dBc)</td>
<td>N/A</td>
<td>&lt;50</td>
<td>-30.5/-45.3¹</td>
</tr>
<tr>
<td>EVM (%)</td>
<td>N/A</td>
<td>N/A</td>
<td>&lt;0.65</td>
</tr>
</tbody>
</table>

¹Measured before and after harmonic rejection.
3.6 CONCLUSIONS

This chapter explored methods for TX self-interference cancellation and TX output harmonic rejection for FDD/full-duplex radios using a Cartesian Rotator Cancellation Network. This effort describes two circuit-level innovations (low-noise self-interference cancellation and PA harmonic rejection) which contribute toward reducing the interaction between the transmitter (TX) and receiver (RX), allowing full-duplex operation. The proposed front-end includes a feedforward cancellation circuit capable of reducing the TX leakage signal in the RX signal path by 30dB over a bandwidth of more than 4 MHz. The TX uses a harmonic-rejection PA (HRPA), which reduces the 3rd and 5th harmonics by 30 dB and 15 dB, respectively.

Potential applications for this technique include any future standards targeted for full-duplex communication, current FDD systems, Wi-Fi-Bluetooth coexistence, and any radio which must co-exist with a significant transmitter self-interference signal.
4. A WIDEBAND DUAL-INJECTION PATH SELF-INTERFERENCE CANCELLATION ARCHITECTURE FOR LONG-RANGE CELLULAR FULL-DUPLEX TRANSCEIVERS

4.1 INTRODUCTION

Chapter 3 describes a low-noise, low-power self-interference cancellation circuit for Bluetooth applications. In this chapter, a new prototype chip is discussed which attempts to address the two limitations of the Bluetooth chip by demonstrating wideband self-interference cancellation with a high-output-power transmitter.

Recently, the demand for higher wireless data rates has continued to grow, with an estimated 5-10 times increase over the next five years. This is driven by applications ranging from mobile smartphones to cloud computing and the internet-of-things (IoT) [74]. However, the existing bands from 1-to-6 GHz are completely occupied by commonly found commercial standards, such as ZigBee, GPS, cellular phones and WiFi networks, in addition to frequency allocations dedicated to government functions including police, fire and military wireless communications. Full-duplex (FD) communication presents an opportunity to increase spectral efficiency by allowing a single radio’s transmitter (TX) and receiver (RX) to operate simultaneously using one carrier frequency (on the same channel) [75]–[77], making better use of the existing commercial radio spectrum and effectively increasing the capacity of existing wireless networks.
Fig. 4-1 SI cancellation in FD communication (a) SI cancellation requirement in a FD radio with TX output power of 30dBm and RX required sensitivity of -80dBm, (b) SI cancellation distributed in the receiver chain.

Although FD communication will potentially increase spectral efficiency by as much as 2X when compared to existing frequency division duplex (FDD) systems, significant engineering challenges exist, including the cancellation of the TX self-interference (SI) signal present at the receiver front-end [31], [34], [56]. The large difference between the maximum TX power and the minimum required RX sensitivity demands a method to attenuate the strong TX SI signal. This is particularly true in high-performance applications, including cellular and proposed 5th Generation (5G) wireless systems. As an example, the TX of a cellular mobile radio often transmits as much as 30dBm output power while the RX sensitivity is as low as -80dBm. Depending on the modulation method, more than 120dB SI cancellation may be required, see Fig. 4-1(a).

Fig. 4-2 Channel response of the leakage path (a) multiple time delay paths of the leakage signal coupled from PA output to LNA input, (b) impulse response of a discrete circulator from Meca Electronics, (c) phase response of the measured discrete circulator, (d) frequency response for each time delay versions of the leakage signal.

To achieve a high SI cancellation (~120dB) compatible with longer-range radios which require a high-output power transmitter, the function of self-interference cancellation should be distributed along the RX chain. For example, assuming an air interface of either one antenna with a circulator or two antennas providing an isolation of up to 30dB, and further assuming a 40dB cancellation is provided by the digital backend, this leaves approximately 50dB of required cancellation in the analog/RF front-end, see Fig. 4-1(b) [56].
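
The cancellation budget described above reduces to simple dB arithmetic, sketched below using only the figures quoted in the text. The variable and function names are hypothetical, and reading the 10dB difference between the 120dB requirement and the 110dB TX-to-sensitivity gap as margin is an interpretation made here for illustration.

```python
# Figures quoted in the text for a cellular-class full-duplex radio.
TX_POWER_DBM = 30.0
RX_SENSITIVITY_DBM = -80.0
TOTAL_SI_CANCELLATION_DB = 120.0  # total requirement quoted in the text
ANTENNA_ISOLATION_DB = 30.0       # circulator or antenna pair
DIGITAL_CANCELLATION_DB = 40.0    # assumed digital back-end contribution

def analog_rf_requirement_db(total_db, isolation_db, digital_db):
    """Cancellation left for the analog/RF front-end after the other stages."""
    return total_db - isolation_db - digital_db

power_gap_db = TX_POWER_DBM - RX_SENSITIVITY_DBM
margin_db = TOTAL_SI_CANCELLATION_DB - power_gap_db
analog_db = analog_rf_requirement_db(
    TOTAL_SI_CANCELLATION_DB, ANTENNA_ISOLATION_DB, DIGITAL_CANCELLATION_DB
)

print(f"TX power to RX sensitivity gap: {power_gap_db:.0f} dB")
print(f"Implied margin in the total requirement: {margin_db:.0f} dB")
print(f"Required analog/RF cancellation: {analog_db:.0f} dB")
```
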

An ideal SI canceler should contribute minimal noise to the receiver front-end, occupy minimal silicon area and have low power consumption [51]. Numerous recent efforts have explored methods to perform SI cancellation in the analog/RF front-end to emulate the function of an ideal canceler discussed in [81]. These cancellation techniques can be categorized as either passive [44], [81]–[85] or active methods [56], [79], [86]–[89]. However, these methods often provide insufficient canceler linearity, which limits the maximum PA operating power. Moreover, both the depth and bandwidth of the cancellation are insufficient for applications which use high-output power transmitters. Thus, the focus of current SI canceler design is shifting toward wider bandwidth, deeper cancellation and higher canceler linearity than prior art, to allow integration of a higher output power PA while operating in full-duplex mode.

It is challenging to design a high-performance wideband SI canceler due to the fact that the SI signal arrives over multiple time-delayed paths across the channel bandwidth of interest. A transceiver front-end with a circulator and a single TX/RX shared antenna is used to illustrate this concept, see Fig. 4-2(a). The self-interference is a combination of several leakage paths which include: 1) a direct-coupling path through the circulator, chip and board substrate, 2) reflection from the antenna due to the imperfect matching between the antenna and RX, and 3) a combination of other environmental reflections from nearby objects (e.g., a human ear). To verify the aforementioned coupling theory of multiple time-delayed versions of the self-interference, a commonly found discrete circulator (Meca Electronics #CS-1.950) was measured (see Fig. 4-2(b)), the results of which support the coupling mechanism shown in Fig. 4-2(a). Similar to a circulator, a front-end with dedicated TX and RX antennas will also exhibit a leakage path which consists of multiple time-delayed responses. After performing a Fourier transform, the measured phase response of the circulator from the TX port to the RX port is shown in Fig. 4-2(c). The phase of the circulator changes significantly over the band of interest, by more than 10 degrees over the 40MHz bandwidth. Each of the SI leakage paths has a different frequency response and transfer function, Fig. 4-2(d). Assuming the leakage path is a linear time-invariant (LTI) system, the canceler needs to provide the inverse channel response of all the summed leakage paths in order to achieve wideband cancellation.

Several recent research efforts attempt to achieve wideband and high SI cancellation using a single feed-forward path [31], [87] or a single point of injection [90]. In [87], a 2nd-order Gm-C N-path filter was used to perform frequency domain equalization. However, this filter requires generating an LO with multiple phases, which may dissipate considerable power to drive the LO switches. Also, the cancellation bandwidth is limited by the order of the filter. An alternative method synthesizes an inverse leakage signal at the LNA input using a current DAC and up-conversion mixer [90]. However, for applications requiring a high RX sensitivity, the DAC quantization noise will likely degrade the RX sensitivity. Thus, it is challenging to design a single-path feed-forward canceller which provides sufficient SI cancellation. Generally speaking, any on-chip single-component cancellation circuit will achieve no more than 50dB suppression. Examples include single-sideband mixers, filters and harmonic traps, to name a few.

The remainder of this chapter explores an SI cancelling architecture that overcomes the aforementioned challenges. This is followed by an expanded description of the prototype chip, which was first given in [78].

4.2 DUAL-INJECTION PATH FULL DUPLEX ARCHITECTURE

As mentioned in the previous section, it is challenging to achieve a wideband and high SI cancellation using a single feed-forward path with ideal canceler characteristics [34], [81], [82], [87], [90]. In the proposed transceiver, a dual-path SI cancelling architecture is designed to achieve wideband and high SI cancellation, see Fig. 4-3. Two analog cancelers, both with inputs attached to the TX output matching network, are designed to properly capture not only the TX carrier signal, but also the noise and non-linearity of the TX.

The first coarse RF canceler is attached to the RX LNA input and is designed to reduce the TX
SI power sufficiently to prevent the RX front-end from saturating while relaxing the linearity
requirements for the LNA and all the subsequent RX blocks. The second baseband canceler down-
converts the leakage signal using a mixer, allowing a low-frequency implementation of a more
complex, higher order, lower power analog cancelling filter. The baseband canceler has an added
advantage of capturing and cancelling the TX leakage reciprocal mixing with the LO phase noise in the RX signal path. The combination of both cancelling filters provides a wideband, high SI cancellation. The design considerations of the proposed cancelling architecture will be discussed next.

4.3 PROPOSED SYSTEM DESIGN CONSIDERATION

Several practical design issues for the proposed system are discussed in this section, which include: 1) system level design tradeoff between the cancellation bandwidth and noise performance, 2) canceler design techniques to improve the linearity, 3) cancellation of TX SI signal reciprocal mixing with the LO phase noise in the second, baseband cancellation path. A discussion of these techniques begins with the RF canceler.

4.3.1 The First Coarse RF Canceler
Fig. 4-5 Design tradeoff between number of taps, canceler noise and cancellation BW, simulation performed in Matlab Simulink and Cadence SpectreRF

The RF cancellation path is realized by an analog adaptive filter; the top-level block diagram of the RF canceler is shown in Fig. 4-4. A similar approach to the RF canceler was demonstrated in discrete form at Sigcomm [75]. As mentioned above, the RF canceler is designed to attenuate the TX SI signal significantly at the RX input to relax the linearity requirements of the LNA and subsequent RX blocks. However, there exists a design tradeoff between the number of taps, RX noise figure degradation and TX SI cancellation BW, which is illustrated in Fig. 4-5. Simulations were performed with Matlab Simulink and Cadence SpectreRF. In these simulations, the RX baseline noise figure is assumed to be 4dB, and each tap has the same power consumption and contributes an equal amount of noise. The tap delay line is modeled as a first-order all-pass filter (APF) with the transfer function shown below [91],

$$H(j\omega) = \frac{1 - j\omega\tau}{1 + j\omega\tau}$$  \hspace{1cm} (4.1)

Here, $\tau$ is the time constant of the APF. The group delay of the first-order APF can be derived from (4.1),

$$\tau_g = \frac{2\tau}{1 + (\omega\tau)^2}$$  \hspace{1cm} (4.2)

The group delay of the APF is maximized at the frequency of interest when

$$\frac{\partial \tau_g(\tau)}{\partial \tau} = 0$$  \hspace{1cm} (4.3)

Solving (4.3) using (4.2) shows that $\tau_g$ reaches its maximum value when $\tau = \frac{1}{\omega}$,

$$\tau_{g,\text{max}} = \frac{1}{\omega} = \frac{1}{2\pi f}$$  \hspace{1cm} (4.4)

From (4.4), the maximum delay of each APF stage is limited by the operating frequency. As an example, if the carrier frequency is 2GHz, the maximum delay for each APF stage is 80ps. If the adaptive filter is analyzed using sampling theory, where each tap samples the input signal at a different time instant and the signal is reconstructed using different weightings, an 80ps time delay per tap is equivalent to a sampling frequency of 12.5GHz, or an oversampling ratio of 3.14 relative to the Nyquist rate of the 2GHz carrier. In order to emulate the TX SI signal, the delay per APF stage should be small enough to capture the fast transient changes of the TX SI signal, and the total delay of all APF stages needs to be long enough to cover the delay spread [92]. The dominant time-delay paths of the TX SI signal are the direct coupling path and the antenna reflection, see Fig. 4-2(b). The delay spread between these two paths is approximately 250-350ps. If the canceler is designed to capture these two primary time-delay paths, 4-5 taps are sufficient. More taps could help suppress other leakage paths due to “environmental reflections”, which arrive at the RX input later than the two primary leakage signals because this type of leakage leaves the antenna, reflects off a nearby object, and is then re-received and coupled into the RX input. However, more taps add more active devices and noise to the RX, see Fig. 4-5. Thus, in this design, five taps were determined to be optimal based on a requirement of 40MHz cancellation BW and minimal RX noise figure degradation. This choice was made because the “environmental reflections” tend to be weaker in strength and can be addressed at baseband, after sufficient gain is added to overcome the noise associated with a second cancellation path that includes more filter coefficients.
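
The result in (4.4) can be checked numerically. The following sketch evaluates the group delay expression (4.2) over a sweep of time constants at a 2GHz carrier and compares the best value against $1/(2\pi f)$. The sweep range and function names are choices made here for illustration. Note that the unrounded maximum delay is slightly below 80ps, so the equivalent tap rate comes out slightly above the 12.5GHz obtained from the rounded value.

```python
import math

def apf_group_delay(tau, omega):
    """Group delay of H(jw) = (1 - jw*tau) / (1 + jw*tau), per (4.2)."""
    return 2.0 * tau / (1.0 + (omega * tau) ** 2)

def max_group_delay(f_hz, points=20001):
    """Sweep tau on a log grid around 1/omega; return (best_tau, best_delay)."""
    omega = 2.0 * math.pi * f_hz
    tau_star = 1.0 / omega
    # Candidates span 0.1x to 10x of the analytical optimum.
    candidates = [
        tau_star * 10 ** (-1 + 2 * i / (points - 1)) for i in range(points)
    ]
    best_tau = max(candidates, key=lambda t: apf_group_delay(t, omega))
    return best_tau, apf_group_delay(best_tau, omega)

CARRIER_HZ = 2.0e9
best_tau, best_delay = max_group_delay(CARRIER_HZ)
analytical = 1.0 / (2.0 * math.pi * CARRIER_HZ)

print(f"numerical optimum tau: {best_tau * 1e12:.1f} ps")
print(f"maximum group delay:   {best_delay * 1e12:.1f} ps")
print(f"analytical 1/(2*pi*f): {analytical * 1e12:.1f} ps")
print(f"equivalent tap rate:   {1.0 / best_delay / 1e9:.2f} GHz")
```
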
In order to operate this FD system with a high output power PA, the canceler must be designed to achieve as high a linearity as possible. This is required because any intermodulation components generated along the feedforward canceller path, due to non-linearity, are added directly at the RX input, degrading the RX sensitivity. In this design, the canceler input is attached to the large swing of the PA output (usually $V_{peak} \sim 3V$), while the required output swing of the canceler is small, less than 0dBm (assuming a 30dBm PA and a circulator isolation of 30dB). The linearity bottleneck of the canceler is therefore the large voltage swing presented at its input. Adding an attenuator at the input of the canceler helps to improve the linearity of the RF cancellation path, see Fig. 4-6.
Fig. 4-7 RF canceler linearity versus attenuator capacitor \( C_1 \) (a) conceptual diagram, (b) simulation results of capacitor value versus RX noise figure and canceler IIP\(_3\).
Fig. 4-8 The linearity of the RF canceler and an additional attenuator is introduced at the input of the RF canceler to improve linearity.

The attenuator was implemented with a simple capacitor, $C_1$, in series with the RF canceler. Together with the input impedance of the canceler ($C_{in}$), it forms a voltage divider that reduces the large voltage swing at the input of the canceler, see Fig. 4-7(a). However, any attenuation from capacitor $C_1$ needs to be compensated by providing more active voltage gain in the RF canceler path, since the gain of the cancellation path must match that of the SI signal path. As described in chapter 3, additional active gain in the canceler signal path injects more noise at the LNA/RX input. A simulation illustrating the design tradeoff between the capacitor value, canceler IIP$_3$ and RX noise figure may be seen in Fig. 4-7(b). Here, one observes that a lower capacitor value $C_1$, i.e., more signal attenuation at the RF canceler input, improves the linearity but degrades the RX noise figure. For this specific design, setting $C_1$ to 130fF provided a good tradeoff between the canceler linearity (36dBm) and the RX noise figure degradation (0.7dB). The series capacitor $C_1$ has the additional benefit of increasing the input impedance looking into the canceler, which minimizes any loading on the PA output stage. For this design, the real part of the canceler input impedance is larger than 7k Ohm, which is significantly higher than the $R_{opt}$ of the PA, see Fig. 4-8.

To reduce the required input-referred linearity of the canceler, the input is connected to the primary side of the PA output transformer, which presents a lower impedance than the secondary side with its 50Ω antenna port impedance. Because the impedance looking into the primary is set by the required $R_{opt}$ of the PA, which is 9Ω (see Fig. 4-8), the primary side of the transformer sees a much smaller voltage swing of the transmit signal, which relaxes the canceler input-referred linearity requirement. However, similar to the case of adding an attenuator (capacitor) at the canceler input to improve the linearity, attaching the input to the low-impedance PA output on the primary side of the transformer reduces the effective canceler power gain, see equation (4.5) below:

$$\text{Canceler power gain} = 10 \cdot \log_{10}(A_v^2 \cdot \frac{Z_{tx}}{Z_{rx}})$$  \hspace{1cm} (4.5)

Here, $A_v$ is the canceler voltage gain, $Z_{tx}$ and $Z_{rx}$ are the impedance from the perspective of the
canceler input and output looking back into the TX and RX, respectively. As an example, if $Z_{tx}$ is
9Ω and $Z_{rx}$ is 50Ω, there exists a power gain loss of 7.4dB from the impedance transformation.
To compensate for this power gain loss, the active canceler circuitry is required to be designed
with 7.4dB higher voltage gain, which will increase both the canceler power consumption and
noise contribution to the RX.
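
The 7.4dB figure follows directly from (4.5) with unity voltage gain. The short sketch below evaluates the expression for the impedances quoted above and computes the voltage gain that would restore 0dB power gain. The function and variable names are hypothetical.

```python
import math

def canceler_power_gain_db(a_v, z_tx, z_rx):
    """Equation (4.5): 10*log10(A_v^2 * Z_tx / Z_rx)."""
    return 10.0 * math.log10(a_v ** 2 * z_tx / z_rx)

Z_TX_OHM = 9.0   # impedance at the PA transformer primary (R_opt)
Z_RX_OHM = 50.0  # impedance at the RX input

# Power gain with unity voltage gain: the loss due to the impedance ratio.
loss_db = canceler_power_gain_db(1.0, Z_TX_OHM, Z_RX_OHM)

# Voltage gain needed so that the overall power gain returns to 0 dB.
a_v_needed = math.sqrt(Z_RX_OHM / Z_TX_OHM)
a_v_needed_db = 20.0 * math.log10(a_v_needed)

print(f"power gain at A_v = 1: {loss_db:.2f} dB")
print(f"compensating voltage gain: {a_v_needed:.2f} V/V ({a_v_needed_db:.2f} dB)")
```
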
The output impedance of the canceler is boosted by using non-minimum-length devices (L=65nm). It is larger than 500 Ohm, which is significantly higher than the LNA input impedance, reducing any loading effects on the LNA input matching network.
Fig. 4-9 Baseband canceler top level block diagram
4.3.2 The second fine baseband canceler

The argument and design methodology for the baseband canceler is similar to the RF canceler. The baseband canceler uses a down-conversion mixer to translate the carrier frequency down to the baseband and provides a second cancellation path to suppress the TX SI at the RX baseband output. Since the baseband canceler is connected to the RX output, the noise requirement of the baseband canceler is more relaxed benefiting from the RX gain.

Fig. 4-10 Baseband canceler for attenuation of TX carrier signal

The baseband canceler is made up of a 14-tap analog adaptive filter with a down-conversion mixer, see Fig. 4-9. The baseband canceler serves two functions: 1) it attenuates the TX SI carrier signal as well as the noise and non-linearity of the PA and transmitter, and 2) it cancels the TX SI reciprocal mixing with the LO phase noise in the RX signal path. These concepts are described in the following paragraphs.
The TX SI carrier signal travels in two paths. In one path, the TX SI signal passes down the circulator and RX chain, while in the other path, it travels through the baseband canceler, see Fig. 4-10. To get a proper cancellation, these two paths must be matched in time. The matching was achieved by a combination of variable time delay blocks $T_D$ and $T_{D3}$.

Fig. 4-11 Baseband canceler for attenuation of TX leakage signal reciprocal mixing with LO phase noise in RX signal path

The baseband canceler can also reduce the TX SI signal reciprocal mixing with the LO phase noise in the RX signal path. In Fig. 4-11, the TX output is considered as a single-tone signal for simplicity. Since the LOs of the RX and the baseband canceler come from the same synthesizer, the phase noise of these two paths is highly correlated. Thus, if the RX and baseband cancellation paths are matched in time, not only the carrier, but also the phase noise associated with the TX SI signal will be cancelled. The mathematical analysis for the baseband canceler is shown in the Appendix. Intuitively, the adaptive filter in the signal path of the baseband canceler ($T_{D3}$ in Fig. 4-11) matches any delay introduced by the baseband circuitry in the RX signal path ($T_{D2}$). The additional delay in the LO path ($T_D$) matches any delay introduced by the circulator and front-end LNA ($T_{D1}$). The delay in the LO path could also be implemented before the down-conversion mixer of the baseband canceler, but it would then need to be designed with high linearity and wide bandwidth.

Next, the circuit implementations will be described.

Fig. 4-12 A system diagram of the 40nm CMOS full-duplex system.
Fig. 4-13 Transistor level implementations of 5-tap FIR-based RF canceler and the low noise amplifier
4.4 CIRCUIT IMPLEMENTATION OF THE DUAL-INJECTION PATH FULL DUPLEX TRANSCEIVER FRONT-END

A high-level block diagram of the proposed dual-injection path FD chip, which includes all the integrated transceiver components, is shown in Fig. 4-12. All the signal lines are designed differentially to reduce the possibility of unwanted coupling between blocks and to improve the even-order performance, such as the IIP$_2$. The inputs of both cancelling filters are attached to the low-impedance node of the PA output matching network and perform TX SI cancellation at the RX input and baseband output. An integer-N synthesizer is included on this chip to demonstrate the cancellation of the TX SI signal reciprocal mixing with the LO phase noise in the RX signal path. The grounds of the circuit blocks are isolated on the chip and shorted together on the board using a ferrite bead to avoid any unwanted noise coupling. Large low-Q bypass capacitors are included for each circuit block to minimize the bond-wire effects and reduce the on-board SI coupling through supply and ground.

Next, the transistor-level implementation of each block will be described.

4.4.1 RF Canceler and RX LNA

The RF canceler is made up of a 5-tap analog FIR filter, see Fig. 4-13. Each tap delay line is implemented using a passive RC-CR first-order all-pass filter (APF). This APF functions as a true time delay with a value of 65ps. The variable gain amplifier (VGA) is implemented as a 6-bit inverter-based amplifier with one additional bit to determine the signal polarity. Adding another bit to the VGA could potentially provide 6dB more dynamic range and better cancellation. However, this would halve the output impedance of the canceler and add more undesired loading at the LNA input, increasing the RX input insertion loss. The RF canceler was designed with a gain range from -60dB to -25dB to emulate the magnitude response of the leakage channel. The low gain of the RF canceler helps to reduce the output current noise that it injects into the RX input. A unity-gain buffer stage was added between each tap delay to relax the loading on the APF. The output of each gain stage is summed in the current domain and combined with the desired signal at the RX input, see Fig. 4-13. The RX LNA uses a resistive-feedback topology for a broadband 50Ω match, see Fig. 4-13. A CMOS implementation of the LNA increases the effective $G_m$ almost two times, which also helps reduce the LNA noise figure. A source follower is added to avoid the direct feedforward path from the LNA input to the output, which effectively increases the LNA output impedance and gain and improves the NF [93].
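
To illustrate how a 5-tap filter with 65ps taps can emulate a two-path leakage channel over a 40MHz bandwidth, the sketch below fits real-valued tap weights by least squares. This is a behavioral illustration only and makes several assumptions that are not taken from the chip: the taps are modeled as ideal delays rather than RC-CR all-pass stages, the leakage channel (path gains of -30dB and -36dB, delays of 30ps and 280ps) is hypothetical, the 2GHz carrier is chosen for illustration, the weights are not quantized to the 6-bit plus sign resolution, and the least-squares fit stands in for whatever adaptation procedure is actually used.

```python
import numpy as np

F0 = 2.0e9          # carrier frequency in Hz (assumed for illustration)
BW = 40.0e6         # cancellation bandwidth from the text, in Hz
TAP_DELAY = 65e-12  # per-tap delay from the text, in seconds
N_TAPS = 5

def leakage_response(f):
    """Hypothetical leakage: direct coupling plus a weaker antenna reflection."""
    direct = 10 ** (-30 / 20) * np.exp(-2j * np.pi * f * 30e-12)
    reflection = 0.5 * 10 ** (-30 / 20) * np.exp(-2j * np.pi * f * 280e-12)
    return direct + reflection

def tap_matrix(f, n_taps):
    """Column k holds the response of an ideal delay of k * TAP_DELAY."""
    k = np.arange(n_taps)
    return np.exp(-2j * np.pi * np.outer(f, k) * TAP_DELAY)

def fit_real_weights(f, n_taps):
    """Least-squares real tap weights; the sign bit allows negative values."""
    a = tap_matrix(f, n_taps)
    h = leakage_response(f)
    # Stack real and imaginary parts so the solved weights stay real.
    a_ri = np.vstack([a.real, a.imag])
    h_ri = np.concatenate([h.real, h.imag])
    w, *_ = np.linalg.lstsq(a_ri, h_ri, rcond=None)
    return w

def cancellation_db(f, w):
    """Leakage magnitude relative to the residual after subtraction."""
    h = leakage_response(f)
    residual = h - tap_matrix(f, len(w)) @ w
    return 20 * np.log10(np.abs(h) / np.abs(residual))

freqs = np.linspace(F0 - BW / 2, F0 + BW / 2, 41)
weights = fit_real_weights(freqs, N_TAPS)
worst_case_db = cancellation_db(freqs, weights).min()

print(f"tap weights: {np.round(weights, 4)}")
print(f"worst-case cancellation over {BW / 1e6:.0f} MHz: {worst_case_db:.1f} dB")
```

Because the model ignores noise, weight quantization and non-ideal delay cells, the cancellation it reports is optimistic and should not be compared directly with measured results.
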

Fig. 4-14 Transistor level implementations of 14-tap FIR-based BB canceler and the summing stage
4.4.2 Baseband canceler

The baseband canceler is made up of a 14-tap analog FIR filter with a passive down-conversion mixer, see Fig. 4-14. If the same tap delay line topology used in the RF canceler is applied to the baseband canceler design, significantly larger sizes will be required of both the resistor and capacitor to achieve the desired delay at the baseband operating frequency which is 100 times lower. Larger value resistors will introduce more noise and substantially increase the silicon area. This design implements the tap delay line used in the baseband canceler using a compact $G_m$-$C$-based all pass filter [94]. Each of the tap delay lines has a time delay of 10ns. The output of each tap will be summed in the current domain and translated into the voltage domain using a trans-impedance amplifier. The baseband canceler signal is combined with the desired RX signal using a resistor-degeneration common-source amplifier.

Fig. 4-15 Transistor level implementations of a three-stage Class AB power amplifier
4.4.3 Power amplifier

The TX includes a three-stage Class-AB power amplifier, see Fig. 4-15. The up-conversion mixer and digital-to-analog converter are not integrated on this chip. A common-source, common-gate noise-cancelling topology, similar to those often used in low-noise amplifiers, is implemented as the first stage of the PA to reduce the PA thermal noise floor [95]. To improve the linearity of the power amplifier, a $G_m$-linearization technique similar to [96] is used. The input device of the PA main stage is divided into two separate devices. The main device is biased closer to the Class-A region, while the auxiliary device is biased closer to the Class-B region, to linearize the effective $G_m$ over a wide range of input voltage. The power amplifier supply is 2.5V.
Fig. 4-16 Wideband full-duplex with dual-path SI cancellation TSMC 40nm chip micrograph
4.4.4 Other blocks

The RX is made up of a low-noise amplifier, a $G_m$ stage, a passive mixer and a trans-impedance amplifier. The mixer is driven by a four-phase 25% duty-cycle LO, which is generated by a divide-by-two circuit [73]. The time delay cell in the LO path for the baseband canceler (see Fig. 4-10) uses a current-starved topology similar to [97], and has a simulated LSB of 0.5ps and a tuning range of more than one carrier period. This is equivalent to 0.36° resolution and 0.5ns tuning range at 2GHz. To improve the phase noise performance of the integer-N synthesizer, an impulse sensitivity function (ISF) manipulation technique similar to [98] is used to lower the phase noise of the voltage-controlled oscillator.
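The resolution and range figures quoted for the LO delay cell follow from simple delay-to-phase arithmetic, sketched below. The function name is illustrative, and the number of delay steps is only a derived estimate of how many LSBs span one carrier period; it is not a stated property of the implemented delay cell.

```python
def delay_to_phase_deg(delay_s, carrier_hz):
    """Phase shift (degrees) that a time delay produces at a given carrier."""
    return 360.0 * delay_s * carrier_hz

carrier_hz = 2e9        # carrier frequency used in the text
lsb_s = 0.5e-12         # simulated delay LSB

phase_step_deg = delay_to_phase_deg(lsb_s, carrier_hz)
carrier_period_s = 1.0 / carrier_hz
steps_per_period = carrier_period_s / lsb_s   # derived estimate, not from the text

print(f"phase resolution : {phase_step_deg:.2f} deg")
print(f"carrier period   : {carrier_period_s * 1e9:.2f} ns")
print(f"LSBs per period  : {steps_per_period:.0f}")
```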
The chip was fabricated in a 40 nm 6-metal-layer TSMC CMOS process and occupies an area of 1.75 mm×2 mm which includes the bond pads (Fig. 4-16). The chip was wire-bonded directly to the test-board using chip-on-board packaging.

4.5 Measurement Results

To test the functionality of the proposed FD chip, standard RF measurements were performed to characterize the gain, noise and linearity at the RX baseband output with the cancelers enabled and disabled; the measurement setup is shown in Fig. 4-17. During testing, the RX input and TX output were interfaced using a discrete circulator (MECA Electronics: CS-1.950), which provides an isolation as high as 30dB. Two surface-mount single-ended-to-differential baluns (Anaren: BD1631J50100AHF) were used to provide a balanced input for both the RX and TX. The RX baseband quadrature output signals were connected to two low-noise, high-input-impedance (>200kΩ, <1pF) active differential probes (Tektronix: P6246) to buffer the outputs and provide a 50Ω match for the next stage, which could be either a spectrum analyzer (Agilent E4440A) or a discrete 14-bit 140MSPS analog-to-digital converter (Analog Devices: AD9254). The reference clock of the integer-N synthesizer is provided by a low-noise crystal oscillator (NEL Frequency Controls: AE-03A2DE-R/40.000MHz).

When performing closed-loop TX SI cancellation testing, a CW or modulated signal was provided to the PA input from the vector signal generator (Agilent E4438C). The analog-to-digital converter (ADC) sampled the RX baseband output signal, which was a down-converted version of the TX SI signal, averaged it over ten cycles, and then sent it to an FPGA board (Altera Cyclone III EP3C120 Development Board) for post-processing. The FPGA board was used to emulate a digital baseband and implemented a blind-source brute-force adaptation algorithm for both cancelling filters. The algorithm first adapted the 5-tap RF canceler. After the RF canceler converged, the adaptation algorithm was then applied to the baseband canceler. During the adaptation period, after the FPGA board finished each cycle of processing, the updated filter coefficients were generated and sent back to the chip using a scan chain.
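The text does not give the details of the adaptation algorithm, so the following is only a minimal sketch of one plausible brute-force scheme: a coordinate-wise exhaustive search that tries every code on one tap at a time and keeps whichever setting gives the lowest measured residual power. The function names, the toy cost function and the hidden target are all hypothetical; on the bench, the measurement callback would be replaced by the averaged ADC reading.

```python
def brute_force_adapt(measure, num_taps, codes, sweeps=3):
    """Coordinate-wise exhaustive search over tap codes.

    For each tap in turn, try every code and keep the one that gives the
    lowest measured residual power. Repeat for a fixed number of sweeps.
    """
    setting = [0] * num_taps
    best = measure(setting)
    for _ in range(sweeps):
        for tap in range(num_taps):
            for code in codes:
                trial = list(setting)
                trial[tap] = code
                power = measure(trial)
                if power < best:
                    best, setting = power, trial
    return setting, best

# Toy stand-in for the measurement (hypothetical): residual power is the
# squared distance between the tap codes and a hidden optimum.
TARGET = [17, -5, 3, 0, -1]

def toy_measure(setting):
    return sum((s - t) ** 2 for s, t in zip(setting, TARGET))

codes = range(-63, 64)   # 6-bit magnitude plus a polarity bit
rf_setting, residual = brute_force_adapt(toy_measure, num_taps=5, codes=codes)
print(rf_setting, residual)
```

In a two-step flow like the one described above, the same routine would be run first on the 5-tap RF canceler and then, with the RF setting held fixed, on the 14-tap baseband canceler.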

All measurements were taken on five chips. Next, the detailed measurement results for each block are discussed.

Fig. 4-18 Measured PA performance, (a) PA output power and efficiency with a single-tone input at 1.96GHz, (b) PA EVM testing with an input 40Mb/s 16QAM signal and an average PA output power of 20dBm at a center frequency of 1.96GHz

4.5.1 Standard RF Measurements

The RX operates from 1.7 to 2.2GHz with a measured maximum gain of 36dB and a 4dB in-band noise figure. The measured in-band IIP3 and $P_{-1dB}$ of the RX are -5dBm and -15dBm, respectively. The RX consumes a total of 22mW from 1.2V and 2.5V power supplies.
The PA has a measured output $P_{-1dB}/P_{sat}$ of 25.1/26.5dBm and a maximum PAE of 32%, see Fig. 4-18(a). The PA error vector magnitude (EVM) was tested using a 40Mb/s 16QAM signal at +20dBm output power, and an EVM of 5.1% was measured, see Fig. 4-18(b).

The integer-N synthesizer has a measured locking range from 3.4 to 4.4GHz while consuming 10.4mW from a 1.2V supply with a phase noise of -116dBc/Hz @1MHz offset at a center frequency of 4GHz. The reference spur of the synthesizer is -55dBc.

The RF/baseband canceler is supplied by 1.2V/1.8V and has a measured $P_{-1dB}$ and IIP3 of 27/26.5dBm and 36/34.5dBm, respectively. The linearity measurements are performed by setting the last tap of the RF/baseband canceler to the nominal gain while disabling the remaining taps. This was done because the last tap usually has the worst linearity in the canceler chain. If the same measurements are performed for the first tap of the RF canceler, $P_{-1\text{dB}}$ and $IIP_3$ improve to 32dBm and 42.5dBm, respectively. More linearity measurements were performed with the SI cancellation enabled.

Cancellation = $-28.43\text{ dB} - (-79.28\text{ dB}) = 50.85\text{ dB}$

Fig. 4-20 Measured TX suppression using a modulated 40Mb/s 16QAM signal, RX baseband output spectrum with cancelers enabled and disabled
4.5.2 SI Cancellation Measurements

To characterize the functionality of the proposed cancelers, both a CW signal and a modulated signal were applied to the TX input while the RX baseband output was measured to capture the depth and bandwidth of the TX SI cancellation.

Under CW signal testing, the cancellation bandwidth was measured in two steps. First, the cancellation was maximized at a center frequency by adapting the coefficients of both cancelling filters. Then, holding the adaptive filter settings, a series of CW signals at different frequencies was applied across the band. In this measurement, a minimum of 50dB cancellation was achieved within the 42MHz cancellation bandwidth, see Fig. 4-19. The TX SI cancellation measurement was then repeated using a modulated signal. A 40MHz 16QAM signal with an average output power of 15dBm was applied to the TX input while the RX output spectrum was measured with the cancellation network enabled and disabled. In this test, more than 50dB of SI cancellation was achieved, which is consistent with the results obtained using a CW input, see Fig. 4-20.

A two-tone linearity test was performed to further characterize the canceler performance. In this measurement, an in-band two-tone signal with equal amplitudes was applied to the canceler input and the RX baseband output was measured with the cancelers enabled and disabled, see Fig. 4-21. When the canceler is on, the amplitude of the third-order intermodulation products generated by the canceler is close to that of the fundamental tone at a PA output power of 17dBm. The non-linearity of the canceler limits the maximum operating power of the PA. The canceler linearity could be further improved by adding more attenuation at the canceler input, at the expense of the RX noise figure, see Fig. 4-7. The canceler effective IIP3 under SI cancellation testing is +41dBm, from Fig. 4-21.
Fig. 4-21 Two-tone linearity testing for the canceler, measured fundamental tone and IM3 components of the TX SI signal at the RX baseband output with canceler enabled and disabled.

The RX NF degradation is another key metric to characterize the canceler. A large TX leakage signal will modulate the bias circuitry noise, raise the noise floor and degrade the RX NF. The RX NF measurements were performed by applying a desired RX signal 100kHz away from the TX leakage signal and monitoring the C/I ratio at the RX baseband output; the results are shown in Fig. 4-22. In Fig. 4-22, the black curve is the RX baseline NF with the TX turned off and the cancelers disabled. With the TX turned on and both cancelling filters enabled, the RX NF degradation is 1.55dB.
To characterize the cancellation of the TX SI signal reciprocal mixing with the LO phase noise in the RX signal path, a CW signal is fed into the PA input and the RX baseband output is measured with the baseband canceler and the additional time delay block $T_D$ enabled and disabled, see Fig. 4-23(a). The measurement result shows a 10dB suppression, see Fig. 4-23(b).

The total chip power consumption is 49mW excluding the power amplifier, see Fig. 4-24. A comparison and performance summary is shown in Table 4-1.
Fig. 4-23 Measured suppression of TX SI signal reciprocal mixing with RX LO phase noise in the RX signal path, (a) measurement setup (b) measurement results at the RX baseband output with cancellation enabled and disabled
Fig. 4-24 Power breakdown for the proposed system without including the power amplifier

4.6 CONCLUSION

This chapter described a dual-injection-path full-duplex wireless transceiver architecture which achieves a broad bandwidth (>42MHz) and high self-interference cancellation (>50dB). The proposed front-end includes two cancelers, both implemented as analog adaptive filters, to provide an inverse response of the leakage channel. The first, coarse canceler is attached at the RX input to relax the linearity requirements of the LNA and the subsequent RX blocks, while the second, fine canceler is summed with the leakage signal at the RX baseband output to further reduce the TX SI signal.

Potential applications for this technique include any future wireless standards targeted for full-duplex communication and current frequency division duplex systems. This SI cancellation concept could be further expanded in biological interfaces to suppress any unwanted stimulation artifacts in the sense (recording) electronics [99], [100].
Table 4-1 Comparison and performance summary for wideband full-duplex chip

<table>
<thead>
<tr>
<th>Reference</th>
<th>J. Zhou ISSCC'2015</th>
<th>D.J van den Broek ISSCC'2015</th>
<th>D. Yang ISSCC'2015</th>
<th>J. Zhou ISSCC'2016</th>
<th>This Work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Architecture</td>
<td>Frequency Domain</td>
<td>Mixer-First RX+VM-lownpower</td>
<td>Mixer-First+</td>
<td>Integrated</td>
<td>Adaptive Filter + NC PA+</td>
</tr>
<tr>
<td></td>
<td>Equalization</td>
<td></td>
<td>Duplexing LNA</td>
<td>Circulator+</td>
<td>LO Sideband Suppression</td>
</tr>
<tr>
<td>Technology/VDD</td>
<td>65nm/7F</td>
<td>55nm/1.2V</td>
<td>55nm/1.2V, 2.5V</td>
<td>65nm/1.3V, 2.2V</td>
<td>40nm/1.2V, 1.0V, 2.5V</td>
</tr>
<tr>
<td>RX Frequency (GHz)</td>
<td>0.5-1.4</td>
<td>0.15-3.5</td>
<td>0.1-1.5</td>
<td>0.6-0.8</td>
<td>1.7-2.2</td>
</tr>
<tr>
<td>TX-to-RX Interface Isolation (dB)</td>
<td>30-50</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>30.55</td>
</tr>
<tr>
<td>Integrated Power Amplifier</td>
<td>No</td>
<td>Yes</td>
<td>Yes</td>
<td>No</td>
<td>Yes</td>
</tr>
<tr>
<td>Integrated PLL</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>Yes</td>
</tr>
<tr>
<td>TX Maximum Suppression (dB)</td>
<td>N/A</td>
<td>27</td>
<td>33</td>
<td>N/A</td>
<td>55</td>
</tr>
<tr>
<td>Cancellation BW</td>
<td>Cancellation (dB)</td>
<td>20</td>
<td>27</td>
<td>33</td>
<td>42</td>
</tr>
<tr>
<td></td>
<td>BW (MHz)</td>
<td>16.25*</td>
<td>16.25</td>
<td>0.3</td>
<td>12*</td>
</tr>
<tr>
<td>RX NF degradation due to leakage cancellation (dB)</td>
<td>0.9-1.2/1.1-1.5*</td>
<td>4-6</td>
<td>N/A*</td>
<td>5.9*</td>
<td>1.05 (RF) + 0.5 (BB)</td>
</tr>
<tr>
<td>Cancellation Power Consumption (mW)</td>
<td>44-91</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A*</td>
<td>30*</td>
</tr>
<tr>
<td>RF Cancellation Area (mm²)</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>Cancellation IIP3 (dBm)</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>Cancellation P-out (dBm)</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>RX LO Sideband Suppression (dB)</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>RX Gain</td>
<td>27-42</td>
<td>24</td>
<td>33-35</td>
<td>42</td>
<td>20-35</td>
</tr>
<tr>
<td>RX Power Consumption (mW)</td>
<td>63-85</td>
<td>23.56*</td>
<td>43.56</td>
<td>70*</td>
<td>22</td>
</tr>
<tr>
<td>Maximum TX Output Power (dBm)</td>
<td>N/A</td>
<td>&gt;10</td>
<td>N/A</td>
<td>N/A</td>
<td>25</td>
</tr>
<tr>
<td>TX PAE (%) @ Maximum Power</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>32</td>
</tr>
<tr>
<td>TX EVM (%)</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>5.1</td>
</tr>
<tr>
<td>PLL Phase Noise @1MHz (dBc/Hz)</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>-116</td>
</tr>
<tr>
<td>Active Area (mm²)</td>
<td>4.8</td>
<td>2</td>
<td>1.5</td>
<td>1.4</td>
<td>3.5</td>
</tr>
</tbody>
</table>

* Measurement with an antenna gain 15dB BW, 0.3-1.2dB NF deg and with two filters. * Power including 0.47mW Gen cells and 44mW LO for one filter.
* Half-duplex/Full-duplex mode shows NFs of 5.3/10.3-12.3dB. * Power including LO only. RX DSB NF is 5.3-8dB in Full-Duplex mode.
* QdB cancellation 12MHz BW measured including integrated circulator, not including 43dB digital cancellation from Matlab.
* Power including 80mW signal path and 10mW LO path at 7.9kHz.

4.7 APPENDIX

Fig. 4-25 is a simplified mathematical model of the TX SI cancellation in the baseband canceler.

In Fig. 4-25, $X_{PA}(t)$ is the PA output signal, $X_1(t)$ and $X_2(t)$ are the leakage and canceler signal before the summing stage. The LO signal with phase noise is modeled with $e^{j(\omega_0 t + \theta(t))}$. In the analysis, we assume there is no amplitude mismatch between the leakage and canceler signal. From Fig. 4-25, $X_1(t)$ and $X_2(t)$ could be derived below
In order to achieve TX SI cancellation, $X_1(t)$ and $X_2(t)$ have to be equal, therefore, combining equation (4.6) and (4.7) gives

$$T_{D3} = T_{D1} + T_{D2}$$  (4.8)
$$T_D = -T_{D1}$$  (4.9)

In this design, $T_D$ is implemented with a digital time delay block with a tuning range of one carrier cycle, therefore, $T_D$ could also be modeled as a $360^\circ$ phase rotator. The negative sign in equation (4.9) could be realized by a $180^\circ$ phase shift.
5. A PRECISION WIDEBAND QUADRATURE GENERATOR

5.1 INTRODUCTION

Over the past two decades, there has been a drastic increase in the bandwidth and data rates for smart phones and notebook computers. At present, existing commercial standards in the RF bands (1-6 GHz), such as WCDMA [102], LTE [103], [93] and Wi-Fi [104], [105] provide as much as 160MHz bandwidth with data rates as high as 1000MB/s per user.

The demand for more accessible bandwidth and higher data rates is predicted to grow as next-generation communication systems, i.e., 5G, are expected to support 1000 times higher data per area and 10-100 times higher data rates per user (10-100 GB/s), with as much as a 10x extension in battery life compared to existing solutions [106], [107]. The RF spectrum, in the 1-6GHz band, has become increasingly crowded, with only incremental improvements in spectral efficiency through higher-order signal modulation schemes, e.g., 1024QAM. However, utilizing higher-order modulation schemes will not achieve the 10-100x improvement in data rates and capacity required by 5G communication systems, since the signal-to-noise-ratio (SNR) requirement for such high-order modulation becomes prohibitively large in both the receiver (RX) and transmitter (TX). Multi-Input Multi-Output (MIMO) systems can achieve higher spectral efficiencies via spatial multiplexing at the expense of system complexity, power consumption, and cost [105], [108]. Likewise, new hardware which enables full-duplex transmission by applying self-interference cancellation methods [4], [19], [22], [31], [33], [35], [42], [43], [109] would allow a single radio to simultaneously transmit and receive using the same frequency band (full-duplex communication). However, even under ideal conditions, a full-duplex transceiver would improve the throughput by no more than a factor of two, again significantly less than the 1000x desired by 5th-generation wireless systems.

Fig. 5-1 Quadrature imbalance issues. (a) Block diagram of quadrature imbalance. (b) EVM variations with gain imbalance. (c) EVM variations with phase imbalance.

While the low frequency Gigahertz bands are completely occupied by existing standards, communication at millimeter-wave (mmWave) frequencies presents an attractive solution for the 5G standards [5], [6]. The vast spectrum available at mmWave frequencies allows wireless service providers to significantly expand the channel bandwidths (BW) far beyond 20MHz, e.g., 2.16GHz BW for 802.11ad standard [110]. Successful CMOS implementations of mmWave transceivers
achieving data rates between 1-10 Gb/s have been reported [7]–[9], [111]. Recent developments in mmWave transceivers have been focused on enabling channel-bonding [7], MIMO [112] and in-band full-duplex techniques [113], in an attempt to push data rates beyond 10 Gb/s.

At lower frequencies (<10GHz), direct-conversion (zero-IF) transceiver architectures are commonly used in both the RX and TX due to the resulting simplicity of the RX/TX signal paths and compatibility with integration. However, utilizing a direct-conversion architecture in the design of a mmWave transceiver requires IQ signals with a highly accurate 90° phase shift and low amplitude mismatch to drive the mixers. Fig. 5-1 illustrates a simple mathematical model for the LO phase and amplitude mismatch and its effect on the receiver's error-vector-magnitude (EVM). In Fig. 5-1, the receiver's EVM is assumed to be entirely dominated by the IQ mismatch of the LO, which makes it a useful metric signifying the lower bound of the EVM. For a direct-conversion receiver, the IQ imbalance increases the receiver's EVM and hence its bit error rate (BER) [114]. The IQ mismatch could be mitigated using digital baseband calibration [115]. However, digital calibration requires a loop-back path from the TX to the RX, which complicates the radio transceiver design, especially at mmWave frequencies [115]. Furthermore, in many future applications, especially low-cost and low-power systems, the modems must be kept very simple, and complicated digital calibration circuitry often proves too costly.
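To give a feel for the EVM floor set by LO quadrature errors, the sketch below uses one common textbook model of receiver IQ imbalance, in which the baseband output is $K_1 z + K_2 z^*$ and the EVM floor is taken as $|K_2|/|K_1|$. This is an assumption made for illustration; it is not necessarily the exact model behind Fig. 5-1, and the function name and example values are hypothetical.

```python
import cmath
import math

def evm_from_iq_imbalance(gain_imbalance_db, phase_imbalance_deg):
    """EVM floor (as a ratio) for a given LO gain and phase imbalance.

    Assumed model: output = K1*z + K2*conj(z), EVM = |K2| / |K1|.
    """
    g = 10 ** (gain_imbalance_db / 20)
    phi = math.radians(phase_imbalance_deg)
    k1 = (1 + g * cmath.exp(-1j * phi)) / 2   # desired-signal coefficient
    k2 = (1 - g * cmath.exp(1j * phi)) / 2    # image (conjugate) coefficient
    return abs(k2) / abs(k1)

for gain_db, phase_deg in ((0.0, 1.0), (0.0, 10.0), (0.5, 1.0)):
    evm = evm_from_iq_imbalance(gain_db, phase_deg)
    print(f"gain {gain_db:.1f} dB, phase {phase_deg:4.1f} deg -> EVM {100 * evm:.2f} %")
```

With no gain imbalance, this model reduces to an EVM of $\tan(\phi/2)$ for a phase imbalance $\phi$.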

The realization of a highly balanced mmWave IQ generator, without requiring digital calibration, is one of the most challenging aspects of a mmWave transceiver design. This is due to an increased sensitivity of quadrature phase and gain accuracy for a given mismatch as the frequency of operation rises. Fig. 5-2 illustrates this concept of phase imbalance as a function of frequency for a given mismatch induced time delay error. A constant time delay mismatch of $\Delta T$ is dependent on the mismatch in device geometry and is relatively independent of frequency. For example, an
LO IQ mismatch of 1° at 6 GHz as a result of a delay mismatch of $\Delta T$, would increase to 10° at 60GHz, see Fig. 5-2 [116].
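The scaling in this example can be checked with a short calculation: a fixed delay mismatch $\Delta T$ maps to a phase error of $360^\circ \cdot f \cdot \Delta T$. The sketch below back-computes the delay mismatch that gives 1° at 6GHz and evaluates it at 60GHz; the function and variable names are illustrative.

```python
def phase_error_deg(delay_mismatch_s, lo_hz):
    """Quadrature phase error caused by a fixed timing mismatch."""
    return 360.0 * lo_hz * delay_mismatch_s

# Delay mismatch that corresponds to 1 degree at 6 GHz.
delta_t_s = 1.0 / (360.0 * 6e9)

print(f"delta T        : {delta_t_s * 1e12:.3f} ps")
print(f"error at 6 GHz : {phase_error_deg(delta_t_s, 6e9):.2f} deg")
print(f"error at 60 GHz: {phase_error_deg(delta_t_s, 60e9):.2f} deg")
```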

Fig. 5-2 Illustration of phase imbalance at low and high frequency due to a mismatch induced timing error.

This work seeks to realize a wideband mmWave quadrature generation technique which is readily employed in existing 60GHz wireless transceivers (e.g. 802.11ad) and other mmWave applications requiring accurate I-Q phase generation. In fact, the proposed technique could also be applicable to any low-frequency Gigahertz-band transceiver design requiring high-accuracy quadrature signals. This chapter is organized as follows. Section 5.2 gives an overview of existing state-of-the-art quadrature generation techniques. A detailed description of a proposed two-stage polyphase filter (PPF) based quadrature generator with feedback control is given in Section 5.3. Section 5.4 presents design considerations for the proposed PPF. Circuit implementation details of the PPF and integrated phase measurement circuitry implemented in a 28nm CMOS process are given in Section 5.5, while measurement results of the prototyped quadrature generator are provided in Section 5.6. Lastly, a few concluding comments are given in Section 5.7.

5.2 STATE-OF-THE-ART: QUADRATURE GENERATION

Ideally, to minimize the impact on EVM and bit error rate, the quadrature generator should provide I/Q signals equal in amplitude and 90° out of phase over the frequency band of interest. An ideal integrated quadrature signal generator would have the following characteristics:

1. Perform precision quadrature balance while utilizing minimal silicon area.
2. Result in minimal insertion loss.
3. Have minimal power consumption.
4. Present a large input impedance, as to relax the loading effects on the previous stage (i.e. VCO or driver amplifier).
5. Minimize the sensitivity to process, voltage and temperature (PVT) variations.

Three commonly employed methods to generate I/Q LO signals include: (1) quadrature voltage-controlled oscillators (QVCOs), (2) divide-by-two circuits with a VCO running at twice the LO frequency, and (3) a polyphase filter (PPF). Other methods, which are more often used for mmWave band applications include differential branch-line directional couplers [117], distributed microstrip shunt-stubs [118] and quadrature all-pass filters [119].

QVCOs use two cross-coupled oscillators that inherently produce a 90° phase difference between the outputs. Trade-offs exist with QVCOs between the coupling strength, I/Q phase accuracy, tuning range, and the phase noise (PN) performance. Divide-by-two circuits generate I/Q signals from a VCO running at twice the desired LO frequency; however the VCO must operate at double the carrier frequency (e.g. 120GHz for a 60GHz transceiver), which degrades the PN
performance, reduces the frequency tuning range, and increases the power consumption due to the low quality factor of the passive components at a higher operating frequency [120]. Transmission line based methods mentioned in [117]–[119] often require substantial silicon area to realize the mmWave IQ generation circuitry and are usually narrowband.

Fig. 5-3 Phase mismatch in two-stage PPFs plotted for several process and temperature corners. (a) Schematic of a two-stage PPF, (b) Simulation results of phase mismatch.
Fig. 5-4 Phase imbalance of two-stage PPFs with only one stage R, C value accurately controlled. (a) 1st stage with PVT variations, 2nd stage without PVT variations. (b) 1st stage without PVT variations, 2nd stage with PVT variations.
A PPF is able to generate precision quadrature LO signals with fractional bandwidths (fBW = bandwidth/center frequency) exceeding 30% by cascading more than two stages [64], [121], [122]. However, the insertion loss of each PPF stage limits the number of stages, hence restricting its practical use to narrow-band systems. Furthermore, the IQ balance of a PPF depends on the RC product. Given that each R and C may vary by as much as 30% over process and temperature, the RC product may vary by as much as 70%. The simulated IQ phase mismatch of a typical two-stage PPF versus frequency, over different process corners and temperatures, is shown in Fig. 5-3. In simulation, both stages are sized to operate at 60GHz in the typical-typical (TT) corner at 55°C. The phase mismatch is shown to be less than 0.5° over a bandwidth of 7GHz. However, the phase mismatch increases when the RC product varies across the PVT corners. The worst-case corners in this particular process are the slow-slow (SS) corner at high temperature (125°C) and the fast-fast (FF) corner at low temperature (-20°C). In these corners, the resistor and capacitor values move in the same direction, either increasing together (SS, 125°C) or decreasing together (FF, -20°C). This leads to a PVT phase mismatch from TT-to-FF and TT-to-SS of up to 10°, due to variation in the resistor and capacitor values.
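The 70% figure follows from compounding the two tolerances when R and C move in the same direction. The short sketch below works through the arithmetic and also shows, under the simplifying assumption that the RC center frequency scales as 1/(RC), where a nominal 60GHz design would land at the two extremes. The function names are illustrative.

```python
def rc_product_spread(tolerance):
    """Relative change of the RC product when R and C shift together."""
    low = (1 - tolerance) ** 2 - 1
    high = (1 + tolerance) ** 2 - 1
    return low, high

def shifted_center_hz(nominal_hz, rc_change):
    """RC center frequency after the RC product changes by rc_change."""
    return nominal_hz / (1 + rc_change)

low, high = rc_product_spread(0.30)
print(f"RC product change: {100 * low:+.0f} % to {100 * high:+.0f} %")
for change in (low, high):
    f_ghz = shifted_center_hz(60e9, change) / 1e9
    print(f"RC {100 * change:+.0f} % -> center near {f_ghz:.1f} GHz")
```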

The remainder of this chapter describes a generic, yet novel, I/Q generation technique which attempts to address all the aforementioned ideal characteristics using a calibrated N-stage PPF. This is followed by an expanded description of the proposed prototype mmWave quadrature generation technique first given in [66], [123].
5.3 PROPOSED PPF-BASED QUADRATURE GENERATOR

The proposed quadrature generation technique derives from the traditional N-stage implementation of a PPF. As shown in Fig. 5-3, the phase imbalance of a PPF is sensitive to PVT variations. Thus, it is important to introduce calibration circuitry which accurately controls the component values used by the PPF. The analysis of phase imbalance could be carried out on an N-stage PPF [124]. This chapter focuses on a two-stage PPF for two reasons. First, the device which was implemented and measured is a two-stage PPF. Second, the analysis leads to a more intuitive understanding of why tuning a single stage of a multi-stage PPF is sufficient, rather than tuning each stage. In short, the calibration concepts valid for a two-stage PPF are extendable to an N-stage PPF implementation. Fig. 5-4 shows the simulated phase imbalance of a two-stage PPF where one stage is precisely controlled. Compared to Fig. 5-3, the simulated phase mismatch at the two extreme process and temperature corners (SS, 125°C and FF, -20°C), shown in Fig. 5-4, is less than 1.5 degrees over a frequency range from 55GHz to 65GHz, which is sufficient to meet the EVM demanded by the 60GHz 802.11ad standard [7], [114]. In addition, as shown in Fig. 5-4, tuning the RC product of just the second stage in a two-stage PPF, while leaving the first stage un-tuned over PVT variation, produces the same phase mismatch as tuning just the first stage while leaving the second stage un-tuned. The phase difference between the output I and Q signals is given in equation (5.1), derived from KCL and KVL,

$$\angle\frac{V_Q}{V_I}(\omega) = 2 \cdot \tan^{-1} \left( \frac{\omega(R_1C_1 + R_2C_2)}{1 + R_1C_1R_2C_2\omega^2} \right)$$  (5.1)

Here, $R_1, C_1 / R_2, C_2$ are the 1st/2nd stage values of a PPF. Assuming $x = \omega R_1 C_1$ and $y = \omega R_2 C_2$, (5.1) simplifies to,

$$\angle\frac{V_Q}{V_I}(\omega) = 2 \cdot \tan^{-1} \left( \frac{x + y}{1 + xy} \right)$$  (5.2)

From (5.2), the output phase balance has an equal dependency on the RC product of the first and second stages of a PPF. Thus, having precise control over the RC product of any one stage of an N-stage PPF will produce the same quadrature phase error.
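A quick numerical check of (5.2) makes this point concrete. When one stage is tuned so that its normalized RC product equals 1 at the LO frequency, the argument becomes $(x+1)/(1+x) = 1$ and the phase is exactly 90° regardless of the other stage. The sketch below evaluates (5.2) directly; the 69% and 51% RC shifts are illustrative corner values, and the printed errors come from this ideal expression only, not from the extracted simulations of Fig. 5-4.

```python
import math

def iq_phase_deg(x, y):
    """I/Q phase difference from (5.2), with x = w*R1*C1 and y = w*R2*C2."""
    return math.degrees(2 * math.atan((x + y) / (1 + x * y)))

def quadrature_error_deg(freq_hz, f1_hz, f2_hz):
    """Deviation from 90 deg; f1, f2 are the stage frequencies 1/(2*pi*R*C)."""
    return iq_phase_deg(freq_hz / f1_hz, freq_hz / f2_hz) - 90.0

F_LO = 60e9
# One stage calibrated to 60 GHz, the other left with a shifted RC product.
for rc_scale in (1.69, 0.49):
    for f_ghz in (55, 60, 65):
        err = quadrature_error_deg(f_ghz * 1e9, F_LO / rc_scale, F_LO)
        print(f"RC x{rc_scale:.2f}, {f_ghz} GHz: error {err:+.2f} deg")
```

Because (5.2) is symmetric in $x$ and $y$, swapping which stage is calibrated gives the same result.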

Fig. 5-5 Proposed two-stage PPF with feedback control. (a) Schematic of proposed circuit, (b) Description of auxiliary bias resistors added for the triode-region transistors.
Fig. 5-6 Feedback circuitry for the proposed two-stage PPF.

In the proposed IQ generation circuit (Fig. 5-5), the second stage is precisely controlled to generate accurate IQ signals taking into account the loading effects of the previous stage, which will be either an oscillator or a buffer amplifier. Compared to a traditional N-stage PPF, it is only necessary to calibrate the resistance of one stage to significantly reduce the phase error of the entire N-stage PPF. To the first order, this can be achieved by replacing the resistors with triode-region transistors. This allows for the modulation of the channel resistance to realize a variable resistor. Calibrating the capacitor values of a PPF will serve a similar function. However, at mmWave frequencies, both switched-capacitor banks and varactors show a poor quality factor (<10) and hence large insertion loss of the PPF [125]. DC blocking capacitors $C_b$ (Fig. 5-5 (a)) and additional bias resistors $R_S$ and $R_D$ (Fig. 5-5(b)) are added to set the DC operating point of the source and drain of the triode-region device, ensuring the transistor remains in triode region. In addition, a 2 kΩ poly-resistor, $R_G$, is placed in series with each gate of the NMOS transistor to ensure the device maintains a relatively constant source-to-gate voltage, and hence a constant channel resistance. Without $R_G$, the large voltage swing of the LO would appear across the gate and source/drain of the NMOS transistor, thus producing an undesired modulation of its channel resistance. A feedback network drives the gate voltage of the triode-region transistors and sets the desired channel resistance independent of variation in process, temperature and supply voltage, see Fig. 5-6. The feedback network includes an operational amplifier, N replica transistors in triode region that are identical to the NMOS transistors of the PPF, a fixed (bandgap) bias voltage, $V_{RBIAS}$, and a constant control current, $I_{Ctrl}$. 
With the feedback loop closed, the equivalent channel resistance of each NMOS transistor, $R_{eq}$, in the PPF is given by $V_{RBIAS}$ and $I_{Ctrl}$ as,

$$R_{eq} = \frac{V_{RBIAS}}{N \times I_{Ctrl}}$$  (5.3)
The channel resistance, \( R_{\text{eq}} \), is insensitive to the PVT variations, since all variables in (5.3), \( V_{\text{RBIAS}}, I_{\text{Ctrl}}, \) and \( N \) are, to a first order, insensitive to PVT variations.

A detailed circuit diagram of the proposed structure is shown in Fig. 5-6, where \( N \) replica transistors are placed in series for each device used in the PPF. Given the high frequency associated with mmWave circuits, the nominal resistance value required in the PPF is as low as \( \sim 200 \, \Omega \), leading to large values of \( I_{\text{Ctrl}} \). Thus, to reduce the power, the resistance in the replica is made \( N \) times larger, which reduces \( I_{\text{Ctrl}} \) by a factor of \( N \). For example, in our implementation of a two-stage PPF, \( N=3 \) reduces both \( I_{\text{Ctrl}} \) and the power by a factor of three.

The power of this PPF architecture is described by,

\[
\text{Power consumption} = V_{\text{DD}} \times I_{\text{Ctrl}} = \frac{V_{\text{DD}} \times V_{\text{RBIAS}}}{N \times R_{\text{eq}}} \quad (5.4)
\]
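The effect of the replica count on bias current and power can be illustrated by evaluating (5.3) and the right-hand form of (5.4) for different values of $N$. The bias voltage and supply used below are placeholder assumptions chosen only to make the arithmetic concrete; they are not values reported for the chip.

```python
def replica_bias(v_rbias, r_eq, n_replica, v_dd):
    """Control current from (5.3) and loop power from (5.4)."""
    i_ctrl = v_rbias / (n_replica * r_eq)
    power = v_dd * v_rbias / (n_replica * r_eq)
    return i_ctrl, power

V_RBIAS = 0.2   # V, assumed for illustration only
V_DD = 1.0      # V, assumed for illustration only
R_EQ = 200.0    # ohm, nominal PPF resistance quoted in the text

for n in (1, 3):
    i_ctrl, power = replica_bias(V_RBIAS, R_EQ, n, V_DD)
    print(f"N={n}: I_Ctrl = {i_ctrl * 1e3:.3f} mA, power = {power * 1e3:.3f} mW")
```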

It is noteworthy that only one replica bias control loop is necessary in an \( N \)-stage PPF; thus the power necessary to tune the PPF is the same for any number of stages. However, compared to a traditional \( N \)-stage PPF, which typically uses polysilicon resistors, a triode-region transistor has more parasitic capacitance (\( C_{\text{GS}}, C_{\text{GD}}, C_{\text{DS}} \), etc.), which leads to a higher insertion loss. Thus, a trade-off exists between the number of calibrated stages, the quadrature imbalance over bandwidth and the insertion loss. This replica technique could be extended to each of the four triode-region transistors used in a single stage, by creating four dedicated replica bias feedback loops, to better calibrate any mismatch between the resistors in a single stage of the polyphase filter. However, this would come at the expense of slightly higher power consumption and more silicon area to accommodate four opamps, in addition to requiring a longer calibration time.
5.4 PROPOSED PPF DESIGN CONSIDERATIONS

Several practical design issues for the proposed PPF are discussed in this section, which include a derivation of the PPF’s input impedance, insertion loss, the impact of parasitic capacitance, guidelines for the feedback loop opamp design, noise and a layout strategy to minimize parasitics in the PPF.

5.4.1 Input Impedance

To minimize the loading effects on the components driving the quadrature generator (either a VCO or a driver amplifier), the PPF ideally would have as large an input impedance as possible. For a given LO frequency, the PPF RC product would be a known constant. The input impedance would mainly be dominated by the R and C of the first stage [64]. This implies that picking a large resistor size helps to minimize the loading effects of the following stages.

However, several practical considerations limit the size of the resistors. First, all the resistors are in the LO signal path, thus their noise contribution rises with the value of R. Assuming the same values of R and C are used in each stage of a multistage PPF, the voltage noise spectral density at the PPF output will be dominated by the resistors in the later stages, approximately 4kTR [64]. Therefore, an upper bound on the resistor values in the PPF is set by the acceptable broadband LO phase noise floor (at larger offsets from the carrier). The phase noise produced by a mmWave VCO and PLL is on average worse than that of equivalent frequency generation in the lower RF bands. As such, the resistor thermal noise produced by the PPF will be negligible compared to the phase noise produced by the mmWave PLL. However, if a similar PPF technique is applied to lower-frequency RF applications, the resistor noise produced at large offset frequencies could be appreciable compared to the synthesizer phase noise. See Section 5.4.5 for a more thorough discussion of noise performance. Second, all resistors integrated in silicon have a cut-off frequency, due to the parasitic capacitance to substrate, which introduces a phase imbalance between I and Q. This implies that, for a given sheet resistivity, larger resistors create more parasitic capacitance, increasing the unwanted phase shift [64], [69]. Lastly, the resistor size is further limited by parasitic considerations of the capacitor. As the resistor is made larger, for a given frequency, the capacitor must be made smaller. However, if the capacitor is too small, both parasitic capacitance and the effects of mismatch begin to impact circuit performance [126]. Considering all the aforementioned effects, the proposed PPF nominally uses 14.8fF capacitors and 179Ω resistors. Extracted layout simulation results show the input impedance to be 150Ω in parallel with 18fF at the LO center frequency of 60GHz.
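The nominal component values can be sanity-checked with a short calculation. The sketch below is a rough illustration rather than part of the design flow: it computes the RC center frequency, \( 1/(2\pi RC) \), and the thermal noise voltage density of a single resistor, \( \sqrt{4kTR} \). The 300 K temperature and the function names are assumptions.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def rc_center_frequency(r_ohm, c_farad):
    """Center frequency of one RC polyphase stage, f0 = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def resistor_noise_density(r_ohm, temp_k=300.0):
    """Thermal noise voltage density sqrt(4kTR) in V/sqrt(Hz).

    temp_k = 300 K is an assumed temperature, not a value from the text.
    """
    return math.sqrt(4.0 * K_BOLTZMANN * temp_k * r_ohm)


f0 = rc_center_frequency(179.0, 14.8e-15)
vn = resistor_noise_density(179.0)
print(f"f0 = {f0 / 1e9:.1f} GHz, noise density = {vn * 1e9:.2f} nV/sqrt(Hz)")
```

The 179Ω and 14.8fF values place the stage center frequency close to the 60GHz LO, and the noise density grows only with the square root of R, which is why the resistor value can be raised somewhat before the noise floor becomes a concern.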

5.4.2 Insertion loss


Fig. 5-7 Optimal sizing of two-stage PPF considering the loading effect.
Both the PPF insertion loss and input impedance influence the required number of buffer stages and ultimately the power consumption of the overall solution including the VCO, PPF and buffers. Thus, the insertion loss of the PPF must be minimized. Fig. 5-7 shows a two-stage PPF where the load capacitance and resistance are modeled. The insertion loss can be described by

\[
IL = \frac{\sqrt{|V_I|^2 + |V_Q|^2}}{|V_{RF}|} = \frac{\sqrt{|V_5 - V_7|^2 + |V_6 - V_8|^2}}{|V_{RF}|}
\]  

(5.5)

Exploiting the differential symmetry of the circuit (\( V_7 = -V_5 \) and \( V_8 = -V_6 \)), equation (5.5) simplifies to (5.6),

\[
IL = \frac{\sqrt{|2V_5|^2 + |2V_6|^2}}{|V_{RF}|} = 2\frac{\sqrt{|V_5|^2 + |V_6|^2}}{|V_{RF}|}
\]  

(5.6)

Applying Kirchhoff’s current and voltage laws to the circuit in Fig. 5-7 yields equation (5.7), where

\[
Z_L = R_L \parallel 1/sC_L.
\]

\[
IL = \frac{\sqrt{2} \cdot Z_L \left( \frac{1}{R_1 R_2} - \frac{sC_1}{R_2} - \frac{sC_2}{R_1} - s^2 C_2 C_1 \right)}{2s(C_2 + C_1) + 2 \left( \frac{1}{R_1} + \frac{1}{R_2} \right) + Z_L \left\{ \left( \frac{1}{R_2} + sC_2 \right) \left[ s(C_1 + C_2) + \frac{1}{R_2} + \frac{1}{R_1} \right] - s^2 C_2^2 - \frac{1}{R_2^2} \right\}}
\]  

(5.7)

Assuming the RC product for each stage is constant to maintain the same operating frequency, the insertion loss reduces to (5.8), where \( m = \frac{R_2}{R_1} = C_1/C_2 \).

\[
IL = \frac{2\sqrt{2} \cdot \left( \frac{1}{R_1^2} + \omega^2 C_1^2 - \frac{2 \cdot j \omega C_1}{R_1} \right)}{\frac{4(m + 1)}{Z_L} \left( \frac{1}{R_1} + j \omega C_1 \right) + 2 \left( 1 + \frac{1}{m} \right) \cdot \left( \frac{1}{R_1} + j \omega C_1 \right)^2 + \frac{2}{m} \left( \omega^2 C_1^2 - \frac{1}{R_1^2} \right)}
\]  

(5.8)

Near the center frequency, where \( \omega \approx \frac{1}{R_1 C_1} \), (5.8) simplifies to:
\[ IL(m) \approx \frac{2\sqrt{2} \cdot \left( \frac{1}{R_1^2} + \omega^2 C_1^2 - \frac{2 \cdot j\omega C_1}{R_1} \right)}{\frac{4(m + 1)}{Z_L} \left( \frac{1}{R_1} + j\omega C_1 \right) + 2 \left( 1 + \frac{1}{m} \right) \cdot \left( \frac{1}{R_1} + j\omega C_1 \right)^2} \]

(5.9)

The insertion loss is minimized when,

\[ \frac{\partial IL(m)}{\partial m} = 0 \]  

(5.10)

Solving equation (5.9) using (5.10) gives,

\[ m_{opt} = \frac{1}{\sqrt{2}} \cdot \sqrt[4]{|Z_L|^2 \left( \frac{1}{R_1^2} + \omega^2 C_1^2 \right)} \]

(5.11)

The procedure to size the components used by the implemented PPF is summarized as follows:

1. Pick the first stage \( R_1 \) and \( C_1 \) values for the desired LO frequency, input impedance and phase noise floor.
2. Simulate the extracted load impedance as seen by the PPF output.
3. Pick the optimal \( m \) from equation (5.11) and size the second stage of the PPF according to \( R_2 = m_{opt} \times R_1 \) and \( C_2 = C_1 / m_{opt} \).
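The three sizing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact design script: it assumes the load-impedance magnitude \( |Z_L| \) enters the fourth root of the optimum ratio so that \( m_{opt} \) is dimensionless, it picks \( C_1 \) from \( \omega R_1 C_1 = 1 \), and the 140Ω load magnitude is a hypothetical stand-in for the extracted load of step 2.

```python
import math


def size_two_stage_ppf(f_lo_hz, r1_ohm, z_load_mag_ohm):
    """Size a two-stage PPF following the three-step procedure.

    Assumptions (not confirmed by the source):
      * C1 is chosen so that omega * R1 * C1 = 1 at the LO frequency.
      * m_opt = (1/sqrt(2)) * (|Z_L|^2 * (1/R1^2 + omega^2 C1^2)) ** 0.25
    """
    omega = 2.0 * math.pi * f_lo_hz
    c1 = 1.0 / (omega * r1_ohm)                      # step 1: first-stage capacitor
    y1_mag_sq = 1.0 / r1_ohm**2 + (omega * c1) ** 2  # |1/R1 + j*omega*C1|^2
    m_opt = (z_load_mag_ohm**2 * y1_mag_sq) ** 0.25 / math.sqrt(2.0)
    # step 3: scale the second stage while keeping the RC product constant
    return {"C1": c1, "m_opt": m_opt, "R2": m_opt * r1_ohm, "C2": c1 / m_opt}


# Hypothetical load magnitude standing in for the extracted value of step 2.
sizing = size_two_stage_ppf(60e9, 179.0, 140.0)
print({name: f"{value:.4g}" for name, value in sizing.items()})
```

Because \( R_2 C_2 = R_1 C_1 \) by construction, the center frequency of the second stage is unchanged while its impedance level is scaled to suit the load.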

Fig. 5-8 shows simulation results of normalized insertion loss as a function of frequency and \( m \). The first stage uses R and C values of 179Ω and 14.8fF, while the second stage uses 179·\( m \) Ω and 14.8/\( m \) fF, where \( m \) is the scaling factor. From the simulation, the minimum insertion loss occurs when \( m \) is close to 0.75, which matches equation (5.11) and results in a 0.3dB insertion loss reduction compared to \( m = 1 \). As the load varies, the triode-region resistors can be tuned to minimize the insertion loss. Next, the impact of parasitic capacitance associated with the triode-region device is explored in the context of phase accuracy and insertion loss.

5.4.3 Parasitic capacitance

The proposed technique uses triode-region transistors to realize the second-stage resistors,
which will introduce unwanted parasitic capacitance and further affect the amplitude/phase
mismatch and insertion loss.

Fig. 5-8 Simulation results of normalized insertion loss as a function of frequency with different
values of m.
Fig. 5-9 Triode region transistor model with parasitic capacitance. (a) Cross section of a MOS transistor. (b) Simplified schematic model of a MOS transistor.
A cross-section of a triode region transistor model [63] is shown in Fig. 5-9(a). This model includes capacitance from the gate-to-source ($C_{GS}$), drain-to-gate ($C_{GD}$), source/drain-to-bulk ($C_{SB}/C_{DB}$) and the substrate resistance, $R_{sub}$. This model simplifies all the drain and source capacitance to ground as one lumped capacitance at the drain and source, named $C_D$ and $C_S$, respectively, Fig. 5-9(b). A non-negligible source-to-drain capacitance must also be taken into consideration for several reasons. First, a 2 kΩ poly-resistor is placed in series with the gate of each triode-region transistor which ensures the gate impedance is high even at millimeter wave frequencies, see Fig. 5-5. $C_{GS}$ and $C_{GD}$ are in series with the gate resistor, thus forming a series C-R voltage divider from the perspective of the drain-to-gate and source-to-gate. Secondly, in a deep submicron technology, the source and drain are routed with metals which run close to each other, thus the metal-to-metal sidewall capacitance between the source and drain fingers introduces a non-negligible parasitic capacitance, represented as $C_{DS}$, which is included in a lumped model, see Fig. 5-9 (b). Although this parasitic capacitance is small, usually less than 0.5fF for a minimum length and width transistor, it will introduce an amplitude mismatch between the I and Q signals, as well as increasing the insertion loss. The impact of $C_D$, $C_S$ and $C_{DS}$ on the insertion loss and amplitude/phase mismatch will now be explored.
Any additional $C_D$ and $C_S$ will have minimal impact on the amplitude/phase mismatch at the PPF output, as this is equivalent to equal loading at each PPF output port. However, $C_D$ and $C_S$ act as a current divider at the output of each PPF stage, which increases the insertion loss. Generally, if $C_S \ll C_1 + 2C_L$ and $C_D \ll C_1 + C_2$, where $C_1$ and $C_2$ are the first- and second-stage capacitances of the PPF, respectively, and $C_L$ is the load capacitance, the insertion loss is minimal. Fig. 5-10 shows the simulated normalized insertion loss of a two-stage PPF as a function of frequency for different values of $C_D$ and $C_S$. This simulation uses ideal resistors and capacitors in both stages of the PPF, with the additional parasitic capacitance added explicitly. From the simulation, if the parasitic capacitances $C_S$ and $C_D$ are less than 0.7fF, the insertion loss will be less than 0.155dB. In the actual design, the triode-region transistors are sized at minimum length (2.4µm/28nm) to reduce the parasitic capacitances $C_S$ and $C_D$. $C_S$ and $C_D$ are further reduced by placing the devices inside a deep N-well (DNW), which results in less than 0.5fF of parasitic capacitance.
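The two "much smaller than" conditions on $C_S$ and $C_D$ can be checked numerically. The sketch below is only an illustration: the function name is hypothetical, the 10fF load capacitance is an assumed value, and the second-stage capacitor is taken as 14.8fF/0.75 from the sizing discussion in Section 5.4.2.

```python
def parasitic_ratios(c_s, c_d, c1, c2, c_load):
    """Return (C_S / (C1 + 2*C_L), C_D / (C1 + C2)).

    Both ratios should be much smaller than one for the parasitic
    capacitances to have little effect on the insertion loss.
    """
    return c_s / (c1 + 2.0 * c_load), c_d / (c1 + c2)


C1 = 14.8e-15          # first-stage capacitor from the text
C2 = 14.8e-15 / 0.75   # second-stage capacitor for m close to 0.75
C_LOAD = 10e-15        # assumed load capacitance (hypothetical value)

ratio_s, ratio_d = parasitic_ratios(0.5e-15, 0.5e-15, C1, C2, C_LOAD)
print(f"C_S ratio = {ratio_s:.3f}, C_D ratio = {ratio_d:.3f}")
```

For 0.5fF parasitics both ratios come out at a few percent under these assumptions, which is in line with the small insertion-loss penalty reported from simulation.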


Fig. 5-11 Simulation results of amplitude/phase mismatch, normalized insertion loss of two-stage PPF as a function of frequency with different values of $C_{DS}$. (a) amplitude mismatch, (b) phase mismatch, (c) normalized insertion loss.

The indirect coupling from the drain to the source produces an effective capacitance, $C_{DS}$, which introduces an amplitude mismatch by providing a coupling path from the output of the first stage to the second stage. The simulation results of amplitude/phase mismatch and normalized insertion loss of a two-stage PPF as a function of frequency with different values of $C_{DS}$ are shown in Fig. 5-11. From simulation, $C_S$, $C_D$ and $C_{DS}$ will have negligible impact on amplitude/phase mismatch and insertion loss if the capacitance is less than 0.5fF.

As mentioned earlier, minimum size triode-region transistors are desired to reduce unwanted parasitic capacitance. Thus, the performance of this PPF calibration technique will improve with future technology scaling, as unwanted parasitic capacitance scales down.

5.4.4 Opamp Design

The proposed PPF tuning circuit includes a feedback loop to modulate the channel resistance of the replica devices shown in Fig. 5-6.

The loop transfer function is made up of the opamp input-to-output response and the gain from \( V_G \) (opamp output) to \( V_P \) (opamp positive input). The small-signal gain from \( V_G \) to \( V_P \) is analyzed below.

Assuming all the stacked transistors operate in the triode region and are equally sized, \( V_P \) can be expressed as,

\[
V_P = \sum_{i=1}^{N} V_{DS_i} = \frac{I_{ctrl}}{\mu_n C_{ox} \frac{W}{L}} \sum_{i=1}^{N} \frac{1}{V_{GS_i} - V_{th}} \approx \frac{N I_{ctrl}}{\mu_n C_{ox} \frac{W}{L}} \cdot \frac{1}{V_G - V_{th} - \frac{(N - 1) \cdot V_{RBIAS}}{2N}}
\]

(5.12)

Using equation (5.12), the small-signal gain between \( V_G \) and \( V_P \) can be expressed as,

\[
\frac{\partial V_P}{\partial V_G} = \frac{N I_{ctrl}}{\mu_n C_{ox} \frac{W}{L}} \times \frac{-1}{\left( V_G - V_{th} - \frac{(N - 1) \cdot V_{RBIAS}}{2N} \right)^2}
\]

(5.13)

Assuming each cascode device has the same \( V_{ds} \), this gives,

\[ I_{ctrl} = \mu_n C_{ox} \frac{W}{L} (V_G - V_{th}) \frac{V_{RBIAS}}{N} \]

(5.14)

Fig. 5-12 The schematic of the opamp and the settling time of the proposed feedback loop. (a) Opamp schematic with a folded-cascode topology, (b) the proposed feedback loop settling time.

Combining equations (5.13) and (5.14), and neglecting the small \( \frac{(N - 1) \cdot V_{RBIAS}}{2N} \) term, yields,

\[ A_{PG} = \frac{\partial V_P}{\partial V_G} = -\frac{V_{RBIAS}}{V_G - V_{th}} \]

(5.15)

Equation (5.15) further reveals the loop stability is independent of the number of cascoded transistors. In this design, \( V_{RBIAS} \) is nominally a fixed voltage of 100 mV. However, \( V_G \) varies from 0.7V to 1.05V, to achieve a desired resistor tuning range from 120 \( \Omega \) to 520 \( \Omega \). Lower \( V_{RBIAS} \) could help to reduce the power consumption of the PPF, see equation (5.4), but the calibration would become more sensitive to the opamp input offset voltage. Also, the PPF power consumption is lower bounded by the opamp.
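To give a feel for the numbers, the magnitude of the gain in (5.15) can be evaluated over the stated range of \( V_G \). The following is a minimal sketch: the threshold voltage of 0.4V is a hypothetical value chosen for illustration, since \( V_{th} \) is not given here, and the function name is an assumption.

```python
def replica_gain_magnitude(v_rbias, v_gate, v_th):
    """|A_PG| = V_RBIAS / (V_G - V_th); independent of the number of stacked devices."""
    overdrive = v_gate - v_th
    if overdrive <= 0.0:
        raise ValueError("replica devices must be on (V_G > V_th)")
    return v_rbias / overdrive


V_RBIAS = 0.1  # nominal replica bias from the text, in volts
V_TH = 0.4     # hypothetical threshold voltage, in volts

gain_low_vg = replica_gain_magnitude(V_RBIAS, 0.70, V_TH)
gain_high_vg = replica_gain_magnitude(V_RBIAS, 1.05, V_TH)
print(f"|A_PG| at V_G = 0.70 V: {gain_low_vg:.3f}")
print(f"|A_PG| at V_G = 1.05 V: {gain_high_vg:.3f}")
```

Under this assumption the replica stage attenuates the signal at every setting of the tuning range, so the loop gain is set almost entirely by the opamp, and halving \( V_{\text{RBIAS}} \) halves this gain along with the power.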

A folded-cascode opamp with an open-loop gain of 80dB is designed for the feedback loop, Fig. 5-12(a). From simulation, when the feedback loop is enabled, the control voltage takes 1.5µs to settle, Fig. 5-12(b). The settling behavior of the feedback loop deviates from the traditional characteristics of an OTA in feedback. To minimize the power consumption, the current, \( I_P \), in the PMOS active load is smaller than the tail current, \( I_{SS} \). Thus, while \( V_G \) (opamp output) is charging up the load capacitance, \( M_7 \) turns off and the voltage at node A falls to a level that pushes \( M_1 \) and the tail current source (shown as ideal) into the triode region, Fig. 5-12(a). This slows down the opamp and creates unusual “kinks” in the settling behavior. The slew rate of the opamp can be improved by clamping node A to \( V_{DD} \) using transistors \( M_{11} \) and \( M_{12} \) [127], Fig. 5-12(a). This produces the settling response shown in blue, Fig. 5-12(b).
5.4.5 Noise

Compared to the traditional two-stage PPF, with passive polysilicon resistors R and metal-oxide-metal (MOM) capacitor C, the proposed structure replaces polysilicon resistors with active devices, which in turn may degrade the LO phase noise.

Fig. 5-13(a) shows the simulated phase noise plots at the output of traditional/proposed PPF using a noiseless LO as the input. As expected, the phase noise of the traditional PPF exhibits a flat spectrum due to the thermal noise contribution of the polysilicon resistors. The proposed PPF structure shows higher phase noise contribution, which is mainly attributed to the active devices in the PPF and feedback circuitry. However, an actual PLL output has a phase noise skirt which is significantly higher than the added phase noise from the proposed PPF structure. Fig. 5-13(b) shows a phase noise simulation result with an actual LO signal as the input of PPF. This simulation models the phase noise of the VCO and synthesizer by modeling the LO input signal to the PPF with a phase noise profile with data from a state-of-the-art 60GHz frequency synthesizer, given in [128]. The phase noise simulations are performed with a 1V peak-to-peak swing applied to the PPF input. As such, the noise from opamp, control current/voltage and replica transistors have been taken into account with these simulation results. The simulation shows almost the same input/output phase noise performance with both a traditional and the proposed PPF structure, see Fig. 5-13(b). Thus, the noise added by the PPF circuitry is negligible.
Fig. 5-13 Phase noise simulation of the proposed quadrature generator versus traditional two-stage PPF with an ideal 60GHz LO and realistic 60GHz LO input. (a) Simulation taken with noiseless LO, (b) simulation taken with actual LO.

5.4.6 Layout Techniques

An L-compensated approach, which is identical to [65], is employed to reduce the layout-related asymmetry and further improve the quadrature balance.
Two methods are commonly used to measure and characterize the quadrature imbalance. The first approach measures the IQ phase imbalance directly at the carrier frequency (mmWave band) using high-frequency probes. However, the measurement accuracy is limited by the phase mismatch introduced by the probes and cables, the differential balance of the input signal, and the finite Short-Open-Load-Thru (SOLT) calibration accuracy at mmWave frequencies. The second approach first down-converts the LO signals to a lower frequency, where an accurate IQ imbalance measurement can be made. However, the down-converter design mandates the use of large device sizes in the amplifiers, mixers and buffers to minimize any mismatch introduced by the test circuitry. The method used in this work down-converts the LO before sending the signals off-chip, where accurate measurements are easier to obtain.
Two down-conversion mixers with linear buffers were used to measure the quadrature phase and gain accuracy, see Fig. 5-14. A single-to-differential (STD) power-splitter is used to generate two differential 55GHz-to-70GHz signals. An STD balun (XF1, see Fig. 5-14) generates differential signals for the input of proposed quadrature generator (QG). To improve the differential balance of the input LO signals, two sets of buffers (BUF_A and BUF_B) are added between the STD and proposed QG. Another set of linear buffers (BUF_C_1 and BUF_C_2) are added after the QG to improve the voltage swing before driving the passive mixer (Mixer_1 and Mixer_2). Baseband linear amplifiers are connected after the mixers and drive a 50 Ohm port impedance on board with a simulated -3dB bandwidth of 10MHz-500MHz. Monte Carlo mismatch simulation shows the worst case IQ amplitude/phase imbalance introduced by the buffers, amplifiers and mixers is 0.1dB/0.2°, respectively, which is minimal compared to what is introduced by the QG. In addition, any unexpected phase imbalance introduced by the test circuitry may be compensated for by the proposed PPF tuning circuit.

5.6 MEASUREMENT RESULTS

This chip was fabricated in a 28nm CMOS process with 1 UTM layer, 7-metal stack and occupies 0.936mm x 1.013mm including the bond pads. The core two-stage PPF with the feedback control is compact and occupies an area of less than 20µm x 40µm. The die is assembled with the test board using chip-on-board packaging. A die photo is shown in Fig. 5-15.
To test the validity of the proposed quadrature generator design, the chip was characterized using a Cascade 12000AP Summit on-wafer probe station. Most measurements were performed using an Agilent N5247A PNA-X network analyzer and an Agilent DSA-X 92504A signal analyzer with 25GHz bandwidth.
Fig. 5-16. IQ generation measurement setup in the lab

The measurement setup is shown in Fig. 5-16. In the measurements, the 55-70GHz RF/LO signals were provided by an Agilent N5247A network analyzer and applied through on-wafer probes. All of the lower-frequency signals, including the baseband outputs and DC supplies, were routed to the chip using a chip-on-board packaging strategy. The entire test board was mounted on a custom chuck in the probe station to allow probing. The Agilent DSA-X 92504A signal analyzer (25GHz bandwidth) was connected to the baseband output and measured the IQ imbalance in both the time and frequency domains. A laptop with an Aardvark I2C/SPI host adapter (Total Phase Inc.) provided digital control. Measurements were taken with five boards: three were assembled with TT die, and the other two with SS die. No FF die were available for testing.
Fig. 5-17 Measured amplitude/phase mismatch of the proposed tunable IQ generator vs frequency for several values of $I_{ctrl}$. (a) amplitude mismatch, (b) phase mismatch.
Using the TT die, both the phase and gain error were measured and plotted versus frequency for several values of the control current ($I_{\text{ctrl}}$) from 40µA to 90µA, which ultimately modulates the triode-region resistors in the PPF; see results in Fig. 5-17. A worst-case measured phase/amplitude imbalance of 2°/0.32dB (TT dies) and 2.2°/0.55dB (SS dies) is reported over a 7GHz bandwidth for a fixed value of $I_{\text{ctrl}}$, see Fig. 5-19. When $I_{\text{ctrl}}$ is retuned every 7GHz, this quadrature generator maintains the measured quadrature error from 55 to 70GHz, Fig. 5-17. As mentioned earlier, a bandgap reference supplies the tuning current, $I_{\text{ctrl}}$, with sufficient range to keep the phase imbalance below 2° over a 7GHz bandwidth, which is compliant with the 802.11ad standard. It is worth mentioning that a fixed resistance given by $V_{RBIAS}/I_{Ctrl}$ may not perfectly match the optimum resistance that gives the minimum quadrature phase error over all PVT, see Fig. 5-17. The additional parasitic capacitance (see Section 5.4.3) associated with the triode-region transistor will shift the center frequency of the PPF. In the actual system, a look-up table will indicate which control current should be used to cover the desired bandwidth.
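A look-up table of this kind could be as simple as the Python sketch below. The band edges and control-current entries are hypothetical placeholders chosen only to illustrate the structure (a higher current gives a lower channel resistance and therefore a higher center frequency); an actual table would be filled from calibration data for each die.

```python
import bisect

# Hypothetical sub-band edges (GHz) and control currents (uA); placeholders only.
BAND_EDGES_GHZ = [55.0, 60.0, 65.0, 70.0]
I_CTRL_UA = [50.0, 70.0, 90.0]


def control_current_ua(f_lo_ghz):
    """Return the control current for the sub-band that contains f_lo_ghz."""
    if not BAND_EDGES_GHZ[0] <= f_lo_ghz <= BAND_EDGES_GHZ[-1]:
        raise ValueError("LO frequency outside the covered range")
    index = bisect.bisect_right(BAND_EDGES_GHZ, f_lo_ghz) - 1
    index = min(index, len(I_CTRL_UA) - 1)  # the upper edge belongs to the last band
    return I_CTRL_UA[index]


for f_ghz in (55.0, 62.5, 70.0):
    print(f"{f_ghz:.1f} GHz -> {control_current_ua(f_ghz):.0f} uA")
```

Keeping each sub-band narrower than 7GHz leaves margin against the worst-case imbalance reported for a fixed control current.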

Fig. 5-19 Measured worst-case amplitude/phase mismatch for several available chips. (a) amplitude mismatch, (b) phase mismatch.
A screen capture from an oscilloscope is shown in Fig. 5-18 for a typical IQ signal supplied by the IQ generator. This measurement was taken with a TT die at $f_{RF}=65\text{GHz}$, $f_{LO}=64.9\text{GHz}$ and the control voltage/current are $100\text{mV}/90\mu\text{A}$, respectively.
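One way the amplitude and phase imbalance could be extracted from the captured baseband waveforms is sketched below. This is an illustrative post-processing example, not the procedure used on the instrument: it assumes sampled I and Q records at the 100MHz IF (65GHz minus 64.9GHz), a hypothetical 3.2GS/s sample rate, an ideal Q that lags I by 90°, and a record containing an integer number of IF cycles. The waveforms are synthetic, with a known 0.3dB/2° error injected to exercise the code.

```python
import cmath
import math


def tone_phasor(samples, f_tone_hz, f_sample_hz):
    """Single-bin DFT: complex amplitude of the tone at f_tone_hz."""
    acc = 0j
    for k, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * f_tone_hz * k / f_sample_hz)
    return 2.0 * acc / len(samples)


def iq_imbalance(i_samples, q_samples, f_if_hz, f_sample_hz):
    """Return (amplitude imbalance in dB, phase error in degrees).

    Assumes the ideal Q output lags the I output by exactly 90 degrees.
    """
    ratio = tone_phasor(q_samples, f_if_hz, f_sample_hz) / tone_phasor(
        i_samples, f_if_hz, f_sample_hz
    )
    amp_db = 20.0 * math.log10(abs(ratio))
    phase_err_deg = math.degrees(cmath.phase(ratio)) + 90.0
    return amp_db, phase_err_deg


F_IF = 100e6      # IF from f_RF = 65 GHz and f_LO = 64.9 GHz
F_S = 3.2e9       # hypothetical sample rate (32 samples per IF cycle)
N_SAMPLES = 3200  # exactly 100 IF cycles

gain = 10.0 ** (0.3 / 20.0)  # injected 0.3 dB amplitude error
skew = math.radians(2.0)     # injected 2 degree phase error
i_wave = [math.cos(2 * math.pi * F_IF * k / F_S) for k in range(N_SAMPLES)]
q_wave = [
    gain * math.cos(2 * math.pi * F_IF * k / F_S - math.pi / 2 + skew)
    for k in range(N_SAMPLES)
]

amp_db, phase_err_deg = iq_imbalance(i_wave, q_wave, F_IF, F_S)
print(f"amplitude imbalance = {amp_db:.3f} dB, phase error = {phase_err_deg:.3f} deg")
```

On real captures the record would need windowing or coherent sampling, and any imbalance of the test buffers, mixers and baseband amplifiers is included in the result.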

The proposed quadrature generator consumes less than $192\mu\text{W}$: the control current of the feedback circuitry accounts for $120\mu\text{A}$ from a $1\text{V}$ supply, and the opamp for $40\mu\text{A}$ from a $1.8\text{V}$ supply. To drive the proposed quadrature generator, which has a simulated input impedance of $150\Omega$ in parallel with $18\text{fF}$, an LO buffer drawing $6\text{mA}$ from a $1\text{V}$ supply is designed.
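The power figures quoted above follow directly from the supply currents and voltages. The short Python check below simply reproduces that arithmetic; the function and variable names are illustrative.

```python
def supply_power_uw(current_ua, supply_v):
    """Power in microwatts drawn by a current (in microamps) from a supply (in volts)."""
    return current_ua * supply_v


control_uw = supply_power_uw(120.0, 1.0)  # control current of the feedback circuitry
opamp_uw = supply_power_uw(40.0, 1.8)     # opamp
total_uw = control_uw + opamp_uw
buffer_mw = 6.0 * 1.0                     # LO buffer: 6 mA from 1 V, in milliwatts

print(f"control = {control_uw:.0f} uW, opamp = {opamp_uw:.0f} uW, total = {total_uw:.0f} uW")
print(f"LO buffer = {buffer_mw:.0f} mW")
```

The quadrature generator and its tuning loop therefore consume a small fraction of the power of the LO buffer that drives them.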

Several sets of measurements were taken to explore the variation in the IQ imbalance using all five available chips. Fig. 5-19 shows the measured worst-case amplitude/phase mismatch for the five prototype devices. Both TT and SS die give similar phase mismatch results; however, the SS chips exhibit a $0.2\text{dB}$ greater amplitude mismatch. The increased amplitude mismatch could be the result of a mismatch between the I and Q channels at baseband, i.e. the baseband amplifiers in Fig. 5-14. This process-dependent offset could be mitigated by using larger transistor sizes for the buffers and amplifiers in the measurement signal path. In the actual system, the amplitude imbalance in the LO signal path has less impact on the RX EVM performance than the phase imbalance. The switching activity in the mixer, especially in a passive mixer, reduces the impact of the LO I/Q amplitude mismatch. A detailed circuit performance comparison with other state-of-the-art IQ generators is given in Table 5-1.
Table 5-1 Comparison table for mmWave quadrature generator

<table>
<thead>
<tr>
<th></th>
<th>B. A. Floyd JSSC '05</th>
<th>C. Marcu JSSC '09</th>
<th>S. Y. Kim TMTT '12</th>
<th>This Work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Architecture</td>
<td>Branch Line Coupler</td>
<td>Distributed Microstrip Shunt-Stubs</td>
<td>Quadrature All-Pass Filter</td>
<td>Two-Stage PPFs with Triode-Region Transistor and FB Control</td>
</tr>
<tr>
<td>Frequency (GHz)</td>
<td>57-64</td>
<td>55-65</td>
<td>55-78.5</td>
<td>55-70\(^c\)</td>
</tr>
<tr>
<td>Phase Imbalance (°)</td>
<td>&lt;15</td>
<td>&lt;5</td>
<td>&lt;9.5</td>
<td>&lt;2\(^f\)</td>
</tr>
<tr>
<td>Amp. Imbalance (dB)</td>
<td>&lt;1</td>
<td>&lt;1.5</td>
<td>&lt;0.5</td>
<td>&lt;0.32\(^f\)</td>
</tr>
<tr>
<td>Insertion Loss (dB)</td>
<td>-x-</td>
<td>2.5/4</td>
<td>&gt;3\(^c\)</td>
<td>3.5\(^b\)</td>
</tr>
<tr>
<td>Input Impedance</td>
<td>-x-</td>
<td>-x-</td>
<td>40 Ohm // -x-</td>
<td>150 Ohm // 18 fF\(^a\); 155 Ohm // 17 fF\(^a\)</td>
</tr>
<tr>
<td>Area (\(\mu m^2\))</td>
<td>-x-</td>
<td>-x-</td>
<td>-x-</td>
<td>&lt;20 × 40\(^d\)</td>
</tr>
<tr>
<td>Power Consumption</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>127-162 \(\mu W\)\(^e\); 132-192 \(\mu W\)\(^e\)</td>
</tr>
<tr>
<td>Process</td>
<td>0.12 \(\mu m\) SiGe</td>
<td>90nm CMOS</td>
<td>0.13 \(\mu m\) SiGe</td>
<td>28nm CMOS</td>
</tr>
</tbody>
</table>

\(^a\) Extracted simulation results. \(^b\) Extracted simulation results without real loading from the following buffer stage. \(^c\) Calculated from the ideal amplitude response of QAF with \(\frac{R_2}{R_1} = 1\). \(^d\) Area doesn’t include the operational amplifier. \(^e\) Power consumption includes the opamp, which has 40\(\mu A\) from a 1.8V supply. The power doesn’t include a 6mW LO buffer to drive the proposed quadrature generator. \(^f\) The reported quadrature mismatch was obtained by fixing \(I_{ctrl}\) every 7GHz.

5.7 CONCLUSION

A feedback-controlled PPF-based method is proposed for accurate wideband high-frequency IQ generation. A prototype chip is designed with a measured IQ imbalance of less than 2°/0.32dB (TT dies) and 2.2°/0.55dB (SS dies) over a 7GHz bandwidth, tunable across 55GHz to 70GHz.

Potential applications for this technique include 60GHz transceivers, millimeter-wave wideband systems and any radio requiring a wideband, low-power, highly accurate quadrature generation function.
6. CONCLUSIONS AND SCOPE FOR FUTURE WORK AND APPLICATIONS

Over the past few years, the saturated RF spectrum (100MHz – 5GHz), combined with accelerating consumer demand for more wireless network capacity and higher data rates, has motivated researchers to explore new ways to improve existing wireless communication systems. This dissertation proposes hardware front-end solutions which aim to increase the data rate for each user using two techniques: a. In-band full-duplex radio with self-interference cancellation; b. High-speed communication using the large bandwidth available at mmWave frequencies.

6.1 THESIS SUMMARY

6.1.1 Full-Duplex Radio

Two full-duplex ICs have been designed, taped out and measured to demonstrate the validity of the self-interference cancellation architecture.

In the first prototype chip (for details, please refer to Chapter 3), a polyphase-filter-based canceller has been implemented and tested with more than 30dB self-interference cancellation over a 4MHz bandwidth, which is sufficient for narrowband applications like Bluetooth. The proposed canceller degrades the RX noise figure by only 0.6dB and occupies a silicon area of $131 \times 112.5 \ \mu m^2$. The first prototype full-duplex chip has two major drawbacks: 1) the cancellation bandwidth is relatively narrow; 2) the linearity of the proposed canceller is poor, with a $P_{1dB} / IIP_3$ of 12/15 dBm, which is incompatible with transceivers requiring a high TX output power.
The second prototype chip (for details, please refer to Chapter 4) attempts to address the two limitations of the first chip and demonstrates wideband self-interference cancellation with a high-output-power transmitter. The second prototype chip utilizes two banks of adaptive filters, one at RF and the other at baseband, for self-interference cancellation. The prototype chip has a measured self-interference cancellation greater than 50dB over a 42MHz bandwidth. The linearity of the canceller has been significantly improved: the measured $P_{1dB}$ / $IIP_3$ of the canceller is 27/36 dBm, respectively. An Altera Cyclone III EP3C120 Development Board emulates the digital baseband that would otherwise implement a blind-source-adaptation algorithm for both cancelling filters. Overall, the second prototype chip successfully demonstrates a high-output-power closed-loop full-duplex system for next-generation wireless standards.

6.1.2 Communication at mmWave Frequencies

Although the low frequency RF bands are saturated with applications, the large amount of available spectrum at mmWave frequencies opens new opportunities for high-speed point-to-point communication. However, using a much higher carrier frequency for wireless communication presents many hardware design challenges such as realizing a wideband mm-Wave high-accuracy quadrature generator. One mmWave IC has been fabricated and demonstrated with a wideband highly accurate quadrature generation. A worst-case measured phase and amplitude imbalance of 2° and 0.32dB across a frequency range of 55-70GHz is reported.

6.2 FUTURE DIRECTIONS

With all the building blocks for the full-duplex radio and mmWave transceivers becoming more mature, the next step for researchers and industry is to develop a single-chip hardware solution that integrates all the building blocks onto one die to significantly reduce the area and cost. Beyond that, extending the techniques developed for wireless communication to biomedical applications also seems promising. From a circuit and system engineering point of view, future research will need to focus on two areas: 1) a single-chip full-duplex radio with an integrated circulator and ADC that demonstrates more than 130dB self-interference cancellation; 2) extending the full-duplex radio techniques to biomedical applications, specifically simultaneous stimulation and sensing for a brain-computer interface system.

Fig. 6-1 Single chip full-duplex radio with circulator
6.2.1 Single-Chip Full-Duplex Radio with Integrated Circulator

Full-duplex communication has drawn tremendous interest over the past five years, and much research [19], [31], [33], [36], [42], [51] has focused on designing low-power, low-noise and high-linearity self-interference cancellation circuitry for full-duplex radios. However, the aforementioned full-duplex systems rely on an off-chip circulator to provide 20-30dB of transmitter-to-receiver isolation. Recently, there has been strong interest in designing a CMOS integrated circulator [40], [41]. A circulator is a three-port non-reciprocal network, which implies that it cannot be implemented with fully passive components without insertion loss. Active devices have to be used when implementing the circulator, which poses an extremely challenging linearity problem, since one port of the circulator is directly connected to the output of the power amplifier. While the feasibility of integrating the circulator on a silicon substrate has been demonstrated in [40], [41], meeting the stringent linearity requirement remains a primary barrier to practical circulator implementations. The trade-off between the linearity, noise, insertion loss, power consumption and silicon area of the circulator still needs to be investigated.

The eventual goal of the full-duplex project is to integrate a circulator with the existing state-of-the-art self-interference cancellation circuitry to develop a fully electronic solution for future in-band full-duplex radio communication, see Fig. 6-1.

If successful, the complete full-duplex radio could be implemented on a single chip in a few years for less than a dollar, and thus be easily deployable in a variety of devices including smart phones, notebook computers, etc.
6.2.2 Simultaneous Stimulation and Sensing for BCI System.

Fig. 6-2 Simultaneous stimulation and sensing in brain computer interface (BCI) system with artifact cancellation.

The techniques developed in the proposed full-duplex radios are already being extended to a similar technical challenge found with electronics used for neurological interfaces where clinicians struggle with simultaneously acquiring neural recordings, while another site is stimulated [129], [130]. Similar to the TX self-interference cancellation in a wireless system, the stimulator inherently interferes with the recording interface. Our lab is already trying to build up a neural interface through the Center for Sensorimotor Neural Engineering (CSNE) which will adaptively cancel “stimulation artifacts” found in neural interfaces.


