# Radiation Hardening by Design of Asynchronous Logic for Hostile Environments

David J. Barnhart, *Member IEEE*, Tanya Vladimirova, *Member IEEE*, Martin N. Sweeting, *Senior Member IEEE* 

Surrey Space Centre University of Surrey Guildford Surrey GU2 7XH United Kingdom +44 (0) 1483 686025 david.barnhart@ieee.org

Kenneth S. Stevens, *Senior Member IEEE* Department of Electrical and Computer Engineering University of Utah Salt Lake City UT USA +1 801-585-9176 kstevens@ece.utah.edu

*Abstract*—A wide range of emerging applications is driving the development of wireless sensor node technology towards a monolithic system-on-a-chip implementation. Of particular interest are hostile environment scenarios where radiation and thermal extremes exist. Radiation hardening by design has been recognized for over a decade as an alternative open-source circuit design approach to mitigate a spectrum of radiation effects, but has significant power and area penalties. Similarly, asynchronous logic design offers potential power savings and performance improvements, with a tradeoff in design complexity and a lesser area penalty. These side effects have prevented wider acceptance of both design approaches. A case study supporting the development of monolithic system-on-a-chip wireless sensor nodes is presented. Synchronous, hardened, and asynchronous/hardened implementations of a textbook microprocessor in 0.35 µm austriamicrosystems SiGe BiCMOS technology are compared. The synergy of this novel asynchronous/hardened design approach is confirmed by simulation and hardware results.

Keywords: radiation hardening by design, asynchronous logic, system-on-a-chip, environmental tolerance

## I. INTRODUCTION

A new dimension of wireless sensor network architecture design is emerging where hundreds to thousands of ultra-light (<10 g) low-cost sensor nodes are required to collectively perform a spectrum of distributed sensing missions in hostile environments, including those encountered in space. Research is underway to investigate the feasibility of fabricating survivable self-powered system-on-a-chip (SoC) wireless sensor nodes monolithically with commercially available silicon-germanium bipolar complementary metal-oxide-semiconductor (SiGe BiCMOS) technology [1]. Of particular interest are hostile environment scenarios with radiation and thermal extremes [2].

Pairing radiation hardening by design (RHBD) and asynchronous logic has emerged as a potential solution to improve system tolerance of radiation, process variations, voltage fluctuations, and temperature (PVT) extremes. This paper presents a case study of these concepts by comparing synchronous, hardened, and asynchronous/hardened implementations of a textbook microprocessor. Sections II and III briefly introduce the concepts of RHBD and asynchronous logic used in this work, respectively. Section IV discusses jointly leveraging RHBD and asynchronous design concepts and presents the comparative results. This work supports all bare-die SoC applications, including satellite-on-a-chip [2].

## II. RADIATION HARDENING BY DESIGN

Synergistically combining RHBD with asynchronous logic design improves the tolerance to radiation and semiconductor PVT extremes. Additionally, the power penalty of RHBD can be dramatically reduced by the application of asynchronous design techniques.

#### A. Motivation

Extreme radiation conditions are usually experienced in nuclear power plants, some industrial process plants, and in space. Surprisingly, in the early days of IC development, alpha particles from impurities in plastic packaging caused mysterious anomalies in terrestrial systems. Neutrons occasionally cause errors in airplane avionics systems flying at normal cruising altitudes [3]. Space and various nuclear environments are more challenging, where the total ionizing dose (TID) of radiation causes gradual system degradation, resulting in an increase in power consumption. In addition, high-energy particles, such as electrons, protons, and heavy ions/galactic cosmic rays (GCRs), can cause a range of single-event effects (SEE), predominantly single-event upset (SEU), single-event latchup (SEL), and single-event transient (SET). Unnatural effects, such as enhanced dose rate, prompt neutron dose, and system electromagnetic pulse (EMP) can also be a concern [3].

Mitigating these effects has historically been accomplished with a system-level approach. Heavy shielding of various types can be used to reduce TID and system EMP, but is ineffective against SEE. SEE are tolerated and detected, typically through triple (or more) modular redundancy (TMR) or other voting schemes [3]. Additionally, hardening can be achieved at the IC level via specialized processes used in a hardened foundry. Hardened foundries typically employ epitaxial or insulating substrates to reduce SEE and carefully control oxide growth and chemistry to improve TID hardness. These approaches can be quite expensive, are frequently export controlled, and are typically several generations behind their commercial counterparts. One open source radiation-hardening solution at the IC level is the application of RHBD [4], which can be used on any generation process including the most recent. The guiding principle behind RHBD is to mitigate as many of the radiation effects as possible by using unconventional layout techniques at the transistor device and circuit level.
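The voting idea behind TMR can be illustrated with a minimal sketch. It assumes three redundant copies of a data word and a bitwise two-of-three majority vote; the function names and the example bit pattern are hypothetical and are not taken from this work.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise two-of-three majority of three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


def copies_disagree(a: int, b: int, c: int) -> bool:
    """Detection: True when at least one copy differs from the others."""
    return not (a == b == c)


# Hypothetical example: one copy suffers a single-bit upset.
good = 0b10110010
upset = good ^ 0b00001000          # flip one bit to mimic an SEU
voted = majority_vote(good, upset, good)
print(f"{voted:08b}", copies_disagree(good, upset, good))
```

The voter masks the upset in one copy while the comparison flags that an error occurred, which is the "tolerated and detected" behavior described above.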

#### B. RHBD Library Design

The first step in this work is the design of a new RHBD digital cell library for the austriamicrosystems 0.35 µm SiGe BiCMOS (AMS S35) process (HITKIT 3.70) in the Cadence DFII framework (2006-2007 5.1.41). The creation of this library is essential to this work because RHBD libraries are not freely available: they are regarded as intellectual property and are usually foundry process dependent. RHBD libraries generally use a sea of gates or gate array approach with a base transistor pair. The base transistor pair developed in this work is shown in Fig. 1. TID effects are minimized by the use of annular geometry nMOS transistors. This geometry minimizes the threshold voltage shift by preventing the buildup of trapped charge near the active region and eliminates edge leakage. The transistors are surrounded with highly doped guard rings, which prevent leakage through the field oxide separating the transistors. The higher drive strength that results from meeting minimum design rules for the annular nMOS and then balancing it with the pMOS increases the SEU threshold and reduces SET by increasing the critical charge. The drawback of the gate array approach is the increased area, while the annular nMOS and matched pMOS directly contribute to the increased power requirements.

Figure 1. RHBD layout of core transistor pair

The actual layout and geometry of the transistor pair are governed by minimum process design rules. The height and width of the base pair are governed by compatibility with place and route tools. One typical complication of RHBD libraries is that the transistor parameter extraction tools, including Cadence Assura, do not properly determine the annular transistor parameters [5]. Specifically, they cannot accurately calculate the transistor length, width, source area, source perimeter, drain area, and drain perimeter. These values must be calculated by manually measuring the layout and then corrected in the extracted netlist.

As CMOS technologies mature, the minimum feature size (currently at 45 nm) continues to shrink. Recently, annular transistors have received new attention as a technique to improve circuit reliability for mission-critical systems. Furthermore, the work in [6] demonstrates through experimentation and test that by choosing the interior contact of the nMOS transistor as the source, the reliability is further enhanced.

Numerous RHBD efforts have demonstrated considerable radiation hardness. As long as the basic approach is followed, the hardness of the developed library should be comparable to similar libraries. For example, a recent design and test campaign in 0.25 µm CMOS achieved the following typical results, which are suitable for most bare die SoC applications in hostile environments [7]:

- TID > 1 Mrad (Si)
- SEL > 110 MeV-cm<sup>2</sup>/mg @ 125 °C (SEL immune)
- SEU < $1\times10^{-12}$ errors/bit-day @ 2.25 V

A complete list of the cells required to complete all designs is given in Tables I and II. The simplest cell in the library is the inverter (INV0), with the most complex being the D flip-flop with active low preset (DFP1). Metal 2 is the highest metal layer used in any cell, with most cells being routed with only Metal 1. Library characterization through tools such as Signal Storm is intentionally not performed, as RHBD libraries are ideally suited as a one-to-one replacement of standard commercial cells. The justification is that RHBD cells inherently have higher drive strengths, as discussed, which improves SEU and SET hardness. With a matching timing library, the various optimization stages would incorrectly decrease drive strength, thereby lowering the SEU hardness. For example, high drive strength inverters and buffers used to ensure proper timing of distributed signals would have their drive strength reduced during an optimization stage, as the optimizer would see the higher RHBD cell drive strength as more than timing requires. Using the commercial timing library with RHBD cells prevents this problem. While hardware description language (HDL) simulations are not ideal in this situation, extracted layout HDL simulations confirm proper timing and performance before fabrication. An overview of the library development process is presented in Table V of the appendix.

| Cell       | Description                     | Standard Library Size (µm×µm) | RHBD Size (µm×µm) | % Increase |
|------------|---------------------------------|-------------------------------|-------------------|------------|
| AOI210     | 2-Input AND into 2-Input NOR    | 5.6×13           | 16.8×13   | 200      |
| AOI220     | 2x2-Input AND into 2-Input NOR  | 7×13             | 22.4×13   | 220      |
| AOI310     | 3-Input AND into 2-Input NOR    | 7×13             | 22.4×13   | 220      |
| BUF2       | Buffer                          | 4.2×13           | 11.2×13   | 160      |
| DF1        | D Flip Flop                     | 21×13            | 67.2×13   | 220      |
| DFC1       | D Flip Flop w/active low clear  | 23.8×13          | 78.4×13   | 230      |
| DFP1       | D Flip Flop w/active low preset | 23.8×13          | 78.4×13   | 230      |
| INV0       | Inverter                        | 2.8×13           | 5.6×13    | 100      |
| MUX21      | 2:1 Multiplexor                 | 8.4×13           | 33.6×13   | 300      |
| NAND20     | 2-Input NAND                    | 4.2×13           | 11.2×13   | 160      |
| NAND30     | 3-Input NAND                    | 5.6×13           | 16.8×13   | 200      |
| NAND40     | 4-Input NAND                    | 7×13             | 22.4×13   | 220      |
| NOR20      | 2-Input NOR                     | 4.2×13           | 11.2×13   | 160      |
| NOR30      | 3-Input NOR                     | 5.6×13           | 16.8×13   | 200      |
| NOR40      | 4-Input NOR                     | 7×13             | 22.4×13   | 220      |
| OAI210     | 2-Input OR into 2-Input NAND    | 5.6×13           | 16.8×13   | 200      |
| XOR20      | 2-input XOR                     | 9.8×13           | 28×13     | 186      |
| TIE0/1     | Tie lo and hi logic             | 2.8×13           | 5.6×13    | 100      |
| Fill cells | Fill cells for SOC Encounter    | Various          | Various   | -        |

TABLE I. RADIATION HARDENED LIBRARY CORE CELLS

TABLE II. RADIATION HARDENED LIBRARY INPUT/OUTPUT CELLS

| Cell  | Description             | Standard Size (µm) | RHBD Size (µm) |
|-------|-------------------------|--------------------|----------------|
| BBC1P | 1 mA bi-directional pad | 95×334             | same           |
| BU1P  | 1 mA output buffer      | 95×334             | same           |
| ICP   | Input buffer            | 95×334             | same           |

## III. ASYNCHRONOUS CIRCUIT DESIGN METHODOLOGY

Asynchronous logic concepts have existed since the 1950's, offering potential power savings and performance improvements depending on the application [8]. Analogous to RHBD's shortfalls in power and area penalties, asynchronous logic design is more complex when compared to the synchronous commercial standard and carries a potential area penalty. Perhaps the best-reported comparison of power, performance, and area impact of applying asynchronous design to a large commercial circuit, such as the Asynchronous Pentium Front End, can be found in [9]. Recent advances in automating the asynchronous design process have made the idea more attractive, resulting in new commercial offerings.

#### A. Introduction to Asynchronous Design

Asynchronous logic offers potential power savings and performance improvements with a tradeoff in design complexity and usually a small area penalty. In its purest form, this circuit design approach aims to minimize transistor switching. Due to the variety of circuit types and implementation techniques, the design process can be quite complex.

Traditional synchronous circuit designs feature a global clock that drives latches surrounding combinational logic, which as a system, performs a particular function. The clock rate is determined by the critical path through the system. This approach has remained an industry standard largely due to the entrenched design flow, which includes design synthesis from HDLs. However, synchronous designs have periodic power peaks, which produce electromagnetic interference (EMI). Additionally, the global clock tree consumes a significant fraction of the required power.

Asynchronous SoC architecture, which offers numerous advantages, has only recently been considered by the SoC community [10]. Asynchronous implementations can potentially require a fraction of the power of their clocked counterparts and produce very little EMI. Asynchronous designs are event triggered, processing new data using the minimum number of gate transitions possible. Asynchronous SoC design also promises to solve the global clock delay problem, which worsens as SoCs grow in size with increased functionality and performance. Asynchronous designs are based on the concept of modular functional blocks that communicate using handshaking protocols. The overall function of the circuit resembles that of the synchronous one. Recently, considerable progress has been made to improve the design automation of this particular asynchronous characteristic through de-synchronization [11].

De-synchronization does not yet realize all the advantages of asynchronous logic. Although removing the global clock tree and replacing it with a fabric of handshaked interconnections does flatten the power spectrum and reduce EMI generation, it is generally accepted that the opportunity to significantly lower the energy requirements and improve the performance is missed. These gains can be achieved by recognizing that most synchronous circuits have redundant operations depending on the system state and that not all operations take the same amount of time. Unfortunately, automating this process has not been achieved because of the variety of power and latency reduction techniques that can be applied, each of which is design dependent.

#### B. Asynchronous Design Approaches Implemented

A custom design approach was chosen for this work to demonstrate the best possible benefits of asynchronous logic, leveraging the assumption that others are continuing to improve asynchronous design automation. The asynchronous building blocks explored in this effort fall into four categories [12]. The fundamental mode bounded delay methodology is used for blocks with relatively fixed completion times. The delay insensitive design methodology applies to functional blocks with widely varying completion times. Burst mode design methodology applies to components that serve as controllers or asynchronous finite state machines (AFSMs). The speed independent model specifies the handshaking protocols between major functional blocks. Additionally, ripple-latching and clock-gating are used to further lower EMI and energy use.

Fundamental mode bounded delay is used for functional blocks that have little variation in completion time, such as a latch. This methodology assumes that the delay time through a functional block is a known constant. Worst-case delay, with a margin of safety, is used similar to a clocked circuit. Difficulty arises in synthesizing this structure since timing information cannot be synthesized from behavioral HDL, but can be back-annotated from layout simulations. Fig. 2 illustrates a delay element used to model the latch completion time. An acknowledge (ACK) signal is asserted when the data is latched after the request (REQ) is generated.

Figure 2. Fundamental mode bounded delay applied to a latch
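As a rough illustration of sizing such a delay element, the sketch below assumes the element is a chain of identical buffers and that the worst-case latch delay has been back-annotated from layout simulation. The function name, the 20% default margin, and the example delay values are hypothetical and are not figures from this work.

```python
import math


def buffers_needed(worst_case_delay_ns: float,
                   buffer_delay_ns: float,
                   margin: float = 0.2) -> int:
    """Number of series buffers so that the REQ-to-ACK delay covers the
    worst-case latch delay plus a safety margin (margin=0.2 means 20%)."""
    if worst_case_delay_ns < 0 or buffer_delay_ns <= 0 or margin < 0:
        raise ValueError("delays must be positive and margin non-negative")
    target_ns = worst_case_delay_ns * (1.0 + margin)
    return math.ceil(target_ns / buffer_delay_ns)


# Hypothetical numbers: 1.0 ns worst-case latch delay, 0.25 ns per buffer.
n = buffers_needed(1.0, 0.25)
print(n, "buffers give", n * 0.25, "ns before ACK is asserted")
```

The buffer delay used here should itself be a conservative value, since an optimistic figure would let ACK arrive before the data is safely latched.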

A delay element is not suitable for functional blocks with widely varying completion times, such as the basic add/subtract unit shown in Fig. 3. Additional logic can be added to this type of block to detect when its execution is complete to implement the delay insensitive approach. Synthesis tools do not yet have the ability to generate the completion detection circuit for a particular functional block.

Figure 3. Full adder without completion detection

A dual-rail adder scheme such as the Manchester propagate, generate, kill (PGK) adder can be used to implement completion detection [13]. The dual-rail adder works on the principle that each stage will have either a carry out (COUT) or no carry out (NOCOUT) condition based on the inputs to the stage. Adding 0 and 0 will never result in a carry out, even if there is a carry in. Similarly, adding 1 and 1 will always result in a carry out, even if the carry in is 0. Therefore, the carry condition in these cases can be determined from the data to be summed alone, which gives early completion detection. When adding 0 and 1 or 1 and 0, the carry out depends on whether the stage receives a carry in (CIN) or no carry in (NOCIN) value. The end result is that the completion detection circuit simply becomes the NOR of the COUT and NOCOUT values. Whenever one of these conditions exists, it indicates that all input values necessary for evaluating the sum are present and DONE is asserted. A design with improved average throughput is shown in Fig. 4.

Figure 4. Full adder with completion detection [13]
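The data-dependent completion time of the dual-rail scheme can be sketched behaviorally. The model below is an assumption-laden illustration, not the circuit of Fig. 4: it counts time in unit stage delays, treats DONE as active high once every stage has exactly one of its COUT/NOCOUT rails asserted (the gate polarity is an assumption), and uses a hypothetical function name.

```python
def dual_rail_add(a: int, b: int, width: int = 16, carry_in: int = 0):
    """Behavioral model of a dual-rail ripple adder with completion detection.

    Returns (sum, carry_out, done_time), where done_time is the number of
    unit stage delays until every stage has asserted COUT or NOCOUT.
    """
    cout, nocout = carry_in, 1 - carry_in   # dual-rail carry into stage 0
    ready_time = 0                          # when the incoming carry is valid
    done_time = 0
    total = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        cin, nocin = cout, nocout
        total |= (ai ^ bi ^ cin) << i
        if ai == bi:
            # Kill (0,0) or generate (1,1): rails known from the data alone.
            cout, nocout = ai, 1 - ai
            ready_time = 1
        else:
            # Propagate: rails follow CIN/NOCIN, so wait for the prior stage.
            cout, nocout = cin, nocin
            ready_time += 1
        done_time = max(done_time, ready_time)
    return total, cout, done_time


print(dual_rail_add(0x1234, 0x4321))   # short carry chains finish early
print(dual_rail_add(0xFFFF, 0x0001))   # a full-length ripple is the worst case
```

The worst case equals the full ripple length, but typical operands contain many kill and generate stages, which is where the average-throughput improvement comes from.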

The burst mode design methodology is used to design asynchronous controllers or finite state machines. Synchronous finite state machines are easily synthesized by using latches, flip-flops and clock circuitry. Asynchronous controllers or AFSMs are synthesized using specialized design tools, such as 3D [14].

Functional blocks in an asynchronous design must have a standard handshaking protocol in order to interface with other blocks. A generic functional block in an asynchronous design is shown in Fig. 5. The REQIN signal represents the external request to the block to input new data. The ACKIN signal is asserted when the new input data is fully latched or accepted. The REQOUT signal represents the request of the functional block to send processed data out. The ACKOUT signal is the external acknowledgement from the next block that the processed data was latched or accepted.

Figure 5. Asynchronous functional block

The speed independent methodology describes two standards for handshaking between connecting blocks or, in this case, the external interface: two-phase and four-phase. The four-phase model, illustrated in Figure 6, has a four-cycle handshake for each data exchange.

Figure 6. Four-phase handshaking model
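A four-phase exchange can be sketched as an ordered event trace. The ordering used here (REQ high, ACK high, REQ low, ACK low) is the conventional return-to-zero sequence and is assumed to match Figure 6; the function names are hypothetical.

```python
FOUR_PHASE = [("REQ", 1), ("ACK", 1), ("REQ", 0), ("ACK", 0)]


def transfer(words):
    """Send each word with one complete four-phase handshake.

    Returns the signal trace and the list of words the receiver latched.
    """
    trace, received = [], []
    for word in words:
        trace.append(("REQ", 1))    # sender: data valid, raise request
        received.append(word)       # receiver: latch the data
        trace.append(("ACK", 1))    # receiver: acknowledge
        trace.append(("REQ", 0))    # sender: release request
        trace.append(("ACK", 0))    # receiver: release acknowledge
    return trace, received


def is_valid_four_phase(trace) -> bool:
    """True when the trace is a whole number of correctly ordered handshakes."""
    if len(trace) % 4 != 0:
        return False
    return all(trace[i:i + 4] == FOUR_PHASE for i in range(0, len(trace), 4))


trace, received = transfer([0x0001, 0x00FF])
print(received, is_valid_four_phase(trace))
```

Both signals return to zero before the next exchange, so each data word costs four signal transitions.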

Finally, clock gating is a technique developed in the mid-1990's, analogous to asynchronous design, with the aim of reducing the amount of switching to an absolute minimum [15]. Clock gating relies on the intelligent application of control logic at various points in the circuit to prevent redundant clocking. The control signal is logically ANDed with the global clock signal to provide a local clock that only switches when necessary. This also allows the use of standard data latches instead of those with an enable circuit. This technique is combined with the unique application of ripple latching to flatten the power spectrum and lower EMI.
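The effect of ANDing a control signal with the clock can be shown with a small waveform sketch. The waveforms are hypothetical, sampled twice per clock cycle, and the enable is assumed to change only while the clock is low so that the gated clock does not glitch.

```python
def gated_clock(clk, enable):
    """Local clock: the global clock ANDed with a control (enable) signal."""
    return [c & e for c, e in zip(clk, enable)]


def rising_edges(wave) -> int:
    """Count 0-to-1 transitions, i.e. the number of times a latch is clocked."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)


# Eight clock cycles, two samples per cycle (low half, then high half).
clk = [0, 1] * 8
# Hypothetical control: the register only needs new data in cycles 0 and 5.
enable_per_cycle = [1, 0, 0, 0, 0, 1, 0, 0]
enable = [e for e in enable_per_cycle for _ in range(2)]

local_clk = gated_clock(clk, enable)
print(rising_edges(clk), "global edges vs", rising_edges(local_clk), "local edges")
```

In this example the register sees two clock edges instead of eight, which is the redundant clocking that gating removes.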

## IV. CASE STUDY DESIGN AND RESULTS

The purpose of the case study, presented in this section, is to demonstrate the advantages of using RHBD and asynchronous timing together. Although area is sacrificed, the aim is that these techniques can offer higher performance, a flatter power spectrum, and similar energy consumption when compared to a synchronous design. The combined use of RHBD and asynchronous logic has been previously investigated in [16]-[18], however, these initial efforts lack a quantitative comparison in simulation and silicon. To make a convincing argument, a common design is selected and implemented in three ways: synchronous with commercial cell library (SC), synchronous with RHBD cell library (SR), and asynchronous with RHBD cell library (AR).

It should be noted that several other approaches have been investigated for space applications of asynchronous logic. For example, fault tolerance and deadlock have been addressed by works such as [19]-[21]. These approaches focus on logic gate and circuit level redundancy techniques to improve SEU hardness. However, they exclude TID and SEL considerations, which RHBD mitigates in addition to SEU. Additionally, asynchronous logic alone has been applied directly in the design of low power wireless sensor nodes [22].

#### A. Reference Design

The textbook MIPS multi-cycle microprocessor architecture is used as the baseline design as illustrated in Figure 7 (adapted from [23]). To keep the size small and affordable, a 16-bit fixed-point 4-register variant (versus 32-bit floating point 32-register) is implemented with a simplified instruction set shown in Table III. The Cadence design flow is given in Table VI in the appendix. The baseline design is then copied and renamed as the synchronous/RHBD variant, where the commercial cells are replaced 1:1 with RHBD cells. The only exception is the smaller selection of inverters and buffers in the RHBD library. Both synchronous variants are fabricated on AMS S35 run 1725. The final layout of the synchronous/commercial cell design is shown in Figure 8 and the RHBD design in Figure 9.

Figure 7. MIPS architecture [23]

| Instruction      | Meaning                     | 16-bit Instruction | Cycles |
|------------------|-----------------------------|--------------------|--------|
| add              | rd = rt + rs                | 0000rsrtrd000000   | 4      |
| subtract         | rd = rt - rs                | 0000rsrtrd000010   | 4      |
| logical AND      | rd = rt (bitwise and) rs    | 0000rsrtrd000100   | 4      |
| logical OR       | rd = rt (bitwise or) rs     | 0000rsrtrd000101   | 4      |
| set on less than | set rd = 1 if rt < rs       | 0000rsrtrd001010   | 4      |
| load word        | rt = mem[rs + addressx]     | 0001rsrtaddressx   | 5      |
| store word       | mem[rs + addressx] = rt     | 0010rsrtaddressx   | 5      |
| branch on equal  | if rs = rt go to addressx   | 0011rsrtaddressx   | 3      |
| jump             | jump to addressx            | 0100000000000000   | 3      |

TABLE III. SIMPLIFIED MIPS INSTRUCTION SET
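The bit patterns in Table III can be read as a 4-bit opcode followed by 2-bit register fields, then either a 2-bit rd and 6-bit function code or an 8-bit address. The sketch below encodes instructions under that reading, assuming the patterns are written most significant bit first. The mnemonics (`add`, `lw`, and so on) and function names are hypothetical shorthand, and the jump instruction is left out because its row in the table does not show an address field.

```python
# Function codes for opcode 0000 and opcodes for the address-format
# instructions, copied from the bit patterns in Table III.
FUNCT = {"add": 0b000000, "sub": 0b000010, "and": 0b000100,
         "or": 0b000101, "slt": 0b001010}
OPCODE = {"lw": 0b0001, "sw": 0b0010, "beq": 0b0011}
CYCLES = {"add": 4, "sub": 4, "and": 4, "or": 4, "slt": 4,
          "lw": 5, "sw": 5, "beq": 3}


def _check_reg(*regs):
    for reg in regs:
        if not 0 <= reg < 4:
            raise ValueError("the 4-register variant has registers 0..3")


def encode_r(name: str, rs: int, rt: int, rd: int) -> int:
    """Register format: 0000 | rs(2) | rt(2) | rd(2) | funct(6)."""
    _check_reg(rs, rt, rd)
    return (rs << 10) | (rt << 8) | (rd << 6) | FUNCT[name]


def encode_i(name: str, rs: int, rt: int, address: int) -> int:
    """Address format: opcode(4) | rs(2) | rt(2) | address(8)."""
    _check_reg(rs, rt)
    if not 0 <= address < 256:
        raise ValueError("address field is 8 bits")
    return (OPCODE[name] << 12) | (rs << 10) | (rt << 8) | address


def total_cycles(program) -> int:
    """Clock cycles for a straight-line list of mnemonics."""
    return sum(CYCLES[name] for name in program)


print(format(encode_r("sub", 1, 2, 3), "016b"))
print(format(encode_i("lw", 0, 1, 0x10), "016b"))
print(total_cycles(["lw", "add", "sw", "beq"]))
```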

Figure 8. Synchronous baseline design with core area of 400×400µm

Figure 9. Synchronous RHBD design with core area of 700×700µm

The final design in the case study is an asynchronous/RHBD variant. The un-pipelined MIPS architecture turned out not to be the ideal asynchronous demonstration vehicle, but it does offer the observer direct insight to the design process. For example, it does not make sense to break down this architecture into smaller blocks where handshaking can be applied. Instead, the MIPS circuit should be thought of as a design block in a larger asynchronous SoC. The external interface of the asynchronous MIPS implementation is shown in Figure 5 with four-phase handshaking as in Figure 6. ACKOUT is hardwired to ACKIN externally.

As discussed in Section III.B, several asynchronous design methodologies are applied to the synchronous MIPS architecture. This approach is different from the de-synchronization method as defined in [11], as it has a unique focus on overall power reduction and flattening of the power spectrum. The global clock is removed, but instead of replacing the flip-flops with master-slave latches and delay elements as in de-synchronization, a phased sequence of latching with delay elements (buffers in series) is carefully applied across the latches and multiplexers in the data path, as shown in Figure 10. Care is taken to ensure a hazard-free sequence and no double switching of elements. The synchronous FSM control block is improved to minimize latching of the MDR and ALUOut registers. Additionally, clock gating is applied within all registers, which allows the use of basic D-latches without enables. This also requires latches to be placed on all control signals and phased in as appropriate. Although not included in the final fabricated design due to increased energy requirements, an experimental design with ALU completion detection and a coordinating AFSM is implemented in parallel and reported on. The applied asynchronous design procedure is summarized in the following sequence of steps:

1. Remove global clock: clock tree synthesis (CTS) eliminated, power reduced
2. Add phased latching sequence: flattens power spectrum
3. Add delays within registers: flattens power spectrum
4. Improve MIPS control: eliminates redundant latching
5. Add clock gating: power reduced
6. Remove unused inverting outputs: power and area reduced
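Why phasing the latching sequence flattens the power profile can be seen in a toy model. The sketch assumes every latch draws the same short current pulse when it switches; the pulse shape, the number of latches, and the time offsets are hypothetical values in arbitrary units, not measurements from this design.

```python
def current_profile(latch_times, pulse, length):
    """Sum identical per-latch current pulses that start at the given times."""
    profile = [0.0] * length
    for start in latch_times:
        for k, amplitude in enumerate(pulse):
            if start + k < length:
                profile[start + k] += amplitude
    return profile


pulse = [1.0, 0.5]                                   # hypothetical latch pulse
simultaneous = current_profile([0, 0, 0, 0], pulse, 10)   # one clock edge
phased = current_profile([0, 2, 4, 6], pulse, 10)         # delay elements

print("peak:", max(simultaneous), "->", max(phased))
print("total charge:", sum(simultaneous), "==", sum(phased))
```

In this model the delivered charge is unchanged while the peak drops by the number of phases, so the energy savings reported for the design come from the other steps (removing the clock tree, improved control, and clock gating) rather than from phasing itself.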

Figure 10. Phase-latching asynchronous approach

The custom re-design of most elements in the MIPS architecture discussed above affects all steps in Table VI in the appendix. Most notably, CTS and optimization are prevented. The asynchronous/RHBD variant is fabricated on AMS S35 run 1791. The final layout of the asynchronous/RHBD design is shown in Figure 11.

Figure 11. Asynchronous RHBD design with core area of 720×720 µm

#### B. Simulation and Test Results

A common test bench is used for NC-Verilog simulation, UltraSim simulation, and hardware testing with National Instruments (NI) Digital Waveform Editor and LabVIEW. NC-Verilog is a functional simulator that uses library timing information for each element. UltraSim is based on SPICE, as it uses extracted parameters for a more accurate simulation, but uses a proprietary algorithm to allow for full-chip simulations in a reasonable amount of time. The UltraSim results are advertised to be within 5% of SPICE.

An NI PCIe-6537 digital I/O interface is used for hardware evaluation of the test chips. The I/O interface is mounted in a PCI Express slot of a PC running NI LabVIEW 8.5 and Digital Waveform Editor 3.0. The interface connects to a connector block NI CB-2162 with a NI C68-D4 cable. A zero insertion force socket is used on the connector block with a custom PCB interface to route the socket pin signals to the appropriate connector block pins. A 1.3 Ohm resistor is used between the test chip ground and system ground, where a Tektronix TDS 2024 captures the test bench current draw by measuring the voltage drop across the resistor.
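Converting the oscilloscope trace into current, power, and energy follows from Ohm's law. The sketch below uses the 1.3 Ohm value from the text, but the supply voltage, the sample interval, and the sample values are assumptions made for illustration, and the function name is hypothetical.

```python
SHUNT_OHMS = 1.3   # sense resistor between chip ground and system ground


def energy_from_shunt(voltages, dt_s, supply_v):
    """Estimate energy (J) and average power (W) from shunt voltage samples.

    voltages: voltage drop across the shunt at each sample (V)
    dt_s:     time between samples (s)
    supply_v: supply voltage (V), an assumed parameter
    """
    if not voltages:
        raise ValueError("need at least one sample")
    currents = [v / SHUNT_OHMS for v in voltages]        # I = V / R
    powers = [supply_v * i for i in currents]            # P = Vdd * I
    energy = sum(powers) * dt_s                          # rectangle rule
    return energy, sum(powers) / len(powers)


# Hypothetical trace: a constant 1.3 mV drop sampled every microsecond.
energy_j, avg_power_w = energy_from_shunt([0.0013] * 4, 1e-6, supply_v=3.3)
print(energy_j, avg_power_w)
```

This simple form ignores the small reduction in chip supply voltage caused by the drop across the shunt itself.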

For all three designs, the final hardware functional results at all operating frequencies match the expected results as determined by NC-Verilog and UltraSim. The maximum frequency of all designs is 16.67 MHz in simulation, but the NI test interface is limited to 12.5 MHz.

Although correct functionality is essential to verify, the most important aspects in this work are the power performance and required core area. NC-Verilog is not able to report on power consumption, so UltraSim is used to compare the design performances before fabricating the devices. A comparison of results is given in Table IV. In this case study using a common design, the application of RHBD resulted in a 206% core area increase from the baseline design and required 154% more energy for the same testbench at any frequency, as determined through UltraSim simulations. Fig. 12 illustrates that the asynchronous approaches taken to reduce the power and smooth the power spectrum are effective, as the power profile is significantly flattened in comparison. The most important result is that the asynchronous approach reduced the energy penalty to 82% (from 154%) for a further 6% area increase over the synchronous RHBD design, with no performance impact. An experimental asynchronous version with ALU completion detection required an additional 6 nJ in simulation. In all cases, simulations reveal that the I/O pads consume 28% of the reported energy.

TABLE IV. DESIGN COMPARISON SUMMARIES

| Design | Core Area (µm×µm) | Total Transistor Width (µm) | Simulated Energy (nJ) | Simulated Average Power (mW) |
|--------|-------------------|-----------------------------|-----------------------|------------------------------|
| SC     | 400×400   | 16,088           | 28          | 6                  |
| SR     | 700×700   | 60,450           | 71          | 17                 |
| AR     | 720×720   | 55,973           | 51          | 12                 |
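The penalty percentages quoted in the text can be reproduced from the Table IV values. The short check below uses only the core dimensions and simulated energies from the table; the dictionary layout and function name are hypothetical.

```python
# Core side length (µm) and simulated energy (nJ) from Table IV.
designs = {
    "SC": {"side_um": 400, "energy_nj": 28},
    "SR": {"side_um": 700, "energy_nj": 71},
    "AR": {"side_um": 720, "energy_nj": 51},
}


def pct_increase(new: float, base: float) -> float:
    """Percentage increase of new relative to base."""
    return 100.0 * (new - base) / base


area = {name: d["side_um"] ** 2 for name, d in designs.items()}
energy = {name: d["energy_nj"] for name, d in designs.items()}

print("RHBD area penalty vs SC:   %.0f%%" % pct_increase(area["SR"], area["SC"]))
print("RHBD energy penalty vs SC: %.0f%%" % pct_increase(energy["SR"], energy["SC"]))
print("AR energy penalty vs SC:   %.0f%%" % pct_increase(energy["AR"], energy["SC"]))
print("AR area increase vs SR:    %.0f%%" % pct_increase(area["AR"], area["SR"]))
```

Note that the 6% area figure is measured against the synchronous RHBD design, while both energy penalties are measured against the commercial-cell baseline.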

Figure 12. Single clock cycle comparison in UltraSim

Fig. 13 verifies that the final hardware results are correlated with the predicted simulation results, across the 1.25 to 12.5 MHz test points. Each hardware data point is found by averaging the results of ten test bench acquisitions. Core-only power measurements were not possible.

Figure 13. Comparison of UltraSim to hardware results

Full test bench and single cycle comparisons of power measurements are shown in Figs. 14-25 in the appendix. In all cases, a significant power increase is seen from the SC to SR case, then dramatically reduced and flattened in the AR case. Additionally, two samples each of the SC and AR test chips were subjected to a brief 100 krad (Si) TID radiation exposure using a Cobalt-60 source. As expected, the baseline SC design experienced a dramatic increase in leakage and operational current draw while the AR version experienced little change. The complete range of TID and SEE testing would be required to qualify the RHBD library.

## V. CONCLUSION

Radiation hardening by design and asynchronous logic have been investigated as a complementary solution for bare die system-on-a-chip applications in hostile environments. The synergy of these two design approaches yields a circuit design that can tolerate extremes in radiation, process variance, voltage, and temperature. A case study using a textbook microprocessor compared the area, power, and performance of a baseline synchronous design with hardened and asynchronous/hardened variants, all in the same SiGe BiCMOS technology. Radiation hardening by design alone levied a 206% area and 154% energy penalty. The additional application of asynchronous logic reduced the energy penalty to 82% for an additional 6% area with no performance impact. An initial TID radiation screening of 100 krad (Si) revealed the softness of the baseline design while the hardened design showed little response.

## APPENDIX

| Step | Tool                 | Action                                                        |
|------|----------------------|---------------------------------------------------------------|
| 1    | Library Manager      | Copy CORELIB, GATES, IOLIB, and PRIMLIB to *_RHBD             |
| 2    | Virtuoso (Pcell)     | Create/compile nmos4 and pmos4 pcells in PRIMLIB_RHBD         |
| 3    | CDF                  | Edit descriptions of nmos4 and pmos4 in PRIMLIB_RHBD to match |
| 4    | Virtuoso (Schematic) | Verify/update width and length parameters in GATES_RHBD       |
| 5    | Virtuoso (Schematic) | Design synthesis to Layout XL                                 |
| 6    | Virtuoso (XL)        | Manually place and route pcells, label terminals              |
| 7    | Assura               | Copy/edit extract.rul file to extract annular nMOS properly   |
| 8    | Assura (DRC)         | Run design rule check, correct errors as needed               |
| 9    | Assura (LVS)         | Run layout versus schematic, ensure designs match             |
| 10   | Assura (RCX)         | Run parasitic extraction and verify av_extracted view         |
| 11   | DFII (Export Stream) | Create gdsII files from layout view                           |
| 12   | Library Manager      | Create functional (Verilog) view                              |
| 13   | Abstract Generator   | Complete abstract generation process for each cell            |

TABLE V. RADIATION HARDENED LIBRARY DESIGN DEVELOPMENT PROCESS

| Step | Tool                  | Build Action(s)                                                                                            |
|------|-----------------------|------------------------------------------------------------------------------------------------------------|
| 1    | Library Manager       | New design library                                                                                         |
| 2    | Virtuoso (Schematic)  | 16-bit multiplexors (MUX): 2:1, 3:1, 4:1                                                                   |
| 3    | Virtuoso (Schematic)  | Arithmetic Logic Unit (ALU) basic block: 1-bit add/sub                                                     |
| 4    | Virtuoso (Schematic)  | 16-bit ALU blocks: add/sub, and, or, slt, zero detect                                                      |
| 5    | Virtuoso (Schematic)  | Top-level ALU                                                                                              |
| 6    | Virtuoso (Schematic)  | ALU control (ALU C)                                                                                        |
| 7    | Virtuoso (Schematic)  | 16-bit registers: Program Counter (PC), Memory Data Register (MDR), Instruction Register (IR), A, B, ALUOut (AO) |
| 8    | Virtuoso (Schematic)  | Hardwired blocks: Shift Left 2 (SL2), Sign Extend (SE), Four (4), Zero (0)                                 |
| 9    | Virtuoso (Schematic)  | Top-level register file (3 registers + hardwired 0)                                                        |
| 10   | RTL Compiler          | Synthesis of Control block from Verilog description                                                        |
| 11   | DFII (Import Verilog) | Import synthesized logic into schematic                                                                    |
| 12   | Virtuoso (Schematic)  | Top-level MIPS                                                                                             |
| 13   | NC-Verilog            | Verilog testbench of all instructions with accurate timing                                                 |
| 14   | Virtuoso (Schematic)  | Top-level chip (adding I/O pads)                                                                           |
| 15   | NC-Verilog            | Re-verify testbench, export netlist                                                                        |
| 16   | RTL Compiler          | Pass-through of netlist to satisfy SOC Encounter format                                                    |
| 17   | SOC Encounter         | Import netlist, place I/O and core, route, clock tree synthesis (CTS), export netlist, export gdsII stream |
| 18   | NC-Verilog            | Import layout netlist to schematic, re-verify testbench                                                    |
| 19   | DFII (Import Stream)  | Import gdsII stream to layout                                                                              |
| 20   | Virtuoso (Layout)     | Inspect layout and add pin labels                                                                          |
| 21   | Assura                | Run DRC, LVS, RCX                                                                                          |
| 22   | UltraSim              | Run full-chip simulation, compare results with Verilog                                                     |
| 23   | DFII (Export Stream)  | Export gdsII file for fabrication, submit design                                                           |

TABLE VI. CADENCE DESIGN FLOW

Figure 14. Synchronous/Commercial design power spectrum (UltraSim)

Figure 15. Synchronous/RHBD design power spectrum (UltraSim)

Figure 16. Asynchronous/RHBD design power spectrum (UltraSim)

Figure 17. Synchronous/Commercial design power spectrum (Hardware)

Figure 18. Synchronous/RHBD design power spectrum (Hardware)

Figure 19. Asynchronous/RHBD design power spectrum (Hardware)

Figure 20. Synchronous/Commercial design single cycle (UltraSim)

Figure 21. Synchronous/RHBD design single cycle (UltraSim)

Figure 22. Asynchronous/RHBD design single cycle (UltraSim)

Figure 23. Synchronous/Commercial design single cycle (Hardware)

Figure 24. Synchronous/RHBD design single cycle (Hardware)

Figure 25. Asynchronous/RHBD design single cycle (Hardware)

#### ACKNOWLEDGMENT

This effort is sponsored by the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under grant number FA8655-06-1-3053. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views expressed in this article are those of the authors and do not reflect the official policy or position of the United States Air Force, Department of Defense, or the U.S. Government. This material is declared a work of the United States Government and is not subject to copyright protection in the United States. The authors gratefully acknowledge the National Physical Laboratory of the United Kingdom for its no-cost collaborative support of the total ionizing dose evaluation.

#### REFERENCES

- [1] B. W. Cook, S. Lanzisera, and K. S. J. Pister, "SoC Issues for RF Smart Dust," Proc. of the IEEE, vol. 94, no. 6, Jun. 2006, pp. 1177–1196.
- [2] D. J. Barnhart, T. Vladimirova, and M. N. Sweeting, "Design of Self-Powered Wireless System-on-a-Chip Sensor Nodes for Hostile Environments," in Proc. IEEE Int. Symp. on Circuits and Systems, Seattle, WA, 2008, pp. 824–827.
- [3] A. Holmes-Siedle and L. Adams, Handbook of Radiation Effects, 2nd Ed., Oxford University Press, 2002.
- [4] C. P. Brothers and D. Alexander, "Radiation Hardening Techniques for Commercially Produced Microelectronics for Space Guidance and Control Applications," in Proc. 20th Annual American Astronautical Society Guidance and Control Conf., Breckenridge, CO, 1997, pp. 169–180.
- [5] K. Strohbehn and M. N. Martin, "Spice Macro Models for Annular MOSFETs," in Proc. IEEE Aerospace Conf., Bozeman, MT, 2004, vol. 4, pp. 2370–2377.
- [6] D. C. Mayer, R. C. Lacoe, E. E. King, and J. V. Osborn, "Reliability Enhancement in High-Performance MOSFETs by Annular Transistor Design," IEEE Trans. on Nuclear Science, vol. 51, no. 6, Dec. 2004.

- [7] M. Hartwell, C. Hafer, P. Milliken, and T. Farris, "Megarad Total Ionizing Dose and Single Event Effects Test Results of a Radhard-by-Design 0.25 Micron ASIC," in Proc. Radiation Effects Data Workshop, Atlanta, GA, 2004, pp. 104–109.
- [8] S. H. Unger, Asynchronous Sequential Switching Circuits, New York: Wiley-Interscience, 1969.
- [9] K. Stevens, S. Rotem, R. Ginosar, P. Beerel, C. Myers, K. Yun, R. Kol, C. Dike, and M. Roncken, "An Asynchronous Instruction Length Decoder," IEEE Journal of Solid State Circuits, vol. 36, no. 2, Feb. 2001, pp. 217–228.
- [10] A. J. Martin and M. Nystrom, "Asynchronous Techniques for System-on-Chip Design," Proc. of the IEEE, vol. 94, no. 6, Jun. 2006, pp. 1089–1120.
- [11] I. Blunno, J. Cortadella, A. Kondratyev, L. Lavagno, K. Lwin, and C. Sotiriou, "Handshake Protocols for De-synchronization," in Proc. Int. Symp. on Asynchronous Circuits and Systems, Hersonissos, Crete, 2004.
- [12] S. Hauck, "Asynchronous Design Methodologies: An Overview," Proc. of the IEEE, vol. 83, no. 1, Jan. 1995, pp. 69–93.
- [13] C. Farnsworth, D. A. Edwards, and S. S. Sikand, "Utilising Dynamic Logic for Low Power Consumption in Asynchronous Circuits," in Proc. Int. Symp. on Asynchronous Circuits and Systems, Salt Lake City, UT, 1994, pp. 186–194.
- [14] K. Y. Yun, "Synthesis of Asynchronous Controllers for Heterogeneous Systems," Ph.D. dissertation, Stanford University, Stanford, CA, 1994.
- [15] G. E. Tellez, A. Farrahi, and M. Sarrafzadeh, "Activity-Driven Clock Design for Low Power Circuits," in Proc. IEEE/ACM Int. Conf. on Computer-Aided Design, 1995, pp. 62–65.
- [16] B. W. Hunt, K. S. Stevens, B. W. Suter, and D. S. Gelosh, "A Single Chip Low Power Asynchronous Implementation of an FFT Algorithm for Space Applications," in Proc. Int. Symp. on Asynchronous Circuits and Systems, San Diego, CA, 1998, pp. 216–223.
- [17] D. J. Barnhart, "An Improved Asynchronous Implementation of a Fast Fourier Transform Architecture for Space Applications," M.S. thesis, Air Force Institute of Tech., Wright-Patterson AFB, OH, Mar. 1999.

- [18] D. J. Barnhart, P. Duggan, B. Suter, C. Brothers, and K. Stevens, "Total Ionizing Dose Characterization of a Commercially Fabricated Asynchronous FFT for Space Applications," Government Microcircuit Applications Conf. Digest of Papers, Anaheim, CA, Mar. 2000.
- [19] D. F. Cox, "Asynchronous Logic Design with Subcells with an Application for Space," in Proc. IEEE Int. Conf. on Electronics, Circuits and Systems, Paphos, Cyprus, 1999, pp. 1225–1230.
- [20] W. Jang and A. J. Martin, "SEU-tolerant QDI Circuits," in Proc. Int. Symp. on Asynchronous Circuits and Systems, New York, 2005, pp. 156–165.
- [21] M. Renaudin and Y. Monnet, "Asynchronous Design: Fault Robustness and Security Characteristics," in Proc. IEEE Int. On-Line Testing Symp., Como, Italy, 2006.
- [22] L. Necchi, L. Lavagno, D. Pandini, and L. Vanzago, "An Ultra-low Energy Asynchronous Processor for Wireless Sensor Networks," in Proc. IEEE Int. Symp. on Asynchronous Circuits and Systems, Grenoble, France, 2006.
- [23] D. A. Patterson and J. L. Hennessy, Computer Organization and Design, 3rd ed., Morgan Kaufmann, 2007, pp. 282–339.



Figure 1. RHBD layout of core transistor pair



Figure 2. Fundamental mode bounded delay applied to a latch



Figure 3. Full adder without completion detection



Figure 4. Full adder with completion detection [17]



Figure 5. Asynchronous Functional Block



Figure 6. Four-phase handshaking model



Figure 7. MIPS architecture. Abbreviations: multiplexers (MUX), Arithmetic Logic Unit (ALU), Program Counter (PC), Memory Data Register (MDR), Instruction Register (IR), ALUOut (AO), Shift Left 2 (SL2), Sign Extend (SE), and ALU Control (ALU C)



Figure 8. Synchronous baseline design with core area of  $400 \times 400 \ \mu m$  (transistor test





Figure 9. Synchronous RHBD design with core area of  $700 \times 700 \ \mu m$ 



Figure 10. Phase-latching asynchronous approach



Figure 11. Asynchronous RHBD design with core area of  $720 \times 720 \ \mu m$ 



Figure 12. Single clock cycle comparison in UltraSim



Figure 13. Comparison of UltraSim to hardware results


