Peter Mehlitz and John Penix have authored a NASA paper on creating radiation-hardened software.
Radiation-induced Single Event Effects (SEEs – single protons or heavy ions hitting the computing device) are a serious problem for spacecraft flight software, potentially leading to a complete loss of mission. Conventional risk mitigation has focused on hardware, resulting in slow, expensive and outdated on-board computing devices, as well as increased power consumption and launch mass.
Our approach is to look at SEEs from a software perspective and to explicitly design flight software so that it can detect and correct the majority of SEEs. Radiation-hardened flight software will reduce the significant residual risk for critical missions and flight phases, and enable greater use of inexpensive and fast COTS hardware.
SEEs may cause probabilistic memory errors, namely transient Single Event Upsets (SEUs – “flipped bits”), and potentially permanent failures like Single Event Latchups (SELs – “stuck bits”).
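To make the idea of detecting and correcting a flipped bit concrete, here is a minimal Python sketch of one generic software technique: triple modular redundancy (TMR), where a value is stored three times and read through a bitwise majority vote. TMR is a swapped-in textbook technique, not necessarily what the authors' library does, and `TmrWord`, `majority` and `flip_bit` are hypothetical names.

```python
# Hypothetical sketch: software triple modular redundancy (TMR) for one 32-bit word.
# Not the RHS library's API; all names here are illustrative.

MASK32 = 0xFFFFFFFF


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each result bit takes the value held by at least two copies."""
    return ((a & b) | (a & c) | (b & c)) & MASK32


class TmrWord:
    """Stores three copies of a 32-bit value and repairs disagreeing copies on read."""

    def __init__(self, value: int) -> None:
        self.copies = [value & MASK32] * 3

    def read(self) -> int:
        voted = majority(*self.copies)
        # Scrub: rewrite all copies with the voted value, which repairs a transient SEU.
        # A stuck bit (SEL) in real memory would not be fixed by this rewrite.
        self.copies = [voted] * 3
        return voted

    def write(self, value: int) -> None:
        self.copies = [value & MASK32] * 3


def flip_bit(value: int, bit: int) -> int:
    """Simulate an SEU by inverting one bit."""
    return (value ^ (1 << bit)) & MASK32


w = TmrWord(0xCAFEBABE)
w.copies[1] = flip_bit(w.copies[1], 7)  # inject a single flipped bit into one copy
print(hex(w.read()))
```

The vote masks any upset confined to one copy, but it fails if the same bit flips in two copies before the next scrub. A permanently stuck bit would reappear after every rewrite, so repeated disagreement at one location could hint at an SEL; the sketch does not track that.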
How often do errors occur?
The diagrams below show bit failures encountered during the Gravity Probe B (GP-B) mission; points mark detected SEEs and colors represent the different on-board computers. The left diagram displays transient single-bit failures that were correctable. The right diagram shows uncorrectable multi-bit failures, which occurred roughly once every 40 days per computer and caused a reboot about every 90 days (depending on the amount of memory in use).
The data shown in these diagrams does not reflect a problem specific to GP-B. The Satellite News Digest website lists about 40 anomalies attributed to radiation during autumn 2003, not counting military satellites. Effects range from temporary functional disruptions to a complete loss of mission.
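As a rough worked example, the GP-B rates quoted above can be turned into expected counts. The sketch assumes, beyond what the source states, that both rates are per computer and that failures arrive independently at a constant rate (a Poisson process); the 365-day and 30-day windows are arbitrary illustrations.

```python
import math

# Rates quoted from the GP-B data (assumed here to be per on-board computer).
DAYS_PER_UNCORRECTABLE = 40.0  # roughly one uncorrectable multi-bit failure per 40 days
DAYS_PER_REBOOT = 90.0         # roughly one reboot per 90 days


def expected_events(days: float, mean_interval: float) -> float:
    """Expected number of events in a window, given a mean interval between events."""
    return days / mean_interval


def prob_at_least_one(days: float, mean_interval: float) -> float:
    """P(at least one event) under the Poisson-process assumption."""
    return 1.0 - math.exp(-days / mean_interval)


year = 365.0
print(f"uncorrectable failures per year: {expected_events(year, DAYS_PER_UNCORRECTABLE):.1f}")
print(f"reboots per year: {expected_events(year, DAYS_PER_REBOOT):.1f}")
print(f"reboots per uncorrectable failure: {DAYS_PER_UNCORRECTABLE / DAYS_PER_REBOOT:.2f}")
print(f"P(at least one reboot in a 30-day phase): {prob_at_least_one(30, DAYS_PER_REBOOT):.2f}")
```

Under these assumptions one computer would see about 9.1 uncorrectable failures and 4.1 reboots per year, with roughly a 28% chance of at least one reboot during a 30-day critical phase. The ratio of about 0.44 reboots per uncorrectable failure is consistent with, though not proof of, the remark that the reboot rate depends on how much memory is in use.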
Conventional hardware approaches
Traditionally, radiation effects have been treated primarily as a hardware problem, to be mitigated by shielding, radiation-hardened processors, and Error Detection And Correction (EDAC) memory. As chip densities rise and power levels fall, vulnerability to SEEs is expected to increase. Moreover, even when radiation-hardened hardware is used, a significant residual risk remains.
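EDAC memory is commonly built on single-error-correcting, double-error-detecting (SEC-DED) Hamming-style codes, which mirrors the split in the GP-B data between correctable single-bit failures and uncorrectable multi-bit failures. The following toy extended Hamming(8,4) code protects one 4-bit value and is offered only as an illustration; real EDAC controllers protect wider words in hardware, and `encode`/`decode` are hypothetical names.

```python
from __future__ import annotations

# Toy extended Hamming(8,4) SEC-DED code (illustrative only).
# Codeword layout: index 0 = overall parity, 1/2/4 = Hamming parity, 3/5/6/7 = data bits.


def encode(nibble: int) -> list[int]:
    """Encode 4 data bits into an 8-bit single-error-correcting, double-error-detecting codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions with bit 2 set
    code[0] = sum(code[1:]) % 2            # gives the whole codeword even parity
    return code


def decode(code: list[int]) -> tuple[str, int | None]:
    """Return ("ok" | "corrected", data) or ("uncorrectable", None)."""
    c = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos            # XOR of the positions of all set bits
    overall = sum(c) % 2               # 1 means an odd number of bits differ from a valid codeword
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        c[syndrome] ^= 1               # single error: syndrome is its position (0 = parity bit)
        status = "corrected"
    else:
        return "uncorrectable", None   # double error: detected but not correctable
    return status, c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)


word = encode(0b1011)
single = list(word); single[6] ^= 1                  # one flipped bit (like an SEU)
double = list(word); double[2] ^= 1; double[5] ^= 1  # two flipped bits in one word
print(decode(word), decode(single), decode(double))
```

Three or more flipped bits in one codeword can be silently miscorrected by a SEC-DED code, which is one concrete reason hardware EDAC alone leaves residual risk.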
The potential benefits of using COTS hardware can be illustrated by comparing the cost and performance of the RAD6000 (the prevalent radiation-hardened processor in space flight) with the current Intel P4:
Processor          Per-chip cost   Clock frequency   Transistors
RAD6000            > $200,000      25 MHz            ~1e6
Intel P4 (COTS)    $500            > 2 GHz           ~55e6
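Dividing the figures in the comparison table gives a rough sense of the gap. Because the RAD6000 price and the P4 clock are given as lower bounds (">"), the cost and clock ratios computed here are lower bounds as well, and the transistor counts are approximate.

```python
# Figures from the RAD6000 / Intel P4 comparison; ">" entries are taken at their stated bound.
rad6000 = {"cost_usd": 200_000, "clock_hz": 25e6, "transistors": 1e6}
p4 = {"cost_usd": 500, "clock_hz": 2e9, "transistors": 55e6}

cost_ratio = rad6000["cost_usd"] / p4["cost_usd"]            # at least this much more expensive
clock_ratio = p4["clock_hz"] / rad6000["clock_hz"]           # at least this much faster clock
transistor_ratio = p4["transistors"] / rad6000["transistors"]

print(f"RAD6000 costs at least ~{cost_ratio:.0f}x more per chip")
print(f"P4 clock is at least ~{clock_ratio:.0f}x faster")
print(f"P4 has ~{transistor_ratio:.0f}x more transistors")
```

That is, at least about 400 times the price for at least about 1/80 of the clock frequency, which is the gap software hardening aims to exploit.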
Using hardened software
By creating flight software that is resilient against SEEs, the authors hope to increase the probability of detecting relevant SEEs, reduce the amount of potential data loss, and decrease the required recovery time. They expect this to have a twofold effect:
- Increase robustness
- Decrease dependency on specialized hardware
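The three levers named above (detection probability, data loss, recovery time) can be illustrated with a generic pattern: protect mutable state with a checksum, verify it before use, and roll back to the last verified checkpoint instead of rebooting. This is a hypothetical sketch, not the RHS library's design; `CheckpointedState` and its methods are illustrative names, and CRC-32 via Python's `zlib.crc32` stands in for whatever integrity check a flight implementation would use.

```python
import copy
import json
import zlib


def checksum(state: dict) -> int:
    """CRC-32 over a canonical JSON encoding of the state."""
    return zlib.crc32(json.dumps(state, sort_keys=True).encode("utf-8"))


class CheckpointedState:
    """Hypothetical pattern: CRC-protected working state plus the last verified checkpoint."""

    def __init__(self, state: dict) -> None:
        self.state = state
        self.crc = checksum(state)
        self.checkpoint = copy.deepcopy(state)
        self.checkpoint_crc = self.crc

    def update(self, key: str, value) -> None:
        # Verify before modifying, so a corrupted value is not silently carried forward.
        self.verify_or_restore()
        self.state[key] = value
        self.crc = checksum(self.state)

    def commit(self) -> None:
        """Save a new checkpoint; frequent commits bound the data lost on rollback."""
        self.verify_or_restore()
        self.checkpoint = copy.deepcopy(self.state)
        self.checkpoint_crc = self.crc

    def verify_or_restore(self) -> bool:
        """Return True if the state is intact; otherwise roll back to the checkpoint."""
        if checksum(self.state) == self.crc:
            return True
        self.state = copy.deepcopy(self.checkpoint)
        self.crc = self.checkpoint_crc
        return False


s = CheckpointedState({"mode": "cruise", "counter": 0})
s.update("counter", 1)
s.commit()
s.update("counter", 2)
s.state["counter"] = 999  # simulate an SEU corrupting memory without updating the CRC
intact = s.verify_or_restore()
print(intact, s.state)
```

Here the corruption is detected and only the update made since the last `commit()` is lost, while recovery is a local restore rather than a reboot. The checkpoint itself is unprotected in this sketch; a real design would also need to guard it, for example with redundant copies.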
The project consists of three parts:
- A library for radiation hardening (RHS library) with corresponding application design guidelines,
- A flight experiment to test a set of applications hardened with the RHS library against non-hardened variants in a real space environment,
- A software simulator to reproduce the measured results in a lab environment, optimize the RHS library, and apply it to other missions and applications.
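The simulator's role can be sketched with a small fault-injection campaign: inject random bit flips into an unprotected copy of some data and into a TMR-protected copy, then count the words that read back wrong. This is a hypothetical illustration, not the authors' simulator; it assumes a uniform upset rate per stored bit (so the three-times-larger TMR memory receives three times as many flips) and uses TMR only as a stand-in protection scheme.

```python
import random

WORD_BITS = 32


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote."""
    return (a & b) | (a & c) | (b & c)


def wrong_words_plain(golden: list, n_flips: int, rng: random.Random) -> int:
    """Flip n_flips random bits in an unprotected copy; count words that read back wrong."""
    mem = list(golden)
    for _ in range(n_flips):
        mem[rng.randrange(len(mem))] ^= 1 << rng.randrange(WORD_BITS)
    return sum(m != g for m, g in zip(mem, golden))


def wrong_words_tmr(golden: list, n_flips: int, rng: random.Random) -> int:
    """Flip n_flips random bits across three copies per word; count words whose vote is wrong."""
    mem = [[g, g, g] for g in golden]
    for _ in range(n_flips):
        mem[rng.randrange(len(mem))][rng.randrange(3)] ^= 1 << rng.randrange(WORD_BITS)
    return sum(majority(*m) != g for m, g in zip(mem, golden))


rng = random.Random(42)
golden = [rng.getrandbits(WORD_BITS) for _ in range(1000)]
flips = 100
# Assumed uniform flux per bit: the 3x larger TMR memory receives 3x the upsets.
plain = wrong_words_plain(golden, flips, rng)
tmr = wrong_words_tmr(golden, 3 * flips, rng)
print(f"unprotected: {plain} corrupted words, TMR: {tmr} corrupted words")
```

Repeating such campaigns with different seeds, flip counts, or protection schemes is the kind of parameter sweep a lab simulator could use to compare hardening options before testing them in flight.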