CPL - Chalmers Publication Library
| Utbildning | Forskning | Styrkeområden | Om Chalmers | In English In English Ej inloggad.

Node-Level Fault Tolerance for Embedded Real-Time Systems

Joakim Aidemark (Institutionen för datorteknik)
Göteborg : Chalmers University of Technology, 2004. ISBN: 91-7291-530-7.- 42 s.

This thesis deals with cost-effective design and validation of fault tolerant distributed real-time systems. Such systems play an increasingly important role in embedded applications such as automotive and aerospace systems. The cost of fault-tolerance is of primary concern in these systems, particularly for emerging applications like micro-satellites, unmanned air vehicles and active safety systems for road vehicles.

We address cost issues of fault tolerance from both a design and a validation perspective. From a design perspective, we investigate cost-effective techniques that can make systems more resilient to transient hardware faults. We propose a two-level approach to achieve fault-tolerance that combines system-level and node-level fault tolerance. Our approach relies on nodes that mask the effects of most transient faults and exhibit omission or fail-silent failures for permanent faults and transient faults that cannot be masked by the node itself. As only a subset of the faults is tolerated at the node level, we call this approach /light-weight node-level fault tolerance/, or light-weight NLFT. Tolerating transient faults at the node level is important in systems that rely on duplicated nodes for fault tolerance, as it allows the system to survive transient faults also when one of the nodes have failed permanently. It also improves the robustness of the system when both nodes are affected by correlated or near coincident transient faults. We have implemented a real-time kernel that supports light-weight NLFT through time redundant execution of tasks and the use of software implemented error detection. The effectiveness of light-weight NLFT is evaluated both analytically and by extensive fault injection experiments.

The thesis also deals with the cost of fault tolerance from a validation perspective. Fault injection based validation of error handling mechanisms is a time consuming and costly activity. We present a fault injection tool that can be easily extended and adapted to different target systems making fault injection less time-consuming. We also propose an analytical technique for investigating how error coverage varies for different input sequences to a system. The analysis helps us identify interesting activation patterns, e.g., those that give extremely low, or high, error coverage.

Nyckelord: distributed real-time systems, error detection, error recovery, time redundancy, real-time kernels, fault injection, reliability calculations, analytical coverage estimation

Denna post skapades 2006-08-25. Senast ändrad 2013-09-25.
CPL Pubid: 4262


Institutioner (Chalmers)

Institutionen för datorteknik (2002-2004)


Information Technology

Chalmers infrastruktur

Ingår i serie

Technical report D - School of Computer Science and Engineering, Chalmers University of Technology 34

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie 2212