Mission-critical software architecture: FSMs versus RTOS
A Real-Time Operating System (RTOS) is a popular choice for mission-critical embedded software architecture. However, software developers should consider a system of Finite State Machines (FSMs) as an alternative architecture.
The use of an RTOS, whether developed "in-house" by savvy individuals or commercially licensed (COTS), presents difficult issues during design and coding and during independent validation and verification. These issues warrant special consideration in order to provide the reliable and predictable behavior, as well as the excellent real-time performance, required of mission-critical software. Many of these issues, however, simply do not exist in an architecture built as a system of Finite State Machines. Additionally, an FSM-based architecture provides benefits that improve the design, coding, and verification activities, thereby reducing development costs.
Benefits and design considerations for RTOS-based embedded software
The criteria for selecting a COTS-RTOS for a particular embedded application generally include the following:
- Services provided in the form of an Application Programming Interface (API)
- Scheduling algorithm (preemptive or nonpreemptive) and resulting performance expectations
- Memory requirements for the kernel and stack operations
- Learning curve, customization, and vendor support
- Software development/testing tools and support
- Licensing fees
Many capable vendors offer RTOS products that, judged by these criteria, are suitable for many embedded applications. Vendors are eager to report their RTOS success stories, and their products have met many customers' needs. Despite these benefits, however, there are special design, coding, and verification considerations inherent in most RTOSs (COTS and in-house developed) that must be skillfully addressed to ensure a safe, reliable, and cost-effective embedded software product.
RTOS task partitioning and scheduling
Many COTS-RTOS products may be considered derivatives of general-purpose desktop/enterprise operating systems, which, from a processor perspective, execute separate, unrelated applications, referred to as tasks. Although this architecture has proven itself sufficient for multitasking desktop operating systems, it may not be the ideal architecture for embedded systems, which typically run a single, high-performance application.
Partitioning an RTOS-based application into tasks is somewhat arbitrary and usually results in a set of tasks that may be run and tested individually. Tasks are generally constructed as 'while ()' loops and may use any of the hardware and software resources available to accomplish the required functionality. Figure 1 is a visualization of an RTOS partitioning with tasks appearing in columns and typical resources appearing as rows.
Figure 1
Each task is run by a complex and usually proprietary mechanism called the RTOS scheduler. The scheduling algorithm may accommodate fixed or dynamic task prioritization to allow all tasks to run at the desired time. This means that each task will be suspended either at a fixed time interval or upon the occurrence of other events that cause a higher-priority task to run. A necessary runtime penalty for this scenario is the context switch, which must occur every time the execution of a task is suspended in order to run the next task. It involves saving the processor registers and stack for the suspended task and restoring the same for the task about to run. This operation steals processor cycles from the application code, which can become a consideration at high task-switch rates.
RTOS module design and coding considerations
Because each task can be suspended at any time, the task module must be designed and coded skillfully to ensure that proper operation occurs upon resumption of the task. This design consideration is known as code-reentrancy and is potentially a significant source of runtime anomalies for RTOS-based architectures.
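The reentrancy concern can be illustrated with a minimal C sketch. The function names and the 16-byte buffer are hypothetical, chosen only for illustration; the point is the contrast between a function that keeps its result in shared static storage and one that works only on storage supplied by the caller.

```c
#include <stdio.h>
#include <stddef.h>

/* Non-reentrant: the result lives in one static buffer shared by every
   caller. If a task is suspended after this call and another task calls
   the function before the first one resumes, the first task's result is
   silently overwritten. */
const char *format_reading_unsafe(int value)
{
    static char buf[16];
    snprintf(buf, sizeof buf, "%d", value);
    return buf;
}

/* Reentrant: the caller owns the buffer, so each task works on its own
   storage and a suspension between calls cannot corrupt the result. */
const char *format_reading(int value, char *buf, size_t len)
{
    snprintf(buf, len, "%d", value);
    return buf;
}
```

Two back-to-back calls to `format_reading_unsafe` return the same pointer, so the earlier result is lost; with `format_reading`, each caller keeps its own text.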
Another potential pitfall when coding RTOS tasks is the requirement for mutexes/semaphores/tokens to allow for the sharing of resources among RTOS tasks - as though the "right hand knows not what the left hand is up to."
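As a rough illustration of the token idea, the sketch below guards a shared resource with a C11 `atomic_flag`. This is a stand-in only: a real RTOS would supply its own mutex or semaphore API, and the names `bus_try_acquire`, `bus_release`, and `bus_write` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One token guarding a shared bus (hypothetical resource). */
static atomic_flag bus_token = ATOMIC_FLAG_INIT;
static unsigned char bus_last_byte;

/* Returns true if the caller obtained the token, false if it is held. */
bool bus_try_acquire(void)
{
    return !atomic_flag_test_and_set(&bus_token);
}

void bus_release(void)
{
    atomic_flag_clear(&bus_token);
}

/* Every task that touches the bus must follow the same protocol;
   forgetting the acquire or the release is the pitfall described above. */
bool bus_write(unsigned char byte)
{
    if (!bus_try_acquire()) {
        return false;           /* another task holds the token */
    }
    bus_last_byte = byte;       /* stands in for the real device access */
    bus_release();
    return true;
}
```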
Although RTOSs are currently deployed in military embedded systems, the inherent characteristics of their architecture require specialized skills to develop safe and effective embedded software. The somewhat arbitrary partitioning of the embedded application into (prioritized) tasks, and the complexity of the task scheduling algorithm, present a difficult architectural framework for software development, test, and debug. The design and coding considerations to accommodate code-reentrancy and resource sharing pose further challenges for developers.
An effective verification process will analyze both the RTOS internals and the application modules by source-code inspection. The effectiveness of the RTOS scheduler will need to be verified to ensure proper task execution sequences in the dynamic, event-driven runtime environment. Theoretically, all possible sequences of events and conditions, and the resulting suspension and resumption of tasks, must be tested to provide sufficient test coverage for mission-critical software. Software modeling and performance analysis tools may be valuable to the verification process, but the effort needed to configure these tools properly and obtain meaningful results should be taken into account. Overall, the most difficult challenge might be the comprehensive verification of the RTOS-based architecture and design in order to obtain certification for deployment. Most of these RTOS architectural, design, and coding considerations become nonissues in an FSM-based architecture.
An FSM-based architecture
Because embedded system hardware and software components have a unified purpose in performing a single dedicated function, perhaps a more singular approach to software architecture would be appropriate - specifically, an architecture consisting of a system of individual Finite State Machines. As the use of FSMs has proven itself a worthy tool in the digital hardware design of sequential circuits such as graphic controllers and communication controller ICs, abstracting these powerful concepts into the software realm for embedded systems seems almost intuitive: soft FSMs driving hard FSMs. In other words, it makes sense to design (the layers of) a communication protocol task as a Finite State Machine to 'drive' the communication controller digital device, the design of which is also a Finite State Machine.
Task partitioning as a system of FSMs
Partitioning embedded system software into application tasks (versus RTOS tasks) would simply be a matter of partitioning by resource, as depicted in Figure 2.
Figure 2
In other words, tasks would be responsible for:
- Driving both internal and external devices and their interfaces
- Algorithmic data processing
- Data acquisition, storage, and retrieval
- The various layers of communication protocols
Most applications would also require a control task, which could provide supervisory control of other tasks and maintain the state of the machine (for example, the various modes of operation and/or navigation of a user interface). Additionally, a task responsible for detecting, reporting, and recovering from errors would likely be a requirement of most mission-critical embedded system specifications.
In this FSM-based software architecture, all application tasks would be designed as individual state machines using conventional state diagrams of the Mealy/Moore paradigm to convey state logic, a template of which is shown in Figure 3. Each state would be directly encoded as a (C-language) function to be run to completion by a system executive. Each task would maintain its own current state variable, which the executive uses to determine which state function to execute for that task. One of the advantages of this method of partitioning is that the issues of context switching, code reentrancy, and resource sharing among RTOS tasks become entirely irrelevant.
Figure 3
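The state-function scheme described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the `Task` structure, the two example states, the event flags `EV_START` and `EV_DONE`, and the function `executive_pass` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Task Task;
typedef void (*StateFn)(Task *self);

struct Task {
    StateFn  state;      /* current state variable: the function to run next */
    unsigned events;     /* pending event flags for this task */
    unsigned completed;  /* count of finished jobs, kept for illustration */
};

#define EV_START (1u << 0)
#define EV_DONE  (1u << 1)

void state_idle(Task *self);
void state_busy(Task *self);

/* Each state is one function that runs to completion and returns. */
void state_idle(Task *self)
{
    if (self->events & EV_START) {
        self->events &= ~EV_START;
        self->state = state_busy;    /* transition taken on the event */
    }
    /* No event: nothing to do, return immediately. */
}

void state_busy(Task *self)
{
    if (self->events & EV_DONE) {
        self->events &= ~EV_DONE;
        self->completed++;
        self->state = state_idle;
    }
}

/* One pass of the executive: run the current state function of every task.
   A real system would call this repeatedly from its main loop. */
void executive_pass(Task *tasks, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        assert(tasks[i].state != NULL);
        tasks[i].state(&tasks[i]);
    }
}
```

Because every state function returns before the next one starts, no task is ever suspended partway through its work, which is why context switching and reentrancy do not arise in this arrangement.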
The FSM-based architecture illustrated would also support basic services such as a system timer and intertask communications. In addition to a current state variable, each task could have a timer count that would be decremented by a single timer ISR. Tasks could communicate using basic messaging and bit flags. Other device interrupts would be handled by device-specific ISRs, which might set an event flag for further processing by a particular task.
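A minimal C sketch of these services might look like the following. All names (`timer_tick_isr`, `uart_rx_isr`, `task_timer_start`, and so on) and the task count are assumptions made for illustration; how an ISR is attached to an interrupt vector is hardware- and compiler-specific and is not shown.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_TASKS 3u

/* volatile: written in interrupt context, read by the state functions. */
static volatile uint16_t task_timer[NUM_TASKS];
static volatile uint8_t  rx_ready_flag;

/* Single timer ISR: decrement every running per-task timer count. */
void timer_tick_isr(void)
{
    for (size_t i = 0; i < NUM_TASKS; i++) {
        if (task_timer[i] > 0u) {
            task_timer[i]--;
        }
    }
}

/* Device-specific ISR: only records the event; a task processes it later. */
void uart_rx_isr(void)
{
    rx_ready_flag = 1u;
}

/* Called from a state function to arm that task's timer. */
void task_timer_start(size_t task, uint16_t ticks)
{
    task_timer[task] = ticks;
}

/* Called from a state function to test for the timeout event. */
int task_timer_expired(size_t task)
{
    return task_timer[task] == 0u;
}

/* Called from a state function: consume the flag if it is set. */
int rx_take_flag(void)
{
    if (rx_ready_flag) {
        rx_ready_flag = 0u;
        return 1;
    }
    return 0;
}
```

On processors where a 16-bit access is not atomic, the timer accessors would likely need to mask interrupts briefly; that detail is omitted here.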
An FSM-based architecture provides important benefits
The real-time performance of an FSM-based software architecture is superior to that of an RTOS because of the lack of context switching and the inherent nature of state machines. As each task is in a given state at a particular time, if the specified event for that state has not occurred, nothing else needs to be performed for that task, and execution continues with the next task. If the specified event has occurred, processing that event occurs immediately within that state (or subsequent states) and the task's current state variable is updated as required, causing a different state function of the task to be executed on the next pass of the executive. This results in very predictable and deterministic behavior of the entire system, which can be easily measured using basic performance metrics.
Another advantage to an FSM-based architecture is that tasks can be assigned, at design time, to one of many processing elements. The current trend to distribute RTOS/application tasks across processing cores running at low(er) clock speeds to reduce system power consumption is very beneficial but presents challenges to software tool providers. Ideally, to maximize system performance, each processing core could be dedicated to running specific FSM tasks assigned at design time, as opposed to a hardware/software algorithm to dynamically assign (RTOS) tasks during runtime.
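One way to picture design-time assignment is a constant table that maps each task to a core, as in the hedged C sketch below. The task names, the core numbering, and `core_executive_pass` are hypothetical; the step functions only count their invocations so the effect is observable.

```c
#include <stddef.h>

typedef void (*TaskStep)(void);

typedef struct {
    TaskStep step;   /* runs the task's current state function */
    unsigned core;   /* core chosen for this task at design time */
} TaskAssignment;

/* Hypothetical tasks, reduced to counters for illustration. */
static unsigned control_runs, comm_runs, sensor_runs;
void control_step(void) { control_runs++; }
void comm_step(void)    { comm_runs++; }
void sensor_step(void)  { sensor_runs++; }

/* The assignment is fixed in a const table; nothing moves at runtime. */
static const TaskAssignment assignments[] = {
    { control_step, 0u },
    { comm_step,    1u },
    { sensor_step,  1u },
};

/* Each core's executive runs only the tasks assigned to that core. */
void core_executive_pass(unsigned core_id)
{
    const size_t count = sizeof assignments / sizeof assignments[0];
    for (size_t i = 0; i < count; i++) {
        if (assignments[i].core == core_id) {
            assignments[i].step();
        }
    }
}
```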
The application, as a system of Finite State Machines, can be easily conveyed diagrammatically showing partitioning into tasks, intertask communication, and data flow among tasks. The state diagram for each task is both easily documented and encoded into source modules. As there is no RTOS or scheduler, only a basic executive is needed to execute the current state functions for each task. All of these attributes of an FSM-based software architecture allow the verification process to be more meaningful and easier to perform.
FSM-based architectures for future embedded systems
Embedded system software deserves its own architecture and development methods, more akin to those of hardware design than to those of desktop/enterprise software systems. The use of an FSM-based software architecture would benefit mission-critical applications by providing a consistent design method that can be easily tested and verified. Offering excellent system performance and ease of maintenance, an FSM-based software architecture could significantly reduce the costs and risks normally associated with developing, verifying, and certifying mission-critical embedded software.
Mapletech Productions
215-628-2231
www.mapletechproductions.com


