Mitigating AI/ML Risks in Safety-Critical Software
July 09, 2024
Artificial intelligence (AI) and machine learning (ML) are the newest frontiers for developers of safety-critical embedded software. These technologies can integrate and analyze data at a massive scale and support capabilities with human-like intelligence. As functional safety practitioners accustomed to risk mitigation processes and techniques decades in the making, developers working in this field must embrace the enormous promise of AI/ML without compromising safety at any level of the systems they build.
What Are Artificial Intelligence and Machine Learning?
ChatGPT, GitHub Copilot, Amazon Q Developer, and similar generative AI tools have caused plenty of buzz and confusion around what AI/ML actually is. For safety-critical development, AI/ML encompasses a broad range of capabilities with seemingly limitless applications from coding assistants to major in-vehicle features.
The Oxford English Dictionary (OED) defines AI as the “capacity of computers or other machines to exhibit or simulate intelligent behavior; the field of study concerned with this.” The OED defines machine learning as the “capacity of computers to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyze and infer from patterns in data.”
Types of Artificial Intelligence
AI algorithms are classified as “narrow” or “general.” Narrow (or weak) AI performs specific tasks and lacks general human-like intelligence. IBM’s Deep Blue is perhaps the most famous example of weak AI, as are modern chatbots, image recognition systems, predictive maintenance models, and self-driving cars. Automated unit test vector generation is a form of weak AI: it “simulates intelligent behavior” by analyzing existing code to derive test stubs and exercise its control structure. Figure 1 shows a set of test vectors created automatically through software, eliminating the need for humans to spend time creating them.
Figure 1. Test vectors created by the LDRA tool suite (Source: LDRA)
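For illustration only (this is not output from the LDRA tool suite, and the function and values are hypothetical), the sketch below shows the kind of table-driven test vectors such automation might derive for a simple C function, with each row pairing inputs against an expected output at a boundary of the code’s decision logic.

```c
/* Illustrative sketch only -- not actual tool output.
 * Automatically derived test vectors typically exercise the boundary
 * values of each decision in the code under test.
 */
#include <assert.h>
#include <stdio.h>

/* Code under test: clamp a requested speed to a safe range. */
static int clamp_speed(int requested, int min, int max)
{
    if (requested < min) return min;
    if (requested > max) return max;
    return requested;
}

/* One row per derived vector: inputs plus the expected output. */
struct test_vector { int requested; int min; int max; int expected; };

static const struct test_vector vectors[] = {
    { -1,   0, 100,   0 },  /* below lower boundary */
    {  0,   0, 100,   0 },  /* on lower boundary    */
    { 50,   0, 100,  50 },  /* nominal value        */
    { 100,  0, 100, 100 },  /* on upper boundary    */
    { 101,  0, 100, 100 },  /* above upper boundary */
};

int main(void)
{
    for (size_t i = 0; i < sizeof vectors / sizeof vectors[0]; ++i) {
        const struct test_vector *v = &vectors[i];
        assert(clamp_speed(v->requested, v->min, v->max) == v->expected);
    }
    puts("all vectors passed");
    return 0;
}
```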
By contrast, general (or strong) AI performs a variety of tasks and can teach itself to solve new problems as if it were relying on human intelligence. Such systems have yet to be developed, and there is much debate over whether they are possible in our lifetime (Star Trek’s main computer notwithstanding).
Types of Machine Learning
ML can be classified by the type of data used to train its algorithms, either “labeled” or “unlabeled.” Labeled data refers to data that has been annotated in some manner with correct outcomes or target values. This type of data is generally more difficult to acquire and store than unlabeled data.
The four main types of machine learning are:
- Supervised Learning: The algorithm is trained on labeled data such that the correct output is provided for each input. The experience gained from mapping inputs to outputs provides a basis for predictions on new data (see the sketch after this list).
- Unsupervised Learning: The algorithm is given unlabeled data and identifies patterns or structures for use by applications.
- Semi-Supervised Learning: The algorithm is trained on a combination of labeled and unlabeled data.
- Reinforcement Learning: The algorithm learns to make sequences of decisions based on a reward system, receiving feedback on the quality of its decisions and adjusting its approach accordingly.
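To make the distinction concrete, here is a minimal C sketch of the supervised learning idea (illustrative only; the data and threshold rule are hypothetical): labeled temperature readings pair each input with its correct outcome, “training” reduces to placing a decision threshold between the two class means, and the resulting model then classifies a new, unlabeled reading.

```c
/* Minimal sketch of supervised learning (illustrative only): each
 * training sample is labeled with the correct outcome, and the
 * "model" is simply a decision threshold placed midway between the
 * mean of each labeled class.
 */
#include <stdio.h>

struct sample { double temperature; int overheating; /* label: 1 = yes */ };

int main(void)
{
    /* Labeled training data: each input is paired with the correct output. */
    const struct sample training[] = {
        { 60.0, 0 }, { 65.0, 0 }, { 70.0, 0 },
        { 95.0, 1 }, { 100.0, 1 }, { 105.0, 1 },
    };

    double sum[2] = { 0.0, 0.0 };
    int count[2] = { 0, 0 };
    for (size_t i = 0; i < sizeof training / sizeof training[0]; ++i) {
        sum[training[i].overheating] += training[i].temperature;
        count[training[i].overheating]++;
    }

    /* "Training": place the threshold halfway between the class means. */
    double threshold = (sum[0] / count[0] + sum[1] / count[1]) / 2.0;

    /* Prediction on a new, unlabeled input. */
    double new_reading = 88.0;
    printf("threshold = %.1f, reading %.1f -> %s\n",
           threshold, new_reading,
           new_reading > threshold ? "overheating" : "normal");
    return 0;
}
```

An unsupervised variant would receive the same readings without the labels and would have to discover the two clusters on its own.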
Applications in Functional Safety
Although the different types of ML require different levels of human input, they align with functional safety standards proven to yield sufficiently reliable software within the context of its deployment. For example, the IEC 62304 “Medical device software—software life cycle processes” functional safety standard typifies the “requirements first” approach echoed by supervised and semi-supervised learning, in which the correct outputs are specified up front much as requirements are.
This standard does not insist on any specific process model, but it is often represented as a V-model as shown in Figure 2.
Figure 2. A V-model representation of the stages of development imposed by the IEC 62304 functional safety standard (Source: LDRA)
Industry-Specific Adaptations for AI/ML
The International Medical Device Regulators Forum (IMDRF) publishes a document defining a systematic risk classification approach for software intended for medical purposes. Known as “Software as a Medical Device: Possible Framework for Risk Categorization and Corresponding Considerations,” the document classifies a given device’s risk according to a spectrum of impact on patients, as shown in Figure 3.
Figure 3. Software as a Medical Device (SaMD) impact on patients with “I” representing the lowest risk and “IV” the highest (Source: IMDRF)
This classification considers factors such as the software's intended use, the significance of the information the software provides for making medical decisions, and the potential consequences of software failure.
As this classification is agnostic of the methodology used to create the software, medical device developers can apply these guidelines to determine the level of requirements and regulatory scrutiny necessary for software built with AI/ML techniques.
For its part, the automotive industry is taking a more proactive approach, developing new standards to accommodate the growth of AI/ML applications:
- ISO/CD PAS 8800, Road Vehicles — Safety and artificial intelligence: This standard will define safety-related properties and risk factors relating to insufficient performance and malfunctioning behavior of AI.
- ISO/CD TS 5083, Road Vehicles — Safety for automated driving systems — Design, verification, and validation: This document will provide an overview and guidance on the steps for developing and validating an automated vehicle equipped with a safe automated driving system.
Figure 4 illustrates how these standards fit within the context of existing industry guidelines.
Figure 4. New automotive safety standards for AI, ISO PAS 8800, and ISO DTS 5083, in the context of existing guidelines (Source: LDRA)
A Method to Mitigate AI/ML Risks in Safety-Critical Applications
In modern systems, the outputs of AI/ML-based components will be sent to software built with non-AI techniques, including systems with humans in the loop. This supports the segregation of domains familiar to safety-critical developers: AI/ML components are contained within their own domain, and non-AI components are designed to mitigate the risks of cross-domain interactions.
IEC 62304:2006 +AMD1:2015 permits this approach, stating that “The software ARCHITECTURE should promote segregation of software items that are required for safe operation and should describe the methods used to ensure effective segregation of those SOFTWARE ITEMS.” It further states that segregation is not restricted to physical separation but can be achieved by “any mechanism that prevents one SOFTWARE ITEM from negatively affecting another.” This suggests that software separation between AI/ML components and traditional components is a valid approach.
Current test tools can support risk assessment and mitigation of these cross-domain interactions. As illustrated in Figure 5, taint analysis can validate data flows coming from AI/ML components into traditionally developed software.
Figure 5. Taint analysis using the LDRA tool suite (Source: LDRA)
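The sketch below illustrates the principle in C (the interface and limits are hypothetical, not taken from any standard or tool): output from an AI/ML software item is treated as untrusted until a traditionally developed software item has range- and plausibility-checked it, with a conventional fallback if the check fails.

```c
/* Minimal sketch of domain segregation (illustrative only): output from
 * an AI/ML software item is treated as untrusted until a traditionally
 * developed software item has validated it.
 */
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the AI/ML domain; in a real system this would be a
 * separately segregated software item, e.g. a neural-network estimator. */
static double ml_estimate_distance_m(void)
{
    return 37.5;  /* hypothetical estimate */
}

/* Traditionally developed software item: validates the ML output before
 * it may influence a safety-related decision. */
static bool validate_distance(double d, double *out)
{
    const double MIN_M = 0.0, MAX_M = 250.0;   /* assumed physical limits  */
    if (d != d) return false;                  /* reject NaN               */
    if (d < MIN_M || d > MAX_M) return false;  /* reject impossible values */
    *out = d;
    return true;
}

int main(void)
{
    double distance;
    if (validate_distance(ml_estimate_distance_m(), &distance)) {
        printf("using ML distance estimate: %.1f m\n", distance);
    } else {
        /* Fall back to conventional, rule-based behavior. */
        puts("ML output rejected -- applying safe fallback");
    }
    return 0;
}
```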
The European Aviation Safety Agency (EASA) document, “Artificial Intelligence Roadmap: A human-centric approach to AI in aviation,” has additional suggestions to ensure AI safety:
- Include a human in command or in the loop.
- Monitor AI/ML output through a traditional backup system (see the sketch after this list).
- Encapsulate ML within rule-based approaches.
- Monitor AI through an independent AI agent.
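As a minimal illustration of the second suggestion (all values and function names are hypothetical), the C sketch below cross-checks an ML estimate against a simple rule-based backup and reverts to the backup value when the two diverge beyond an agreed tolerance.

```c
/* Illustrative sketch: a traditionally developed monitor compares the
 * ML output against a rule-based backup and falls back to the backup
 * value when the two estimates diverge too far.
 */
#include <math.h>
#include <stdio.h>

/* Hypothetical stand-ins for the two estimators. */
static double ml_braking_distance_m(double speed_mps)
{
    return 0.048 * speed_mps * speed_mps;          /* learned model            */
}

static double rule_braking_distance_m(double speed_mps)
{
    return (speed_mps * speed_mps) / (2.0 * 7.0);  /* v^2 / (2 * deceleration) */
}

int main(void)
{
    const double speed = 30.0;      /* m/s */
    const double tolerance = 5.0;   /* maximum accepted divergence, m */
    double ml_est   = ml_braking_distance_m(speed);
    double rule_est = rule_braking_distance_m(speed);

    /* The monitor, not the ML component, decides which value is used. */
    double used = (fabs(ml_est - rule_est) <= tolerance) ? ml_est : rule_est;
    printf("ML %.1f m, rule-based %.1f m -> using %.1f m\n",
           ml_est, rule_est, used);
    return 0;
}
```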
Mitigating AI/ML Safety Risks Starts with Today’s Tools
Developers of safety-critical systems are cautious about AI/ML algorithms and are looking for ways to mitigate risk proactively. For teams hesitant to adopt AI/ML, existing functional safety principles, such as domain segregation, can be effective in mitigating those risks, and existing tools can already be used to determine the influence of AI/ML on traditionally developed software items.