The State of Safety Certification of Platforms
December 22, 2020
Blog
A lot has been written about safety “certification” of platforms. As the number of applications involving human safety grows in markets such as avionics, automotive, and industrial, the importance of functional safety certification of the software that controls key functions has never been greater. Several standards govern the safety certification of software, including DO-178, ISO26262, and IEC61508. The best known, and perhaps the most rigorous, is the DO-178 standard governed by the FAA for commercial avionics software. A look under the hood of the safety certification process reveals many interesting facts.
As the leader of an engineering team that is certifying code for deployment on big programs like the Joint Strike Fighter, I thought it would be interesting to share the next level of detail on what is involved. Let me start with a datapoint. Certifying a single line of source code to the DO-178 DAL A standard (used for the most critical system functions in aircraft and helicopters) can take 2-3 hours. At that rate, every 2,000 lines of code represents roughly two to three engineer-years of certification effort. How many applications these days have as few as 2,000 lines of code?
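To make that arithmetic concrete, here is a back-of-the-envelope sketch in C; the figure of 2,000 productive hours per engineer-year is my own assumption, not something taken from DO-178:

```c
#include <stdio.h>

int main(void) {
    /* Assumptions for a rough estimate only: 2-3 certification hours per
     * line at DAL A, and ~2,000 productive hours per engineer-year. */
    const double hours_per_line_low  = 2.0;
    const double hours_per_line_high = 3.0;
    const double lines_of_code       = 2000.0;
    const double hours_per_year      = 2000.0;

    printf("Estimated effort: %.1f to %.1f engineer-years\n",
           lines_of_code * hours_per_line_low  / hours_per_year,
           lines_of_code * hours_per_line_high / hours_per_year);
    return 0;
}
```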
Let me talk a little more about why the process is so complex and how companies deal with it. At the core of any safety certification process, there are two key elements:
- Writing down the requirements and design, and being able to trace each specific requirement to the piece of code that implements it
- Translating those requirements into a test suite that can independently verify the implementation of the requirements without access to the source code, and demonstrating that every line of code has been exercised by these tests (a minimal sketch of this traceability follows the list)
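To make the traceability idea concrete, here is a minimal sketch of a requirement-tagged test case; the requirement ID, function, and thresholds are hypothetical illustrations of the pattern, not material from an actual certification program:

```c
#include <assert.h>

/* Hypothetical requirement, for illustration only:
 * SRS-042: The autopilot shall engage altitude-capture mode when the
 *          commanded altitude differs from the current altitude by more
 *          than 50 ft. */
int altitude_capture_required(int commanded_ft, int current_ft) {
    int delta = commanded_ft - current_ft;
    if (delta < 0) {
        delta = -delta;
    }
    return delta > 50;
}

/* Test case traced to SRS-042: each assertion exercises a boundary of the
 * requirement, so the verification evidence maps back to it. */
static void test_srs_042(void) {
    assert(altitude_capture_required(10050, 10000) == 0); /* exactly 50 ft: no capture */
    assert(altitude_capture_required(10051, 10000) == 1); /* 51 ft above: capture      */
    assert(altitude_capture_required( 9949, 10000) == 1); /* 51 ft below: capture      */
}

int main(void) {
    test_srs_042();
    return 0;
}
```

In a real program the link between requirement and test lives in a traceability matrix and the tests run against the built object code, but the principle is the same.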
The reality is that 30-35% of the time is spent on requirements, and 60-65% is expended on the verification of the software. Why? The short answer is that each line of source code requires many more lines of test code for complete coverage. Modified Condition/Decision Coverage (MC/DC) is a code coverage criterion commonly used in safety-critical software testing. In addition to the criteria required by statement and decision coverage, MC/DC requires that "each condition in a decision has been shown to independently affect that decision's outcome."
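As a minimal illustration of why coverage drives test volume (my own example, not one drawn from DO-178), consider a decision with three conditions; MC/DC requires a test set in which each condition is shown, on its own, to flip the outcome:

```c
#include <stdbool.h>
#include <stdio.h>

/* A decision with three conditions: deploy only when armed and either
 * weight-on-wheels or low airspeed is true. */
static bool deploy_spoilers(bool armed, bool weight_on_wheels, bool low_airspeed) {
    return armed && (weight_on_wheels || low_airspeed);
}

int main(void) {
    /* Four vectors achieving MC/DC for this decision. For each condition
     * there is a pair of vectors that differ only in that condition and
     * produce different outcomes: rows 1/2 isolate 'armed', rows 1/3
     * isolate 'weight_on_wheels', rows 3/4 isolate 'low_airspeed'. */
    struct { bool armed, wow, low_spd; } v[] = {
        { true,  true,  false },   /* row 1 -> true  */
        { false, true,  false },   /* row 2 -> false */
        { true,  false, false },   /* row 3 -> false */
        { true,  false, true  },   /* row 4 -> true  */
    };
    for (unsigned i = 0; i < sizeof v / sizeof v[0]; i++) {
        printf("armed=%d wow=%d low_spd=%d -> deploy=%d\n",
               v[i].armed, v[i].wow, v[i].low_spd,
               deploy_spoilers(v[i].armed, v[i].wow, v[i].low_spd));
    }
    return 0;
}
```

For n conditions, a minimum of n+1 test vectors is needed, and every decision in the code needs its own such set, which is a large part of why verification dominates the schedule.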
There is a slightly relaxed set of requirements for systems that are not as mission critical, and these proportionally reduce the verification effort. A Design Assurance Level (DAL) C system, for example, may require 20% less effort than a DAL A one.
One of the areas where avionics certification differs from other standards like IEC61508 (industrial) and ISO26262 (automotive) is that avionics requires an independent authority to check your work. Engineering and flight-test designees are responsible for finding that engineering data complies with the appropriate airworthiness standards. These designees are called Designated Engineering Representatives, or DERs, and they hold delegated authority to examine certification data.
The challenge, however, is that this safety certification process has to coexist with a more “agile” model of software innovation if commercial software vendors are to stay competitive in the market. The team at Lynx believes in adopting a two-stage model for releases of safety-critical software, which differs slightly from how an avionics company like Collins Aerospace operates:
- Release “certifiable” products into the market to bring out innovative new features and encourage market adoption. A product release about every six months is the optimal cadence: frequent enough for customers to take advantage of our latest features and begin application development, yet measured enough for them to adopt new product versions.
- Judge market adoption and then certify the product releases that satisfy the key safety requirements of one or several use cases. With the average certification process taking 12-18 months, it is quite possible that we will have released three more versions during that cycle.
Mindful of the earlier point about code complexity driving the cost and time taken to certify platforms, not all code that is provided as a commercial product will necessarily be certified. This is also driving a shift to mixed-criticality systems built on virtualization, where only the minimum set of software that must go through the certification process is ported to run on (typically) a real-time operating system, with the rest running on operating systems like Linux. What is critical for the certification process is to be able to show how the certified software is isolated from the remainder of the system, so that its functionality and behavior can be guaranteed within certain bounds no matter what else is occurring on the rest of the system.
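To sketch what such a split might look like (the partition names, assurance levels, and resource figures below are hypothetical illustrations, not LynxSecure configuration syntax):

```c
#include <stdio.h>

/* Hypothetical mixed-criticality layout: the safety function runs in a small
 * certified RTOS partition, while the feature-rich, uncertified workload runs
 * in a Linux partition. Cores and memory are statically allocated so the
 * certified partitions' behavior can be bounded regardless of the Linux load. */
struct partition {
    const char *name;
    const char *guest;    /* what runs in the partition           */
    const char *dal;      /* design assurance level, if certified */
    unsigned    cpus;     /* dedicated CPU cores                  */
    unsigned    ram_mb;   /* statically allocated memory          */
};

static const struct partition layout[] = {
    { "flight-control",  "certified RTOS", "DAL A", 1,   64 },
    { "display-manager", "certified RTOS", "DAL C", 1,  256 },
    { "maintenance-ui",  "Linux",          "-",     2, 2048 },
};

int main(void) {
    for (unsigned i = 0; i < sizeof layout / sizeof layout[0]; i++) {
        printf("%-16s guest=%-15s dal=%-5s cpus=%u ram=%uMB\n",
               layout[i].name, layout[i].guest, layout[i].dal,
               layout[i].cpus, layout[i].ram_mb);
    }
    return 0;
}
```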
Finally, let’s talk about open source, specifically Linux. Being located in the Bay Area, I have met with several companies over the last few years that are creating flying taxis (a segment more recently rebranded as “Urban Mobility”). All were building demonstrators that used Linux. The challenge is that using Linux for mission-critical systems simply doesn’t fly with the FAA (sorry about the pun!). Developing code around Linux is the fastest way to bring up a system, but it is then critical to have a simple path to migrate that code onto a real-time operating system, which is where POSIX-compliant technology is an advantage.
POSIX started back in the days when there was a diverse set of UNIX kernels (remember Solaris, AIX and DG/UX?) and a need to migrate applications between them. While POSIX might sound old (and it is!), it still provides the best option for migrating applications from Linux to a real-time, safety-certified RTOS environment. In fact, when the Future Airborne Capability Environment (FACE) group was formed, it did the smart thing and chose to adopt elements from the most popular APIs, namely POSIX and ARINC 653. This application migration approach is a critical element of ensuring that a new segment like urban mobility can stay viable in the face of emerging functionality, complexity and regulatory scrutiny.
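As a small sketch of what this migration path looks like in practice, here is a periodic task written against POSIX interfaces only (pthreads and clock_nanosleep); the 10 ms period and five iterations are arbitrary choices for illustration. The intent is that code like this builds both on Linux and on a POSIX-conformant RTOS without change:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* A periodic worker using only POSIX calls (no Linux-specific APIs such as
 * epoll or timerfd), so it can move from a Linux demonstrator to a
 * POSIX-conformant RTOS with minimal change. */
static void *periodic_task(void *arg) {
    (void)arg;
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (int i = 0; i < 5; i++) {
        next.tv_nsec += 10 * 1000 * 1000;            /* 10 ms period */
        if (next.tv_nsec >= 1000000000L) {
            next.tv_sec  += 1;
            next.tv_nsec -= 1000000000L;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        printf("tick %d\n", i);
    }
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, periodic_task, NULL);
    pthread_join(tid, NULL);
    return 0;
}
```

On Linux this builds with `gcc -pthread`; on an RTOS target the same source should compile against the vendor's POSIX profile, which is exactly the portability argument FACE leans on.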
So, to summarize here, I will leave you with the following:
- Even 1,000 lines of source code is a big deal for certified systems. Architecting your system appropriately is key to approaching the certification step on a cost- and time-effective path
- Starting with open source or Linux is fine, but plan your path towards certification early in the process
- Some of the APIs that you rely on for embedded platforms may well trace their heritage back decades, to the world of mainframes and workstations!
About the Author
Arun Subbarao is Vice President of Engineering at Lynx Software Technologies, responsible for the development of products for the Internet of Things and Cyber-security markets. He has 20+ years of experience in the software industry working on security, safety, virtualization, operating systems and networking technologies. In this role, he spearheaded the development of the LynxSecure separation kernel & hypervisor product as well as other software innovations in cyber-security leading to multiple patents. He is also a panelist and presenter at several industry conferences. He holds a BS in Computer Science from India, MS in Computer Science from SUNY Albany and an MBA from Santa Clara University.