Static analysis aids code portability
February 01, 2010
Static analysis tools help developers ensure that porting will proceed as planned.
Code reuse is often a major consideration within new projects, both in terms of making use of legacy code from preceding projects and as a foundation for those that will follow. Static analysis can be used to ensure that legacy code does not become a source of issues within a project and to guarantee that any code produced during its development will not afflict any projects that draw on it as a code base.
C code is particularly vulnerable to porting issues. Compilers cannot be expected to detect them, because the code complies with the language specification (assuming no language extensions are used). It is therefore essential that developers use static analysis tools to confirm that porting will proceed as planned. Static analysis tools can help with this in several ways.
Portability issues arising from the size of int
The precision (number of bits) in an int can differ between systems. To deal with this, it is common to define a set of typedefs to map system types onto application types. The following example can be defined for a 16-bit architecture:
typedef unsigned char U8;
typedef unsigned int U16;
typedef unsigned long U32;
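When the code is ported to a 32-bit architecture, the same application types would instead be defined as:
typedef unsigned char U8;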
typedef unsigned short U16;
typedef unsigned int U32;
However, porting is not that simple, as changes in the size of int can have some less obvious effects on the code. For example, any expression whose result depends on the effects of integer promotion may exhibit different behavior. Because of this, such a change is only appropriate if the precision within all expressions containing objects of the affected types fits the purpose. Static analysis can be used to validate this assumption.
Compilers will not report any of these issues because the code is perfectly valid for the targeted environment, even though it might not behave as expected.
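One way to make the width assumptions behind such typedefs explicit is a compile-time check. The sketch below is not from the original article: it assumes a C11 compiler (for static_assert) and a target where short is 16 bits and int is 32 bits, and the helper name u16_max is purely illustrative.

```c
#include <assert.h>
#include <limits.h>

/* Application types for a target with 16-bit short and 32-bit int (assumed). */
typedef unsigned char  U8;
typedef unsigned short U16;
typedef unsigned int   U32;

/* Converting -1 to an unsigned type yields that type's maximum value,
   so these checks pin down the value range of each typedef. */
static_assert((U8)-1  == 0xFFU,        "U8 must hold exactly 8 bits");
static_assert((U16)-1 == 0xFFFFU,      "U16 must hold exactly 16 bits");
static_assert((U32)-1 == 0xFFFFFFFFUL, "U32 must hold exactly 32 bits");

/* Hypothetical helper: returns the largest value a U16 can hold. */
unsigned long u16_max(void)
{
    return (unsigned long)(U16)-1;
}
```

If the typedefs are carried over unchanged to a target with a different int size, the build fails instead of silently changing the precision. Such a check confirms only the widths; it says nothing about expressions that depend on integer promotion.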
Portability issues arising from compiler implementation
Differences in the implementation-defined, unspecified, or undefined behavior associated with the compiler can lead to defects when porting.
Implementation-defined behavior is behavior that might differ between compilers but is documented by the compiler vendor. Static analysis tools can detect code that invokes such behavior so that it can be eliminated to facilitate porting.
Unspecified or undefined behavior can also be detected; however, it presents more than just a portability issue, as such behavior can change in an undocumented way between different versions of the same compiler and might even change between various use cases within the same compiler. Code invoking such behavior could work but most likely would be very fragile. Notably, moving to a different version of a compiler can be considered as porting.
Compilers are not required to detect uses of implementation-defined, unspecified, or undefined behavior, because such code typically violates no syntax rule or constraint for which the language demands a diagnostic.
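As a small illustration of implementation-defined behavior, consider the signedness of plain char, which each compiler chooses and documents. The sketch below is an assumed example rather than one taken from the article or from any particular tool, and the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Implementation-defined: plain char may be signed or unsigned.
   CHAR_MIN from <limits.h> reveals the compiler's documented choice. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Portable alternative: state the signedness explicitly, so the
   widened value is the same on every conforming compiler. */
int byte_to_int(unsigned char byte)
{
    return byte;   /* always in the range 0..UCHAR_MAX */
}

/* Self-check that holds regardless of the compiler's choice for char. */
void check_byte_widening(void)
{
    assert(byte_to_int(0xFFU) == 255);
}
```

Code that stores small numbers in plain char depends on the first function's answer. Code that uses unsigned char or signed char explicitly does not.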
Coding standards
Publicly available coding standards like MISRA C:2004 (www.misra-c.com), which can be rigorously enforced by static analysis tools, include rules that defend against these portability issues. The subsequent examples make use of this standard.
Integer conversions within C
The rules governing how and when different arithmetic types are implicitly converted during the evaluation of an expression within C are complex. To ensure that results are as expected when code is ported, all operations within an expression should be conducted in the same type after all such implicit conversions have been taken into consideration.
The implicit conversions associated with integer promotion can easily lead to code performing in a way that is significantly different from what developers expect. Integer promotion requires that any operand of a type narrower than int (such as char or short) be converted to int, or to unsigned int if int cannot represent every value of the original type, before it is used within an expression. Many embedded systems make extensive use of these narrow types because they often allow for more efficient usage of memory resources, which could be restricted to save cost, space, and power.
Integer promotion is value-preserving (meaning the magnitude and sign are preserved), but the signedness of an object might change. Additionally, the expression will be evaluated in a type that is wider than the type of the operands. Consider the following example:
U8 u8a = 200U;
U8 u8b = 100U;
U8 u8r = u8a + u8b;
In this example, u8a and u8b are both converted to signed int, with a width of at least 16 bits, before the addition takes place, giving an intermediate result of 300. That result is then implicitly converted back to 8 bits before it is stored in u8r. In this case, developers are likely to expect the stored result (44), as it is reasonable to assume they are aware of the modulo 256 arithmetic that takes place on assignment to an 8-bit unsigned type. This means that the result is effectively the same as it would have been if the operation had taken place with 8-bit precision (integer promotion did not affect the result).
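The effect of promotion on this 8-bit addition can be observed directly. The following sketch uses the C99 <stdint.h> type uint8_t as a stand-in for the U8 typedef (an assumption), and the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* The operands are promoted to int, so the sum itself does not wrap. */
int promoted_sum(uint8_t a, uint8_t b)
{
    return a + b;
}

/* Storing the int result in an 8-bit unsigned object reduces it modulo 256. */
uint8_t stored_sum(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* The values from the example above: 200 + 100. */
void check_u8_addition(void)
{
    assert(promoted_sum(200U, 100U) == 300);
    assert(stored_sum(200U, 100U) == 44U);
    /* The expression has the size of int, not of its 8-bit operands. */
    assert(sizeof((uint8_t)1 + (uint8_t)1) == sizeof(int));
}
```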
However, when integer promotion occurs at the same time as an implicit widening conversion, there is potential for confusion. Consider the following:
U16 u16a = 0xffffU;
U16 u16b = 0x0001U;
U32 u32r = u16a + u16b;
On a 32-bit architecture, u32r will have type unsigned int while u16a and u16b will have type unsigned short. Integer promotion will cause the operands to be converted to signed int before the addition takes place, and the result will be implicitly converted to unsigned int on assignment, giving a final value of 0x10000. Developers can (perhaps justifiably) rely on the integer promotion taking place to ensure that the addition does not wrap as it would if 16-bit arithmetic were used.
If developers decide to port the code to a 16-bit architecture, u32r will then have type unsigned long and u16a and u16b will have type unsigned int. This time, no conversions will be applied to the operands, which are already unsigned int, before the addition takes place (also in unsigned int), and the result of 0x0000 will be implicitly converted to unsigned long on assignment, giving a final value of 0x0000. The assumption that the addition would be performed in a wider type is now no longer valid, and there is a risk that an unintended wraparound has occurred.
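The difference between the two targets can be reproduced on a single host by spelling out the type in which each target performs the addition. This is a sketch under that modeling assumption: int32_t stands in for a 32-bit int, uint16_t stands in for a 16-bit unsigned int, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models the 32-bit target: both operands are promoted to a 32-bit
   signed int before the addition, so the carry is kept. */
uint32_t sum_on_32bit_target(uint16_t a, uint16_t b)
{
    return (uint32_t)((int32_t)a + (int32_t)b);
}

/* Models the 16-bit target: the addition is performed in a 16-bit
   unsigned type and wraps before the result is widened. */
uint32_t sum_on_16bit_target(uint16_t a, uint16_t b)
{
    uint16_t wrapped = (uint16_t)(a + b);
    return wrapped;
}

void check_widening_difference(void)
{
    assert(sum_on_32bit_target(0xFFFFU, 0x0001U) == 0x10000UL);
    assert(sum_on_16bit_target(0xFFFFU, 0x0001U) == 0x0000UL);
}
```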
This shows how easily code can exhibit different behavior when it is ported from one platform to another. The real issue here relates to the implicit widening conversion that takes place on assignment of the result. This can be eliminated by ensuring that the expression is always evaluated with the necessary precision using a cast, for example:
u32r = ( U32 ) u16a + u16b;
The ( U32 ) cast ensures that the expression is always evaluated in a type with the appropriate precision. On the 16-bit architecture in the previous example, this means the expression is evaluated in unsigned long rather than unsigned int. As shown in Figure 1, static analysis can easily detect implicit widening.
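The effect of casting one operand before the addition can be sketched as follows, again using <stdint.h> types as stand-ins for U16 and U32 (an assumption; the function name is illustrative). Because the operand is widened first, the result does not depend on the width of int.

```c
#include <assert.h>
#include <stdint.h>

/* Casting one operand first forces the addition to be carried out
   with at least 32-bit precision on any target. */
uint32_t add_widened(uint16_t a, uint16_t b)
{
    return (uint32_t)a + b;
}

void check_add_widened(void)
{
    assert(add_widened(0xFFFFU, 0x0001U) == 0x10000UL);
}
```

Note that the cast applies to the operand, not to the finished sum. Writing the cast around the whole expression would widen only after any wraparound had already happened.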
Integer promotion can also affect other operations. Consider the following:
u16a = 0x1234U;
u16r = ~u16a >> 8;
On a 16-bit architecture, this will lead to the bits of u16a being inverted and the top byte shifting into the bottom byte, assigning 0x00ED into u16r.
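However, on a 32-bit architecture, u16a will be converted to signed int (with 32 bits) before the complement takes place, so the upper 16 bits of the complemented value (0xFFFFEDCB) are all set. Some of those bits are shifted down into the low 16 bits, resulting in the value 0xFFED being assigned into u16r.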
Once again, the use of a cast will ensure the behavior is as expected:
u16r = ( U16 )~u16a >> 8;
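Both outcomes can be checked on one host with the sketch below. It is an illustration under stated assumptions: uint16_t stands in for U16, and the uncast case is modeled with unsigned 32-bit arithmetic so that the shift is fully defined. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models the uncast expression on a target with 32-bit int. Unsigned
   32-bit arithmetic is used here so the shift is fully defined; the low
   16 bits match the result described for the signed case. */
uint16_t shift_without_cast(uint16_t a)
{
    uint32_t inverted = ~(uint32_t)a;      /* upper 16 bits become 1s */
    return (uint16_t)(inverted >> 8);
}

/* The cast discards the upper bits before the shift, so only the
   original 16 bits take part. */
uint16_t shift_with_cast(uint16_t a)
{
    return (uint16_t)((uint16_t)~a >> 8);
}

void check_shift(void)
{
    assert(shift_without_cast(0x1234U) == 0xFFEDU);
    assert(shift_with_cast(0x1234U) == 0x00EDU);
}
```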
Evaluating code suitability
Static analysis tools are an invaluable aid to code porting. As shown in Figure 2, these tools permit developers to evaluate legacy code and ensure that new code is developed in a way that allows it to be ported.
Early adoption of static analysis in a project life cycle will ensure that legacy code is validated as early as possible and that any new code is portable from the outset. Developers can rapidly recover the initial outlay involved in using such tools through reduced development time and significantly lower residual defect levels.
LDRA +44-0151-649-9300 www.ldra.com