You Will Ship Bugs: Why Planning for the Inevitable Leads to Better Products
May 17, 2021
Firmware engineers are acutely aware of the damage a buggy product can inflict, from customer inconvenience and reputational harm to bricked devices when a firmware update flow breaks.
With companies investing heavily in firmware to ship functional products (estimates put the cost at $20 to $40 per line of code), no engineer wants to be responsible for a bug.
But while it may be hard to hear, and even harder to accept, the truth is that you will ship bugs. You can be part of the most experienced, most intelligent, most extraordinary development team, with the most thoroughly tested product built on the most sophisticated hardware in history...and you will still ship bugs. Hyperbole aside, too many developers focus on delivering a bug-free product and, in doing so, forgo the processes for monitoring, updating, and patching devices in the field, processes that can actually result in better products that get to market faster.
Firmware updates are necessary for myriad reasons, whether to fix an error or to make an improvement. There’s no way to anticipate every user input, and no way to know every way end users will use your device (and potentially break, or even brick, it). Time and budget constraints pressure development teams to simply get the product out the door. And even the best QA team cannot catch every bug, no matter how rigorous its testing or how talented its members. Ask NASA. Or Apple. Or Tesla. Or iRobot.
Embedded systems expert Jack Ganssle offers a useful data point from software engineering: the elite, the top 1% of engineers, inject about 11 bugs per thousand lines of code (KLOC), while, Ganssle asserts, the other 99% average about 120 bugs per KLOC. Applied to firmware, those numbers make expecting bugs, and implementing a system to monitor and fix them regularly, simple common sense.
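To make that extrapolation concrete, the short C sketch below turns the quoted defect densities into expected bug counts. The function and macro names are illustrative, and the 50,000-line codebase used in the usage note is a hypothetical size, not a figure from the text. The counts are bugs injected during development, before testing removes some of them.

```c
/* Defect densities quoted above (bugs injected per thousand lines of code). */
#define ELITE_BUGS_PER_KLOC   11.0   /* top 1% of engineers */
#define AVERAGE_BUGS_PER_KLOC 120.0  /* the other 99% */

/* Expected number of bugs injected into a codebase of `lines` lines
 * at a defect density of `bugs_per_kloc`. */
static double expected_bugs(double lines, double bugs_per_kloc)
{
    return (lines / 1000.0) * bugs_per_kloc;
}
```

For that hypothetical 50 KLOC image, `expected_bugs(50000.0, ELITE_BUGS_PER_KLOC)` gives 550 and `expected_bugs(50000.0, AVERAGE_BUGS_PER_KLOC)` gives 6,000. Even if testing caught a (hypothetical) 99% of the elite team's bugs, roughly five would still ship.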
Beyond a device’s own code, connectivity through BLE, LTE, or mesh networks adds complexity and risk. Everything is connected, whether device to device or device to cloud, and that connectivity demands visibility to ensure products are functioning correctly and are protected against external threats. Not long ago, developers could write static firmware for specific device use cases or for commoditized products and ship those products without any further interaction or engagement. Today’s connected environment has upended that assumption entirely. Customers expect continuous functionality and ongoing improvements through updates, and they have expansive online platforms to amplify any dissatisfaction. Connectivity also introduces security issues, often stemming from bugs in third-party code or cloud-side vulnerabilities, that demand swift patches and regular monitoring.
When we’ve polled developers, they have consistently confirmed what we found in our own experience: nearly half their time goes to diagnosing and debugging elusive issues. Bus faults, memory management faults, usage faults, and hard faults are all common, but their causes are often obscure and difficult to reproduce.
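On Arm Cortex-M parts (ARMv7-M cores such as the Cortex-M3, M4, and M7), the configurable fault classes named above are recorded in the Configurable Fault Status Register (CFSR) at address 0xE000ED28. Capturing that register in the fault handler, before the device resets, is a cheap first step toward diagnosing faults that cannot be reproduced on the bench. The decoder below is a minimal sketch: the bit positions follow the ARMv7-M architecture, but the table, `decode_cfsr()`, and its signature are hypothetical names for illustration, and only a subset of the bits is covered. It takes the register value as a parameter so it can run on a host as well as on the target.

```c
#include <stddef.h>
#include <stdint.h>

/* CFSR (0xE000ED28 on ARMv7-M) packs three sub-registers:
 *   bits 0-7   MMFSR (MemManage faults)
 *   bits 8-15  BFSR  (bus faults)
 *   bits 16-31 UFSR  (usage faults)                                   */
#define CFSR_MMARVALID (1u << 7)   /* MMFAR holds the faulting address */
#define CFSR_BFARVALID (1u << 15)  /* BFAR holds the faulting address  */

typedef struct {
    uint32_t mask;
    const char *name;
} fault_bit_t;

/* A subset of the cause bits (FPU lazy-stacking bits omitted). */
static const fault_bit_t k_cfsr_bits[] = {
    {1u << 0,  "MemManage: instruction access violation (IACCVIOL)"},
    {1u << 1,  "MemManage: data access violation (DACCVIOL)"},
    {1u << 3,  "MemManage: fault on exception return unstacking (MUNSTKERR)"},
    {1u << 4,  "MemManage: fault on exception entry stacking (MSTKERR)"},
    {1u << 8,  "BusFault: instruction bus error (IBUSERR)"},
    {1u << 9,  "BusFault: precise data bus error (PRECISERR)"},
    {1u << 10, "BusFault: imprecise data bus error (IMPRECISERR)"},
    {1u << 11, "BusFault: fault on exception return unstacking (UNSTKERR)"},
    {1u << 12, "BusFault: fault on exception entry stacking (STKERR)"},
    {1u << 16, "UsageFault: undefined instruction (UNDEFINSTR)"},
    {1u << 17, "UsageFault: invalid EPSR state (INVSTATE)"},
    {1u << 18, "UsageFault: invalid PC load via EXC_RETURN (INVPC)"},
    {1u << 19, "UsageFault: no coprocessor (NOCP)"},
    {1u << 24, "UsageFault: unaligned access (UNALIGNED)"},
    {1u << 25, "UsageFault: divide by zero (DIVBYZERO)"},
};

/* Stores up to `max` human-readable causes found in `cfsr` into `out`
 * and returns how many were stored. Address-valid flags are ignored. */
static size_t decode_cfsr(uint32_t cfsr, const char **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof k_cfsr_bits / sizeof k_cfsr_bits[0]; i++) {
        if ((cfsr & k_cfsr_bits[i].mask) && n < max) {
            out[n++] = k_cfsr_bits[i].name;
        }
    }
    return n;
}
```

On a real device, the fault handler would read the value with something like `*(volatile uint32_t *)0xE000ED28` and persist it, together with the stacked program counter, for upload after reboot. A HardFault escalated from one of the configurable faults sets the FORCED bit in the HardFault Status Register (HFSR, 0xE000ED2C); in that case the CFSR still holds the original cause.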
Rather than passively waiting to be alerted about unexpected issues, teams can deploy a system that proactively looks for them (and can repair them before end users find them). That frees up valuable developer time and lets teams build and improve products rather than just respond and react.
Acknowledging that bugs will inevitably surface in your code empowers developers to build better embedded devices and get them to market faster. An agile workflow means shipping a minimum viable product (MVP) and iterating. Rather than treating the firmware frozen for production as final, teams can freeze it in an incomplete state and update devices once they are in customers’ hands.
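Shipping incomplete firmware is only safe if the update path itself cannot brick the device. A common pattern is an A/B (dual-slot) layout with a trial boot: the bootloader tries a newly downloaded image a limited number of times and reverts to the last known-good slot unless the new firmware confirms itself healthy. This is similar in spirit to the test-and-confirm flow of bootloaders such as MCUboot, but the sketch below is a simplified, hypothetical illustration: `boot_state_t`, `select_boot_slot()`, `confirm_running_image()`, and the three-attempt limit are all assumptions, and a real implementation must also persist the state to flash and verify image signatures.

```c
#include <stdint.h>

/* Hypothetical boot state; on a device it would live in a flash
 * sector that survives resets. */
typedef struct {
    uint8_t active_slot;    /* last image confirmed good (0 or 1)        */
    uint8_t pending_slot;   /* newly downloaded image, or NO_PENDING     */
    uint8_t trial_attempts; /* boots already attempted on pending image  */
} boot_state_t;

#define NO_PENDING      0xFFu
#define MAX_TRIAL_BOOTS 3u   /* assumed limit before rolling back */

/* Called by the bootloader on every reset; returns the slot to boot. */
static uint8_t select_boot_slot(boot_state_t *s)
{
    if (s->pending_slot == NO_PENDING) {
        return s->active_slot;           /* no update in flight */
    }
    if (s->trial_attempts >= MAX_TRIAL_BOOTS) {
        s->pending_slot = NO_PENDING;    /* never confirmed: roll back */
        s->trial_attempts = 0;
        return s->active_slot;
    }
    s->trial_attempts++;                 /* count the attempt before jumping */
    return s->pending_slot;
}

/* Called by the application once it has proven itself healthy
 * (for example, it booted, reached the network, and passed self-tests). */
static void confirm_running_image(boot_state_t *s)
{
    if (s->pending_slot != NO_PENDING) {
        s->active_slot = s->pending_slot; /* promote new image to known-good */
        s->pending_slot = NO_PENDING;
        s->trial_attempts = 0;
    }
}
```

Counting the attempt before jumping to the new image is the key design choice: if the new firmware crashes or hangs (and a watchdog resets it) before confirming, each reset still consumes an attempt, so the device cannot loop forever on a bad image.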
Once devices are in the wild, teams can monitor mission-critical metrics (battery life, memory usage, connectivity state, etc.) for instant visibility into fleet performance, potential bugs, affected firmware versions, and how widespread and frequent those issues are. Device makers who can discover and repair issues preemptively offer a superior customer experience while maintaining robust security.
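One lightweight way to get that visibility is a periodic "heartbeat": the firmware aggregates a handful of metrics over a fixed interval and uploads one small record per interval instead of streaming raw samples. The sketch below tracks the three metrics named above; the struct layout, function names, and choice of aggregates (minimum battery voltage, peak heap usage, count of link drops) are hypothetical choices for illustration.

```c
#include <stdint.h>

/* Hypothetical per-interval heartbeat record. */
typedef struct {
    uint16_t battery_mv_min;     /* lowest battery voltage seen (mV) */
    uint32_t heap_used_max;      /* peak heap usage (bytes)          */
    uint16_t connectivity_drops; /* up-to-down link transitions      */
    uint8_t  link_up;            /* last observed link state (0/1)   */
} heartbeat_t;

/* Starts a new interval, carrying over the current link state. */
static void heartbeat_reset(heartbeat_t *hb, uint8_t link_up)
{
    hb->battery_mv_min = UINT16_MAX;
    hb->heap_used_max = 0;
    hb->connectivity_drops = 0;
    hb->link_up = link_up;
}

static void heartbeat_sample_battery(heartbeat_t *hb, uint16_t mv)
{
    if (mv < hb->battery_mv_min) hb->battery_mv_min = mv;
}

static void heartbeat_sample_heap(heartbeat_t *hb, uint32_t used_bytes)
{
    if (used_bytes > hb->heap_used_max) hb->heap_used_max = used_bytes;
}

/* Counts only transitions from connected to disconnected. */
static void heartbeat_sample_link(heartbeat_t *hb, uint8_t link_up)
{
    if (hb->link_up && !link_up) hb->connectivity_drops++;
    hb->link_up = link_up;
}
```

At the end of each interval, the firmware would tag the record with its firmware version, queue it for upload, and call `heartbeat_reset()` with the current link state. Tagging by version is what lets a backend report which releases are affected and how often an issue occurs.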
Bugs are to be expected, no matter who you are or what you’re building. In place of the Sisyphean task of eliminating them, adopt a system of find-and-fix.
François Baldassari is the CEO and cofounder of Memfault, creator of the first end-to-end observability platform for connected devices. Prior to Memfault, François worked as an embedded software engineer at Oculus and Pebble.