Troubleshoot data corruption
April 14, 2016
In the world of embedded computing, data-related failures are unfortunately part of being in business. Even with the right hardware, software, and development, frustrating and costly failures can occur. But when issues do surface, many companies don’t possess the right tools to troubleshoot the challenges they run into.
Because Datalight specializes in making flash data storage reliable in embedded systems, companies often call on us to figure out what went wrong and how to fix it. Over the years, we’ve developed a robust set of tools to diagnose data issues, often without the benefit of a reproducible case.
Flash failures are often complex, making it difficult to discern whether the problem resides in the file system, the flash driver, or the hardware. And almost every corruption we see is unique. The issue could result from power failures, or the system may have experienced bit flips caused by over-programming or read disturb.
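One way to begin separating these causes is to compare a corrupted sector against a known-good copy and count how many bits differ. The sketch below is only an illustration, not one of Datalight’s tools: the function names and the `flip_threshold` value are assumptions, and a real threshold would depend on the flash part and its ECC strength.

```python
def count_bit_flips(expected: bytes, actual: bytes) -> int:
    """Count the bits that differ between two equal-length buffers."""
    if len(expected) != len(actual):
        raise ValueError("buffers must be the same length")
    # XOR leaves a 1 in every bit position where the buffers disagree.
    return sum(bin(a ^ b).count("1") for a, b in zip(expected, actual))


def classify_sector(expected: bytes, actual: bytes, flip_threshold: int = 8) -> str:
    """Rough triage of a sector. flip_threshold is a hypothetical cutoff."""
    flips = count_bit_flips(expected, actual)
    if flips == 0:
        return "clean"
    if flips <= flip_threshold:
        # A handful of flipped bits is the kind of damage bit-flip
        # mechanisms such as read disturb tend to produce.
        return "scattered bit flips"
    # Large differences point elsewhere, for example an interrupted write.
    return "bulk mismatch"
```

A sector that differs from its reference by only a few bits would be a candidate for the bit-flip explanations, while a sector that differs wholesale would send the investigation toward power loss or a software fault.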
As we’ve investigated more failures, patterns have emerged, and we have been able to devise several tools and methods to diagnose errors. Errors are rarely identified quickly. Rather than landing directly on the root cause, the process tends to be one of trial and error, eliminating possible causes one at a time.
Recently, we solved an issue for an automotive client that was seeing corruption in images. The customer had grouped the failures into nearly a dozen different “symptom buckets” that seemed to be unrelated. Our engineers identified patterns that enabled them to combine symptoms, reducing them to a few suspected root-cause buckets, one of which led to a solution.
Datalight’s Reliance family of file systems simplifies diagnosing hardware failures. Because Reliance never overwrites live data, we can replay exactly what the system was writing and locate the bad sector. We can also examine erase counts on the flash media. This combination usually enables us to identify hardware errors.
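As a rough illustration of how erase counts can point to suspect hardware, the sketch below flags blocks that have been erased far more often than the typical block. It is not Datalight’s tooling: the function name, the input format (a plain list of per-block erase counts), and the `factor` cutoff are all assumptions made for the example.

```python
from statistics import median


def flag_worn_blocks(erase_counts, factor=3.0):
    """Return indices of blocks erased more than `factor` times the median.

    erase_counts: one erase count per flash block (hypothetical input format).
    factor: hypothetical outlier cutoff, not a vendor-specified value.
    """
    if not erase_counts:
        return []
    typical = median(erase_counts)
    if typical <= 0:
        return []
    return [i for i, count in enumerate(erase_counts) if count > factor * typical]


counts = [100, 110, 95, 105, 900, 102]
worn = flag_worn_blocks(counts)  # block 4 stands out from the rest
```

A block flagged this way is not proof of a hardware fault, but it gives the investigation a concrete place to look when a bad sector has already been located.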
For an in-depth discussion of this topic, download the whitepaper Troubleshooting Data Corruption on NAND Flash Memory.
Kerri McConnell is a vice president at Datalight. She has over two decades of technology marketing and product management experience and has learned the importance of listening to customers to build better products. Known as the “queen of analogies,” Kerri excels in translating technical topics into layman’s terms and finding the business benefit to be gained from product features.