Notes on IoT Database Management - Part 2
May 07, 2019
In Part 1 of this two-part series, we discussed how the Internet of Things has dramatically increased the volume of data that embedded database vendors, network infrastructure vendors and physical storage vendors must design their systems to handle; the implications of the growing number of data collection and processing points; and the evolution from largely standalone embedded database systems to systems that must meet the varied connectivity requirements of the IoT.
Shifting Gears
The overall purpose of collecting data on the edge has shifted from purely device control and monitoring to improving various service capabilities through real-time analysis. Artificial intelligence and machine learning are becoming the core of many everyday services. (Try using your bank card at an ATM in Istanbul without notifying the bank first.) However, for the reasons discussed in Part 1, the ability to add sophisticated storage algorithms and analytical capabilities to the storage containers on edge devices is limited. Instead, these tools reside on gateways and back-end servers. Given the volume of data collected and analyzed, these systems need methodologies designed specifically to exploit multi-core architectures and multi-channel storage systems. Scalability and near real-time performance are achieved by processing data in parallel, using techniques such as vertical (columnar) storage layouts, vector processing and pipelining, lockless data structures and algorithms, specialized index algorithms, data aggregation and statistical data analysis.
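To make the combination of a vertical layout and parallel aggregation concrete, here is a minimal Python sketch, assuming a hypothetical two-field sensor record (timestamp, temperature). Each field lives in its own contiguous array, so an aggregate over one field scans only that field's memory. The column is split into chunks whose partial results (count, sum, min, max) are computed independently and then merged; the chunks run sequentially here, whereas a native engine could hand each one to a separate core. The class and function names are illustrative, not any product's API.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Partial:
    """Partial aggregate for one chunk of a column; merging is associative."""
    count: int
    total: float
    lo: float
    hi: float

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.count + other.count, self.total + other.total,
                       min(self.lo, other.lo), max(self.hi, other.hi))


class ColumnStore:
    """Hypothetical vertical (columnar) layout: one contiguous array per field."""

    def __init__(self) -> None:
        self.timestamp = array("q")    # signed 64-bit integers
        self.temperature = array("d")  # C doubles

    def append(self, ts: int, temp: float) -> None:
        self.timestamp.append(ts)
        self.temperature.append(temp)


def aggregate_chunk(values: array) -> Partial:
    # Touches one contiguous slice of a single column and nothing else.
    return Partial(len(values), sum(values), min(values), max(values))


def aggregate_column(column: array, chunk_size: int) -> Partial:
    if not column:
        raise ValueError("empty column")
    # Chunks are independent; here they run one after another, but a native
    # engine could assign each chunk to its own core and merge afterwards.
    partials = [aggregate_chunk(column[i:i + chunk_size])
                for i in range(0, len(column), chunk_size)]
    result = partials[0]
    for p in partials[1:]:
        result = result.merge(p)
    return result


store = ColumnStore()
for i, temp in enumerate([20.5, 21.0, 19.5, 22.0, 20.0]):
    store.append(1_557_187_200 + i, temp)

summary = aggregate_column(store.temperature, chunk_size=2)
print(summary)  # Partial(count=5, total=103.0, lo=19.5, hi=22.0)
```

Because the partial results merge associatively, the final answer does not depend on how the column was chunked, which is what allows the work to be spread across cores or storage channels.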
These techniques are often implemented as part of advanced SQL optimizers. Yet data produced by sensors and instruments (such as measurements and multimedia content) is quite often unstructured and does not fit well into a traditional relational database layout. Storage containers on the edge device should therefore be able to hold data in non-relational formats such as JSON or optimized binary formats, and allow non-SQL access to that data. At the same time, the database engine should be able to convert the container’s format into the back-end database’s relational format and provide advanced SQL to optimize access and analytics.
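A minimal Python sketch of that conversion follows, using the standard-library sqlite3 module as a stand-in relational back end (an assumption for illustration, not the back-end engine the article has in mind). Hypothetical JSON documents with device-specific fields are flattened into a narrow (device, timestamp, metric, value) table, after which ordinary SQL can aggregate them. The document layout, table name and column names are all illustrative.

```python
import json
import sqlite3

# Hypothetical documents as an edge container might hold them; the set of
# metrics differs from device to device, so there is no fixed row layout.
EDGE_DOCS = [
    '{"device": "pump-1", "ts": 1557187200, "metrics": {"temp": 41.5, "rpm": 1200}}',
    '{"device": "pump-2", "ts": 1557187201, "metrics": {"temp": 39.0}}',
    '{"device": "pump-1", "ts": 1557187260, "metrics": {"temp": 42.5, "rpm": 1210}}',
]


def to_rows(doc: str):
    """Flatten one JSON document into (device, ts, metric, value) tuples."""
    record = json.loads(doc)
    for metric, value in record["metrics"].items():
        yield (record["device"], record["ts"], metric, float(value))


def load_relational(docs) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reading (device TEXT, ts INTEGER, metric TEXT, value REAL)")
    with conn:  # commit the whole batch, or roll it back if any document is malformed
        for doc in docs:
            conn.executemany("INSERT INTO reading VALUES (?, ?, ?, ?)", to_rows(doc))
    return conn


conn = load_relational(EDGE_DOCS)
query = ("SELECT device, AVG(value) FROM reading "
         "WHERE metric = 'temp' GROUP BY device ORDER BY device")
print(conn.execute(query).fetchall())  # [('pump-1', 42.0), ('pump-2', 39.0)]
```

A narrow table like this absorbs new metrics without schema changes; the trade-off is that queries over several metrics at once need pivots or self-joins, so a real back end might instead map well-known fields to dedicated columns.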
Security
No discussion of data management is complete without covering security. The industry really needs to move from the IoT to the IoST, the Internet of Secure Things. Let’s not pretend to address IoT security in general, though: the topic is far too broad. Likewise, this author cannot pretend to be a security expert; my expertise is in database management systems. While not invincible, existing internet SSL/TLS technologies do a good job of protecting the communication channels between edge nodes and back-end servers. In addition to secure communications and channel authentication, however, the database management system must ensure the integrity of the data, and an important component of that is data encryption.
Data encryption on the edge node is, in a way, more important than server-side encryption. Back-end database servers normally run behind sophisticated firewalls and benefit from a full network security apparatus: server security and authentication are overseen by security personnel and regulated by strict rules and procedures. In contrast, edge nodes’ storage containers sit out in the open, often running unattended, and more often than not have very limited resources and lack well-established, firewall-like security measures. Tampering with nodes’ data containers has already proven harmful; in recent years we have seen cases of attackers spoofing the identity of insecure network nodes. In the end, you don’t really care why your autonomous car drove you into a lake. You are wet regardless of whether the GPS transmitter provided incorrect data or your car’s system was hacked.
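Full encryption at rest requires a cipher from a vetted cryptographic module, which Python's standard library does not include. The sketch below therefore covers only the integrity side: each stored record is sealed with an HMAC-SHA256 tag (via the standard hmac and hashlib modules), so tampering with the container's contents is detected when the record is read back. HMAC provides integrity and authenticity, not confidentiality; a production design would more likely use an authenticated cipher such as AES-GCM from a certified module. The function names, the tag-appended record layout and the in-memory key are assumptions; on a real node the key would come from protected storage.

```python
import hashlib
import hmac
import os

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def seal(key: bytes, record: bytes) -> bytes:
    """Return the record with an HMAC-SHA256 tag appended."""
    tag = hmac.new(key, record, hashlib.sha256).digest()
    return record + tag


def unseal(key: bytes, blob: bytes) -> bytes:
    """Return the original record, or raise ValueError if the blob was altered."""
    if len(blob) < TAG_LEN:
        raise ValueError("blob too short to carry a tag")
    record, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(key, record, hashlib.sha256).digest()
    # compare_digest runs in constant time, avoiding a timing side channel.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return record


key = os.urandom(32)  # assumption: a real node would load this from protected storage
stored = seal(key, b'{"lat": 59.33, "lon": 18.06}')
print(unseal(key, stored))

tampered = bytearray(stored)
tampered[8] ^= 0x01  # flip one bit inside the record, as an attacker might
try:
    unseal(key, bytes(tampered))
except ValueError as err:
    print(err)  # integrity check failed
```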
To mitigate these risks, the best database vendors support IoT storage containers that incorporate encryption modules conforming to Federal Information Processing Standard (FIPS) 140-2, a U.S. government standard that defines security requirements for cryptographic modules. These modules (some FIPS 140-2 certified, some merely compliant) are designed to protect data at rest. All data transmitted within the IoT architecture should also be protected, which means the database should make it possible to integrate commercial or open-source lightweight SSL/TLS modules.
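On a gateway or a more capable edge node that runs Python, the standard-library ssl module (a wrapper around OpenSSL) can fill the TLS role; a constrained microcontroller would instead link a lightweight C TLS library. The sketch below builds a client context that verifies the server's certificate and hostname, refuses anything older than TLS 1.2 and, optionally, presents a device certificate so the server can authenticate the node. The function name and the certificate file parameters are hypothetical.

```python
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS client context for an edge node talking to a back-end server."""
    # SERVER_AUTH: verify the server's certificate chain and hostname.
    # With ca_file=None the platform's default trust store is loaded instead.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse protocol versions older than TLS 1.2
    if cert_file is not None:
        # Mutual TLS: the node proves its identity with its own certificate,
        # which makes spoofing a node's identity harder.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

The context would then wrap a connected socket, for example `ctx.wrap_socket(sock, server_hostname="gateway.example.com")`, where the hostname (hypothetical here) must match the name in the server's certificate.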
Code Quality
This might be more of an editorial. Years ago, embedded programming was like magic (or hard-core, depending on your viewpoint). Developers had to know C and assembly language, understand the underlying hardware architecture, know how to work with registers, interrupts and other “low-level” mechanisms, and be able to use specialized debugging techniques and tools (ever hear of a JTAG debugger?). Embedded projects often had quite long development cycles.
Nowadays, increased competition has understandably shortened time-to-market. Software development has become like any other large industry, where success depends largely on time-to-market and production pipelines. Software development is also now a common career, and the number of developers involved in creating, testing and deploying embedded software, often on the same project, has skyrocketed. Deep expertise in programming techniques, once mandatory, is no longer a prerequisite. Modern toolsets offer simple, easy-to-understand alternatives to programming in lower-level languages such as C/C++ or Java. Scripting languages, third-party BI tools, GUI-based access tools and so on are increasingly common, as is the movement of developers on and off a project. As a result, the application code deployed on the edge is sometimes less than polished and may not be as well optimized for performance or resource consumption as it could be.
For a responsible database vendor with the unqualified success of a project in mind, this means that in addition to low-level programming languages, database kernels ought to expose interfaces for lightweight yet capable scripting languages such as Lua or Python. Database systems must protect against application errors and must themselves be as error-free as possible. Extensive debugging capabilities, including remote debugging and low-overhead tracing, along with adherence to coding standards (e.g., ISO certification or MISRA compliance), play an increasingly large role in making database software that is trustworthy and ready for edge deployment.
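As a small illustration of an engine protecting data from application errors, here is a Python sketch that uses the standard-library sqlite3 module as a stand-in embedded database exposed to a scripting language (an assumption for illustration; it is not eXtremeDB's API). Two mechanisms do the work: a CHECK constraint lets the engine itself reject out-of-range values, and a transaction ensures that if a script's batch fails partway through, none of it is written. The table, the value range and the function names are hypothetical.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The engine, not the script, enforces the valid range for a setpoint.
    conn.execute(
        "CREATE TABLE setpoint ("
        " name TEXT PRIMARY KEY,"
        " value REAL NOT NULL CHECK (value BETWEEN -50 AND 150))"
    )
    return conn


def apply_setpoints(conn: sqlite3.Connection, updates) -> bool:
    """Apply all updates atomically; if the engine rejects any, write none and return False."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for name, value in updates:
                conn.execute("INSERT OR REPLACE INTO setpoint VALUES (?, ?)",
                             (name, value))
        return True
    except sqlite3.Error:
        return False


conn = open_store()
print(apply_setpoints(conn, [("boiler", 80.0), ("chiller", 5.0)]))    # True
# A buggy script sends one bad value; the valid update in the same batch is discarded too.
print(apply_setpoints(conn, [("boiler", 90.0), ("chiller", 999.0)]))  # False
print(conn.execute("SELECT value FROM setpoint WHERE name = 'boiler'").fetchone())  # (80.0,)
```

Pushing validation into the schema means a rushed script cannot store an impossible setpoint even if it skips its own checks, and the all-or-nothing batch keeps the container consistent if the script fails midway.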
Summary
The Internet of Things has noticeably changed the landscape for embedded database system vendors. In the past, embedded systems development was something of a black art requiring in-depth knowledge of basic electronics (oscilloscopes and other fun stuff), interrupt processing, internal hardware architecture, assembly language, protocols like UART and CAN, and hardware-assisted debugging; modern IoT development tools have streamlined the development process and made it more accessible. Embedded database systems have had to follow suit and become accessible through higher-level scripting languages. The connectivity requirements inherent in the IoT, in turn, mean that embedded databases must also be accessible over the internet. The shift toward analyzing data collection and usage patterns dictates that databases provide robust, convenient means to process data in place and to replicate data upstream for aggregation into artificial intelligence and/or machine learning systems. These accessibility and interconnectivity requirements bring data security to the fore. And as the horizons of the IoT keep widening, meeting these new challenges is no longer something to wish for, but a necessity.
McObject co-founder Andrei Gorine leads the company’s product engineering. As CTO, he has driven the growth of the eXtremeDB real-time embedded database system, from the product’s conception to its current wide usage in virtually all embedded systems market segments. Mr. Gorine’s strong background includes senior positions with leading embedded systems and database software companies; his experience in providing embedded storage solutions in such fields as industrial control, industrial preventative maintenance, satellite and cable television, and telecommunications equipment is highly recognized in the industry. Mr. Gorine has published articles and spoken at many conferences on topics including real-time database systems, high availability, and memory management. Over the course of his career he has participated in both academic and industry research projects in the area of real-time database systems. Mr. Gorine holds a Master’s degree in Computer Science from the Moscow Institute of Electronic Machinery and is a member of IEEE and ACM.