How Zigbee, Thread, and Bluetooth Mesh stack up in performance benchmarking
April 12, 2018
Regardless of protocol, IoT networks must be robust. This can be quantified by measuring throughput, latency, and reliability. These measurements depend on system and installation requirements.
Numerous wireless protocol options are available to developers of smart home and building automation products. Zigbee, Z-Wave, and proprietary wireless controls dominate these markets today, and new entrants include Thread and Bluetooth mesh. While Bluetooth low energy (BLE) and Wi-Fi are also popular in these markets, they do not support mesh networks. Regardless of the underlying protocol, deployed networks for the IoT must be robust, and this robustness can be quantified by measuring throughput, latency, and reliability. These measurements depend on installation size and other system-level requirements.
“One size does not fit all” when it comes to mesh networking protocols. Each wireless protocol presents unique characteristics and advantages, depending on the use case and end application. Understanding the inner workings of mesh technology goes beyond a list of key features. More importantly, developers need to understand how these network protocols perform in the key areas of power consumption, throughput, latency, scalability, security, and Internet Protocol (IP) connectivity. Zigbee, Thread, and Bluetooth mesh are all designed differently from the ground up, and how each mesh is implemented can have an impact on system performance and robustness.
Wireless connectivity options galore
Wireless system-on-chip (SoC) devices have become cost-effective enough to be added to countless “things” that provide convenience, safety, and comfort to our daily lives. A “thing” becomes an IoT device when wireless connectivity is added. Many of today’s IoT devices were previously things that didn’t have wireless Internet connectivity. Changing regulations and consumer expectations are forcing product manufacturers to add wireless connectivity to a myriad of products and systems to stay competitive or create the potential for new revenue streams. When developers choose to build IoT devices, they must consider how the end product is used and the ecosystem in which these products will operate.
Types of wireless networks
Two basic topologies exist among the many competing IoT wireless technologies: mesh and star (Figure 1). Mesh is often preferred over star networks in home and building automation due to its ability to scale to numerous nodes and cover long distances. Star networks rely on a point-to-point connection between an end node and a central device. If the environment changes after the network is installed, a star network can fail. Mesh, on the other hand, is distributed and self-healing. If the environment changes or a node fails after the network is deployed, the mesh network can heal itself.
Which network is best for home and building automation?
Zigbee is commonly used in building and home automation. More recently, Thread and Bluetooth mesh have been considered for these applications. Z-Wave is another mesh technology that is popular in smart home and home security applications. We did not include Z-Wave in our initial mesh performance analysis because the analysis focuses on protocol comparisons in the 2.4 GHz band using Silicon Labs’ Wireless Gecko SoCs as the device platform for testing. Furthermore, we lacked access to a comparable test network to verify Z-Wave results at the time of testing.
Home and building automation includes a combination of energy-harvesting devices, battery-powered devices, and line-powered devices. Lighting and thermostats are typically line powered because they are part of the infrastructure, but that doesn’t mean power consumption can be ignored. Devices that are part of the infrastructure and are AC-powered must be managed carefully due to new government regulations limiting “vampire power.” Batteries usually power remote sensors and control elements. That means the mesh must accommodate two fundamentally different use cases from a power perspective.
Use cases
There are many potential use cases for mesh networking in home and building automation.
Comfort
Consider, for example, lighting and environmental control in a theater or museum. These installations usually have hundreds to thousands of nodes. The lights, motors for curtains, and blinds need to be controlled in a precise and choreographed way. All the lights need to dim simultaneously, and the motors controlling the curtains should all work in concert. Slight differences are noticeable and would detract from the experience of the audience.
The home has similar requirements. If you are creating a scene with lights and window shades, the user expects a seamless and choreographed experience where all lights dim simultaneously and all window shades move in unison.
Safety
An industrial environment like a warehouse may have different lighting needs than a theater. Often, the lights in a section are turned on simultaneously. However, it doesn’t really matter if those lights turn on together or if it takes a few seconds for all of them to illuminate. The user experience and expectation are different. On the other hand, if certain lights need to turn on quickly due to a power outage, suddenly time does matter.
Convenience
A developer may want to add services to wirelessly controlled lights in a warehouse, for instance. It may not matter if every light in the installation turns on in unison. However, the robustness of the network could matter a great deal once those services are added.
A service that is gaining popularity in mesh installations is asset tracking. In this instance, the designer relies on the control network to also transmit data about the assets being tracked by the installed infrastructure. In this example, throughput and latency matter in terms of how quickly the asset information will propagate through the network.
Which mesh protocol is best?
There is no simple answer. Fundamental architectural differences exist between Zigbee, Thread, and Bluetooth mesh. Zigbee and Thread can use flooding when required but generally use a routing mesh to minimize network overhead that can interfere with messaging. Bluetooth mesh uses a flooding mesh but allows configuration of the devices to act as routers to reduce the impact of the flooding. The Bluetooth Special Interest Group (SIG) calls this “managed flooding.”
Zigbee and Thread networks include routing nodes and end nodes. The routing nodes are usually line powered and serve as the backbone of the mesh. The end nodes are normally battery powered, operate on the periphery of the mesh, and use routers to relay messages for them. The routing table is established when the mesh is created. It is a directory of sorts that tells each device how to communicate with other devices in the mesh. In this manner, one node can efficiently communicate with another node by sending messages along a precise route through the mesh. This has a positive effect on the throughput of the mesh and can reduce latency as the mesh grows.
A routing mesh is historically preferred to a flooding mesh because it provides more efficient communications and predictable performance. On the other hand, routing is more difficult to implement for the developers of the stack.
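As a rough illustration of the routing-table idea, the Python sketch below builds a next-hop table for each router with a breadth-first search and then forwards a message hop by hop. The topology, node names, and helper functions are hypothetical. Real Zigbee and Thread stacks build and maintain their tables with their own protocol mechanisms, not with this code.

```python
from collections import deque

# Hypothetical topology: each key is a router, each value lists its radio neighbors.
LINKS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def build_next_hops(links, source):
    """Return {destination: next hop} for `source`, using the fewest hops."""
    next_hop = {}
    visited = {source}
    queue = deque()
    # Direct neighbors are their own next hop.
    for neighbor in links[source]:
        visited.add(neighbor)
        next_hop[neighbor] = neighbor
        queue.append(neighbor)
    # Nodes farther away inherit the next hop of the node that discovered them.
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                next_hop[neighbor] = next_hop[node]
                queue.append(neighbor)
    return next_hop

def route(links, source, destination):
    """Follow each router's own table until the destination is reached."""
    tables = {node: build_next_hops(links, node) for node in links}
    path = [source]
    while path[-1] != destination:
        path.append(tables[path[-1]][destination])
    return path

print(route(LINKS, "A", "E"))  # ['A', 'B', 'D', 'E']
```

Each router only needs to know the next hop toward a destination, so a unicast message from A to E is transmitted three times here rather than being rebroadcast by every node.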
Packet structure
Zigbee and Thread packet structure
Both Zigbee and Thread use IEEE 802.15.4 with 127-byte packets and an underlying data rate of 250 kbps. While the PHY headers are the same, the packet structure is different, resulting in slightly different payload sizes. The Zigbee packet format is shown in Figure 2 and results in a 68-byte payload. For payloads above 68 bytes, Zigbee fragments into multiple packets. Thread packet format is shown in Figure 3, and results in a 63-byte payload. For payloads above 63 bytes, the Thread stack fragments using 6LoWPAN. Silicon Labs’ mesh performance data is based on payload size as this is the design parameter of concern when building an application.
Each of these networks fragments larger messages into smaller ones. For Zigbee, fragmentation occurs at the application layer and is performed end-to-end from the source to the destination. For Thread, fragmentation is done at the 6LoWPAN layer and is likewise performed end-to-end from source to destination.
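To make the effect of the payload limits concrete, here is a minimal Python sketch that splits an application message into packet-sized pieces and estimates the time on air of one full-size frame. It assumes, for simplicity, that every fragment can carry the full single-packet payload quoted above; real fragmentation headers reduce that figure somewhat. The air-time estimate ignores the PHY preamble, channel access back-off, and acknowledgments. The function names are illustrative only.

```python
ZIGBEE_MAX_PAYLOAD = 68   # bytes per packet, from the text
THREAD_MAX_PAYLOAD = 63   # bytes per packet, from the text
FRAME_BYTES = 127         # maximum IEEE 802.15.4 packet size, from the text
DATA_RATE_BPS = 250_000   # underlying data rate, from the text

def fragment(message, max_payload):
    """Split `message` into chunks of at most `max_payload` bytes."""
    return [message[i:i + max_payload] for i in range(0, len(message), max_payload)]

def frame_airtime_ms(frame_bytes=FRAME_BYTES, rate_bps=DATA_RATE_BPS):
    """Lower bound on the time on air of one frame, in milliseconds."""
    return frame_bytes * 8 / rate_bps * 1000

message = bytes(200)  # a hypothetical 200-byte application message
print(len(fragment(message, ZIGBEE_MAX_PAYLOAD)))  # 3 packets
print(len(fragment(message, THREAD_MAX_PAYLOAD)))  # 4 packets
print(round(frame_airtime_ms(), 3))                # 4.064 ms per full frame
```

Under these assumptions the same 200-byte message needs one more packet on Thread than on Zigbee, which is why the performance data is reported against application payload size.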
For unicast forwarding within these networks, the message is forwarded as soon as the device is ready to send. For multicast forwarding, there are networking requirements for how messages are forwarded:
- For Zigbee devices, a multicast message is forwarded by a device only after jitter of up to 64 milliseconds occurs. However, the initiating device has a gap of 500 milliseconds before retransmitting the initial message.
- RFC 7731 MPL forwarding is used for Thread devices. The trickle timer is set to 64 milliseconds so the devices back off a random amount up to this time before retransmitting.
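The impact of this forwarding jitter on multi-hop multicast latency can be approximated with a small simulation. The sketch below assumes each forwarding hop waits a uniformly distributed random time of up to 64 milliseconds and ignores time on air, retransmissions, and the finer points of the Trickle algorithm, so it is an order-of-magnitude estimate rather than a model of either stack. The function names are hypothetical.

```python
import random

MAX_JITTER_MS = 64  # upper bound on the random back-off, from the text

def multicast_delay_ms(hops, rng):
    """Total back-off accumulated over `hops` forwarding steps."""
    return sum(rng.uniform(0, MAX_JITTER_MS) for _ in range(hops))

def mean_delay_ms(hops, trials=10_000, seed=1):
    """Average accumulated back-off over many simulated multicasts."""
    rng = random.Random(seed)
    return sum(multicast_delay_ms(hops, rng) for _ in range(trials)) / trials

for hops in (1, 2, 4):
    print(hops, "hops:", round(mean_delay_ms(hops)), "ms of jitter on average")
```

With a uniform back-off the expected jitter is about 32 milliseconds per forwarding hop, so a multicast that is forwarded four times accumulates roughly 128 milliseconds before any other delay is counted.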
Bluetooth LE packet structure
Bluetooth low energy has the following packet structure to minimize time on air and energy consumption. Bluetooth mesh further refined this packet structure to add the mesh and security capabilities.
This means Bluetooth mesh has only 12 or 16 bytes available for payload. Beyond this, the message is segmented into individual packets and reassembled at the destination. Each segmented packet carries a header identifying the segment and 12 bytes of application payload, except for the last segment, which can be shorter. However, additional back-off requirements in the Bluetooth mesh specification space out these segmented packets, increasing latency and decreasing throughput. As all of our throughput and latency analysis is based on application payload, we can see that Bluetooth mesh will require more packets than Zigbee or Thread because of this lower packet payload size.
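A quick packet count shows how much this matters. The Python sketch below uses the per-packet payload figures quoted in this article and assumes, as a simplification, that a Bluetooth mesh message of up to 12 bytes fits in a single packet and that anything larger is carried in 12-byte segments. The text gives the single-packet limit as 12 or 16 bytes, so the lower figure is an assumption here, and the function name is hypothetical.

```python
import math

# Application payload per packet, in bytes, as quoted in the article.
PAYLOAD_PER_PACKET = {
    "Zigbee": 68,
    "Thread": 63,
    "Bluetooth mesh": 12,  # assumption: 12 bytes for single packets and segments alike
}

def packets_needed(protocol, payload_bytes):
    """Number of packets needed to carry `payload_bytes` of application data."""
    return max(1, math.ceil(payload_bytes / PAYLOAD_PER_PACKET[protocol]))

for size in (8, 16, 60):
    counts = {name: packets_needed(name, size) for name in PAYLOAD_PER_PACKET}
    print(size, "bytes:", counts)
```

For a 60-byte payload this gives one packet for Zigbee and Thread but five for Bluetooth mesh, before the additional spacing between segments is taken into account.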
Routing versus flooding mesh
Zigbee, Thread, and Bluetooth mesh were designed for home and building automation. Zigbee supports several routing techniques, including flooding of the mesh for route discovery or group messages; next-hop routing for controlled messages in the mesh; and many-to-one routing to a gateway, which then uses source routing out to devices. It is normal for a Zigbee network to use all of these methods simultaneously.
Thread also supports next-hop routing as well as flooding. However, Thread networks maintain next-hop routes to all routers as part of normal network maintenance instead of having devices perform route discovery. Thread also minimizes the number of active routers to address scalability to large networks. Previously, scalability was viewed as a limitation of embedded 802.15.4 networks because network flooding in the presence of a large number of routers limited the frequency and reliability of multicast traffic. Note that the Thread network manages the number and spacing of active routers, and user intervention or management is not required.
Bluetooth mesh supports managed flooding. This is a slight variation on a flooding mesh in that the user can designate which powered devices participate in the flooding. This reduces the impact of flooding but requires the user to determine the appropriate density and topology for routers in their network, which can be difficult. As network conditions change over time, the set of devices participating in the flood may also need to change, and this would require user intervention.
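The trade-off in choosing which devices relay can be seen in a toy flood simulation. In the Python sketch below, a message is rebroadcast only by designated relay nodes, each node acts on a message once, and a hop budget limits how far the message travels. The topology, node names, and the simplified hop budget are all hypothetical and do not reproduce the exact TTL or caching rules of the Bluetooth mesh specification.

```python
from collections import deque

# Hypothetical installation: a switch, two powered nodes, and two lamps.
LINKS = {
    "switch": ["r1", "r2"],
    "r1": ["switch", "r2", "lamp1"],
    "r2": ["switch", "r1", "lamp2"],
    "lamp1": ["r1", "lamp2"],
    "lamp2": ["r2", "lamp1"],
}

def flood(links, source, relays, hop_budget):
    """Return (set of nodes reached, number of transmissions)."""
    received = {source}
    transmissions = 0
    queue = deque([(source, hop_budget)])
    while queue:
        node, hops_left = queue.popleft()
        if hops_left <= 0:
            continue  # budget exhausted, do not retransmit
        transmissions += 1  # one broadcast heard by all neighbors
        for neighbor in links[node]:
            if neighbor in received:
                continue  # already seen this message, ignore the copy
            received.add(neighbor)
            if neighbor in relays:
                queue.append((neighbor, hops_left - 1))
    return received, transmissions

for relays in (set(LINKS), {"r1", "r2"}, {"r1"}):
    reached, tx = flood(LINKS, "switch", relays, hop_budget=3)
    print(sorted(relays), "->", len(reached), "nodes reached,", tx, "transmissions")
```

In this example, letting every node relay reaches all five nodes with five transmissions, two well-placed relays reach all five with three, and a single relay leaves one lamp unreached. That is the density and topology decision the user has to get right.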
Bluetooth mesh also has end devices similar to those in Zigbee or Thread, which the specification calls low power nodes. A low power node establishes a “friendship” with an adjacent line-powered node, known as its friend, and packets addressed to the low power node are stored by that friend. The low power node wakes periodically to ask its friend if there are any packets. The friend only saves each packet for a defined period of time, so the low power node needs to check in before that period expires.
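The store-and-poll behavior can be sketched in a few lines of Python. The class below stands in for the packet store on the line-powered node: it keeps packets for a sleeping neighbor and discards any that have been held longer than a configurable time when the neighbor polls. The class name, the hold time, and the use of explicit timestamps are assumptions made for illustration and are not taken from the Bluetooth mesh specification.

```python
from collections import deque

class PacketStore:
    """Holds packets on a powered node for a sleeping neighbor."""

    def __init__(self, hold_time_s):
        self.hold_time_s = hold_time_s
        self._queue = deque()  # entries are (arrival time in seconds, packet)

    def store(self, packet, now_s):
        """Called when a packet for the sleeping device arrives."""
        self._queue.append((now_s, packet))

    def poll(self, now_s):
        """Called when the sleeping device wakes; returns unexpired packets."""
        fresh = [p for arrived, p in self._queue if now_s - arrived <= self.hold_time_s]
        self._queue.clear()  # expired packets are dropped, fresh ones delivered
        return fresh

store = PacketStore(hold_time_s=10)
store.store("turn on", now_s=0)
store.store("dim to 50%", now_s=8)
print(store.poll(now_s=12))  # ['dim to 50%']; the first packet expired
```

The example shows why the polling interval has to be shorter than the hold time: the device that woke at 12 seconds missed the packet stored at 0 seconds.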
Our study of mesh topologies analyzes both small and large networks. These networks can behave very differently, and the routing and management techniques often need to change when considering a 10-node network or a 200-node network.
Typically, in a small network, devices are within one or two hops of each other, and very simple routing or flooding can be suitable. As the network grows in size, it adds complexity such as more hops between devices; a higher density of devices, which may interfere with each other when sending messages; and more concerns over latency and reliability. If a flood-type message is used to switch 100 lights, it is normally not acceptable for only 98 or 99 of the 100 lights to respond. This type of problem is rare in a 10-node network but may become common in a 100-node network.
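A back-of-the-envelope calculation illustrates why this problem grows with network size. The Python sketch below assumes each node receives a group message independently with the same probability, which is a simplification (real losses are correlated by location and interference), and the 99 percent figure is purely hypothetical.

```python
def all_delivered_probability(per_node_success, nodes):
    """Chance that every one of `nodes` devices receives the message."""
    return per_node_success ** nodes

def required_per_node_success(target, nodes):
    """Per-node success needed so all `nodes` devices respond with probability `target`."""
    return target ** (1 / nodes)

for nodes in (10, 100):
    p = all_delivered_probability(0.99, nodes)
    print(nodes, "nodes:", round(100 * p, 1), "% chance that every light responds")

print(round(100 * required_per_node_success(0.99, 100), 3), "% per node needed for 100 nodes")
```

With 99 percent per-node delivery, all 10 lights respond about 90 percent of the time, but all 100 lights respond only about 37 percent of the time. Reaching a 99 percent chance that all 100 respond requires roughly 99.99 percent per-node delivery under these assumptions.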
Figures of merit
In the previously cited use cases, a designer will desire a robust network for the application. The figures of merit to be measured in assessing the robustness of a network are throughput, latency, and reliability. These three measurements can accurately predict the robustness of a network for a given installation.
- Throughput defines the scalability of the network (how many devices can be sending normal traffic), as well as the behavior for higher data operations such as pushing a firmware update to devices.
- Latency describes how long it takes for an action to happen. It is a critical parameter for any interaction involving end users (as opposed to machine-to-machine communications), as most people can detect operations that take longer than 100 milliseconds. For processes where simultaneous operation is desired, such as turning on multiple lights, the timing must be lower than 100 ms so that end users do not complain of a “popcorn” effect as lights turn on in succession.
- Reliability is taken for granted, but when interacting with everyday devices such as lights and switches, users expect nearly 100 percent reliability. As a matter of practice, Silicon Labs tests to 99.999 percent reliability.
These are the most critical aspects of the mesh network to measure and strongly relate to the design goals for devices and wireless systems, no matter what underlying wireless technology is used.
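As an illustration of how these three figures of merit might be derived from a test run, the Python sketch below summarizes a log of sent and received messages. The function, its field names, and the sample numbers are hypothetical and are not Silicon Labs’ test tooling or results.

```python
def summarize(sent, latencies_ms, payload_bytes, duration_s):
    """Summarize one test run.

    sent          -- number of messages transmitted
    latencies_ms  -- one latency per message that actually arrived
    payload_bytes -- application payload carried by each message
    duration_s    -- length of the test run in seconds
    """
    received = len(latencies_ms)
    return {
        "reliability_pct": 100.0 * received / sent,
        "mean_latency_ms": sum(latencies_ms) / received,
        "max_latency_ms": max(latencies_ms),
        # Share of delivered messages fast enough to avoid a visible delay.
        "under_100ms_pct": 100.0 * sum(l < 100 for l in latencies_ms) / received,
        # Application-level throughput, counting only delivered payload.
        "throughput_bps": received * payload_bytes * 8 / duration_s,
    }

# Hypothetical run: 5 messages of 50 bytes sent in 2 seconds, 4 delivered.
print(summarize(sent=5, latencies_ms=[20, 35, 50, 120], payload_bytes=50, duration_s=2))
```

Reporting the maximum latency and the share of messages under 100 milliseconds alongside the mean is a deliberate choice here, since a good average can hide the occasional slow message that users notice.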
Test setup
To minimize the variability of device testing, the test can be performed in fixed topologies where the RF paths are wired together through splitters and attenuators to ensure the topology does not change over time and testing. This is used for the seven-hop testing to ensure the network topology. MAC filtering can also be used to achieve the network topology.
Large network testing is best conducted in an open-air environment where device behavior is based on existing and varying RF conditions. The Silicon Labs lab in Boston, MA is used for this open-air testing process.
The open-air testing environment has typical Wi-Fi and Zigbee traffic present as noise. This traffic is not part of the test network; it comes from the building’s normal wireless and control systems, which run independently of any tests being performed.
Figure 6 shows the average latency per hop for a Thread network versus Bluetooth mesh with unsegmented and segmented packets. Zigbee data is not included because it is similar to Thread. For these smaller payloads, the Bluetooth unsegmented and Thread latencies are very similar out to six hops. Once we move to Bluetooth segmented packets and increase the payload to 16 bytes, the latency increases substantially due to the additional packets being transmitted.
Looking at four-hop data with increasing payload, as shown in Figure 7, Bluetooth mesh has higher latency because it has to use segmented messages. This shows how important it is for Bluetooth mesh devices to keep payloads within one packet, avoiding this increased latency in applications where latency is an important factor.
Conclusion
The choice of mesh network depends on the end application or ecosystem. There are many established ecosystems such as Philips Hue, Amazon Echo Plus, and Comcast Xfinity. If a device manufacturer wants to interoperate with these ecosystems, Zigbee is an optimal choice. If the ecosystem has not been specified for the application, then many other protocol choices are available.
Thread and Bluetooth mesh are both viable options, and the most commonly considered aside from Zigbee. Development tools provided by the IC vendor matter greatly in terms of how quickly a mesh network can be developed. Tools such as packet tracing and multi-node energy profiling can ensure the chosen mesh network is robustly designed. Ultimately, the network size, required latency, desired throughput and overall reliability will drive the choice of mesh protocols.
Silicon Labs’ mesh performance test results are available now at www.silabs.com/mesh-performance.
Tom Pannell is Senior Director of IoT Marketing at Silicon Labs.