Add Voice Recognition to Your TV Remote
October 30, 2019
Voice control is popular in consumer electronics. Adding it with a "wake word" is now possible, and without draining your battery too quickly.
Virtual assistants continue to be integrated into more devices in our homes. Amazon recently announced that it is bringing Alexa to a variety of new devices, including earbuds, glasses, and a ring, giving consumers even more ways to access information. The recurring theme among these new voice-enabled products is that they’re wireless and hands free. The device effortlessly connects to your cell phone, or some other host, and listens patiently for a command to be uttered. The technology under the hood is a Bluetooth RF chipset supporting the wireless connection and specialized embedded processors running the wake word engine (WWE) to recognize voice commands.
Another example of this trend is the candy-bar-shaped remote control shipped with every new flat-screen TV, set-top box, and media player. These remotes will also soon be fully wireless and hands free. Sure, many will still support the old IR line-of-sight mode and a “push-to-talk” button when you want voice control, but those are quickly becoming outdated. Users want a device that seamlessly responds to their commands, not one they have to use like a walkie-talkie. Like the recent wave of Amazon gadgets, the next generation of TV remote controls will be wireless and hands free.
However, remote-control design brings some unique challenges. For example, remote controls typically aren’t rechargeable; they run on standard AA batteries. Remotes must perform well in noisy environments, and they must also instantly transmit information wirelessly to the host device (such as a TV) while sitting 3 to 9 ft. from the user.
Furthermore, consumers prefer long-lasting batteries that don’t require replacement for the life of the device. Not surprisingly, a significant portion of customer support phone calls that device manufacturers receive can be resolved by just changing the batteries. Every call that the consumer makes costs the company $30 to $50 depending on how long the call lasts. Essentially the remote control has to perform like a wall-powered Amazon Echo Dot while also being more energy efficient than an in-ear headset.
This challenge, designing powerful yet highly power-efficient remotes, calls for both an innovative Bluetooth solution and an innovative audio-processing solution, since both contribute to battery drain.
Using Bluetooth 5.0/LE solves a couple of problems over traditional IR. First, Bluetooth is a standards-compliant solution, so it’s easy for devices to talk to a large infrastructure of existing Bluetooth devices. Additionally, Bluetooth 5.0/LE provides range comparable to that of a WiFi device, which is well suited for voice-enabled remotes. However, traditional Bluetooth solutions have been optimized for mobile phones and laptops, which tend to have larger batteries and for which Bluetooth power consumption is not as significant.
These two assumptions (a larger battery and a tolerance for Bluetooth power consumption) don’t translate well to endpoint devices like remote controls. Companies like Atmosic have rethought the overall design by creating a ground-up solution focused on consumer endpoint devices like remote controls. This design reduces the active power significantly (around 5X) and, as a result, can extend battery life by 3X to 5X compared with competing solutions.
In addition to an extremely low-power Bluetooth design, a second technique is to use a secondary wake-up receiver that consumes significantly less power (20X to 50X lower than standard receivers). This receiver lets the entire Bluetooth SoC stay in deep sleep until the device is woken up by a special pattern sent from a host. We won’t go into detail on this technique here since it pertains to a smaller number of specialized remote controls.
The third technique is to utilize energy harvesting (embedded into the Bluetooth SoC) to harvest RF wireless energy to extend battery life. Many homes and buildings have significant RF energy (usually in the ISM bands) that can be harvested while a remote is lying on a table. Depending on the level of energy, a device could harvest tens of microwatts to 1 mW of energy. The goal is to substitute battery power when possible and extend the life of the battery to last a few years, compared to the current six- to nine-month lifespan. For industrial and special-use remotes, additional energy techniques like photo (solar), thermal, and motion energy harvesting could be used as well.
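To make the effect of harvesting concrete, here is a rough back-of-the-envelope sketch. It assumes a simple model in which harvested power directly offsets the average load and ignores battery self-discharge and converter losses. The function name, the battery capacity, the 3.0 V supply, and the load figure are all illustrative assumptions, not values from any vendor.

```python
HOURS_PER_MONTH = 730.0  # average hours in a month (8760 / 12)


def battery_life_months(capacity_mah, avg_load_uw, harvested_uw=0.0,
                        battery_voltage=3.0):
    """Estimate battery life in months for a given average load.

    Harvested power is subtracted from the load; the net draw is
    clamped at zero. Self-discharge and conversion losses are ignored.
    """
    net_uw = max(avg_load_uw - harvested_uw, 0.0)
    if net_uw == 0.0:
        return float("inf")  # harvesting covers the whole load in this model
    net_ua = net_uw / battery_voltage        # microwatts / volts = microamps
    hours = capacity_mah * 1000.0 / net_ua   # mAh -> uAh, divided by uA
    return hours / HOURS_PER_MONTH


# Hypothetical inputs: 2000 mAh capacity, 300 uW average load.
baseline = battery_life_months(2000, 300)
with_harvest = battery_life_months(2000, 300, harvested_uw=150)
```

In this simplified model, harvesting half of the average load doubles the estimated battery life, which is why even tens of microwatts matter when the remote spends most of its time idle.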
As mentioned earlier, for true hands-free operation, a remote control must simultaneously perform like a smart speaker yet be as energy efficient as an in-ear headset device. Companies like QuickLogic have created highly optimized, ultra-low power companion devices to work with Bluetooth chipsets to address this challenge.
There are essentially three modes for a voice-enabled remote with a Bluetooth connection: standby, wake-word detection, and data transmission. Each mode consumes progressively more energy than the one before it.
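Because the three modes draw very different currents, the average draw depends mostly on how much time the remote spends in each one. The sketch below computes a time-weighted average. The per-mode currents and time fractions are hypothetical placeholders chosen only to illustrate the arithmetic; real values depend on the chipset and the usage pattern.

```python
def average_current_ua(modes):
    """Return the time-weighted average current in microamps.

    `modes` maps a mode name to (fraction_of_time, current_ua).
    The fractions must sum to 1.
    """
    total_fraction = sum(fraction for fraction, _ in modes.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(fraction * current for fraction, current in modes.values())


# Hypothetical duty cycle and currents (assumptions, not measured data).
example_modes = {
    "standby": (0.80, 10.0),
    "wake_word_detection": (0.19, 500.0),
    "data_transmission": (0.01, 3000.0),
}
average = average_current_ua(example_modes)
```

With these placeholder numbers, the two active modes dominate the average even though standby accounts for most of the time, which is why reducing both active power and time spent awake matters.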
In standby mode, the Bluetooth and companion chips sleep while waiting for a sound in the surrounding environment to wake them. One of the most energy-efficient ways to implement this is with the Wake-on-Sound feature of Vesper’s microphone, which consumes only 10 µA while waiting for the ambient noise to exceed a pre-configured threshold in dB SPL (Sound Pressure Level). In a typical living room use case, the system can be in this mode as much as 80% of the time.
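The threshold comparison itself is simple. The following sketch shows how a sound pressure reading maps to dB SPL and is compared with a threshold. In a real remote this happens inside the microphone hardware, not in software; the function names and the 65 dB SPL default are assumptions for illustration only.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # 20 micropascals, the standard dB SPL reference in air


def db_spl(rms_pressure_pa):
    """Convert an RMS sound pressure in pascals to dB SPL."""
    if rms_pressure_pa <= 0:
        return float("-inf")  # silence has no finite level
    return 20.0 * math.log10(rms_pressure_pa / REFERENCE_PRESSURE_PA)


def should_wake(rms_pressure_pa, threshold_db_spl=65.0):
    """Return True when the sound level meets or exceeds the threshold."""
    return db_spl(rms_pressure_pa) >= threshold_db_spl
```

For example, an RMS pressure of 0.02 Pa is about 60 dB SPL and would not wake the system at the assumed threshold, while 0.2 Pa (about 80 dB SPL) would.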
Once the threshold level is met, the mic fires an interrupt and the companion chip wakes; this is when the wake word detection mode is entered. The companion chip’s MCU can spin up and run a WWE for a predetermined time period to detect whether the keyword has been uttered. Third-party solutions such as Retune DSP’s VoiceSpot WWE can run on a Cortex-M4 using only one microphone, negating the need for a compute-intensive solution with multi-mic adaptive beamforming, which is typically required for mid-field (3 to 9 ft.) voice recognition.
Aside from the obvious MIPS savings, there’s also 400 to 650 µA (active power) saved for every mic removed from the system. If the wake word is detected, the companion chip then interrupts and wakes the Bluetooth chip to enter the data transmission mode. This is necessary since the user’s words following the wake word need to be transmitted to the host (TV) in the form of pulse-code modulation (PCM) or compressed data.
If the wake word isn’t detected, the system reverts to the initial standby mode. Some companion chips, like those from QuickLogic, have dedicated low-power sound detection (LPSD) hardware to reduce the average system power used in the wake word detection mode. For example, some sounds, like fans, have a high dB SPL but obviously aren’t speech. LPSD hardware has enough intelligence to sense this and ignore the sound, avoiding the additional power consumption of running the WWE unnecessarily.
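The transitions between the three modes can be summarized as a small state machine. This is a simplified sketch, not any vendor’s firmware: the `Mode` enum, the `next_mode` function, and its flag names are hypothetical, and the LPSD check is modeled as a gate in front of the WWE.

```python
from enum import Enum, auto


class Mode(Enum):
    STANDBY = auto()
    WAKE_WORD_DETECTION = auto()
    DATA_TRANSMISSION = auto()


def next_mode(mode, *, sound_above_threshold=False, looks_like_speech=True,
              wake_word_detected=False, detection_timed_out=False,
              transmission_done=False):
    """Return the next mode given the current mode and the latest events."""
    if mode is Mode.STANDBY:
        # The mic interrupt fires only above the dB SPL threshold; an
        # LPSD-style check rejects loud non-speech sounds such as fans.
        if sound_above_threshold and looks_like_speech:
            return Mode.WAKE_WORD_DETECTION
        return Mode.STANDBY
    if mode is Mode.WAKE_WORD_DETECTION:
        if wake_word_detected:
            return Mode.DATA_TRANSMISSION  # wake the Bluetooth chip
        if detection_timed_out:
            return Mode.STANDBY            # no keyword: go back to sleep
        return Mode.WAKE_WORD_DETECTION    # keep listening
    if mode is Mode.DATA_TRANSMISSION:
        return Mode.STANDBY if transmission_done else Mode.DATA_TRANSMISSION
    raise ValueError(f"unknown mode: {mode!r}")
```

Keeping the transition logic in one pure function makes it easy to check that every path that fails to find the wake word ends up back in the lowest-power mode.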
Bluetooth 5.0/LE is ideally suited for the data transmission mode since it can transmit data in low-power, on-demand data packets. An ideal companion chip should have enough memory space and processing power to compress voice data prior to sending it to the Bluetooth device. A typical example of this would be to run an Opus encoder configured for a complexity setting of four.
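To see why compression helps, compare the size of raw PCM with that of a compressed stream. The sketch below assumes 16 kHz, 16-bit mono audio, 20 ms frames, and a 24 kbit/s compressed bitrate; all of these are illustrative assumptions, not figures from the article. Note that the Opus complexity setting (0 to 10) trades CPU load against quality and does not set the bitrate, which is configured separately.

```python
def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Raw PCM bitrate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def bytes_per_frame(bitrate_bps, frame_ms):
    """Payload bytes needed for one audio frame at a given bitrate."""
    return bitrate_bps * frame_ms // 8000  # bits/s * ms / 1000 / 8


# Assumed audio format and compressed bitrate (hypothetical values).
PCM_BPS = pcm_bitrate_bps(16000, 16)           # 256,000 bits/s
COMPRESSED_BPS = 24000                         # assumed encoder target bitrate

pcm_frame = bytes_per_frame(PCM_BPS, 20)       # bytes per 20 ms of raw PCM
compressed_frame = bytes_per_frame(COMPRESSED_BPS, 20)
ratio = PCM_BPS / COMPRESSED_BPS
```

Under these assumptions, each 20 ms frame shrinks from 640 bytes to 60 bytes, so the radio spends far less time transmitting for the same utterance.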
Scott Haylock is the Director, Product Marketing at QuickLogic. He has over 20 years of system-on-a-chip experience, and holds a BSEE degree from Michigan State University.
Srinivas Pattamatta is the Vice President of Business Development at Atmosic Technologies. He too has over 20 years of experience in wireless and other communication technologies. Srinivas earned a Master’s Degree in Electrical Engineering from Oregon State University and an MBA from Santa Clara University.