MLPerf Tiny Inference Benchmark Lays Foundation for TinyML Technology Evaluation, Commercialization
July 02, 2021
The speed at which edge AI ecosystems like TinyML are evolving has made standardization difficult, let alone the creation of performance and resource-utilization benchmarks that could simplify technology evaluation. Such benchmarks would be hugely beneficial to the ML industry, helping to accelerate solution comparison, selection, and productization.
But standing in the way of this is the fundamentally distributed nature of the edge and the varied applications and systems that reside there, which mean a benchmark of any value must account for:
- Hardware heterogeneity ranging from general-purpose MCUs and processors to novel accelerators and emerging memory technologies that are commonplace in the TinyML ecosystem.
- Software heterogeneity, which varies widely across TinyML systems that often use their own inference stacks and deployment toolchains.
- Cross-product support, since the heterogeneity above means interchangeable components can be, and are, used at every level of the TinyML stack.
- Low power, profiled through a power-analysis mechanism that measures device/system power consumption and energy efficiency while accounting for factors like chip peripherals and any underlying firmware.
- Limited memory, as devices carry widely different resource constraints that, in the case of edge AI, usually amount to less than a gigabyte.
In an effort to overcome these barriers, MLCommons, the organization behind the popular MLPerf family of AI training and inference benchmarks, recently released version 0.5 of the MLPerf Tiny benchmark. It’s an open-source, system-level inference benchmark designed to measure how quickly, accurately, and power-efficiently resource-constrained embedded technologies can execute trained neural networks of 100 kB or less.
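For a sense of what fits in that 100 kB budget, here is a back-of-the-envelope sketch. The layer shapes are hypothetical and chosen only to illustrate the arithmetic; they are not the MLPerf Tiny reference models.

```python
# Rough flash-footprint estimate for a small int8-quantized network.
# Layer shapes are hypothetical and chosen only to illustrate the arithmetic.

layers = {
    "conv2d_1": 3 * 3 * 1 * 16 + 16,    # 3x3 conv, 1 input channel, 16 filters (+ biases)
    "conv2d_2": 3 * 3 * 16 * 32 + 32,   # 3x3 conv, 16 -> 32 channels (+ biases)
    "dense":    32 * 10 + 10,           # fully connected classifier head (+ biases)
}

total_params = sum(layers.values())
model_size_kb = total_params * 1 / 1024  # int8 quantization: one byte per parameter

print(f"{total_params} parameters -> ~{model_size_kb:.1f} kB of weights")
# The same network stored as float32 would be roughly 4x larger.
```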
Inside the MLPerf Tiny Edge Inferencing Benchmark
Developed in collaboration with EEMBC, the Embedded Microprocessor Benchmark Consortium, this iteration of MLPerf Tiny Inference consists of four separate tasks for measuring the latency and accuracy, or the power consumption, of an ML technology:
- Keyword Spotting (KWS) uses a neural network that detects keywords from a spectrogram
- Visual Wake Words (VWW) is a binary image classification task for determining the presence of a person in an image
- Tiny Image Classification (IC) is a small image classification benchmark with 10 classes
- Anomaly Detection (AD) uses a neural network to identify abnormalities in machine operating sounds
These tasks are presented in four different scenarios that an edge device may encounter or be deployed in, namely single-stream queries, multiple-stream queries, server configuration, or offline mode. Each scenario requires approximately 60 seconds to complete, and some have latency constraints.
Figure 1. The MLPerf Tiny inferencing benchmark v0.5 presents each of the tasks in four different deployment scenarios. (Source: MLCommons)
This combination of tasks and scenarios makes it possible to analyze sensors, ML applications, ML datasets, ML models, training frameworks, graph formats, inference frameworks, libraries, operating systems, and hardware components. That breadth comes from multi-layered test suites that define a rationale, dataset, model, and quality target (usually a measure of accuracy when executing the dataset and model) for each task.
Figure 2. The MLPerf Tiny inference benchmark test suite permits the evaluation of the end-to-end edge ML stack. (Source: MLCommons)
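One way to picture such a test suite is as a structured record per task. The sketch below is purely illustrative: the field names are not the official schema, and only the quality target is taken from the benchmark itself (see Table 1); the other values paraphrase the task descriptions above.

```python
from dataclasses import dataclass

# Illustrative only: the official suite defines these elements in its run rules
# and reference implementations, not in this form.

@dataclass
class TinyBenchmark:
    rationale: str       # why the task matters at the edge
    dataset: str         # validation data the quality target is measured against
    model: str           # reference network a submitter may modify or replace
    quality_target: str  # minimum accuracy/AUC a valid submission must reach

keyword_spotting = TinyBenchmark(
    rationale="always-on voice interfaces on milliwatt power budgets",
    dataset="keyword spectrograms",
    model="small keyword-spotting neural network",
    quality_target="90% top-1 accuracy",
)
```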
The test suite procedure is as follows (a minimal code sketch of the flow appears after this list):
- Latency – The latency measurement is performed five times, with each run following this order:
- Download the input stimulus
- Load the tensor and convert the data as needed
- Run the inference for a minimum of 10 seconds and at least 10 iterations
- Measure the inferences per second (IPS)
The median IPS of the five runs is reported as the latency score.
- Energy – The energy test is identical to the latency test, but measures the total energy used during the compute timing window.
- Accuracy – A single inference is performed on the entire set of validation inputs, which vary depending on the model. The output tensor probabilities are then collected to calculate the percentage score.
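A minimal, host-side illustration of that flow is sketched below. In the actual benchmark the device under test is driven by runner software and on-device firmware; `run_inference` and `validation_set` here are stand-ins for that machinery.

```python
import statistics
import time

def measure_ips(run_inference, stimulus, min_seconds=10, min_iterations=10):
    """Run inferences for at least 10 s and at least 10 iterations; return inferences per second."""
    iterations = 0
    start = time.perf_counter()
    while iterations < min_iterations or (time.perf_counter() - start) < min_seconds:
        run_inference(stimulus)
        iterations += 1
    return iterations / (time.perf_counter() - start)

def latency_score(run_inference, stimulus, runs=5):
    """Median inferences-per-second across five measurement runs."""
    return statistics.median(measure_ips(run_inference, stimulus) for _ in range(runs))

def accuracy_score(run_inference, validation_set):
    """Single pass over the validation inputs; percent of correct top-1 predictions."""
    correct = 0
    for stimulus, label in validation_set:
        probabilities = run_inference(stimulus)  # output tensor probabilities
        predicted = max(range(len(probabilities)), key=probabilities.__getitem__)
        correct += int(predicted == label)
    return 100.0 * correct / len(validation_set)
```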
Modular, Open and Closed
Of course, there are also limitations around the MLPerf Tiny benchmark in the form of run rules that ensure components are analyzed accurately and reproducibly. The run rules are established via a modular benchmark design that addresses the end-to-end ML stack, as well as two divisions that permit different types of analysis.
- Modular design allows hardware and software users to target specific components of the pipeline, like quantization, or complete solutions. Each benchmark within the TinyML suite has a reference implementation that contains training scripts, a hardware platform, and more to provide a baseline result that a submitter can modify to show the performance of a single component.
- Closed and Open divisions are stricter and more flexible, respectively, in the submissions they accept. The Closed division offers a more direct comparison of systems, whereas the Open division provides a broader scope that allows submitters to demonstrate performance, energy, and/or accuracy improvements at any stage of the ML pipeline. The Open division also allows submitters to change the model, training scripts, and dataset.
Figure 3. MLPerf Tiny’s two divisions provide a flexible way to test edge ML components against each other and a generic reference implementation. (Source: MLCommons)
The MLPerf Tiny inferencing benchmark rules are available on GitHub.
The first batch of submissions has already been published. It includes entries from Latent AI, Peng Cheng Laboratory, Syntiant, and hls4ml, all of which except hls4ml submitted to the Closed division.
In the Closed Division:
- Latent AI submitted its Latent AI Efficient Inference Platform (LEIP) software development kit (SDK) for deep learning, which it executed on a Raspberry Pi 4.
- Syntiant submitted its NDP120 neural decision processor, which pairs the company’s Syntiant Core 2 deep learning accelerator with an Arm Cortex-M0, running TensorFlow alongside the Syntiant training and software development kits.
- Peng Cheng Laboratory ran a modified version of TensorFlow Lite for Microcontrollers (v2.3.1) on its PCL Scepu02, which contains an open-source RISC-V RV32IMA core with a floating-point unit.
- Editor’s note: The Closed Division benchmark reference was submitted by Harvard: an ST Nucleo-L4R5ZI that utilizes an Arm Cortex-M4 and FPU to execute TensorFlow Lite for Microcontrollers. The reference implementations can be found on GitHub (see the sketch after Figure 4).
Figure 4. The MLPerf Tiny inferencing benchmark reference implementation is based on an STMicroelectronics Nucleo-L4R5ZI. (Source: MLCommons)
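The sketch below is a rough, host-side analogue of that reference flow using the standard TensorFlow Lite Python interpreter; the actual reference runs TensorFlow Lite for Microcontrollers in C/C++ on the Cortex-M4, and the model path here is a placeholder rather than a file shipped with the benchmark.

```python
import numpy as np
import tensorflow as tf

# Placeholder model path; not a file distributed with the benchmark.
interpreter = tf.lite.Interpreter(model_path="kws_int8.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Feed one input of the expected shape and dtype (int8 for a quantized model).
stimulus = np.zeros(input_details["shape"], dtype=input_details["dtype"])
interpreter.set_tensor(input_details["index"], stimulus)
interpreter.invoke()

scores = interpreter.get_tensor(output_details["index"])
print("Predicted class:", int(np.argmax(scores)))
```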
In the Open Division:
- hls4ml submitted its Python package for machine learning inference on FPGAs, electing to run it on a Xilinx Pynq-Z2 board built around a Zynq-7020 SoC, which pairs a dual-core Arm Cortex-A9 with FPGA fabric used as the accelerator (a minimal conversion sketch follows below).
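As a rough idea of how that flow looks in practice, the sketch below converts a trained Keras model into an HLS project with hls4ml. The model file and FPGA part string are placeholders, and argument names can differ between hls4ml releases; hls4ml’s actual MLPerf Tiny submissions targeted the image-classification and anomaly-detection workloads.

```python
import hls4ml
from tensorflow import keras

# Placeholder model file, not one of the benchmark's reference networks.
model = keras.models.load_model("tiny_image_classifier.h5")

# Generate a conversion config, then translate the network into an HLS project
# targeting the Zynq-7020 programmable logic found on the Pynq-Z2.
config = hls4ml.utils.config_from_keras_model(model, granularity="model")
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir="hls4ml_prj",
    part="xc7z020clg400-1",
)

hls_model.compile()            # C-simulation build for quick functional checks
# hls_model.build(synth=True)  # full HLS synthesis (requires Xilinx HLS tools)
```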
Measured on latency and energy consumption, these ML stack combinations ran the Visual Wake Word, Image Classification, Keyword Spotting, and Anomaly Detection workloads described in Table 1.
| Task | Visual Wake Words | Image Classification | Keyword Spotting | Anomaly Detection |
|---|---|---|---|---|
| Data | Visual Wake Words dataset | CIFAR-10 | Speech Commands | ToyADMOS |
| Model | MobileNetV1 (0.25x) | ResNet-8 | DS-CNN | Deep AutoEncoder |
| Accuracy | 80% (top 1) | 85% (top 1) | 90% (top 1) | 0.85 (AUC) |

Table 1. Submitters to the MLPerf Tiny v0.5 inferencing benchmark put their solutions up against these workloads. (Source: MLCommons)
Below are the results for each entrant:
- Harvard (Reference):
- Visual Wake Word Latency: 603.14 ms
- Image Classification Latency: 704.23 ms
- Keyword Spotting Latency: 181.92 ms
- Anomaly Detection Latency: 10.40 ms
- Latent AI LEIP Framework:
- Visual Wake Word Latency: 3.175 ms (avg)
- Image Classification Latency: 1.19 ms (avg)
- Keyword Spotting Latency: 0.405 ms (avg)
- Anomaly Detection Latency: 0.18 ms (avg)
- Peng Cheng Laboratory:
- Visual Wake Word Latency: 846.74 ms
- Image Classification Latency: 1239.16 ms
- Keyword Spotting Latency: 325.63 ms
- Anomaly Detection Latency: 13.65 ms
- Syntiant:
- Keyword Spotting Latency: 5.95 ms
- hls4ml:
- Image Classification Latency: 7.9 ms
- Image Classification Accuracy: 77%
- Anomaly Detection Latency: 0.096 ms
- Anomaly Detection Accuracy: 82%
Editor’s note: An expanded table containing the results can be found here: https://mlcommons.org/en/inference-tiny-05/
New Classes of Edge AI
The MLPerf Tiny inferencing benchmark is a step in the right direction for the commercialization of edge AI technology and the new classes of applications it will bring. A product of collaboration between more than 50 organizations across industry and academia, the benchmark provides a fair measure of component- and system-level ML technologies, with room to expand into other applications and higher-order benchmarks like MLPerf Inference Mobile, Edge, and Data Center.
For more information or to submit your results to the MLPerf Tiny inference benchmark, visit https://mlcommons.org/en.