Ambiq Micro is the latest microcontroller maker to build its own AI-focused software development kit (SDK). The combination of Ambiq’s Neural Spot AI SDK with its ultra-low-power sub-threshold and near-threshold technologies will enable efficient inference: Ambiq’s figures put keyword spotting at less than a millijoule (mJ) per inference. This efficiency will suit IoT devices, especially wearables, which are already a big market for the company.
Artificial intelligence applications on Cortex-M devices require specialized software stacks over and above what’s available with open-source frameworks, such as TensorFlow Lite for Microcontrollers, since there are so many challenges involved in fine-tuning performance, Carlos Morales, Ambiq Micro’s VP of AI, told EE Times.
“[Arm’s CMSIS-NN] has optimized kernels that use [Arm’s cores] really well, but getting the data in and moving it to the next layer means there are a lot of transformations that happen, and [Arm] has to be general about that,” he said. “If you carefully design your datapath, you don’t have to do those transformations, you can just rip out the middle of those things and just call them one by one–and that gets very efficient.”
Neural Spot’s libraries are based on an optimized version of CMSIS-NN, with added features for fast Fourier transforms (FFTs), among others. Morales points out that, unlike cloud AI, embedded AI is focused in large part on about a dozen classes of models, so it’s an easier subset to optimize for.
“A voice-activity detector running in TensorFlow would be terrible, you’d just be spending all your time loading tensors back and forth. But you write it [at a lower level], and suddenly you’re doing it in two or three milliseconds, which is great,” he said.
Further headaches include mismatches between Python and the C/C++ code that runs on embedded devices.
“We created a set of tools that let you treat your embedded device as if it were part of Python,” Morales said. “We use remote procedure calls from inside your Python model to execute it on the eval board.”
Remote procedure calls make it easy to compare, for example, the output of a Python feature extractor or Mel spectrogram calculator with what is running on the eval board. (A Mel spectrogram is a representation of audio data used in audio processing.)
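The kind of host-versus-board comparison described above can be sketched in a few lines of Python. This is an illustration only, not Neural Spot's API: `python_features` is a toy log-energy extractor standing in for a real Mel spectrogram, and `device_features_stub` is a hypothetical placeholder for an RPC call to the eval board that merely simulates reduced numeric precision.

```python
import math

def max_abs_error(reference, device):
    """Largest element-wise difference between two equally shaped 2-D feature lists."""
    if len(reference) != len(device):
        raise ValueError("frame count mismatch")
    worst = 0.0
    for ref_row, dev_row in zip(reference, device):
        if len(ref_row) != len(dev_row):
            raise ValueError("bin count mismatch")
        for r, d in zip(ref_row, dev_row):
            worst = max(worst, abs(r - d))
    return worst

def python_features(samples, frame_len=4):
    """Toy host-side feature extractor: log energy of each non-overlapping frame."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    frames = [samples[i:i + frame_len] for i in starts]
    return [[math.log(sum(x * x for x in frame) + 1e-6)] for frame in frames]

def device_features_stub(samples, frame_len=4):
    """Hypothetical stand-in for an RPC call to the eval board.

    No hardware is contacted; the host result is rounded to mimic
    the lower precision an embedded implementation might have.
    """
    return [[round(v, 4) for v in row] for row in python_features(samples, frame_len)]

# Synthetic test signal (assumption): 32 samples of a sine wave.
samples = [math.sin(0.3 * n) for n in range(32)]
err = max_abs_error(python_features(samples), device_features_stub(samples))
matches = err < 1e-3  # tolerance chosen arbitrarily for this sketch
```

In a real workflow the stub would be replaced by whatever call actually runs the feature extractor on the board, and the tolerance would depend on the precision of the embedded code.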
Neural Spot includes an open-source model zoo with health (ECG classifier) and speech detection/processing examples. Speech processing includes models for voice activity detection, keyword detection and speech to intent. Ambiq is working on AI models for speech enhancement (background noise cancellation) and computer vision models, including person detection and object classification.
The Neural Spot AI SDK is built on Ambiq Suite—Ambiq’s libraries for controlling power and memory configurations, communicating with sensors and managing SoC peripherals. Neural Spot simplifies these configuration options using presets for AI developers who may not be familiar with sub-threshold hardware.
The new SDK is designed for all fourth-generation Apollo chips, but the Apollo4 Plus SoC is particularly well suited for always-on AI applications, Morales said. It features an Arm Cortex-M4 core with 2 MB of embedded MRAM and 2.75 MB of SRAM. There is also a graphics accelerator and two MIPI lanes, and some family members have Bluetooth Low Energy radios.
Current consumption for the Apollo4 Plus is as low as 4 μA/MHz when executing from MRAM, and there are advanced deep sleep modes. With such low power consumption, Morales said, “suddenly you can do a lot more things” when running AI in a resource-constrained environment.
“There are a lot of compromises you have to make, for example, reducing precision, or making shallower models because of latency or power requirements…all that stuff you’re stripping out because you want to stay in the power budget, you can put back in,” Morales added.
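To put the 4 μA/MHz figure in context, the arithmetic can be sketched as follows. Only the 4 μA/MHz value comes from Ambiq's figures; the 96 MHz clock and 1.8 V supply are illustrative assumptions, as is the simplification that current scales linearly with clock frequency.

```python
UA_PER_MHZ = 4.0  # best-case figure quoted for execution from MRAM

def active_current_ua(clock_mhz, ua_per_mhz=UA_PER_MHZ):
    """Active current in microamps, assuming it scales linearly with clock."""
    return ua_per_mhz * clock_mhz

def active_power_uw(clock_mhz, supply_v):
    """Active power in microwatts: P = I * V."""
    return active_current_ua(clock_mhz) * supply_v

# Assumed operating conditions, chosen only for illustration.
current = active_current_ua(96)    # 4 uA/MHz * 96 MHz = 384 uA
power = active_power_uw(96, 1.8)   # 384 uA * 1.8 V, roughly 691 uW
```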
He also pointed out that while AI acceleration is important to saving power, other parts of the data pipeline are just as important, including sensing data, analog-to-digital conversion and moving data around memory: Collecting audio data, for example, might take several seconds while inference is complete in tens of milliseconds. Data collection might thus account for the majority of the power usage.
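A rough energy budget shows why data collection can dominate. The power and duration values below are placeholders rather than Ambiq measurements; they are chosen only to match the "seconds of capture, tens of milliseconds of inference" pattern Morales described.

```python
def energy_mj(power_mw, duration_s):
    """Energy in millijoules (mW multiplied by seconds gives mJ)."""
    return power_mw * duration_s

def collection_share(collect_mw, collect_s, infer_mw, infer_s):
    """Fraction of total energy spent collecting data rather than running inference."""
    collect = energy_mj(collect_mw, collect_s)
    infer = energy_mj(infer_mw, infer_s)
    return collect / (collect + infer)

# Assumed values: 2 s of audio capture at 0.5 mW,
# then 30 ms of inference at 5 mW.
share = collection_share(collect_mw=0.5, collect_s=2.0, infer_mw=5.0, infer_s=0.030)
```

Even with inference drawing ten times the power in this made-up case, its short duration leaves capture with most of the energy bill, so speeding up inference alone yields only a modest saving.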
Ambiq compared internal power measurements for the Apollo4 Plus running benchmarks from MLPerf Tiny with published results for other microcontrollers. Ambiq’s figures put the Apollo4 Plus’s energy consumption (µJ/inference) at roughly 8× to 13× lower than that of another Cortex-M4 device. The keyword-spotting inference benchmark used less than a millijoule, and person detection used less than 2 mJ.
Ambiq achieves such low-power operation by running circuits at sub-threshold and near-threshold voltages. While big power savings are possible at these voltages, achieving them is not straightforward, Scott Hanson, founder and CTO of Ambiq Micro, told EE Times in an earlier interview.
“At its surface, sub-threshold and near-threshold operation are quite simple: You’re just dialing down the voltage. Seemingly, anybody could do that, but it turns out that it’s, in fact, quite difficult,” he said. “When you turn down voltage into the near-threshold or sub-threshold range, you end up with huge sensitivities to temperature, to process, to voltage, and so it becomes very difficult to deploy conventional design techniques.”
Ambiq’s secret sauce is in how the company compensates for these variables.
“When faced with temperature and process variations, it’s critical to center a supply voltage at a value that can compensate for those temperature and process fluctuations, so we have a unique way of regulating voltage across process and temperature that allows sub-threshold and near-threshold operations to be reliable and robust,” Hanson said.
Ambiq’s technology platform, Spot, uses “50 or 100” design techniques to deal with these sensitivities, with techniques spanning analog, digital and memory design. Most of these techniques are at the circuit level; many classic building-block circuits, such as the bandgap reference circuit, don’t work in sub-threshold mode and had to be re-engineered by Ambiq. Other challenges include how to distribute the clock and how to assign voltage domains.
Running at lower voltage does come with a tradeoff: Designs have to run slower. That’s why, Hanson said, Ambiq started by applying its sub-threshold ideas in the embedded space. Twenty-four or 48 MHz was initially sufficient for ultra-low-power wearables, where Ambiq holds about half the market share today. However, customers quickly increased their clock speed requirements. Ambiq met that demand by introducing more dynamic voltage and frequency scaling (DVFS) operating points: Customers run 99% of the time in sub-threshold or near-threshold mode, but when they need a boost in compute, they can increase the voltage to run at higher frequency.
“Over time, you’ll see more DVFS operating points from Ambiq because we want to support really low voltages, medium voltages and high voltages,” Hanson said.
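The effect of spending 99% of the time at a low-voltage operating point can be estimated with the standard dynamic-power approximation, P ∝ C·V²·f. The sketch below uses hypothetical operating points; the voltages and clock frequencies are placeholders, not Ambiq specifications, and leakage power is ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OperatingPoint:
    name: str
    voltage_v: float
    freq_mhz: float

    def relative_dynamic_power(self):
        """Dynamic power up to a constant factor: P ~ V^2 * f."""
        return self.voltage_v ** 2 * self.freq_mhz

def average_power(points_and_shares):
    """Time-weighted average of relative dynamic power across operating points."""
    total_share = sum(share for _, share in points_and_shares)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("time shares must sum to 1")
    return sum(p.relative_dynamic_power() * share for p, share in points_and_shares)

# Hypothetical operating points (placeholder voltages and clocks).
low = OperatingPoint("near-threshold", voltage_v=0.5, freq_mhz=48)
boost = OperatingPoint("boost", voltage_v=1.0, freq_mhz=192)

mixed = average_power([(low, 0.99), (boost, 0.01)])   # 99% low, 1% boost
always_boost = average_power([(boost, 1.0)])          # boost all the time
```

Under these assumed numbers, the mixed schedule averages roughly one-fourteenth of the dynamic power of running at the boost point continuously, though it also completes less work per second, which is the tradeoff Hanson described.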
Other items on the technology roadmap for Ambiq include more advanced process nodes, architectural enhancements that increase performance without raising voltage and dedicated MAC accelerators (for both AI inference and filter acceleration).