AiM Future, a spinout from LG Electronics, is commercializing the Korean consumer giant’s AI acceleration IP for applications as diverse as consumer electronics, robotics and automotive. The IP is designed for multi-modal operation, running many different AI models at once. The current hardware generation also supports training at the edge, and future generations aim to extend efficient scaling above the current tens of TOPS range.
LG spun out its semiconductor and IP divisions in 2020, prior to exiting the mobile phone market in 2021. AiM’s founders were LG employees working on the company’s AI accelerator IP in its Silicon Valley lab at the time; they raised a seed round of $1.8 million and will close a $6.1 million Series A shortly. LG is a lead investor and a potential customer once AiM commercializes, later this year, the IP it has developed since spinning out, AiM Future CEO ChangSoo Kim told EE Times.
AiM’s AI accelerator IP, NeuroMosaic, has been under development since 2015, with earlier generations of NeuroMosaic already shipping in LG robot vacuum cleaners and washing machines. AiM has an exclusive license on the patents and IP already developed, and is working on future generations of the tech. AiM has sold three commercial licenses so far, all to Korean companies, including at least one automotive company, Kim said.
NeuroMosaic is a multi-modal, scalable design that suits image, video, audio and time-series sensor data processing, and sensor fusion.
“The architecture was meant to be general-purpose…. It was driven by end products that were LG-centric,” Bob Allen, VP of marketing and business development at AiM Future, told EE Times. “But they’re all products that fit within the envelope of what we commonly call IoT.”
Such products might include autonomous robots, which can carry multiple sensor types, or AR glasses that need to process data from different sensor modalities within a strict time window.
“There were definite use cases where the engine would need to be doing multiple things at the same time,” Allen said. “This multi-modal, multi-tasking–oriented architecture [was designed] to tackle multiple different types of applications running simultaneously, to be able to provide the best type of user experience.”
NeuroMosaic was designed to offer concurrent, independent multi-task operation, even when concurrent tasks are cascading. For example, keyword spotting might be used to trigger speech recognition, which might be used to trigger another task.
“Nobody else is really providing this kind of capability in their architecture by default,” Allen said, adding that using separate accelerator chips for these types of tasks would escalate complexity, cost and power consumption.
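The cascade described above, in which keyword spotting triggers speech recognition and that in turn triggers a further task, can be sketched in a few lines of Python. This is an illustration only: the stage functions, the wake phrase and the string "frames" are hypothetical stand-ins for real models and audio data, and nothing here reflects AiM's software.

```python
def run_cascade(frame, stages):
    """Run stages in order; each returns (result, trigger_next)."""
    results = []
    for stage in stages:
        result, trigger_next = stage(frame)
        results.append(result)
        if not trigger_next:
            break  # later, heavier stages are never started
    return results

WAKE_PHRASE = "hey robot"  # hypothetical wake phrase

def keyword_spotter(frame):
    # Cheap always-on stage: only checks for the wake phrase.
    hit = frame.lower().startswith(WAKE_PHRASE)
    return ("keyword" if hit else "no keyword", hit)

def speech_recognizer(frame):
    # Heavier stage, run only after the keyword stage fires.
    command = frame[len(WAKE_PHRASE):].strip(" ,")
    return (command, bool(command))

def command_handler(frame):
    # Final stage in this toy pipeline; triggers nothing further.
    return ("handled", False)

stages = [keyword_spotter, speech_recognizer, command_handler]
print(run_cascade("Hey robot, clean the kitchen", stages))
print(run_cascade("background noise", stages))
```

In a multi-core design, each stage could be mapped to its own group of cores so that the always-on stage keeps running while the later stages execute.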
NeuroMosaic’s design has 32-GOPS NPU cores, with 16 cores in a 512-GOPS tile. Small workloads like object detection in a washing machine might need only one core, a robot vacuum cleaner with continuous learning might need 32 cores, smart TV super-resolution might need 128-256 cores, and bigger ADAS applications might need 320-512 cores, Allen said.
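The core counts Allen quotes translate directly into peak throughput and tile counts. The short Python sketch below does that arithmetic using only the figures above (32 GOPS per core, 16 cores per tile); the function names are illustrative, and peak GOPS says nothing about the utilization achieved on a real workload.

```python
import math

CORE_GOPS = 32        # per-core peak throughput quoted in the article
CORES_PER_TILE = 16   # cores per tile quoted in the article

def peak_gops(cores: int) -> int:
    """Peak throughput of a design with the given number of cores."""
    return cores * CORE_GOPS

def tiles_needed(cores: int) -> int:
    """Number of 16-core tiles needed to host that many cores."""
    return math.ceil(cores / CORES_PER_TILE)

examples = [
    ("washing machine", 1),
    ("robot vacuum cleaner", 32),
    ("smart TV (upper bound)", 256),
    ("ADAS (upper bound)", 512),
]
for name, cores in examples:
    print(f"{name}: {cores} cores, {tiles_needed(cores)} tile(s), "
          f"{peak_gops(cores)} GOPS peak")
```

At the top of the quoted range, 512 cores fill 32 tiles and give 16,384 GOPS, or roughly 16 TOPS.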
NeuroMosaic NPU cores can work independently, with software managing on-the-fly task assignment. AiM’s SDK can simulate workloads to understand how many cores are required to get a result within a certain time.
“[Task assignment] can be done in software, and it can be changed depending on the operations that we are continuously executing, so it can change on the fly,” Kim said. “We already know how much performance is required for each task, and we can allocate fewer cores, but then it takes longer.”
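Kim's trade-off between core count and execution time can be put into rough numbers. The following is a minimal sketch, assuming ideal linear scaling across cores and the 32-GOPS per-core figure quoted earlier; the function names, the utilization parameter and the example workload are hypothetical and are not part of AiM's SDK.

```python
import math

CORE_GOPS = 32  # per-core peak throughput quoted in the article

def latency_ms(workload_gop: float, cores: int, utilization: float = 1.0) -> float:
    """Estimated time to run a workload of `workload_gop` giga-operations.

    Assumes perfect linear scaling, which real workloads rarely achieve.
    """
    return workload_gop / (cores * CORE_GOPS * utilization) * 1000.0

def min_cores(workload_gop: float, deadline_ms: float, utilization: float = 1.0) -> int:
    """Smallest core count that meets the deadline under the same assumption."""
    needed = workload_gop * 1000.0 / (deadline_ms * CORE_GOPS * utilization)
    return max(1, math.ceil(needed))

# Hypothetical task: 4 giga-operations per inference, 25 ms deadline.
cores = min_cores(4.0, 25.0)
print(cores, "cores meet the deadline in about", latency_ms(4.0, cores), "ms")
print("one core alone would take about", latency_ms(4.0, 1), "ms")
```

A scheduler working from estimates like these could hand a task fewer cores when the deadline is relaxed and free the rest for other concurrent tasks.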
Tiles and tilelets
The NeuroMosaic tile supports up to 16 tilelets, each with a single NPU core plus some memory and a small RISC-V processor for the NPU to fall back on.
“If a customer selects our IP today, their chip might be fabricated in one or two years,” Kim said. “During the next two years, there could be a lot of changes in optimization and advancements will be made to neural networks. Some neural networks may not have layers and functions our IP can support. In that case, the RISC-V core will be engaged to execute that with a software approach.”
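The fallback Kim describes amounts to a per-layer dispatch decision: layers the NPU supports run there, and anything else is handed to the RISC-V core in software. A minimal Python sketch of that idea follows; the list of supported operations and the layer names are invented for illustration and do not describe what NeuroMosaic actually supports.

```python
# Hypothetical set of operations with native NPU support.
NPU_SUPPORTED_OPS = {"conv2d", "relu", "maxpool", "fully_connected"}

def assign_backends(layers):
    """Map each layer to 'npu' if supported, otherwise to the 'riscv' fallback."""
    return [(op, "npu" if op in NPU_SUPPORTED_OPS else "riscv") for op in layers]

# A network containing one layer type invented after the chip was designed.
model = ["conv2d", "relu", "new_activation", "fully_connected"]
for op, backend in assign_backends(model):
    print(f"{op:16s} -> {backend}")
```

Only the unsupported layer falls back to software, so the rest of the network still benefits from acceleration.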
Three generations of NeuroMosaic are available today: generations 2.5, 3.0 and 4.0-EL.
Generation 2.5 is silicon-proven; it offers INT16 or INT8 inference and the architecture supports fused layer calculations.
Generation 3.0 adds a shared tile memory, supports synchronous operation and improves memory bandwidth. The memory matrix within each tilelet can be used for inputs, weights, and operation outputs, while the bigger shared tile memory can hold larger amounts of data coming from off-chip DDR for use by the cores. AiM has verified designs with tile shared memory ranging from 64 to 512 Kbytes.
Generation 4.0-EL supports INT8 and BF16, meaning it’s possible to implement a level of on-device training.
“We think in the future training requirements will be coming to the edge, because of many reasons, including security and latency,” Kim said, adding that while FP32 is too expensive to implement at the edge, BF16 training offers a balance between accuracy and silicon area. “Tilelets occupy about 20% of the whole area of the tile… adding training capability to tilelets grows them, but because tilelets are not that big, even if the NPU grows 20%, the overall area increase is not that big.”
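The silicon-area argument for BF16 follows from the format itself: BF16 keeps FP32's sign bit and 8-bit exponent but stores only 7 mantissa bits, so it covers the same numeric range with half the storage. The Python sketch below converts a value to BF16 and back using round-to-nearest-even. It is a generic illustration of the number format, not a description of AiM's hardware, and it does not handle NaN inputs.

```python
import struct

def float_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x (round-to-nearest-even, no NaN handling)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # FP32 bit pattern
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)          # ties go to the even result
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_float(b: int) -> float:
    """Expand a 16-bit BF16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

for value in (1.0, 3.14159, 1e-3, 65504.0):
    b = float_to_bf16_bits(value)
    print(f"{value!r:>10} -> 0x{b:04X} -> {bf16_bits_to_float(b)!r}")
```

The round trip shows the cost: 3.14159 comes back as 3.140625, roughly two to three significant decimal digits of precision.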
Included is support for quantization-aware training, a training scheme that simulates lower precision inference on the forward pass, introducing quantization errors during training to improve the trained model’s robustness to eventual quantization.
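The forward-pass simulation at the heart of quantization-aware training is often called fake quantization: a value is rounded to the integer grid, clamped to the INT8 range, and converted back to floating point, so the network sees the rounding error while training. The Python sketch below shows that step in isolation. The scale values and the tiny dot-product "layer" are illustrative assumptions; a full scheme also needs a backward pass (commonly a straight-through estimator), which is omitted here.

```python
def fake_quantize(x: float, scale: float, qmin: int = -128, qmax: int = 127) -> float:
    """Quantize x to a signed 8-bit grid with the given scale, then dequantize."""
    q = round(x / scale)            # snap to the nearest integer level
    q = max(qmin, min(qmax, q))     # clamp to the INT8 range
    return q * scale                # back to floating point, error included

def forward(inputs, weights, scale):
    """Dot product computed with fake-quantized weights."""
    return sum(i * fake_quantize(w, scale) for i, w in zip(inputs, weights))

weights = [1.3, -0.2, 0.74]
print([fake_quantize(w, 0.5) for w in weights])   # coarse grid, visible error
print(forward([1.0, 1.0, 1.0], weights, 0.5))
```

Training against these perturbed weights is what makes the final model more tolerant of real INT8 inference.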
In practice, edge training on AiM-powered chips will usually be limited to fine-tuning, or re-training of final neural network layers. Allen’s example is a robot vacuum cleaner that learns to recognize specific items of furniture in the home, training itself during the 20 hours a day it’s not in operation.
Generation 4.0-EL also adds one softmax accelerator per tile.
“Softmax would require a lot of computation in the CPU–because it’s a series operation, it’s difficult to accelerate in [the NPU],” Kim said. “We found that running the softmax layer in the CPU took thousands of cycles, but with our hardware accelerator, performance improved ten times. It takes some area on the chip, but it’s up to the customer if they want to do softmax in hardware to improve the overall performance by sacrificing some area.”
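Kim's point about softmax being hard to parallelize is visible in the standard formulation: every output depends on two reductions over the whole input vector (a maximum and a sum) plus an exponential per element. The Python sketch below is the common numerically stable version, shown only to make those dependencies concrete; it is not AiM's implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)                            # reduction 1: needs every input
    exps = [math.exp(v - peak) for v in logits]   # one exponential per element
    total = sum(exps)                             # reduction 2: needs every exponential
    return [e / total for e in exps]              # normalization waits on the sum

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

No output can be produced until both reductions finish, which is why the operation maps poorly onto hardware built for independent multiply-accumulate operations.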
All the tile’s functional blocks can be configured per the customer’s requirements, including the blocks in the tilelet. AiM has also created pre-configured versions of the IP in small (up to 512 GOPS), medium (up to 4 TOPS) and large (up to 16 TOPS) sizes, which can be applied to any generation of the architecture. Generation 2.5 versions of the pre-configured systems are silicon-proven today, while 4.0-EL versions have been fully verified.
While there is no reason multi-tile designs can’t scale beyond 16 TOPS, in practice, designs bigger than tens of TOPS are outside the sweet spot and not as efficient in terms of silicon area, Kim said.
AiM’s software stack, NeuroMosaic Studio, offers a complete developer flow today, though some features are still under development.
The current version includes a converter, which optimizes and compresses models for NeuroMosaic’s hardware architecture, and quantizes them. There’s also a mapper, compiler, simulator and a model zoo.
The next version of NeuroMosaic Studio, due next quarter, will add support for edge training, RNNs and LSTMs, and additional computer vision models.
AiM has big plans for 2023. This includes multi-project wafer runs to demonstrate generations 3.0 and 4.0-EL in silicon, as well as software updates to include support for edge training.
AiM is also working on a generation 5.0 of the NeuroMosaic hardware architecture, currently up and running on FPGAs.
This next-gen architecture is designed to enable efficient scaling above tens of TOPS, and promises to reduce silicon area by 25% and power by 50% (compared with generation 2.5 under the same process technology). Generation 5.0 is expected to launch next quarter.