
The Embedded Vision Summit, May 22-24 in Silicon Valley, provides insight into cutting-edge AI at the edge, with a focus on perceptual AI and machine vision. Here is a preview of EE Times’ top picks from this year’s program.
DeepX
Korean edge AI chip company DeepX will present on its new M1 accelerator chip, its vision for AI everywhere at the edge, and its quantization secret sauce that helps the company maintain GPU-like levels of prediction accuracy at low power and high throughput, DeepX CEO Lokwon Kim told EE Times.
The company has developed a range of edge AI chips focused on vision applications.
“Because we focus on edge applications, almost all applications require vision-based AI,” Kim said. “For edge applications, the most important thing is power; any kind of new functionality cannot go without battery power.”
DeepX’s accelerator is designed for efficiency and throughput, with the upcoming M1 chip expected to land in the 16-18 TOPS/W range when it launches imminently. The company has also demonstrated leading frames-per-second-per-TOPS throughput compared with edge competitors.
Part of DeepX’s secret sauce is reducing DRAM access as much as possible, while balancing the appropriate amount of on-chip SRAM for optimal power consumption. This is done partly by shrinking model size through quantization and partly with software that analyzes memory access patterns to make the best reuse of data already in SRAM.
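DeepX has not disclosed its scheduler, but the payoff of this kind of data reuse can be shown with a simple traffic model. The sketch below is illustrative only: it uses a matrix multiply standing in for a convolution layer, an assumed tile size, and a crude count of words moved to and from DRAM, not DeepX’s actual design. With 64x64 tiles the modelled off-chip traffic drops roughly by the tile dimension, which is why balancing SRAM capacity against tile size matters so much for power.

```python
"""Illustrative sketch (not DeepX's scheduler): how tiling a matrix multiply
cuts DRAM traffic by reusing operands held in on-chip SRAM."""

def dram_words_naive(M: int, N: int, K: int) -> int:
    # No reuse: every multiply-accumulate pulls one element of A and one of B
    # from DRAM; the MxN output is written back once.
    return 2 * M * N * K + M * N

def dram_words_tiled(M: int, N: int, K: int, t: int) -> int:
    # Split into t x t tiles (assumes t divides M, N, K). Each A tile is
    # fetched once per column of B tiles, each B tile once per row of A tiles,
    # then both are reused from SRAM for all t*t*t multiply-accumulates.
    tiles_m, tiles_n, tiles_k = M // t, N // t, K // t
    a_reads = tiles_m * tiles_k * tiles_n * (t * t)
    b_reads = tiles_k * tiles_n * tiles_m * (t * t)
    c_writes = M * N
    return a_reads + b_reads + c_writes

if __name__ == "__main__":
    M = N = K = 512
    print("naive DRAM words:", dram_words_naive(M, N, K))
    print("tiled DRAM words (64x64 tiles):", dram_words_tiled(M, N, K, 64))
```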
While the chip is focused on accelerating a range of mathematical operations found in popular AI vision algorithms, this is carefully balanced with PPA (power, performance, area). The hardware can cover almost all kinds of activation functions, Kim said, with support for edge-sized transformer networks coming next year.
DeepX’s quantization work has been focused on maintaining GPU levels of accuracy, where porting to most other types of hardware would incur an accuracy loss.
“We spent a lot of time performing experiments on the accuracy degradation for each data path,” Kim said. “Then we found some critical points that incur accuracy degradation, and we created a new way to maintain the accuracy with multiple innovative ideas…. Our original target was to maintain the accuracy, but we found that in 50% of the algorithms, we can improve accuracy compared to GPUs.”
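DeepX’s exact quantization scheme is proprietary. As a rough illustration of the kind of per-layer error analysis Kim describes, the sketch below applies standard per-channel symmetric INT8 quantization to a convolution layer’s weights and measures the reconstruction error; the layer shape and the scheme itself are assumptions, not DeepX’s method.

```python
"""Rough illustration of post-training weight quantization and per-layer error
measurement; not DeepX's proprietary approach."""
import numpy as np

def quantize_per_channel(weights: np.ndarray, n_bits: int = 8):
    # Symmetric per-output-channel quantization: one scale per channel keeps
    # error low even for channels with a small dynamic range.
    qmax = 2 ** (n_bits - 1) - 1
    scales = np.abs(weights).max(axis=tuple(range(1, weights.ndim)), keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)
    q = np.clip(np.round(weights / scales), -qmax - 1, qmax)
    return q.astype(np.int8), scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scales

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.05, size=(64, 3, 3, 3)).astype(np.float32)  # assumed conv weights
    q, s = quantize_per_channel(w)
    err = np.abs(w - dequantize(q, s)).mean()
    print(f"mean absolute quantization error: {err:.6f}")
```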
DeepX will deliver chips sized for various edge markets: The company spoke with several hundred customers to assess their requirements and decided that one chip could not cover a meaningful enough market, according to Tim Park, a DeepX marketing specialist.
One-camera applications will use the L1 (2.4 TOPS, quad-core RISC-V, ISP, MIPI and video codec). Systems like drones and robots that need three or four cameras will need the L2 (6.4 TOPS, plus the other L1 blocks). The M1 chip supports 10 or so cameras (23 TOPS, dual Arm cores, ISP, a combination of LPDDR4X/5 memory and a PCIe Gen 3 interface).
In the case of a smart factory, there may be thousands of cameras. For that application, DeepX has designed the H1 for rack-scale on-premises installations. A one-rack system of H1s can process data from 10,000 cameras, Park said. The H1 has similar dual ARM core and ISP blocks to the M1, but multi-chip systems can scale up to 22 POPS.
IP licensing is a possibility for the automotive industry, Park added, where the company is already seeing demand from both OEMs and established automotive chip companies.
L1 and L2 will be ready for mass production in Q1 ’24, with M1 coming in Q2 and H1 in Q3. Engineering samples are already with lead customers. A single SDK covers all four chips, with a full version launching at the end of this year.
Sportlogiq
Sportlogiq’s co-founder and CTO Mehrsan Javan will present the company’s computer vision solution for sports analytics—and its plans to scale the technology from the top professional leagues to down-market leagues and youth sports applications. The company wants to democratize sports analytics, and it aims to offer pro-level analytics to leagues with modest resources.
“Sports are good for visual perception: It’s a controlled environment with a limited number of people,” Javan told EE Times. “The vision is to democratize computer vision machine learning tools so that every person who plays any sport, anyone on this planet, can film the game and get some numbers to track their progress, compare themselves to elite athletes and use it as a spotting tool for athlete development.”
The Canadian company started with NHL ice hockey analytics, building systems that can track players and understand their behaviors. This includes tracking, event detection, and semantic analytics to benchmark players and measure performance, look at team tactics and strategies, and overall help teams win. The company has expanded to American football and soccer so far.
In general, leagues and video streaming companies send footage to the cloud, where Sportlogiq works with it, or, in the case of the NHL, where analytics are required in real time during games, they use on-premises servers with dedicated hardware.
Javan said that while there is a case for moving to edge processing, which would mean the same type of analytics could be produced in real time without having to go to the cloud, this would require access to camera hardware.
“We don’t have access to the facilities,” he said. “We partner with streaming companies, though in that partnership we are encouraging them to upgrade their cameras because some of these streaming companies are using … old hardware and this is time for them to upgrade.”
As cameras are upgraded, Sportlogiq hopes to be involved in that process, Javan said.
Sportlogiq’s models for different sports are conceptually the same, he added—tracking players around the pitch (or ice rink). Models can be tuned with transfer learning, but overall 80-85% of the model is the same for different sports.
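Sportlogiq has not published its model architecture. A minimal sketch of the transfer-learning pattern Javan describes, with a generic torchvision backbone standing in for the shared 80-85% of the model and only a small sport-specific head fine-tuned on the new sport’s labelled data, might look like the following; all layer choices and sizes are assumptions.

```python
"""Minimal transfer-learning sketch (assumed architecture, not Sportlogiq's):
freeze the shared backbone, fine-tune only a small head for the new sport."""
import torch
import torch.nn as nn
from torchvision import models

def build_for_new_sport(num_event_classes: int) -> nn.Module:
    # A generic pretrained backbone stands in for the ~80-85% of the model
    # that carries over unchanged between sports.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False                              # freeze shared features
    model.fc = nn.Linear(model.fc.in_features, num_event_classes)  # new sport-specific head
    return model

model = build_for_new_sport(num_event_classes=12)
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```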
Moving to down-market leagues is going to be a challenge, in part because camera positions and quality differ from those of top professional leagues, and because the variety of camera setups is much greater. Other problems in amateur sports include players wearing the same jersey numbers.
Models are trained with hand-labelled data from a limited number of leagues. One of the challenges is measuring concept drift, in order to know when to add more labelled data, tune the models or change them completely. One of the model types used is ViT (vision transformer) since the attention mechanism means Sportlogiq does not need to specifically encode anything about attention, Javan said, while it also works well for fusing multiple sources of information.
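Javan did not name a specific drift metric. One common way to measure concept drift is to compare the model’s confidence distribution on recent footage against a reference window and flag a significant shift, as in the illustrative sketch below; the statistical test, threshold and distributions are assumptions, not Sportlogiq’s method.

```python
"""Illustrative drift check (not Sportlogiq's metric): compare the model's
confidence distribution on recent games against a reference set and flag a
statistically significant shift."""
import numpy as np
from scipy.stats import ks_2samp

def drift_detected(reference_conf: np.ndarray, recent_conf: np.ndarray,
                   alpha: float = 0.01) -> bool:
    # A low p-value means the two confidence distributions differ, a cue to
    # add labelled data from the new league, retune the model or replace it.
    stat, p_value = ks_2samp(reference_conf, recent_conf)
    return p_value < alpha

rng = np.random.default_rng(1)
reference = rng.beta(8, 2, size=5000)   # confidences on leagues seen in training
recent = rng.beta(5, 3, size=5000)      # confidences on footage from a new league
print("drift detected:", drift_detected(reference, recent))
```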
There is “almost no difference” between models for men’s and women’s pro sports, Javan said. Research with the University of Toronto and Toronto FC examined the perception that women’s soccer is slower with fewer shots, and found almost no difference in practice. However, he added, for lower age groups the pace of the game is totally different, which means models will need to be adapted.
“We have to work a lot on making models adapted to the different age groups, but that’s the nature of the game,” he said. “But recently, we merged everything together, so there is one model that is used for tracking and event detection across all age groups in hockey, and it’s giving us the same performance as specific models.”
As well as working with lower leagues, Sportlogiq’s roadmap includes working with camera manufacturers and expanding to other sports.
Nauto
Nauto’s edge AI system works with large fleets of vehicles, from Uber drivers to delivery vans to big trucks, to help reduce collisions.
Nauto CEO Stefan Heck told EE Times that Nauto uses AI to understand what is happening both inside and outside the vehicle in real time, recognizing lanes, signs, vehicles and pedestrians. It also monitors drivers for risky behavior, using an aftermarket dashcam-type device mounted in the vehicle.
The device uses cameras facing the road and the driver—along with telematics data, including Doppler GPS for braking and acceleration and map data for speed limits and previous accident data. The data is combined by a multi-modal AI model running 15 times per second on a high-end Qualcomm Snapdragon processor.
Processing at the edge means footage of drivers does not need to be recorded or leave the device for privacy reasons (though it can be recorded if required, perhaps capturing data during a collision or near miss). The device draws power from the car’s systems, and it can plug into the car’s OBD or J-bus port to get speed and other data from the vehicle.
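Nauto has not detailed its capture pipeline; the privacy behavior described above is commonly built as a rolling in-memory buffer that is written to storage only when an event trigger fires. The sketch below is a hedged illustration of that pattern, with the buffer length, frame format and trigger policy all assumed.

```python
"""Sketch of event-triggered capture (assumed design, for illustration only):
frames stay in a short in-memory ring buffer and are persisted only when a
collision or near-miss trigger fires, so routine footage never leaves RAM."""
from collections import deque

class RollingCapture:
    def __init__(self, fps: int = 15, seconds: int = 20):
        self.buffer = deque(maxlen=fps * seconds)   # ~20 s of pre-event context

    def add_frame(self, frame: bytes) -> None:
        self.buffer.append(frame)                   # oldest frames silently drop off

    def flush_event(self, path: str) -> None:
        # Called only on a trigger (hard braking, imminent-collision alert):
        # persist the buffered context for the incident report.
        with open(path, "wb") as f:
            for frame in self.buffer:
                f.write(frame)
        self.buffer.clear()
```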
“The reason we picked mid-level compute platform is we want to be accurate enough that we can make a real difference to the safety, but we don’t need to be 100% perfect because we’re not replacing the driver, we’re augmenting the driver with an AI co-pilot,” Heck said.
Fusing multi-modal data together at a low level enables a special type of insight.
“We pioneered this idea of not only traditional sensor fusion like AV companies already do, fusing radar, lidar and camera, but actually fusion at the object level or the meta level,” he said.
The company’s patented SAFER (situational assessment, fusing exponential risks) model combines risk factors from inside and outside the vehicle, including road conditions and driver behavior, to predict and prevent collisions.
“Something like tailgating isn’t particularly dangerous if you’re staring at the car in front of you and paying close attention,” Heck said. “But if you’re looking down at your phone, tailgating is pretty deadly because the lead vehicle brakes and you have no time to.”
If a driver is tailgating, they have about a 20% elevated risk of collision versus a driver who is not tailgating. If they are distracted while driving, the risk is about four times higher. But if they are both tailgating and distracted, it’s 28 times more dangerous than regular driving. Some combinations of behaviors and events can be 1,000 times more risky—taking the average from one collision every 20 million miles to one every 20,000, which for a commercial driver would be twice a year.
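Those multipliers translate directly into collision intervals, as a quick back-of-the-envelope check shows; the commercial driver’s annual mileage used below is an assumed figure for illustration.

```python
"""Back-of-the-envelope check of the risk figures quoted above; the annual
mileage for a commercial driver is an assumption."""
BASELINE_MILES_PER_COLLISION = 20_000_000   # roughly one collision per 20M miles

def miles_between_collisions(risk_multiplier: float) -> float:
    return BASELINE_MILES_PER_COLLISION / risk_multiplier

for label, multiplier in [("tailgating only (20% elevated)", 1.2),
                          ("distracted only", 4),
                          ("tailgating + distracted", 28),
                          ("worst combinations", 1000)]:
    print(f"{label:>30}: one collision per {miles_between_collisions(multiplier):,.0f} miles")

# At an assumed ~40,000 miles/year, a 1,000x multiplier means a collision
# roughly every 20,000 miles, i.e. about twice a year, as quoted above.
```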
Nauto provides audible imminent-collision warnings, and it captures footage and telemetry in the event of a collision (this footage often exonerates delivery drivers, who are otherwise unfairly blamed, Heck said).
Nauto also gives the driver behavioral guidance in real time so they can self-correct. When a particularly risky situation is identified, the system gives guidance via a voice prompt and/or an alarm. Driver feedback like this has proven effective very quickly: tests have shown as much as an 80% reduction in risky behaviors within the first couple of days, or a 90% reduction after a couple of weeks of using the system.
Nauto is carefully calibrated to provide notifications only for the highest risk situations or behaviors, since frequent interruptions would mean the driver ignores the system completely. It also considers not just collision frequency, but potential severity when deciding whether to alert the driver.
“The basic idea is you take all these individual risk factors, you assemble them into an overall picture of how risky is this moment, and then you choose selectively when to intervene,” Heck said.
With 3 billion miles already driven on Nauto’s systems, the company understands what constitutes risk, Tahmida Mahmud, engineering manager at Nauto, told EE Times.
“If you want to solve the long tail of the problem, you need to understand risk holistically,” she said. “The best way to understand what actually leads to a true collision or a particular collision type is fusing all this data together.”
The SAFER model itself is a single lightweight convolutional neural network (CNN) that is both flexible and easy to integrate, so it can go into Nauto’s aftermarket device or be integrated into OEM systems. The model uses time series data across all the sensors—things like drowsiness come on progressively over a period of time, Mahmud noted—and the model considers what is going on internally and externally over 10-15 seconds to predict what is going to happen in the next 3-4 seconds.
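Nauto has not published the SAFER architecture. As a rough sketch of the shape Mahmud describes, a lightweight 1D CNN over a 15 Hz multi-sensor window of roughly 15 seconds, producing a short-horizon risk score, might look like the following; the channel count, layer sizes and sampling rate are assumptions.

```python
"""Rough sketch only: a lightweight temporal CNN over fused sensor channels,
in the spirit of the SAFER model described above. All sizes are assumptions."""
import torch
import torch.nn as nn

class RiskCNN(nn.Module):
    def __init__(self, n_channels: int = 16, window_steps: int = 225):
        # 16 assumed input channels: driver-state scores, detected-object
        # features, speed, braking, road/weather flags, etc., sampled at
        # 15 Hz over a ~15 s window (225 steps).
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, 1),   # risk logit for the next few seconds
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(x))

window = torch.randn(1, 16, 225)   # one 15 s multi-sensor window
print(RiskCNN()(window))           # predicted collision risk in [0, 1]
```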
In terms of transferability, the system is already deployed globally, and it was “pretty easy” to switch to driving on the other side of the road, Heck said.