Now that we’re getting comfortable with 5G, network operators are already planning for 5G-Advanced, release 18 of the 3GPP standard. The capabilities enabled by this new release—extended reality, centimeter-level positioning, and microsecond-level timing outdoors and indoors—will create an explosion in compute demand in Radio Access Network (RAN) infrastructure. Consider fixed wireless access for consumers and businesses.
Here, beamforming through massive MIMO for remote radio units (RRUs) must manage heavy yet variable traffic, while user equipment (UE) must support carrier aggregation. Both need more channel capacity. So, solutions must be greener, deliver high performance at low latency, manage variable loads more efficiently, and be more cost-effective to support wide-scale deployment.
Figure 1 5G networks are evolving in several vectors, all pointing toward network openness and sophistication. Source: ABI Research
As a result, 5G infrastructure equipment builders want all the power, performance, and unit-cost advantages of purpose-built chips, plus all these added capabilities, in a more efficient package. A natural starting point is virtualized RAN (vRAN) components, which promise higher efficiency by running multiple links simultaneously on one compute platform.
Virtual RANs and vector processing
The vRAN components aim to deliver on the decade-old goals of centralized RAN: economies of scale, more flexibility in suppliers, and central management of many-link, high-volume traffic through software. We know how to virtualize jobs on big general-purpose CPUs, so the solution to this need might seem self-evident. Except that those platforms are expensive, power hungry, and inefficient at the signal processing at the heart of wireless designs.
On the other hand, embedded DSPs with big vector processors are expressly designed for speed and low power in signal processing tasks such as beamforming, but historically have not supported dynamic workload sharing across multiple tasks. Adding more capacity required adding more cores, sometimes large clusters of them, or at best a static form of sharing through a pre-determined core partitioning.
The bottleneck is vector processing, since vector computation units (VCUs) occupy the bulk of the area in a vector DSP. Using this resource as efficiently as possible is essential to maximize virtualized RAN capacity. The default approach of doubling up cores to handle two channels requires a separate VCU per channel. But at any one time, software in one channel might require vector arithmetic support while the other is running scalar operations, leaving one VCU idle in those cycles.
Now imagine a single VCU serving both channels through two vector arithmetic units and two vector register files. An arbitrator decides dynamically how best to use these resources based on channel demands. If both channels need vector arithmetic in the same cycle, each is directed to its own vector ALU and register file. If only one channel needs vector support, its calculation can be striped across both vector units, accelerating computation.
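The per-cycle arbitration decision can be sketched as follows. This is a minimal illustrative model, not CEVA's implementation; the function and ALU names (`arbitrate`, `valu0`, `valu1`, `ch0`, `ch1`) are assumptions for the example.

```python
# Toy sketch of per-cycle arbitration of two vector ALUs between two
# channels. All names are illustrative assumptions, not a real API.

def arbitrate(ch0_needs_vector, ch1_needs_vector):
    """Return a mapping of vector ALU -> channel for one cycle.

    None means that ALU sits idle this cycle.
    """
    if ch0_needs_vector and ch1_needs_vector:
        # Both channels issue vector ops: each gets one ALU.
        return {"valu0": "ch0", "valu1": "ch1"}
    if ch0_needs_vector:
        # Only channel 0 needs vectors: stripe its work across both ALUs.
        return {"valu0": "ch0", "valu1": "ch0"}
    if ch1_needs_vector:
        return {"valu0": "ch1", "valu1": "ch1"}
    return {"valu0": None, "valu1": None}

print(arbitrate(True, True))   # contention: one ALU per channel
print(arbitrate(True, False))  # lone channel striped across both ALUs
```

The key point is the middle cases: a lone vector-hungry channel is granted the full vector width rather than leaving half of it idle.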
Dynamic vector threading
This method for managing vector operations between two independent tasks looks very much like execution threading: it maximizes use of a fixed compute resource across one or more simultaneous tasks. The technique, dynamic vector threading (DVT), allocates vector operations each cycle to either one or two arithmetic units (in this instance).
Figure 2 DVT maximizes use of a fixed compute resource to handle one or more than one simultaneous task. Source: CEVA
You can imagine this concept being extended to more threads, even further optimizing VCU utilization across variable channel loads since vector operations in independent threads are typically not synchronized.
Support for DVT requires several extensions to traditional vector processing. Operations must be serviced by a wide vector arithmetic unit, allowing for, say, 128 or more MAC operations per cycle. The VCU must also provide a vector register file for each thread, so that vector register context is stored independently per thread. A vector arbitration unit schedules vector operations, effectively through competition between the threads.
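The three extensions above can be modeled in a few lines. This is a hypothetical sketch under the stated assumptions (a 128-MAC-wide unit, per-thread register files, equal-share arbitration); the class and method names are invented for illustration.

```python
# Illustrative model of the three DVT extensions: a wide vector
# arithmetic unit, per-thread vector register files, and an arbiter
# that splits capacity among competing threads. Names are assumptions.

class DVTCore:
    TOTAL_MACS = 128  # wide vector unit: MAC operations per cycle

    def __init__(self, num_threads=2):
        # One vector register file per thread keeps context independent,
        # so threads never have to save/restore each other's registers.
        self.vregs = [{} for _ in range(num_threads)]

    def grant(self, wants_vector):
        """wants_vector: list of bools, one per thread, for this cycle.

        Returns the MACs/cycle the arbiter grants each thread.
        """
        competing = [i for i, w in enumerate(wants_vector) if w]
        if not competing:
            return [0] * len(wants_vector)
        share = self.TOTAL_MACS // len(competing)
        return [share if w else 0 for w in wants_vector]

core = DVTCore()
print(core.grant([True, True]))   # [64, 64]  -> threads split the unit
print(core.grant([True, False]))  # [128, 0]  -> lone thread gets it all
```

Under contention each thread still makes forward progress at half width; when only one thread issues vector work, it is granted the full 128 MACs per cycle.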
How does this capability support virtualized RAN? At absolute peak load, signal processing requirements on such a platform will continue to be served as satisfactorily as they would be on a dual-core DSP with a separate VCU per core. When one channel needs vector arithmetic and the other channel is quiet or occupied in scalar processing, the first channel completes vector cycles faster by using the full vector capacity. That delivers higher average throughput in a smaller footprint than two DSP cores.
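A toy simulation makes the average-throughput claim concrete. The model assumes each channel has backlogged vector work but only issues vector ops in a given cycle with some probability (otherwise it runs scalar code); the 50% figure is an assumption for illustration, not a measured workload.

```python
import random

# Toy model (assumption): each channel has a backlog of vector work but
# issues vector ops in a cycle only with probability P_VECTOR. Compare
# useful vector work for two private VCUs vs one DVT-shared VCU pair.
random.seed(0)
CYCLES = 100_000
P_VECTOR = 0.5  # illustrative assumption, not a measured traffic profile

dual, dvt = 0, 0
for _ in range(CYCLES):
    busy = [random.random() < P_VECTOR for _ in range(2)]
    dual += sum(busy)             # a private VCU idles when its channel is quiet
    dvt += 2 if any(busy) else 0  # a lone channel stripes across both units

print(f"dual-core VCU utilization: {dual / (2 * CYCLES):.2f}")
print(f"DVT shared-VCU utilization: {dvt / (2 * CYCLES):.2f}")
```

With this 50/50 load, utilization rises from roughly 0.50 to roughly 0.75, because the only wasted cycles are those where neither channel issues vector work.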
DSPs with DVT in virtualized RANs
5G-Advanced RRUs offer another example of how DVT can make baseband processing more efficient. These devices must support massive MIMO handling for beamforming. A massive MIMO-based RRU will be expected to support up to 128 active antenna units, including support for multiple users and carriers. This implies massive compute requirements at the radio device, which become much more efficient with DVT. In UEs—terminals and CPEs supporting fixed wireless access—carrier aggregation also benefits from DVT. So, DVT helps at both ends of the cellular network: infrastructure and UEs.
It might still be tempting to think of big general-purpose processors as the right answer to these virtualization needs but, in signal-processing paths, that could be a backward step. We cannot forget that there were good reasons the infrastructure equipment makers switched to ASICs with embedded DSPs. Competitive fixed wireless access solutions should explore DSP-based ASICs to leverage support for dynamic vector threading.
Nir Shapira is business development director for the mobile broadband business unit at CEVA.