Today, multicore system-on-chip (SoC) designs can be composed of hundreds of IP blocks, typically containing up to ten million logic gates. One way for SoC developers to create devices of this complexity is to make use of proven IP blocks provided by trusted third-party vendors. There’s no point in devoting thousands of hours to reinventing a USB 3.2 Gen x interface, for example, when it is already available as off-the-shelf IP. Instead, engineers can focus their efforts on creating their own internal IP that will differentiate their SoC from any competitive offerings.
When it comes to connecting the IP blocks so they can talk to each other, the only practical option for the majority of today’s high-capacity and high-complexity SoCs is to use a network-on-chip (NoC). What many people fail to realize is that an NoC is IP too, albeit IP that spans the entire SoC. As with any other IP, design teams can either develop the NoC in-house or use proven NoC IP from a trusted third-party vendor.
Another consideration SoC architects can easily overlook is the need for the NoC’s design environment to be physically aware. Physical awareness dramatically accelerates the exploration of the design space needed to arrive at an optimal NoC topology at the front-end of the process. It also significantly speeds up timing closure at the back-end.
The elements forming an NoC
An NoC is formed from multiple elements. First, each IP block has its own interface characteristics, such as data width and clock frequency, and employs one of the many standard protocols that have been adopted by the SoC industry, including OCP, APB, AHB, AXI, STBus, and DTL. One or more sockets need to be attached to each of the functional IP blocks. Sockets attached to source IPs packetize and serialize the outgoing data into a normalized form suitable for transport over the network. Conversely, sockets attached to destination IPs convert incoming packets back into whatever form the destination requires.
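The socket behavior described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Packet` fields, the `packetize` and `depacketize` names, and the least-significant-first flit order are hypothetical, and real NoC packet formats are vendor-specific.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    """Hypothetical normalized packet: header fields plus fixed-width flits."""
    dest_id: int       # destination socket, used by switches for routing
    src_id: int        # source socket, e.g. for a response path
    flits: List[int]   # payload split into link-width words


def packetize(data: int, data_width: int, link_width: int,
              src_id: int, dest_id: int) -> Packet:
    """Source-side socket: serialize one data_width-bit word into flits."""
    if data < 0 or data >= (1 << data_width):
        raise ValueError("data does not fit in data_width bits")
    n_flits = -(-data_width // link_width)   # ceiling division
    mask = (1 << link_width) - 1
    # Least-significant flit first (an arbitrary choice for this sketch).
    flits = [(data >> (i * link_width)) & mask for i in range(n_flits)]
    return Packet(dest_id=dest_id, src_id=src_id, flits=flits)


def depacketize(packet: Packet, link_width: int) -> int:
    """Destination-side socket: reassemble the flits into the original word."""
    data = 0
    for i, flit in enumerate(packet.flits):
        data |= flit << (i * link_width)
    return data
```

For example, a 32-bit word sent over an 8-bit link becomes four flits, and the destination socket recovers the original word from them.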
In addition to the wires linking everything together, the main transport mechanism of the NoC is largely formed from switches and buffers. The switches act like multiplexers with associated arbiters, or demultiplexers with associated mapping logic, using the destination information in each packet’s header to route the packet from its source to its intended destination. Meanwhile, buffers are used as storage elements to aggregate data along a path. For example, an IP block using a fast clock might quickly load a buffer and then turn its attention to other tasks while another IP block using a slower clock drains the buffer.
Last, but certainly not least, pipeline registers are inserted into NoC pathways to address timing concerns. These concerns typically arise when signals have to traverse long distances across the SoC.
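As a rough illustration of the two switch roles, here is a minimal Python sketch. The round-robin policy, the dictionary-based route table, and all names are assumptions chosen for brevity; the text above specifies only that arbiters and mapping logic exist, not how they are implemented.

```python
from typing import Dict, List, Optional


def demux_route(dest_id: int, route_table: Dict[int, int]) -> int:
    """Demultiplexer side: map a header's destination ID to an output port."""
    return route_table[dest_id]


class RoundRobinArbiter:
    """Multiplexer side: grant at most one requesting input port per cycle."""

    def __init__(self, n_inputs: int) -> None:
        self.n_inputs = n_inputs
        self.last_grant = n_inputs - 1   # so input 0 is considered first

    def grant(self, requests: List[bool]) -> Optional[int]:
        # Search starting just after the most recently granted input.
        for offset in range(1, self.n_inputs + 1):
            port = (self.last_grant + offset) % self.n_inputs
            if requests[port]:
                self.last_grant = port
                return port
        return None   # no input is requesting this cycle
```

With two inputs requesting continuously, the arbiter alternates between them, so neither source can starve the other.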
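To see why long wires call for pipeline registers, consider a back-of-the-envelope estimate in Python. It assumes a simple linear wire-delay model, and the function name, parameters, and example figures are hypothetical rather than taken from any particular process or tool.

```python
import math


def registers_needed(wire_length_mm: float, clock_mhz: float,
                     delay_ps_per_mm: float, margin_ps: float = 0.0) -> int:
    """Estimate pipeline registers for one wire under a linear delay model."""
    period_ps = 1e6 / clock_mhz          # clock period in picoseconds
    usable_ps = period_ps - margin_ps    # time left after setup/skew margin
    if usable_ps <= 0:
        raise ValueError("margin leaves no usable time in the clock period")
    reach_mm = usable_ps / delay_ps_per_mm   # distance covered in one cycle
    segments = math.ceil(wire_length_mm / reach_mm)
    # N segments need N - 1 registers between them.
    return max(segments - 1, 0)
```

Under these assumed numbers, a 10-mm path at 1 GHz with 200 ps/mm of wire delay and a 200-ps margin can cover 4 mm per cycle, so it splits into three segments and needs two registers.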
Smaller geometries mean larger problems
As SoC designs move to more advanced processes, transistors on the chip become smaller and faster. Unfortunately, the size and speed of the wires do not scale at the same rate, which means they have a higher relative cost in terms of area and power. In addition to the fact that more logic can be squeezed into the same area using smaller device geometries, the chips themselves are getting larger, thereby supporting more and more IP blocks. The architecture also has higher performance requirements in terms of clock frequencies, data bandwidths, and latency optimization of critical paths.
A large 7-nm SoC may require 6,000+ pipeline registers. Developers must consider many parameters when determining where these registers need to be located to meet timing needs while at the same time trying to minimize area, latency, and congestion. With such a monumental task, often coupled with a tight schedule, inserting pipeline registers by hand invariably leads to overdesign in an attempt to avoid costly place-and-route (P&R) iterations. The resulting cost is larger area, higher power consumption, and longer latency. Even worse, hand insertion is error-prone and can still lead to additional P&R iterations despite one’s best efforts to ensure sufficient pipelining on all relevant paths.
Much of the complexity of pipeline insertion resides within the NoC IP, which connects to the vast majority of IP blocks on the chip. As the only IP that traverses the entire chip, it has the longest wires and is the most likely to route through congestion points. It must also absorb architecture- and marketing-instigated engineering change orders (ECOs) over the course of the project, which means the NoC is often the last IP to be frozen.
Physically aware NoC IP
The cost of inserting pipeline registers by hand, coupled with the need to adapt quickly to changing requirements, creates an excellent opportunity for an automated solution. By taking physical requirements into account, physically aware NoC technology can intelligently insert pipeline registers and suggest appropriate locations for their placement to the layout team. Third-party tools offer this capability.
As a starting point, it’s necessary to know the relative placements of the IP blocks and any associated routing channels. During the later stages of the SoC development process, the physical design team will have detailed data pertaining to the locations of the IP blocks and routing channels. In this case, they can provide this information in the form of library exchange format (LEF) and design exchange format (DEF) files. However, this data is typically not available early in the project, so it is important to be able to make use of whatever early details are available.
Early data can take the form of images or Visio drawings that capture high-level views of the SoC’s floorplan, while detailed LEF/DEF data can replace these once it becomes available. Third-party design tools can use any of these inputs to suggest appropriate layouts (Figure 1).
The tools use this information, along with “speed and feed” requirements, to automatically suggest pipeline register insertions and locations. The design teams can work interactively with the tools to experiment with different placements. In return, they receive accurate timing and area estimations and a better understanding of what can be achieved with the current NoC architecture earlier in the design flow.
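A highly simplified Python sketch can give a feel for what suggesting register locations involves. It is not the algorithm any commercial tool uses. It assumes a single L-shaped route (horizontal leg first, then vertical) between two block positions, ignores routing channels, blockages, and congestion, and uses hypothetical names throughout.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def suggest_register_locations(src: Point, dst: Point,
                               max_segment_mm: float) -> List[Point]:
    """Evenly space registers along an L-shaped route from src to dst."""
    dx = dst[0] - src[0]
    dy = dst[1] - src[1]
    total = abs(dx) + abs(dy)                  # Manhattan route length
    n_segments = max(math.ceil(total / max_segment_mm), 1)
    locations: List[Point] = []
    for k in range(1, n_segments):
        d = total * k / n_segments             # distance travelled from src
        if d <= abs(dx):                       # still on the horizontal leg
            x = src[0] + math.copysign(d, dx)
            y = src[1]
        else:                                  # on the vertical leg
            x = dst[0]
            y = src[1] + math.copysign(d - abs(dx), dy)
        locations.append((x, y))
    return locations
```

Here `max_segment_mm` stands in for the distance a signal can cover in one clock cycle. A real flow would derive it from the "speed and feed" requirements and confine the route to the available routing channels.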
Furthermore, third-party design tools can output LEF/DEF data associated with the NoC, including the suggested locations of pipeline registers, for use with back-end P&R tools (Figure 2).
Figure 2 The design example employs FlexNoC design tools to output LEF/DEF data associated with the NoC, including the suggested locations of pipeline registers, for use with back-end P&R tools. Source: Arteris
The sooner the better
Allowing teams to interactively iterate at the front-end of the development process dramatically reduces the number of back-end to front-end iterations needed to close timing. They can experiment with different NoC topologies and automatically place pipeline registers and evaluate the results of such placements in terms of area, power, and latency.
In turn, this reduces project cost, risk, and time to market. Furthermore, reducing the number of back-end to front-end iterations has a proportional impact on non-recurring engineering (NRE) costs, which are typically at their peak during this phase of the SoC design cycle.
Andy Nightingale, VP of product marketing at Arteris, has over 35 years of experience in the high-tech industry, including 23 years spent in various engineering and product management positions at Arm.