Engineers often consider thermal management and cooling as a two-part problem. First, there’s the global “macro” case where no individual component is excessively hot, but the aggregate heat buildup puts the board or chassis outside of acceptable limits. Second, there’s the localized “micro” case where one or more active or passive components (power devices, high-end processors, FPGAs, current-sense resistors) need to be cooled to avoid slow-clocking mode, burn-out, or excessive drift due to temperature coefficient. Often, of course, the micro problem is a major contributor to the macro one.
The solutions to thermal excess are well known in principle: just use some combination of convection, conduction, and radiation cooling, Figure 1.
Figure 1 The three modes of heat transfer are well understood and can be modeled and simulated as a first step in the cooling-plan analysis. Source: sciencenotes.org/
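As a rough illustration of what that first-step modeling looks like, here is a minimal Python sketch of the textbook steady-state relations for the three modes: Fourier's law for conduction through a slab, Newton's law of cooling for convection, and the Stefan-Boltzmann law for radiation. The function names and all the numbers in the demo (material conductivity, film coefficient, emissivity, dimensions) are illustrative assumptions, not values from any particular design.

```python
# Stefan-Boltzmann constant in W/(m^2 * K^4)
STEFAN_BOLTZMANN = 5.670374419e-8


def conduction_w(k_w_per_m_k, area_m2, thickness_m, delta_t_k):
    """Heat flow through a flat slab (Fourier's law, one-dimensional, steady state)."""
    return k_w_per_m_k * area_m2 * delta_t_k / thickness_m


def convection_w(h_w_per_m2_k, area_m2, delta_t_k):
    """Heat flow from a surface into a moving fluid (Newton's law of cooling)."""
    return h_w_per_m2_k * area_m2 * delta_t_k


def radiation_w(emissivity, area_m2, t_surface_k, t_ambient_k):
    """Net radiated heat from a gray surface to its surroundings.

    Temperatures must be absolute (kelvin) because of the fourth-power term.
    """
    return emissivity * STEFAN_BOLTZMANN * area_m2 * (t_surface_k**4 - t_ambient_k**4)


if __name__ == "__main__":
    # Assumed example: a 10 cm x 10 cm surface (0.01 m^2).
    area = 0.01
    # Conduction through 5 mm of a metal with k = 205 W/(m*K) (assumed), 10 K across it.
    print("conduction:", conduction_w(205.0, area, 0.005, 10.0), "W")
    # Natural convection with an assumed film coefficient of 10 W/(m^2*K), 40 K above ambient.
    print("convection:", convection_w(10.0, area, 40.0), "W")
    # Radiation from an assumed 0.9-emissivity surface at 80 C to 25 C surroundings.
    print("radiation: ", radiation_w(0.9, area, 353.15, 298.15), "W")
```

Even with these made-up numbers, the pattern is typical: a short, solid metal path can move far more heat than bare-surface convection or radiation from the same area, which is why getting heat into a good conduction path is usually the first priority.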
But that’s where the simplicity ends. You can model the thermal solution, but it often takes much more than a fan or a few add-on heat sinks to create a mechanically sound design that can convey your heat to that mystical place called “away,” the thermal depository for that excess heat.
This is where the mechanical designers and production engineers earn some serious respect, as they must turn a thermal goal into a tangible, manufacturable reality. One such advanced cooling technique which has been field-proven over a decade is called a “hybrid” approach, represented by the patented RuggedCool℠ technology from General Micro Systems, Inc.
Unlike the conventional approach where the air or cooling liquid is focused on individual components and hot spots, here the heat is evacuated to an entire cold-plate assembly for the whole system, via a central “radiator” core plenum that’s essentially a whole-system cooling plate, Figure 2.
Figure 2 In the RuggedCool design, a central radiator-core plenum functions as a whole-system cooling plate. Source: General Micro Systems, Inc.
In this design, every component, board, or subsystem is conductively cooled using the cold-plate mechanism, with heat conducted away from their individual heat sinks to the combined heat-sink assembly of the entire system.
This all sounds like a simple-enough idea, but implementing it is a challenge. The GMS technology uses a corrugated alloy slug with an extremely low thermal resistance, acting as a heat spreader at the processor die (assuming that is the primary heat source). Once the heat is spread over a much larger area, a liquid silver compound in a sealed chamber is used to transfer the heat from the spreader to the system’s enclosure. The result is a sandwich: one surface of copper, one surface of aluminum, and a layer of silver in between.
This approach yields a temperature difference of less than 10°C from the CPU core to the cold plate, compared with over 25°C for conventional approaches, Figure 3.
Figure 3 The resulting arrangement has a low delta of less than 10°C between the CPU core and the cold plate. Source: General Micro Systems, Inc.
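To put those core-to-cold-plate deltas in perspective, the usual lumped model is ΔT = P × Rθ, where Rθ is the thermal resistance of the path. The sketch below is a back-of-the-envelope calculation under stated assumptions: the 50 W dissipation and the 100°C core limit are hypothetical placeholders (neither comes from GMS), and the 10°C and 25°C figures are treated as round bounds rather than measured values.

```python
def thermal_resistance_k_per_w(delta_t_c, power_w):
    """Lumped thermal resistance implied by a temperature drop at a given power."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return delta_t_c / power_w


def max_cold_plate_temp_c(t_core_max_c, delta_t_c):
    """Hottest the cold plate may run before the core exceeds its limit."""
    return t_core_max_c - delta_t_c


if __name__ == "__main__":
    power_w = 50.0        # hypothetical CPU dissipation, for illustration only
    t_core_max_c = 100.0  # hypothetical maximum allowed core temperature

    for label, delta_t_c in (("hybrid (10 C bound)", 10.0),
                             ("conventional (25 C bound)", 25.0)):
        r_theta = thermal_resistance_k_per_w(delta_t_c, power_w)
        plate_limit = max_cold_plate_temp_c(t_core_max_c, delta_t_c)
        print(f"{label}: R_theta = {r_theta:.2f} K/W, "
              f"cold plate may reach {plate_limit:.0f} C")
```

Whatever power and core limit you plug in, the gap between the two deltas translates directly into roughly 15°C of additional allowable cold-plate (and therefore ambient) temperature for the same core temperature.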
The technique is a form of liquid cooling but without the headaches or issues associated with moving fluid. Using materials like liquid silver makes it clear that the technology is expensive, but it is intended for applications for which no other viable solution is available.
This approach is in contrast to just adding cooling plates in order to produce conduction-cooled systems. That can result in inadequate cooling, since the heat-producing devices other than the CPU itself (or other primary heat source) are cooled by the CPU’s thermal-conduction path. This, of course, is contrary to the objective of drawing heat away from the CPU.
By directing all heat to the central plenum, the effectiveness of blown air, if any, is maximized. It also allows for a sealed system where only the central plenum is open to the environment, thus making it easier to manage dust and moisture ingress while also easing the electrical challenge of EMI control.
This technique provides benefits related to shock and vibration—the silent and longer-term “killers” of many components. Here the CPU die does not make direct contact with the system enclosure, but instead connects via the liquid-silver chamber which acts as a shock absorber. This prevents shock from being transferred from the enclosure to the flip-chip ball grid array (FCBGA), thus isolating the CPU from ongoing vibration-induced micro-fractures (which, in time, cause the CPU to fail).
There’s no question that this is a complex, costly mechanical design, but electronic engineers and their customers have only themselves to blame. After all, dissipation has gone from a hundred or so watts to beyond a kilowatt. A modest 19-inch-wide rack unit (RU) in 1U size (1¾ inches high) can now reach 1.5 kW and more, so innovative approaches and new ideas are needed.
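For a sense of scale, consider what it would take to carry 1.5 kW away with forced air alone. The sketch below uses the standard energy balance P = ρ · V̇ · c_p · ΔT to estimate the required volumetric airflow. The air properties are typical sea-level, room-temperature values, the 15 K inlet-to-outlet rise is an assumed design target, and the model assumes every watt leaves in the air stream, which ignores conduction and radiation.

```python
AIR_DENSITY_KG_M3 = 1.2      # assumed: dry air near sea level, about 20 C
AIR_CP_J_PER_KG_K = 1005.0   # assumed: specific heat of air at constant pressure
M3_PER_S_TO_CFM = 2118.88    # cubic meters per second to cubic feet per minute


def airflow_m3_per_s(power_w, air_rise_k):
    """Volumetric airflow needed to absorb power_w with a given air temperature rise."""
    if air_rise_k <= 0:
        raise ValueError("air_rise_k must be positive")
    return power_w / (AIR_DENSITY_KG_M3 * AIR_CP_J_PER_KG_K * air_rise_k)


def to_cfm(flow_m3_per_s):
    """Convert a flow in m^3/s to cubic feet per minute."""
    return flow_m3_per_s * M3_PER_S_TO_CFM


if __name__ == "__main__":
    flow = airflow_m3_per_s(1500.0, 15.0)  # 1.5 kW, assumed 15 K air temperature rise
    print(f"{flow:.4f} m^3/s, about {to_cfm(flow):.0f} CFM")
```

Under these assumptions the answer comes out on the order of 0.08 m³/s, which is a lot of air to push through a 1¾-inch-high enclosure, and a reminder that the airflow path is as much a mechanical problem as a thermal one.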
Not every new approach requires the complexity and sophistication of this technology. In some cases, just switching to card-cage guides and grips which offer a greatly enhanced thermal path can be a big help (see Related Content).
Have you ever been involved in a cooling scenario where the physical implementation of the needed strategy was a much bigger challenge than the thermal model suggested? Was the solution just a carefully considered application of existing components, or were custom components and specialized resources needed?
Bill Schweber is an EE who has written three textbooks, hundreds of technical articles, opinion columns, and product features.