A question we often get from customers is: how does Single-phase Liquid Immersion Cooling (SLIC) improve overall system reliability? The answer is a combination of factors whose combined impact is a dramatic increase in the reliability of both the electronics and the overall system. I want to break down these factors so that you get a clear understanding of how SLIC increases reliability and the Mean Time Between Failures (MTBF).
1) Reduction in Number of Mechanical Components
In a SLIC system, the two things you want to remove from all the equipment are the fans and vented Hard Disk Drives (HDDs). Because fans and HDDs are mechanical moving components, they have the highest rate of failure of all the components in an electrical system. This includes all the fans in the air conditioning system, the air handlers, dehumidifiers, etc. Spinning HDDs have the second highest failure rate of equipment in Data Centers. In fact, when you combine the personnel and maintenance costs for fan and HDD maintenance and replacement, they often constitute the largest operational maintenance cost for the Data Center. For a SLIC system we don't need the fans, and we replace all the HDDs with Solid State Drives, which operate very well in SLIC. This has an immediate impact of increasing the reliability of the system and decreasing maintenance costs.
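To make the effect of removing components concrete, here is a minimal sketch. It assumes a simple series model (any single component failure takes the system down) with constant failure rates, so the system failure rate is the sum of the component rates. The component names and MTBF figures are hypothetical placeholders chosen for easy arithmetic, not measured data.

```python
def series_mtbf(component_mtbf_hours):
    """MTBF of a series system, assuming constant failure rates.

    Each component's failure rate is 1 / MTBF; the rates add, and the
    system MTBF is the reciprocal of the total rate.
    """
    total_rate = sum(1.0 / mtbf for mtbf in component_mtbf_hours.values())
    return 1.0 / total_rate


# Hypothetical component MTBFs in hours (illustrative only, not vendor data)
air_cooled = {"fan": 50_000, "hdd": 100_000, "motherboard": 200_000, "psu": 200_000}
# Same server with the fan removed and the HDD swapped for an SSD
immersed = {"ssd": 200_000, "motherboard": 200_000, "psu": 200_000}

if __name__ == "__main__":
    print(f"air-cooled MTBF: {series_mtbf(air_cooled):,.0f} h")
    print(f"immersed MTBF:   {series_mtbf(immersed):,.0f} h")
```

In this model the least reliable parts dominate the total failure rate, so removing them raises the system MTBF far more than improving an already reliable part would.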
2) Improved Operating Life of Pumps and Dry Cooler Fans
But what about the pumps and the fans on the dry coolers? Well, the water and oil industries have been hard at work driving up the reliability of pumps and reducing their costs for the last ~100 years. Today's pumps are reliable by design, and when pumping our Dielectric Coolants they last even longer! How is this possible? It's based on how pumps are constructed, and in particular the seals used on the shafts that drive the pump. These seals are designed to be "water tight", and in the majority of SLIC systems we use carbon fiber / PTFE seals, which require lubrication to decrease wear. In water-based systems, the water eventually displaces the lubricant, allowing wear to occur. In SLIC, however, our Dielectric Coolants are in fact excellent lubricants, allowing the pumps to operate on constant duty for often 1.5-3x longer than the expected operational life of the pump seal if it were operating with a glycol / water mix. In some systems we even immerse the electric motors directly in the Dielectric Coolant to both lubricate and cool the motor, further enhancing the MTBF of the pump.
The dry cooler fans get an extended-life bonus through a major reduction in daily run time. In most SLIC systems we design, we target a maximum operational usage of the fans of ~15%. This not only reduces wear on the fan motors, but also dramatically reduces power consumption. We are able to accomplish this because the input temperature of the dry coolers (the exit temperature of the electronics immersion container) is typically 50C for servers, GPUs, and FPGAs, and 60C for ASIC miners. This high input temperature yields an excellent delta in temperature (Delta-T) between the dry cooler's radiator coils and the ambient air, typically in the 20-30C range. This extreme Delta-T enables us to turn off the fans entirely and still get sufficient cooling of the fluid, except of course during the hottest times of year and day in the hottest locales (think Phoenix, AZ at 3PM in July, where the average temperature in direct sun can reach over 45C). With this excellent Delta-T the dry coolers actually develop their own air flow: as the heated air rises out of the coils due to natural convection, it draws cooling air into the coils. This action provides sufficient heat transfer to cool the heated Dielectric Coolant by 15-20C.
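The fan logic described above can be sketched in a few lines. This is an illustration, not our control firmware: the function names are hypothetical, and the 20C cutoff for passive cooling is an assumption taken from the low end of the typical Delta-T range quoted above. The hourly ambient temperatures are made-up sample values.

```python
def delta_t(coolant_inlet_c, ambient_c):
    """Temperature difference between dry cooler inlet and ambient air (C)."""
    return coolant_inlet_c - ambient_c


def fans_needed(coolant_inlet_c, ambient_c, min_passive_delta_c=20.0):
    """True when Delta-T is too small for natural convection alone.

    min_passive_delta_c is an assumed threshold, not a published spec.
    """
    return delta_t(coolant_inlet_c, ambient_c) < min_passive_delta_c


def fan_duty(coolant_inlet_c, hourly_ambient_c, min_passive_delta_c=20.0):
    """Fraction of the sampled hours in which the fans would have to run."""
    hours_on = sum(
        fans_needed(coolant_inlet_c, t, min_passive_delta_c)
        for t in hourly_ambient_c
    )
    return hours_on / len(hourly_ambient_c)


if __name__ == "__main__":
    # Hypothetical ambient readings (C): three mild hours, one extreme hour
    sample_hours = [22.0, 25.0, 28.0, 45.0]
    print("servers (50C inlet):", fan_duty(50.0, sample_hours))
    print("ASIC miners (60C inlet):", fan_duty(60.0, sample_hours))
```

Running a full year of local hourly weather data through `fan_duty` is one way to check a site against the ~15% fan usage target.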
3) Increase in Operating Life Expectancy of the Electronics
What ultimately causes electronics of all sorts to fail is heat. Obviously, when a device is run hotter than the manufacturer's specified operating range this can cause damage, so most chips go into thermal shutdown to prevent this. But it's actually not just heat that can destroy electronics; the constant cycling of heating and cooling is much more damaging. This is why managing airflow in the air-cooled data center is so critical. The reason this thermal cycling is damaging is that the components and materials on a circuit board have different coefficients of thermal expansion. This means that as the board is heated and cooled, the traces, components, solder joints, etc. all expand and contract at slightly different rates, putting stress on the connections between the components.
In September 2019, the University of Texas published a research paper that measured these factors and concluded that using Single-phase Liquid Immersion Cooling with ElectroCool decreased thermal cycling on the components, increased the flexibility of the circuit board, and therefore increased the reliability and resulting MTBF of the electronics. You can download a copy of the full research paper, "IMPACT OF IMMERSION COOLING ON THERMO-MECHANICAL PROPERTIES OF PCB’S AND RELIABILITY OF ELECTRONIC PACKAGES", from our website.
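To give a feel for why a smaller temperature swing matters, here is a rough first-order sketch of the expansion mismatch described in this section. It is not taken from the research paper. It assumes the unconstrained mismatch strain between two bonded materials is the difference in their expansion coefficients times the temperature swing. The coefficient values are approximate, commonly quoted figures for FR-4 laminate and silicon, and both temperature swings are hypothetical.

```python
def mismatch_strain(cte_a_ppm_per_c, cte_b_ppm_per_c, temp_swing_c):
    """First-order thermal mismatch strain between two bonded materials.

    strain = |alpha_a - alpha_b| * delta_T, with the coefficients of
    thermal expansion given in ppm per degree C.
    """
    return abs(cte_a_ppm_per_c - cte_b_ppm_per_c) * 1e-6 * temp_swing_c


# Approximate textbook values (assumptions, not measurements)
FR4_CTE_PPM = 16.0      # in-plane expansion of FR-4 board material
SILICON_CTE_PPM = 2.6   # silicon die

if __name__ == "__main__":
    # Hypothetical temperature swings per load cycle
    for label, swing_c in (("large swing", 40.0), ("small swing", 10.0)):
        strain = mismatch_strain(FR4_CTE_PPM, SILICON_CTE_PPM, swing_c)
        print(f"{label} ({swing_c:.0f}C): mismatch strain = {strain:.2e}")
```

In this simple model the strain per cycle scales linearly with the temperature swing, so cutting the swing by a factor of four cuts the strain on each joint by the same factor.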
The Impact of these Three Factors on Reliability and MTBF
When you combine the additive effects of these three factors, the result is an increase in reliability of more than 200%, with a resulting increase in system MTBF of as much as 50,000-60,000 hours.