
Liquid-Cooled Data Centers / IT Equipment #11408

@KWSL

General Summary

With the expansion of data centers due to AI development and the widespread adoption of liquid cooling for data centers, we request that the EnergyPlus team provide a way to directly model liquid-cooled IT equipment within EnergyPlus. This will allow practitioners to accurately assess electricity and water use, and to evaluate opportunities for heat recovery and cogeneration using data center waste heat.

Detailed Description

Context:

Data center capacity has more than tripled in the past four years and is expected to continue growing. Data center electrical use is also projected to double in the next five years. There are also significant concerns about the use of fresh water to provide data center cooling.

EnergyPlus currently has an ElectricEquipment:ITE:AirCooled object, which allows projects to model air-cooled electric information technology equipment which has variable power consumption as a function of loading and temperature. The variable electric load is a critical piece of the equation as computers will reduce their clock speed in response to high temperatures. There is currently no ability to directly model liquid-cooled IT equipment.
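
To illustrate the kind of load- and temperature-dependent power referred to above, here is a minimal Python sketch. It assumes the power modifier is a biquadratic curve in CPU loading and inlet air temperature. The coefficients, the 500 W design power, and the function names are invented for illustration; they are not taken from EnergyPlus or from any real server.

```python
def biquadratic(x, y, coeffs):
    """Evaluate c1 + c2*x + c3*x^2 + c4*y + c5*y^2 + c6*x*y."""
    c1, c2, c3, c4, c5, c6 = coeffs
    return c1 + c2 * x + c3 * x**2 + c4 * y + c5 * y**2 + c6 * x * y


# Hypothetical coefficients: x = CPU loading (0..1), y = inlet air temperature (C).
# Chosen only so that the modifier equals 1.0 at full load and 25 C.
HYPOTHETICAL_COEFFS = (0.1, 0.65, 0.0, 0.01, 0.0, 0.0)


def cpu_power_w(design_power_w, loading, inlet_temp_c, coeffs=HYPOTHETICAL_COEFFS):
    """Scale the design power by the curve output, never going below zero."""
    modifier = max(0.0, biquadratic(loading, inlet_temp_c, coeffs))
    return design_power_w * modifier


if __name__ == "__main__":
    for loading, temp in [(1.0, 25.0), (0.5, 25.0), (1.0, 35.0)]:
        print(loading, temp, round(cpu_power_w(500.0, loading, temp), 1))
```

A liquid-cooled equivalent would presumably need the same kind of curve, with the coolant supply temperature taking the place of the inlet air temperature.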

Most new data centers are using a combination of liquid and air to cool IT equipment, and major companies have shifted to manufacturing liquid-cooled chips, particularly for AI applications. This includes offerings from Microsoft (Azure), NVIDIA (RTX Pro and GB300 NVL72), Dell (PowerEdge XE9680L), HPE (Cray XD), etc.

Technology:
Liquid-cooled data centers is an umbrella term that encompasses the following technologies:

  1. Rear-Door Heat Exchangers (RDHx) – these are liquid cooling coils installed in the airflow of the computer’s internal cooling system. RDHx have two sub-categories:

    a. Active Systems – which include a dedicated cooling fan in addition to the computer’s internal cooling system
    b. Passive Systems – which do not include any dedicated fans
    RDHx systems generally are not capable of removing 100% of the heat generated, and an air-cooling system will also be required to handle the balance of the load.

  2. Direct-to-Chip (D2C) Cooling – takes liquid cooling one step further by attaching liquid-cooled plates directly to heat-generating components. D2C includes both cold plate technology and microfluidics. D2C cooling systems generally require supplemental cooling from an air system.

  3. Single-Phase Immersion Cooling – this is currently the most efficient option, where servers and components are submerged into a thermally conductive dielectric liquid. The liquid is cooled by a secondary cooling system. These systems eliminate the need for air cooling. At this time, immersion cooling is not used widely outside of experimental or laboratory settings.

  4. Two-Phase Cooling Systems – two-phase cooling systems use refrigerants to move heat away from the chip to a condenser. These systems can be immersion systems or use a heat exchanger, with an evaporator at the chip and a separate condenser that can reject heat to a free cooling source (such as outdoor air) or to a loop for heat recovery. Two-phase immersion systems do not require supplemental air cooling. Two-phase systems that use a condenser and evaporator generally need some supplemental cooling. At this time, immersion cooling is not used widely outside of experimental or laboratory settings.
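
As a compact summary of the list above, the sketch below encodes each technology with two flags drawn from the descriptions: whether it has a dedicated fan and whether it generally needs supplemental air cooling. The class and field names are hypothetical, and the boolean flattens "generally" into yes/no, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoolingTechnology:
    name: str
    dedicated_fan: bool            # True only for active RDHx in the list above
    needs_supplemental_air: bool   # "generally requires" is treated as True


TECHNOLOGIES = [
    CoolingTechnology("RDHx (active)", True, True),
    CoolingTechnology("RDHx (passive)", False, True),
    CoolingTechnology("Direct-to-chip", False, True),
    CoolingTechnology("Single-phase immersion", False, False),
    CoolingTechnology("Two-phase immersion", False, False),
    CoolingTechnology("Two-phase evaporator/condenser", False, True),
]


def air_free_options(technologies):
    """Return the names of technologies that need no supplemental air cooling."""
    return [t.name for t in technologies if not t.needs_supplemental_air]


if __name__ == "__main__":
    print(air_free_options(TECHNOLOGIES))
```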

Need:
Driven by AI, data center electricity consumption is expected to increase 3-fold in the next 5 years (https://www.iea.org/reports/energy-and-ai) and to represent about 10% of electricity demand growth by 2030. As data center electricity consumption grows, there is increasing interest in recovering waste heat generated by data centers for co-generation, absorption cooling, and heating of nearby facilities. Liquid-cooled ITE provides both more efficient cooling and the opportunity to effectively recover heat for secondary applications.

EP3 customers, universities and consulting clients are asking us to model data centers and to compare cooling systems for these data centers. These parties need to evaluate:

  1. The efficiency of the data centers themselves
  2. Heat recovery options
  3. Cogeneration options
  4. Water consumption

Currently, we can only achieve this with EnergyPlus through workarounds. We can model data center loads as a LoadProfile:Plant, but this does not give the ability to monitor chip performance or to reduce the load when temperatures get too high. Modeling data center loads in part as a LoadProfile:Plant also does not create the electric load for meter reporting. Approximations can be achieved through this ad-hoc method coupled with EMS or Python scripting, but this does not give consistent or validated results.
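
For context on the workaround, a plant load profile needs a heat load and a matching flow rate. The sketch below sizes that flow from the steady-state energy balance Q = ρ · V · c_p · ΔT. The water properties are assumed round values, and the function name and the 1 MW / 10 K example are illustrative only.

```python
WATER_DENSITY_KG_M3 = 998.0   # assumed, water near 20 C
WATER_CP_J_KG_K = 4186.0      # assumed specific heat of water


def required_flow_m3_s(heat_w, delta_t_k,
                       density=WATER_DENSITY_KG_M3, cp=WATER_CP_J_KG_K):
    """Volumetric flow needed to carry heat_w with a loop temperature rise delta_t_k."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_w / (density * cp * delta_t_k)


if __name__ == "__main__":
    # Example: 1 MW of liquid-cooled ITE heat with a 10 K temperature rise.
    flow = required_flow_m3_s(1.0e6, 10.0)
    print(f"{flow:.4f} m3/s")  # roughly 0.024 m3/s
```

Sizing the flow this way only covers the thermal side. The matching electric load still has to be entered separately so that it reaches the meters, which is the gap described above.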

As data center growth accelerates, and as interest in liquid cooling and heat recovery increases, there will be increasing need for simulation tools that can accurately model these systems.

We request that the EnergyPlus team provide the ability to:

  1. Define parameters around the size of the ITE load handled by liquid cooling vs. air cooling. This could be a percentage (or capacity) of the ITE load that is handled by the liquid-cooled system. We request that this percentage be driven by an equation. If it is a fixed number, we request that the input be given an EMS actuator.
  2. Define how remaining heat is removed, i.e., whether it is removed by standard HVAC equipment or handled using the equations and methodology established by ElectricEquipment:ITE:AirCooled.
  3. Allow liquid-cooled chips to be connected to the existing EnergyPlus PlantLoop architecture - this allows use of existing EnergyPlus architecture to explore opportunities for heat recovery, and calculate cooling system water use.
  4. Allow cooling at a wide range of temperatures (some experimental chips can currently be cooled at temperatures up to 86 °C - the EnergyPlus implementation should support exploration of high-temperature cooling and not be limited to current typical design conditions)
  5. Allow the energy modeler to analyze chip performance as a function of temperature as with the current ElectricEquipment:ITE:AirCooled object
  6. Allow specification of auxiliary electric power associated with the liquid cooling system. This would be used to model a localized fan, as in an Active Rear Door Heat Exchanger configuration, or localized circulation pumps.
  7. Ideally be flexible enough to allow simulation of future technology that operates at higher efficiencies or wider temperature ranges than the limits of current technology.
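
Requests 1, 2, and 6 can be sketched together as follows. This is a rough Python illustration, not a proposed EnergyPlus algorithm. The linear curve for the liquid-cooled fraction, its coefficients, and all names are assumptions, and the auxiliary power is counted only as electricity, with its heat left unassigned.

```python
def liquid_fraction(supply_temp_c, coeffs=(0.9, -0.005)):
    """Hypothetical linear curve f = a + b*T, clamped to the range [0, 1]."""
    a, b = coeffs
    return min(1.0, max(0.0, a + b * supply_temp_c))


def split_load(ite_power_w, supply_temp_c, aux_power_w=0.0):
    """Split ITE heat between the liquid loop and the air system.

    aux_power_w stands for a local fan or pump; here it only adds to the
    electric total (an assumption of this sketch).
    """
    fraction = liquid_fraction(supply_temp_c)
    to_liquid = fraction * ite_power_w
    return {
        "liquid_w": to_liquid,
        "air_w": ite_power_w - to_liquid,
        "electric_w": ite_power_w + aux_power_w,
    }


if __name__ == "__main__":
    print(split_load(1000.0, 20.0, aux_power_w=50.0))
```

A fixed fraction would be the special case where the slope coefficient is zero; that constant is the value an EMS actuator would override.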

Possible Implementation

Based on our experience creating the EP3 EnergyPlus UI, we recommend an implementation that:

  1. Creates two new objects:
    a. Coil:LiquidCooled:ITE – this would essentially be a liquid coil with inlet and outlet nodes – support for liquids with other heat transfer properties would be handled at the loop level with the fluid_type and user_defined_fluid_type fields. Just like a standard Coil:Cooling:Water, this object should address:
      i. Design fluid inlet and outlet temperatures
      ii. Design fluid flow rate
    b. Coil:LiquidCooled:ITE:TwoPhase – this object should allow the user to specify or select a phase change fluid, and specify the heat-rejection location – either plant nodes for liquid cooling, outdoor air, or evaporative cooling
    Of the two suggested new objects, Coil:LiquidCooled:ITE is the higher priority
  2. Expands ElectricEquipment:ITE:AirCooled to reference Coil:LiquidCooled:ITE or Coil:LiquidCooled:ITE:TwoPhase objects. This would be a set of fields referencing the LiquidCooled objects. Coils should be listed in the order in which they will be called to meet the ElectricEquipment:ITE load, with the air-cooled component acting last. Suggested set of repeating fields:
    a. LiquidCooled Coil Type
    b. LiquidCooled Coil Name
    c. LiquidCooled Coil flow control parameters – allow the modeler to specify how flow is modulated (if at all) – constant flow, variable flow, etc.
    d. LiquidCooled Coil capacity control parameters – I trust the EnergyPlus dev team with the details, but the idea is to capture the fact that liquid-cooling often does not capture 100% of the heat generated
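
The calling order described in item 2 could look roughly like the sketch below: each listed coil takes what it can from the heat presented to it, and the air-cooled component picks up the balance. The `LiquidCoil` class, its fields, and the rule that a coil removes the smaller of its capacity and a capture fraction of the remaining heat are all assumptions for illustration; the actual control logic would be up to the EnergyPlus developers.

```python
from dataclasses import dataclass


@dataclass
class LiquidCoil:
    name: str
    capacity_w: float        # maximum heat the coil can remove
    capture_fraction: float  # share of the heat presented to it that it can capture


def dispatch(ite_heat_w, coils):
    """Serve the ITE heat with coils in list order; air cooling takes the rest."""
    remaining = ite_heat_w
    served = {}
    for coil in coils:
        removed = min(coil.capacity_w, coil.capture_fraction * remaining)
        served[coil.name] = removed
        remaining -= removed
    served["air"] = remaining
    return served


if __name__ == "__main__":
    coils = [
        LiquidCoil("D2C cold plate", capacity_w=700.0, capture_fraction=0.8),
        LiquidCoil("RDHx", capacity_w=500.0, capture_fraction=0.5),
    ]
    print(dispatch(1000.0, coils))
```

In this example the cold plate is capacity-limited at 700 W, the rear-door coil then captures half of the remaining 300 W, and 150 W is left for the air system.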

Labels: NewFeatureRequest, Triage