Challenge:
Peak HPC workloads such as HPL pushed SuperMUC-NG's power consumption above 3 MW. LRZ was seeking a way to cap the system's power draw at 2 MW without sacrificing performance.
Solution:
LRZ deployed EAR to dynamically manage and limit energy usage across its high-density Intel CPU cluster. EAR’s smart power capping and runtime optimization features allowed LRZ to meet its energy goals while maintaining system throughput. Following the success of this first deployment, LRZ extended EAR to its second flagship cluster, SuperMUC-NG2, which integrates Intel Data Center GPUs (Ponte Vecchio) to accelerate scientific research at scale.
Results:
Peak power draw reduced by about 33% (over 3 MW → 2 MW)
Minimal performance degradation on large workloads
Transparent operation with no impact on user experience
Successful extension to GPU-accelerated workloads on SuperMUC-NG2
Ongoing collaboration with EAS for tuning and support
“We are fully satisfied that EAR meets our goal and by the quality of professional services that EAS is delivering.”
— Dr. Herbert Huber, Head of High Performance Systems Department, LRZ
About LRZ:
The Leibniz Supercomputing Centre (LRZ) is one of Europe’s leading HPC facilities and a pioneer in energy-efficient supercomputing. SuperMUC-NG, a 6480-node Intel cluster, delivers over 27 Petaflops of performance with advanced liquid cooling, making it one of the most sustainable systems in its class. Its follow-up system, SuperMUC-NG2, adds GPU acceleration with 240 nodes, each featuring 4 Intel Ponte Vecchio GPUs, and has been operational since October 2024.
Using EAR since: August 2019
Systems: SuperMUC-NG (Intel CPUs), SuperMUC-NG2 (Intel Data Center GPUs)