Friedrich-Alexander-Universität Boosts High-Performance Computing with xiRAID and Lustre Solution

September 25, 2024

Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), one of Germany’s leading research universities, has enhanced its high-performance computing capabilities by implementing xiRAID and Lustre technologies. This upgrade, facilitated by Xinnor and MEGWARE, has resulted in substantial performance gains for the GPU cluster “Alex”. Alex is a critical resource for research in artificial intelligence, machine learning, atomistic simulations and other advanced computing projects. It is located at the Erlangen National High Performance Computing Center (NHR@FAU), which supports technical and scientific research at the highest level.

The implementation addressed the performance challenges NHR@FAU faced with its previous CephFS system, particularly in write throughput and metadata operations. Comparative testing between the new xiRAID with Lustre solution (on four nodes) and the existing CephFS system (on three nodes) showed notable performance differences:

  • Write Throughput: Increased from 11.3 GB/s to 67 GB/s
  • Read Throughput: Improved from 23.4 GB/s to 90.6 GB/s

These results should be interpreted in the context of the specific testing environment:

  1. Number of servers: The xiRAID + Lustre configuration utilized four NVMe servers, while the CephFS setup used three.
  2. Network configuration: The four NVMe servers with xiRAID + Lustre were connected to eight clients via InfiniBand with an overall limit of 100 GB/s. The three NVMe servers with Ceph were connected to eight clients via Ethernet with a bandwidth limit of 25 GB/s, because CephFS does not support RDMA interconnects like InfiniBand.
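
To put the figures above in context, the short Python sketch below derives three ratios from the numbers quoted in this announcement: the overall speedup, the throughput per storage server, and the share of each network limit that was reached. The dictionary layout and function names are illustrative assumptions, not part of any benchmark tooling used at NHR@FAU, and the per-server figures assume throughput is spread evenly across servers.

```python
# Figures quoted in the announcement, all in GB/s.
# The structure and names below are illustrative, not taken from the benchmark setup.
RESULTS = {
    "CephFS": {"write": 11.3, "read": 23.4, "servers": 3, "network_limit": 25.0},
    "xiRAID + Lustre": {"write": 67.0, "read": 90.6, "servers": 4, "network_limit": 100.0},
}


def speedup(metric):
    """Ratio of xiRAID + Lustre throughput to CephFS throughput."""
    return RESULTS["xiRAID + Lustre"][metric] / RESULTS["CephFS"][metric]


def per_server(system, metric):
    """Throughput divided evenly by the number of storage servers (an assumption)."""
    return RESULTS[system][metric] / RESULTS[system]["servers"]


def network_utilization(system, metric):
    """Fraction of the stated network limit reached by the measured throughput."""
    return RESULTS[system][metric] / RESULTS[system]["network_limit"]


if __name__ == "__main__":
    for metric in ("write", "read"):
        print(f"{metric}: speedup {speedup(metric):.2f}x")
        for system in RESULTS:
            print(
                f"  {system}: {per_server(system, metric):.2f} GB/s per server, "
                f"{network_utilization(system, metric):.1%} of network limit"
            )
```

On these figures, the speedup is roughly 5.9x for writes and 3.9x for reads. CephFS reads reached about 94% of the 25 GB/s Ethernet limit, which suggests that part of the read gap may reflect the network rather than the storage software alone.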

The solution based on xiRAID + Lustre demonstrated considerable performance improvement, leveraging the advanced network infrastructure. The new system also provided additional benefits, including improved metadata performance and nearly double the usable storage capacity compared to the previous CephFS setup.

Prof. Gerhard Wellein, Head of NHR@FAU, commented, “The xiRAID and Lustre implementation has transformed our ability to handle complex AI and ML workloads. The dramatic increase in throughput and storage capacity allows our researchers to push the boundaries of what is possible in fields like molecular dynamics and gesture recognition.”

Davide Villa, CRO at Xinnor, stated, “We’re thrilled to see the impact of xiRAID at NHR@FAU. This implementation showcases how our technology can significantly enhance performance in demanding research environments, particularly when combined with Lustre for large-scale cluster computing.”

Markus Hilger, Senior HPC Engineer at MEGWARE, added, “As a leading system integrator, we’re proud to have recommended and facilitated this solution for NHR@FAU. The successful implementation underscores the importance of choosing the right technologies to meet the evolving needs of high-performance computing in academia.”

This customer success story was announced by Xinnor at the LAD24 conference in Paris. To share more about this case study, Xinnor and MEGWARE will host a webinar on October 2 at 3 pm CEST as part of the “Monthly Storage Talks” by the “Research Group High-Performance Storage”. For more information and to register for the webinar, please visit https://hps.vi4io.org/events/2024/mst.

For more information about this implementation, please visit the case study page: https://xinnor.io/partners-resellers/fau/