Case Studies

About University of Pisa

Established in 1343, the University of Pisa (UniPi) stands as a renowned public research institution in Pisa, Italy, boasting a rich legacy as one of Europe's oldest universities, and it is today one of the country's largest. UniPi garners international acclaim for its contemporary approach to advanced research across diverse fields encompassing natural and social sciences, humanities, medicine, engineering, agriculture, and applied sciences. UniPi also boasts a pioneering role in computer science, having built the CEP (Calcolatrice Elettronica Pisana), the first electronic computer designed and constructed in Italy, in the 1950s. Committed to fostering excellence in both research and teaching across all domains of knowledge, the university endeavors to broaden access to its programs for an expanding global cohort of students and scholars.

University of Pisa

Recognizing the pivotal role of cutting-edge IT infrastructure in maintaining its leadership position across scientific, mathematical, and engineering disciplines, UniPi has made significant investments in this realm. Recent efforts have underscored the university's commitment to not only supporting its academic community in research and development pursuits, but also extending assistance to Italian industries. This entails furnishing resources for the conceptualization and testing of pioneering solutions.

The University of Pisa is a longstanding partner of E4 Computer Engineering, the leading Italian system integrator in the HPC and AI markets. UniPi has relied on E4's expertise to create high-performance infrastructures for various projects, particularly in the realm of artificial intelligence. Using E4's USTI solution, the university has established a robust IT environment to support its researchers in developing AI models across different domains.

Challenge

Having inaugurated its Green Data Center in late 2016, the University of Pisa has been committed to leveraging cutting-edge technologies to drive digital innovation initiatives, particularly those demanding Big Data processing, advanced technologies, and proficiency in HPC and AI domains.

For this reason, UniPi deployed two NVIDIA DGX systems. NVIDIA DGX is a series of high-performance computing systems designed to accelerate AI workloads with powerful GPUs and an integrated software stack.

NVIDIA DGX systems create several challenges for the underlying storage, due to their immense processing power and focus on AI workloads:

  • High bandwidth demands: DGX systems equipped with powerful GPUs generate massive amounts of data during AI training and inference, requiring storage with high bandwidth to keep up with data flow.
  • Low latency needs: Real-time AI processing demands minimal delays in data access. Traditional storage solutions can introduce latency bottlenecks, hindering DGX performance.
  • Large dataset support: AI training often involves massive datasets. Storage needs to be scalable to accommodate these growing data volumes.
  • Random I/O patterns: AI workloads often involve frequent, unpredictable data access patterns, placing additional strain on storage systems compared to sequential reads/writes.

In short, NVIDIA DGX systems push storage performance and scalability to their limits; the storage layer must keep up for data-intensive AI tasks to run efficiently, as the sketch below illustrates.
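To give a sense of the bandwidth point, a back-of-the-envelope calculation shows how quickly training throughput turns into storage bandwidth. The figures in the sketch below (samples per second, sample size) are hypothetical and not measured on UniPi's systems:

```python
# Back-of-the-envelope estimate of the storage bandwidth an AI training
# job demands. All input figures are hypothetical placeholders.

def required_bandwidth_gbps(samples_per_sec: float, sample_size_mb: float) -> float:
    """Sustained read bandwidth (GB/s) needed to keep the GPUs fed."""
    return samples_per_sec * sample_size_mb / 1000.0

# e.g. a vision model consuming 20,000 images/s of ~1 MB each
print(f"{required_bandwidth_gbps(20_000, 1.0):.1f} GB/s")  # -> 20.0 GB/s
```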

Solution

Given the above-mentioned challenges, the university sought to enhance its storage infrastructure to meet escalating demands for speed, agility, and durability. Consequently, it opted to implement an all-NVMe storage system based on BeeGFS, a POSIX-compliant parallel file system developed by ThinkParQ.

Like all parallel file systems, BeeGFS requires block-level devices as backend storage. To obtain larger storage volumes, higher performance, and higher reliability, drives are usually combined into RAID arrays. But RAID engines require computational power to work.
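To illustrate where that computational cost comes from: on every write, a RAID 5/6 engine derives parity from all the data blocks in a stripe. The Python sketch below shows only the simple XOR-based P parity; a real RAID 6 engine also computes a second, Galois-field Q parity, and must do so at NVMe speeds for millions of blocks per second.

```python
# Minimal sketch of RAID P-parity: the XOR of all data blocks in a stripe.
# Real RAID 6 engines also compute a Galois-field Q-parity, which is why
# the RAID data path is computationally costly at NVMe speeds.

def p_parity(blocks: list[bytes]) -> bytes:
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

stripe = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
p = p_parity(stripe)

# Any single lost block can be rebuilt by XOR-ing the parity with the rest:
rebuilt = p_parity([p, stripe[1], stripe[2]])
assert rebuilt == stripe[0]
```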

Most modern RAID solutions were originally developed to cope with the limited performance of HDDs and cannot manage the high level of parallelism of modern NVMe SSDs, because of their data path or a limited number of PCIe lanes. GPU-based alternatives occupy additional PCIe slots, which are in high demand on parallel file system servers for NVMe drive connections, network interfaces and, if used, eBOF connections.

xiRAID, a modern software RAID from Xinnor, solves all these problems thanks to its unique data path: it provides a scalable, software-only RAID solution for NVMe that requires very little CPU power, allowing the parallel file system to reach its maximum performance.

For all these reasons, E4 recommended that UniPi implement xiRAID together with BeeGFS. The solution uses two identical servers with the same hardware and software configuration. Metadata targets are mirrored, while storage targets form two separate storage pools. This provides a higher level of redundancy by reducing the failure domain, and it also allows workload isolation, since the connecting clients are "only" two DGX systems: each client node accesses a different storage pool, formed by the storage targets belonging to only one of the BeeGFS storage nodes.

Since the management server runs on external hypervisor infrastructure, if one of the two nodes fails, operations can continue on the mount point served by the storage pool of the surviving node.

If a client holds its connection through the metadata service hosted on the failed node, restarting its beegfs-client service is enough to establish a new connection.
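The failover behavior can be summarized in a small sketch. In the illustrative Python model below, the node, pool, and client names are hypothetical placeholders rather than UniPi's actual identifiers; only the beegfs-client service name and the systemctl command are real.

```python
# Illustrative model of the described layout: mirrored metadata,
# two independent storage pools, one DGX client per pool.
# All names are hypothetical placeholders.
topology = {
    "metadata": {"mirrored_on": ["storage-node-1", "storage-node-2"]},
    "pool-1": {"targets_on": "storage-node-1", "client": "dgx-1"},
    "pool-2": {"targets_on": "storage-node-2", "client": "dgx-2"},
}

def surviving_pool(failed_node: str) -> str:
    """Return the pool that stays serviceable when one storage node fails."""
    for pool, cfg in topology.items():
        if pool != "metadata" and cfg["targets_on"] != failed_node:
            return pool

print(surviving_pool("storage-node-1"))  # -> pool-2

# On a client still bound to the metadata service of the failed node,
# re-establishing the connection is a service restart:
#   systemctl restart beegfs-client
```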

University of Pisa - Solution

Test results

Test setup:

Hardware:

  • Server Model: Supermicro SYS-120U-TNR
  • BIOS: 1.8 2023/11/22
  • CPU Model: Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz
  • CPU Count: 64 logical cores (with HT)
  • Memory Model: 36ASF4G72PZ-3G2R1 32GB DIMM DDR4 3200MT/s
  • Memory Count: 16
  • Drives: 10 x KIOXIA KCD8XVUG3T20
  • IB Controller Model: Mellanox Technologies MT28908 Family [ConnectX-6]
  • IB Controller FW: 20.39.3004

Software:

  • Operating System: Rocky Linux release 8.9 (Green Obsidian)
  • Kernel: 4.18.0-513.5.1.el8_9.x86_64
  • MOFED: 23.10-2.1.3.1
  • Xinnor xiRAID: 4.0.3
  • BeeGFS: 7.4.2

xiRAID was implemented at RAID level 6 (8+2), meaning that each server can survive the simultaneous failure of up to two drives without losing data.
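As a quick capacity check (assuming the 3.2 TB drive capacity implied by the KCD8XVUG3T20 model number), the 8+2 layout trades two drives per server for double-parity protection:

```python
# RAID 6 (8+2) usable capacity and parity overhead per server.
# The 3.2 TB drive capacity is an assumption based on the KCD8XVUG3T20 model.
drives, parity, drive_tb = 10, 2, 3.2

usable_tb = (drives - parity) * drive_tb  # 8 data drives per server
overhead = parity / drives                # fraction of capacity spent on parity

print(f"usable: {usable_tb:.1f} TB, parity overhead: {overhead:.0%}")
# -> usable: 25.6 TB, parity overhead: 20%
```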

The following table summarizes the tests conducted with 32, 64, and 128 processes (clients per node) on very large files (>23 GB). The mean values for the three tests are given below:

           32 tasks            64 tasks            128 tasks
       GB/s  Latency (ms)   GB/s  Latency (ms)   GB/s  Latency (ms)
Write  22.4  11.4           25.8  19.9           23.0  44.4
Read   29.2   8.7           29.2  17.6           27.0  37.9

Monitoring

E4 provided the University of Pisa with a storage infrastructure that included a comprehensive monitoring and alerting system for each component involved, including the Xinnor xiRAID engine.

E4 deploys this stack on any of its storage solutions in two different versions: Basic and Custom.

Features

  • Store, in real time, information, performance, and status metrics for all the components that make up the storage solution.
  • Retrieve and aggregate historical metrics and visualize the results of the aggregations.
  • Create alerts that can be routed to different users and destinations, such as e-mail, Slack, and PagerDuty.
  • Consolidate and visualize the logs coming from several cluster nodes in a comprehensive dashboard.

This is possible thanks to the integration of existing open-source software solutions, including Prometheus, Alertmanager, Grafana, and Grafana Loki. The software is installed and configured inside a virtual machine or on a physical node running in the customer environment.
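As an illustration, any metric collected by the stack can be retrieved through Prometheus's standard HTTP query API. In the minimal Python sketch below, the /api/v1/query endpoint is the documented Prometheus API, while the host name and the query expression are hypothetical placeholders, not UniPi's actual configuration.

```python
# Querying the monitoring stack via Prometheus's HTTP API.
# "prometheus.example.local" and the query expression are placeholders;
# the /api/v1/query endpoint itself is standard Prometheus.
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://prometheus.example.local:9090"
query = 'up{job="storage"}'  # hypothetical job label

url = f"{PROMETHEUS}/api/v1/query?" + urllib.parse.urlencode({"query": query})
with urllib.request.urlopen(url) as resp:
    result = json.load(resp)

# Print each matching time series with its latest value.
for series in result["data"]["result"]:
    print(series["metric"], series["value"])
```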

Architecture

University of Pisa - Monitoring

Conclusion

In conclusion, the storage solution proposed by E4, integrating xiRAID with BeeGFS, delivered remarkable performance and low latency, effectively meeting the demanding requirements of the University of Pisa's computational infrastructure. The monitoring tools provided by E4 further simplify deployment and management for an optimal user experience.