Current InfiniBand status
2020-11-21
Latest modification of this page: 25 November 2020
Current status of InfiniBand upgrades #
The recently installed Mellanox MSB7800 InfiniBand switch supports EDR 100 Gb/s speed.
Nodes cl1n005–cl1n010 and cl1n017–cl1n030 have new Mellanox ConnectX-5 adapters with EDR 100 Gb/s support.
Nodes cl1n001–cl1n004 and cl1n011–cl1n016 have older Mellanox ConnectX-3 adapters with QDR 40 Gb/s support.
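To verify which adapter generation and link speed a particular node actually reports, the standard tools can be run directly on the node. This is only a minimal sketch, assuming the usual pciutils and infiniband-diags utilities are installed on the nodes:

  # Show the Mellanox adapter model installed in this node (ConnectX-3 or ConnectX-5)
  lspci | grep -i mellanox
  # Show the active InfiniBand link rate: "Rate: 100" means EDR, "Rate: 40" means QDR
  ibstat | grep -E "CA|Rate"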
Nodes with the new adapters have higher priority in the Slurm system. You can also explicitly specify the list of nodes your job should run on with the -w
parameter, e.g., to run your job on the 4 nodes cl1n005–cl1n008 use the following command:
sbatch -p x12core -w cl1n[005-008] --nodes=4 --ntasks-per-node=24 ...
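The same node selection can also be written into a batch script instead of the sbatch command line. The following is a minimal sketch: only the partition x12core, the node list and the task counts come from the example above; the job name, time limit and application command are placeholders.

  #!/bin/bash
  #SBATCH -p x12core                 # partition from the example above
  #SBATCH -w cl1n[005-008]           # explicitly request the EDR (ConnectX-5) nodes
  #SBATCH --nodes=4
  #SBATCH --ntasks-per-node=24
  #SBATCH -J edr-job                 # placeholder job name
  #SBATCH -t 01:00:00                # placeholder time limit

  srun ./your_mpi_app                # placeholder application launch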
Known problems #
- Node cl1n001 was running at SDR speed, which is 4 times slower than QDR, and had the lowest priority in the Slurm system. Maximum speed on node cl1n001 was restored on November 25, 2020.
- Mixing nodes with different adapters (ConnectX-3 and ConnectX-5) may cause problems in MPI applications. The following environment variable may help when using the latest Intel MPI library:
export UCX_TLS=ud,sm,self
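For example, in a batch job that may land on a mix of ConnectX-3 and ConnectX-5 nodes, the variable can be exported before launching the MPI application. This is only a sketch; the node counts and the application name are placeholders:

  #!/bin/bash
  #SBATCH -p x12core
  #SBATCH --nodes=4
  #SBATCH --ntasks-per-node=24

  # Restrict UCX to transports that work across both adapter generations
  export UCX_TLS=ud,sm,self

  # Placeholder launch; the Intel MPI mpirun/mpiexec launchers can be used as well
  srun ./your_mpi_app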
This page will be updated.