GPU id

9/2/2023

There are two ways to allocate GPUs in Slurm: either the general --gres=gpu:N parameter, or the specific parameters like --gpus-per-task=N. There are also two ways to launch MPI tasks in a batch script: either using srun, or using the usual mpirun (when OpenMPI is compiled with Slurm support). I found some surprising differences in behaviour between these methods.

I'm submitting a batch job with sbatch where the basic script is the following:

```bash
#!/bin/bash

#SBATCH --job-name=sim_1          # job name (default is the name of this file)
#SBATCH --output=log.%x.job_%j    # file name for stdout/stderr (%x will be replaced with the job name, %j with the jobid)
#SBATCH --time=1:00:00            # maximum wall time allocated for the job (D-H:MM:SS)
#SBATCH --partition=gpXY          # put the job into the gpu partition
#SBATCH --exclusive               # request exclusive allocation of resources
#SBATCH --threads-per-core=1      # do not use hyperthreads (i.e. one thread per physical core)
#SBATCH --cpus-per-task=4         # number of CPUs per process
#SBATCH --ntasks-per-node=2       # MPI processes per node

## GPU allocation - variant A
#SBATCH --gres=gpu:2              # number of GPUs per node (gres=gpu:N)

## GPU allocation - variant B
##SBATCH --gpus-per-task=1        # number of GPUs per process
##SBATCH --gpu-bind=single:1      # bind each process to its own GPU (single:<tasks_per_gpu>)

# start the job in the directory it was submitted from
cd "$SLURM_SUBMIT_DIR"

# program execution - variant 1
mpirun ./sim

# program execution - variant 2
#srun ./sim
```

The #SBATCH options in the first block are quite obvious and uninteresting. Next, the behaviour I'll describe is observable only when the job runs on at least 2 nodes. I'm running 2 tasks per node, since we have 2 GPUs per node. Finally, there are two variants of GPU allocation (A and B) and two variants of program execution (1 and 2). Hence, 4 variants in total: A1, A2, B1, B2.

Variant A1 (--gres=gpu:2, mpirun)
Variant A2 (--gres=gpu:2, srun)

In both variants A1 and A2, the job executes correctly and with optimal performance; we have the following output in the log:

```
Rank 0: rank on node is 0, using GPU id 0 of 2, CUDA_VISIBLE_DEVICES=0,1
Rank 1: rank on node is 1, using GPU id 1 of 2, CUDA_VISIBLE_DEVICES=0,1
Rank 2: rank on node is 0, using GPU id 0 of 2, CUDA_VISIBLE_DEVICES=0,1
Rank 3: rank on node is 1, using GPU id 1 of 2, CUDA_VISIBLE_DEVICES=0,1
```

Variant B1 (--gpus-per-task=1, --gpu-bind=single:1, mpirun)

The job is not executed correctly; the GPUs are not mapped correctly, due to CUDA_VISIBLE_DEVICES=0 on the second node:

```
Rank 0: rank on node is 0, using GPU id 0 of 2, CUDA_VISIBLE_DEVICES=0,1
Rank 1: rank on node is 1, using GPU id 1 of 2, CUDA_VISIBLE_DEVICES=0,1
Rank 2: rank on node is 0, using GPU id 0 of 1, CUDA_VISIBLE_DEVICES=0
Rank 3: rank on node is 1, using GPU id 0 of 1, CUDA_VISIBLE_DEVICES=0
```

Variant B2 (--gpus-per-task=1, --gpu-bind=single:1, srun)

Note that this variant behaves the same with and without --gpu-bind=single:1.
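For anyone trying to reproduce this: the rank/GPU lines above are printed by the program itself, but you can get the same information without it by echoing Slurm's per-task environment variables. Here is a minimal sketch, assuming bash and the standard SLURM_PROCID / SLURM_LOCALID variables (the script name check_gpus.sh is mine):

```bash
#!/bin/bash
# check_gpus.sh - print what each task is given; launch with: srun ./check_gpus.sh
# SLURM_PROCID is the global task rank, SLURM_LOCALID is the rank on the node.
echo "Rank ${SLURM_PROCID}: rank on node is ${SLURM_LOCALID}, CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
```

Under mpirun the Slurm variables are not populated per MPI rank in the same way; OpenMPI exports OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_LOCAL_RANK instead, so substitute those when testing variant 1.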
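And in case it helps someone hitting the B1 problem: since variant A allocates correctly, one possible workaround is to keep --gres=gpu:2 and do the per-task GPU binding yourself from the local rank, via a small wrapper around the program. This is only a sketch of the idea, not a fix for the underlying behaviour (the wrapper name bind_gpu.sh is mine):

```bash
#!/bin/bash
# bind_gpu.sh - pin each MPI task to one GPU based on its local rank (sketch).
# srun sets SLURM_LOCALID; OpenMPI's mpirun sets OMPI_COMM_WORLD_LOCAL_RANK.
local_rank="${SLURM_LOCALID:-${OMPI_COMM_WORLD_LOCAL_RANK:?no local rank found}}"
export CUDA_VISIBLE_DEVICES="$local_rank"   # local rank 0 -> GPU 0, local rank 1 -> GPU 1
exec ./sim "$@"
```

Launched as mpirun ./bind_gpu.sh or srun ./bind_gpu.sh under the variant A allocation, each task then sees exactly one GPU, which gives the same mapping as the correct A1/A2 output above.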