Step 11: Set up Slurm on nodes
Prerequisites
Provide the Slurm 25.05.2 user repository.
Fill the mandatory parameters in omnia_config.yml: Input parameters for the cluster
Fill the parameters in storage_config.yml: Input parameters for the cluster
Add slurm_custom to software_config.json and add slurm_custom subgroups.
Add slurm_custom repository URL to user_repo_url_x86_64 or user_repo_url_aarch64 in local_repo_config.yml.
Setup Slurm:
To download the artifacts required to set up Slurm on the nodes, run the local_repo.yml playbook.
To build diskless images for cluster nodes, run build_image_x86_64.yml or build_image_aarch64.yml: Build cluster node images
To discover the potential cluster nodes and configure the boot script and cloud-init based on the functional groups, run the discovery.yml playbook: Discover cluster nodes
After the discovery.yml playbook completes successfully, you can PXE boot the Slurm node, login node, and login compiler node simultaneously.
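The playbook sequence above can be sketched as a small shell script. This is an illustration only: ARCH and DRY_RUN are hypothetical variables introduced for this example, and the script assumes it is run from a directory where each playbook file is reachable by name, so adjust the paths to match your Omnia checkout.

```shell
#!/bin/sh
# Sketch only: run the Slurm setup playbooks in the documented order.
# ARCH and DRY_RUN are illustrative variables, not Omnia parameters.
set -eu

ARCH="${ARCH:-x86_64}"     # use aarch64 for ARM nodes
DRY_RUN="${DRY_RUN:-1}"    # 1 = print the commands instead of running them

run_setup() {
  # Order: 1) download artifacts, 2) build diskless images, 3) discover nodes
  for pb in local_repo.yml "build_image_${ARCH}.yml" discovery.yml; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "ansible-playbook $pb"
    else
      ansible-playbook "$pb"   # set -e aborts the sequence on the first failure
    fi
  done
}

run_setup
```

With the default DRY_RUN=1 the script only prints the three commands, which makes it easy to confirm the order before running anything against the cluster.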
Note
If you want to deploy only Slurm clusters (slurm_custom), the idrac_telemetry_support parameter must be set to false in the telemetry_config.yml file. Omnia is validated for Slurm version 25.05. If you use any other version, some functionality, such as PAM, may not work.
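As a rough pre-flight check for the note above, the following sketch tests whether idrac_telemetry_support is set to false. The function name is hypothetical, the check assumes the parameter is written as an unquoted key: value pair on one line, and the demonstration runs against a throwaway file, not your real input directory.

```shell
#!/bin/sh
# Sketch: confirm idrac_telemetry_support is false before a Slurm-only deployment.
# The function name and the sample file are illustrative.
idrac_telemetry_disabled() {
  # $1 = path to telemetry_config.yml; succeeds only when the value is false
  grep -Eq '^[[:space:]]*idrac_telemetry_support:[[:space:]]*false[[:space:]]*(#.*)?$' "$1"
}

# Demonstration against a temporary file
sample="$(mktemp)"
printf 'idrac_telemetry_support: false\n' > "$sample"
if idrac_telemetry_disabled "$sample"; then
  echo "ok: iDRAC telemetry is disabled"
else
  echo "error: set idrac_telemetry_support to false" >&2
fi
rm -f "$sample"
```

To use it on a real deployment, pass the path of your telemetry_config.yml to the function in place of the temporary file.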
Slurm with GPU:
Prerequisites
You must have a user_repo that is compiled with nvml and cgroup-v2.
If the Slurm nodes have GPUs, you must provide at least one login_compiler_node.
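One way to sanity-check an installed Slurm build against these prerequisites is to look for the NVML GPU plugin and the cgroup v2 plugin in the Slurm plugin directory. The sketch below assumes the plugins are named gpu_nvml.so and cgroup_v2.so and live in /usr/lib64/slurm; the directory varies by build, and SLURM_PLUGIN_DIR is a hypothetical variable used only in this example.

```shell
#!/bin/sh
# Sketch: check a Slurm plugin directory for the NVML GPU plugin and the cgroup v2 plugin.
# The default path /usr/lib64/slurm is an assumption; it varies by build.
has_gpu_plugins() {
  dir="${1:-/usr/lib64/slurm}"
  [ -f "$dir/gpu_nvml.so" ] && [ -f "$dir/cgroup_v2.so" ]
}

if has_gpu_plugins "${SLURM_PLUGIN_DIR:-/usr/lib64/slurm}"; then
  echo "gpu_nvml.so and cgroup_v2.so found"
else
  echo "one or both plugins are missing from the plugin directory"
fi
```

A missing plugin suggests the packages in the user repository were built without the corresponding support, but treat the result as a hint only, since plugin locations differ between builds.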
Note
If the iDRAC of a Slurm node is not accessible through OIM—because of issues such as an incorrect iDRAC port configuration or invalid credentials—the node configuration specified in /etc/slurm/slurm.conf for NodeName will default to: Sockets=1 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=3774873. Update slurm.conf with the correct hardware values and run scontrol reconfigure to apply the changes.
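The correction described in this note can be sketched as follows. The helper node_line and the node name and hardware values passed to it are hypothetical; take the real values from the node itself, for example from the output of slurmd -C, which prints the detected hardware in slurm.conf format.

```shell
#!/bin/sh
# Sketch: build a corrected NodeName line for /etc/slurm/slurm.conf.
# node_line is an illustrative helper; the values below are placeholders.
node_line() {
  # $1=node name  $2=Sockets  $3=CoresPerSocket  $4=ThreadsPerCore  $5=RealMemory (MB)
  printf 'NodeName=%s Sockets=%s CoresPerSocket=%s ThreadsPerCore=%s RealMemory=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Placeholder values for a hypothetical node
node_line node001 2 32 2 515000

# On the cluster (left as comments because they need a running Slurm installation):
#   slurmd -C                    # on the compute node: print detected hardware
#   scontrol reconfigure         # after editing slurm.conf: apply the changes
#   scontrol show node node001   # confirm the values Slurm now reports
```

Replace the defaulted NodeName entry in slurm.conf with the generated line, then run scontrol reconfigure as described above.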