Re-provisioning the Cluster
In the event that an existing Omnia cluster needs a fresh installation, the cluster can be re-provisioned.
If you are deploying the service Kubernetes cluster or the Slurm cluster afresh, ensure that the NFS server share path used by the Slurm or service Kubernetes cluster is cleared manually. The NFS details are available in storage_config.yml.
To reuse the same server_share_path and client_share_path, do the following:
Power off all servers except the OIM.
From the OIM, go to the client_share_path and delete all contents in the respective client share path.
Run the provision.yml playbook.
PXE boot the required nodes to be reprovisioned.
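The deletion step above can be sketched as a small shell helper. This is only an illustration, not part of Omnia: the function name clear_share_path and the example path are hypothetical, and the real path is whatever client_share_path is set to in storage_config.yml. The helper empties the directory but keeps the directory itself, and refuses to run on an empty argument, on /, or on anything that is not a directory.

```shell
# Sketch only: empty a share directory without removing the directory itself.
# clear_share_path is a hypothetical helper, not an Omnia command.
clear_share_path() {
    local share_path="$1"

    # Guard against an unset variable, the filesystem root, or a wrong path.
    if [ -z "$share_path" ] || [ "$share_path" = "/" ] || [ ! -d "$share_path" ]; then
        echo "refusing to clear '$share_path'" >&2
        return 1
    fi

    # -mindepth 1 skips the directory itself; -delete removes everything
    # beneath it (hidden files included), deepest entries first.
    find "$share_path" -mindepth 1 -delete
}

# Example call with a hypothetical path; substitute your own client_share_path:
# clear_share_path /mnt/omnia_client_share
```

Run the helper on the OIM only after the other servers are powered off, so that no node is still writing to the share.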
To use the new server_share_path and client_share_path, do the following:
Run the provision.yml playbook.
PXE boot the required nodes to be reprovisioned.
Re-provision Existing Nodes without Any Modifications
To re-provision the existing nodes without any modifications, PXE boot the required nodes to be reprovisioned.
The OS is automatically installed on every PXE boot if there are no modifications to the cluster.
Re-provision the Nodes with Modifications
Update the mapping file (for mapping file discovery) or ensure nodes are configured in OME (for OME-based provisioning), and update software_config.json as required.
If software_config.json is modified, run the local_repo.yml playbook, and then run the build_image_x86_64.yml or build_image_aarch64.yml playbook to build the new images. For more information, see Execute the Local Repo Playbook.
After the images are created, run the provision.yml playbook. For OME-based provisioning, use:
    ssh omnia_core
    cd /omnia/provision
    ansible-playbook provision.yml
For more information, see Provision the Cluster Nodes.
PXE boot the required nodes to be reprovisioned.
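The playbook order described above can be summarized in a short shell sketch. It only prints the sequence and runs nothing. The function names build_image_playbook and reprovision_plan are hypothetical, and the sketch assumes the image-build playbook is chosen by the architecture of the nodes being reprovisioned; the playbook file names are the ones given in the steps.

```shell
# Sketch only: these helpers are hypothetical and print names, they run nothing.

# Map a target node architecture to the image-build playbook named in the steps.
build_image_playbook() {
    case "$1" in
        x86_64)  echo "build_image_x86_64.yml" ;;
        aarch64) echo "build_image_aarch64.yml" ;;
        *)       echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Print the playbooks, in order, for a re-provision after software_config.json
# has changed: local repo first, then the image build, then provisioning.
reprovision_plan() {
    local image_playbook
    image_playbook="$(build_image_playbook "$1")" || return 1
    printf '%s\n' "local_repo.yml" "$image_playbook" "provision.yml"
}

# Example: list the playbooks for aarch64 nodes.
# reprovision_plan aarch64
```

Each printed name would then be run with ansible-playbook from the directory that holds that playbook.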
Note
The entire cluster needs to be reprovisioned if you want to reprovision the Slurm Control node and Kube Control Plane.