Kubernetes

⦾ Why do Kubernetes Pods show ImagePullBackOff or ErrImagePull errors in their status?

Potential Cause: These errors occur when the Docker Hub image pull rate limit is exceeded.

Resolution:

Ensure that the docker_username and docker_password are provided in /opt/omnia/input/project_default/omnia_config_credentials.yml.

During omnia.yml execution, a Kubernetes secret Docker regcred is created in the default namespace and patched to the Docker service account. To avoid the ErrImagePull issue when deploying custom applications, patch this secret to your namespace and reference it under imagePullSecrets in the application's YAML file. Click here for more info.
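As a rough sketch of how a custom application might reference the secret, the shell function below prints a Pod manifest with an imagePullSecrets entry. The function name render_pod_manifest, the pod name custom-app, the nginx image, and the namespace and secret names passed in are illustrative assumptions, not values defined by Omnia. Substitute the secret that actually exists in your namespace.

```shell
#!/usr/bin/env bash
# Print a minimal Pod manifest that references an image pull secret.
# All names below (custom-app, the image, the arguments) are placeholders.
render_pod_manifest() {
  local namespace="$1" secret_name="$2"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: custom-app
  namespace: ${namespace}
spec:
  containers:
    - name: custom-app
      image: docker.io/library/nginx:latest
  imagePullSecrets:
    - name: ${secret_name}
EOF
}

# Example with hypothetical namespace and secret names.
render_pod_manifest "my-namespace" "my-pull-secret"
```

Once the secret is present in the target namespace, the output can be piped to kubectl apply -f - to create the pod.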

Note

If the playbook has already been executed and the pods are in the ImagePullBackOff state, run kubeadm reset -f on all the nodes before re-executing the playbook with the Docker credentials.

⦾ What to do if the nodes in a Kubernetes cluster reboot?

Resolution: Wait for 15 minutes after the Kubernetes cluster reboots. To verify the status of the cluster nodes, run the following commands from the kube_control_plane:

To get the real-time Kubernetes cluster status, run:
```
kubectl get nodes
```
To check which pods are in the Running state, run:
```
kubectl get pods --all-namespaces
```
To verify that both the Kubernetes master and kubeDNS are in the Running state, run:
```
kubectl cluster-info
```
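To make the node check easier to scan after a reboot, the output of kubectl get nodes can be filtered for nodes that are not yet Ready. The sketch below is one possible way to do that. The function name not_ready_nodes and the sample node names are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Read `kubectl get nodes --no-headers` output on stdin and print the names
# of nodes whose STATUS column is not exactly "Ready". A cordoned node
# ("Ready,SchedulingDisabled") is therefore also reported.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Demonstration with sample output (hypothetical node names).
not_ready_nodes <<'EOF'
node1   Ready      control-plane   10d   v1.29.0
node2   NotReady   <none>          10d   v1.29.0
EOF
```

On a live cluster, the same filter would be used as kubectl get nodes --no-headers | not_ready_nodes. Empty output means every node reports Ready.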

⦾ What to do when the Kubernetes services are not in Running state?

Resolution:

Run kubectl get pods --all-namespaces to get the status of all the pods.
If any pods are not in the Running state, delete them using the command: kubectl delete pods <name of pod>
Re-run the omnia.yml playbook to redeploy Kubernetes and bring up the previously failed pods.
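When many pods are affected, the delete commands can be generated from the pod listing instead of being typed by hand. The following is a minimal sketch under the assumption that pods in the Completed state should be left alone. The function name delete_commands_for_failed_pods and the sample pod names are hypothetical. It only prints the commands so they can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Read `kubectl get pods --all-namespaces --no-headers` output on stdin
# (columns: NAMESPACE NAME READY STATUS ...) and print one delete command
# for each pod that is neither Running nor Completed. Nothing is deleted here.
delete_commands_for_failed_pods() {
  awk '$4 != "Running" && $4 != "Completed" {
    print "kubectl delete pod " $2 " -n " $1
  }'
}

# Demonstration with sample output (hypothetical pod names).
delete_commands_for_failed_pods <<'EOF'
kube-system   coredns-abc   1/1   Running            0   5d
default       web-1         0/1   ImagePullBackOff   0   2m
EOF
```

On a live cluster, the input would come from kubectl get pods --all-namespaces --no-headers.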

⦾ If the DNS servers are unresponsive, the Kubernetes pods stop communicating with the servers.

Potential Cause: The host network is faulty, causing DNS to be unresponsive.

Resolution:

In your Kubernetes cluster, run kubeadm reset -f on all the nodes.
On the management node, edit the omnia_config.yml file to change the Kubernetes Pod Network CIDR. The suggested IP range is 192.168.0.0/16. Ensure that the range provided does not overlap with any range in use on your host network.
List k8s in input/software_config.json and re-run omnia.yml.
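Before choosing a Pod Network CIDR, it can help to check that the candidate range does not overlap with the host network. The sketch below shows one way to do this in plain bash for IPv4 ranges. The function names and the host range 10.5.0.0/16 are assumptions for illustration, and the input is expected to be well-formed CIDR notation.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print the "first last" integer bounds of a CIDR block such as 192.168.0.0/16.
cidr_range() {
  local ip="${1%/*}" prefix="${1#*/}"
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  local first=$(( $(ip_to_int "$ip") & mask ))
  echo "$first $(( first | (~mask & 0xFFFFFFFF) ))"
}

# Return 0 (true) when the two CIDR blocks share at least one address.
cidrs_overlap() {
  local a_first a_last b_first b_last
  read -r a_first a_last <<< "$(cidr_range "$1")"
  read -r b_first b_last <<< "$(cidr_range "$2")"
  (( a_first <= b_last && b_first <= a_last ))
}

# Example: suggested pod CIDR against a hypothetical host network.
if cidrs_overlap "192.168.0.0/16" "10.5.0.0/16"; then
  echo "overlap: choose a different Pod Network CIDR"
else
  echo "no overlap"
fi
```

Two ranges overlap exactly when each one starts at or before the point where the other ends, which is the comparison made in cidrs_overlap.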

⦾ Why does the TASK: Initialize Kubeadm fail with nodeRegistration.name: Invalid value: "<Host name>" error?

Potential Cause: The OIM does not support hostnames that contain an underscore, such as ‘mgmt_station’.

Resolution: Ensure that the OIM hostname meets the following requirements:

Hostname should not contain the following characters: , (comma), . (period), _ (underscore).

Hostname cannot start or end with a hyphen (-).

No upper case characters are allowed in the hostname.

Hostname cannot start with a number.

Hostname and domain name (hostname00000x.domain.xxx) cumulatively cannot exceed 64 characters.
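The rules above can be checked before running the playbook. The following bash function is a minimal sketch of such a check, not part of Omnia. The name validate_hostname is hypothetical, and it assumes the 64-character limit applies to the hostname and domain joined by a period.

```shell
#!/usr/bin/env bash
# Check a hostname (and optional domain) against the listed rules.
# Prints the first violated rule and returns 1, or prints "ok" and returns 0.
validate_hostname() {
  local host="$1" domain="${2:-}"
  local fqdn="$host${domain:+.$domain}"

  if [[ -z "$host" ]]; then
    echo "hostname is empty"; return 1
  fi
  if [[ "$host" == *[,._]* ]]; then
    echo "contains a comma, period, or underscore"; return 1
  fi
  if [[ "$host" == -* || "$host" == *- ]]; then
    echo "starts or ends with a hyphen"; return 1
  fi
  if [[ "$host" == *[[:upper:]]* ]]; then
    echo "contains an upper case character"; return 1
  fi
  if [[ "$host" == [[:digit:]]* ]]; then
    echo "starts with a number"; return 1
  fi
  # Assumption: the limit counts the period between hostname and domain.
  if (( ${#fqdn} > 64 )); then
    echo "hostname and domain name exceed 64 characters"; return 1
  fi
  echo "ok"
}

# Examples with hypothetical names.
validate_hostname "node01" "example.com"
validate_hostname "mgmt_station" || true
```

The function stops at the first failed rule so the message points at one concrete problem to fix.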

⦾ What to do if omnia.yml playbook execution fails with MetalLB, a load-balancer for bare metal Kubernetes cluster?

Potential Cause: This failure is caused by an issue with Kubespray, a third-party software. For more information about this issue, click here.

Resolution: If omnia.yml playbook execution fails while waiting for the MetalLB controller to be up and running, wait for the MetalLB pods to reach the Running state, and then re-run omnia.yml/scheduler.yml.

⦾ Why does the NFS-client provisioner go to a ContainerCreating or CrashLoopBackOff state?

[Image: NFS-client provisioner in ContainerCreating state]

[Image: NFS-client provisioner in CrashLoopBackOff state]

Potential Cause: This issue usually occurs when the server_share_path given in storage_config.yml for k8s_share points to a server that does not have an NFS server running.

Resolution:

Ensure that server_share_path mentioned in storage_config.yml for k8s_share: true has an active nfs_server running on it.

⦾ If the NFS-client provisioner is in ContainerCreating or CrashLoopBackOff state, why does the kubectl describe pod <pod_name> command show the following output?

Potential Cause: This is a known issue. For more information, click here.

Resolution:

Wait for some time for the pods to come up, or do the following:
Run the following command to delete the pod:
kubectl delete pod <pod_name> -n <namespace>
After deletion, the pod is restarted and returns to the Running state.

⦾ Kubernetes workloads are unable to resolve the PowerScale SmartConnect hostname (e.g., management.ps.com) from within the cluster.

Potential Cause: The SmartConnect hostname is not resolvable by the Kubernetes cluster’s internal DNS (CoreDNS). This typically happens when:

- CoreDNS is unaware of the external DNS zone used by PowerScale.
- The SmartConnect service IP or hostname is not defined in CoreDNS or the upstream DNS servers.

Resolution:

Step 1 — Identify the SmartConnect Hostname and IP

In the PowerScale UI, go to:
Cluster Management → Network Configuration → Subnets → <Your Subnet Name>

Note the following details:

SmartConnect Service Name: e.g., management.ps.com

SmartConnect IP Address: e.g., 10.x.x.x

Step 2 — Update the CoreDNS ConfigMap

1. On a control-plane node, edit the CoreDNS ConfigMap:
kubectl -n kube-system edit configmap coredns

2. Locate the Corefile: section and add a hosts block before the forward or proxy section. Example:
hosts {
10.x.x.x management.ps.com
fallthrough
}
Replace 10.x.x.x with your actual PowerScale DNS IP. You can find the DNS IP in the file /opt/omnia/input/project_default/network_spec.yml, under the [dns] field.

Step 3 — Restart CoreDNS Pods

Apply the changes by restarting CoreDNS:
kubectl -n kube-system rollout restart deployment coredns
Verify the CoreDNS pods are running:
kubectl -n kube-system get pods -l k8s-app=kube-dns

Step 4 — Validate DNS Resolution

Launch a temporary pod to test name resolution:
kubectl run -it dns-test --image=busybox --restart=Never -- sh
Inside the pod shell, test DNS:
nslookup management.ps.com
Expected Output: Server: 10.x.x.x Address 1: management.ps.com
