Step 8: Configure Telemetry Requirements
Omnia enables telemetry collection in HPC environments using both iDRAC Telemetry and LDMS (Lightweight Distributed Metric Service). Telemetry components are provisioned dynamically with a stateless provisioning tool, which allows flexible deployment and simplifies lifecycle management.
iDRAC Telemetry provides out-of-band system metrics from Dell servers, including power, thermal, and hardware health information. The iDRAC Telemetry data can be collected and streamed to Kafka or VictoriaMetrics, depending on the deployment needs.
LDMS Telemetry collects in-band performance metrics such as CPU, memory, network, and I/O statistics from compute nodes. The LDMS Telemetry data can be collected and streamed to Kafka.
Note
Ensure that the service_k8s entry is listed in the software_config.json file when idrac_telemetry_support is set to true in the telemetry_config.yml file.
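A small pre-flight check can catch a mismatch between these two files before deployment. The following Python sketch is illustrative only and is not part of Omnia. It assumes that software_config.json keeps its entries in a "softwares" list whose items carry a "name" field, and that telemetry_config.yml has already been loaded into a dictionary (for example, with a YAML parser such as PyYAML). The helper name check_telemetry_prereqs is hypothetical.

```python
import json


def check_telemetry_prereqs(telemetry_cfg: dict, software_cfg: dict) -> list:
    """Return a list of problems; an empty list means the prerequisite is met.

    Hypothetical helper. Assumes software_config.json has a "softwares" list
    whose entries carry a "name" key (verify against your Omnia version).
    """
    problems = []
    # The rule only applies when iDRAC telemetry is enabled.
    if telemetry_cfg.get("idrac_telemetry_support") is True:
        names = {entry.get("name") for entry in software_cfg.get("softwares", [])}
        if "service_k8s" not in names:
            problems.append(
                "idrac_telemetry_support is true in telemetry_config.yml, "
                "but service_k8s is not listed in software_config.json"
            )
    return problems


if __name__ == "__main__":
    # Illustrative inputs only; real configuration files contain more keys.
    software_cfg = json.loads('{"softwares": [{"name": "slurm"}]}')
    telemetry_cfg = {"idrac_telemetry_support": True}
    for problem in check_telemetry_prereqs(telemetry_cfg, software_cfg):
        print("ERROR:", problem)
```

In practice, telemetry_cfg would come from parsing telemetry_config.yml and software_cfg from json.load on software_config.json.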
Omnia Telemetry Architecture
Omnia collects telemetry data from HPC cluster nodes using LDMS for OS-level metrics and iDRAC for hardware telemetry.
The following diagram illustrates the telemetry services that can be deployed using Omnia and the data flow between the components.
Telemetry Components
The following components are involved in the telemetry services deployed by Omnia:
OIM (Omnia Infrastructure Manager)
Central management node that deploys and configures all telemetry services across the cluster.
Service Kubernetes Cluster
Hosts telemetry collection and storage services:
LDMS Aggregator – Receives metrics from Slurm compute node samplers
LDMS Store – Stores aggregated LDMS data
iDRAC Collector – Collects hardware telemetry via the Redfish API
Kafka Broker – Streams telemetry data
VMAgent – Forwards metrics to VictoriaMetrics
VictoriaMetrics – Time-series database for metric storage
Slurm Cluster
Each Slurm compute node runs:
LDMS Sampler – Collects OS metrics (CPU, memory, network, and I/O)
iDRAC – Provides hardware health data (temperature, power, and fans)
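To make the placement of each component explicit, the inventory above can be expressed as a small data structure. This Python sketch only transcribes the list; the Component class and the components_by_location helper are hypothetical and not part of Omnia.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    location: str  # where the component runs, as described in the list above
    role: str


# Hypothetical inventory transcribed from the Telemetry Components list.
COMPONENTS = [
    Component("OIM", "OIM", "Deploys and configures all telemetry services"),
    Component("LDMS Aggregator", "Service Kubernetes Cluster",
              "Receives metrics from Slurm compute node samplers"),
    Component("LDMS Store", "Service Kubernetes Cluster", "Stores aggregated LDMS data"),
    Component("iDRAC Collector", "Service Kubernetes Cluster",
              "Collects hardware telemetry via the Redfish API"),
    Component("Kafka Broker", "Service Kubernetes Cluster", "Streams telemetry data"),
    Component("VMAgent", "Service Kubernetes Cluster", "Forwards metrics to VictoriaMetrics"),
    Component("VictoriaMetrics", "Service Kubernetes Cluster",
              "Time-series database for metric storage"),
    Component("LDMS Sampler", "Slurm compute node", "Collects OS metrics"),
    # iDRAC is the node's BMC, so it is grouped with the compute node here.
    Component("iDRAC", "Slurm compute node", "Provides hardware health data"),
]


def components_by_location(components):
    """Group component names by where they run, preserving list order."""
    grouped = defaultdict(list)
    for component in components:
        grouped[component.location].append(component.name)
    return dict(grouped)


if __name__ == "__main__":
    for location, names in components_by_location(COMPONENTS).items():
        print(f"{location}: {', '.join(names)}")
```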
iDRAC and LDMS Telemetry Data Flows
LDMS Flow (OS Metrics)
Slurm Compute Nodes (LDMS Sampler) → LDMS Aggregator → LDMS Store → Kafka
iDRAC Flow (Hardware Metrics)
iDRAC (BMC) → iDRAC Collector → Kafka
iDRAC (BMC) → iDRAC Collector → VMAgent → VictoriaMetrics
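The data flows above form a small directed graph, so each end-to-end path can be enumerated mechanically. The following Python sketch transcribes the documented edges and lists every path from a source to a sink (a node with no outgoing edges). The node labels and helper names are illustrative assumptions, not Omnia identifiers.

```python
from collections import defaultdict

# Edges transcribed from the LDMS and iDRAC flows: (upstream, downstream).
EDGES = [
    ("LDMS Sampler", "LDMS Aggregator"),
    ("LDMS Aggregator", "LDMS Store"),
    ("LDMS Store", "Kafka"),
    ("iDRAC (BMC)", "iDRAC Collector"),
    ("iDRAC Collector", "Kafka"),
    ("iDRAC Collector", "VMAgent"),
    ("VMAgent", "VictoriaMetrics"),
]


def build_graph(edges):
    """Build an adjacency list mapping each node to its downstream nodes."""
    graph = defaultdict(list)
    for upstream, downstream in edges:
        graph[upstream].append(downstream)
    return graph


def all_paths(graph, start):
    """Enumerate every path from start to a sink with an iterative depth-first search.

    Assumes the graph is acyclic, which holds for the documented flows.
    """
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        successors = graph.get(path[-1], [])
        if not successors:
            paths.append(path)  # reached a sink such as Kafka or VictoriaMetrics
        for successor in successors:
            stack.append(path + [successor])
    return paths


if __name__ == "__main__":
    graph = build_graph(EDGES)
    for source in ("LDMS Sampler", "iDRAC (BMC)"):
        for path in all_paths(graph, source):
            print(" → ".join(path))
```

Running the script prints one LDMS path ending at Kafka and two iDRAC paths, one ending at Kafka and one at VictoriaMetrics, which matches the flows listed above.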
If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.