Monitor Storage
In this step, we’ll configure the Prometheus receiver to monitor the storage.
What storage do Cisco AI PODs utilize?
Cisco AI PODs have a number of different storage options, including Pure Storage, VAST, and NetApp.
The workshop will focus on Pure Storage.
How do we capture Pure Storage metrics?
Cisco AI PODs that utilize Pure Storage also use a technology called Portworx, which provides persistent storage for Kubernetes.
Portworx includes a metrics endpoint that we can scrape using the Prometheus receiver.
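To make "scraping" more concrete, here is a minimal Python sketch of what a scraper sees in a Prometheus-style text payload. The sample payload, its label names, and its values are made up for illustration, and the helper name `metric_names` is hypothetical; a real Portworx endpoint returns many more series.

```python
def metric_names(exposition_text, prefix="px_"):
    """Return the sorted, de-duplicated metric names that start with prefix."""
    names = set()
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


# Made-up sample payload (assumption); real label sets and values will differ.
SAMPLE = """\
# TYPE px_cluster_cpu_percent gauge
px_cluster_cpu_percent{cluster="demo"} 12.5
px_cluster_status_nodes_online{cluster="demo"} 3
go_goroutines 42
"""

print(metric_names(SAMPLE))
```

Only the series whose names begin with `px_` are reported; the unrelated `go_goroutines` series is ignored.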
Capture Storage Metrics with Prometheus
Let’s modify the OpenTelemetry collector configuration to scrape Portworx metrics with the Prometheus receiver.
To do so, let’s add another receiver creator section, containing a Prometheus receiver, to the otel-collector-values.yaml file. Add it after the receiver_creator/weaviate section but before the pipelines section:
receiver_creator/storage:
  # Name of the extensions to watch for endpoints to start and stop.
  watch_observers: [ k8s_observer ]
  receivers:
    prometheus/portworx:
      config:
        config:
          scrape_configs:
            - job_name: portworx-metrics
              static_configs:
                - targets:
                    - '`endpoint`:17001'
                    - '`endpoint`:17018'
      rule: type == "pod" && labels["app"] == "portworx-metrics-sim"

We’ll need to ensure that Portworx metrics are added to the filter/metrics_to_be_included filter processor configuration as well:
processors:
  filter/metrics_to_be_included:
    metrics:
      # Include only metrics used in charts and detectors
      include:
        match_type: strict
        metric_names:
          - DCGM_FI_DEV_FB_FREE
          - ...
          - px_cluster_cpu_percent
          - px_cluster_disk_total_bytes
          - px_cluster_disk_utilized_bytes
          - px_cluster_status_nodes_offline
          - px_cluster_status_nodes_online
          - px_volume_read_latency_seconds
          - px_volume_reads_total
          - px_volume_readthroughput
          - px_volume_write_latency_seconds
          - px_volume_writes_total
          - px_volume_writethroughput

Note: add just the new metrics starting with px_cluster_cpu_percent
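Because the filter uses strict matching, a misspelled or omitted metric name is silently dropped. The following Python sketch is one way to check that all eleven Portworx names made it into the list. It is a plain-text check rather than a YAML parser, and the helper name `missing_metrics` is hypothetical.

```python
# The eleven Portworx metric names listed in the filter configuration.
PORTWORX_METRICS = [
    "px_cluster_cpu_percent",
    "px_cluster_disk_total_bytes",
    "px_cluster_disk_utilized_bytes",
    "px_cluster_status_nodes_offline",
    "px_cluster_status_nodes_online",
    "px_volume_read_latency_seconds",
    "px_volume_reads_total",
    "px_volume_readthroughput",
    "px_volume_write_latency_seconds",
    "px_volume_writes_total",
    "px_volume_writethroughput",
]


def missing_metrics(config_text, expected=PORTWORX_METRICS):
    """Return the expected names that do not appear as "- name" list items."""
    listed = set()
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Treat every "- value" line as a list item, wherever it appears.
        if line.startswith("- "):
            listed.add(line[2:].strip())
    return [name for name in expected if name not in listed]


# Example with an inline snippet that lists only two of the metrics.
snippet = """\
metric_names:
  - DCGM_FI_DEV_FB_FREE
  - px_cluster_cpu_percent
  - px_volume_reads_total
"""
print(missing_metrics(snippet))
```

To check the real file, pass the text of otel-collector-values.yaml instead of the inline snippet; an empty list means every name is present.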
We’ll need to add a new metrics pipeline for Portworx metrics as well. Add the following to the bottom of the file:
metrics/storage:
  exporters:
    - signalfx
  processors:
    - memory_limiter
    - filter/metrics_to_be_included
    - batch
    - resourcedetection
    - resource
  receivers:
    - receiver_creator/storage

Take a moment to compare the contents of your modified otel-collector-values.yaml file with the otel-collector-values-with-portworx.yaml file. Remember that indentation is important for yaml files, and needs to be precise:

diff otel-collector-values.yaml otel-collector-values-with-portworx.yaml

Update your file if needed to ensure the contents match.
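Indentation differences can be hard to spot in plain diff output. As an optional aid, the Python sketch below uses the standard-library `difflib` module and renders each leading space as a middle dot so that a one-space slip stands out. The helper name `indentation_diff` is hypothetical, and the two inline strings stand in for the contents of the two files.

```python
import difflib


def indentation_diff(mine, reference):
    """Return unified-diff lines with leading spaces shown as middle dots."""
    def show_indent(text):
        shown = []
        for line in text.splitlines():
            stripped = line.lstrip(" ")
            indent = len(line) - len(stripped)
            shown.append("·" * indent + stripped)
        return shown

    return list(difflib.unified_diff(
        show_indent(mine),
        show_indent(reference),
        fromfile="otel-collector-values.yaml",
        tofile="otel-collector-values-with-portworx.yaml",
        lineterm="",
    ))


# Stand-in contents: the first version indents the list item by one space too few.
mine = "receivers:\n - receiver_creator/storage\n"
reference = "receivers:\n  - receiver_creator/storage\n"

for line in indentation_diff(mine, reference):
    print(line)
```

An empty result means the two texts match line for line, including indentation.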
Don’t restart the collector yet
Because restarting the collector in an OpenShift environment takes 3 minutes per node, we’ll wait until we’ve completed all configuration changes before initiating a restart.
