Monitoring Cisco AI Pods with Splunk Observability Cloud

2 minutes   Author Derek Mitchell

Cisco’s AI-ready PODs combine the best of hardware and software technologies to create a robust, scalable, and efficient AI-ready infrastructure tailored to diverse needs.

Splunk Observability Cloud provides comprehensive visibility into all of this infrastructure along with all the application components that are running on this stack.

The steps to configure Splunk Observability Cloud for a Cisco AI POD environment are fully documented (see here for details).

However, it’s not always possible to get access to a Cisco AI POD environment to practice the installation steps.

This workshop provides hands-on experience deploying and working with several of the technologies that are used to monitor Cisco AI PODs with Splunk Observability Cloud, without requiring access to an actual Cisco AI POD. This includes:

  • Practice deploying a RedHat OpenShift cluster with GPU-based worker nodes.
  • Practice deploying the NVIDIA NIM Operator and NVIDIA GPU Operator.
  • Practice deploying Large Language Models (LLMs) using NVIDIA NIM to the cluster.
  • Practice deploying the OpenTelemetry Collector in the Red Hat OpenShift cluster.
  • Practice adding Prometheus receivers to the collector to ingest infrastructure metrics.
  • Practice deploying the Weaviate vector database to the cluster.
  • Practice instrumenting Python services that interact with Large Language Models (LLMs) with OpenTelemetry.
  • Understand which details OpenTelemetry captures in traces from applications that interact with LLMs.

Please note: Red Hat OpenShift and NVIDIA AI Enterprise components are typically pre-installed with an actual AI POD. However, because we’re using AWS for this workshop, it’s necessary to perform these setup steps manually.

Tip

The easiest way to navigate through this workshop is by using:

  • the left/right arrows (< | >) on the top right of this page
  • the left (◀️) and right (▶️) cursor keys on your keyboard

Subsections of Monitoring Cisco AI Pods with Splunk Observability Cloud

AWS Setup

10 minutes  

Enable the Red Hat OpenShift Service in AWS

To deploy OpenShift in your AWS account, we’ll need to first enable the Red Hat OpenShift service using the AWS console.

Next, follow the instructions to connect your AWS account with your Red Hat account.

Provision an EC2 Instance

Let’s provision an EC2 instance that we’ll use to deploy the Red Hat cluster. This avoids the limitations of running the ROSA command-line interface on macOS.

We used a t3.xlarge instance running Ubuntu 24.04 LTS while creating this workshop, but a smaller instance type can also be used.
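If you prefer to create the instance from the command line instead of the AWS console, a minimal sketch using the AWS CLI might look like the following (the AMI ID, key pair, and security group are placeholders for your environment):

aws ec2 run-instances \
    --image-id <Ubuntu 24.04 LTS AMI ID for your region> \
    --instance-type t3.xlarge \
    --key-name <your key pair name> \
    --security-group-ids <your security group ID> \
    --count 1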

SSH into the instance once it’s up and running.
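For example, assuming the instance was created with a key pair named my-key-pair and uses the default ubuntu user (both assumptions for a stock Ubuntu AMI), the command would look something like:

ssh -i ~/.ssh/my-key-pair.pem ubuntu@<instance public IP or DNS name>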

Clone the GitHub Repository

Clone the GitHub repository to your EC2 instance:

git clone https://github.com/splunk/observability-workshop.git

cd observability-workshop/workshop/cisco-ai-pods 

OpenShift Prerequisites

15 minutes  

The steps below are required before deploying the OpenShift cluster in AWS.

Create a Red Hat Login

The first thing we’ll need to do is create an account with Red Hat, which we can do by filling out the form here.

Install the AWS CLI

To install the AWS CLI on the EC2 instance provisioned previously, run the following commands:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
sudo apt install unzip
unzip awscliv2.zip
sudo ./aws/install

Use the following command to ensure it was installed successfully:

aws --version

It should return something like the following:

aws-cli/2.30.5 Python/3.13.7 Linux/6.14.0-1011-aws exe/x86_64.ubuntu.24

Log in to your AWS account using your preferred method; refer to the AWS documentation for guidance. For example, you can log in by running the aws configure command.

Confirm you’re logged in successfully by running a command such as aws ec2 describe-instances.
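For reference, the login and verification sequence might look something like the following (the values shown are placeholders; us-east-2 matches the region used later in this workshop):

aws configure
AWS Access Key ID [None]: <your access key ID>
AWS Secret Access Key [None]: <your secret access key>
Default region name [None]: us-east-2
Default output format [None]: json

aws ec2 describe-instances --query 'Reservations[].Instances[].InstanceId'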

Then, verify your account identity with:

aws sts get-caller-identity

Check whether the service role for ELB (Elastic Load Balancing) exists:

aws iam get-role --role-name "AWSServiceRoleForElasticLoadBalancing"

If the role does not exist, create it by running the following command:

aws iam create-service-linked-role --aws-service-name "elasticloadbalancing.amazonaws.com"

Install the ROSA CLI

We’ll use the ROSA command-line interface (CLI) for the deployment. The instructions are based on Red Hat documentation.

You can download the latest release of the ROSA CLI for your operating system here.

Alternatively, we can use the following command to download the CLI binary directly to our EC2 instance:

curl -L -O https://mirror.openshift.com/pub/cgw/rosa/latest/rosa-linux.tar.gz

Extract the contents:

tar -xvzf rosa-linux.tar.gz

Move the resulting file (rosa) to a location on your PATH. For example:

sudo mv rosa /usr/local/bin/rosa
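Optionally, confirm that the ROSA CLI is now on your PATH by checking its version:

rosa version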

Log in to your Red Hat account by running the command below, then follow the instructions in the command output:

rosa login --use-device-code

Install the OpenShift CLI (oc)

We can use the following command to download the OpenShift CLI binary directly to our EC2 instance:

curl -L -O https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/stable/openshift-client-linux.tar.gz

Extract the contents:

tar -xvzf openshift-client-linux.tar.gz

Move the resulting files (oc and kubectl) to a location on your PATH. For example:

sudo mv oc /usr/local/bin/oc
sudo mv kubectl /usr/local/bin/kubectl
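Optionally, confirm that both binaries are on your PATH by checking their client versions:

oc version --client
kubectl version --client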

Create Account-Wide Roles and Policies

Use the following command to create the necessary account-wide roles and policies:

rosa create account-roles --mode auto

Create an AWS VPC for ROSA HCP

We’re going to use the Hosted Control Plane (HCP) deployment option to deploy our OpenShift cluster. To do this, we’ll need to create a new VPC in our AWS account using the following command:

Note: update the region as appropriate for your environment.

rosa create network --param Region=us-east-2

Important: make note of the subnet IDs created by this command, as you’ll need them when creating the cluster.

Note: by default, each AWS region is limited to 5 Elastic IP addresses.
If you receive the error “The maximum number of addresses has been reached,” you’ll need to contact AWS to request an increase to this limit, or choose another AWS region in which to create the VPC for ROSA.

Create an OpenID Connect configuration

Before creating a Red Hat OpenShift Service on AWS cluster, let’s create the OpenID Connect (OIDC) configuration with the following command:

rosa create oidc-config --mode=auto --yes

Important: make note of the oidc-provider ID that is created.
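If you misplace the ID, you should be able to list your existing OIDC configurations with the ROSA CLI, for example:

rosa list oidc-config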


Deploy OpenShift Cluster in AWS

25 minutes  

Deploy an OpenShift Cluster

We’ll use the ROSA CLI to deploy an OpenShift Cluster.

First, we’ll need to set a few environment variables:

Note: be sure to fill in the Subnet IDs and OIDC ID before running the export commands below.

export CLUSTER_NAME=rosa-test
export AWS_REGION=us-east-2
export AWS_INSTANCE_TYPE=g5.2xlarge
export SUBNET_IDS=<comma separated list of subnet IDs from earlier rosa create network command>
export OIDC_ID=<the oidc-provider id returned from the rosa create oidc-config command> 
export OPERATOR_ROLES_PREFIX=rosa-test-a6x9

Create operator roles for the OIDC configuration using the following command:

Note: just accept the default values when prompted.

rosa create operator-roles --hosted-cp --prefix $OPERATOR_ROLES_PREFIX --oidc-config-id $OIDC_ID

Then we can create the cluster as follows:

rosa create cluster \
    --cluster-name $CLUSTER_NAME \
    --mode auto \
    --hosted-cp \
    --sts \
    --create-admin-user \
    --operator-roles-prefix $OPERATOR_ROLES_PREFIX \
    --oidc-config-id $OIDC_ID \
    --subnet-ids $SUBNET_IDS \
    --compute-machine-type $AWS_INSTANCE_TYPE \
    --replicas 2 \
    --region $AWS_REGION 

Note that we’ve specified the g5.2xlarge instance type, which includes NVIDIA GPUs that we’ll be using later in the workshop. This instance type is relatively expensive, about $1.21 per hour at the time of writing, and we’ve requested 2 replicas, so be mindful of how long your cluster is running for, as costs will accumulate quickly.
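When you’re completely finished with the workshop, you can tear the cluster down to stop incurring charges. A sketch of the teardown, assuming the same $CLUSTER_NAME, is:

rosa delete cluster -c $CLUSTER_NAME --yes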

To determine when your cluster is Ready, run:

rosa describe cluster -c $CLUSTER_NAME

To watch your cluster installation logs, run:

rosa logs install -c $CLUSTER_NAME --watch

Connect to the OpenShift Cluster

Use the command below to connect the oc CLI to your OpenShift cluster:

Note: Run the rosa describe cluster -c $CLUSTER_NAME command and substitute the resulting API Server URL into the command below before running it. For example, the server name might be something like https://api.rosa-test.aaa.bb.openshiftapps.com:443.

oc login <API Server URL> -u cluster-admin

Once connected to your cluster, confirm that the nodes are up and running:

oc get nodes

NAME                                       STATUS   ROLES    AGE   VERSION
ip-10-0-1-184.us-east-2.compute.internal   Ready    worker   14m   v1.31.11
ip-10-0-1-50.us-east-2.compute.internal    Ready    worker   20m   v1.31.11

Deploy the OpenTelemetry Collector

10 minutes  

Now that our OpenShift cluster is up and running, let’s deploy the OpenTelemetry Collector, which gathers metrics, logs, and traces from the infrastructure and applications running in the cluster, and sends the resulting data to Splunk Observability Cloud.

Deploy the OpenTelemetry Collector

First, we’ll create a new project for the collector and switch to that project:

oc new-project otel 

Ensure Helm is installed:

sudo apt-get install curl gpg apt-transport-https --yes
curl -fsSL https://packages.buildkite.com/helm-linux/helm-debian/gpgkey | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/helm.gpg] https://packages.buildkite.com/helm-linux/helm-debian/any/ any main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm

Add the Splunk OpenTelemetry Collector for Kubernetes’ Helm chart repository:

helm repo add splunk-otel-collector-chart https://signalfx.github.io/splunk-otel-collector-chart

Ensure the repository is up-to-date:

helm repo update

Review the file named ./otel-collector/otel-collector-values.yaml as we’ll be using it to install the OpenTelemetry collector.
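To review the file, and optionally compare it with the chart’s default values, you can run something like:

cat ./otel-collector/otel-collector-values.yaml
helm show values splunk-otel-collector-chart/splunk-otel-collector | less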

Set environment variables to configure the Splunk environment you’d like the collector to send data to:

export ENVIRONMENT_NAME=<which environment to send data to for Splunk Observability Cloud>
export SPLUNK_ACCESS_TOKEN=<your access token for Splunk Observability Cloud> 
export SPLUNK_REALM=<your realm for Splunk Observability Cloud, e.g. us0, us1, eu0, etc.>
export SPLUNK_HEC_URL=<HEC endpoint to send logs to Splunk platform, e.g. https://<hostname>:443/services/collector/event>
export SPLUNK_HEC_TOKEN=<HEC token to send logs to Splunk platform> 
export SPLUNK_INDEX=<name of index to send logs to in Splunk platform>

Then install the collector using the following command:

helm install splunk-otel-collector \
  --set="clusterName=$CLUSTER_NAME" \
  --set="environment=$ENVIRONMENT_NAME" \
  --set="splunkObservability.accessToken=$SPLUNK_ACCESS_TOKEN" \
  --set="splunkObservability.realm=$SPLUNK_REALM" \
  --set="splunkPlatform.endpoint=$SPLUNK_HEC_URL" \
  --set="splunkPlatform.token=$SPLUNK_HEC_TOKEN" \
  --set="splunkPlatform.index=$SPLUNK_INDEX" \
  -f ./otel-collector/otel-collector-values.yaml \
  -n otel \
  splunk-otel-collector-chart/splunk-otel-collector

Run the following command to confirm that all of the collector pods are running:

oc get pods

NAME                                                          READY   STATUS    RESTARTS   AGE
splunk-otel-collector-agent-58rwm                             1/1     Running   0          6m40s
splunk-otel-collector-agent-8dndr                             1/1     Running   0          6m40s
splunk-otel-collector-k8s-cluster-receiver-7b7f5cdc5b-rhxsj   1/1     Running   0          6m40s

Confirm that you can see the cluster in Splunk Observability Cloud by navigating to Infrastructure Monitoring -> Kubernetes -> Kubernetes Pods and then filtering on your cluster name:

(Screenshot: Kubernetes Pods view in Splunk Observability Cloud)


Deploy the NVIDIA NIM Operator

20 minutes  

The NVIDIA GPU Operator is a Kubernetes Operator that automates the deployment, configuration, and management of all necessary NVIDIA software components to provision GPUs within a Kubernetes cluster.

The NVIDIA NIM Operator is used to deploy LLMs in Kubernetes environments, such as the OpenShift cluster we created earlier in this workshop.

This section of the workshop walks through the steps necessary to deploy both the NVIDIA GPU and NIM operators in our OpenShift cluster.

Create a NVIDIA NGC Account

An NVIDIA GPU Cloud (NGC) account is required to download LLMs and deploy them using the NVIDIA NIM Operator. You can register here to create an account.

Register with the NVIDIA Developer Program

Registering with the NVIDIA Developer Program allows us to get access to NVIDIA NIM, which we’ll use later in the workshop to deploy LLMs.

Ensure that NVIDIA Developer Program appears on your list of NVIDIA subscriptions in NGC:

(Screenshot: NVIDIA Subscriptions list in NGC)

Generate an NGC API Key

Once you’re logged in to the NGC website, click on your user account icon on the top-right corner of the screen and select Setup.

Then click Generate API Key and follow the instructions. Ensure the key is associated with the NGC Catalog and Secrets Manager services.

Save the generated key in a safe place as we’ll use it later in the workshop.

Refer to NVIDIA Documentation for further details on generating an NGC API key.

Install the Node Feature Discovery Operator

The steps in this section are based on Installing the NFD Operator using the CLI.

Run the following script to install the Node Feature Discovery Operator:

cd nvidia
./install-nfd-operator.sh

To verify that the Operator deployment is successful, run:

oc get pods
NAME                                      READY   STATUS    RESTARTS   AGE
nfd-controller-manager-7f86ccfb58-vgr4x   2/2     Running   0          10m

Create a NodeFeatureDiscovery CR

The steps in this section are based on Creating a NodeFeatureDiscovery CR by using the CLI.

Run the following script to create the Node Feature Discovery CR:

./create-nfd-cr.sh

Install the NVIDIA GPU Operator

The steps in this section are based on Installing the NVIDIA GPU Operator on OpenShift.

Run the following script to install the NVIDIA GPU Operator:

./install-nvidia-gpu-operator.sh

Wait until the install plan has been created:

oc get installplan -n nvidia-gpu-operator
NAME            CSV                              APPROVAL   APPROVED
install-mmlxq   gpu-operator-certified.v25.3.4   Manual     false

Approve the install plan with the following commands:

INSTALL_PLAN=$(oc get installplan -n nvidia-gpu-operator -oname)
oc patch $INSTALL_PLAN -n nvidia-gpu-operator --type merge --patch '{"spec":{"approved":true }}'
installplan.operators.coreos.com/install-rc9xq patched

Create the Cluster Policy

The steps in this section are based on Create the cluster policy using the CLI.

./create-cluster-policy.sh

Verify the NVIDIA GPU Operator Installation

Verify the successful installation of the NVIDIA GPU Operator using the following command:

oc get pods,daemonset -n nvidia-gpu-operator
NAME                                                      READY   STATUS      RESTARTS      AGE
pod/gpu-feature-discovery-sblkn                           1/1     Running     0             5m5s
pod/gpu-feature-discovery-zpt94                           1/1     Running     0             4m58s
pod/gpu-operator-6579bc6fdc-cp28l                         1/1     Running     0             23m
pod/nvidia-container-toolkit-daemonset-qfcl9              1/1     Running     0             5m5s
pod/nvidia-container-toolkit-daemonset-zbwb6              1/1     Running     0             4m59s
pod/nvidia-cuda-validator-f7tl2                           0/1     Completed   0             78s
pod/nvidia-cuda-validator-t7n9g                           0/1     Completed   0             71s
pod/nvidia-dcgm-exporter-gk66x                            1/1     Running     0             4m59s
pod/nvidia-dcgm-exporter-w8kr8                            1/1     Running     2 (52s ago)   5m5s
pod/nvidia-dcgm-lrnzr                                     1/1     Running     0             4m58s
pod/nvidia-dcgm-tvrdm                                     1/1     Running     0             5m5s
pod/nvidia-device-plugin-daemonset-d62nk                  1/1     Running     0             5m5s
pod/nvidia-device-plugin-daemonset-fnv4j                  1/1     Running     0             4m59s
pod/nvidia-driver-daemonset-418.94.202509100653-0-5xbvq   2/2     Running     0             5m48s
pod/nvidia-driver-daemonset-418.94.202509100653-0-hmkdl   2/2     Running     0             5m48s
pod/nvidia-node-status-exporter-2kqwr                     1/1     Running     0             5m44s
pod/nvidia-node-status-exporter-n8d9s                     1/1     Running     0             5m44s
pod/nvidia-operator-validator-r2nm2                       1/1     Running     0             5m5s
pod/nvidia-operator-validator-w2fpn                       1/1     Running     0             4m59s

NAME                                                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                                                                         AGE
daemonset.apps/gpu-feature-discovery                           2         2         2       2            2           nvidia.com/gpu.deploy.gpu-feature-discovery=true                                                                      5m45s
daemonset.apps/nvidia-container-toolkit-daemonset              2         2         2       2            2           nvidia.com/gpu.deploy.container-toolkit=true                                                                          5m48s
daemonset.apps/nvidia-dcgm                                     2         2         2       2            2           nvidia.com/gpu.deploy.dcgm=true                                                                                       5m46s
daemonset.apps/nvidia-dcgm-exporter                            2         2         2       2            2           nvidia.com/gpu.deploy.dcgm-exporter=true                                                                              5m46s
daemonset.apps/nvidia-device-plugin-daemonset                  2         2         2       2            2           nvidia.com/gpu.deploy.device-plugin=true                                                                              5m47s
daemonset.apps/nvidia-device-plugin-mps-control-daemon         0         0         0       0            0           nvidia.com/gpu.deploy.device-plugin=true,nvidia.com/mps.capable=true                                                  5m47s
daemonset.apps/nvidia-driver-daemonset-418.94.202509100653-0   2         2         2       2            2           feature.node.kubernetes.io/system-os_release.OSTREE_VERSION=418.94.202509100653-0,nvidia.com/gpu.deploy.driver=true   5m48s
daemonset.apps/nvidia-mig-manager                              0         0         0       0            0           nvidia.com/gpu.deploy.mig-manager=true                                                                                5m45s
daemonset.apps/nvidia-node-status-exporter                     2         2         2       2            2           nvidia.com/gpu.deploy.node-status-exporter=true                                                                       5m44s
daemonset.apps/nvidia-operator-validator                       2         2         2       2            2           nvidia.com/gpu.deploy.operator-validator=true                                                                         5m48s

Install the Operator SDK

The steps in this section are based on Install from GitHub release.

Download the release binary

Set platform information:

export ARCH=$(case $(uname -m) in x86_64) echo -n amd64 ;; aarch64) echo -n arm64 ;; *) echo -n $(uname -m) ;; esac)
export OS=$(uname | awk '{print tolower($0)}')

Download the binary for your platform:

export OPERATOR_SDK_DL_URL=https://github.com/operator-framework/operator-sdk/releases/download/v1.41.1
curl -LO ${OPERATOR_SDK_DL_URL}/operator-sdk_${OS}_${ARCH}

Verify the downloaded binary

Import the operator-sdk release GPG key from keyserver.ubuntu.com:

gpg --keyserver keyserver.ubuntu.com --recv-keys 052996E2A20B5C7E

Download the checksums file and its signature, then verify the signature:

curl -LO ${OPERATOR_SDK_DL_URL}/checksums.txt
curl -LO ${OPERATOR_SDK_DL_URL}/checksums.txt.asc
gpg -u "Operator SDK (release) <cncf-operator-sdk@cncf.io>" --verify checksums.txt.asc

You should see something similar to the following:

gpg: assuming signed data in 'checksums.txt'
gpg: Signature made Fri 30 Oct 2020 12:15:15 PM PDT
gpg:                using RSA key ADE83605E945FA5A1BD8639C59E5B47624962185
gpg: Good signature from "Operator SDK (release) <cncf-operator-sdk@cncf.io>" [ultimate]

Make sure the checksums match:

grep operator-sdk_${OS}_${ARCH} checksums.txt | sha256sum -c -

You should see something similar to the following:

operator-sdk_linux_amd64: OK

Install the release binary in your PATH

chmod +x operator-sdk_${OS}_${ARCH} && sudo mv operator-sdk_${OS}_${ARCH} /usr/local/bin/operator-sdk
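Optionally, confirm the installation by checking the version:

operator-sdk version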

Install the NGC CLI

The steps in this section are based on NGC CLI Install.

Click Download CLI to download the zip file that contains the binary, transfer it to a directory where you have write permissions, and unzip it there. Alternatively, you can download and unzip it directly from the command line by changing to a directory where you have execute permissions and running the following command:

wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/4.3.0/files/ngccli_linux.zip -O ngccli_linux.zip && unzip ngccli_linux.zip

Check the binary’s md5 hash to ensure the file wasn’t corrupted during download:

find ngc-cli/ -type f -exec md5sum {} + | LC_ALL=C sort | md5sum -c ngc-cli.md5

Check the binary’s SHA256 hash to ensure the file wasn’t corrupted during download by running the following command:

sha256sum ngccli_linux.zip

Compare the output with the following value, which can also be found in the Release Notes of the resource:

5f01eff85a66c895002f3c87db2933c462f3b86e461e60d515370f647b4ffc21

After verifying the value, make the NGC CLI binary executable and add the ngc-cli directory to your PATH:

chmod u+x ngc-cli/ngc
echo "export PATH=\"\$PATH:$(pwd)/ngc-cli\"" >> ~/.bash_profile && source ~/.bash_profile

You must configure the NGC CLI before you can run commands against NGC.

Enter the following command, including your API key when prompted:

ngc config set

Define an environment variable with your NGC API key:

export NGC_API_KEY=<your NGC API key> 

Install the NVIDIA NIM Operator

The steps in this section are based on Installing NIM Operator on Red Hat OpenShift Using operator-sdk (for Development-Only).

Run the following script to install the NIM operator:

./install-nim-operator.sh

Confirm the controller pod is running:

oc get pods -n nvidia-nim-operator
NAME                                                              READY   STATUS      RESTARTS   AGE
ec60a4439c710b89fc2582f5384382b4241f9aee62bb3182b8d128e69dx54dc   0/1     Completed   0          61s
ghcr-io-nvidia-k8s-nim-operator-bundle-latest-main                1/1     Running     0          71s
k8s-nim-operator-86d478b55c-w5cf5                                 1/1     Running     0          50s

Deploy an LLM

20 minutes  

In this section, we’ll use the NVIDIA NIM Operator to deploy two models to our OpenShift cluster: a Large Language Model (LLM) and an embeddings model.

Create a Namespace

oc create namespace nim-service

Add Secrets with NGC API Key

Add a Docker registry secret for downloading container images from NVIDIA NGC:

oc create secret -n nim-service docker-registry ngc-secret \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password=$NGC_API_KEY

Add a generic secret that model puller containers use to download the model from NVIDIA NGC:

oc create secret -n nim-service generic ngc-api-secret \
    --from-literal=NGC_API_KEY=$NGC_API_KEY

Deploy an LLM

Run the following command to create the NIMCache and NIMService:

oc apply -n nim-service -f nvidia-llm.yaml

Confirm that the Persistent Volume was created and that the Persistent Volume Claim was bound to it successfully:

Note: this can take several minutes to occur

oc get pv,pvc -n nim-service
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                   STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
persistentvolume/pvc-1af12c04-29ad-497f-b018-7d9a3aea3019   100Gi      RWO            Delete           Bound    openshift-monitoring/prometheus-data-prometheus-k8s-1   gp3-csi        <unset>                          4h15m
persistentvolume/pvc-9c389d79-13fb-4169-9d99-a77efd6e7919   100Gi      RWO            Delete           Bound    openshift-monitoring/prometheus-data-prometheus-k8s-0   gp3-csi        <unset>                          4h15m
persistentvolume/pvc-a603b8a7-1445-4b03-945a-3ed68338834c   50Gi       RWO            Delete           Bound    nim-service/meta-llama-3-2-1b-instruct-pvc              gp3-csi        <unset>                          114s

NAME                                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/meta-llama-3-2-1b-instruct-pvc   Bound    pvc-a603b8a7-1445-4b03-945a-3ed68338834c   50Gi       RWO            gp3-csi        <unset>                 7m8s

Confirm that the NIMCache is Ready:

oc get nimcache.apps.nvidia.com -n nim-service
NAME                         STATUS   PVC                              AGE
meta-llama-3-2-1b-instruct   Ready    meta-llama-3-2-1b-instruct-pvc   9m50s

Confirm that the NIMService is Ready:

oc get nimservices.apps.nvidia.com -n nim-service
NAME                         STATUS   AGE
meta-llama-3-2-1b-instruct   Ready    11m

Test the LLM

Let’s ensure the LLM is working as expected.

Start a pod that has access to the curl command:

oc run --rm -it -n default curl --image=curlimages/curl:latest -- sh

Then run the following command to send a prompt to the LLM:

curl -X "POST" \
 'http://meta-llama-3-2-1b-instruct.nim-service:8000/v1/chat/completions' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "meta/llama-3.2-1b-instruct",
        "messages": [
        {
          "content":"What is the capital of Canada?",
          "role": "user"
        }],
        "top_p": 1,
        "n": 1,
        "max_tokens": 1024,
        "stream": false,
        "frequency_penalty": 0.0,
        "stop": ["STOP"]
      }'
{
  "id": "chatcmpl-2ccfcd75a0214518aab0ef0375f8ca21",
  "object": "chat.completion",
  "created": 1758919002,
  "model": "meta/llama-3.2-1b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "The capital of Canada is Ottawa.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "total_tokens": 50,
    "completion_tokens": 8,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}

Deploy an Embeddings Model

We’re also going to deploy an embeddings model in our cluster, which will be used later in the workshop to implement Retrieval Augmented Generation (RAG).

Run the following command to deploy the embeddings model:

oc apply -n nim-service -f nvidia-embeddings.yaml

Confirm that the NIMService is Ready:

oc get nimservices.apps.nvidia.com llama-32-nv-embedqa-1b-v2 -n nim-service
NAME                        STATUS   AGE
llama-32-nv-embedqa-1b-v2   Ready    82s

Test the Embeddings Model

Let’s ensure the embeddings model is working as expected.

Start a pod that has access to the curl command:

oc run --rm -it -n default curl --image=curlimages/curl:latest -- sh

Then run the following command to send a request to the embeddings model:

  curl -X POST http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1/embeddings \
  -H 'Accept: application/json' \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["What is the capital of France?"],
    "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
    "input_type": "query",
    "encoding_format": "float",
    "truncate": "NONE"
  }'
{"object":"list","data":[{"index":0,"embedding":[-0.016632080078125,0.041259765625,-0.0156707763671875,0.032379150390625,0.045074462890625,0.0169830322265625,-0.03546142578125,-0.0003402233123779297,-0.038909912109375,-0.0023651123046875,-0.0001741647720336914,-0.01377105712890625,-0.01200103759765625,-0.00659942626953125,0.002536773681640625,0.0185394287109375,0.01546478271484375,0.0216827392578125,0.0139923095703125,-0.0121612548828125,-0.015869140625,0.005313873291015625,-0.020599365234375,0.02984619140625,-0.031982421875,0.0005679130554199219,-0.021697998046875,-0.0305938720703125,0.027618408203125,0.005340576171875,0.011993408203125,0.0135345458984375,-0.015625,0.036651611328125,-0.0210113525390625,0.0033321380615234375,0.033172607421875,-0.009552001953125,0.0226287841796875,0.01448822021484375,-0.05474853515625,-0.00861358642578125,-0.01513671875,-0.028656005859375,-0.0095977783203125,-0.025146484375,-0.0352783203125,0.03106689453125,-0.00726318359375,0.0157623291015625,0.01319122314453125,-0.005218505859375,-0.0013561248779296875,0.01277923583984375,-0.007328033447265625,0.01486968994140625,-0.05413818359375,-0.022125244140625,-0.015869140625,-0.00917816162109375,0.0186309814453125,-0.00814056396484375,-0.04730224609375,0.01406097412109375,-0.0248260498046875,0.0094757080078125,0.0309600830078125,0.0196533203125,-0.0270843505859375,-0.01113128662109375,-0.0056915283203125,-0.01154327392578125,0.037750244140625,-0.0028896331787109375,-0.00376129150390625,0.03692626953125,0.020416259765625,0.00200653076171875,0.0396728515625,0.004985809326171875,-0.04425048828125,-0.034820556640625,-0.0102386474609375,0.0218505859375,0.003604888916015625,0.00940704345703125,0.01468658447265625,0.0089111328125,-0.0032196044921875,-0.043243408203125,0.015411376953125,-0.00653839111328125,-0.01128387451171875,-0.052734375,0.032684326171875,-0.0101470947265625,-0.018218994140625,-0.003955841064453125,0.007648468017578125,-0.02044677734375,-0.0285186767578125,0.0153350830078125,0.03692626953125,0.0147247314453125,0.01043701171875,0.0007462501525878906,0.0261077880859375,0.024169921875,-0.040283203125,-0.005828857421875,0.00736236572265625,-0.00909423828125,0.00920867919921875,-0.00243377685546875,-0.03204345703125,-0.0232696533203125,0.01131439208984375,-0.0192413330078125,-0.025482177734375,-0.0267333984375,-0.03350830078125,-0.0008440017700195312,-0.040496826171875,0.0281982421875,-0.03533935546875,-0.0005216598510742188,0.061859130859375,0.0439453125,-0.00656890869140625,0.004627227783203125,-0.05615234375,0.027801513671875,-0.0027790069580078125,-0.011322021484375,-0.00841522216796875,0.027191162109375,-0.0018453598022460938,-0.00560760498046875,0.020263671875,-0.0295867919921875,0.03759765625,-0.00951385498046875,0.014190673828125,-0.039764404296875,-0.0199127197265625,-0.007427215576171875,-0.021392822265625,0.00946807861328125,0.03955078125,0.0015478134155273438,0.0227203369140625,0.020843505859375,-0.00531768798828125,0.0087432861328125,-0.01617431640625,0.002574920654296875,0.0104217529296875,0.01215362548828125,0.00970458984375,0.003803253173828125,0.005451202392578125,-0.001972198486328125,-0.01171875,-0.0216217041015625,0.033355712890625,-0.007137298583984375,0.00890350341796875,-0.01293182373046875,0.0189666748046875,-0.0168609619140625,-0.0153350830078125,0.0113525390625,-0.0123443603515625,0.046905517578125,0.0082244873046875,-0.019500732421875,0.004482269287109375,0.01007080078125,0.0037708282470703125,0.053619384765625,0.0171051025390625,-0.0305023193359375,-0.0240478515625,0.00764846801757
8125,-0.03973388671875,0.00847625732421875,-0.0207061767578125,-0.025115966796875,0.0168914794921875,0.03619384765625,0.00720977783203125,0.00803375244140625,0.022064208984375,-0.01111602783203125,-0.0024890899658203125,0.0047760009765625,-0.02374267578125,-0.0197296142578125,0.0203704833984375,0.00572967529296875,0.0254058837890625,0.0186920166015625,-0.022796630859375,-0.02191162109375,-0.0182952880859375,-0.00365447998046875,-0.01262664794921875,-0.0128326416015625,0.0157318115234375,-0.004596710205078125,0.033843994140625,0.01313018798828125,0.00656890869140625,0.0004596710205078125,-0.020355224609375,0.03411865234375,0.00034546852111816406,0.0205230712890625,0.02960205078125,-0.0157318115234375,-0.051483154296875,-0.0255584716796875,0.039825439453125,0.0258636474609375,0.038238525390625,-0.03424072265625,0.0188140869140625,0.03216552734375,-0.048126220703125,-0.0227203369140625,0.0032215118408203125,-0.0156097412109375,-0.003170013427734375,0.01444244384765625,-0.0232086181640625,0.002437591552734375,0.012451171875,-0.0066680908203125,0.01158905029296875,0.026397705078125,-0.00373077392578125,0.008087158203125,-0.00798797607421875,-0.0173187255859375,-0.013580322265625,0.033660888671875,-0.0028629302978515625,0.046295166015625,0.0299530029296875,-0.0159759521484375,-0.005580902099609375,0.0015277862548828125,0.01123046875,0.0031585693359375,-0.01151275634765625,0.00814056396484375,-0.0369873046875,0.0267181396484375,0.0013856887817382812,0.028656005859375,0.01409149169921875,0.035614013671875,-0.01189422607421875,-0.01190185546875,-0.053619384765625,0.037139892578125,-0.02288818359375,-0.024139404296875,0.01678466796875,-0.01020050048828125,-0.0222930908203125,-0.01059722900390625,0.044525146484375,0.006526947021484375,0.006084442138671875,-0.0015411376953125,-0.040618896484375,0.027801513671875,-0.00839996337890625,-0.01126861572265625,-0.01000213623046875,0.01197052001953125,-0.01062774658203125,-0.0155487060546875,0.011688232421875,0.01044464111328125,-0.0528564453125,0.031005859375,-0.0007734298706054688,8.52346420288086e-6,-0.00894927978515625,-0.005870819091796875,0.04254150390625,-0.03216552734375,0.0215911865234375,-0.01029205322265625,-0.01015472412109375,-0.036285400390625,0.0111083984375,0.0018520355224609375,-0.01177215576171875,0.01256561279296875,-0.004901885986328125,-0.006866455078125,-0.0084381103515625,-0.01160430908203125,-0.044891357421875,0.00632476806640625,0.0015001296997070312,-0.0016727447509765625,-0.013031005859375,-0.01404571533203125,-0.035888671875,-0.013397216796875,-0.0006346702575683594,0.00981903076171875,-0.0134735107421875,-0.022705078125,-0.02606201171875,-0.018402099609375,-0.046966552734375,-0.012542724609375,0.0183563232421875,0.0179901123046875,-0.0225067138671875,0.02801513671875,-0.032379150390625,-0.0079803466796875,0.0070953369140625,0.017333984375,0.026611328125,-0.03778076171875,-0.023590087890625,-0.005245208740234375,0.024383544921875,-0.02105712890625,0.020843505859375,0.033905029296875,0.0225372314453125,0.00942230224609375,0.0005245208740234375,-0.0284881591796875,0.01499176025390625,-0.0124664306640625,-0.0267333984375,0.023040771484375,0.01265716552734375,-0.0026264190673828125,0.00955963134765625,-0.0036773681640625,-0.0394287109375,0.015716552734375,-0.01300048828125,0.0187225341796875,-0.01275634765625,-0.0273590087890625,0.045562744140625,-0.00913238525390625,-0.004268646240234375,-0.005107879638671875,-0.026702880859375,0.0015077590942382812,-0.02862548828125,0.0003228187561035156,0.0099334716796875,-0.0305328369140625,-0.03625
48828125,0.0114898681640625,-0.00025653839111328125,-0.0022735595703125,0.0106201171875,0.01090240478515625,0.00992584228515625,0.00998687744140625,-0.00634002685546875,0.00711822509765625,-0.02337646484375,-0.01367950439453125,-0.006389617919921875,-0.006000518798828125,0.01027679443359375,-0.00838470458984375,0.004673004150390625,0.002841949462890625,0.014404296875,-0.02838134765625,0.023834228515625,-0.00823974609375,-0.038970947265625,0.003002166748046875,-0.04510498046875,-0.0265655517578125,-0.0036182403564453125,-0.046661376953125,-0.01062774658203125,-0.05804443359375,-0.02117919921875,-0.029815673828125,0.036712646484375,-0.0069122314453125,0.0079345703125,0.0164794921875,-0.007534027099609375,-0.01111602783203125,0.0135650634765625,-0.0017242431640625,0.009490966796875,-0.0222320556640625,0.043853759765625,0.054718017578125,-0.003208160400390625,-0.004199981689453125,0.01529693603515625,-0.007190704345703125,0.00637054443359375,-0.004749298095703125,-0.0217132568359375,-0.0093841552734375,-0.0335693359375,-0.0017490386962890625,0.0081939697265625,0.0247802734375,0.0148468017578125,0.026763916015625,0.002079010009765625,0.0292816162109375,0.04705810546875,0.02166748046875,-0.0120697021484375,0.01050567626953125,0.0131988525390625,0.0169525146484375,0.0291595458984375,-0.00270843505859375,-0.0095062255859375,-0.0211944580078125,-0.035980224609375,0.006805419921875,0.002735137939453125,0.043731689453125,-0.01515960693359375,0.0010576248168945312,-0.00913238525390625,0.001293182373046875,-0.00027489662170410156,-0.00868988037109375,0.007389068603515625,0.0023212432861328125,-0.01528167724609375,0.017852783203125,-0.03643798828125,0.045623779296875,-0.0030364990234375,-0.0271453857421875,0.0268402099609375,-0.0033473968505859375,0.0186920166015625,-0.0225067138671875,0.0125732421875,-0.01386260986328125,-0.0218658447265625,0.01248931884765625,0.025848388671875,0.021453857421875,0.008056640625,0.025421142578125,0.01224517822265625,0.0208740234375,-0.003856658935546875,-0.021209716796875,-0.00545501708984375,-0.0254058837890625,0.04388427734375,0.0204315185546875,-0.0072174072265625,-0.0110626220703125,0.0007481575012207031,-0.0022411346435546875,-0.046905517578125,-0.028472900390625,0.0196533203125,0.014129638671875,0.0130615234375,-0.01288604736328125,-0.03607177734375,-0.01568603515625,-0.00814056396484375,-0.01499176025390625,0.0112152099609375,-0.00360870361328125,0.024688720703125,-0.0189361572265625,-0.007122039794921875,0.00634002685546875,-0.00626373291015625,-0.000766754150390625,0.0193939208984375,-0.002841949462890625,0.041717529296875,-0.00016701221466064453,-0.043365478515625,-0.023773193359375,0.0283660888671875,0.0245208740234375,-0.055450439453125,0.01096343994140625,-0.0180511474609375,0.0189056396484375,0.0164947509765625,-0.033111572265625,0.0262603759765625,0.0294189453125,0.00084686279296875,0.0279388427734375,-0.003910064697265625,0.002910614013671875,0.00890350341796875,-0.033843994140625,0.004856109619140625,0.00033974647521972656,-0.056549072265625,-0.0110626220703125,-0.0178375244140625,0.006381988525390625,0.018798828125,0.0205230712890625,-0.05609130859375,-0.01023101806640625,-0.001201629638671875,-0.02227783203125,0.01910400390625,0.006931304931640625,0.0017032623291015625,-0.01849365234375,-0.0249786376953125,-0.0176849365234375,0.007389068603515625,-0.01025390625,0.036407470703125,-0.0275421142578125,0.021514892578125,-0.0198822021484375,-0.0189056396484375,-0.0156402587890625,0.01025390625,0.02197265625,-0.007740020751953125,-0.034515380859375,0.00112628
93676757812,0.024566650390625,0.0229339599609375,0.004810333251953125,-0.01171875,-0.0238189697265625,0.021392822265625,0.0008301734924316406,0.019378662109375,-0.00894927978515625,-0.01496124267578125,0.01558685302734375,-0.0229339599609375,0.00020587444305419922,-0.0202178955078125,0.0298919677734375,0.00969696044921875,-0.0011949539184570312,-0.007144927978515625,-0.0198211669921875,0.0030422210693359375,-0.037811279296875,-0.039306640625,-0.027587890625,-0.0274810791015625,0.025390625,-0.0333251953125,-0.0062103271484375,-0.016876220703125,0.002651214599609375,-0.0020275115966796875,0.042144775390625,0.013092041015625,0.01690673828125,0.0268707275390625,0.0082244873046875,0.066650390625,0.0053253173828125,0.08526611328125,-0.0146331787109375,-0.0261688232421875,-0.04266357421875,0.004474639892578125,-0.005229949951171875,-0.01806640625,0.00479888916015625,0.00183868408203125,-0.01030731201171875,0.0028285980224609375,-0.0239410400390625,0.0166778564453125,0.0006723403930664062,-0.00923919677734375,-0.00504302978515625,0.0159759521484375,-0.0248260498046875,0.03179931640625,-0.01517486572265625,-0.0006771087646484375,-0.0117645263671875,0.016510009765625,0.00168609619140625,-0.016387939453125,0.0421142578125,-0.00951385498046875,-0.00388336181640625,-0.04559326171875,-0.0194091796875,0.043853759765625,-0.007541656494140625,0.0275421142578125,-0.005645751953125,0.003803253173828125,-0.01438140869140625,0.018218994140625,-0.006381988525390625,-0.012664794921875,-0.011962890625,0.035186767578125,0.0225067138671875,-0.005321502685546875,-0.007659912109375,0.0022792816162109375,-0.00830078125,-0.0092926025390625,-0.0278778076171875,-0.00011402368545532227,0.0027523040771484375,0.0082855224609375,0.0175933837890625,0.0029430389404296875,0.0721435546875,0.01525115966796875,-0.059967041015625,-0.0626220703125,0.0222625732421875,-0.05810546875,-0.01192474365234375,-0.0056610107421875,0.0173492431640625,-0.0008497238159179688,-0.01050567626953125,-0.01558685302734375,0.0032196044921875,0.00745391845703125,-0.05029296875,0.00310516357421875,0.0333251953125,-0.01166534423828125,-0.0347900390625,-0.00830078125,0.01305389404296875,0.01030731201171875,0.017730712890625,-0.007415771484375,-0.00287628173828125,0.01197052001953125,-0.004016876220703125,-0.038421630859375,0.000743865966796875,-0.006237030029296875,0.0511474609375,-0.003826141357421875,-0.00838470458984375,-0.007572174072265625,0.00522613525390625,0.01514434814453125,0.00557708740234375,-0.035186767578125,0.0077056884765625,-0.0330810546875,-0.0043487548828125,-0.0307464599609375,-0.00670623779296875,0.01395416259765625,-0.0247039794921875,-0.03399658203125,0.0176849365234375,-0.00827789306640625,-0.0132293701171875,0.011016845703125,0.00740814208984375,-0.022735595703125,0.01110076904296875,-0.0127105712890625,-0.01074981689453125,-0.04150390625,-0.05438232421875,-0.0014743804931640625,-0.00507354736328125,-0.05291748046875,-0.0126800537109375,0.032135009765625,0.0266571044921875,-0.0240020751953125,-0.0033702850341796875,0.0021076202392578125,0.0206756591796875,0.01454925537109375,-0.00954437255859375,0.0178680419921875,0.004734039306640625,-0.0014028549194335938,0.0109710693359375,-0.0200042724609375,-0.030029296875,0.04022216796875,-0.0190887451171875,0.028594970703125,0.0205841064453125,-0.0028095245361328125,0.0024242401123046875,-0.0151214599609375,0.0025386810302734375,-0.006633758544921875,0.01265716552734375,-0.019073486328125,0.0030384063720703125,-0.024871826171875,-0.01148223876953125,0.00914764404296875,-0.004367828369140625,-
0.0186920166015625,0.021514892578125,-0.027435302734375,0.00736236572265625,0.037872314453125,-0.00222015380859375,0.0041351318359375,-0.0224151611328125,-0.0255279541015625,0.03271484375,-0.0242919921875,0.0097198486328125,-0.02008056640625,-0.01003265380859375,-0.0215606689453125,-0.00974273681640625,-0.0428466796875,-0.0343017578125,-0.0006017684936523438,-0.0230865478515625,0.020782470703125,0.01134490966796875,0.0107421875,-0.0165863037109375,-0.0043487548828125,0.0165252685546875,0.0276947021484375,0.0051116943359375,0.03497314453125,0.0288848876953125,0.0205230712890625,-0.0099029541015625,0.0014505386352539062,-0.045074462890625,-0.0226898193359375,0.002422332763671875,0.0013151168823242188,-0.0031642913818359375,-0.0247344970703125,0.013885498046875,-0.002410888671875,0.046051025390625,0.0328369140625,0.04193115234375,0.006710052490234375,-0.004138946533203125,-0.031768798828125,0.024658203125,0.00417327880859375,-0.01116943359375,0.0097198486328125,-0.021270751953125,0.0285491943359375,0.02581787109375,0.0167083740234375,0.0206298828125,0.009185791015625,0.00794219970703125,-0.0022792816162109375,0.004337310791015625,-0.01166534423828125,-0.01227569580078125,0.00905609130859375,0.0156707763671875,-0.04217529296875,0.025054931640625,-0.01058197021484375,0.0171356201171875,0.001369476318359375,0.003917694091796875,-0.00817108154296875,0.026123046875,0.0200042724609375,-0.0294189453125,0.032440185546875,-0.0297393798828125,-0.0109100341796875,-0.00856781005859375,0.0034465789794921875,0.0186920166015625,0.0199737548828125,-0.03558349609375,-0.025146484375,-0.009307861328125,0.0081024169921875,0.0131378173828125,0.0117340087890625,0.0063018798828125,0.0000546574592590332,0.01898193359375,-0.0167694091796875,0.01666259765625,0.0374755859375,0.02374267578125,-0.0103912353515625,0.01207733154296875,-0.032989501953125,-0.004108428955078125,-0.0026798248291015625,0.01166534423828125,0.0257568359375,-0.056732177734375,0.0282745361328125,-0.0034351348876953125,-0.007415771484375,0.0081634521484375,0.029998779296875,0.0019369125366210938,-0.0014734268188476562,0.004573822021484375,0.04296875,0.025665283203125,-0.0121307373046875,0.029266357421875,0.016815185546875,-0.002536773681640625,-0.015045166015625,-0.0211334228515625,0.0020351409912109375,0.008087158203125,-0.004528045654296875,-0.0172882080078125,0.023712158203125,0.0305633544921875,0.0213470458984375,-0.0154266357421875,-0.035675048828125,0.0002543926239013672,0.01149749755859375,0.00833892822265625,0.01506805419921875,0.019500732421875,-0.01265716552734375,0.01947021484375,0.0242767333984375,-0.017486572265625,-0.01294708251953125,-0.012603759765625,-0.0093994140625,-0.00226593017578125,0.020355224609375,-0.0369873046875,0.0166168212890625,0.034332275390625,-0.0240631103515625,-0.03558349609375,0.036376953125,-0.009246826171875,0.0041656494140625,0.0439453125,-0.023284912109375,0.004749298095703125,-0.0232391357421875,-0.0105743408203125,-0.01030731201171875,-0.01318359375,0.0220184326171875,0.005840301513671875,0.0217437744140625,-0.01007080078125,0.01398468017578125,0.0019063949584960938,-0.011383056640625,-0.00424957275390625,-0.0208282470703125,0.012237548828125,0.01526641845703125,0.00959014892578125,0.027191162109375,0.001735687255859375,0.0177154541015625,-0.01139068603515625,0.0218963623046875,0.03814697265625,-0.018951416015625,0.011016845703125,-0.01287078857421875,0.046875,-0.007415771484375,0.01198577880859375,-0.02532958984375,0.00311279296875,0.018524169921875,0.005390167236328125,-0.01435089111328125,0.001894950866699
2188,0.0421142578125,0.0045928955078125,-0.006099700927734375,0.007049560546875,0.00502777099609375,-0.00963592529296875,0.00894927978515625,-0.034515380859375,-0.0035114288330078125,-0.0142974853515625,-0.034515380859375,-0.02142333984375,0.017608642578125,-0.014892578125,-0.01244354248046875,-0.017486572265625,0.00013899803161621094,0.00011283159255981445,-0.00756072998046875,-0.0132293701171875,0.0108489990234375,0.0305328369140625,-0.001163482666015625,-0.002880096435546875,-0.0007386207580566406,0.00370025634765625,0.00797271728515625,-0.010528564453125,-0.0073089599609375,-0.0279693603515625,-0.01343536376953125,-0.005908966064453125,-0.0003764629364013672,0.053955078125,0.0237884521484375,-0.053497314453125,-0.01165771484375,-0.037628173828125,0.0099639892578125,-0.02386474609375,0.032958984375,0.0239715576171875,0.0016231536865234375,-0.033111572265625,0.0007448196411132812,0.0245819091796875,-0.0094757080078125,-0.03131103515625,-0.02459716796875,0.021453857421875,0.01398468017578125,-0.0017442703247070312,0.054107666015625,0.0193328857421875,0.0057373046875,0.03485107421875,0.0258636474609375,0.004131317138671875,-0.02239990234375,-0.002368927001953125,0.01102447509765625,-0.017181396484375,0.01454925537109375,-0.0119781494140625,-0.0017871856689453125,-0.0166778564453125,0.008544921875,-0.0135345458984375,-0.03192138671875,0.0030956268310546875,-0.0279083251953125,0.0235595703125,-0.017974853515625,0.0108184814453125,0.0031032562255859375,-0.003093719482421875,-0.014129638671875,0.01361083984375,-0.03619384765625,-0.00826263427734375,0.033477783203125,-0.004150390625,0.0157012939453125,0.0011501312255859375,0.059844970703125,-0.01555633544921875,0.031219482421875,0.0177001953125,-0.0307464599609375,0.01264190673828125,0.0291290283203125,0.01045989990234375,-0.0097503662109375,0.01226806640625,0.00598907470703125,0.01849365234375,-0.02801513671875,-0.0112152099609375,-0.006011962890625,-0.006664276123046875,0.00928497314453125,0.0002186298370361328,-0.0012874603271484375,-0.0233001708984375,-0.0065155029296875,-0.0220947265625,-0.00310516357421875,0.049041748046875,-0.04925537109375,0.0262451171875,-0.0028095245361328125,-0.0091400146484375,0.0240631103515625,-0.002864837646484375,0.0120391845703125,-0.021942138671875,0.0347900390625,0.023834228515625,-0.0134429931640625,0.00028228759765625,0.0277557373046875,0.03082275390625,0.006237030029296875,-0.015350341796875,-0.005039215087890625,0.0145416259765625,0.01226806640625,-0.01474761962890625,-0.004917144775390625,-0.005733489990234375,-0.010986328125,0.0223236083984375,0.0224609375,-0.035736083984375,-0.008544921875,-0.0009150505065917969,-0.0119476318359375,0.0178070068359375,-0.005352020263671875,-0.01558685302734375,-0.0208740234375,-0.0160675048828125,0.0069122314453125,-0.0357666015625,0.01319122314453125,-0.00457000732421875,0.00502777099609375,-0.0006170272827148438,0.0032196044921875,-0.008209228515625,0.0026721954345703125,-0.022705078125,0.01666259765625,-0.0217132568359375,-0.024017333984375,-0.00527191162109375,0.0005908012390136719,0.0028228759765625,-0.0205841064453125,-0.05108642578125,0.02947998046875,-0.00861358642578125,-0.035552978515625,-0.0090484619140625,-0.044464111328125,-0.0284881591796875,0.004901885986328125,0.00669097900390625,0.020538330078125,0.01218414306640625,0.01477813720703125,0.0011930465698242188,0.027587890625,-0.037811279296875,0.0273284912109375,-0.0006680488586425781,0.0179901123046875,0.047393798828125,0.033355712890625,-0.018646240234375,-0.031585693359375,-0.0190887451171875,0.005905151
3671875,-0.005916595458984375,0.0247802734375,0.00881195068359375,-0.004108428955078125,-0.0091552734375,0.021697998046875,-0.0207061767578125,0.0207977294921875,-0.048095703125,-0.01544189453125,0.015533447265625,0.0228424072265625,0.0255126953125,-0.0172119140625,-0.0450439453125,0.0005936622619628906,0.0027103424072265625,0.03704833984375,-0.018218994140625,-0.00972747802734375,0.0067901611328125,-0.000598907470703125,-0.00482940673828125,-0.00786590576171875,0.0011510848999023438,0.0364990234375,-0.0128631591796875,-0.0198822021484375,0.0000896453857421875,-0.022735595703125,0.01479339599609375,-0.0034351348876953125,0.0120086669921875,0.0070037841796875,-0.01971435546875,0.04010009765625,0.0034389495849609375,-0.0109100341796875,0.01395416259765625,0.03509521484375,0.01096343994140625,-0.0209808349609375,-0.0009293556213378906,-0.00043487548828125,0.005519866943359375,-0.016448974609375,0.032470703125,0.0284881591796875,0.0144195556640625,-0.0307464599609375,0.0217437744140625,-0.0303497314453125,-0.05926513671875,0.01444244384765625,-0.01264190673828125,0.040313720703125,-0.012603759765625,-0.0178375244140625,-0.04339599609375,0.01222991943359375,-0.0025005340576171875,-0.010406494140625,-0.003086090087890625,-0.0214385986328125,0.01045989990234375,0.005886077880859375,-0.0175933837890625,0.04840087890625,-0.0168914794921875,0.01800537109375,-0.01354217529296875,-0.01383209228515625,0.04083251953125,0.034271240234375,0.021514892578125,0.04022216796875,0.0231781005859375,-0.01110076904296875,-0.0224151611328125,0.0021991729736328125,-0.01206207275390625,-0.01557159423828125,0.0548095703125,0.02618408203125,0.023956298828125,-0.00994110107421875,-0.004299163818359375,0.007030487060546875,-0.0113372802734375,0.0140228271484375,-0.01084136962890625,0.010711669921875,-0.0236358642578125,0.01776123046875,0.04461669921875,-0.0460205078125,-0.012969970703125,0.0078277587890625,-0.040313720703125,-0.004344940185546875,-0.00681304931640625,-0.00937652587890625,0.00601959228515625,-0.0086669921875,0.038238525390625,-0.00726318359375,-0.00667572021484375,-0.0282745361328125,-0.01448822021484375,-0.004566192626953125,0.002193450927734375,0.0408935546875,-0.018951416015625,-0.0347900390625,-0.0038661956787109375,0.0011167526245117188,0.00603485107421875,0.004985809326171875,0.004299163818359375,0.009552001953125,-0.04736328125,0.018310546875,0.004238128662109375,0.028839111328125,-0.02349853515625,0.00798797607421875,0.021270751953125,-0.01384735107421875,-0.02392578125,0.03662109375,0.0032825469970703125,0.056182861328125,-0.007129669189453125,-0.0014019012451171875,0.030426025390625,-0.017974853515625,-0.0118560791015625,0.0104827880859375,-0.0132293701171875,0.01959228515625,-0.0006871223449707031,-0.038055419921875,0.03125,0.01332855224609375,0.0675048828125,0.0005002021789550781,0.0117950439453125,0.0179901123046875,-0.0034618377685546875,-0.029205322265625,0.0136871337890625,-0.01409149169921875,-0.020111083984375,-0.06976318359375,-0.03985595703125,-0.020965576171875,0.002532958984375,-0.000797271728515625,0.00029206275939941406,-0.04278564453125,0.01293182373046875,-0.0178375244140625,-0.01496124267578125,-0.0289154052734375,-0.00551605224609375,-0.0135498046875,-0.0019350051879882812,-0.0008111000061035156,0.032958984375,0.005794525146484375,-0.00988006591796875,0.0147247314453125,0.0008878707885742188,-0.0347900390625,0.04827880859375,0.03656005859375,0.0005245208740234375,0.0078887939453125,0.0218048095703125,0.0177764892578125,0.02093505859375,-0.028656005859375,0.0273284912109375,-0.03
8818359375,0.01300811767578125,0.0174102783203125,0.01216888427734375,-0.0258941650390625,0.028778076171875,-0.024658203125,0.00337982177734375,-0.00594329833984375,-0.00948333740234375,0.036773681640625,-0.006595611572265625,-0.01033782958984375,0.001506805419921875,-0.03656005859375,-0.0239105224609375,0.041229248046875,-0.04071044921875,-0.0152435302734375,0.0151214599609375,0.037994384765625,-0.01058197021484375,-0.01062774658203125,0.002964019775390625,0.0294189453125,0.01041412353515625,0.038299560546875,-0.036163330078125,-0.036346435546875,-0.00850677490234375,-0.0098876953125,-0.051788330078125,0.02398681640625,-0.0219268798828125,0.023406982421875,0.008941650390625,0.010772705078125,-0.0265960693359375,-0.0099639892578125,-0.00727081298828125,0.0234222412109375,0.0023441314697265625,-0.01409912109375,0.01169586181640625,0.0023250579833984375,-0.0189208984375,-0.01013946533203125,-0.01739501953125,-0.0309295654296875,-0.00823974609375,0.029205322265625,0.01111602783203125,-0.01509857177734375,-0.01160430908203125,0.0173187255859375,0.0169830322265625,-0.00464630126953125,0.0253448486328125,0.0095062255859375,-0.0179443359375,0.0223846435546875,-0.0219879150390625,-0.0004260540008544922,-0.025421142578125,-0.007659912109375,-0.01485443115234375,-0.0166168212890625,0.011444091796875,0.0185394287109375,-0.02984619140625,0.061767578125,0.0189971923828125,-0.016693115234375,0.002613067626953125,-0.01242828369140625,0.0262298583984375,0.029388427734375,-0.0711669921875,-0.0263519287109375,0.01184844970703125,0.00977325439453125,-0.0232696533203125,-0.0131072998046875,0.00910186767578125,0.0251617431640625,0.04644775390625,-0.00033926963806152344,0.00894927978515625,0.01216888427734375,-0.00942230224609375,0.01220703125,0.002918243408203125,0.0167694091796875,0.0286865234375,0.01436614990234375,-0.02581787109375,-0.0123138427734375,-0.0143890380859375,0.0200042724609375,-0.020660400390625,-0.017791748046875,-0.006740570068359375,0.02484130859375,-0.028472900390625,-0.0142364501953125,-0.007534027099609375,0.021697998046875,-0.013580322265625,-0.003910064697265625,0.01214599609375,-0.01267242431640625,-0.005466461181640625,0.0239410400390625,0.01348876953125,0.0171661376953125,-0.00982666015625,-0.009613037109375,0.0189208984375,-0.01146697998046875,-0.01364898681640625,-0.021820068359375,-0.017181396484375,0.0097503662109375,-0.0240478515625,0.031829833984375,0.0172271728515625,0.01308441162109375,0.006938934326171875,0.0212249755859375,-0.007843017578125,-0.041839599609375,0.003757476806640625,-0.01332855224609375,-0.0081024169921875,-0.0252227783203125,0.0125732421875,0.00164794921875,-0.009490966796875,-0.0182647705078125,-0.03497314453125,-0.0187225341796875,-0.001026153564453125,-0.06793212890625,-0.05291748046875,-0.0297393798828125,-0.005031585693359375,-0.026519775390625,-0.00891876220703125,0.0096893310546875,-0.0189056396484375,0.01444244384765625,-0.0270233154296875,-0.0010528564453125,0.006771087646484375,-0.00942230224609375,0.03399658203125,-0.0203094482421875,-0.004795074462890625,0.0025959014892578125,0.01538848876953125,-0.00620269775390625,-0.035675048828125,-0.01142120361328125,0.0011234283447265625,-0.0278778076171875,0.00807952880859375,-0.017547607421875,0.0211639404296875,0.037139892578125,-0.0108642578125,-0.0287017822265625,-0.0008664131164550781,-0.00862884521484375,-0.006320953369140625,-0.00901031494140625,-0.012451171875,0.017913818359375,0.005092620849609375,-0.04345703125,-0.027801513671875,0.023040771484375,0.007328033447265625,-0.013916015625,-0.0076789855
95703125,-0.0031185150146484375,0.01546478271484375,0.02020263671875,-0.01259613037109375,0.0040130615234375,0.005023956298828125,0.00421142578125,-0.0018835067749023438,0.0369873046875,-0.0006284713745117188,0.007049560546875,-0.0213165283203125,-0.02215576171875,-0.05023193359375,-0.006420135498046875,0.001811981201171875,0.01995849609375,0.007694244384765625,-0.0081329345703125,-0.0347900390625,0.01042938232421875,-0.03131103515625,0.0312042236328125,-0.00971221923828125,-0.0352783203125,0.021209716796875,-0.009490966796875,0.00710296630859375,-0.004848480224609375,-0.01030731201171875,0.0037136077880859375,0.0234222412109375,0.004337310791015625,-0.03436279296875,0.0008835792541503906,-0.036712646484375,0.007740020751953125,0.003978729248046875,-0.0178985595703125,-0.0027065277099609375,0.035491943359375,0.01148223876953125,0.01496124267578125,-0.0025386810302734375,0.014404296875,0.007572174072265625,0.016876220703125,-0.0023212432861328125,0.002727508544921875,-0.005374908447265625,0.01690673828125,-0.020599365234375,-0.00002384185791015625,0.0305328369140625,-0.052734375,0.01496124267578125,0.0039215087890625,-0.00762176513671875,0.031585693359375,-0.01617431640625,-0.01222991943359375,0.00873565673828125,-0.033966064453125,0.01061248779296875,-0.0209197998046875,-0.0198516845703125,0.035247802734375,0.0244598388671875,0.0082550048828125,-0.00787353515625,-0.01544952392578125,0.01302337646484375,-0.0166168212890625,-0.0147247314453125,0.02618408203125,-0.0158233642578125,-0.0394287109375,0.0151214599609375,-0.004146575927734375,-0.035369873046875,0.045928955078125,0.04241943359375,0.01354217529296875,0.0343017578125,-0.007183074951171875,0.0129241943359375,-0.004955291748046875,0.025299072265625,0.01538848876953125,-0.0054779052734375,-0.00630950927734375,-0.010711669921875,0.043914794921875,-0.004856109619140625,0.05169677734375,-0.020111083984375,0.023406982421875,-0.0021114349365234375,-0.039215087890625,-0.01314544677734375,-0.0036773681640625,0.01031494140625,-0.00981903076171875,0.01366424560546875,0.0101776123046875,0.0274658203125,-0.0386962890625,0.0194244384765625,-0.04803466796875,0.033172607421875,0.0269775390625,-0.0176849365234375,-0.0016927719116210938,-0.02783203125,0.0015516281127929688,0.01325225830078125,-0.028472900390625,0.01470947265625,0.036773681640625,-0.038482666015625,-0.0009303092956542969,0.0236053466796875,-0.00498199462890625,0.0165557861328125,0.00003445148468017578,-0.03741455078125,-0.0517578125,-0.0090179443359375,-0.033966064453125,-0.0170440673828125,0.0013637542724609375,-0.04473876953125,-0.059478759765625,-0.0165557861328125,-0.047119140625,-0.033721923828125,0.018890380859375,0.00160980224609375,0.050811767578125,-0.0221099853515625,0.0306396484375,-0.01096343994140625,-0.007175445556640625,0.01580810546875,-0.00650787353515625,-0.00467681884765625,0.0256500244140625,0.006931304931640625,0.00316619873046875,-0.0170745849609375,-0.003265380859375,0.00554656982421875,-0.0166473388671875,0.0006661415100097656,0.0297393798828125,-0.00568389892578125,0.01043701171875,-0.03863525390625,0.01531982421875,0.021087646484375,0.002185821533203125,0.00977325439453125,-0.028594970703125,-0.0166473388671875,-0.00018537044525146484,-0.0014066696166992188,0.014312744140625,0.025299072265625,-0.0149383544921875,0.001495361328125,0.03692626953125,0.00438690185546875,0.05572509765625,-0.00350189208984375,0.0156402587890625,0.005992889404296875,-0.005748748779296875,-0.01739501953125,0.017059326171875,0.0006203651428222656,-0.0163726806640625,-0.0203704833984375,-
0.005962371826171875,0.006130218505859375,-0.00022983551025390625,-0.014007568359375,-0.0025844573974609375,-0.0171356201171875,0.0130157470703125,-0.005809783935546875,0.0174560546875,-0.0196075439453125,-0.017486572265625,-0.035369873046875,0.0016012191772460938,-0.02008056640625,-0.0213775634765625,0.04119873046875,-0.0125732421875,-0.00983428955078125,0.01010894775390625,-0.01099395751953125,-0.009613037109375,-0.01091766357421875,0.0032520294189453125,-0.004924774169921875,-0.041656494140625,0.01227569580078125,0.011077880859375,-0.040740966796875,0.002017974853515625,-0.0193023681640625,0.014739990234375,-0.0018491744995117188,0.008636474609375,0.017791748046875,-0.0012598037719726562,-0.004123687744140625,-0.006511688232421875,-0.0179443359375,-0.03619384765625,-0.0009822845458984375,0.0066680908203125,-0.0012950897216796875,0.0031185150146484375,-0.05401611328125,0.0266876220703125,-0.035308837890625,-0.0234375,0.0234222412109375,-0.037384033203125,0.002349853515625,0.01290130615234375,-0.0321044921875,0.019622802734375,-0.052337646484375,-0.00556182861328125,0.005496978759765625,0.0078125,0.010101318359375,-0.0055084228515625,0.021087646484375,0.016754150390625,0.0192413330078125,-0.024261474609375,0.0457763671875,-0.0185394287109375,0.0007729530334472656,0.0173187255859375,0.0224456787109375,0.0283355712890625,0.00576019287109375,0.04150390625,-0.005279541015625,0.01000213623046875,0.01496124267578125,0.003604888916015625,-0.033447265625,0.013824462890625,-0.0014410018920898438,-0.0225067138671875,-0.0017547607421875,0.0235443115234375,0.0171966552734375,0.0234375,-0.00482177734375,-0.0062103271484375,0.01885986328125,-0.003917694091796875,0.0172119140625,0.0240478515625,-0.006069183349609375,-0.0166015625,-0.00955963134765625,-0.01861572265625,0.0198822021484375,-0.046875,-0.0011920928955078125,-0.00972747802734375,0.01349639892578125,-0.00629425048828125,-0.0087738037109375,0.01393890380859375,0.0006022453308105469,-0.007038116455078125,-0.017181396484375,-0.00965118408203125,0.0133514404296875,-0.0025787353515625,0.017547607421875,-0.0276641845703125,0.018890380859375,0.01517486572265625,-0.0311737060546875,-0.016815185546875,0.00264739990234375,-0.0214080810546875,0.0181884765625,-0.01145172119140625,-0.0011072158813476562,0.02880859375,0.00782012939453125,-0.0238037109375,0.039031982421875,-0.00690460205078125,0.0018301010131835938,0.0305023193359375,0.005344390869140625,-0.003803253173828125,-0.033782958984375,0.01241302490234375,0.0206146240234375,0.00766754150390625,0.0177459716796875,-0.002201080322265625,-0.01444244384765625,0.031402587890625,-0.04498291015625,-0.02203369140625,-0.017486572265625,0.031341552734375,0.032562255859375,-0.031951904296875,0.0182037353515625,-0.01207733154296875,0.0235748291015625,0.0391845703125,0.00971221923828125,0.029388427734375,-0.038360595703125,0.025726318359375,-0.0040435791015625,0.020233154296875,0.0009427070617675781,0.0347900390625,-0.0226287841796875,0.01318359375,0.01505279541015625,0.01042938232421875,-0.011749267578125,-0.022705078125,-0.006938934326171875,0.008087158203125,-0.00205230712890625,-0.018463134765625,0.02960205078125,-0.0309600830078125,-0.024749755859375,0.004817962646484375,-0.01258087158203125,0.00850677490234375,-0.00560760498046875,-0.021881103515625,-0.004638671875,0.0244903564453125,-0.020416259765625,0.02655029296875,-0.0226287841796875,0.030029296875,0.024139404296875,0.03497314453125,0.0161285400390625,0.0206756591796875,-0.040924072265625,0.01042938232421875,0.048126220703125,0.006565093994140625,-0.002
60162353515625,0.037139892578125,0.0006361007690429688,-0.01007843017578125,0.0282745361328125,-0.013702392578125,0.044525146484375,-0.006237030029296875,0.034637451171875,0.0285186767578125,0.0124053955078125,-0.034423828125,0.0007100105285644531,-0.045501708984375,-0.0219268798828125,-0.00836181640625,-0.03704833984375,-0.07000732421875,0.006748199462890625,-0.0036602020263671875,0.00751495361328125,-0.0162353515625,-0.0137176513671875,0.0227203369140625,0.001644134521484375,-0.028656005859375,-0.00397491455078125,-0.0088043212890625,-0.0007772445678710938,0.035797119140625,0.0389404296875,0.0380859375,-0.005031585693359375,0.0059967041015625,-0.016815185546875,-0.0027980804443359375,0.0127410888671875,0.03399658203125,-0.0003867149353027344,0.00679779052734375,-0.0079193115234375,-0.02294921875,0.023101806640625,0.0009560585021972656,0.042694091796875,-0.031768798828125,-0.00247955322265625,0.0197296142578125,0.0196075439453125,-0.0229339599609375,-0.0250396728515625,-0.0006723403930664062,0.011871337890625,0.0308990478515625,0.002803802490234375,0.003803253173828125,-0.0112762451171875,0.0016689300537109375,-0.040985107421875,0.0175933837890625,0.029083251953125,-0.00962066650390625,-0.0384521484375,-0.006683349609375,0.00439453125,0.0269012451171875,0.02252197265625,-0.027587890625,0.003749847412109375,-0.004119873046875,-0.015228271484375,-0.031036376953125,-0.0042724609375,-0.043853759765625,-0.0016918182373046875,-0.015411376953125,0.03643798828125,-0.03814697265625,0.020599365234375,-0.007030487060546875,-0.02532958984375,-0.0216522216796875,0.0016412734985351562,0.00982666015625,0.0205230712890625,0.02484130859375,0.0078887939453125,-0.0261077880859375,0.0247039794921875,-0.01251983642578125,0.0090789794921875,0.013092041015625,0.0082550048828125,0.006603240966796875,-0.00423431396484375,0.01424407958984375,0.01349639892578125,-0.02264404296875,0.0236358642578125,-0.001506805419921875,0.007030487060546875,-0.01727294921875,-0.0249481201171875,-0.00611114501953125,0.0177459716796875,-0.0077056884765625,0.023773193359375,0.01357269287109375,0.012237548828125,0.0338134765625,-0.029022216796875,0.02880859375,-0.0018472671508789062,-0.024139404296875,-0.032989501953125,0.055084228515625,0.02984619140625,0.040618896484375,0.0006160736083984375,0.03814697265625,0.022552490234375,-0.01071929931640625,0.0250091552734375,0.033782958984375,0.00806427001953125,-0.005443572998046875,-0.00899505615234375,-0.00969696044921875,0.01045989990234375,0.037384033203125,0.01308441162109375,-0.01435089111328125,-0.0032367706298828125,0.0186004638671875,-0.0330810546875,-0.014617919921875,0.01088714599609375,-0.00847625732421875,0.02984619140625,-0.0283355712890625,0.023162841796875,0.019134521484375,-0.01218414306640625,-0.033966064453125,-0.028839111328125,-0.022552490234375,-0.02001953125,0.005214691162109375,-0.01418304443359375,0.0035915374755859375,-0.011993408203125,0.0076751708984375,-0.0098876953125,-0.002002716064453125,-0.0008831024169921875,-0.01294708251953125,-0.05120849609375,0.0008082389831542969,0.0205535888671875,-0.0017843246459960938,0.006366729736328125,0.0137939453125,0.060699462890625,-0.0177459716796875,-0.005641937255859375,0.0170440673828125,0.0026397705078125,0.009857177734375,-0.024658203125,0.006175994873046875,0.04205322265625,0.0253143310546875,0.00972747802734375,0.0031375885009765625,-0.022064208984375,0.0006480216979980469,-0.004180908203125,-0.00794219970703125,-0.015106201171875,-0.00901031494140625,-0.00812530517578125,-0.01406097412109375,-0.0247039794921875,-0.02214
05029296875,0.025543212890625,0.037353515625,-0.01702880859375,-0.0021762847900390625,0.0237274169921875,0.016632080078125,-0.0335693359375,0.002178192138671875,-0.022705078125,-0.011810302734375,0.01666259765625,0.0287628173828125,-0.02313232421875,-0.011199951171875,0.026702880859375,-0.0195770263671875,0.0278778076171875,0.0106658935546875,-0.0199432373046875,-0.035919189453125,0.028656005859375,0.0009784698486328125,-0.004291534423828125,-0.0309906005859375,0.03277587890625,0.011260986328125,0.0112457275390625,-0.034698486328125,-0.01111602783203125,0.0309906005859375,0.042236328125],"object":"embedding"}],"model":"nvidia/llama-3.2-nv-embedqa-1b-v2","usage":{"prompt_tokens":10,"total_tokens":10}}
Last Modified Oct 3, 2025

Configure the Prometheus Receiver

10 minutes  

Now that our LLM is up and running, we’ll add the Prometheus receiver to our OpenTelemetry collector to gather metrics from it.

Capture the NVIDIA DCGM Exporter metrics

The NVIDIA DCGM exporter is running in our OpenShift cluster. It exposes GPU metrics that we can send to Splunk.

To do this, let’s customize the configuration of the collector by editing the otel-collector-values.yaml file that we used earlier when deploying the collector.

Add the following content, just below the kubeletstats section:

      receiver_creator/nvidia:
        # Name of the extensions to watch for endpoints to start and stop.
        watch_observers: [ k8s_observer ]
        receivers:
          prometheus/dcgm:
            config:
              config:
                scrape_configs:
                  - job_name: gpu-metrics
                    scrape_interval: 10s
                    static_configs:
                      - targets:
                          - '`endpoint`:9400'
            rule: type == "pod" && labels["app"] == "nvidia-dcgm-exporter"

This tells the collector to look for pods with the label app=nvidia-dcgm-exporter and, when it finds one, to scrape its Prometheus metrics endpoint on port 9400.

To ensure the receiver is used, we’ll need to add a new pipeline to the otel-collector-values.yaml file as well.

Add the following code to the bottom of the file:

    service:
      pipelines:
        metrics/nvidia-metrics:
          exporters:
            - signalfx
          processors:
            - memory_limiter
            - batch
            - resourcedetection
            - resource
          receivers:
            - receiver_creator/nvidia

Before applying the changes, let’s add one more Prometheus receiver in the next section.

Capture the NVIDIA NIM metrics

The meta-llama-3-2-1b-instruct LLM that we just deployed with NVIDIA NIM also includes a Prometheus endpoint that we can scrape with the collector. Let’s add the following to the otel-collector-values.yaml file, just below the receiver we added earlier:

          prometheus/nim-llm:
            config:
              config:
                scrape_configs:
                  - job_name: nim-for-llm-metrics
                    scrape_interval: 10s
                    metrics_path: /v1/metrics
                    static_configs:
                      - targets:
                          - '`endpoint`:8000'
            rule: type == "pod" && labels["app"] == "meta-llama-3-2-1b-instruct"

This tells the collector to look for pods with the label app=meta-llama-3-2-1b-instruct and, when it finds one, to scrape the /v1/metrics endpoint on port 8000.

There’s no need to make changes to the pipeline, as this receiver will already be picked up as part of the receiver_creator/nvidia receiver.

Add a Filter Processor

Prometheus endpoints can expose a large number of metrics, sometimes with high cardinality.

Let’s add a filter processor that defines exactly what metrics we want to send to Splunk. Specifically, we’ll send only the metrics that are utilized by a dashboard chart or an alert detector.

Add the following code to the otel-collector-values.yaml file, after the exporters section but before the receivers section:

    processors:
      filter/metrics_to_be_included:
        metrics:
          # Include only metrics used in charts and detectors
          include:
            match_type: strict
            metric_names:
              - DCGM_FI_DEV_FB_FREE
              - DCGM_FI_DEV_FB_USED
              - DCGM_FI_DEV_GPU_TEMP
              - DCGM_FI_DEV_GPU_UTIL
              - DCGM_FI_DEV_MEM_CLOCK
              - DCGM_FI_DEV_MEM_COPY_UTIL
              - DCGM_FI_DEV_MEMORY_TEMP
              - DCGM_FI_DEV_POWER_USAGE
              - DCGM_FI_DEV_SM_CLOCK
              - DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION
              - DCGM_FI_PROF_DRAM_ACTIVE
              - DCGM_FI_PROF_GR_ENGINE_ACTIVE
              - DCGM_FI_PROF_PCIE_RX_BYTES
              - DCGM_FI_PROF_PCIE_TX_BYTES
              - DCGM_FI_PROF_PIPE_TENSOR_ACTIVE
              - generation_tokens_total
              - go_info
              - go_memstats_alloc_bytes
              - go_memstats_alloc_bytes_total
              - go_memstats_buck_hash_sys_bytes
              - go_memstats_frees_total
              - go_memstats_gc_sys_bytes
              - go_memstats_heap_alloc_bytes
              - go_memstats_heap_idle_bytes
              - go_memstats_heap_inuse_bytes
              - go_memstats_heap_objects
              - go_memstats_heap_released_bytes
              - go_memstats_heap_sys_bytes
              - go_memstats_last_gc_time_seconds
              - go_memstats_lookups_total
              - go_memstats_mallocs_total
              - go_memstats_mcache_inuse_bytes
              - go_memstats_mcache_sys_bytes
              - go_memstats_mspan_inuse_bytes
              - go_memstats_mspan_sys_bytes
              - go_memstats_next_gc_bytes
              - go_memstats_other_sys_bytes
              - go_memstats_stack_inuse_bytes
              - go_memstats_stack_sys_bytes
              - go_memstats_sys_bytes
              - go_sched_gomaxprocs_threads
              - gpu_cache_usage_perc
              - gpu_total_energy_consumption_joules
              - http.server.active_requests
              - num_request_max
              - num_requests_running
              - num_requests_waiting
              - process_cpu_seconds_total
              - process_max_fds
              - process_open_fds
              - process_resident_memory_bytes
              - process_start_time_seconds
              - process_virtual_memory_bytes
              - process_virtual_memory_max_bytes
              - promhttp_metric_handler_requests_in_flight
              - promhttp_metric_handler_requests_total
              - prompt_tokens_total
              - python_gc_collections_total
              - python_gc_objects_collected_total
              - python_gc_objects_uncollectable_total
              - python_info
              - request_finish_total
              - request_success_total
              - system.cpu.time
              - e2e_request_latency_seconds
              - time_to_first_token_seconds
              - time_per_output_token_seconds
              - request_prompt_tokens
              - request_generation_tokens

Ensure this processor is included in the pipeline we added earlier to the bottom of the file:

    service:
      pipelines:
        metrics/nvidia-metrics:
          exporters:
            - signalfx
          processors:
            - memory_limiter
            - filter/metrics_to_be_included
            - batch
            - resourcedetection
            - resource
          receivers:
            - receiver_creator/nvidia

Verify Changes

Before applying the configuration changes to the collector, take a moment to compare the contents of your modified otel-collector-values.yaml file with the otel-collector-values-with-nvidia.yaml file. Update your file as needed to ensure the contents match. Remember that indentation is important for yaml files, and needs to be precise.
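For example, assuming both files live in the ./otel-collector directory (adjust the paths to match your environment), a quick diff highlights any discrepancies:

diff ./otel-collector/otel-collector-values.yaml ./otel-collector/otel-collector-values-with-nvidia.yaml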

Update the OpenTelemetry Collector Config

Now we can update the OpenTelemetry collector configuration by running the following Helm command:

helm upgrade splunk-otel-collector \
  --set="clusterName=$CLUSTER_NAME" \
  --set="environment=$ENVIRONMENT_NAME" \
  --set="splunkObservability.accessToken=$SPLUNK_ACCESS_TOKEN" \
  --set="splunkObservability.realm=$SPLUNK_REALM" \
  --set="splunkPlatform.endpoint=$SPLUNK_HEC_URL" \
  --set="splunkPlatform.token=$SPLUNK_HEC_TOKEN" \
  --set="splunkPlatform.index=$SPLUNK_INDEX" \
  -f ./otel-collector/otel-collector-values.yaml \
  -n otel \
  splunk-otel-collector-chart/splunk-otel-collector
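Once the upgrade completes, you can optionally confirm that the collector pods are healthy before checking the dashboard (the otel namespace is the one the collector was originally deployed into):

oc get pods -n otel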

Confirm Metrics are Sent to Splunk

Navigate to the Cisco AI Pod dashboard in Splunk Observability Cloud. Ensure it’s filtered on your OpenShift cluster name, and that the charts are populated as in the following example:

Kubernetes Pods
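If the GPU charts remain empty, one way to narrow down the problem is to confirm that the DCGM exporter itself is serving metrics before troubleshooting the collector. The commands below are a sketch: the namespace and pod name vary by environment, so substitute the values returned by the first command.

# Find the DCGM exporter pod (it carries the app=nvidia-dcgm-exporter label)
oc get pods -A -l app=nvidia-dcgm-exporter

# Port-forward to the exporter and inspect the raw Prometheus metrics on port 9400
oc port-forward -n <namespace> pod/<dcgm-exporter-pod> 9400:9400 &
curl -s localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL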

Last Modified Sep 26, 2025

Deploy the Vector Database

10 minutes  

In this step, we’ll deploy a vector database to the OpenShift cluster and populate it with test data.

What is a Vector Database?

A vector database stores and indexes data as numerical “vector embeddings,” which capture the semantic meaning of information like text or images. Unlike traditional databases, they excel at similarity searches, finding conceptually related data points rather than exact matches.
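To make “similarity” concrete: each piece of text is mapped to a vector, and two pieces of text are considered related when their vectors point in a similar direction. The following Python sketch uses made-up three-dimensional vectors purely for illustration; real embeddings, such as those produced by the embedding model we deployed earlier, have far more dimensions.

import math

def cosine_similarity(a, b):
    # Cosine similarity: close to 1.0 means very similar, close to 0.0 means unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: "GPU memory" and "VRAM capacity" are close; "pizza recipe" is not
gpu_memory    = [0.90, 0.10, 0.20]
vram_capacity = [0.85, 0.15, 0.25]
pizza_recipe  = [0.05, 0.90, 0.30]

print(cosine_similarity(gpu_memory, vram_capacity))  # ~0.99 -> related
print(cosine_similarity(gpu_memory, pizza_recipe))   # ~0.22 -> unrelated

A vector database builds an index over millions of such vectors so that this kind of nearest-neighbor lookup stays fast at scale.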

How is a Vector Database Used?

Vector databases play a key role in a pattern called Retrieval Augmented Generation (RAG), which is widely used by applications that leverage Large Language Models (LLMs).

The pattern is as follows (a minimal sketch appears after the list):

  • The end-user asks a question to the application
  • The application takes the question and calculates a vector embedding for it
  • The app then performs a similarity search, looking for related documents in the vector database
  • The app then takes the original question and the related documents, and sends it to the LLM as context
  • The LLM reviews the context and returns a response to the application
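A minimal Python sketch of this flow is shown below. The helper functions embed, similarity_search, and call_llm are hypothetical stand-ins for the embedding model, the vector database query, and the LLM call; later in the workshop we’ll see the real implementation, which uses LangChain to wire these steps together.

def answer_question(question: str) -> str:
    # 1. Convert the user's question into a vector embedding
    question_vector = embed(question)

    # 2. Find documents in the vector database that are semantically similar
    related_docs = similarity_search(question_vector, top_k=3)

    # 3. Send the question plus the retrieved documents to the LLM as context
    context = "\n\n".join(doc.text for doc in related_docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

    # 4. Return the LLM's response to the application
    return call_llm(prompt)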

Deploy a Vector Database

For the workshop, we’ll deploy an open-source vector database named Weaviate.

First, add the Weaviate helm repo that contains the Weaviate helm chart:

helm repo add weaviate https://weaviate.github.io/weaviate-helm
helm repo update

The weaviate/weaviate-values.yaml file includes the configuration we’ll use to deploy the Weaviate vector database.

We’ve set the following environment variables to true, to ensure Weaviate exposes metrics that we can scrape later with the Prometheus receiver:

  PROMETHEUS_MONITORING_ENABLED: true
  PROMETHEUS_MONITORING_GROUP: true

Review the Weaviate documentation to explore additional customization options.

Let’s create a new namespace:

oc create namespace weaviate

Run the following command to allow Weaviate to run a privileged container:

Note: this approach is not recommended for production environments

oc adm policy add-scc-to-user privileged -z default -n weaviate

Then deploy Weaviate:

helm upgrade --install \
  "weaviate" \
  weaviate/weaviate \
  --namespace "weaviate" \
  --values ./weaviate/weaviate-values.yaml
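You can optionally confirm that the Weaviate pod reaches the Running state before moving on:

oc get pods -n weaviate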

Capture Weaviate Metrics with Prometheus

Now that Weaviate is installed in our OpenShift cluster, let’s modify the OpenTelemetry collector configuration to scrape Weaviate’s Prometheus metrics.

To do so, let’s add an additional Prometheus receiver creator section to the otel-collector-values.yaml file:

      receiver_creator/weaviate:
        # Name of the extensions to watch for endpoints to start and stop.
        watch_observers: [ k8s_observer ]
        receivers:
          prometheus/weaviate:
            config:
              config:
                scrape_configs:
                  - job_name: weaviate-metrics
                    scrape_interval: 10s
                    static_configs:
                      - targets:
                          - '`endpoint`:2112'
            rule: type == "pod" && labels["app"] == "weaviate"

We’ll need to ensure that Weaviate’s metrics are added to the filter/metrics_to_be_included filter processor configuration as well:

    processors:
      filter/metrics_to_be_included:
        metrics:
          # Include only metrics used in charts and detectors
          include:
            match_type: strict
            metric_names:
              - DCGM_FI_DEV_FB_FREE
              - ...
              - object_count
              - vector_index_size
              - vector_index_operations
              - vector_index_tombstones
              - vector_index_tombstone_cleanup_threads
              - requests_total
              - objects_durations_ms_sum
              - objects_durations_ms_count
              - batch_delete_durations_ms_sum
              - batch_delete_durations_ms_count

We also want to add a Resource processor to the configuration file, with the following configuration:

      resource/weaviate:
        attributes:
          - key: weaviate.instance.id
            from_attribute: service.instance.id
            action: insert

This processor takes the service.instance.id property on the Weaviate metrics and copies it into a new property called weaviate.instance.id. This is done so that we can more easily distinguish Weaviate metrics from other metrics that use service.instance.id, which is a standard OpenTelemetry property used in Splunk Observability Cloud.

We’ll also need to add a new metrics pipeline for the Weaviate metrics (a separate pipeline is used because we don’t want the weaviate.instance.id attribute added to non-Weaviate metrics):

        metrics/weaviate:
          exporters:
            - signalfx
          processors:
            - memory_limiter
            - filter/metrics_to_be_included
            - resource/weaviate
            - batch
            - resourcedetection
            - resource
          receivers:
            - receiver_creator/weaviate

Before applying the configuration changes to the collector, take a moment to compare the contents of your modified otel-collector-values.yaml file with the otel-collector-values-with-weaviate.yaml file. Update your file as needed to ensure the contents match. Remember that indentation is important for yaml files, and needs to be precise.

Now we can update the OpenTelemetry collector configuration by running the following Helm command:

helm upgrade splunk-otel-collector \
  --set="clusterName=$CLUSTER_NAME" \
  --set="environment=$ENVIRONMENT_NAME" \
  --set="splunkObservability.accessToken=$SPLUNK_ACCESS_TOKEN" \
  --set="splunkObservability.realm=$SPLUNK_REALM" \
  --set="splunkPlatform.endpoint=$SPLUNK_HEC_URL" \
  --set="splunkPlatform.token=$SPLUNK_HEC_TOKEN" \
  --set="splunkPlatform.index=$SPLUNK_INDEX" \
  -f ./otel-collector/otel-collector-values.yaml \
  -n otel \
  splunk-otel-collector-chart/splunk-otel-collector

In Splunk Observability Cloud, navigate to Infrastructure -> AI Frameworks -> Weaviate. Filter on the k8s.cluster.name of interest, and ensure the navigator is populated as in the following example:

Kubernetes Pods

Populate the Vector Database

Now that Weaviate is up and running, and we’re capturing metrics from it, let’s add some data to it that we’ll use in the next part of the workshop with a custom application.

The application used to do this is based on the LangChain Playbook for NeMo Retriever Text Embedding NIM.

Per the configuration in ./load-embeddings/k8s-job.yaml, we’re going to load a datasheet for the NVIDIA H200 Tensor Core GPU into our vector database.

This datasheet includes information about NVIDIA’s H200 GPUs that our large language model wasn’t trained on. In the next part of the workshop, we’ll build an application that uses the LLM to answer questions using context retrieved from this document in the vector database.

We’ll deploy a Kubernetes Job to our OpenShift cluster to load the embeddings. A Kubernetes Job is used rather than a Pod to ensure that this process runs only once:

oc create namespace llm-app
oc apply -f ./load-embeddings/k8s-job.yaml
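
For reference, the core of what this job does can be sketched in a few lines of Python. This is an illustrative sketch only: the package imports mirror those used by the application later in the workshop, but the datasheet file name and the Weaviate connection details are assumptions; the actual implementation lives in the load-embeddings directory.

import weaviate
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
from langchain_weaviate.vectorstores import WeaviateVectorStore

# Split the H200 datasheet into chunks small enough to embed
docs = PyPDFLoader("h200-datasheet.pdf").load()  # hypothetical file name
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Embeddings are calculated by the NeMo Retriever embedding NIM deployed earlier
embeddings = NVIDIAEmbeddings(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
    base_url="http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1",
)

# Connect to the Weaviate service deployed by the helm chart (hostnames and ports assumed)
client = weaviate.connect_to_custom(
    http_host="weaviate.weaviate.svc.cluster.local", http_port=80, http_secure=False,
    grpc_host="weaviate-grpc.weaviate.svc.cluster.local", grpc_port=50051, grpc_secure=False,
)

# Store the chunks and their embeddings in the CustomDocs collection
WeaviateVectorStore.from_documents(
    chunks, embeddings, client=client, index_name="CustomDocs", text_key="page_content"
)
client.close()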

Note: to build a Docker image for the Python application that loads the embeddings into Weaviate, we executed the following commands:

cd workshop/cisco-ai-pods/load-embeddings
docker build --platform linux/amd64 -t derekmitchell399/load-embeddings:1.0 .
docker push derekmitchell399/load-embeddings:1.0
Last Modified Oct 3, 2025

Deploy the LLM Application

10 minutes  

In the final step of the workshop, we’ll deploy an application to our OpenShift cluster that uses the instruct and embeddings models that we deployed earlier using the NVIDIA NIM operator.

Application Overview

Like most applications that interact with LLMs, our application is written in Python. It also uses LangChain, which is an open-source orchestration framework that simplifies the development of applications powered by LLMs.

Our application starts by connecting to two LLMs that we’ll be using:

  • meta/llama-3.2-1b-instruct: used for responding to user prompts
  • nvidia/llama-3.2-nv-embedqa-1b-v2: used to calculate embeddings

# connect to a LLM NIM at the specified endpoint, specifying a specific model
llm = ChatNVIDIA(base_url=INSTRUCT_MODEL_URL, model="meta/llama-3.2-1b-instruct")

# Initialize and connect to a NeMo Retriever Text Embedding NIM (nvidia/llama-3.2-nv-embedqa-1b-v2)
embeddings_model = NVIDIAEmbeddings(model="nvidia/llama-3.2-nv-embedqa-1b-v2",
                                   base_url=EMBEDDINGS_MODEL_URL)

The URLs used for both LLMs are defined in the k8s-manifest.yaml file:

    - name: INSTRUCT_MODEL_URL
      value: "http://meta-llama-3-2-1b-instruct.nim-service:8000/v1"
    - name: EMBEDDINGS_MODEL_URL
      value: "http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1"

The application then defines a prompt template that will be used in interactions with the LLM:

prompt = ChatPromptTemplate.from_messages([
    ("system",
        "You are a helpful and friendly AI!"
        "Your responses should be concise and no longer than two sentences."
        "Do not hallucinate. Say you don't know if you don't have this information."
        "Answer the question using only the context"
        "\n\nQuestion: {question}\n\nContext: {context}"
    ),
    ("user", "{question}")
])

Note how we’re explicitly instructing the LLM to just say it doesn’t know the answer if it doesn’t know, which helps minimize hallucinations. There’s also a placeholder for us to provide context that the LLM can use to answer the question.

The application uses Flask, and defines a single endpoint named /askquestion to respond to questions from end users. To implement this endpoint, the application connects to the Weaviate vector database, and then invokes a chain (using LangChain) that takes the user’s question, converts it to an embedding, and then looks up similar documents in the vector database. It then sends the user’s question to the LLM, along with the related documents, and returns the LLM’s response.

    # connect with the vector store that was populated earlier
    vector_store = WeaviateVectorStore(
        client=weaviate_client,
        embedding=embeddings_model,
        index_name="CustomDocs",
        text_key="page_content"
    )

    chain = (
        {
            "context": vector_store.as_retriever(),
            "question": RunnablePassthrough()
        }
        | prompt
        | llm
        | StrOutputParser()
    )

    response = chain.invoke(question)

Instrument the Application with OpenTelemetry

To capture metrics, traces, and logs from our application, we’ve instrumented it with OpenTelemetry. This required adding the following package to the requirements.txt file (which ultimately gets installed with pip install):

splunk-opentelemetry==2.7.0

We also added the following to the Dockerfile used to build the container image for this application, to install additional OpenTelemetry instrumentation packages:

# Add additional OpenTelemetry instrumentation packages
RUN opentelemetry-bootstrap --action=install

Then we modified the ENTRYPOINT in the Dockerfile to call opentelemetry-instrument when running the application:

ENTRYPOINT ["opentelemetry-instrument", "flask", "run", "-p", "8080", "--host", "0.0.0.0"]

Finally, to enhance the traces and metrics collected with OpenTelemetry, we added a package named OpenLIT to the requirements.txt file:

openlit==1.35.4

OpenLIT supports LangChain, and adds additional context to traces at instrumentation time, such as the number of tokens used to process the request, and what the prompt and response were.

To initialize OpenLIT, we added the following to the application code:

import openlit
...
openlit.init(environment="llm-app")

Deploy the LLM Application

Use the following command to deploy this application to the OpenShift cluster:

oc apply -f ./llm-app/k8s-manifest.yaml

Note: to build a Docker image for this Python application, we executed the following commands:

cd workshop/cisco-ai-pods/llm-app
docker build --platform linux/amd64 -t derekmitchell399/llm-app:1.0 .
docker push derekmitchell399/llm-app:1.0

Test the LLM Application

Let’s ensure the application is working as expected.

Start a pod that has access to the curl command:

oc run --rm -it -n default curl --image=curlimages/curl:latest -- sh

Then run the following command to send a question to the LLM:

curl -X "POST" \
 'http://llm-app.llm-app.svc.cluster.local:8080/askquestion' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "question": "How much memory does the NVIDIA H200 have?"
  }'
You should receive a response similar to the following:

The NVIDIA H200 has 141GB of HBM3e memory, which is twice the capacity of the NVIDIA H100 Tensor Core GPU with 1.4X more memory bandwidth.

View Trace Data in Splunk Observability Cloud

In Splunk Observability Cloud, navigate to APM and then select Service Map. Ensure the llm-app environment is selected. You should see a service map that looks like the following:

Service Map

Click on Traces on the right-hand side menu. Then select one of the slower running traces. It should look like the following example:

Trace

The trace shows all the interactions that our application executed to return an answer to the user’s question (i.e. “How much memory does the NVIDIA H200 have?”).

For example, we can see where our application performed a similarity search to look for documents related to the question at hand in the Weaviate vector database:

Document Retrieval

We can also see how the application created a prompt to send to the LLM, including the context that was retrieved from the vector database:

Prompt Template

Finally, we can see the response from the LLM, the time it took, and the number of input and output tokens utilized:

LLM Response

Last Modified Oct 3, 2025

Wrap-Up

5 minutes  


We hope you enjoyed this workshop, which provided hands-on experience deploying and working with several of the technologies that are used to monitor Cisco AI PODs with Splunk Observability Cloud. Specifically, you had the opportunity to:

  • Deploy a RedHat OpenShift cluster with GPU-based worker nodes.
  • Deploy the NVIDIA NIM Operator and NVIDIA GPU Operator.
  • Deploy Large Language Models (LLMs) using NVIDIA NIM to the cluster.
  • Deploy the OpenTelemetry Collector in the Red Hat OpenShift cluster.
  • Add Prometheus receivers to the collector to ingest infrastructure metrics.
  • Deploy the Weaviate vector database to the cluster.
  • Instrument Python services that interact with Large Language Models (LLMs) with OpenTelemetry.
  • Understand which details OpenTelemetry captures in traces from applications that interact with LLMs.

Clean Up Steps

Follow the steps in this section to uninstall the OpenShift cluster.

Get the cluster ID, the Amazon Resource Names (ARNs) for the cluster-specific Operator roles, and the endpoint URL for the OIDC provider by running the following command:

rosa describe cluster --cluster=$CLUSTER_NAME

Delete the cluster using the following command:

rosa delete cluster --cluster=$CLUSTER_NAME --watch

Delete the cluster-specific Operator IAM roles:

Note: just accept the default values when prompted.

rosa delete operator-roles --prefix $OPERATOR_ROLES_PREFIX

Delete the OIDC provider:

Note: just accept the default values when prompted.

rosa delete oidc-provider --oidc-config-id $OIDC_ID

Refer to OpenShift documentation if you’d like to completely remove the Red Hat OpenShift Service from your AWS account.