Deploy the LLM Application

10 minutes

Deploy the LLM Application

Use the following commands to deploy the application to the OpenShift cluster:

bash
cd ~/workshop/cisco-ai-pods
oc apply -f ./llm-app/k8s-manifest.yaml
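
Before testing, it can help to confirm that the rollout has finished. The following is a minimal sketch, not part of the workshop itself. It assumes the manifest creates a Deployment and a Service that are both named `llm-app` in the current project. Only the Service name is implied by the URL used later in this section; the Deployment name and the `check_llm_app` helper are hypothetical, so adjust them to match `k8s-manifest.yaml`.

```bash
# Hypothetical helper: wait for the app to roll out, then show its Service.
# The default name "llm-app" is an assumption; check k8s-manifest.yaml
# for the real resource names.
check_llm_app() {
  app_name="${1:-llm-app}"
  timeout="${2:-120s}"

  # Block until the Deployment reports its replicas ready, or give up
  # after the timeout and return a non-zero status.
  oc rollout status "deployment/${app_name}" --timeout="${timeout}" || return 1

  # Confirm that the Service targeted by the test request exists.
  oc get service "${app_name}"
}
```

Run `check_llm_app` with no arguments to use the defaults, or pass a name and timeout, for example `check_llm_app llm-app 300s`.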

Note

The Docker image for this Python application has already been built and pushed with the following commands, which are shown here for reference:

bash
cd ~/workshop/cisco-ai-pods/llm-app
docker build --platform linux/amd64 -t ghcr.io/splunk/cisco-ai-pod-workshop-app:1.0 .
docker push ghcr.io/splunk/cisco-ai-pod-workshop-app:1.0

Test the LLM Application

Let’s ensure the application is working as expected.

Start a pod that has access to the curl command:

bash
oc run curl --rm -it --image=curlimages/curl:latest \
  --overrides='{
    "spec": {
      "containers": [{
        "name": "curl",
        "image": "curlimages/curl:latest",
        "stdin": true,
        "tty": true,
        "command": ["sh"],
        "resources": {
          "limits": {
            "cpu": "50m",
            "memory": "100Mi"
          },
          "requests": {
            "cpu": "50m",
            "memory": "100Mi"
          }
        }
      }]
    }
  }'

Then, from the shell inside that pod, run the following command to send a question to the LLM:

bash
curl -X "POST" \
  'http://llm-app:8080/askquestion' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "question": "How much memory does the NVIDIA H200 have?"
  }'

The application should return an answer similar to the following:

The NVIDIA H200 has 141GB of HBM3e memory, which is twice the capacity of the NVIDIA H100 Tensor Core GPU with 1.4X more memory bandwidth.
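
To send several questions without retyping the full request, the curl call can be wrapped in a small shell function. This is a sketch under stated assumptions rather than part of the workshop: the function names `build_payload` and `ask_llm` and the `LLM_APP_URL` and `LLM_TIMEOUT` variables are hypothetical, while the URL and headers are taken from the request above. It uses only POSIX `sh`, `sed`, and `curl`, so it should work in the curl pod without `jq`. Questions containing newlines are not handled.

```bash
# Hypothetical helpers for use inside the curl pod (POSIX sh, no jq required).
# The URL comes from the request shown above; the names are illustrative.
LLM_APP_URL="${LLM_APP_URL:-http://llm-app:8080/askquestion}"

# Build the JSON request body, escaping backslashes and double quotes
# so that a question containing them still yields valid JSON.
build_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"question": "%s"}' "$escaped"
}

# POST the question. --fail makes curl exit non-zero on HTTP errors,
# and --max-time bounds how long to wait for the model to answer.
ask_llm() {
  curl --silent --show-error --fail --max-time "${LLM_TIMEOUT:-120}" \
    -X POST "$LLM_APP_URL" \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$1")"
}
```

After pasting the functions into the pod's shell, a question can be sent with `ask_llm 'How much memory does the NVIDIA H200 have?'`.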
