Deploy the LLM Application
Use the following command to deploy this application to the OpenShift cluster:
bash
cd ~/workshop/cisco-ai-pods
oc apply -f ./llm-app/k8s-manifest.yaml
Note: to build a Docker image for this Python application, we executed the following commands:
bash
cd workshop/cisco-ai-pods/llm-app
docker build --platform linux/amd64 -t ghcr.io/splunk/cisco-ai-pod-workshop-app:1.0 .
docker push ghcr.io/splunk/cisco-ai-pod-workshop-app:1.0
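Before testing, it can help to confirm that the resources from the manifest were created. The following is a minimal sketch, not part of the workshop itself. It assumes the manifest defines a Service named `llm-app` (inferred from the URL used in the test step) and a Deployment with the same name (an assumption; check `k8s-manifest.yaml` for the actual names). The helper name `check_llm_app` is hypothetical.

```shell
# Hypothetical helper: checks the resources assumed to be created by k8s-manifest.yaml.
# The default name "llm-app" is an assumption; pass a different name as the first argument.
check_llm_app() {
  name="${1:-llm-app}"
  # Confirm the Service exists, then wait up to two minutes for the Deployment rollout.
  oc get service "$name" &&
  oc rollout status "deployment/$name" --timeout=120s
}
```

Running `check_llm_app` in the project where the manifest was applied should return a non-zero exit status if either resource is missing or the rollout does not complete in time.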
Test the LLM Application
Let’s ensure the application is working as expected.
Start a pod that has access to the curl command:
bash
oc run curl --rm -it --image=curlimages/curl:latest \
  --overrides='{
    "spec": {
      "containers": [{
        "name": "curl",
        "image": "curlimages/curl:latest",
        "stdin": true,
        "tty": true,
        "command": ["sh"],
        "resources": {
          "limits": {
            "cpu": "50m",
            "memory": "100Mi"
          },
          "requests": {
            "cpu": "50m",
            "memory": "100Mi"
          }
        }
      }]
    }
  }'
Then, from the shell inside that pod, run the following command to send a question to the LLM:
bash
curl -X "POST" \
  'http://llm-app:8080/askquestion' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "question": "How much memory does the NVIDIA H200 have?"
  }'
The response should look similar to the following:
bash
The NVIDIA H200 has 141GB of HBM3e memory, which is twice the capacity of the NVIDIA H100 Tensor Core GPU with 1.4X more memory bandwidth.
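To try several questions without retyping the full request, the call can be wrapped in a small shell function and pasted into the curl pod's shell. This is a minimal sketch under stated assumptions: the helper names `build_payload` and `ask_llm` are hypothetical, the endpoint is assumed to accept the same JSON body shown above, and the escaping only handles backslashes and double quotes (questions containing newlines or other control characters are not supported).

```shell
# Hypothetical helper: builds the JSON body for the /askquestion endpoint.
# Escapes backslashes and double quotes so the question stays valid JSON.
build_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"question": "%s"}' "$escaped"
}

# Hypothetical helper: sends one question to the application.
# --fail makes curl exit non-zero on an HTTP error status; -sS hides the
# progress meter but still prints error messages.
ask_llm() {
  curl -sS --fail -X POST 'http://llm-app:8080/askquestion' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$1")"
}

# Example (run inside the curl pod):
#   ask_llm 'How much memory does the NVIDIA H200 have?'
```

Keeping the payload construction in its own function makes it easy to inspect the exact body being sent, for example with `build_payload 'What is "HBM3e"?'`, before issuing the request.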
