The Splunk AI Operator provides a collection of custom resources you can use to manage Splunk AI Platform deployments in your Kubernetes cluster.
For examples of how to use these custom resources, see Configuring Splunk Enterprise Deployments.
All resources in Kubernetes include a metadata section. You can use it to define a name for a specific instance of the resource and the namespace the resource should reside in:
| Key | Type | Description |
|---|---|---|
| name | string | Each instance of your resource is distinguished using this name. |
| namespace | string | Your instance will be created within this namespace. You must ensure that this namespace exists beforehand. |
If you do not provide a namespace, your current context's namespace is used.
```yaml
apiVersion: ai.splunk.com/v1
kind: AIPlatform
metadata:
  name: example
  namespace: test
```
The following example shows a more complete AIPlatform specification:

```yaml
apiVersion: ai.splunk.com/v1
kind: AIPlatform
metadata:
  name: example
  labels:
    app.kubernetes.io/name: splunk-ai-platform-example
    app.kubernetes.io/instance: example
    app.kubernetes.io/version: 0.1.0
spec:
  objectStorage:
    path: "s3://my-ai-bucket"
    region: "us-west-2"
    secretRef: s3-secret
  serviceAccountName: "ai-platform-sa"
  features:
    - name: "saia"
      serviceAccountName: "saia-sa"
      version: "0.1.0"
  workerGroupConfig:
    serviceAccountName: "ray-worker-sa"
    imageRegistry: "123456789012.dkr.ecr.us-west-2.amazonaws.com/ray/ray-worker-gpu"
  sidecars:
    envoy: true
    otel: true
    prometheusOperator: true
  certificateRef: "platform-issuer"
  clusterDomain: "cluster.local"
  images:
    saiaImage: "splunkai/saia:latest"
    weaviateImage: "docker.io/weaviate:latest"
    rayHeadGroupImage: "rayproject/ray-head:latest"
    rayWorkerGroupImage: "rayproject/ray-worker:latest"
  defaultAcceleratorType: "L40S"
  splunkConfiguration:
    crName: "splunk-standalone"
    crNamespace: "default"
    secretRef:
      name: "splunk-secret"
      namespace: "default"
    endpoint: "https://splunk.default.svc.cluster.local:8089"
    # Optional, if not using secretRef
    # token: "splunk-token"
  # Persistent storage for the Weaviate vector database
  storage:
    vectorDB:
      # Option 1: use an existing PVC
      # pvcName: "my-existing-pvc"
      # Option 2: create a dynamic PVC (recommended)
      size: "100Gi"
      storageClassName: "gp3"  # use an appropriate StorageClass
  # Scheduling for GPU workloads (Ray workers)
  gpuScheduler:
    nodeSelector:
      node.kubernetes.io/instance-type: "g5.2xlarge"
    tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
  # Scheduling for CPU workloads (Ray head, Weaviate)
  cpuScheduler:
    nodeSelector:
      workload-type: "cpu"
    tolerations: []
  # External access via Ingress (optional)
  ingress:
    enabled: true
    className: "nginx"  # or "alb", "traefik", etc.
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
    hosts:
      - host: "ai.example.com"
        paths:
          - path: "/"
            pathType: "Prefix"
    tls:
      - hosts:
          - "ai.example.com"
        secretName: "ai-platform-tls"
  # mTLS certificates for secure communication (optional)
  mtls:
    enabled: true
    termination: "operator"  # the operator manages certificates
    secretName: "ai-platform-mtls"
    issuerRef:
      name: "ca-issuer"
      kind: "ClusterIssuer"
    dnsNames:
      - "saia.default.svc.cluster.local"
```
The AIPlatform resource provides the following Spec configuration parameters:
| Key | Type | Description |
|---|---|---|
| objectStorage | object | Required. S3/GCS/Azure storage configuration for model artifacts. See Service Artifacts Storage |
| serviceAccountName | string | Kubernetes Service Account name. Used for IAM roles (IRSA on AWS) to access cloud resources |
| features | array | List of AI features to enable (e.g., saia for Splunk AI Assistant) |
| defaultAcceleratorType | string | GPU type for AI workloads (e.g., nvidia-tesla-t4, nvidia-a100, L40S) |
| gpuInstanceType | string | GPU instance type for Ray worker groups (e.g., g6.24xlarge, p4d.24xlarge) |
| workerGroupConfig | object | Ray worker node configuration (service account, image registry) |
| sidecars | object | Enable/disable sidecars: envoy, otel, prometheusOperator |
| clusterDomain | string | Kubernetes cluster domain suffix. Default: cluster.local |
| images | object | Container image overrides for Ray head/worker, SAIA, Weaviate |
| certificateRef | string | References a cert-manager Certificate or Issuer for mTLS |
| splunkConfiguration | object | Connection details for Splunk Enterprise instance |
| storage | object | Persistent storage for Weaviate vector database. See Storage Configuration |
| gpuScheduler | object | Node selectors, affinity, tolerations for GPU workloads |
| cpuScheduler | object | Node selectors, affinity, tolerations for CPU workloads (head, Weaviate) |
| ingress | object | External access configuration. Exposes AI services via HTTP/HTTPS. See Ingress Usage |
| mtls | object | mTLS/TLS certificates managed by cert-manager for secure service communication |
| serviceTemplate | object | Template used to create Kubernetes services for platform components |
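To illustrate how the table above fits together, here is a minimal pre-flight check you might run over a spec before applying it. This is a sketch, not the operator's actual admission logic: the function name `validate_spec` and the choice to treat only `objectStorage.path` as strictly required are assumptions for illustration; the `clusterDomain` default of `cluster.local` comes from the table.

```python
# Illustrative pre-flight check of an AIPlatform spec dict.
# The required/optional split follows the spec table above; the
# validation logic itself is an assumption, not the operator's webhook.
REQUIRED_OBJECT_STORAGE_KEYS = {"path"}

def validate_spec(spec: dict) -> list[str]:
    """Return a list of human-readable problems found in the spec."""
    problems = []
    storage = spec.get("objectStorage")
    if storage is None:
        problems.append("spec.objectStorage is required")
    else:
        for key in sorted(REQUIRED_OBJECT_STORAGE_KEYS - storage.keys()):
            problems.append(f"spec.objectStorage.{key} is missing")
    # clusterDomain defaults to cluster.local per the table above
    spec.setdefault("clusterDomain", "cluster.local")
    return problems

spec = {"objectStorage": {"path": "s3://my-ai-bucket", "region": "us-west-2"}}
print(validate_spec(spec))    # → []
print(spec["clusterDomain"])  # → cluster.local
```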
The AIService CR is created automatically by the AIPlatform CR, so you do not deploy an AIService CR on its own and there are no additional spec values to configure.
View the overall status of your AI Platform:

```shell
# View status conditions
kubectl get aiplatform <name> -n <namespace> -o jsonpath='{.status.conditions}' | jq .

# Check whether the platform is ready
kubectl get aiplatform <name> -n <namespace> -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
```
Key Status Conditions:

- `Ready` - overall platform health
- `RayServiceReady` - Ray cluster status
- `RayClusterReady` - Ray pod readiness
- `RayServeRouteReady` - AI inference endpoint availability
- `WeaviateDatabaseReady` - vector database status
- `IngressReady` - external access (if enabled)

See what's happening with your deployment:
```shell
# Watch all events
kubectl get events -n <namespace> --watch --field-selector involvedObject.name=<name>

# See recent events
kubectl describe aiplatform <name> -n <namespace> | grep -A 20 Events:

# Filter specific event types
kubectl get events -n <namespace> --field-selector reason=RayServiceReady
kubectl get events -n <namespace> --field-selector reason=PlatformDegraded
```
For more details on events and troubleshooting, see Error Handling and Events.
Useful status queries:

```shell
# One-liner to check whether the platform is ready
kubectl get aiplatform <name> -n <namespace> -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
# Output: True (ready) or False (not ready)

# Get the Ray service name for accessing the inference API
kubectl get aiplatform <name> -n <namespace> -o jsonpath='{.status.rayServiceName}'

# Get the Weaviate service name
kubectl get aiplatform <name> -n <namespace> -o jsonpath='{.status.vectorDbServiceName}'

# Get the Ingress address (if enabled)
kubectl get aiplatform <name> -n <namespace> -o jsonpath='{.status.conditions[?(@.type=="IngressReady")].message}'
```