The Splunk AI team provides global artifact storage in a publicly readable S3 bucket. This bucket contains LLM model files and Weaviate bootstrap data. Before creating the Splunk AI Platform and Splunk AI Service CRs, create a storage bucket of your own to receive this data. Include the bucket connection information in the spec.volume field of the Splunk AI Platform CR; this triggers a job that transfers the data from the public bucket to your local bucket.
Using the AI Platform requires one of the following remote storage providers:
artifacts, tasks, models
So, the three paths should be:
s3://bucket/artifacts
s3://bucket/tasks
s3://bucket/models
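As a minimal sketch, the helper below prints the three expected paths for a given bucket name, which can be handy when checking a bucket layout. The function name ai_storage_paths is illustrative and not part of the product.

```shell
# Hypothetical helper: print the three paths expected inside a bucket.
# $1 is the bucket name.
ai_storage_paths() {
  for prefix in artifacts tasks models; do
    printf 's3://%s/%s\n' "$1" "$prefix"
  done
}

# Prints one s3:// path per line for the bucket named "bucket".
ai_storage_paths "bucket"
```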
DNS names from the endpoint. Ensure that the endpoint can access the S3 buckets using the configured credentials. Other endpoint URLs with access to the S3 buckets can also be used.
To use GCP storage, follow these setup requirements:
Create a role and role-binding for the splunk-ai-operator service account. This allows read-write access to the GCP bucket to retrieve Splunk AI artifacts.
Configure credentials either through a Kubernetes secret (for example, one that stores a GCP service account key in key.json) or through Workload Identity for secure access:
kubectl create secret generic gcs-secret --from-file=key.json=path/to/your-service-account-key.json
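For the Workload Identity option, the following is a rough sketch of the commands typically involved on GKE. It only prints the commands so they can be reviewed before running. The project ID, bucket name, Google service account name, and namespace are placeholder assumptions, and roles/storage.objectAdmin is one possible role for read-write object access; adjust these to your environment.

```shell
# Placeholder values (assumptions); replace with your own.
PROJECT_ID="my-project"
BUCKET="my-ai-bucket"
GSA_NAME="splunk-ai-gcs"          # hypothetical Google service account name
K8S_NAMESPACE="default"           # assumed namespace of the operator
KSA_NAME="splunk-ai-operator"     # Kubernetes service account named above

# Email address of the Google service account.
gsa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$GSA_NAME" "$PROJECT_ID"
}

# Workload Identity member string for the Kubernetes service account.
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$PROJECT_ID" "$K8S_NAMESPACE" "$KSA_NAME"
}

# Print the commands for review instead of running them.
cat <<EOF
gcloud storage buckets add-iam-policy-binding gs://$BUCKET --member="serviceAccount:$(gsa_email)" --role="roles/storage.objectAdmin"
gcloud iam service-accounts add-iam-policy-binding "$(gsa_email)" --role="roles/iam.workloadIdentityUser" --member="$(wi_member)"
kubectl annotate serviceaccount "$KSA_NAME" --namespace "$K8S_NAMESPACE" "iam.gke.io/gcp-service-account=$(gsa_email)"
EOF
```

The first command grants the Google service account access to the bucket, the second lets the Kubernetes service account impersonate it, and the third links the two through the annotation.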
Azure Managed Identities can be used to provide IAM access to the blobs. With managed identities, the AKS nodes that host the pods can retrieve an OAuth token that authorizes the Splunk AI Operator pod to read the app packages stored in the Azure Storage account. The AKS node is associated with a managed identity, and that identity is granted the Storage Blob Data Contributor role, which provides read and write access to the Azure Storage account.
Create an Azure Resource Group
az group create --name splunkAIOperatorResourceGroup --location westus2
Create AKS Cluster with Managed Identity Enabled
az aks create -g splunkAIOperatorResourceGroup -n splunkAIOperatorCluster --enable-managed-identity
Get Credentials to Access Cluster
az aks get-credentials --resource-group splunkAIOperatorResourceGroup --name splunkAIOperatorCluster
Get the Kubelet User Managed Identity
Run:
az identity list
Find the section that has <AKS Cluster Name>-agentpool under name. For example, look for the block that contains:
{
"clientId": "a5890776-24e6-4f5b-9b6c-**************",
"id": "/subscriptions/<subscription-id>/resourceGroups/MC_splunkAIOperatorResourceGroup_splunkAIOperatorCluster_westus2/providers/Microsoft.ManagedIdentity/userAssignedIdentities/splunkAIOperatorCluster-agentpool",
"location": "westus2",
"name": "splunkAIOperatorCluster-agentpool",
"principalId": "f0f04120-6a36-49bc--**************",
"resourceGroup": "MC_splunkAIOperatorResourceGroup_splunkAIOperatorCluster_westus2",
"tags": {},
"tenantId": "8add7810-b62a--**************",
"type": "Microsoft.ManagedIdentity/userAssignedIdentities"
}
Extract the principalId value from the output above. Alternatively, use the following command to get the principalId:
az identity show --name <identityName> --resource-group "<resourceGroup>" --query 'principalId' --output tsv
Example:
principalId=$(az identity show --name splunkAIOperatorCluster-agentpool --resource-group "MC_splunkAIOperatorResourceGroup_splunkAIOperatorCluster_westus2" --query 'principalId' --output tsv)
echo "$principalId"
Output:
f0f04120-6a36-49bc--**************
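The identity name and resource group used in the example follow the default AKS naming pattern. As a sketch, the helpers below derive both from the values used when creating the cluster and print the resulting lookup command. The function names are illustrative. If the cluster was created with a custom node resource group, the derived name will not match; in that case `az aks show --resource-group <resourceGroup> --name <clusterName> --query nodeResourceGroup --output tsv` reports the actual value.

```shell
RESOURCE_GROUP="splunkAIOperatorResourceGroup"
CLUSTER_NAME="splunkAIOperatorCluster"
LOCATION="westus2"

# Default AKS node resource group: MC_<resource group>_<cluster>_<location>.
node_resource_group() {
  printf 'MC_%s_%s_%s\n' "$1" "$2" "$3"
}

# Kubelet identity name as it appears in the az identity list output.
kubelet_identity_name() {
  printf '%s-agentpool\n' "$1"
}

identity_name="$(kubelet_identity_name "$CLUSTER_NAME")"
identity_group="$(node_resource_group "$RESOURCE_GROUP" "$CLUSTER_NAME" "$LOCATION")"

# Print the lookup command for review instead of running it.
printf 'az identity show --name "%s" --resource-group "%s" --query principalId --output tsv\n' \
  "$identity_name" "$identity_group"
```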
Assign Read-Write Access for Kubelet User Managed Identity to the Storage Account
Use the principalId from the previous step to assign the role on the storage account:
az role assignment create --assignee "<principalId>" --role 'Storage Blob Data Contributor' --scope /subscriptions/<subscription_id>/resourceGroups/<storageAccountResourceGroup>/providers/Microsoft.Storage/storageAccounts/<storageAccountName>
For example:
If <storageAccountResourceGroup> is splunkAIOperatorResourceGroup and <storageAccountName> is mystorageaccount, the command would be:
az role assignment create --assignee "f0f04120-6a36-49bc--**************" --role 'Storage Blob Data Contributor' --scope /subscriptions/f428689e-c379-4712--**************/resourceGroups/splunkAIOperatorResourceGroup/providers/Microsoft.Storage/storageAccounts/mystorageaccount
After running this command, you can connect to Azure Blob Storage without secrets.
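The --scope value above targets the whole storage account. As a sketch, the helper below builds either an account-level scope or a narrower container-level scope that can be passed to --scope instead. The function name storage_scope and the container name ai-artifacts are hypothetical.

```shell
# Build a role assignment scope.
# $1 subscription ID, $2 resource group, $3 storage account,
# $4 optional container name (omit for an account-level scope).
storage_scope() {
  scope="/subscriptions/$1/resourceGroups/$2/providers/Microsoft.Storage/storageAccounts/$3"
  if [ -n "${4:-}" ]; then
    scope="$scope/blobServices/default/containers/$4"
  fi
  printf '%s\n' "$scope"
}

# Container-level scope; "ai-artifacts" is a hypothetical container name.
storage_scope "<subscription_id>" splunkAIOperatorResourceGroup mystorageaccount ai-artifacts
```

Passing the container-level scope limits the Storage Blob Data Contributor role to that one container.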
Granular Access: Azure lets you assign a managed identity at the storage account level or at the level of specific containers (buckets). A managed identity assigned read permissions at the storage account level has read access to all containers within that storage account. As a good security practice, grant the managed identity access only to the specific containers it needs, rather than to the entire storage account.
Avoid Shared Access Keys: Unlike managed identities, shared access keys can be configured only at the storage account level. When you use the secretRef configuration in the CRD, the underlying secret key allows both read and write access to the storage account and all containers within it. Depending on your security needs, consider using managed identities instead of secrets. In addition, there is no automated way to rotate the secret key, so if you use these keys, rotate them regularly (for example, every 90 days).
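If shared access keys are unavoidable, a scheduled check can flag when rotation is due. The sketch below assumes you record the time of the last rotation yourself (as epoch seconds); the function name rotation_due and the 100-day sample value are illustrative. The rotation itself can be done with `az storage account keys renew --resource-group <resourceGroup> --account-name <storageAccountName> --key primary`, after which the Kubernetes secret referenced by secretRef must be updated with the new key.

```shell
# Succeeds (exit status 0) when the key is at least max-age days old.
# $1 last rotation (epoch seconds), $2 current time (epoch seconds), $3 max age in days
rotation_due() {
  age_days=$(( ($2 - $1) / 86400 ))
  [ "$age_days" -ge "$3" ]
}

now="$(date +%s)"
last_rotation=$(( now - 100 * 86400 ))   # sample value: rotated 100 days ago

if rotation_due "$last_rotation" "$now" 90; then
  echo "Storage account key rotation is due"
fi
```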