splunk-ai-operator

Service Artifacts Storage

Splunk AI Artifacts

The Splunk AI team provides global artifact storage in a publicly readable S3 bucket. This bucket contains LLM model files and Weaviate bootstrap data. Before creating the Splunk AI Platform and Splunk AI Service CRs, create a storage bucket of your own to receive this data. Then include the bucket's connection information in the spec.volume field of the Splunk AI Platform CR. This triggers a job that transfers the data from the public bucket to your bucket.

Prerequisites

Utilizing the AI Platform requires one of the following remote storage providers:

Prerequisites common to all remote storage providers

Prerequisites for S3 based remote object storage

Prerequisites for Azure Blob remote object storage

Prerequisites for GCP bucket based remote object storage

To use GCP storage, follow these setup requirements:

Role & Role Binding for Access:

Create a role and role binding for the splunk-ai-operator service account. The binding must grant read-write access to the GCP bucket so that the operator can store and retrieve Splunk AI artifacts.

Credentials via Kubernetes Secret or Workload Identity:

Provide credentials either through a Kubernetes secret (for example, one that stores a GCP service account key in key.json) or through Workload Identity for secure access.

Example of creating the secret:

kubectl create secret generic gcs-secret --from-file=key.json=path/to/your-service-account-key.json
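
The Workload Identity option above has no accompanying example, so the following is a minimal sketch of one possible setup rather than the operator's documented procedure. It assumes a GKE cluster with Workload Identity enabled and an existing Google service account. It also assumes that the role binding refers to a Cloud IAM binding on the bucket, and that roles/storage.objectAdmin is an acceptable read-write role. The project, bucket, Google service account, and namespace names are hypothetical placeholders, and the run helper is an illustrative dry-run wrapper that only prints commands unless DRY_RUN=0 is set.

```shell
# Hypothetical placeholder values; replace them with your own.
PROJECT_ID="my-gcp-project"
BUCKET="my-splunk-ai-artifacts"
GSA_NAME="splunk-ai-artifacts"      # Google service account (assumed to exist)
NAMESPACE="splunk-ai"               # namespace of the operator (assumption)
KSA_NAME="splunk-ai-operator"       # Kubernetes service account from this guide
DRY_RUN="${DRY_RUN:-1}"             # 1 = print commands only, 0 = execute them

GSA_EMAIL="${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
WI_MEMBER="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"

# Print a command in dry-run mode, otherwise execute it.
# The printed form is for inspection only; shell quoting is not preserved.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# 1. Grant the Google service account read-write access to objects in the bucket.
run gcloud storage buckets add-iam-policy-binding "gs://${BUCKET}" \
  --member="serviceAccount:${GSA_EMAIL}" \
  --role="roles/storage.objectAdmin"

# 2. Allow the Kubernetes service account to act as the Google service account.
run gcloud iam service-accounts add-iam-policy-binding "${GSA_EMAIL}" \
  --role="roles/iam.workloadIdentityUser" \
  --member="${WI_MEMBER}"

# 3. Annotate the Kubernetes service account so GKE maps it to the Google one.
run kubectl annotate serviceaccount "${KSA_NAME}" --namespace "${NAMESPACE}" \
  "iam.gke.io/gcp-service-account=${GSA_EMAIL}"
```

Run the sketch once with the default DRY_RUN=1 to review the three commands, then rerun it with DRY_RUN=0 to apply them. With this approach no key.json file or gcs-secret is needed.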

Setup Azure Blob Access with Managed Identity

Azure Managed Identities can provide IAM access to the blobs. With a managed identity, the AKS nodes that host the pods can retrieve an OAuth token that authorizes the Splunk AI Operator pod to access the data stored in the Azure Storage account. The key point is that the AKS node is associated with a managed identity, and that identity is granted the Storage Blob Data Contributor role, which gives read and write access to the Azure Storage account.

Assumptions:

Steps to Assign Managed Identity:

  1. Create an Azure Resource Group

     az group create --name splunkAIOperatorResourceGroup --location westus2
    
  2. Create AKS Cluster with Managed Identity Enabled

     az aks create -g splunkAIOperatorResourceGroup -n splunkAIOperatorCluster --enable-managed-identity
    
  3. Get Credentials to Access Cluster

     az aks get-credentials --resource-group splunkAIOperatorResourceGroup --name splunkAIOperatorCluster
    
  4. Get the Kubelet User Managed Identity

    Run:

     az identity list
    

    Find the section that has <AKS Cluster Name>-agentpool under name. For example, look for the block that contains:

     {
       "clientId": "a5890776-24e6-4f5b-9b6c-**************",
       "id": "/subscriptions/<subscription-id>/resourceGroups/MC_splunkAIOperatorResourceGroup_splunkAIOperatorCluster_westus2/providers/Microsoft.ManagedIdentity/userAssignedIdentities/splunkAIOperatorCluster-agentpool",
       "location": "westus2",
       "name": "splunkAIOperatorCluster-agentpool",
       "principalId": "f0f04120-6a36-49bc--**************",
       "resourceGroup": "MC_splunkAIOperatorResourceGroup_splunkAIOperatorCluster_westus2",
       "tags": {},
       "tenantId": "8add7810-b62a--**************",
       "type": "Microsoft.ManagedIdentity/userAssignedIdentities"
     }
    

    Extract the principalId value from the output above. Alternatively, use the following command to get the principalId:

     az identity show --name <identityName> --resource-group "<resourceGroup>" --query 'principalId' --output tsv
    

    Example:

     principalId=$(az identity show --name splunkAIOperatorCluster-agentpool --resource-group "MC_splunkAIOperatorResourceGroup_splunkAIOperatorCluster_westus2" --query 'principalId' --output tsv)
     echo "$principalId"
    

    Output:

     f0f04120-6a36-49bc--**************
    
  5. Assign Read-Write Access for Kubelet User Managed Identity to the Storage Account

    Use the principalId from the previous step to assign the role on the storage account:

     az role assignment create --assignee "<principalId>" --role 'Storage Blob Data Contributor' --scope /subscriptions/<subscription_id>/resourceGroups/<storageAccountResourceGroup>/providers/Microsoft.Storage/storageAccounts/<storageAccountName>
    

    For example:

    If <storageAccountResourceGroup> is splunkAIOperatorResourceGroup and <storageAccountName> is mystorageaccount, the command would be:

     az role assignment create --assignee "f0f04120-6a36-49bc--**************" --role 'Storage Blob Data Contributor' --scope /subscriptions/f428689e-c379-4712--**************/resourceGroups/splunkAIOperatorResourceGroup/providers/Microsoft.Storage/storageAccounts/mystorageaccount
    

    After this command completes, you can connect to Azure Blob storage without secrets.
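
Steps 4 and 5 can also be scripted end to end. The following is a minimal sketch, not the operator's official tooling. It assumes the cluster exposes its kubelet identity under identityProfile.kubeletidentity.objectId in the az aks show output, which avoids having to guess the MC_ resource group name. The SUBSCRIPTION_ID value is a placeholder, and the run helper is an illustrative dry-run wrapper that only prints commands unless DRY_RUN=0 is set.

```shell
# Cluster and storage names come from the steps above.
# SUBSCRIPTION_ID is a placeholder; replace it with your own.
CLUSTER_RG="splunkAIOperatorResourceGroup"
CLUSTER_NAME="splunkAIOperatorCluster"
STORAGE_RG="splunkAIOperatorResourceGroup"
STORAGE_ACCOUNT="mystorageaccount"
SUBSCRIPTION_ID="00000000-0000-0000-0000-000000000000"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print a command in dry-run mode, otherwise execute it.
# The printed form is for inspection only; shell quoting is not preserved.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Resource ID of the storage account, used as the role assignment scope.
SCOPE="/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/${STORAGE_RG}/providers/Microsoft.Storage/storageAccounts/${STORAGE_ACCOUNT}"

# Read the kubelet identity's principal (object) ID directly from the cluster.
if [ "$DRY_RUN" = "1" ]; then
  PRINCIPAL_ID="<principalId>"
else
  PRINCIPAL_ID="$(az aks show --resource-group "$CLUSTER_RG" --name "$CLUSTER_NAME" \
    --query 'identityProfile.kubeletidentity.objectId' --output tsv)"
fi

# Grant the role, then list the role names assigned on the scope to confirm it.
run az role assignment create --assignee "$PRINCIPAL_ID" \
  --role "Storage Blob Data Contributor" --scope "$SCOPE"
run az role assignment list --assignee "$PRINCIPAL_ID" --scope "$SCOPE" \
  --query '[].roleDefinitionName' --output tsv
```

When run with DRY_RUN=0, the final command should print Storage Blob Data Contributor once the assignment has propagated, which can take a short time.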

Azure Blob Authorization Recommendations: