Troubleshoot the Splunk Add-on for Microsoft Cloud Services¶
For helpful troubleshooting tips that you can apply to all add-ons, see Troubleshoot add-ons in Splunk Add-ons. For additional resources, see Support and resource links for add-ons in Splunk Add-ons.
Accessing logs of Azure inputs¶
The following table describes the logs for different inputs:
Log file | Sourcetype | Description | Troubleshooting SPL search |
---|---|---|---|
splunk_ta_microsoft-cloudservices_storage_table.log | mscs:storage:table:log | Azure Storage Table and Virtual Machine Metrics channel log | index=_internal sourcetype="mscs:storage:table:log" ERROR |
splunk_ta_microsoft-cloudservices_storage_blob.log | mscs:storage:blob:log | Azure Storage Blob channel log | index=_internal sourcetype="mscs:storage:blob:log" ERROR |
splunk_ta_microsoft-cloudservices_azure_resource.log | mscs:azure:resource:log | Azure Resource channel log | index=_internal sourcetype="mscs:azure:resource:log" ERROR |
splunk_ta_microsoft-cloudservices_azure_audit.log | mscs:azure:audit:log | Azure Audit Log channel log | index=_internal sourcetype="mscs:azure:audit:log" ERROR |
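If Splunk Web is not available, a similar check can be approximated directly on the host. The following is a minimal Python sketch and is not part of the add-on. It assumes the log files listed above are written under `$SPLUNK_HOME/var/log/splunk` (verify this location on your deployment), and the function name `find_error_lines` is hypothetical.

```python
import glob
import os


def find_error_lines(log_dir, pattern="splunk_ta_microsoft-cloudservices_*.log"):
    """Return (file name, line number, text) for every log line containing ERROR."""
    hits = []
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if "ERROR" in line:
                    hits.append((os.path.basename(path), number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumption: SPLUNK_HOME is set; /opt/splunk is only a fallback guess.
    splunk_home = os.environ.get("SPLUNK_HOME", "/opt/splunk")
    log_dir = os.path.join(splunk_home, "var", "log", "splunk")
    for name, number, text in find_error_lines(log_dir):
        print(f"{name}:{number}: {text}")
```

The SPL searches in the table remain the preferred method, because they also cover logs forwarded from other instances.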
Checkpoint directories¶
The checkpoint files for each data source are stored in the following directories:
Data source | Directory |
---|---|
Azure Storage Table | $SPLUNK_HOME/var/lib/splunk/modinputs/mscs_storage_table |
Azure Storage Blob | $SPLUNK_HOME/var/lib/splunk/modinputs/mscs_storage_blob |
Azure Resource | n/a |
Azure Audit Log | $SPLUNK_HOME/var/lib/splunk/modinputs/mscs_azure_audit |
Cannot get data in¶
If you can’t get data in using the Azure Resource or Azure Audit inputs, follow these steps:
- Check that you are using the correct Client ID, Client Secret, and Tenant ID. See Grant the Active Directory Application Read Access.
- Use the search in the Accessing logs of Azure inputs table to check for errors.
- If there are no errors and you still cannot collect data, remove the checkpoint file and try again.
If you can’t get data in using the Azure Storage Table, Azure Storage Blob, or Azure Virtual Machine Metrics inputs, follow these steps:
- Check that you are using the correct Account Name and Account Secret.
- Use the search in the Accessing logs of Azure inputs table to check for errors.
- If there are no errors and you still cannot collect data, remove the checkpoint file and try again.
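The checkpoint-removal step above can be sketched as follows. This minimal Python example moves a checkpoint directory to a timestamped backup instead of deleting it, so the checkpoints can be restored if needed. The function name `set_aside_checkpoints` and the backup location are hypothetical, and the sketch assumes the affected input has been disabled first.

```python
import shutil
import time
from pathlib import Path


def set_aside_checkpoints(checkpoint_dir, backup_root):
    """Move a checkpoint directory to a timestamped backup and return the new path.

    Returns None when the directory does not exist (nothing to reset).
    """
    source = Path(checkpoint_dir)
    if not source.is_dir():
        return None
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)
    # Example target name: mscs_storage_blob.20240101120000
    target = backup_root / f"{source.name}.{time.strftime('%Y%m%d%H%M%S')}"
    shutil.move(str(source), str(target))
    return target
```

For example, pass one of the directories from the Checkpoint directories table, such as `$SPLUNK_HOME/var/lib/splunk/modinputs/mscs_storage_blob` with `$SPLUNK_HOME` expanded, then re-enable the input. Without its checkpoints, the input may collect data that it has already indexed.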
Truncated events¶
By default, the Splunk software allows a maximum of 256 lines in any event. If the number of lines in an event exceeds this limit, the Splunk software truncates the event. If your events contain more lines than the default allows, change the MAX_EVENTS setting in the props.conf file under the stanza for the event’s source type.
To raise the limit on the length of a single line beyond the default of 10,000 bytes, use the TRUNCATE setting in the props.conf file to define the maximum size of the line.
See Props.conf in the Splunk Enterprise Admin manual.
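As an illustration of what such an override might look like, the following minimal Python sketch renders a props.conf stanza containing the two settings. The helper `render_props_stanza`, the source type `your:sourcetype`, and the numeric values are placeholders chosen for the example; substitute the source type of the affected events and limits that fit your data.

```python
def render_props_stanza(sourcetype, max_events=None, truncate=None):
    """Build a props.conf stanza that overrides MAX_EVENTS and/or TRUNCATE."""
    lines = [f"[{sourcetype}]"]
    if max_events is not None:
        # Maximum number of lines allowed in a single event.
        lines.append(f"MAX_EVENTS = {max_events}")
    if truncate is not None:
        # Maximum length of a single line, in bytes.
        lines.append(f"TRUNCATE = {truncate}")
    return "\n".join(lines)


# Illustrative values only: 1000 lines per event and 50000 bytes per line.
print(render_props_stanza("your:sourcetype", max_events=1000, truncate=50000))
# Prints:
# [your:sourcetype]
# MAX_EVENTS = 1000
# TRUNCATE = 50000
```

Place the resulting stanza in a local props.conf file as described in the Props.conf reference, rather than editing the add-on's default files.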
Scripted inputs causing a spike in CPU percentage¶
If your deployment experiences a CPU spike after you install and configure the Splunk Add-on for Microsoft Cloud Services, you might have too many inputs enabled with collection intervals that are too short.
To fix this issue, follow these steps:
- Open Task Manager and check whether a large number of python.exe processes are running.
- Increase intervals in proportion to the number of inputs you have configured in your deployment.
- Save your changes.
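One way to read "in proportion" in the steps above is linear scaling: if an interval worked well for a given number of inputs, multiply it by the ratio of the new input count to that baseline. The following minimal Python sketch shows the arithmetic. The function name `scaled_interval` and the example numbers are hypothetical, and linear scaling is an assumption, not a documented formula.

```python
import math


def scaled_interval(base_interval, input_count, baseline_inputs=1):
    """Scale a collection interval (seconds) linearly with the number of inputs."""
    if base_interval <= 0 or input_count < 1 or baseline_inputs < 1:
        raise ValueError("interval must be positive and input counts at least 1")
    # Round up so the result is never shorter than the proportional value.
    return math.ceil(base_interval * input_count / baseline_inputs)


# Hypothetical example: 300 seconds was adequate for 4 inputs; 12 are now enabled.
print(scaled_interval(300, 12, baseline_inputs=4))  # 900
```

Treat the result as a starting point and adjust it based on the CPU usage you observe.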
Event Hub input using the old proxy or account configuration even if the configurations are changed from UI¶
If the Event Hub input is still using the old proxy or account configuration, turn the input off and then on again so that it reloads the new configuration.
Azure KQL Log Analytics Input - PartialError¶
If you see “PartialError” in the logs, a possible cause is that the Azure Log Analytics workspace API used by the input limits both the maximum number of events and the maximum size of the response it returns. If the configured KQL query produces results that exceed the default API limits, only some of the events are returned and ingested into Splunk. Check the error messages in the input log files for more information on possible ways to optimize the KQL query.
Azure Metrics Input - ‘code’: ‘RateLimiting’ Error¶
If you repeatedly encounter the RateLimiting error, try to resolve it by merging multiple metrics inputs into a single input. When you configure the input, enter the metric namespaces as a comma-separated list in the namespaces field.
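When several inputs are merged, their namespace lists need to be combined without duplicates. The following minimal Python sketch shows one way to build the comma-separated value. The function name `merge_namespaces` is hypothetical, and the namespaces shown are sample values only.

```python
def merge_namespaces(*namespace_fields):
    """Combine comma-separated namespace strings into one deduplicated value."""
    seen = set()
    merged = []
    for field in namespace_fields:
        for namespace in field.split(","):
            name = namespace.strip()
            # Skip empty entries and exact duplicates, keeping first-seen order.
            if name and name not in seen:
                seen.add(name)
                merged.append(name)
    return ",".join(merged)


# Sample namespace lists taken from two separate inputs.
print(merge_namespaces(
    "Microsoft.Compute/virtualMachines",
    "Microsoft.Storage/storageAccounts, Microsoft.Compute/virtualMachines",
))
# Prints: Microsoft.Compute/virtualMachines,Microsoft.Storage/storageAccounts
```

Paste the resulting string into the namespaces field of the single merged input.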
Azure Storage Blob Input - Data Ingestion stuck issue¶
Storage Blob input data ingestion can get stuck when the API service accepts a request but fails to return a response, which leaves the thread waiting until a response is received. To address this, the SDK applies a read timeout of 80000 seconds (around 22 hours), after which the request fails and is retried, and data collection resumes. As a workaround for this API issue, the input configuration includes the Read Timeout parameter, which you can use to set a lower read timeout value (instead of 80000 seconds) so that data collection resumes sooner. If you set the Read Timeout parameter to a very small value, the input might start to report read timeout errors, which also disrupts data ingestion. In that case, increase the value until you reach the point that works best without causing read timeout errors. See the Storage Blob input configuration manual for more details about the Read Timeout parameter.
Error message: azure.core.exceptions.ResourceModifiedError: The condition specified using HTTP conditional header(s) is not met¶
You might receive the following error in your Storage Blob input:
azure.core.exceptions.ResourceModifiedError: The condition specified using HTTP conditional header(s) is not met.
This is expected behavior. It occurs because the Splunk Add-on for Microsoft Cloud Services is trying to download the blob file while that file is being updated or modified in the Azure portal.
After the modification of the blob file is complete, the blob data is collected at the next scheduled interval.
Increased CPU usage for enabled Event Hub inputs due to constant occurrence of errors¶
CPU usage increases when enabled Event Hub inputs encounter errors continuously. As a workaround, disable the Event Hub inputs that are experiencing these errors, resolve the errors, and then re-enable the inputs.
Authentication Put-Token failed. Retries exhausted.¶
This error can occur for several reasons. One common reason is that the configured Event Hub was deleted from your Azure portal while data collection was still in progress.
An error occurred while load-balancing and claiming ownership. The exception is AssertionError(). Retrying after 10.527246428161302 seconds.¶
This error occurs when the configured Event Hub has more than 64 partitions. The Event Hub input for the Splunk Add-on for Microsoft Cloud Services can collect data only if the configured Event Hub has a maximum of 64 partitions. To fix this error, delete the affected Event Hub and create a new one with no more than 64 partitions.
Password corruption for Azure App Account upon add-on version downgrade¶
Downgrading your version of the Splunk Add-on for Microsoft Cloud Services can corrupt the password stored for your Azure App Account.
To resolve this issue, reconfigure the affected app accounts and then re-enable the inputs associated with those accounts.