Troubleshoot the Splunk Add-on for Microsoft Office 365¶
General troubleshooting¶
For troubleshooting tips that you can apply to all add-ons, see Troubleshoot add-ons in Splunk Add-ons. For additional resources, see Support and resource links for add-ons in Splunk Add-ons.
Cannot ingest data after configuring a new application and tenant¶
The Splunk Add-on for Microsoft Office 365 requires Application permission to read the service health, activity data, and DLP policy events. Make sure these permissions are selected, saved and then granted within the Office 365 Management Activity API configuration on Azure Active Directory.
- Navigate to the Enable Access pane in the Microsoft Azure Active Directory application configuration UI.
- Set the Application permissions:
  - Read service health information for your organization
  - Read activity data for your organization
  - (Optional) Read DLP policy events including detected sensitive data
  Note
  Accessing DLP policy events requires an additional Microsoft Azure Active Directory subscription. If you are unable to ingest DLP policy events, make sure you have the correct Microsoft Azure Active Directory subscription. Refer to the Microsoft Azure Active Directory documentation for more information.
- Click Save after you change permissions.
- Click Grant permissions to finish applying the permission changes.
Cannot ingest Message Trace data after configuring a new application and tenant¶
HTTP Request error: 401 Client Error¶
The Splunk Add-on for Microsoft Office 365 requires ReportingWebService.Read.All. Verify this permission is selected, saved, and then granted within the Office 365 Management Activity API configuration on Azure Active Directory.
Certificate verify failed (_ssl.c:741) error message¶
If you create a new input, you might receive the following error message: certificate verify failed (_ssl.c:741)
Perform the following steps to resolve the error:
- Navigate to $SPLUNK_HOME/etc/auth/cacert.pem and open the cacert.pem file with a text editor.
- Copy the text from your deployment's proxy server certificate, and paste it into the cacert.pem file.
- Save your changes.
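The manual copy-and-paste step above can also be scripted. The following is a minimal Python sketch, not part of the add-on: the function name `append_certificate` and the example file paths are hypothetical, and it assumes the proxy server certificate is available as a PEM-encoded text file.

```python
from pathlib import Path


def append_certificate(bundle_path, cert_path):
    """Append a PEM certificate to a CA bundle unless it is already there.

    Returns True if the bundle was changed, False if the certificate
    was already present. Both paths are supplied by the caller.
    """
    bundle = Path(bundle_path)
    cert_text = Path(cert_path).read_text().strip()
    if "-----BEGIN CERTIFICATE-----" not in cert_text:
        raise ValueError("certificate file does not look PEM-encoded")

    existing = bundle.read_text() if bundle.exists() else ""
    if cert_text in existing:
        return False  # already appended, nothing to do

    # Make sure the new certificate starts on its own line.
    separator = "" if (not existing or existing.endswith("\n")) else "\n"
    bundle.write_text(existing + separator + cert_text + "\n")
    return True


# Example call (hypothetical paths; adjust for your deployment):
# append_certificate("/opt/splunk/etc/auth/cacert.pem", "/tmp/proxy_ca.pem")
```

Back up the cacert.pem file before modifying it, so that you can restore the original bundle if needed.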
SSL Cert Errors¶
The Splunk Add-on for Microsoft Office 365 supports HTTP proxies only and does not work with an HTTPS proxy. Make sure the proxy configured in the add-on is of the HTTP type.
If you see the error ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1106), check the following:
- Check that an HTTPS proxy is not set at the Splunk level (for example, in splunk-launch.conf) or at the system level (the https_proxy and http_proxy environment variables).
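One way to perform the system-level check is to list the proxy-related environment variables that are set. The following Python sketch is illustrative only: the function name `proxy_env_settings` is hypothetical, and the result is meaningful only when the script runs in the same environment as the Splunk process.

```python
import os

# Variable names commonly honored by HTTP client libraries.
PROXY_VARIABLES = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def proxy_env_settings(environ=None):
    """Return the proxy environment variables that are set and non-empty."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in PROXY_VARIABLES if environ.get(name)}


if __name__ == "__main__":
    for name, value in proxy_env_settings().items():
        print(f"{name}={value}")
```

Any value that points to an HTTPS proxy is a candidate cause of the SSLEOFError described above.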
If there is a CERTIFICATE_VERIFY_FAILED error, make sure the required proxy server certificates and vendor-specific certificates are appended to the following available file paths:
- $SPLUNK_HOME/etc/apps/splunk_ta_o365/lib/certifi/cacert.pem
- $SPLUNK_HOME/lib/python3.7/site-packages/certifi/cacert.pem
Data collection stops working - HTTP errors¶
The Client Secrets in your Microsoft Azure deployment can rotate on a predefined schedule, according to your organization’s security requirements. If the secret is not updated in the Splunk Add-on for Microsoft Office 365, data collection stops. You might see HTTP Error 401 - Unauthorized or HTTP Error 500 - Internal Server Error in the logs. To update the Client Secret in the add-on:
- Navigate to the Splunk Web home screen.
- Click on Splunk Add-on for Microsoft Office 365 in the left navigation banner.
- Click on the Configuration > Tenant tab.
- Select the Tenant that needs an updated Client Secret and click Edit.
- Select Change and update the Client Secret.
- Click Update to save the changes.
Audit events are delayed or missing¶
As the number of events in your deployment increases, the Splunk Add-on for Microsoft Office 365 might not be able to return all events in one query before the next query executes, and events from the previous query might be delayed or even missed. One root cause can be the number of threads that are available to collect the necessary data sets. If events are being queued, you can increase the number of threads in increments of four until all events are returned in one query.
- Navigate to $SPLUNK_HOME/etc/apps/splunk_ta_o365/local, and create an inputs.conf file, if it does not already exist.
- Add the following stanza to the $SPLUNK_HOME/etc/apps/splunk_ta_o365/local/inputs.conf file.
[splunk_ta_o365_management_activity]
interval = 300
disabled = 0
sourcetype = o365:management:activity
number_of_threads = 4
- Increase the number of threads in increments of 4. The maximum number of threads is 64.
Note
Increase the thread count gradually until it stops improving performance. Avoid a high thread count unless the system has ample resources and you observe a performance improvement as you add threads.
- Restart Splunk.
- Test to see if all events are being returned:
Splunk Search
index=_internal sourcetype="splunk:ta:o365:log" message="Ingesting content success." | eval content_time = strptime(content_id, "%Y%m%d%H%M%S") | chart count by content_time span=600
You can add a filter on the data_input field to narrow down the search for a particular data input:
Splunk Search
index=_internal sourcetype="splunk:ta:o365:log" message="Ingesting content success." data_input=my_test_input | eval content_time = strptime(content_id, "%Y%m%d%H%M%S") | chart count by content_time span=600
Change my_test_input to the name of the data input you want to check.
You could also deploy the Splunk Add-on for Microsoft Office 365 as a tuned standalone add-on to capture Microsoft Azure Active Directory audit events separately from Service Events and Service Messages.
Data ingestion stops on Debian or Ubuntu Linux Server¶
Splunk Enterprise launches modular inputs under a shell process on Debian or Ubuntu Linux Server, and this can block new modular input instances. If you are running the add-on on Debian or Ubuntu Linux Server, set the option start_by_shell = false in each stanza of inputs.conf.
- Navigate to $SPLUNK_HOME/etc/apps/splunk_ta_o365/local, and create an inputs.conf file, if it does not already exist.
- Add the following stanzas to the $SPLUNK_HOME/etc/apps/splunk_ta_o365/local/inputs.conf file:
[splunk_ta_o365_management_activity]
interval = 300
disabled = 0
sourcetype = o365:management:activity
number_of_threads = 4
start_by_shell = false
[splunk_ta_o365_service_status]
interval = 1800
disabled = 0
sourcetype = o365:service:status
start_by_shell = false
[splunk_ta_o365_service_message]
interval = 600
disabled = 0
sourcetype = o365:service:message
start_by_shell = false
- Restart Splunk.
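Before restarting, you can check that no stanza was missed. The following Python sketch is illustrative only and assumes that the inputs.conf file is simple enough to be parsed as an INI file by the standard library `configparser` module, which is not true of every Splunk configuration file. The function name `stanzas_missing_option` is hypothetical.

```python
import configparser


def stanzas_missing_option(conf_text, option="start_by_shell", expected="false"):
    """Return the names of stanzas where `option` is absent or not `expected`."""
    # interpolation=None keeps '%' characters literal; strict=False
    # tolerates repeated stanzas instead of raising an error.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return [
        name
        for name in parser.sections()
        if parser.get(name, option, fallback=None) != expected
    ]


# Example usage (the path depends on your $SPLUNK_HOME):
# with open("/opt/splunk/etc/apps/splunk_ta_o365/local/inputs.conf") as f:
#     print(stanzas_missing_option(f.read()))
```

An empty list means that every stanza in the file sets start_by_shell = false.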
Data collection hangs while calling the Office 365 management API¶
While calling the Office 365 management API, you receive the following error message in your logs.
ReadTimeout: HTTPSConnectionPool(host='manage.office.com', port=443): Read timed out. (read timeout=60)
This error indicates that the modular input hung during data collection. To resolve it, configure the request_timeout parameter in inputs.conf.
Data ingestion stops for management activity¶
If data collection for the management activity input stops, you might receive the following message in your error logs:
message="failed to get error code" body="{\"Message\":\"Authorization has been denied for this request.\"}"
To resolve this, configure the token_refresh_window parameter in inputs.conf. Enter the number of seconds before the token’s expiration time when the token should be refreshed. The range for the parameter is from 400 seconds to 3600 seconds. See the inputs.conf.spec file in the README directory for this add-on for more information.
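To illustrate what the parameter means, the following Python sketch shows how a refresh window of this kind can be evaluated. It is not the add-on's implementation: the function name `should_refresh` and the use of epoch seconds for the timestamps are assumptions, and only the 400 to 3600 second range comes from the description above.

```python
MIN_WINDOW_SECONDS = 400
MAX_WINDOW_SECONDS = 3600


def should_refresh(now, expires_at, token_refresh_window):
    """Return True when the token is within the refresh window of expiring.

    `now` and `expires_at` are timestamps in seconds (assumed epoch time);
    `token_refresh_window` is the number of seconds before expiration at
    which a refresh should happen.
    """
    if not MIN_WINDOW_SECONDS <= token_refresh_window <= MAX_WINDOW_SECONDS:
        raise ValueError("token_refresh_window must be between 400 and 3600 seconds")
    return expires_at - now <= token_refresh_window


# A token with 500 seconds left and a 600-second window is due for refresh.
print(should_refresh(now=1000, expires_at=1500, token_refresh_window=600))
```

A larger window refreshes the token earlier, which leaves more margin for slow requests that would otherwise be sent with a token that expires in flight.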
Data duplication issues when fetching multiple content URLs¶
Microsoft’s o365:management:activity API is not like typical event services and does not forward actual events. The API is a front end to an at-least-once delivery message bus, and it returns lists of URLs pointing to data rather than unique events. With each call to this API, the API clients (like the Splunk software) retrieve new events by time. But the at-least-once nature of the API means that clients are sometimes instructed to process the same set of data more than once.
This API design from Microsoft helps ensure that failures in processing, whether internal or external, do not result in lost events. A consequence of this assurance is the occasional duplication of events whenever there is any doubt about the delivery of a message. This API design is highly scalable, as it does not require consistency or checkpoints from the O365 API.
Modular inputs have the ability to manage checkpoints such as counters and last queried time. However, for the sake of performance, modular inputs in this add-on are stateless and do not retain data from previous calls, so they cannot determine whether the current or a prior thread has been given the same content by value or by key/identifier. This design is intentional, in order to minimize the overhead of high-volume interfaces.
Typically, these duplicate events have minimal impact on most use cases, but they can affect some aggregate (threshold) or anomaly detection use cases. If they affect your use case significantly, the best practice is to either raise a request with Microsoft for possible enhancements to the API design, or build a message-format-compatible webhook using Azure Functions, another serverless technology, or any of the available API gateway solutions. The webhook can check for duplicate events by maintaining a history of the messages sent by the API over a period of time, and it can also send data to Splunk via the HTTP Event Collector (HEC).
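The history-based duplicate check described above can be sketched as follows. This is a minimal, in-memory Python illustration rather than a production webhook: the class name `RecentMessageFilter` is hypothetical, the choice of deduplication key (for example, an event identifier field) is an assumption you must adapt to your data, and timestamps are assumed to be passed in non-decreasing order.

```python
from collections import OrderedDict


class RecentMessageFilter:
    """Remember message keys for a limited time and flag repeats."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._seen = OrderedDict()  # key -> first-seen time, oldest first

    def is_new(self, key, now):
        """Return True the first time `key` is seen within the TTL."""
        self._expire(now)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True

    def _expire(self, now):
        # Drop entries from the front while they are older than the TTL.
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self.ttl_seconds:
                break
            self._seen.popitem(last=False)


# Forward only events whose key has not been seen in the last hour.
history = RecentMessageFilter(ttl_seconds=3600)
for event_key, received_at in [("evt-1", 0), ("evt-2", 5), ("evt-1", 10)]:
    if history.is_new(event_key, received_at):
        print("forward", event_key)
    else:
        print("skip duplicate", event_key)
```

The retention period trades memory for coverage: a longer period catches duplicates that arrive further apart but keeps more keys in memory. A deployed webhook would typically keep this history in a shared store rather than in process memory.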
Service health information is not getting ingested¶
If your service health information is not getting ingested, check to see if you are using the ServiceHealth.Read.All API from Office 365 Management APIs, or the ServiceHealth.Read.All API in Microsoft Graph.
The ServiceHealth.Read.All API from Office 365 Management APIs was retired by Microsoft on December 17, 2021. Use the ServiceHealth.Read.All API in Microsoft Graph instead.
If upgrading to version 3.0.0 or later, disable ServiceHealth.Read.All in Office 365 Management APIs, and enable ServiceHealth.Read.All in Microsoft Graph.
Input page not showing any configured inputs with “Unexpected Error” shown on UI¶
The “Unexpected Error” message on the Inputs page can have the following causes:
- The "Content Type" field was not selected while configuring inputs.
  - Determine the inputs without the "Content Type" field:
    - Go to "Settings" -> "Data Inputs".
    - Find the already configured Office 365 add-on inputs that do not have the "Content Type" field provided.
  - Delete those inputs from the same "Data Inputs" UI.
  - Reconfigure your inputs using the appropriate Content Type.
- An input was configured from the "Settings" -> "Data Inputs" UI. In this case, the validations handled in the add-on are skipped, which results in the above error.
  - Delete those inputs from the same "Data Inputs" UI.
  - Reconfigure your inputs using the appropriate Content Type.
Data ingestion stops for cloud application security input¶
If data collection for the cloud application security input stops, you might receive the following message in your error logs:
message="Error retrieving Cloud Application Security messages." exception=401:{"detail":"Invalid token header. Token string should not contain spaces."}
One possible cause of this error is a problem with following the upgrade steps when migrating to version 4.1.0 or higher. For more information, see the upgrade topic in this manual.
Duplicate events for Cloud App Security and Management Activity¶
Problem¶
You encounter duplicate events for Cloud App Security and Management Activity data ingestion.
Possible solution¶
After upgrading the Splunk Add-on for Microsoft Office 365 to version 4.1.0, due to a change in checkpoint logic, your Splunk platform deployment might receive duplicate events for a maximum of 7 days. Duplicate events will stop ingesting after 7 days. You may observe a rise in the usage of your deployment’s memory/CPU resources.