Alerting and Monitoring with Splunk IT Service Intelligence
Author
Doug Erkkila
Workshop: Monitoring and Alerting with Splunk IT Service Intelligence
This hands-on workshop is designed for anyone looking to effectively demonstrate and position the combined power of Splunk Enterprise, AppDynamics, Splunk Observability Cloud, and Splunk IT Service Intelligence (ITSI). Participants will gain practical experience integrating these platforms, focusing on real-world scenarios and use cases that resonate with potential clients. The workshop emphasizes translating technical capabilities into business value, enabling Solution Architects to confidently showcase how these solutions address critical customer challenges.
Introduction and Overview
In today’s complex IT landscape, ensuring the performance and availability of applications and services is paramount. This workshop will introduce you to a powerful combination of tools – Splunk, AppDynamics, Splunk Observability Cloud, and Splunk IT Service Intelligence (ITSI) – that work together to provide comprehensive monitoring and alerting capabilities.
The Challenge of Modern Monitoring
Modern applications often rely on distributed architectures, microservices, and cloud infrastructure. This complexity makes it challenging to pinpoint the root cause of performance issues or outages. Traditional monitoring tools often focus on individual components, leaving gaps in understanding the overall health and performance of a service.
The Solution: Integrated Observability
A comprehensive observability strategy requires integrating data from various sources and correlating it to gain actionable insights. This workshop will demonstrate how Splunk, AppDynamics, Splunk Observability Cloud, and ITSI work together to achieve this:
AppDynamics: Provides deep Application Performance Monitoring (APM). It instruments applications to capture detailed performance metrics, including transaction traces, code-level diagnostics, and user experience data. AppDynamics excels at identifying performance bottlenecks within the application.
Splunk Observability Cloud: Offers full-stack observability, encompassing infrastructure metrics, distributed traces, and logs. It provides a unified view of the health and performance of your entire infrastructure, from servers and containers to cloud services and custom applications. Splunk Observability Cloud helps correlate performance issues across the entire stack.
Splunk: Acts as the central platform for log analytics, security information and event management (SIEM), and broader data analysis. It ingests data from AppDynamics, Splunk Observability Cloud, and other sources, enabling powerful search, visualization, and correlation capabilities. Splunk provides a holistic view of your IT environment.
Splunk IT Service Intelligence (ITSI): Provides service intelligence by correlating data from all the other platforms. ITSI allows you to define services, map dependencies, and monitor Key Performance Indicators (KPIs) that reflect the overall health and performance of those services. ITSI is essential for understanding the business impact of IT issues.
Workshop Objectives
By the end of this workshop, participants will be able to:
Articulate the complementary value proposition of Splunk, AppDynamics, Splunk Observability Cloud, and Splunk IT Service Intelligence within a comprehensive observability strategy.
Confidently demonstrate the integration points between these platforms, highlighting data flow and correlation capabilities.
Create and configure basic alerts in Splunk Enterprise, AppDynamics, and Splunk Observability Cloud showcasing practical alerting scenarios.
Build and present compelling demonstrations of Key Performance Indicator (KPI) creation and alerting in Splunk ITSI, emphasizing service-centric monitoring.
Explain and demonstrate the value of episodes in Splunk ITSI for improved incident management and reduced MTTR.
Translate technical features into business outcomes, focusing on ROI and addressing specific customer pain points.
Getting Started
Monitoring and Alerting with Splunk, AppDynamics, and Splunk Observability Cloud
Data Flow and Integration
A key concept to understand is how data flows between these platforms:
Splunk Observability Cloud and AppDynamics collect data: They monitor applications and infrastructure, gathering performance metrics and traces.
Data is sent to Splunk: AppDynamics and Splunk Observability Cloud integrate with Splunk to forward their collected data alongside logs sent directly to Splunk.
Splunk analyzes and indexes data: Splunk processes and stores the data, making it searchable and analyzable.
ITSI leverages Splunk data: ITSI uses the data in Splunk to create services, define KPIs, and monitor the overall health of your IT operations.
Workshop Objectives
By the end of this workshop, you will:
Understand the complementary roles of Splunk, AppDynamics, Splunk Observability Cloud, and ITSI.
Create basic alerts in Splunk, Observability Cloud and AppDynamics.
Configure a new Service and a simple KPI and alerting in ITSI.
Understand the concept of episodes in ITSI.
This workshop provides a foundation for building a robust observability practice. We will focus on the alerting configuration workflows, preparing you to explore more advanced features and configurations in your own environment. We will not be covering ITSI or Add-On installation and configuration.
Here are the instructions on how to access your pre-configured Splunk Enterprise Cloud instance.
How to connect to your workshop environment
Starting up your Workshop
This workshop is available on Splunk Show and will take some time to start up all of your resources. It contains a Splunk environment with IT Service Intelligence, the Splunk Infrastructure Monitoring Add-On, and the recently updated AppDynamics Add-on, all preconfigured.
The Workshop is titled “Tech Summit 2025: OBS-122”, or you can go directly to its entry on Splunk Show. It takes approximately 15 minutes to start up; however, data generation and ingestion will take up to half an hour.
Splunk Observability Cloud Access
Creating an alert in Observability Cloud should be done in the Observability Cloud US1 Show Playground Org.
Creating Basic Alerts
Setting Up Basic Alerts in Splunk Enterprise, AppDynamics, and Splunk Observability Cloud
This section covers the creation of basic alerts in Splunk Enterprise, AppDynamics, and Splunk Observability Cloud. These examples focus on simplicity and demonstrating the core concepts. Remember that real-world alerting scenarios often require more complex configurations and thresholds.
1. Splunk Enterprise Alerts
Splunk alerts are triggered by search results that match specific criteria. We’ll create a real-time alert that notifies us when a certain condition is met.
Scenario: Alert when the number of “Invalid user” events in the “main” index exceeds 100 in the last 15 minutes.
Steps:
Create a Search: Start by creating a Splunk search that identifies the events you want to alert on. For example:
index=main "Invalid user"
Use the time picker to select “Last 15 minutes”.
Configure the Alert:
Click “Save As” and select “Alert.”
Give your alert a descriptive name (e.g., “Numerous Invalid User Logins Attempted”).
Alert type:
Scheduled: Choose “Scheduled” to evaluate the search on a set schedule. Below that, use the frequency selector and choose “Run on Cron Schedule”.
Cron Expression: */15 * * * *
Triggered when: Select “Number of results” “is greater than” “100.”
Time Range: Set to “15 minutes.”
Trigger Actions:
For this basic example, choose “Add to Triggered Alerts.” In a real-world scenario, you’d configure email notifications, Slack integrations, or other actions.
Save: Save the alert.
Explanation: This alert runs the search every 15 minutes and triggers if the search returns more than 100 results. The “Add to Triggered Alerts” action simply adds an alert to the Splunk Triggered Alerts list.
Time Ranges and Frequency: Since everything in Splunk core is a search, you need to consider the search timespan and frequency together so that you are not: a) searching the same data multiple times because of an overlapping timespan, b) missing events because of a gap between the timespan and the frequency, c) running too frequently and adding overhead, or d) running too infrequently and delaying alerts.
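The timespan-versus-frequency tradeoff described above can be sketched with a small Python helper. This is illustrative only; the function and sample values are hypothetical and not part of any Splunk API:

```python
# Illustration of the scheduling concern above: for a scheduled alert,
# the search window and the run frequency should line up so that events
# are neither double-counted nor missed. Hypothetical helper function.

def schedule_coverage(run_interval_min: int, window_min: int) -> str:
    """Compare how the search window relates to the run interval."""
    if window_min == run_interval_min:
        return "aligned"   # each run covers exactly the new data
    if window_min > run_interval_min:
        return "overlap"   # the same events may be counted more than once
    return "gap"           # some events are never searched at all

# The workshop alert runs every 15 minutes over a 15-minute window:
print(schedule_coverage(15, 15))  # aligned
print(schedule_coverage(15, 30))  # overlap: window longer than interval
print(schedule_coverage(15, 5))   # gap: window shorter than interval
```

With the workshop’s */15 cron schedule and a 15-minute time range, each run covers exactly the data that arrived since the previous run.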
2. Splunk Observability Cloud Alerts (Detectors)
Create a Detector:
Click “Detectors & SLOs” in the left-hand menu
Click “Create Detector -> Custom Detector”
Give the detector a descriptive name (e.g., “High CPU Utilization Alert - INITIALS”).
Signal:
Select the metric you want to monitor (“cpu.utilization”).
Add any necessary filters to specify the host (service.name:otelshop-loadgenerator).
Click “Proceed to Alert Condition”
Condition:
Select Static Threshold
Set the threshold: “is above” “90”
Notifications:
For this example, choose a simple notification method (e.g., a test webhook). In a real-world scenario, you would configure integrations with PagerDuty, Slack, or other notification systems.
Save: Save the detector.
Explanation: This detector monitors the CPU utilization metric for the specified service. If the CPU utilization exceeds the 90% static threshold, the detector triggers the alert and sends a notification.
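Conceptually, a static-threshold detector behaves like the following sketch. This is illustrative Python only; Splunk Observability Cloud evaluates signals server-side, and the sample values and the optional sustained-duration setting here are hypothetical:

```python
# Minimal sketch of static-threshold detector semantics: fire only when
# the signal stays above the threshold for a sustained run of samples.
# (Illustrative only; not how Observability Cloud is implemented.)

def evaluate(samples, threshold=90.0, duration=3):
    """Return True if `duration` consecutive samples exceed `threshold`."""
    streak = 0
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        if streak >= duration:
            return True
    return False

cpu = [85, 92, 95, 96, 97, 88]   # percent utilization over time
print(evaluate(cpu))             # True: three consecutive samples above 90
```

Requiring a sustained run of samples, rather than a single spike, is one way detectors avoid flapping on momentary bursts.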
Important Considerations for All Platforms:
Thresholds: Carefully consider the thresholds you set for your alerts. Overly sensitive thresholds can lead to alert fatigue, while thresholds that are too high might miss critical issues.
Notification Channels: Integrate your alerting systems with appropriate notification channels (email, SMS, Slack, PagerDuty) to ensure that alerts are delivered to the right people at the right time.
Alert Grouping and Correlation: For complex systems, implement alert grouping and correlation to reduce noise and focus on actionable insights. ITSI plays a critical role in this.
Documentation: Document your alerts clearly, including the conditions that trigger them and the appropriate response procedures.
These examples provide a starting point for creating basic alerts. As you become more familiar with these platforms, you can explore more advanced alerting features and configurations to meet your specific monitoring needs.
Creating Services in ITSI
Creating Services in ITSI with Dependencies Based on Entity Type
This workshop outlines how to create a service in Splunk IT Service Intelligence (ITSI) using an existing entity and establishing dependencies based on the entity’s type. We’ll differentiate between entities representing business workflows from Splunk Observability Cloud and those representing AppDynamics Business Transactions.
Scenario:
We have two existing services: “Online-Boutique-US” (representing an application running in Kubernetes and being monitored by Splunk Observability Cloud) and “AD.ECommerce” (representing an application monitored by AppDynamics). We want to create a new service and add it as a dependent of one of those services. It is not necessary to create a service for both during your first run through this workshop so pick one that you are more interested in to start with.
Return to your Splunk Environment and under Apps, select IT Service Intelligence
In the Default Analyzer update the Filter to “Buttercup Business Health”
Creating an O11y Based Service
Starting with an Observability Cloud Based Service
Access Services: In ITSI click “Configuration”, click on “Services”.
Create New Service: PaymentService2: Click “Create New Service”.
Service Details (PaymentService2):
Title: “PaymentService2”
Description (Optional): e.g., “Payment Service for Hipster Shop - version 2”
Select Template: Choose “Link service to a service template” and search for “Splunk APM Business Workflow KPIs” from the template dropdown. Click Create to save the new service.
Entity Assignment:
The page will load and display the new Service and you will be on the Entities page. This demo defaults to selecting the paymentservice:grpc.hipstershop.PaymentService/Charge entity. In a real-world situation you would need to match the workflow to the entity name manually.
Direct Entity Selection (If Available): Search for the entity using sf_workflow="paymentservice:grpc.hipstershop.PaymentService/Charge" and select it.
Save Service (PaymentService2): Click “Save” to create “PaymentService2”.
Settings: Click the “Settings” tab, enable Backfill, and keep the standard 7 days. Enable the Service, and click “Save”.
Setting PaymentService2’s Service Health as a Dependency for Online-Boutique-US
Locate Online-Boutique-US: Find the “Online-Boutique-US” service in the service list.
Edit Online-Boutique-US: Click “Edit”.
Service Dependencies: Look for the “Service Dependencies” section.
Add Dependency: There should be an option to add a dependent service. Search for “PaymentService2”.
Select KPI: Check the box next to ServiceHealthScore for PaymentService2.
Save Changes: Save the changes to the “Online-Boutique-US” service.
Verification
Click on “Service Analyzer” and select the “Default Analyzer”
Filter the service to just “Buttercup Business Health”
Verify that PaymentService2 is now present below Online-Boutique-US and should be in a grey status.
Creating an AppD Based Service
Starting with an AppDynamics Based Service
Access Services: In ITSI click “Configuration”, click on “Services”.
Create Service: AD-Ecommerce2: Click “Create Service -> Create Service”.
Service Details (AD-Ecommerce2):
Title: “AD-Ecommerce2”
Description (Optional): e.g., “Ecommerce Service - version 2”
Select Template: Choose “Link service to a service template” and search for “AppDynamics App Performance Monitoring” from the template dropdown. Click Create to save the new service.
Entity Assignment:
The page will load and display the new Service and you will be on the Entities page. This demo defaults to selecting the AD-Ecommerce:18112:demo1.saas.appdynamics.com entity. In a real-world situation you would need to match the entity by its entity_name manually.
Direct Entity Selection (If Available): Search for the entity using entity_name="AD-Ecommerce:18112:demo1.saas.appdynamics.com" and select it.
Settings: Click the “Settings” tab, enable Backfill, and keep the standard 7 days. Enable the Service, and click “Save”.
Setting AD-Ecommerce2’s Service Health as a Dependency for AD.Ecommerce
Navigate back to Services page: Click “Configuration -> Services”
Locate AD.Ecommerce: Find the “AD.Ecommerce” service in the service list.
Edit AD.Ecommerce: Click “Edit”.
Service Dependencies: Look for the “Service Dependencies” section.
Add Dependency: There should be an option to add a dependent service. Search for “AD-Ecommerce2”.
Select KPI: Check the box next to ServiceHealthScore for AD-Ecommerce2.
Save Changes: Save the changes to the “AD.Ecommerce” service.
Verification
Click on “Service Analyzer” and select the “Default Analyzer”
Filter the service to just “Buttercup Business Health”
Verify that AD-Ecommerce2 is now present below AD.Ecommerce and should be in a grey status.
Creating Alerts in ITSI
Configuring a Basic Alert in Splunk ITSI
This section guides you through configuring a basic alert in Splunk IT Service Intelligence (ITSI). We’ll set up an alert that triggers when our previously created Service breaches a KPI threshold.
Depending on the Service You Created, the KPI we use for this alert will change. In the instruction steps below, replace Service Name and KPI appropriately:
PaymentService2: Business Workflow Error Rate
AD-Ecommerce2: Availability
Steps:
Navigate to Correlation Searches:
In ITSI, go to “Configuration” -> “Correlation Searches”
You will need to wait 5-10 minutes for the alert to run
The alert will be listed in the “Alerts and Episodes” Pane in ITSI.
Important Considerations:
Alert Fatigue: Avoid setting up too many alerts or alerts with overly sensitive thresholds. This can lead to alert fatigue, where people become desensitized to alerts and might miss critical issues.
Creating Episodes in ITSI
Creating an Aggregation Policy in Splunk ITSI
This section outlines the steps to create an aggregation policy in Splunk ITSI that matches the alerts we just set up. This policy will group related alerts, reducing noise and improving incident management.
Depending on the Alert You Created, the title we use for this policy will change. In the instruction steps below, replace Service Name with the service name you used:
PaymentService2 or
AD-Ecommerce2
Steps
Navigate to Notable Event Aggregation Policies: In Splunk, go to “Configuration” -> “Notable Event Aggregation Policies”.
Create New Policy: click the green “Create Notable Event Aggregation Policy” button in the upper right corner.
Filtering Criteria: This is the most important part. You’ll define the criteria for alerts to be grouped by this policy. Click “Add Rule (OR)”
Field: Select “title” from the dropdown menu.
Operator: Choose “matches”.
Value: Enter the string “Service Name*” (make sure to include the *).
Splitting Events: Remove the “hosts” field that is provided by default and replace it with the “service” field. We want the policy to generate a new episode for each Service that is found; in our example, it should be only one.
Breaking Criteria: Configure how Episodes are broken or ended. We’ll leave it as the default “If an event occurs for which severity = Normal”. Click Preview on the right to confirm it is picking up our Alert
Click Next
Actions (Optional): Define actions to be taken on aggregated alerts. For example, you can automatically create a ticket in ServiceNow or send an email notification. We’re going to skip this part.
Click Next
Policy Title and Description:
Policy Title: Service Name Alert Grouping
Description: Grouping Service Name alerts together.
Save Policy: Click the “Next” button to create the aggregation policy.
Verification
After saving the policy, navigate to the “Go to Episode Review” page, filter alerts to the last 15 minutes, add a filter for status=New, and search for our Service Name in the search box.
There may already be an episode named after our specific alert; if so, close it out and wait for a new one to be generated with our new Title.
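The “matches” rule in the filtering criteria uses a trailing * as a wildcard, so any alert whose title begins with the service name is pulled into the episode. As a rough illustration (the alert titles below are hypothetical), Python’s fnmatch shows the same glob-style matching:

```python
import fnmatch

# Illustration of the "title matches PaymentService2*" filtering rule:
# the trailing * is a wildcard, so alerts whose titles start with the
# service name are grouped into the episode. Titles are hypothetical.

alert_titles = [
    "PaymentService2 Business Workflow Error Rate Breached",
    "AD-Ecommerce2 Availability Breached",
]

matched = [t for t in alert_titles if fnmatch.fnmatch(t, "PaymentService2*")]
print(matched)  # only the PaymentService2 alert is grouped
```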
Using Observability Cloud Detectors in ITSI
Part 2: Sending Alerts from Splunk Observability Cloud to Splunk ITSI
Since we have a detector configured in Splunk Observability Cloud that we set up earlier, the next step is to ensure that when it triggers an alert, this alert is sent to Splunk IT Service Intelligence (ITSI). This integration allows ITSI to ingest these alerts as notable events, which can then be correlated with other events and contribute to service health scores. The most common method to achieve this is by using a webhook in Splunk Observability Cloud to send alert data to an HTTP Event Collector (HEC) endpoint configured in Splunk ITSI.
Step 1: Configure an HTTP Event Collector (HEC) in Splunk (ITSI)
Before Splunk Observability Cloud can send alerts to ITSI, you need an HEC endpoint in your Splunk instance (where ITSI is running) to receive them.
Log in to your Splunk Enterprise or Splunk Cloud instance that hosts ITSI.
Navigate to Settings > Data Inputs.
Click on HTTP Event Collector.
Click Global Settings. Ensure HEC is enabled. If not, enable it and specify a default port (e.g., 8088, though this might be managed differently in Splunk Cloud).
Click New Token.
Give your HEC token a descriptive name, for example, o11y_alerts_for_itsi.
For Source name override, you can optionally specify a source value, or leave it blank to specify it in Observability Cloud or let it default.
For Default Index, select an appropriate index where ITSI can access these events. Often, there’s a dedicated index for ITSI events, or you might use a general events index like main or itsi_event_management.
Ensure the token is enabled and click Submit.
Copy the Token Value that is generated. You will need this for the webhook configuration in Splunk Observability Cloud.
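Before wiring up the webhook in Step 2, you can sanity-check the HEC endpoint with a short script. The sketch below is illustrative: the host, port, token, index, and sourcetype are placeholders you must replace with your own values, and the helper function is hypothetical:

```python
import json
import urllib.request

# Sketch of posting an event to the HEC endpoint created above.
# HEC_URL and HEC_TOKEN are placeholders; substitute your own values.

HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # token value from Step 1

def build_request(alert: dict) -> urllib.request.Request:
    """Build an HEC POST request wrapping an alert as the event body."""
    payload = {
        "event": alert,                    # the alert body itself
        "sourcetype": "o11y:itsi:alert",   # assumed sourcetype
        "index": "itsi_event_management",  # index selected in Step 1
    }
    return urllib.request.Request(
        HEC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {HEC_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request({"title": "High CPU Utilization Alert", "severity": "critical"})
# urllib.request.urlopen(req)  # uncomment to actually send the test event
```

A successful send returns a JSON body like {"text": "Success", "code": 0}, and the event should then appear in the target index.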
Step 2: Configure a Webhook Integration in Splunk Observability Cloud
Now, return to Splunk Observability Cloud to set up the webhook that will use the HEC token you just created.
In Splunk Observability Cloud, navigate to Data Management > Available Integrations.
Look for an option to add a new Splunk platform integration.
Give the Integration a name, for example, Splunk ITSI HEC.
In the URL field, enter the HEC endpoint URI for your Splunk instance. This will typically be in the format https://<your-splunk-hec-host-or-ip>:<hec-port>/services/collector/event.
Enter the HEC token value that you copied earlier.
For the Payload, you need a JSON payload that ITSI can understand. Splunk Observability Cloud provides an out-of-the-box payload that includes the fields needed for ITSI event correlation.
Review the Integration and click Save
Step 3: Update the Detector to Use the Webhook
Now, go back to the detector you created in Part 1 and update its notification settings to use the newly configured webhook.
Navigate to Detectors & SLOs in Splunk Observability Cloud.
Find and edit the CPU utilization detector you created earlier.
Click the Alert rule that we created earlier
Go to the Alert Recipients section.
Click Add recipient > Splunk platform and select the integration you just configured (Splunk ITSI HEC) for the desired alert severities (e.g., Critical, Warning).
Save the changes to your detector.
Step 4: Validate
To test the integration, you can wait for a genuine alert to trigger or, if your detector settings allow, you might be able to manually trigger a test alert or temporarily lower the threshold to force an alert. Once an alert triggers in Splunk Observability Cloud, it should send the payload via the webhook to your Splunk HEC endpoint.
Verify in Splunk by searching your target index (e.g., index=itsi_event_management sourcetype=o11y:itsi:alert host=<your-ec2-instance-id>). You should see the event data arriving from Splunk Observability Cloud.
With these steps, alerts from your Splunk Observability Cloud detector are now being sent to Splunk ITSI. Correlating events and generating notables works exactly as covered earlier in this workshop.