Subsections of 4. Splunk APM
1. APM Explore
When you click into the APM section of Splunk Observability Cloud you are greeted with an overview of your APM data, including top services by error rate and R.E.D. metrics for services and workflows.
The APM Service Map displays the dependencies and connections among your instrumented and inferred services in APM. The map is dynamically generated based on your selections in the time range, environment, workflow, service, and tag filters.
You can see the services involved in any of your APM user workflows by clicking into the Service Map. When you select a service in the Service Map, the charts in the Business Workflow side pane are updated to show metrics for the selected service. The Service Map and any indicators are synchronized with the time picker and the chart data displayed.
Exercise
- Click on the wire-transfer-service in the Service Map.

Splunk APM also provides built-in Service Centric Views to help you see problems occurring in real time and quickly determine whether the problem is associated with a service, a specific endpoint, or the underlying infrastructure. Let’s have a closer look.
Exercise
- In the right hand pane, click on wire-transfer-service in blue.

2. APM Service View
Service View
As a service owner you can use the service view in Splunk APM to get a complete view of your service health in a single pane of glass. The service view includes a service-level indicator (SLI) for availability, dependencies, request/error/duration (RED) metrics, runtime metrics, infrastructure metrics, Tag Spotlight, endpoints, and logs for a selected service. You can also quickly navigate to code profiling and memory profiling for your service from the service view.

Exercise
- In the Time box change the timeframe to -1h. Note how the charts update.
- These charts are very useful to quickly identify performance issues. You can use this dashboard to keep an eye on the health of your service.
- Scroll down the page and expand Infrastructure Metrics. Here you will see the metrics for the Host and Pod.
- Runtime Metrics are not available as profiling data is not available for services written in Node.js.
- Now let’s go back to the Explore view; you can hit the back button in your browser.

Exercise
In the Service Map hover over the wire-transfer-service. What can you conclude from the popup service chart?
The error percentage is very high.

We need to understand if there is a pattern to this error rate. We have a handy tool for that, Tag Spotlight.
3. APM Tag Spotlight
Exercise
- To view the tags for the wire-transfer-service, click on the wire-transfer-service and then click on Tag Spotlight in the right-hand side functions pane (you may need to scroll down depending upon your screen resolution).
- Once in Tag Spotlight, ensure the Show tags with no values toggle is off.

The views in Tag Spotlight are configurable for both the chart and the cards. The view defaults to Requests & Errors.
You can also configure which tag metrics are displayed in the cards, selecting any combination of:
- Requests
- Errors
- Root cause errors
- P50 Latency
- P90 Latency
- P99 Latency
Also ensure that the Show tags with no values toggle is unchecked.
Scroll through the cards and get familiar with the tags provided by the wire-transfer-service’s telemetry.
Exercise
Which card exposes the tag that identifies what the problem is?
The version card. The number of requests against v350.10 matches the number of errors, i.e. 100%.
Now that we have identified the version of the wire-transfer-service that is causing the issue, let’s see if we can find out more information about the error. Press the back button on your browser to get back to the Service Map.
4. APM Service Breakdown
Exercise
- Select the wire-transfer-service in the Service Map.
- In the right-hand pane click on Breakdown.
- Select tenant.level in the list.
- Back in the Service Map click on gold (our most valuable user tier).
- Click on Breakdown and select version, the tag that exposes the service version.
- Repeat this for silver and bronze.
What can you conclude from what you are seeing?
Every tenant.level is being impacted by v350.10.
You will now see the wire-transfer-service broken down into three services: gold, silver and bronze. Each tenant is broken down into two services, one for each version (v350.10 and v350.9).

Span Tags
Using span tags to break down services is a very powerful feature. It allows you to see how your services are performing for different customers, different versions, different regions, etc. In this exercise, we have determined that v350.10 of the wire-transfer-service is causing problems for all our customers.
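The workshop services already emit these tags for you. Purely as an illustration (the function, parameter, and values below are hypothetical, not the workshop’s actual code), a Node.js service instrumented with OpenTelemetry could attach such tags to the active span roughly like this:

```typescript
// Hypothetical sketch: attach the tags used in this workshop (version,
// tenant.level) to the span created by auto-instrumentation, so APM can
// break services down by them and Tag Spotlight can group by them.
import { trace } from '@opentelemetry/api';

function handleWireTransfer(tenantLevel: string): void {
  const span = trace.getActiveSpan();              // current request span, if any
  span?.setAttribute('version', 'v350.10');        // service version tag
  span?.setAttribute('tenant.level', tenantLevel); // customer tier: gold, silver or bronze
  // ... business logic for the wire transfer ...
}

// Example call for a gold-tier customer
handleWireTransfer('gold');
```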
Next, we need to drill down into a trace to see what is going on.
5. APM Trace Analyzer
Because Splunk APM provides NoSample end-to-end visibility of every service, it captures every trace. For this workshop, the wire transfer orderId is available as a tag, which means we can use it to search for the exact trace behind the poor user experience encountered by users.
Trace Analyzer
Splunk Observability Cloud provides several tools for exploring application monitoring data. Trace Analyzer is suited to scenarios where you have high-cardinality, high-granularity searches and explorations to research unknown or new issues.
Exercise
- With the outer box of the wire-transfer-service selected, in the right-hand pane, click on Traces.
- Set Time Range to Last 15 minutes.
- Ensure the Sample Ratio is set to 1:1 and not 1:10.

The Trace & error count view shows the total traces and traces with errors in a stacked bar chart. You can use your mouse to select a specific period within the available time frame.
Exercise
- Click on the dropdown menu that says Trace & error count, and change it to Trace duration

The Trace Duration view shows a heatmap of traces by duration. The heatmap represents 3 dimensions of data:
- Time on the x-axis
- Trace duration on the y-axis
- The traces (or requests) per second are represented by the heatmap shades
You can use your mouse to select an area on the heatmap, to focus on a specific time period and trace duration range.
Exercise
- Switch from Trace duration back to Trace & Error count.
- In the time picker select Last 1 hour.
- Note that most of our traces have errors (red) and only a limited number of traces are error-free (blue).
- Make sure the Sample Ratio is set to 1:1 and not 1:10.
- Click on Add filters, type in orderId and select orderId from the list.
- Find and select the orderId provided by your workshop leader and hit enter.

We have now filtered down to the exact trace where users reported a poor experience with a very long processing wait.
A secondary benefit of viewing this trace is that it will remain accessible for up to 13 months, so developers can come back to this issue at a later stage and still view the full trace.
Exercise
- Click on the trace in the list.
Next, we will walk through the trace waterfall.
6. APM Waterfall
We have arrived at the Trace Waterfall from the Trace Analyzer. A trace is a collection of spans that share the same trace ID, representing a unique transaction handled by your application and its constituent services.
Each span in Splunk APM captures a single operation. Splunk APM considers a span to be an error span if the operation that the span captures results in an error.
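As a rough, illustrative sketch (the function name and error message are assumptions, not the workshop’s actual service code), this is how a Node.js service using the OpenTelemetry API would typically end up producing an error span; it also shows how a tag such as orderId, which we filtered on earlier in Trace Analyzer, can be attached:

```typescript
// Illustrative sketch only, not the workshop's actual service code.
// Recording the exception and setting the span status to ERROR is what
// makes the span an error span in Splunk APM.
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('wire-transfer-service');

async function processTransfer(orderId: string): Promise<void> {
  await tracer.startActiveSpan('processTransfer', async (span) => {
    span.setAttribute('orderId', orderId); // tag we filtered on in Trace Analyzer
    try {
      // ... validate the request and call downstream services ...
    } catch (err) {
      span.recordException(err as Error);                                         // attach the exception event
      span.setStatus({ code: SpanStatusCode.ERROR, message: 'Invalid request' }); // mark as an error span
      throw err;
    } finally {
      span.end();
    }
  });
}
```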

Exercise
- Click on the ! next to any of the wire-transfer-service spans in the waterfall.
What is the error message and version being reported in the Span Details?
Invalid request and v350.10.
Now that we have identified the version of the wire-transfer-service that is causing the issue, let’s see if we can find out more information about the error itself. This is where Related Logs come in.
Related Content relies on specific metadata that allows APM, Infrastructure Monitoring, and Log Observer to pass filters around Observability Cloud. For Related Logs to work, you need to have the following metadata in your logs (a sample log line follows the list below):
- service.name
- deployment.environment
- host.name
- trace_id
- span_id
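For illustration only, a structured log line carrying this metadata might look roughly like the example below; every value is made up, and the exact field names depend on how your logging pipeline maps them:

```json
{
  "time": "2024-01-01T12:00:00.000Z",
  "severity": "ERROR",
  "message": "wire transfer failed: Invalid request",
  "service.name": "wire-transfer-service",
  "deployment.environment": "workshop",
  "host.name": "ip-10-0-12-34",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7"
}
```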
Exercise
- At the very bottom of the Trace Waterfall click on Logs (1). This highlights that there are Related Logs for this trace.
- Click on the Logs for trace xxx entry in the pop-up; this will open the logs for the complete trace in Log Observer.

Next, let’s find out more about the error in the logs.