Monitoring Agentic AI Applications with Splunk Observability Cloud
2 minutesAuthor
Derek Mitchell
Splunk Observability for AI monitors the performance, quality, security,
and cost of AI application stack. It includes the following:
AI Agent Monitoring, which monitors the performance, quality, security, and cost of LLM and agentic applications.
AI Infrastructure Monitoring, which monitors the health, availability, and consumption (or usage) of AI infrastructure.
This workshop provides hands-on experience deploying and working with these capabilities
in Splunk Observability Cloud. This includes:
Understanding how to connect an Azure account to Splunk Observability Cloud to capture AI infrastructure-related metrics.
Exploring out-of-the box dashboards and navigators related to AI infrastructure.
Reviewing the architecture of an Agentic AI application built with LangChain and LangGraph.
Practice deploying an Agentic AI application and instrumenting it with OpenTelemetry.
Exploring how metrics, traces, and logs can be used in Splunk Observability Cloud to understand agent performance.
Practice modifying an Agentic AI application to use tool calls and agents.
Practice adding quality issues to an application and detecting them with Splunk Observability Cloud using semantic quality evals.
Practice adding AI Defense instrumentation to the application and security risks, and detecting them with Splunk Observability Cloud.
Tip
The easiest way to navigate through this workshop is by using:
the left/right arrows (< | >) on the top right of this page
the left (âď¸) and right (âśď¸) cursor keys on your keyboard
Subsections of Monitoring Agentic AI Applications with Splunk Observability Cloud
Connect to EC2 Instance
5 minutes
Connect to your EC2 Instance
Weâve prepared an Ubuntu Linux instance in AWS/EC2 for each attendee:
Access the Splunk Show event by clicking on the link for your region
Click Enroll on the top-right corner
Then look near the bottom of the page for your EC2 instance details
You should see connection information such as the following:
Using the IP address (which is part of the SSH Command)
and SSH Password provided as part of the Connection Information,
connect to your EC2 instance using one of the methods below:
Mac OS / Linux
ssh splunk@IP address
Windows 10+
Use the OpenSSH client
Earlier versions of Windows
Use Putty
VPN Connection
If you’re working from an office and having trouble connecting, try connecting
to your corporate VPN first.
Retrieve your Instance Name
Once you’ve logged into your EC2 instance via ssh, use the following command to
get your instance name:
echo$INSTANCE
Make a note of this, as your instance name is unique to you and will be
used later in the workshop to find your data in Splunk Observability Cloud.
Connect Visual Studio Code (Optional)
We’ll be editing several files throughout the workshop. The workshop instructions
include tip for doing this using a vi editor, and workshop participants can
use the nano editor as well.
If you prefer a full-fledged IDE, you can connect Visual Studio Code running
on your laptop to edit remote files on the EC2 instance.
The high-level steps to do this are as follows:
Download and install VS code on your machine using this link.
In VS Code, navigate to Settings and then Extensions.
Search for the Remote â SSH extension (by Microsoft) and install it.
Press F1 (or Ctrl+Shift+P on Windows / Cmd+Shift+P on Mac OS).
Run Remote-SSH: Connect to Host.
Copy your SSH command from Splunk Show: ssh -p 2222 splunk@EC2_PUBLIC_IP.
Choose the default SSH config file when prompted.
Press F1 (or Ctrl+Shift+P on Windows / Cmd+Shift+P on Mac OS) again.
Run Remote-SSH: Connect to Host.
Select the host you just added. VS Code will open a new window and start the connection.
A prompt will appear at the top of VS Code asking for the SSH password. Copy the password from Splunk Show and enter it here.
Click Open Folder then input /home/splunk/workshop/agentic-ai as the folder name:
You can now files remotely with VS Code!
Review Azure OpenAI Metrics, Dashboards, and Navigators
10 minutes
This workshop will use OpenAI models running in Azure.
You can monitor the performance of Azure OpenAI applications by configuring your
Azure OpenAI applications to send metrics to Splunk Observability Cloud.
We’ve already integrated our Azure account with the workshop instance
of Splunk Observability Cloud using the steps described in the
documentation.
To ensure Azure OpenAI metrics are included, the connection was
configured to pull metrics from Cognitive Services:
Azure OpenAI Metrics
A number of metrics are captured for Azure OpenAI:
ProcessedPromptTokens
GeneratedTokens
AzureOpenAIRequests
AzureOpenAITimeToResponse
AzureOpenAIAvailabilityRate
AzureOpenAITokenPerSecond
AzureOpenAIContextTokensCacheMatchRate
Navigate to Metrics -> Metric finder, and then search for the
ProcessedPromptTokens metric and click View in chart:
Note: you can also use this link
to view this metric with the Metric finder.
Azure OpenAI Navigator
Splunk Observability Cloud collects OpenTelemetry generative AI client and model server metrics
to track the token usage and Open AI large-language model (LLM) services running in Azure.
You can view these metrics using the Azure OpenAI navigator. Navigate to Infrastructure ->
Overview -> AI Frameworks and then click on Azure OpenAI:
Azure OpenAI Dashboard
Splunk Observability Cloud provides a built-in dashboard for Azure OpenAI that gives
you immediate visibility into:
The active Azure OpenAI models
Token usage
Invocation latency
Invocations by model
Time to first byte
Total response time
Model availability
Number of tokens per request
Number of tokens processed by model
Number of tokens generated by model
Navigate to Dashboards and then search for Azure OpenAI to view
the dashboard:
Deploy the OpenTelemetry Collector
10 minutes
We’ll be using OpenTelemetry throughout the workshop to capture metrics, traces, and
logs from an Agentic AI application running in Kubernetes. In this section, we’ll
install an OpenTelemetry collector in our Kubernetes cluster using Helm. This will be
used to capture metrics, traces, and logs from our environment and send them to
Splunk.
To save your changes in vi, press the esc key to enter command mode, then type :wq! followed by pressing the enter/return key.
This custom configuration ensures that any histogram metrics received by the exporter
will be sent to Splunk Observability backend in OTLP format without conversion
to SignalFx format. This setting is critical to ensure that histogram metrics used
by AI Agent Monitoring such as gen_ai.evaluation.score are processed as expected.
Now we can use the following command to install the collector:
NAME: splunk-otel-collector
LAST DEPLOYED: Fri Dec 20 01:01:43 2024NAMESPACE: default
STATUS: deployed
REVISION: 1TEST SUITE: None
NOTES:
Splunk OpenTelemetry Collector is installed and configured to send data to Splunk Observability realm us1.
Confirm the Collector is Running
We can confirm whether the collector is running with the following command:
kubectl get pods
NAME READY STATUS RESTARTS AGE
splunk-otel-collector-agent-dkn88 1/1 Running 0 53s
splunk-otel-collector-agent-ksmh4 1/1 Running 0 53s
splunk-otel-collector-agent-lc2lf 1/1 Running 0 53s
splunk-otel-collector-k8s-cluster-receiver-dbf64995b-xgm9b 1/1 Running 0 53s
Confirm your K8s Cluster is in O11y Cloud
Using the New Kubernetes Experience
If you’re configured to use the new Kubernetes experience in O11y Cloud, follow the steps in
this section. Otherwise, refer to the Using the Traditional Kubernetes Experience section
instead.
In Splunk Observability Cloud, navigate to Infrastructure -> Kubernetes overview,
then add your cluster name (which is <your instance name>-cluster):
Tip: use the echo $INSTANCE command if you’ve forgotten your instance name
After clicking Apply Filters you should see an overview for your cluster
similar to the following:
Using the Traditional Kubernetes Experience
In Splunk Observability Cloud, navigate to Infrastructure -> Kubernetes -> Kubernetes Clusters,
and then search for your cluster name (which is <your instance name>-cluster):
Tip: use the echo $INSTANCE command if you’ve forgotten your instance name
Agentic AI Application Architecture
15 minutes
Application Overview
This workshop utilizes an Agentic AI application for booking travel.
In this section, we’ll walk through the application architecture and
highlight the key LangChain and LangGraph concepts it uses.
LangChain vs. LangGraph
LangChain provides the core building blocks for working with large language models,
such as prompts, tools, and model integrations. LangGraph builds on those concepts
to orchestrate complex, stateful workflows between those components. In simple terms,
LangChain helps you define what an LLM-powered step does, while LangGraph helps
control how those steps flow together in an agentic application.
Although the primary goal of the workshop is to instrument the application with OpenTelemetry,
having a basic understanding of how the application is structured will make the observability
work much clearer. Seeing how the agents, tools, and workflows are built will help you
recognize what the telemetry represents once we begin tracing and analyzing the system.
If youâd like to explore the implementation while we go through the architecture,
the application source code is available on your EC2 instance at:
~/workshop/agentic-ai/base-app/main.py.
The application is a Flask API that accepts a travel planning request and runs it through
a LangGraph workflow made up of several LangChain-powered LLM nodes. Each node plays a specific
role, updates shared state, and hands off to the next step.
In this part of the workshop, we will review:
the request lifecycle
the shared state model
how LangGraph nodes work
the LangChain abstractions used in the code
where observability will matter later
Navigate to the subsections to learn more about the application architecture and implementation.
Subsections of 4. Agentic AI Application Architecture
4.1 Request Lifecycle
What the application does
At a high level, the application accepts a request and turns it into a multi-step workflow:
coordinator
flight specialist
hotel specialist
activity specialist
synthesizer
The main flow looks like this:
@app.route("/travel/plan",methods=["POST"])defplan():data=request.get_json()origin=data.get("origin","Seattle")destination=data.get("destination","Paris")user_request=data.get("user_request",f"Planning a week-long trip from {origin} to {destination}. ""Looking for boutique hotel, flights and unique experiences.",)travellers=int(data.get("travellers",2))result=plan_travel_internal(origin=origin,destination=destination,user_request=user_request,travellers=travellers)returnjsonify(result),200
A helpful way to explain this is:
Flask receives the request
plan_travel_internal() builds the workflow state
LangGraph executes the nodes
each node updates the state
the final itinerary is returned as JSON
Knowledge Check
Where does the LangGraph workflow actually start executing in this API flow?
Click here to see the answer
It starts inside plan_travel_internal(). The Flask route only receives
the request and extracts parameters. plan_travel_internal() initializes
the workflow state and invokes the LangGraph graph, which then runs the nodes
(coordinator, specialists, synthesizer) that update the state until
the final itinerary is produced.
4.2 Shared State
Shared State in LangGraph
The most important LangGraph concept in this app is the shared state object:
This state moves through the graph from node to node.
Each node:
reads values from state
does some work
writes new values back to state
sets current_agent to control what happens next
This is a key LangGraph mental model: stateful workflow orchestration.
Knowledge Check
How would you explain the syntax used for the messages field?
messages:Annotated[List[AnyMessage],add_messages]
Click here to see the answer
messages: Annotated[List[AnyMessage], add_messages] does two things.
List[AnyMessage] defines the type of the field: itâs a list of LangChain message objects (system, human, or AI messages).
Annotated[..., add_messages] adds LangGraph behavior that tells the graph how updates to this field should be handled.
Specifically, add_messages means that when a node writes new messages, LangGraph will append them to the existing list instead of overwriting it.
So the conversation history grows as each node adds messages.
4.3 Orchestration
Where execution begins
The main orchestration happens in plan_travel_internal():
This function implements the following application lifecycle:
build initial state
build the graph
compile it
stream execution step by step
Knowledge Check
Question 1
Why does the code use compiled_app.stream(initial_state, config) instead of
simply calling the graph once and getting the final result?
Click here to see the answer
Because streaming executes the workflow step by step as each node runs. This lets
the application observe intermediate states, track which node is executing,
and monitor the workflow in real time instead of waiting only for the final output.
Question 2
Why do we create an initial_state before running the graph?
Click here to see the answer
Because LangGraph workflows operate on a shared state object. The initial_state
provides the starting data that nodes will read from, update, and pass along as
the workflow progresses.
4.4 Defining the Graph
How the graph is defined
The graph is built explicitly in build_workflow():
Even though this uses conditional edges, the workflow is effectively linear:
start
coordinator
flight specialist
hotel specialist
activity specialist
synthesizer
end
Knowledge Check
If the workflow is effectively linear, why does the graph still use
add_conditional_edges and the should_continue() router?
Click here to see the answer
Because it makes the workflow flexible and extensible. Even though the current flow
is linear, the routing function allows the graph to dynamically decide the next node
based on the state. This makes it easy to add branching, retries, or different
execution paths later without redesigning the graph.
4.5 Defining Nodes
How a node works
A LangGraph node in this app is just a Python function that accepts state and returns updated state.
For example, the flight specialist:
defflight_specialist_node(state:PlannerState)->PlannerState:llm=_create_llm("flight_specialist",temperature=0.4,session_id=state["session_id"])step=(f"Find an appealing flight from {state['origin']} to {state['destination']} "f"departing {state['departure']} for {state['travellers']} travellers.")messages=[SystemMessage(content="You are a flight booking specialist. Provide concise options."),HumanMessage(content=step),]result=llm.invoke(messages)state["flight_summary"]=result.contentstate["messages"].append(result)state["current_agent"]="hotel_specialist"returnstate
This exhibits the common node pattern:
create or access an LLM
build a prompt from structured state
invoke the model
save the result into state
set the next node
The hotel and activity nodes follow the same structure, which makes the workflow easy to explain.
Knowledge Check
When creating the LLM for the flight_specialist node, we specified
a temperature of 0.4. What does this mean?
Click here to see the answer
Temperature controls how random or creative the modelâs responses are.
Lower temperature (e.g., 0.0â0.3): more deterministic and consistent responses
Medium (around 0.4â0.7): balanced between accuracy and creativity
Higher (0.8+): more diverse and creative, but less predictable
So setting temperature=0.4 means the flight_specialist agent will produce
responses that are mostly consistent and reliable, with a small amount of
variation, which useful for tasks that need correctness but not completely rigid answers.
4.6 Message Abstractions
LangChain Message Abstractions
The application uses LangChain message abstractions rather than one long prompt string.
messages=[SystemMessage(content="You are a flight booking specialist. Provide concise options."),HumanMessage(content=step),]result=llm.invoke(messages)
Knowledge Check
How would you define system, human, and AI messages?
Click here to see the answer
In LangChain and LangGraph, messages are typically categorized by who is speaking and what role they play in guiding the conversation:
System message: Sets the rules and context for the AIâs behavior. It defines instructions, constraints, tone, and goals that guide how the model should respond throughout the interaction.
Human message: Input from the user. It contains questions, requests, or information that the AI should respond to.
AI message: The modelâs response. It represents the assistantâs generated output based on the system instructions and human input.
4.7 LLM Creation
LLM Creation
The LLM itself is created here:
def_create_llm(agent_name:str,*,temperature:float,session_id:str)->AzureChatOpenAI:azure_deployment_name=os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")azure_openai_api_version=os.getenv("AZURE_OPENAI_API_VERSION")returnAzureChatOpenAI(azure_deployment=azure_deployment_name,openai_api_version=azure_openai_api_version,temperature=temperature,model_name=azure_deployment_name,# AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT environment variables will be used to connect to the LLM)
This approach separates model configuration from workflow logic.
Different nodes can use different temperatures depending on how deterministic or
creative they should be.
Knowledge Check
How would you create an LLM for OpenAI (rather than Azure OpenAI?)
Click here to see the answer
Creating an LLM for OpenAI has a few differences. The function would return a ChatOpenAI
object instead of AzureChatOpenAI.
With OpenAI directly, you donât use the Azure-specific parameters (azure_deployment,
openai_api_version, Azure endpoint). Instead, you specify the model name and rely
on the standard OPENAI_API_KEY environment variable.
Here’s an example:
def_create_llm(agent_name:str,*,temperature:float,session_id:str)->ChatOpenAI:model_name=os.getenv("OPENAI_MODEL_NAME","gpt-4o-mini")returnChatOpenAI(model=model_name,temperature=temperature,# Uses OPENAI_API_KEY automatically from environment)
4.8 Decomposition Pattern
The synthesizer shows the decomposition pattern
The final node combines the specialist outputs into one answer.
defplan_synthesizer_node(state:PlannerState)->PlannerState:llm=_create_llm("plan_synthesizer",temperature=0.3,session_id=state["session_id"])content=json.dumps({"flight":state["flight_summary"],"hotel":state["hotel_summary"],"activities":state["activities_summary"],},indent=2,)response=llm.invoke([SystemMessage(content="You are the travel plan synthesiser. Combine the specialist insights into a concise, structured itinerary."),HumanMessage(content=(f"Traveller request: {state['user_request']}\n\n"f"Origin: {state['origin']} | Destination: {state['destination']}\n"f"Dates: {state['departure']} to {state['return_date']}\n\n"f"Specialist summaries:\n{content}")),])state["final_itinerary"]=response.contentstate["messages"].append(response)state["current_agent"]="completed"returnstate
This is a classic pattern for agentic apps:
decompose work into specialists
collect intermediate outputs
synthesize into a final response
That is one of the main architectural ideas you should take away from this overview.
Knowledge Check
Why does the app use a separate plan_synthesizer node instead of letting
one agent generate the entire travel plan?
Click here to see the answer
Because the system breaks the problem into specialized tasks first (flights, hotels, activities).
Each specialist produces a focused summary, and the plan_synthesizer node then combines those
outputs into one coherent itinerary.
This pattern improves modularity, reliability, and observability, since each agent
handles a smaller problem and the final node integrates the results.
Deploy the Agentic AI Application
15 minutes
Deploy the Agentic AI Application (Linux)
We’ll start by running the application directly on our Linux EC2 instance.
Set Environment Variables
The document provided by the workshop instructor contains export commands to set the following
environment variables:
AZURE_OPENAI_DEPLOYMENT_NAME
AZURE_OPENAI_API_VERSION
AZURE_OPENAI_ENDPOINT
AZURE_OPENAI_API_KEY
These environment variables tell the application how to connect to an
OpenAI model hosted in Azure.
Copy and paste these export commands from the document and run them in your ssh terminal.
Create Virtual Environment
Next, we’ll create a Python virtual environment and install the packages needed to
run the application:
Then we can run the application with the following command:
python3 main.py
Test the Application
Open a second terminal session connected to your EC2 instance, and run the following
command to test the application. It should return the suggested travel plans in json
format:
curl http://localhost:8080/travel/plan \
-H "Content-Type: application/json"\
-d '{
"origin": "Seattle",
"destination": "Tokyo",
"user_request": "We are planning a week-long trip to Seattle from Tokyo. Looking for boutique hotel, business-class flights and unique experiences.",
"travelers": 2
}'
{"activities_summary":"Sure! Here are signature activities for a week in Tokyo:\n\n1. Day 1: Explore Asakusa and Senso-ji Temple, then stroll Nakamise Shopping Street.\n2. Day 2: Visit Tsukiji Outer Market for fresh sushi breakfast, then tour Ginza for upscale shopping.\n3. Day 3: Spend the day in Shibuya\u2014cross the famous scramble, visit Hachiko statue, and shop in trendy boutiques.\n4. Day 4: Explore Harajuku\u2019s Takeshita Street and Meiji Shrine, followed by Omotesando\u2019s stylish cafes.\n5. Day 5: Discover Akihabara\u2019s electronics and anime culture, with a visit to a themed caf\u00e9.\n6. Day 6: Take a day trip to Odaiba for teamLab Borderless digital art museum and waterfront views.\n7. Day 7: Relax in Ueno Park, visit museums, and shop at Ameya-Yokocho market.\n\nWould you like hotel or dining recommendations as well?","agent_steps":[{"agent":"coordinator","status":"completed"},{"agent":"flight_specialist","status":"completed"},{"agent":"hotel_specialist","status":"completed"}
Stop the Application
Once you’ve confirmed that the application is working successfully, return to your
first terminal and stop the application.
Deploy the Agentic AI Application (Kubernetes)
Now that the application is working successfully, let’s deploy it to Kubernetes.
Build the Docker Image
In this section, we’ll use the Dockerfile located at ~/workshop/agentic-ai/base-app/Dockerfile
to build a Docker image for the application. Run the following commands to build the image:
Tip: if the image is taking too long to build, consider using the pre-built
image instead. To do so, update the image name in
the ~/workshop/agentic-ai/base-app/k8s.yaml file to ghcr.io/splunk/agentic-ai-app:base-app
instead of localhost:9999/agentic-ai-app:base-app.
Create Application Namespace
Let’s create a new namespace to host our application:
kubectl create ns travel-agent
Create Secret with Azure Credentials
We’ll use a Kubernetes secret to store the Azure OpenAI endpoint and key:
Caution: ensure you run this command in the terminal where you set
the AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY environment
variables earlier.
Note: if you get an error that says Missing variables, youâll need to
define your environment variables again using the export commands
provided in the document from your instructor.
Deploy the Application Using the Kubernetes Manifest File
A pre-built Kubernetes manifest can be found in the file named
~/workshop/agentic-ai/base-app/k8s.yaml.
We can deploy the application using the manifest file as follows:
Use the following command to ensure the application pod has a
status of Running:
kubectl get pods -n travel-agent
NAME READY STATUS RESTARTS AGE
travel-planner-langchain-68977dc5c4-4w7p9 1/1 Running 0 41s
Test the Application in Kubernetes
Run the following command to test the application:
curl http://travel-planner.localhost/travel/plan \
-H "Content-Type: application/json"\
-d '{
"origin": "Seattle",
"destination": "Tokyo",
"user_request": "We are planning a week-long trip to Seattle from Tokyo. Looking for boutique hotel, business-class flights and unique experiences.",
"travelers": 2
}'
{"activities_summary":"Sure! Here are signature activities for a week in Tokyo:\n\n1. Day 1: Explore Asakusa and Senso-ji Temple, then stroll Nakamise Shopping Street.\n2. Day 2: Visit Tsukiji Outer Market for fresh sushi breakfast, then tour Ginza for upscale shopping.\n3. Day 3: Spend the day in Shibuya\u2014cross the famous scramble, visit Hachiko statue, and shop in trendy boutiques.\n4. Day 4: Explore Harajuku\u2019s Takeshita Street and Meiji Shrine, followed by Omotesando\u2019s stylish cafes.\n5. Day 5: Discover Akihabara\u2019s electronics and anime culture, with a visit to a themed caf\u00e9.\n6. Day 6: Take a day trip to Odaiba for teamLab Borderless digital art museum and waterfront views.\n7. Day 7: Relax in Ueno Park, visit museums, and shop at Ameya-Yokocho market.\n\nWould you like hotel or dining recommendations as well?","agent_steps":[{"agent":"coordinator","status":"completed"},{"agent":"flight_specialist","status":"completed"},{"agent":"hotel_specialist","status":"completed"}
Troubleshooting
If you need to troubleshoot, use the following command to view the application logs:
Note: this section of the workshop requires changes to multiple files.
If you’re not sure where to make the changes, or your application is no
longer working, please refer to the expected solution for this section
which is in the ~/workshop/agentic-ai/app-with-instrumentation folder.
There are a few steps required to instrument our Agentic AI application
with OpenTelemetry and deploy it to Kubernetes:
Add the instrumentation packages to the requirements.txt file
Update the Dockerfile that invokes the application using opentelemetry-instrument
Build a new Docker image with the instrumentation packages
Update the Kubernetes manifest with environment variables
Deploy the Kubernetes manifest
Add Instrumentation Packages
Next, we need to install several instrumentation packages. We can achieve this by
opening the ~/workshop/agentic-ai/base-app/requirements.txt for editing and adding
the following packages to the bottom of the file:
splunk-opentelemetry: this is the Splunk distribution of OpenTelemetry Python, which instruments a Python application to capture and report distributed traces to Splunk APM.
splunk-otel-instrumentation-langchain: this package provides OpenTelemetry instrumentation for LangChain LLM/chat workflows.
splunk-otel-genai-emitters-splunk: this package provides emitters for Splunk schema for Evaluation Results logs to optimize storage and filtering in Splunk Platform.
splunk-otel-util-genai: this package includes utility functions to provide APIs and data types to ease instrumentation of Generative AI workloads using OpenTelemetry semantic conventions.
opentelemetry-instrumentation-flask: this library builds on the OpenTelemetry WSGI middleware to track web requests in Flask applications.
Hint: run the following command to compare your changes with the expected solution:
Then, we need to enable OpenTelemetry instrumentation. This is done by updating the Dockerfile to
ensure the application is started with opentelemetry-instrument. Open the ~/workshop/agentic-ai/base-app/Dockerfile
file for editing and update the last line as follows:
# Run the server with instrumentationCMD["opentelemetry-instrument","python","main.py"]
Hint: run the following command to compare your changes with the expected solution:
Tip: if the image is taking too long to build, consider using the pre-built
image instead. To do so, update the image name in
the ~/workshop/agentic-ai/base-app/k8s.yaml file to ghcr.io/splunk/agentic-ai-app:app-with-instrumentation
instead of localhost:9999/agentic-ai-app:app-with-instrumentation.
Define the Config Map
When we deploy our application to Kubernetes, we want telemetry (metrics, traces, and logs)
to be sent to Splunk Observability Cloud with a clear and unique environment identifier.
This makes it easier to filter, compare, and troubleshoot data across different deployments.
To do this, weâll set the OpenTelemetry resource attribute named deployment.environment. Rather
than hard-coding the value, weâll derive it from the INSTANCE environment variable that
already exists on our EC2 instance. This ensures each deployment is automatically tagged
with the correct environment name.
Weâll store this configuration in a Kubernetes ConfigMap, which can later be injected into
our application pods as an environment variable.
Defines the OTEL_RESOURCE_ATTRIBUTES environment variable expected by OpenTelemetry.
Sets deployment.environment to a value like agentic-ai-shw-1c43, depending on the value of $INSTANCE.
Creates the ConfigMap in the travel-agent namespace.
Weâll reference this ConfigMap in the next step when we configure our Kubernetes deployment.
Update the Kubernetes Manifest
OpenTelemetry instrumentation, and AI Agent Monitoring in particular, require a number of environment
variables to be set that define how instrumentation data is collected, processed, and
exported.
Open the ~/workshop/agentic-ai/base-app/k8s.yaml file for editing. Update the image
tag to ensure we’re using the image with the instrumentation:
In the same file, add the following environment variables between the comments that say
Begin: Add Environment Variables and End: Add Environment Variables:
Hint: Type :set paste before pasting the contents, to prevent vi from auto-indenting the pasted code.
Note: some of the text may not be visible without scrolling.
Use the Copy text to clipboard button on the top right-hand corner to
ensure you’ve copied all of the text.
Note: indentation is critical with yaml; ensure the new environment variables
align with the existing environment variables.
The following environment variables are specific to Agentic AI monitoring
and can be described as follows:
OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE: this determines if the OTLP metric exporter reports cumulative totals, deltas, or low-memory-friendly temporality for emitted metrics. Setting this to DELTA is recommended for Agentic AI monitoring.
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT: this is used to enable/disable message capture from Agentic AI applications. We’ve set it to true for this workshop.
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT_MODE: this defines how messages should be captured. We’ve set it to SPAN for this workshop, which ensures messages are captured using the span event store.
OTEL_INSTRUMENTATION_GENAI_EMITTERS: we’ve set this to span_metric,splunk for the workshop, which ensures that both span and metric data are captured, as well as Splunk-specific features.
Hint: run the following command to compare your changes with the expected solution:
Splunk Observability Cloud includes an integration that allows you to connect
a Large Language Model (LLM). Splunk uses this connection to evaluate the
semantic quality of LLM responses generated by your applications.
This integration has already been configured in the workshop organization.
To view the configuration, navigate to Data Management â Deployed Integrations,
search for LLM Providers, and select it. You should see the following provider:
Click on the Azure OpenAI O11y Specialists provider to view the details:
In this organization, the sampling rate is set to 20%. This means that,
on average, Splunk evaluates the semantic quality of 20% of the LLM responses
generated by the application.
A rate limit of 50 evaluations per minute is also configured. Both the sampling
rate and the rate limit can be adjusted depending on customer needs. Higher
sampling rates provide more evaluation data, but they also increase
token usage and associated costs.
Review AI Agent Monitoring Configuration
Splunk Observability Cloud also includes a page that allows you to configure
which data source is used for storing details related to AI Agent Monitoring.
The choices include:
Data source â Splunk Observability Cloud
Data source â Splunk logs
You can see these settings by navigating to Settings -> AI Agent Monitoring:
Splunk recommends utilizing Splunk Observability Cloud for storing
AI Agent Monitoring related details. This is the setting we’ve used for this workshop.
Review AI Monitoring Permissions
Due to the potentially sensitive nature of LLM conversation data, a new role called ai_monitoring
has been added to Splunk Observability Cloud to control who can access and view this information:
View Trace Data in Splunk Observability Cloud
In Splunk Observability Cloud, navigate to APM and then select Service Map.
Ensure your environment name is selected (e.g. agentic-ai-$INSTANCE).
Tip: use the echo $INSTANCE command if you’ve forgotten your instance name
You should see a service map that looks like the following:
Click on Traces on the right-hand side menu. Then select one of the slower running
traces. It should look like the following example:
Notice that we don’t see our agent names in the Agent flow section (i.e. coordinator,
flight-specialist, etc.).
Scrolling down, let’s click on one of the AI interactions
in the trace. Here, we can see that the prompt and response has been captured.
We can also see the results of the semantic quality evaluations for this trace:
Next, navigate to APM and then select AI agents. Ensure your environment name
is selected (e.g. agentic-ai-$INSTANCE). You’ll notice that the page is empty!
We’ll address these instrumentation issues in the next section.
Add Tool Calls
15 minutes
Note: this section of the workshop requires changes to multiple files.
If you’re not sure where to make the changes, or your application is no
longer working, please refer to the model solution for this section
which is in the ~/workshop/agentic-ai/app-with-agents-and-tools folder.
In the previous section, we discovered that our agents aren’t appearing on the new
Agents page, nor in the Agent flow at the top of the trace.
The reason is that our application isn’t currently using agents, but is instead invoking
the LLM directly.
In other words, right now, our app is like a scripted play. Every line and every action is written
in the code. When we call the LLM, we are just asking it to read a specific line.
Because the LLM isn’t making choices, the Observability for AI instrumentation doesn’t
recognize it as an autonomous agent.
In this next section, we are going to give the LLM tools and the authority
to decide how to use them. By moving to an agentic model, the LLM will start
generating Tool Calls. Our OpenTelemetry instrumentation will capture these
interactions, allowing us to see the LLM’s thought process and
tool usage, and each of our agents will be represented in Splunk Observability Cloud.
Direct Invocation vs. Agentic Traces
Before making these changes, let’s dive deeper into how traces are captured
when the LLM is invoked directly vs. via an agent.
Direct Invocation Traces:
When you call llm.invoke(), the instrumentation sees a standard “Chat” or “Completion” span.
It records the prompt and the response. Because there is no “loop” or “tool-calling” logic
managed by the agent framework, Splunk Observability Cloud doesn’t see the metadata required
to categorize the span as an “Agent.”
Agentic Traces:
When you use an agent (e.g., create_react_agent),
the framework wraps the execution in specific “Agent” and “Tool” spans. These
spans contain metadata that tells OpenTelemetry: “This isn’t
just a chat; this is a reasoning loop with specific tools.” This is what
populates the Agents Page and the Agent Flow diagrams in the trace visualization.
Make a Backup
Before making changes to the Python code, make a backup of the main.py file
using the following command:
In the same main.py file, add the tool definitions between the lines that
say Begin: Tool Definitions and End: Tool Definitions:
# Begin: Tool Definitions@tooldefmock_search_flights(origin:str,destination:str,departure:str)->str:"""Return mock flight options for a given origin/destination pair."""# create a local random.Random instanceseed=hash((origin,destination,departure))%(2**32)rng=random.Random(seed)airline=rng.choice(["SkyLine","AeroJet","CloudNine"])fare=rng.randint(700,1250)return(f"Top choice: {airline} non-stop service {origin}->{destination}, "f"depart {departure} 09:15, arrive {departure} 17:05. "f"Premium economy fare ${fare} return.")@tooldefmock_search_hotels(destination:str,check_in:str,check_out:str)->str:"""Return mock hotel recommendation for the stay."""seed=hash((destination,check_in,check_out))%(2**32)rng=random.Random(seed)name=rng.choice(["Grand Meridian","Hotel Lumière","The Atlas"])rate=rng.randint(240,410)return(f"{name} near the historic centre. Boutique suites, rooftop bar, "f"average nightly rate ${rate} including breakfast.")@tooldefmock_search_activities(destination:str)->str:"""Return a short list of signature activities for the destination."""data=DESTINATIONS.get(destination.lower(),DESTINATIONS["paris"])bullets="\n".join(f"- {item}"foritemindata["highlights"])returnf"Signature experiences in {destination.title()}:\n{bullets}"# End: Tool Definitions
Configure the Application for AI Agent Monitoring
Currently, our application creates an LLM and invokes it as follows:
“Agents combine language models with tools to create systems that
can reason about tasks, decide which tools to use, and iteratively
work towards solutions.”
In practice, this means the model is no longer limited to generating text. Instead,
it can choose from a set of available tools (such as APIs, databases, or code execution)
to help complete a task.
This style of agent is often called a LangChain ReAct agent.
ReAct stands for Reasoning + Acting. The agent works through a loop where it:
briefly reasons about the task,
selects and calls a relevant tool,
observes the result, and
uses that new information to decide the next step.
This process repeats until the agent has gathered enough information to produce a final answer.
Replace the definitions for the coordinator_node, flight_specialist_node, hotel_specialist_node,
activity_specialist_node, and plan_synthesizer_node functions with the following:
Tip: to delete a large number of lines in bulk using the vi editor, press Shift + v to ensure Visual Line mode, then use the down arrow to select all the lines you want to delete, then press d
to delete the selected lines.
defcoordinator_node(state:PlannerState)->PlannerState:llm=_create_llm("coordinator",temperature=0.2,session_id=state["session_id"])agent=_create_react_agent(llm,tools=[]).with_config({"run_name":"coordinator","tags":["agent","agent:coordinator"],"metadata":{"agent_name":"coordinator","session_id":state["session_id"],},})system_message=SystemMessage(content=("You are the lead travel coordinator. Extract the key details from the ""traveller's request and describe the plan for the specialist agents."))result=agent.invoke({"messages":[system_message]+list(state["messages"])})final_message=result["messages"][-1]state["messages"].append(final_messageifisinstance(final_message,BaseMessage)elseAIMessage(content=str(final_message)))state["current_agent"]="flight_specialist"returnstatedefflight_specialist_node(state:PlannerState)->PlannerState:llm=_create_llm("flight_specialist",temperature=0.4,session_id=state["session_id"])agent=_create_react_agent(llm,tools=[mock_search_flights]).with_config({"run_name":"flight_specialist","tags":["agent","agent:flight_specialist"],"metadata":{"agent_name":"flight_specialist","session_id":state["session_id"],},})step=(f"Find an appealing flight from {state['origin']} to {state['destination']} "f"departing {state['departure']} for {state['travellers']} travellers.")# IMPORTANT: pass a proper list of messages (not stringified)messages=[SystemMessage(content="You are a flight booking specialist. Provide concise options."),HumanMessage(content=step),]result=agent.invoke({"messages":messages})final_message=result["messages"][-1]state["flight_summary"]=final_message.contentifisinstance(final_message,BaseMessage)elsestr(final_message)state["messages"].append(final_messageifisinstance(final_message,BaseMessage)elseAIMessage(content=str(final_message)))state["current_agent"]="hotel_specialist"returnstatedefhotel_specialist_node(state:PlannerState)->PlannerState:llm=_create_llm("hotel_specialist",temperature=0.5,session_id=state["session_id"])agent=_create_react_agent(llm,tools=[mock_search_hotels]).with_config({"run_name":"hotel_specialist","tags":["agent","agent:hotel_specialist"],"metadata":{"agent_name":"hotel_specialist","session_id":state["session_id"],},})step=(f"Recommend a boutique hotel in {state['destination']} between {state['departure']} "f"and {state['return_date']} for {state['travellers']} travellers.")# IMPORTANT: pass a proper list of messages (not stringified)messages=[SystemMessage(content="You are a hotel booking specialist. Provide concise options."),HumanMessage(content=step),]result=agent.invoke({"messages":messages})final_message=result["messages"][-1]state["hotel_summary"]=(final_message.contentifisinstance(final_message,BaseMessage)elsestr(final_message))state["messages"].append(final_messageifisinstance(final_message,BaseMessage)elseAIMessage(content=str(final_message)))state["current_agent"]="activity_specialist"returnstatedefactivity_specialist_node(state:PlannerState)->PlannerState:llm=_create_llm("activity_specialist",temperature=0.6,session_id=state["session_id"])agent=_create_react_agent(llm,tools=[mock_search_activities]).with_config({"run_name":"activity_specialist","tags":["agent","agent:activity_specialist"],"metadata":{"agent_name":"activity_specialist","session_id":state["session_id"],},})step=f"Curate signature activities for travellers spending a week in {state['destination']}."# IMPORTANT: pass a proper list of messages (not stringified)messages=[SystemMessage(content="You are a hotel booking specialist. Provide concise options."),HumanMessage(content=step),]result=agent.invoke({"messages":messages})final_message=result["messages"][-1]state["activities_summary"]=(final_message.contentifisinstance(final_message,BaseMessage)elsestr(final_message))state["messages"].append(final_messageifisinstance(final_message,BaseMessage)elseAIMessage(content=str(final_message)))state["current_agent"]="plan_synthesizer"returnstatedefplan_synthesizer_node(state:PlannerState)->PlannerState:llm=_create_llm("plan_synthesizer",temperature=0.3,session_id=state["session_id"])agent=_create_react_agent(llm,tools=[]).with_config({"run_name":"plan_synthesizer","tags":["agent","agent:plan_synthesizer"],"metadata":{"agent_name":"plan_synthesizer","session_id":state["session_id"],},})system_content=("You are the travel plan synthesiser. Combine the specialist insights into a ""concise, structured itinerary covering flights, accommodation and activities.")content=json.dumps({"flight":state["flight_summary"],"hotel":state["hotel_summary"],"activities":state["activities_summary"],},indent=2,)out=agent.invoke({"messages":[SystemMessage(content=system_content),HumanMessage(content=(f"Traveller request: {state['user_request']}\n\n"f"Origin: {state['origin']} | Destination: {state['destination']}\n"f"Dates: {state['departure']} to {state['return_date']}\n\n"f"Specialist summaries:\n{content}")),]})# 1) Extract the assistantâs final textfinal_msg=next(mforminreversed(out["messages"])ifisinstance(m,AIMessage))state["final_itinerary"]=final_msg.content# 2) Append the new messages to your ongoing conversationstate["messages"].extend(out["messages"])# or append just final_msgstate["current_agent"]="completed"returnstate
Notice how we passed a tool when creating the flight, hotel, and activity specialist agents.
When the agent is invoked, the LLM will decide whether the tool should be invoked to fulfill
the request.
Hint: run the following command to compare your changes with the model solution:
Tip: if the image is taking too long to build, consider using the pre-built
image instead. To do so, update the image name in
the ~/workshop/agentic-ai/base-app/k8s.yaml file to ghcr.io/splunk/agentic-ai-app:app-with-agents-and-tools
instead of localhost:9999/agentic-ai-app:app-with-agents-and-tools.
Update the Kubernetes Manifest
Open the ~/workshop/agentic-ai/base-app/k8s.yaml file for editing and
update the image to ensure we’re using the one with the
agents and tools:
Ensure the new application pod has started successfully and the old pod is no longer present:
kubectl get pods -n travel-agent
NAME READY STATUS RESTARTS AGE
travel-planner-langchain-68977dc5c4-4w7p9 1/1 Running 0 41s
Then, run the following command to test the application:
curl http://travel-planner.localhost/travel/plan \
-H "Content-Type: application/json"\
-d '{
"origin": "Seattle",
"destination": "Tokyo",
"user_request": "We are planning a week-long trip to Seattle from Tokyo. Looking for boutique hotel, business-class flights and unique experiences.",
"travelers": 2
}'
View Data in Splunk Observability Cloud
Let’s return to Splunk Observability Cloud to see how the trace looks now.
Navigate to APM and then select AI agents. Ensure your environment name
is selected (e.g. agentic-ai-$INSTANCE). You’ll notice that the page
populated now!
Navigate to APM -> AI trace data. This is a new page that lets us search
for traces that include AI-related content:
Ensure your environment name is selected (e.g. agentic-ai-$INSTANCE). Select one of the newer traces. We see all of our agents represented in the Agent flow now!
We can also see the tool calls:
Detect Quality Issue
15 minutes
Note: this section of the workshop requires changes to multiple files.
If you’re not sure where to make the changes, or your application is no
longer working, please refer to the model solution for this section
which is in the ~/workshop/agentic-ai/app-with-quality-issue folder.
In the previous sections, we instrumented our application with OpenTelemetry, and configured
it to evaluate the semantic quality of agent responses.
In this section, let’s add some quality issues to our application, so we can see
how Splunk Observability Cloud is able to detect such issues.
About the Poisoned Chat Wrapper
In this section, we’ll use a custom class named PoisonedChatWrapper which wraps the existing
ChatModel to intercept and ‘poison’ the output. We’ve taken this approach so that we
can intercept the output before it’s captured with OpenTelemetry instrumentation.
If you’re curious to understand this is done, please review the poison_chat_wrapper.py file.
Poison the Hotel Specialist Output
Next, let’s modify the hotel specialist agent to use this wrapper and modify
the LLM output. First, modify the ~/workshop/agentic-ai/base-app/main.py file
to add the following import statement between the lines that say
Begin: Add Import Statements and End: Add Import Statements:
frompoison_chat_wrapperimportPoisonedChatWrapper
Then, replace the definition of the hotel_specialist_node function with the following:
Tip: to delete a large number of lines in bulk using the vi editor, press Shift + v to ensure Visual Line mode, then use the down arrow to select all the lines you want to delete, then press d
to delete the selected lines.
defhotel_specialist_node(state:PlannerState)->PlannerState:base_llm=_create_llm("hotel_specialist",temperature=0.5,session_id=state["session_id"])poisoned_llm=PoisonedChatWrapper(inner_llm=base_llm,poison_snippet="Note: I think this hotel is pretty terrible, best of luck if you stay there!")agent=_create_react_agent(poisoned_llm,tools=[mock_search_hotels]).with_config({"run_name":"hotel_specialist","tags":["agent","agent:hotel_specialist"],"metadata":{"agent_name":"hotel_specialist","session_id":state["session_id"],},})step=(f"Recommend a boutique hotel in {state['destination']} between {state['departure']} "f"and {state['return_date']} for {state['travellers']} travellers.")# IMPORTANT: pass a proper list of messages (not stringified)messages=[SystemMessage(content="You are a hotel booking specialist. Provide concise options."),HumanMessage(content=step),]result=agent.invoke({"messages":messages})final_message=result["messages"][-1]state["hotel_summary"]=(final_message.contentifisinstance(final_message,BaseMessage)elsestr(final_message))state["messages"].append(final_messageifisinstance(final_message,BaseMessage)elseAIMessage(content=str(final_message)))state["current_agent"]="activity_specialist"returnstate
Hint: run the following command to compare your changes with the model solution:
Tip: if the image is taking too long to build, consider using the pre-built
image instead. To do so, update the image name in
the ~/workshop/agentic-ai/base-app/k8s.yaml file to ghcr.io/splunk/agentic-ai-app:app-with-quality-issue
instead of localhost:9999/agentic-ai-app:app-with-quality-issue.
Update the Kubernetes Manifest
Open the ~/workshop/agentic-ai/base-app/k8s.yaml file for editing and
update the image to ensure we’re using the one with the
quality issue:
Ensure the new application pod has started successfully and the old pod is no longer present:
kubectl get pods -n travel-agent
NAME READY STATUS RESTARTS AGE
travel-planner-langchain-68977dc5c4-4w7p9 1/1 Running 0 41s
Then, run the following command to test the application:
curl http://travel-planner.localhost/travel/plan \
-H "Content-Type: application/json"\
-d '{
"origin": "Seattle",
"destination": "Tokyo",
"user_request": "We are planning a week-long trip to Seattle from Tokyo. Looking for boutique hotel, business-class flights and unique experiences.",
"travelers": 2
}'
View Data in Splunk Observability Cloud
Let’s return to Splunk Observability Cloud to see how the trace looks now.
Looking at the invoke_agent span for the hotel_specialist agent, we can see that the
agent has several quality issues, as it recommended a hotel and then called it
pretty terrible:
Note: not all agent invocations are evaluated, as the workshop org is set to
evaluate only 20% of the time. This is configurable at the org level. If you don’t see
an evaluation on the invoke_agent span for the hotel_specialist agent, trying sending
another request.
Add AI Defense Instrumentation
15 minutes
Note: this section of the workshop requires changes to multiple files.
If you’re not sure where to make the changes, or your application is no
longer working, please refer to the expected solution for this section
which is in the ~/workshop/agentic-ai/app-with-ai-defense folder.
Splunk Observability Cloud integrates with
Cisco AI Defense
to provide a consolidated view of security and privacy risks
detected at runtime for your AI agents, allowing you to monitor performance and risks in one place.
This is referred to as Splunk AI Security Monitoring, which helps you to:
Identify which agents, interactions, and services involve detected or blocked security and privacy risks, such as prompt injection and PII leakage
Track risk trends alongside latency, errors, and other performance metrics over time
Investigate risky interactions in trace context, down to specific prompts and responses
In this section, we’ll add the AI Defense integration to our Agentic AI application and
review the resulting data in Splunk Observability Cloud.
How It Works
Splunk AI Security Monitoring provides an instrumentation library,
opentelemetry-instrumentation-aidefense,
to automate security and privacy risk tracing for Python-based AI agents.
This library captures and attaches security telemetry to calls that your
AI agents make to LLMs (such as OpenAI) and orchestration frameworks
(such as LangChain) to ensure that every prompt and response can be
audited against security guardrails and recorded within a unified
OpenTelemetry trace. It does this by adding the
gen_ai.security.event_id attribute to LLM or workflow spans.
SDK vs. Gateway Mode
The opentelemetry-instrumentation-aidefense library can operate in either SDK mode or gateway mode:
With the SDK mode, the developer adds explicit security checks using inspect_prompt(). This option is best for developers that want full control how security checks are implemented and how issues are addressed.
With Gateway mode, LLM calls proxied through Cisco AI Defense Gateway so application code changes are not required. This mode is supported for popular commercial LLMs such as OpenAI, Anthropic, etc.
This workshop utilizes Gateway mode with Azure OpenAI.
If you navigate to Data Management -> Deployed integrations and search for AI Defense,
you’ll see that this integration has already been configured:
Note: the aiDefenseIntegration feature flag must be enabled to see this integration
Add Instrumentation Packages
Next, we need to install several instrumentation packages. We can achieve this by
opening the ~/workshop/agentic-ai/base-app/requirements.txt for editing and adding
the following packages:
# AI Defense instrumentation (Gateway Mode support in v0.2.0+)
splunk-otel-instrumentation-aidefense>=0.2.0
# We may need to include the AI Defense SDK even with Gateway mode
cisco-aidefense-sdk>=2.0.0
# HTTP client (httpx is required for Gateway Mode to work)
httpx>=0.24.0
Hint: run the following command to compare your changes with the expected solution:
Tip: if the image is taking too long to build, consider using the pre-built
image instead. To do so, update the image name in
the ~/workshop/agentic-ai/base-app/k8s.yaml file to ghcr.io/splunk/agentic-ai-app:app-with-ai-defense
instead of localhost:9999/agentic-ai-app:app-with-ai-defense.
Create a Secret for the AI Defense Gateway
The document provided by the workshop instructor contains a kubectl create secret
command to create a secret to store the AI Defense Gateway URL.
Copy and paste this kubectl create secret command from the document
and run it in your ssh terminal.
Update the Kubernetes Manifest
Open the ~/workshop/agentic-ai/base-app/k8s.yaml file for editing and
replace the definition of the AZURE_OPENAI_ENDPOINTenvironment variable
as follows, which ensures that any requests destined for Azure OpenAI are
instead sent through the AI Defense gateway:
Ensure the new application pod has started successfully and the old pod is no longer present:
kubectl get pods -n travel-agent
NAME READY STATUS RESTARTS AGE
travel-planner-langchain-68977dc5c4-4w7p9 1/1 Running 0 41s
Then, run the following command to test the application:
curl http://travel-planner.localhost/travel/plan \
-H "Content-Type: application/json"\
-d '{
"origin": "Seattle",
"destination": "Tokyo",
"user_request": "We are planning a week-long trip to Seattle from Tokyo. Looking for boutique hotel, business-class flights and unique experiences.",
"travelers": 2
}'
For now, just ensure that the application is still working. In the next section,
we’ll add a security risk and then show how it can be detected.
Detect Security Risks
15 minutes
Note: this section of the workshop requires changes to multiple files.
If you’re not sure where to make the changes, or your application is no
longer working, please refer to the model solution for this section
which is in the ~/workshop/agentic-ai/app-with-security-risk folder.
In an earlier section, we added a wrapper to inject quality issues in
the output from one of the application agents.
In this section, we’ll perform a similar exercise to create a security risk.
Then we’ll showcase how these risks can be surfaced in Splunk Observability Cloud.
Poison the Activity Specialist Output
Let’s modify the activity specialist agent to use this wrapper and modify
the LLM output.
Open the ~/workshop/agentic-ai/base-app/main.py file for editing.
Replace the definition of the activity_specialist_node function with the version included below.
This effectively simulates a scenario where the LLM has
included the user’s credit card number as part of the response, which is
a clear security risk and PCI violation.
Tip: to delete a large number of lines in bulk using the vi editor, press Shift + v to ensure Visual Line mode, then use the down arrow to select all the lines you want to delete, then press d
to delete the selected lines.
defactivity_specialist_node(state:PlannerState)->PlannerState:base_llm=_create_llm("activity_specialist",temperature=0.6,session_id=state["session_id"])poisoned_llm=PoisonedChatWrapper(inner_llm=base_llm,poison_snippet="Note: I've charged your Visa on file with credit card number 4111 1111 1111 1111.")agent=_create_react_agent(poisoned_llm,tools=[mock_search_activities]).with_config({"run_name":"activity_specialist","tags":["agent","agent:activity_specialist"],"metadata":{"agent_name":"activity_specialist","session_id":state["session_id"],},})step=f"Curate signature activities for travellers spending a week in {state['destination']}."# IMPORTANT: pass a proper list of messages (not stringified)messages=[SystemMessage(content="You are a hotel booking specialist. Provide concise options."),HumanMessage(content=step),]result=agent.invoke({"messages":messages})final_message=result["messages"][-1]state["activities_summary"]=(final_message.contentifisinstance(final_message,BaseMessage)elsestr(final_message))state["messages"].append(final_messageifisinstance(final_message,BaseMessage)elseAIMessage(content=str(final_message)))state["current_agent"]="plan_synthesizer"returnstate
Hint: run the following command to compare your changes with the model solution:
Tip: if the image is taking too long to build, consider using the pre-built
image instead. To do so, update the image name in
the ~/workshop/agentic-ai/base-app/k8s.yaml file to ghcr.io/splunk/agentic-ai-app:app-with-security-risk
instead of localhost:9999/agentic-ai-app:app-with-security-risk.
Update the Kubernetes Manifest
Open the ~/workshop/agentic-ai/base-app/k8s.yaml file for editing and
update the image to ensure we’re using the one with the security risk:
Ensure the new application pod has started successfully and the old pod is no longer present:
kubectl get pods -n travel-agent
NAME READY STATUS RESTARTS AGE
travel-planner-langchain-68977dc5c4-4w7p9 1/1 Running 0 41s
Then, run the following command to test the application:
curl http://travel-planner.localhost/travel/plan \
-H "Content-Type: application/json"\
-d '{
"origin": "Seattle",
"destination": "Tokyo",
"user_request": "We are planning a week-long trip to Seattle from Tokyo. Looking for boutique hotel, business-class flights and unique experiences.",
"travelers": 2
}'
View Events in Cisco AI Defense
Workshop attendees wonât be able to log in to the AI Defense application directly.
However, if we were able to view the AI Defense dashboard, we would see that an
event was logged for this request and that the credit card number included in the
prompt was automatically redacted.
Note that policies can be configured AI Defense to specify whether we want to monitor
or block specific types of security issues. In this case, we’ve chosen to just monitor
PCI-related issues.
View Data in Splunk Observability Cloud
Let’s return to Splunk Observability Cloud to see how the trace looks now.
Navigate to APM and then select AI agents. Ensure your environment name
is selected (e.g. agentic-ai-$INSTANCE). You’ll notice that the page
includes security risks now!
You should also see the security risks on the AI overview page, as well as the
AI agent page for the plan_synthesizer agent.
Navigate to APM -> AI trace data and load the most recent trace.
In the agent flow, we can see that a security risk was detected:
Looking at the invoke_agent span for the activity_specialist agent, we can see that PCI
security risk was detected and blocked, due to the LLM disclosing the customer’s credit
card number in the response in plain text:
Clicking on the security risk provides additional details, along with a link
to view the event in Cisco AI Defense:
And if we view the Span details for this span, we can see that the
gen_ai.security.event_id attribute is included with this span:
This attribute allows us to correlate the span in Splunk Observability Cloud
with the corresponding event in Cisco AI Defense.
Explore Other Agentic AI Frameworks
15 minutes
In earlier sections of this workshop, we focused on instrumenting Agentic AI applications
built with LangChain and LangGraph using OpenTelemetry.
In this section, we broaden the scope to cover other popular Agentic AI frameworks
and outline the available instrumentation approaches.
At a high level, there are two primary options for instrumenting Agentic AI
applications with OpenTelemetry. The best approach depends on the framework used
and whether the application already includes existing instrumentation.
Choosing the Right Instrumentation Approach
Option 1: Splunk OpenTelemetry Instrumentation (Recommended When Available)
Splunk provides OpenTelemetry instrumentation packages for several widely
used Agentic AI frameworks, including:
CrewAI
LangChain/LangGraph
LlamaIndex
OpenAI SDK
OpenAI Agents SDK
When to use this option
Choose this approach when:
Your application uses one of the frameworks listed above.
You want OpenTelemetry instrumentation optimized for Splunk Observability Cloud with minimal configuration.
You prefer a zero-code instrumentation experience.
Set specific environment variables to enable optional features such as:
Capturing LLM prompts and completions
Evaluating semantic quality of LLM responses
Integrating with Cisco AI Defense
Note: This is the same approach used earlier in the workshop for
LangChain and LangGraph, including optional prompt and completion capture.
Option 2: Third-Party Instrumentation Libraries
If your framework is not directly supported by Splunk OpenTelemetry instrumentation,
you can use a third-party library that provides broader framework coverage.
Commonly used third-party instrumentation libraries include:
Your application uses an Agentic AI framework not listed in Option 1
The application is already instrumented with a third-party instrumentation library
You want to avoid re-instrumenting existing code
How it works
Third-party libraries typically emit telemetry in their own formats or earlier OpenTelemetry schemas.
To integrate this data with Splunk Observability Cloud:
Enable a translation layer that converts the emitted telemetry into the latest OpenTelemetry semantic conventions.
Let’s walkthrough an example using CrewAI. The travel planner application we’ve
been using during the workshop has been re-written using CrewAI. You can find
the source code in the ~/workshop/agentic-ai/crewai folder.
Note that CrewAI uses a declarative approach to define agents and tasks. For example,
the ~/workshop/agentic-ai/crewai/config/agents.yaml file defines agents such as the
following:
coordinator:role:Travel Coordinatorgoal:Extract traveler intent and define a clear execution plan for specialists.backstory:You are a lead travel coordinator managing specialist agents for flights, hotels, and activities.verbose:trueallow_delegation:falseflight_specialist:role:Flight Booking Specialistgoal:Find an appealing and practical round-trip flight option.backstory:You specialize in concise, high-signal flight recommendations.verbose:trueallow_delegation:false
And the ~/workshop/agentic-ai/crewai/config/tasks.yaml file defines tasks such as the
following:
coordinate_trip:description:> Read the user request and extract key trip details:
origin, destination, travel style, and constraints.
Provide a short execution brief for specialists.
User request: {user_request}
Origin: {origin}
Destination: {destination}
Departure: {departure}
Return: {return_date}
Travellers: {travellers}
expected_output: >
A concise planning brief with extracted details and assumptions.
agent: coordinator
Notice that the following packages were added to the requirements.txt file
to instrument the CrewAI application:
Tip: if the image is taking too long to build, consider using the pre-built
image instead. To do so, update the image name in
the ~/workshop/agentic-ai/crewai/k8s.yaml file to ghcr.io/splunk/agentic-ai-app:crewai
instead of localhost:9999/agentic-ai-app:crewai.
Let’s use a different environment name for this version of the application:
Ensure the new application pod has started successfully and the old pod is no longer present:
kubectl get pods -n travel-agent
NAME READY STATUS RESTARTS AGE
travel-planner-langchain-68977dc5c4-4w7p9 1/1 Running 0 41s
Then, run the following command to test the application:
curl http://travel-planner.localhost/travel/plan \
-H "Content-Type: application/json"\
-d '{
"origin": "Seattle",
"destination": "Tokyo",
"user_request": "We are planning a week-long trip to Seattle from Tokyo. Looking for boutique hotel, business-class flights and unique experiences.",
"travelers": 2
}'
View Data in Splunk Observability Cloud
Let’s return to Splunk Observability Cloud to view traces for the CrewAI application.
Navigate to APM and then select AI agents. Ensure your environment name
is selected (e.g. agentic-ai-crewai-$INSTANCE). You’ll notice that the agent
names are slightly different:
Navigate to APM -> AI trace data and load the most recent trace.
In the trace, we should see similar details that we captured with the
LangChain/LangGraph version of the application:
Do you notice anything different about the CrewAI traces compared
to LangChain/LangGraph traces?
Click here to see the answer
There are a few differences:
The agent names are different (Hotel Booking Specialist vs. hotel_specialist)
The coordinator and plan synthesizer agents aren’t listed for the CrewAI version
The spans for the crewai inferred service include the agent instructions as part of the waterfall view
Wrap-up
5 minutes
Congratulations, you’ve successfully completed the Monitoring Agentic AI Applications with Splunk Observability Cloud workshop!
You’ve achieved the following:
An understanding of how to connect an Azure account to Splunk Observability Cloud to capture AI infrastructure-related metrics.
Experience exploring out-of-the box dashboards and navigators related to AI infrastructure.
An understanding of the architecture of an Agentic AI application built with LangChain and LangGraph.
Practice deploying an Agentic AI application and instrumenting it with OpenTelemetry.
Experience exploring how metrics, traces, and logs can be used in Splunk Observability Cloud to understand agent performance.
Practice modifying an Agentic AI application to use tool calls and agents.
Practice adding quality issues to an application and detecting them with Splunk Observability Cloud using semantic quality evals.
Practice adding AI Defense instrumentation to the application and security risks, and detecting them with Splunk Observability Cloud.