Splunk .conf25 Workshops
- Advanced OpenTelemetry Collector
Practice setting up the OpenTelemetry Collector configuration from scratch and go through several advanced configuration scenarios.
The goal of this workshop is to help you gain confidence in creating and modifying OpenTelemetry Collector configuration files. You'll start with minimal agent.yaml and gateway.yaml files and progressively build them out to handle several advanced, real-world scenarios.
A key focus of this workshop is learning how to configure the OpenTelemetry Collector to store telemetry data locally, rather than sending it to a third-party vendor backend. This approach not only simplifies debugging and troubleshooting but is also ideal for testing and development environments where you want to avoid sending data to production systems.
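For example, keeping traces on disk only requires a small file exporter block in the Collector configuration. The snippet below is a minimal sketch using the same exporter the generated gateway.yaml relies on later in this workshop:

exporters:
  file/traces:                      # File exporter, writes traces as OTLP JSON
    path: "./gateway-traces.out"    # Local output file instead of a vendor backend
    append: false                   # Overwrite the file each time the Collector starts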
To make the most of this workshop, you should have:
Everything in this workshop is designed to run locally, ensuring a hands-on and accessible learning experience. Let's dive in and start building!
During this workshop, we will cover the following topics:
By the end of this workshop, you’ll be familiar with configuring the OpenTelemetry Collector for a variety of real-world use cases.
vi, vim, nano, or your preferred text editor.
Port 2222 is required for ssh access.
jq is required - https://jqlang.org/download/

Create a directory: In your environment create a new directory and change into it:
mkdir advanced-otel-workshop && \
cd advanced-otel-workshop
We will refer to this directory as [WORKSHOP]
for the remainder of the workshop.
If you have completed the Splunk IM workshop, please ensure you have deleted the collector running in Kubernetes before continuing. This can be done by running the following command:
helm delete splunk-otel-collector
The EC2 instance in that case may also run some services that can interfere with this workshop, so run the following command to make sure they are stopped if present:
kubectl delete -f ~/workshop/apm/deployment.yaml
Download workshop binaries: Change into your [WORKSHOP]
directory and download the OpenTelemetry Collector and Load Generator binaries along with the setup script. For Linux (AMD64):
curl -L https://github.com/signalfx/splunk-otel-collector/releases/download/v0.132.0/otelcol_linux_amd64 -o otelcol && \
curl -L https://github.com/splunk/observability-workshop/raw/refs/heads/main/workshop/ninja/advanced-otel/loadgen/build/loadgen-linux-amd64 -o loadgen && \
curl -L https://github.com/splunk/observability-workshop/raw/refs/heads/main/workshop/ninja/advanced-otel/setup-workshop.sh -o setup-workshop.sh && \
chmod +x setup-workshop.sh
For macOS (Apple Silicon):
curl -L https://github.com/signalfx/splunk-otel-collector/releases/download/v0.132.0/otelcol_darwin_arm64 -o otelcol && \
curl -L https://github.com/splunk/observability-workshop/raw/refs/heads/main/workshop/ninja/advanced-otel/loadgen/build/loadgen-darwin-arm64 -o loadgen && \
curl -L https://github.com/splunk/observability-workshop/raw/refs/heads/main/workshop/ninja/advanced-otel/setup-workshop.sh -o setup-workshop.sh && \
chmod +x setup-workshop.sh
Run the setup-workshop.sh
script which will configure the correct permissions and also create the initial configurations for the Agent and the Gateway:
./setup-workshop.sh
[ASCII art banner]
Welcome to the Splunk Advanced OpenTelemetry Workshop!
======================================================
macOS detected. Removing quarantine attributes...
otelcol version v0.126.0
Usage: loadgen [OPTIONS]
Options:
-base Send base traces (enabled by default)
-health Send health traces
-security Send security traces
-logs Enable logging of random quotes to quotes.log
-json Output logs in JSON format (only applicable with -logs)
-count Number of traces or logs to send (default: infinite)
-h, --help Display this help message
Example:
loadgen -health -security -count 10 Send 10 health and security traces
loadgen -logs -json -count 5 Write 5 random quotes in JSON format to quotes.log
Creating workshop directories...
✓ Created subdirectories:
   ├── 1-agent-gateway
   ├── 2-building-resilience
   ├── 3-dropping-spans
   ├── 4-sensitive-data
   ├── 5-transform-data
   └── 6-routing-data
Creating configuration files for 1-agent-gateway...
Creating OpenTelemetry Collector agent configuration file: 1-agent-gateway/agent.yaml
✓ Configuration file created successfully: 1-agent-gateway/agent.yaml
✓ File size: 4355 bytes
Creating OpenTelemetry Collector gateway configuration file: 1-agent-gateway/gateway.yaml
✓ Configuration file created successfully: 1-agent-gateway/gateway.yaml
✓ File size: 3376 bytes
✓ Completed configuration files for 1-agent-gateway
Creating configuration files for 2-building-resilience...
Creating OpenTelemetry Collector agent configuration file: 2-building-resilience/agent.yaml
✓ Configuration file created successfully: 2-building-resilience/agent.yaml
✓ File size: 4355 bytes
Creating OpenTelemetry Collector gateway configuration file: 2-building-resilience/gateway.yaml
✓ Configuration file created successfully: 2-building-resilience/gateway.yaml
✓ File size: 3376 bytes
✓ Completed configuration files for 2-building-resilience
Workshop environment setup complete!
Configuration files created in the following directories:
1-agent-gateway/
├── agent.yaml
└── gateway.yaml
2-building-resilience/
├── agent.yaml
└── gateway.yaml
[WORKSHOP]
├── 1-agent-gateway
├── 2-building-resilience
├── 3-dropping-spans
├── 4-sensitive-data
├── 5-transform-data
├── 6-routing-data
├── loadgen
├── otelcol
└── setup-workshop.sh
Welcome! In this section, we'll begin with a fully functional OpenTelemetry setup that includes both an Agent and a Gateway.
We'll start by quickly reviewing their configuration files to get familiar with the overall structure and to highlight key sections that control the telemetry pipeline.
Throughout the workshop, you'll work with multiple terminal windows. To keep things organized, give each terminal a unique name or color. This will help you easily recognize and switch between them during the exercises.
We will refer to these terminals as: Agent, Gateway, Loadgen, and Test.
Create your first terminal window and name it Agent. Navigate to the directory for the first exercise [WORKSHOP]/1-agent-gateway
and verify that the required files have been generated:
cd 1-agent-gateway
ls -l
You should see the following files in the directory. If not, re-run the setup-workshop.sh
script as described in the Pre-requisites section:
.
├── agent.yaml
└── gateway.yaml
Let's review the key components of the agent.yaml file used in this workshop. We've made some important additions to support metrics, traces, and logs.
The receivers
section defines how the Agent ingests telemetry data. In this setup, three types of receivers have been configured:
Host Metrics Receiver
hostmetrics: # Host Metrics Receiver
collection_interval: 3600s # Collection Interval (1hr)
scrapers:
cpu: # CPU Scraper
Collects CPU usage from the local system every hour. Weβll use this to generate example metric data.
OTLP Receiver (HTTP protocol)
otlp: # OTLP Receiver
protocols:
http: # Configure HTTP protocol
endpoint: "0.0.0.0:4318" # Endpoint to bind to
Enables the agent to receive metrics, traces, and logs over HTTP on port 4318
. This is used to send data to the collector in future exercises.
FileLog Receiver
filelog/quotes: # Receiver Type/Name
include: ./quotes.log # The file to read log data from
include_file_path: true # Include file path in the log data
include_file_name: false # Exclude file name from the log data
resource: # Add custom resource attributes
com.splunk.source: ./quotes.log # Source of the log data
com.splunk.sourcetype: quotes # Source type of the log data
Enables the agent to tail a local log file (quotes.log
) and convert it to structured log events, enriched with metadata such as source
and sourceType
.
Debug Exporter
debug: # Exporter Type
verbosity: detailed # Enable detailed debug output
OTLPHTTP Exporter
otlphttp: # Exporter Type
endpoint: "http://localhost:5318" # Gateway OTLP endpoint
The debug
exporter sends data to the console for visibility and debugging during the workshop while the otlphttp
exporter forwards all telemetry to the local Gateway instance.
This dual-export strategy ensures you can see the raw data locally while also sending it downstream for further processing and export.
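Putting the two exporters together, each pipeline in the Agent lists both of them. The sketch below shows the general shape of the service section; the exact receiver and processor lists in your generated agent.yaml may differ slightly:

service:
  pipelines:
    traces:
      receivers: [otlp]                     # Traces arrive via OTLP HTTP
      processors: [memory_limiter, batch]
      exporters: [debug, otlphttp]          # Print locally and forward to the Gateway
    metrics:
      receivers: [hostmetrics, otlp]        # CPU metrics plus OTLP
      processors: [memory_limiter, batch]
      exporters: [debug, otlphttp]
    logs:
      receivers: [filelog/quotes, otlp]     # Tail quotes.log plus OTLP
      processors: [memory_limiter, batch]
      exporters: [debug, otlphttp]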
The OpenTelemetry Gateway serves as a central hub for receiving, processing, and exporting telemetry data. It sits between your telemetry sources (such as applications and services) and your observability backends like Splunk Observability Cloud.
By centralizing telemetry traffic, the gateway enables advanced features such as data filtering, enrichment, transformation, and routing to one or more destinations. It helps reduce the burden on individual services by offloading telemetry processing and ensures consistent, standardized data across distributed systems.
This makes your observability pipeline easier to manage, scale, and analyze, especially in complex, multi-service environments.
Open or create your second terminal window and name it Gateway. Navigate to the first exercise directory [WORKSHOP]/1-agent-gateway
then check the contents of the gateway.yaml
file.
This file outlines the core structure of the OpenTelemetry Collector as deployed in Gateway mode.
Let's explore the gateway.yaml
file that defines how the OpenTelemetry Collector is configured in Gateway mode during this workshop. This Gateway is responsible for receiving telemetry from the Agent, then processing and exporting it for inspection or forwarding.
OTLP Receiver (Custom Port)
receivers:
otlp:
protocols:
http:
endpoint: "0.0.0.0:5318"
The port 5318
matches the otlphttp
exporter in the Agent configuration, ensuring that all telemetry data sent by the Agent is accepted by the Gateway.
This separation of ports avoids conflicts and keeps responsibilities clear between agent and gateway roles.
File Exporters
The Gateway uses three file exporters to output telemetry data to local files. These exporters are defined as:
exporters: # List of exporters
debug: # Debug exporter
verbosity: detailed # Enable detailed debug output
file/traces: # Exporter Type/Name
path: "./gateway-traces.out" # Path for OTLP JSON output for traces
append: false # Overwrite the file each time
file/metrics: # Exporter Type/Name
path: "./gateway-metrics.out" # Path for OTLP JSON output for metrics
append: false # Overwrite the file each time
file/logs: # Exporter Type/Name
path: "./gateway-logs.out" # Path for OTLP JSON output for logs
append: false # Overwrite the file each time
Each exporter writes a specific signal type to its corresponding file.
These files are created once the gateway is started and will be populated with real telemetry as the agent sends data. You can monitor these files in real time to observe the flow of telemetry through your pipeline.
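Each file exporter is then wired into the pipeline for its signal in the Gateway's service section. A rough sketch of that wiring is shown below; your generated gateway.yaml may include additional processors such as resource/add_mode:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug, file/traces]       # Traces go to gateway-traces.out
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug, file/metrics]      # Metrics go to gateway-metrics.out
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [debug, file/logs]         # Logs go to gateway-logs.out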
Now, we can start the Gateway and the Agent, which is configured to automatically send Host Metrics at startup. We do this to verify that data is properly routed from the Agent to the Gateway.
Gateway: In the Gateway terminal window, run the following command to start the Gateway:
../otelcol --config=gateway.yaml
If everything is configured correctly, the collector will start and state Everything is ready. Begin running and processing data.
in the output, similar to the following:
2025-06-09T09:22:11.944+0100 info service@v0.126.0/service.go:289 Everything is ready. Begin running and processing data. {"resource": {}}
Once the Gateway is running, it will listen for incoming data on port 5318
and export the received data to the following files:
gateway-traces.out
gateway-metrics.out
gateway-logs.out
Start the Agent: In the Agent terminal window start the agent with the agent configuration:
../otelcol --config=agent.yaml
Verify CPU Metrics:
<snip>
NumberDataPoints #31
Data point attributes:
-> cpu: Str(cpu3)
-> state: Str(wait)
StartTimestamp: 2025-07-07 16:49:42 +0000 UTC
Timestamp: 2025-07-09 09:36:21.190226459 +0000 UTC
Value: 77.380000
{"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "metrics"}
At this stage, the Agent continues to collect CPU metrics once per hour or upon each restart and sends them to the gateway. The Gateway processes these metrics and exports them to a file named gateway-metrics.out
. This file stores the exported metrics as part of the pipeline service.
Verify Data arrived at Gateway: To confirm that CPU metrics, specifically for cpu0
, have successfully reached the gateway, we'll inspect the gateway-metrics.out
file using the jq
command.
The following command filters and extracts the system.cpu.time
metric, focusing on cpu0
. It displays the metric's state (e.g., user
, system
, idle
, interrupt
) along with the corresponding values.
Open or create your third terminal window and name it Test. Run the command below in the Test terminal to check the system.cpu.time
metric:
jq '.resourceMetrics[].scopeMetrics[].metrics[] | select(.name == "system.cpu.time") | .sum.dataPoints[] | select(.attributes[0].value.stringValue == "cpu0") | {cpu: .attributes[0].value.stringValue, state: .attributes[1].value.stringValue, value: .asDouble}' gateway-metrics.out
{
"cpu": "cpu0",
"state": "user",
"value": 123407.02
}
{
"cpu": "cpu0",
"state": "system",
"value": 64866.6
}
{
"cpu": "cpu0",
"state": "idle",
"value": 216427.87
}
{
"cpu": "cpu0",
"state": "interrupt",
"value": 0
}
Stop the Agent and the Gateway processes by pressing Ctrl-C
in their respective terminals.
The OpenTelemetry Collector's FileStorage Extension is a critical component for building a more resilient telemetry pipeline. It enables the Collector to reliably checkpoint in-flight data, manage retries efficiently, and gracefully handle temporary failures without losing valuable telemetry.
With FileStorage enabled, the Collector can persist intermediate states to disk, ensuring that your traces, metrics, and logs are not lost during network disruptions, backend outages, or Collector restarts. This means that even if your network connection drops or your backend becomes temporarily unavailable, the Collector will continue to receive and buffer telemetry, resuming delivery seamlessly once connectivity is restored.
By integrating the FileStorage Extension into your pipeline, you can strengthen the durability of your observability stack and maintain high-quality telemetry ingestion, even in environments where connectivity may be unreliable. This solution will work for metrics as long as the connection downtime is brief, up to 15 minutes. If the downtime exceeds this, Splunk Observability Cloud might drop data to make sure no data-point is out of order. For logs, there are plans to implement a full enterprise-ready solution in one of the upcoming Splunk OpenTelemetry Collector releases.
Note
In this exercise, we will update the extensions:
section of the agent.yaml
file. This section is part of the OpenTelemetry configuration YAML and defines optional components that enhance or modify the OpenTelemetry Collector's behavior.
While these components do not process telemetry data directly, they provide valuable capabilities and services to improve the Collector's functionality.
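As a reminder of the shape of that section, an extension is declared at the top level and must also be listed under service to take effect. A minimal sketch follows; the health_check endpoint shown is the component's default and is an assumption, not necessarily what your generated agent.yaml uses:

extensions:
  health_check:                     # Liveness/readiness probe extension
    endpoint: "0.0.0.0:13133"       # Default health_check port (assumption)

service:
  extensions:
    - health_check                  # Only extensions listed here are enabled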
Change ALL terminal windows to the 2-building-resilience
directory and run the clear
command.
Your directory structure will look like this:
.
├── agent.yaml
└── gateway.yaml
Update the agent.yaml
: In the Agent terminal window, add the file_storage
extension under the existing health_check
extension:
file_storage/checkpoint: # Extension Type/Name
directory: "./checkpoint-dir" # Define directory
create_directory: true # Create directory
timeout: 1s # Timeout for file operations
compaction: # Compaction settings
on_start: true # Start compaction at Collector startup
# Define compaction directory
directory: "./checkpoint-dir/tmp"
max_transaction_size: 65536 # Max. size limit before compaction occurs
Add file_storage
to the exporter: Modify the otlphttp
exporter to configure retry and queuing mechanisms, ensuring data is retained and resent if failures occur. Add the following under the endpoint: "http://localhost:5318"
and make sure the indentation matches endpoint
:
retry_on_failure:
enabled: true # Enable retry on failure
sending_queue: #
enabled: true # Enable sending queue
num_consumers: 10 # No. of consumers
queue_size: 10000 # Max. queue size
storage: file_storage/checkpoint # File storage extension
Update the services
section: Add the file_storage/checkpoint
extension to the existing extensions:
section and the configuration needs to look like this:
service:
extensions:
- health_check
- file_storage/checkpoint # Enabled extensions for this collector
Update the metrics
pipeline: For this exercise we are going to comment out the hostmetrics
receiver from the Metric pipeline to reduce debug and log noise, again the configuration needs to look like this:
metrics:
receivers:
# - hostmetrics # Hostmetrics receiver (cpu only)
- otlp
Next, we will configure our environment to be ready for testing the File Storage configuration.
Start the Gateway: In the Gateway terminal window run:
../otelcol --config=gateway.yaml
Start the Agent: In the Agent terminal window run:
../otelcol --config=agent.yaml
Send five test spans: In the Loadgen terminal window run:
../loadgen -count 5
Both the Agent and Gateway should display debug logs, and the Gateway should create a ./gateway-traces.out
file.
If everything functions correctly, we can proceed with testing system resilience.
To assess the Agent’s resilience, we’ll simulate a temporary Gateway outage and observe how the Agent handles it:
Simulate a network failure: In the Gateway terminal stop the Gateway with Ctrl-C
and wait until the gateway console shows that it has stopped. The Agent will continue running, but it will not be able to send data to the gateway. The output in the Gateway terminal should look similar to this:
2025-07-09T10:22:37.941Z info service@v0.126.0/service.go:345 Shutdown complete. {"resource": {}}
Send traces: In the Loadgen terminal window send five more traces using the loadgen
.
../loadgen -count 5
Notice that the agent's retry mechanism is activated as it continuously attempts to resend the data. In the agent's console output, you will see repeated messages similar to the following:
2025-01-28T14:22:47.020+0100 info internal/retry_sender.go:126 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlphttp", "error": "failed to make an HTTP request: Post \"http://localhost:5318/v1/traces\": dial tcp 127.0.0.1:5318: connect: connection refused", "interval": "9.471474933s"}
Stop the Agent: In the Agent terminal window, use Ctrl-C
to stop the agent. Wait until the agent's console confirms it has stopped:
2025-07-09T10:25:59.344Z info service@v0.126.0/service.go:345 Shutdown complete. {"resource": {}}
When you stop the agent, any metrics, traces, or logs held in memory for retry will be lost. However, because we have configured the FileStorage Extension, all telemetry that has not yet been accepted by the target endpoint is safely checkpointed on disk.
Stopping the agent is a crucial step to clearly demonstrate how the system recovers when the agent is restarted.
In this exercise, we'll test how the OpenTelemetry Collector recovers from a network outage by restarting the Gateway collector. When the Gateway becomes available again, the Agent will resume sending data from its last check-pointed state, ensuring no data loss.
Restart the Gateway: In the Gateway terminal window run:
../otelcol --config=gateway.yaml
Restart the Agent: In the Agent terminal window run:
../otelcol --config=agent.yaml
After the Agent is up and running, the file_storage extension will detect buffered data in the checkpoint folder. It will start to dequeue the stored spans from the last checkpoint, ensuring no data is lost.
Verify the Agent Debug output: Note that the Agent debug output does NOT change and still shows the following line indicating no new data is being exported:
2025-07-11T08:31:58.176Z info service@v0.126.0/service.go:289 Everything is ready. Begin running and processing data. {"resource": {}}
Watch the Gateway Debug output
You should see from the Gateway debug screen that it has started receiving the previously missed traces without requiring any additional action on your part, for example:
Attributes:
-> user.name: Str(Luke Skywalker)
-> user.phone_number: Str(+1555-867-5309)
-> user.email: Str(george@deathstar.email)
-> user.password: Str(LOTR>StarWars1-2-3)
-> user.visa: Str(4111 1111 1111 1111)
-> user.amex: Str(3782 822463 10005)
-> user.mastercard: Str(5555 5555 5555 4444)
-> payment.amount: Double(75.75)
{"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "traces"}
Check the gateway-traces.out
file: Using jq
, count the number of traces in the recreated gateway-traces.out
. It should match the number you sent while the Gateway was down.
jq '.resourceSpans | length | "\(.) resourceSpans found"' gateway-traces.out
"5 resourceSpans found"
Stop the Agent and the Gateway processes by pressing Ctrl-C
in their respective terminals.
This exercise demonstrated how to enhance the resilience of the OpenTelemetry Collector by configuring the file_storage
extension, enabling retry mechanisms for the otlphttp exporter, and using a file-backed queue for temporary data storage.
By implementing file-based check-pointing and queue persistence, you ensure the telemetry pipeline can gracefully recover from temporary interruptions, making it more robust and reliable for production environments.
In this section, we will explore how to use the Filter Processor to selectively drop spans based on certain conditions.
Specifically, we will drop traces based on the span name, which is commonly used to filter out unwanted spans such as health checks or internal communication traces. In this case, we will be filtering out spans that contain "/_healthz", which are typically associated with health check requests and are usually quite "noisy".
Change ALL terminal windows to the 3-dropping-spans
directory and run the clear
command.
Copy *.yaml
from the 2-building-resilience
directory into 3-dropping-spans
. Your updated directory structure will now look like this:
.
├── agent.yaml
└── gateway.yaml
Next, we will configure the filter processor and the respective pipelines.
Switch to your Gateway terminal window and open the gateway.yaml
file. Update the processors
section with the following configuration:
Add a filter
processor:
Configure the gateway to exclude spans with the name /_healthz
. The error_mode: ignore
directive ensures that any errors encountered during filtering are ignored, allowing the pipeline to continue running smoothly. The traces
section defines the filtering rules, specifically targeting spans named /_healthz
for exclusion.
filter/health: # Defines a filter processor
error_mode: ignore # Ignore errors
traces: # Filtering rules for traces
span: # Exclude spans named "/_healthz"
- 'name == "/_healthz"'
Add the filter
processor to the traces
pipeline:
Include the filter/health
processor in the traces
pipeline. For optimal performance, place the filter as early as possibleβright after the memory_limiter
and before the batch
processor. Here's how the configuration should look:
traces:
receivers:
- otlp
processors:
- memory_limiter
- filter/health # Filters data based on rules
- resource/add_mode
- batch
exporters:
- debug
- file/traces
This setup ensures that health check related spans (/_healthz
) are filtered out early in the pipeline, reducing unnecessary noise in your telemetry data.
To test your configuration, you’ll need to generate some trace data that includes a span named "/_healthz"
.
Start the Gateway: In your Gateway terminal window start the Gateway.
../otelcol --config ./gateway.yaml
Start the Agent: In your Agent terminal window start the Agent.
../otelcol --config ./agent.yaml
Start the Loadgen: In the Loadgen terminal window, execute the following command to start the load generator with health check spans enabled:
../loadgen -health -count 5
The debug output in the Agent terminal will show _healthz
spans:
InstrumentationScope healthz 1.0.0
Span #0
Trace ID : 0cce8759b5921c8f40b346b2f6e2f4b6
Parent ID :
ID : bc32bd0e4ddcb174
Name : /_healthz
Kind : Server
Start time : 2025-07-11 08:47:50.938703979 +0000 UTC
End time : 2025-07-11 08:47:51.938704091 +0000 UTC
Status code : Ok
Status message : Success
They will not be present in the Gateway debug as they are dropped by the filter processor that was configured earlier.
Verify agent.out
: Using jq
, in the Test terminal, confirm the name of the spans received by the Agent:
jq -c '.resourceSpans[].scopeSpans[].spans[] | "Span \(input_line_number) found with name \(.name)"' ./agent.out
"Span 1 found with name /movie-validator"
"Span 2 found with name /_healthz"
"Span 3 found with name /movie-validator"
"Span 4 found with name /_healthz"
"Span 5 found with name /movie-validator"
"Span 6 found with name /_healthz"
"Span 7 found with name /movie-validator"
"Span 8 found with name /_healthz"
"Span 9 found with name /movie-validator"
"Span 10 found with name /_healthz"
Check the Gateway Debug output: Using jq
confirm the name of the spans received by the Gateway:
jq -c '.resourceSpans[].scopeSpans[].spans[] | "Span \(input_line_number) found with name \(.name)"' ./gateway-traces.out
The gateway-traces.out
file will not contain any spans named /_healthz
.
"Span 1 found with name /movie-validator"
"Span 2 found with name /movie-validator"
"Span 3 found with name /movie-validator"
"Span 4 found with name /movie-validator"
"Span 5 found with name /movie-validator"
To ensure optimal performance with the Filter processor, thoroughly understand your incoming data format and rigorously test your configuration. Use the most specific filtering criteria possible to minimize the risk of inadvertently dropping important data.
This configuration can be extended to filter spans based on various attributes, tags, or custom criteria, enhancing the OpenTelemetry Collector’s flexibility and efficiency for your specific observability requirements.
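For instance, a hypothetical filter could drop spans by attribute or by a name pattern instead of an exact match, using OTTL expressions like the ones above. The attribute name and patterns below are illustrative only, not part of the workshop files:

filter/noise:                                              # Hypothetical filter processor
  error_mode: ignore                                       # Ignore errors during filtering
  traces:
    span:
      - 'IsMatch(name, ".*/healthcheck.*")'                # Drop spans whose name matches a pattern
      - 'attributes["http.route"] == "/internal/metrics"'  # Drop spans for a specific route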
Stop the Agent and the Gateway processes by pressing Ctrl-C
in their respective terminals.
In this section, you’ll learn how to configure the OpenTelemetry Collector to remove specific tags and redact sensitive data from telemetry spans. This is crucial for protecting sensitive information such as credit card numbers, personal data, or other security-related details that must be anonymized before being processed or exported.
We’ll walk through configuring key processors in the OpenTelemetry Collector, including:
Change ALL terminal windows to the 4-sensitive-data
directory and run the clear
command.
Copy *.yaml
from the 3-dropping-spans
directory into 4-sensitive-data
. Your updated directory structure will now look like this:
.
├── agent.yaml
└── gateway.yaml
In this step, we’ll modify agent.yaml
to include the attributes
and redaction
processors. These processors will help ensure that sensitive data within span attributes is properly handled before being logged or exported.
Previously, you may have noticed that some span attributes displayed in the console contained personal and sensitive data. We’ll now configure the necessary processors to filter out and redact this information effectively.
Attributes:
-> user.name: Str(George Lucas)
-> user.phone_number: Str(+1555-867-5309)
-> user.email: Str(george@deathstar.email)
-> user.password: Str(LOTR>StarWars1-2-3)
-> user.visa: Str(4111 1111 1111 1111)
-> user.amex: Str(3782 822463 10005)
-> user.mastercard: Str(5555 5555 5555 4444)
{"kind": "exporter", "data_type": "traces", "name": "debug"}
Switch to your Agent terminal window and open the agent.yaml
file in your editor. We'll add two processors to enhance the security and privacy of your telemetry data.
1. Add an attributes
Processor: The Attributes Processor allows you to modify span attributes (tags) by updating, deleting, or hashing their values. This is particularly useful for obfuscating sensitive information before it is exported.
In this step, we'll:
Update the user.phone_number attribute to a static value ("UNKNOWN NUMBER").
Hash the user.email attribute to ensure the original email is not exposed.
Delete the user.password attribute to remove it entirely from the span.

attributes/update:
actions: # Actions
- key: user.phone_number # Target key
action: update # Update action
value: "UNKNOWN NUMBER" # New value
- key: user.email # Target key
action: hash # Hash the email value
- key: user.password # Target key
action: delete # Delete the password
2. Add a redaction
Processor: The Redaction Processor detects and redacts sensitive data in span attributes based on predefined patterns, such as credit card numbers or other personally identifiable information (PII).
In this step:
We set allow_all_keys: true
to ensure all attributes are processed (if set to false
, only explicitly allowed keys are retained).
We define blocked_values
with regular expressions to detect and redact Visa and MasterCard credit card numbers.
The summary: debug
option logs detailed information about the redaction process for debugging purposes.
redaction/redact:
allow_all_keys: true # If false, only allowed keys will be retained
blocked_values: # List of regex patterns to block
- '\b4[0-9]{3}[\s-]?[0-9]{4}[\s-]?[0-9]{4}[\s-]?[0-9]{4}\b' # Visa
- '\b5[1-5][0-9]{2}[\s-]?[0-9]{4}[\s-]?[0-9]{4}[\s-]?[0-9]{4}\b' # MasterCard
summary: debug # Show debug details about redaction
Update the traces
Pipeline: Integrate both processors into the traces
pipeline. Make sure that you comment out the redaction processor at first (we will enable it later in a separate exercise). Your configuration should look like this:
traces:
receivers:
- otlp
processors:
- memory_limiter
- attributes/update # Update, hash, and remove attributes
#- redaction/redact # Redact sensitive fields using regex
- resourcedetection
- resource/add_mode
- batch
exporters:
- debug
- file
- otlphttp
In this exercise, we will delete the user.password
, update the user.phone_number
attribute and hash the user.email
in the span data before it is exported by the Agent.
Start the Gateway: In your Gateway terminal window start the Gateway.
../otelcol --config=gateway.yaml
Start the Agent: In your Agent terminal window start the Agent.
../otelcol --config=agent.yaml
Start the Load Generator: In the Loadgen terminal window start the loadgen
:
../loadgen -count 1
Check the debug output: For both the Agent and Gateway confirm that user.password
has been removed, and both user.phone_number
& user.email
have been updated:
-> user.name: Str(George Lucas)
-> user.phone_number: Str(UNKNOWN NUMBER)
-> user.email: Str(62d5e03d8fd5808e77aee5ebbd90cf7627a470ae0be9ffd10e8025a4ad0e1287)
-> payment.amount: Double(51.71)
-> user.visa: Str(4111 1111 1111 1111)
-> user.amex: Str(3782 822463 10005)
-> user.mastercard: Str(5555 5555 5555 4444)
-> user.name: Str(George Lucas)
-> user.phone_number: Str(+1555-867-5309)
-> user.email: Str(george@deathstar.email)
-> user.password: Str(LOTR>StarWars1-2-3)
-> user.visa: Str(4111 1111 1111 1111)
-> user.amex: Str(3782 822463 10005)
-> user.mastercard: Str(5555 5555 5555 4444)
-> payment.amount: Double(95.22)
Check file output: Using jq
validate that user.password has been removed, and user.phone_number & user.email have been updated in gateway-traces.out
:
jq '.resourceSpans[].scopeSpans[].spans[].attributes[] | select(.key == "user.password" or .key == "user.phone_number" or .key == "user.email") | {key: .key, value: .value.stringValue}' ./gateway-traces.out
Notice that the user.password
has been removed, and the user.phone_number
& user.email
have been updated:
{
"key": "user.phone_number",
"value": "UNKNOWN NUMBER"
}
{
"key": "user.email",
"value": "62d5e03d8fd5808e77aee5ebbd90cf7627a470ae0be9ffd10e8025a4ad0e1287"
}
Stop the Agent and the Gateway processes by pressing Ctrl-C
in their respective terminals.
The redaction
processor gives precise control over which attributes and values are permitted or removed from telemetry data.
In this exercise, we will redact the user.visa & user.mastercard values in the span data before it is exported by the Agent.
Exercise
Start the Gateway: In your Gateway terminal window start the Gateway.
../otelcol --config=gateway.yaml
Enable the redaction/redact processor: In the Agent terminal window, edit agent.yaml and remove the # we inserted in the previous exercise. Your traces pipeline should now look like this:
traces:
  receivers:
    - otlp
  processors:
    - memory_limiter
    - attributes/update # Update, hash, and remove attributes
    - redaction/redact # Redact sensitive fields using regex
    - resourcedetection
    - resource/add_mode
    - batch
  exporters:
    - debug
    - file
    - otlphttp
Start the Agent: In your Agent terminal window start the Agent.
../otelcol --config=agent.yaml
Start the Load Generator: In the Loadgen terminal window start the loadgen:
../loadgen -count 1
Check the debug output: For both the Agent and Gateway confirm the values for user.visa & user.mastercard have been updated. Notice the user.amex attribute value was NOT redacted because a matching regex pattern was not added to blocked_values:
-> user.name: Str(George Lucas)
-> user.phone_number: Str(UNKNOWN NUMBER)
-> user.email: Str(62d5e03d8fd5808e77aee5ebbd90cf7627a470ae0be9ffd10e8025a4ad0e1287)
-> payment.amount: Double(69.71)
-> user.visa: Str(****)
-> user.amex: Str(3782 822463 10005)
-> user.mastercard: Str(****)
-> redaction.masked.keys: Str(user.mastercard,user.visa)
-> redaction.masked.count: Int(2)
-> user.name: Str(George Lucas)
-> user.phone_number: Str(+1555-867-5309)
-> user.email: Str(george@deathstar.email)
-> user.password: Str(LOTR>StarWars1-2-3)
-> user.visa: Str(4111 1111 1111 1111)
-> user.amex: Str(3782 822463 10005)
-> user.mastercard: Str(5555 5555 5555 4444)
-> payment.amount: Double(65.54)
Note
By including summary: debug in the redaction processor, the debug output will include summary information about which matching key values were redacted, along with the count of values that were masked:
-> redaction.masked.keys: Str(user.mastercard,user.visa)
-> redaction.masked.count: Int(2)
Check file output: Using jq, verify that user.visa & user.mastercard have been updated in the gateway-traces.out:
jq '.resourceSpans[].scopeSpans[].spans[].attributes[] | select(.key == "user.visa" or .key == "user.mastercard" or .key == "user.amex") | {key: .key, value: .value.stringValue}' ./gateway-traces.out
Notice that user.amex has not been redacted because a matching regex pattern was not added to blocked_values:
{
  "key": "user.visa",
  "value": "****"
}
{
  "key": "user.amex",
  "value": "3782 822463 10005"
}
{
  "key": "user.mastercard",
  "value": "****"
}
These are just a couple of examples of how the attributes and redaction processors can be configured to protect sensitive data.
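If you also wanted to mask American Express numbers, a third pattern could be added to blocked_values. The sketch below is an illustration; the Amex regex is an assumption and not part of the workshop configuration:

redaction/redact:
  allow_all_keys: true
  blocked_values:                                                      # List of regex patterns to block
    - '\b4[0-9]{3}[\s-]?[0-9]{4}[\s-]?[0-9]{4}[\s-]?[0-9]{4}\b'        # Visa
    - '\b5[1-5][0-9]{2}[\s-]?[0-9]{4}[\s-]?[0-9]{4}[\s-]?[0-9]{4}\b'   # MasterCard
    - '\b3[47][0-9]{2}[\s-]?[0-9]{6}[\s-]?[0-9]{5}\b'                  # Amex (15 digits, starts with 34 or 37) - assumption
  summary: debug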
Stop the Agent and the Gateway processes by pressing Ctrl-C
in their respective terminals.
The Transform Processor lets you modify telemetry data (logs, metrics, and traces) as it flows through the pipeline. Using the OpenTelemetry Transformation Language (OTTL), you can filter, enrich, and transform data on the fly without touching your application code.
In this exercise we'll update gateway.yaml to include a Transform Processor that will:
Keep only the resource attributes we care about.
Parse the JSON log body and promote its fields to log attributes.
Set SeverityText and SeverityNumber based on the log level.
You may have noticed that in previous logs, fields like SeverityText
and SeverityNumber
were undefined. This is typical of the filelog
receiver. However, the severity is embedded within the log body e.g.:
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(2025-01-31 15:49:29 [WARN] - Do or do not, there is no try.)
Logs often contain structured data encoded as JSON within the log body. Extracting these fields into attributes allows for better indexing, filtering, and querying. Instead of manually parsing JSON in downstream systems, OTTL enables automatic transformation at the telemetry pipeline level.
Change ALL terminal windows to the 5-transform-data
directory and run the clear
command.
Copy *.yaml
from the 4-sensitive-data
directory into 5-transform-data
. Your updated directory structure will now look like this:
.
├── agent.yaml
└── gateway.yaml
Add a transform
processor: Switch to your Gateway terminal window and edit the gateway.yaml
and add the following transform
processor:
transform/logs: # Processor Type/Name
log_statements: # Log Processing Statements
- context: resource # Log Context
statements: # List of attribute keys to keep
- keep_keys(attributes, ["com.splunk.sourcetype", "host.name", "otelcol.service.mode"])
By using the context: resource
key we are targeting the resourceLog
attributes of logs.
This configuration ensures that only the relevant resource attributes (com.splunk.sourcetype
, host.name
, otelcol.service.mode
) are retained, improving log efficiency and reducing unnecessary metadata.
Adding a Context Block for Log Severity Mapping: To properly set the severity_text
and severity_number
fields of a log record, we add a log
context block within log_statements
. This configuration extracts the level
value from the log body, maps it to severity_text
, and assigns the corresponding severity_number
based on the log level:
- context: log # Log Context
statements: # Transform Statements Array
- set(cache, ParseJSON(body)) where IsMatch(body, "^\\{") # Parse JSON log body into a cache object
- flatten(cache, "") # Flatten nested JSON structure
- merge_maps(attributes, cache, "upsert") # Merge cache into attributes, updating existing keys
- set(severity_text, attributes["level"]) # Set severity_text from the "level" attribute
- set(severity_number, 1) where severity_text == "TRACE" # Map severity_text to severity_number
- set(severity_number, 5) where severity_text == "DEBUG"
- set(severity_number, 9) where severity_text == "INFO"
- set(severity_number, 13) where severity_text == "WARN"
- set(severity_number, 17) where severity_text == "ERROR"
- set(severity_number, 21) where severity_text == "FATAL"
The merge_maps
function is used to combine two maps (dictionaries) into one. In this case, it merges the cache
object (containing parsed JSON data from the log body) into the attributes
map.
attributes: The target map where the data will be merged.
cache: The source map containing the parsed JSON data.
"upsert": This mode ensures that if a key already exists in the attributes map, its value will be updated with the value from cache. If the key does not exist, it will be inserted.

This step is crucial because it ensures that all relevant fields from the log body (e.g., level, message, etc.) are added to the attributes map, making them available for further processing or exporting.
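As an illustration, using one of the quotes.log records shown later in this section, the cache produced by ParseJSON and the resulting attributes map would look roughly like this:

# Before merge_maps:
attributes:
  log.file.path: quotes.log
cache:                                   # Result of ParseJSON(body)
  level: WARN
  message: Your focus determines your reality.
  movie: SW
  timestamp: 2025-03-07 11:17:26

# After merge_maps(attributes, cache, "upsert"):
attributes:
  log.file.path: quotes.log
  level: WARN
  message: Your focus determines your reality.
  movie: SW
  timestamp: 2025-03-07 11:17:26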
Summary of Key Transformations:
Only the relevant resource attributes are kept.
The JSON log body is parsed, flattened, and merged into the log attributes.
severity_text and severity_number are set from the log level.
You should have a single transform
processor containing two context blocks: one whose context is for resource
and one whose context is for log
.
This configuration ensures that log severity is correctly extracted, standardized, and structured for efficient processing.
This method of mapping all JSON fields to top-level attributes should only be used for testing and debugging OTTL. It will result in high cardinality in a production scenario.
Update the logs
pipeline: Add the transform/logs:
processor into the logs:
pipeline so your configuration looks like this:
logs: # Logs pipeline
receivers:
- otlp # OTLP receiver
processors: # Processors for logs
- memory_limiter
- resource/add_mode
- transform/logs
- batch
exporters:
- debug # Debug exporter
- file/logs
Start the Gateway: In the Gateway terminal run:
../otelcol --config=gateway.yaml
Start the Agent: In the Agent terminal run:
../otelcol --config=agent.yaml
Start the Load Generator: In the Loadgen terminal window, execute the following command to start the load generator with JSON enabled:
../loadgen -logs -json -count 5
The loadgen
will write 5 log lines to ./quotes.log
in JSON format.
This test verifies that the com.splunk.source and os.type metadata have been removed from the log resource attributes before being exported by the Agent. Additionally, the test ensures that:
SeverityText and SeverityNumber are set on the LogRecord.
The JSON fields from the log body are available as top-level attributes.
This ensures proper metadata filtering, severity mapping, and structured log enrichment before exporting.
Check the debug output: For both the Agent and Gateway confirm that com.splunk.source
and os.type
have been removed:
Resource attributes:
-> com.splunk.sourcetype: Str(quotes)
-> host.name: Str(workshop-instance)
-> otelcol.service.mode: Str(agent)
Resource attributes:
-> com.splunk.source: Str(./quotes.log)
-> com.splunk.sourcetype: Str(quotes)
-> host.name: Str(workshop-instance)
-> os.type: Str(linux)
-> otelcol.service.mode: Str(agent)
For both the Agent and Gateway confirm that SeverityText
and SeverityNumber
in the LogRecord
are now defined with the severity level
from the log body. Confirm that the JSON fields from the body can be accessed as top-level log Attributes
:
<snip>
SeverityText: WARN
SeverityNumber: Warn(13)
Body: Str({"level":"WARN","message":"Your focus determines your reality.","movie":"SW","timestamp":"2025-03-07 11:17:26"})
Attributes:
-> log.file.path: Str(quotes.log)
-> level: Str(WARN)
-> message: Str(Your focus determines your reality.)
-> movie: Str(SW)
-> timestamp: Str(2025-03-07 11:17:26)
</snip>
<snip>
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str({"level":"WARN","message":"Your focus determines your reality.","movie":"SW","timestamp":"2025-03-07 11:17:26"})
Attributes:
-> log.file.path: Str(quotes.log)
</snip>
Check file output: In the new gateway-logs.out
file verify the data has been transformed:
jq '[.resourceLogs[].scopeLogs[].logRecords[] | {severityText, severityNumber, body: .body.stringValue}]' gateway-logs.out
[
{
"severityText": "DEBUG",
"severityNumber": 5,
"body": "{\"level\":\"DEBUG\",\"message\":\"All we have to decide is what to do with the time that is given us.\",\"movie\":\"LOTR\",\"timestamp\":\"2025-03-07 11:56:29\"}"
},
{
"severityText": "WARN",
"severityNumber": 13,
"body": "{\"level\":\"WARN\",\"message\":\"The Force will be with you. Always.\",\"movie\":\"SW\",\"timestamp\":\"2025-03-07 11:56:29\"}"
},
{
"severityText": "ERROR",
"severityNumber": 17,
"body": "{\"level\":\"ERROR\",\"message\":\"One does not simply walk into Mordor.\",\"movie\":\"LOTR\",\"timestamp\":\"2025-03-07 11:56:29\"}"
},
{
"severityText": "DEBUG",
"severityNumber": 5,
"body": "{\"level\":\"DEBUG\",\"message\":\"Do or do not, there is no try.\",\"movie\":\"SW\",\"timestamp\":\"2025-03-07 11:56:29\"}"
}
]
[
{
"severityText": "ERROR",
"severityNumber": 17,
"body": "{\"level\":\"ERROR\",\"message\":\"There is some good in this world, and it's worth fighting for.\",\"movie\":\"LOTR\",\"timestamp\":\"2025-03-07 11:56:29\"}"
}
]
Stop the Agent and the Gateway processes by pressing Ctrl-C
in their respective terminals.
The Routing Connector in OpenTelemetry is a powerful feature that allows you to direct data (traces
, metrics
, or logs
) to different pipelines/destinations based on specific criteria. This is especially useful in scenarios where you want to apply different processing or exporting logic to subsets of your telemetry data.
For example, you might want to send production data to one exporter while directing test or development data to another. Similarly, you could route certain spans based on their attributes, such as service name, environment, or span name, to apply custom processing or storage logic.
Change ALL terminal windows to the 6-routing-data
directory and run the clear
command.
Copy *.yaml
from the 5-transform-data
directory into 6-routing-data
. Your updated directory structure will now look like this:
.
├── agent.yaml
└── gateway.yaml
Next, we will configure the routing connector and the respective pipelines.
In this exercise, you will configure the Routing Connector in the gateway.yaml
. The Routing Connector can route metrics, traces, and logs based on any attribute; we will focus exclusively on trace routing based on the deployment.environment
attribute (though any span/log/metric attribute can be used).
Add new file
exporters: The routing
connector requires different targets for routing. Create two new file exporters, file/traces/route1-regular
and file/traces/route2-security
, to ensure data is directed correctly in the exporters
section of the gateway.yaml
:
file/traces/route1-regular: # Exporter for regular traces
path: "./gateway-traces-route1-regular.out" # Path for saving trace data
append: false # Overwrite the file each time
file/traces/route2-security: # Exporter for security traces
path: "./gateway-traces-route2-security.out" # Path for saving trace data
append: false # Overwrite the file each time
Enable Routing by adding the routing
connector. In OpenTelemetry configuration files, connectors
have their own dedicated section, similar to receivers and processors.
In the Gateway terminal window, edit gateway.yaml
and find and uncomment the #connectors:
section. Then, add the following below the connectors:
section:
routing:
default_pipelines: [traces/route1-regular] # Default pipeline if no rule matches
error_mode: ignore # Ignore errors in routing
table: # Define routing rules
# Routes spans to a target pipeline if the resourceSpan attribute matches the rule
- statement: route() where attributes["deployment.environment"] == "security-applications"
pipelines: [traces/route2-security] # Security target pipeline
The default pipeline in the configuration acts as a catch-all: it is the routing target for any data (spans in our case) that does not match a rule in the routing table. In that table you find the pipeline that is the target for any span matching the following rule: attributes["deployment.environment"] == "security-applications"
With the routing
configuration complete, the next step is to configure the pipelines
to apply these routing rules.
Update the original traces
pipeline to use routing:
To enable routing
, update the original traces
pipeline to use routing
as the only exporter. This ensures all span data is sent through the Routing Connector for evaluation and then onwards to connected pipelines. Also, remove all processors and replace them with an empty array ([]) as this will now be handled in the traces/route1-regular
and traces/route2-security
pipelines, allowing for custom behaviour for each route. Your traces:
configuration should look like this:
traces: # Traces pipeline
receivers:
- otlp # OTLP receiver
processors: [] # Processors for traces
exporters:
- routing
Add both the route1-regular
and route2-security
traces pipelines below the existing traces
pipeline:
Configure the route1-regular pipeline: This pipeline will handle all spans that have no match in the routing table of the connector.
Notice this uses routing
as its only receiver and will receive data through its connection
from the original traces pipeline.
traces/route1-regular: # Default pipeline for unmatched spans
receivers:
- routing # Receive data from the routing connector
processors:
- memory_limiter # Memory Limiter Processor
- resource/add_mode # Adds collector mode metadata
- batch
exporters:
- debug # Debug Exporter
- file/traces/route1-regular # File Exporter for unmatched spans
Add the route2-security pipeline: This pipeline processes all spans that do match our rule attributes["deployment.environment"] == "security-applications"
in the routing table. This pipeline also uses routing
as its receiver. Add this pipeline below the traces/route1-regular
one.
traces/route2-security: # Pipeline for spans matching the security rule
receivers:
- routing # Receive data from the routing connector
processors:
- memory_limiter # Memory Limiter Processor
- resource/add_mode # Adds collector mode metadata
- batch
exporters:
- debug # Debug exporter
- file/traces/route2-security # File exporter for security spans
In this section, we will test the routing
rule configured for the Gateway. The expected result is that a span
generated by the loadgen
that matches the attributes["deployment.environment"] == "security-applications"
rule will be sent to the gateway-traces-route2-security.out
file.
Start the Gateway: In your Gateway terminal window start the Gateway.
../otelcol --config gateway.yaml
Start the Agent: In your Agent terminal window start the Agent.
../otelcol --config agent.yaml
Send a Regular Span: In the Loadgen terminal window send a regular span using the loadgen
:
../loadgen -count 1
Both the Agent and Gateway will display debug information. The gateway will also generate a new gateway-traces-route1-regular.out
file, as this is now the designated destination for regular spans.
If you check gateway-traces-route1-regular.out
, it will contain the span
sent by loadgen
. You will also see an empty gateway-traces-route2-security.out
file, as the routing configuration creates output files immediately, even if no matching spans have been processed yet.
Send a Security Span: In the Loadgen terminal window send a security span using the security
flag:
../loadgen -security -count 1
Again, both the Agent and Gateway should display debug information, including the span you just sent. This time, the Gateway will write a line to the gateway-traces-route2-security.out
file, which is designated for spans where the deployment.environment
resource attribute matches "security-applications"
.
jq -c '.resourceSpans[] as $resource | $resource.scopeSpans[].spans[] | {spanId: .spanId, deploymentEnvironment: ($resource.resource.attributes[] | select(.key == "deployment.environment") | .value.stringValue)}' gateway-traces-route2-security.out
{"spanId":"cb799e92e26d5782","deploymentEnvironment":"security-applications"}
You can repeat this scenario multiple times, and each trace will be written to its corresponding output file.
Stop the Agent and the Gateway processes by pressing Ctrl-C
in their respective terminals.
In this section, we successfully tested the routing connector in the gateway by sending different spans and verifying their destinations.
Regular spans were correctly routed to gateway-traces-route1-regular.out
, confirming that spans without a matching deployment.environment
attribute follow the default pipeline.
Security-related spans were routed to gateway-traces-route2-security.out
, demonstrating that the routing rule based on "deployment.environment": "security-applications"
works as expected.
By inspecting the output files, we confirmed that the OpenTelemetry Collector correctly evaluates span attributes and routes them to the appropriate destinations. This validates that routing rules can effectively separate and direct telemetry data for different use cases.
You can now extend this approach by defining additional routing rules to further categorize spans, metrics, and logs based on different attributes.
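For example, a second rule could send staging traffic to its own pipeline. The sketch below extends the routing table from this exercise; the staging rule, its pipeline, and any exporters it would use are hypothetical and not part of the workshop files:

routing:
  default_pipelines: [traces/route1-regular]      # Catch-all for unmatched spans
  error_mode: ignore
  table:
    - statement: route() where attributes["deployment.environment"] == "security-applications"
      pipelines: [traces/route2-security]
    - statement: route() where attributes["deployment.environment"] == "staging"   # Hypothetical extra rule
      pipelines: [traces/route3-staging]                                           # Hypothetical pipeline you would define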