Troubleshoot OpenTelemetry Collector Issues
20 minutes
In the previous section, we added the debug exporter to the collector configuration and made it part of the pipelines for traces and logs. We saw the debug output written to the agent collector logs as expected.
However, traces are no longer sent to o11y cloud. Let’s figure out why and fix it.
Review the Collector Config
Whenever a change to the collector config is made via a values.yaml file, it’s helpful to review the actual configuration applied to the collector by looking at the config map:
kubectl describe cm splunk-otel-collector-otel-agent
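Alternatively, if you prefer the raw YAML over the describe output, the same config map can be dumped directly:
kubectl get cm splunk-otel-collector-otel-agent -o yaml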
Let’s review the pipelines for logs and traces in the agent collector config. They should look like this:
pipelines:
logs:
exporters:
- debug
processors:
- memory_limiter
- k8sattributes
- filter/logs
- batch
- resourcedetection
- resource
- resource/logs
- resource/add_environment
receivers:
- filelog
- fluentforward
- otlp
...
traces:
exporters:
- debug
processors:
- memory_limiter
- k8sattributes
- batch
- resourcedetection
- resource
- resource/add_environment
receivers:
- otlp
- jaeger
- smartagent/signalfx-forwarder
- zipkin
Do you see the problem? Only the debug exporter is included in the traces and logs pipelines.
The otlphttp and signalfx exporters that were previously present in the traces pipeline are gone. This is why we no longer see traces in o11y cloud. And for the logs pipeline, the splunk_hec/platform_logs exporter has been removed.
How did we know which specific exporters were included before? To find out, we could have reverted our earlier customizations and then checked the config map to see what was in the traces pipeline originally. Alternatively, we can refer to the examples in the GitHub repo for splunk-otel-collector-chart, which show us the default agent config used by the Helm chart.
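For example, we could render the chart locally with helm template and inspect the generated agent config map. Here is a rough sketch, assuming the chart repo alias used later in this section and placeholder values supplied only so the chart will render:
helm template splunk-otel-collector \
  --set="splunkObservability.realm=us0" \
  --set="splunkObservability.accessToken=placeholder" \
  --set="clusterName=placeholder-cluster" \
  splunk-otel-collector-chart/splunk-otel-collector | less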
How did these exporters get removed?
Let’s review the customizations we added to the values.yaml file:
logsEngine: otel
splunkObservability:
infrastructureMonitoringEventsEnabled: true
agent:
config:
receivers:
...
exporters:
debug:
verbosity: detailed
service:
pipelines:
traces:
exporters:
- debug
logs:
exporters:
- debug
When we applied the values.yaml file to the collector using helm upgrade, the custom configuration got merged with the previous collector configuration. When this happens, the sections of the yaml configuration that contain lists, such as the list of exporters in the pipeline section, get replaced (rather than merged) with what we included in the values.yaml file, which was only the debug exporter.
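To illustrate with an abbreviated example based on the pipelines above:
# Traces pipeline exporters rendered by the chart before our change (abbreviated):
traces:
  exporters:
    - otlphttp
    - signalfx

# Exporters list we supplied in values.yaml:
traces:
  exporters:
    - debug

# Result after helm upgrade -- the list is replaced, not appended to:
traces:
  exporters:
    - debug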
Let’s Fix the Issue
So when customizing an existing pipeline, we need to fully redefine that part of the configuration. Our values.yaml file should thus be updated as follows:
logsEngine: otel
splunkObservability:
infrastructureMonitoringEventsEnabled: true
agent:
config:
receivers:
...
exporters:
debug:
verbosity: detailed
service:
pipelines:
traces:
exporters:
- otlphttp
- signalfx
- debug
logs:
exporters:
- splunk_hec/platform_logs
- debug
Let’s apply the changes:
helm upgrade splunk-otel-collector \
--set="splunkObservability.realm=$REALM" \
--set="splunkObservability.accessToken=$ACCESS_TOKEN" \
--set="clusterName=$INSTANCE-cluster" \
--set="environment=otel-$INSTANCE" \
--set="splunkPlatform.token=$HEC_TOKEN" \
--set="splunkPlatform.endpoint=$HEC_URL" \
--set="splunkPlatform.index=splunk4rookies-workshop" \
-f values.yaml \
splunk-otel-collector-chart/splunk-otel-collector
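To confirm the agent pods restart with the new configuration, we can optionally watch the rollout; a sketch, assuming the agent DaemonSet follows the chart’s default naming:
kubectl rollout status daemonset/splunk-otel-collector-agent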
And then check the agent config map:
kubectl describe cm splunk-otel-collector-otel-agent
This time, we should see a fully defined exporters pipeline for both logs and traces:
pipelines:
logs:
exporters:
- splunk_hec/platform_logs
- debug
processors:
...
traces:
exporters:
- otlphttp
- signalfx
- debug
processors:
...
Reviewing the Log Output
The Splunk Distribution of OpenTelemetry .NET automatically exports logs enriched with tracing context from applications that use Microsoft.Extensions.Logging for logging (which our sample app does).
Application logs are enriched with tracing metadata and then exported to a local instance of the OpenTelemetry Collector in OTLP format.
Let’s take a closer look at the logs that were captured by the debug exporter to see if that’s happening.
To tail the collector logs, we can use the following command:
kubectl logs -l component=otel-collector-agent -f
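In a second terminal, we can generate traffic against the application. The following is a hypothetical example, assuming the helloworld deployment listens on port 8080 and accepts the name as part of the /hello route; adjust the resource name and URL to match your setup:
kubectl port-forward deployment/helloworld 8080:8080 &
curl "http://localhost:8080/hello/Kubernetes"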
Once we’re tailing the logs and generating traffic with curl, we should see something like the following:
2024-12-20T21:56:30.858Z info Logs {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-12-20T21:56:30.858Z info ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
Resource attributes:
-> splunk.distro.version: Str(1.8.0)
-> telemetry.distro.name: Str(splunk-otel-dotnet)
-> telemetry.distro.version: Str(1.8.0)
-> os.type: Str(linux)
-> os.description: Str(Debian GNU/Linux 12 (bookworm))
-> os.build_id: Str(6.8.0-1021-aws)
-> os.name: Str(Debian GNU/Linux)
-> os.version: Str(12)
-> host.name: Str(derek-1)
-> process.owner: Str(app)
-> process.pid: Int(1)
-> process.runtime.description: Str(.NET 8.0.11)
-> process.runtime.name: Str(.NET)
-> process.runtime.version: Str(8.0.11)
-> container.id: Str(5bee5b8f56f4b29f230ffdd183d0367c050872fefd9049822c1ab2aa662ba242)
-> telemetry.sdk.name: Str(opentelemetry)
-> telemetry.sdk.language: Str(dotnet)
-> telemetry.sdk.version: Str(1.9.0)
-> service.name: Str(helloworld)
-> deployment.environment: Str(otel-derek-1)
-> k8s.node.name: Str(derek-1)
-> k8s.cluster.name: Str(derek-1-cluster)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope HelloWorldController
LogRecord #0
ObservedTimestamp: 2024-12-20 21:56:28.486804 +0000 UTC
Timestamp: 2024-12-20 21:56:28.486804 +0000 UTC
SeverityText: Information
SeverityNumber: Info(9)
Body: Str(/hello endpoint invoked by {name})
Attributes:
-> name: Str(Kubernetes)
Trace ID: 78db97a12b942c0252d7438d6b045447
Span ID: 5e9158aa42f96db3
Flags: 1
{"kind": "exporter", "data_type": "logs", "name": "debug"}
In this example, we can see that the Trace ID and Span ID were automatically written to the log output by the OpenTelemetry .NET instrumentation. This allows us to correlate logs with traces in Splunk Observability Cloud.
You might remember, though, that if we deploy the OpenTelemetry collector in a K8s cluster using Helm and include the log collection option, then the collector will use the Filelog receiver to automatically capture any container logs.
This would result in duplicate logs being captured for our application. For example, in the following screenshot we can see two log entries for each request made to our service:
How do we avoid this?
Avoiding Duplicate Logs in K8s
To avoid capturing duplicate logs, we can set the OTEL_LOGS_EXPORTER environment variable to none, which tells the Splunk Distribution of OpenTelemetry .NET to avoid exporting logs to the collector using OTLP.
We can do this by adding the OTEL_LOGS_EXPORTER environment variable to the deployment.yaml file:
env:
- name: PORT
value: "8080"
- name: NODE_IP
valueFrom:
fieldRef:
fieldPath: status.hostIP
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://$(NODE_IP):4318"
- name: OTEL_SERVICE_NAME
value: "helloworld"
- name: OTEL_RESOURCE_ATTRIBUTES
value: "deployment.environment=otel-$INSTANCE"
- name: OTEL_LOGS_EXPORTER
value: "none"
And then running:
# update the deployment
kubectl apply -f deployment.yaml
Setting the OTEL_LOGS_EXPORTER environment variable to none is straightforward. However, the Trace ID and Span ID are not written to the stdout logs generated by the application, which would prevent us from correlating logs with traces.
To resolve this, we will need to define a custom logger, such as the example defined in /home/splunk/workshop/docker-k8s-otel/helloworld/SplunkTelemetryConfigurator.cs.
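As a rough sketch only (the file above is the workshop’s authoritative version), such a configurator might clear the default providers, switch to a JSON console formatter, and attach the active trace context through activity tracking; the class and method names below simply mirror how it is used in Program.cs:
// Illustrative sketch -- not the workshop's actual SplunkTelemetryConfigurator.cs,
// just one way a custom logger could surface trace context in stdout logs.
using Microsoft.Extensions.Logging;

namespace SplunkTelemetry
{
    public static class SplunkTelemetryConfigurator
    {
        public static void ConfigureLogger(ILoggingBuilder logging)
        {
            // Write structured JSON to stdout so the collector's filelog
            // receiver can capture it from the container logs.
            logging.ClearProviders();
            logging.AddJsonConsole(options =>
            {
                options.IncludeScopes = true; // scopes carry the trace context
            });

            // Stamp every log record's scopes with the active trace and span IDs.
            logging.Configure(options =>
            {
                options.ActivityTrackingOptions =
                    ActivityTrackingOptions.TraceId | ActivityTrackingOptions.SpanId;
            });
        }
    }
}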
We could include this in our application by updating the Program.cs file as follows:
using SplunkTelemetry;
using Microsoft.Extensions.Logging.Console;
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();
SplunkTelemetryConfigurator.ConfigureLogger(builder.Logging);
var app = builder.Build();
app.MapControllers();
app.Run();
Then we’ll build a new Docker image that includes the custom logging configuration:
cd /home/splunk/workshop/docker-k8s-otel/helloworld
docker build -t helloworld:1.3 .
And then we’ll import the updated image into Kubernetes:
cd /home/splunk
# Export the image from docker
docker save --output helloworld.tar helloworld:1.3
# Import the image into k3s
sudo k3s ctr images import helloworld.tar
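Optionally, we can check that the image landed in the image store k3s uses:
sudo k3s ctr images ls | grep helloworld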
Finally, we’ll need to update the deployment.yaml file to use the 1.3 version of the container image:
spec:
containers:
- name: helloworld
image: docker.io/library/helloworld:1.3
And then apply the changes:
# update the deployment
kubectl apply -f deployment.yaml
Now we can see that the duplicate log entries have been eliminated, and the remaining entries are formatted as JSON and include the span and trace IDs: