Troubleshoot OpenTelemetry Collector Issues
In the previous section, we added the debug exporter to the collector configuration and made it part of the pipelines for traces and logs. We saw the debug output written to the agent collector logs as expected.
However, traces are no longer sent to o11y cloud. Let’s figure out why and fix it.
Review the Collector Config
Whenever a change to the collector config is made via a values.yaml file, it's helpful to review the actual configuration applied to the collector by looking at the config map:
kubectl describe cm splunk-otel-collector-otel-agent
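The describe output can be lengthy. As a quick way to narrow it down (a minimal sketch, assuming standard grep is available), we can jump straight to the pipeline definitions:
# Show only the lines around the pipeline definitions
kubectl describe cm splunk-otel-collector-otel-agent | grep -A 40 'pipelines:'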
Let’s review the pipelines for logs and traces in the agent collector config. They should look like this:
pipelines:
  logs:
    exporters:
      - debug
    processors:
      - memory_limiter
      - k8sattributes
      - filter/logs
      - batch
      - resourcedetection
      - resource
      - resource/logs
      - resource/add_environment
    receivers:
      - filelog
      - fluentforward
      - otlp
  ...
  traces:
    exporters:
      - debug
    processors:
      - memory_limiter
      - k8sattributes
      - batch
      - resourcedetection
      - resource
      - resource/add_environment
    receivers:
      - otlp
      - jaeger
      - smartagent/signalfx-forwarder
      - zipkin
Do you see the problem? Only the debug exporter is included in the traces and logs pipelines.
The otlphttp and signalfx exporters that were previously included in the traces pipeline are gone, which is why we no longer see traces in o11y cloud. Likewise, the splunk_hec/platform_logs exporter has been removed from the logs pipeline.
How did we know which exporters were included before? To find out, we could have reverted our earlier customizations and then checked the config map to see what was in the traces pipeline originally. Alternatively, we can refer to the examples in the splunk-otel-collector-chart GitHub repo, which show the default agent config used by the Helm chart.
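For example, assuming the chart repo was added as in earlier steps, Helm itself can print the chart's default values, which include the default agent config and its pipelines:
# Print the chart's default values.yaml
helm show values splunk-otel-collector-chart/splunk-otel-collector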
How did these exporters get removed?
Let’s review the customizations we added to the values.yaml file:
logsEngine: otel
splunkObservability:
  infrastructureMonitoringEventsEnabled: true
agent:
  config:
    receivers:
      ...
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        traces:
          exporters:
            - debug
        logs:
          exporters:
            - debug
When we applied the values.yaml file to the collector using helm upgrade, the custom configuration got merged with the previous collector configuration. When this happens, the sections of the yaml configuration that contain lists, such as the list of exporters in the pipeline section, get replaced with what we included in the values.yaml file (which was only the debug exporter).
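One way to catch this kind of surprise before applying a change is to render the chart locally with our values file and inspect the resulting pipelines. This is a sketch, assuming the same chart reference used in the helm upgrade command later in this section:
# Render the chart locally (nothing is applied to the cluster) and
# show the pipeline definitions that would be deployed
helm template splunk-otel-collector -f values.yaml \
  splunk-otel-collector-chart/splunk-otel-collector | grep -A 10 'traces:'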
Let’s Fix the Issue
So when customizing an existing pipeline, we need to fully redefine that part of the configuration. Our values.yaml file should thus be updated as follows:
logsEngine: otel
splunkObservability:
  infrastructureMonitoringEventsEnabled: true
agent:
  config:
    receivers:
      ...
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        traces:
          exporters:
            - otlphttp
            - signalfx
            - debug
        logs:
          exporters:
            - splunk_hec/platform_logs
            - debug
Let’s apply the changes:
helm upgrade splunk-otel-collector \
--set="splunkObservability.realm=$REALM" \
--set="splunkObservability.accessToken=$ACCESS_TOKEN" \
--set="clusterName=$INSTANCE-cluster" \
--set="environment=otel-$INSTANCE" \
--set="splunkPlatform.token=$HEC_TOKEN" \
--set="splunkPlatform.endpoint=$HEC_URL" \
--set="splunkPlatform.index=splunk4rookies-workshop" \
-f values.yaml \
splunk-otel-collector-chart/splunk-otel-collector
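Before checking the config map, we can optionally confirm that the agent pods have restarted with the new configuration. This assumes the agent daemonset follows the chart's default naming, derived from the Helm release name:
# Wait for the agent daemonset rollout to complete
kubectl rollout status daemonset/splunk-otel-collector-agent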
And then check the agent config map:
kubectl describe cm splunk-otel-collector-otel-agent
This time, we should see the full list of exporters in both the logs and traces pipelines:
pipelines:
  logs:
    exporters:
      - splunk_hec/platform_logs
      - debug
    processors:
      ...
  traces:
    exporters:
      - otlphttp
      - signalfx
      - debug
    processors:
      ...
Reviewing the Log Output
The Splunk Distribution of OpenTelemetry .NET automatically exports logs enriched with tracing context from applications that use Microsoft.Extensions.Logging (which our sample app does). Application logs are enriched with tracing metadata and then exported to a local instance of the OpenTelemetry Collector in OTLP format.
Let’s take a closer look at the logs that were captured by the debug exporter to see if that’s happening.
To tail the collector logs, we can use the following command:
kubectl logs -l component=otel-collector-agent -f
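With the logs tailing in one terminal, we can generate traffic from another. The URL below is a placeholder; use whatever address and port the sample app was exposed on in earlier sections:
# Send a few requests to the sample app; each one should produce a
# log record in the debug exporter output (URL and port are placeholders)
for i in $(seq 1 5); do
  curl "http://localhost:8080/hello/Kubernetes"
done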
After sending a few requests, we should see something like the following in the collector output:
2024-12-20T21:56:30.858Z info Logs {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-12-20T21:56:30.858Z info ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
Resource attributes:
-> splunk.distro.version: Str(1.8.0)
-> telemetry.distro.name: Str(splunk-otel-dotnet)
-> telemetry.distro.version: Str(1.8.0)
-> os.type: Str(linux)
-> os.description: Str(Debian GNU/Linux 12 (bookworm))
-> os.build_id: Str(6.8.0-1021-aws)
-> os.name: Str(Debian GNU/Linux)
-> os.version: Str(12)
-> host.name: Str(derek-1)
-> process.owner: Str(app)
-> process.pid: Int(1)
-> process.runtime.description: Str(.NET 8.0.11)
-> process.runtime.name: Str(.NET)
-> process.runtime.version: Str(8.0.11)
-> container.id: Str(5bee5b8f56f4b29f230ffdd183d0367c050872fefd9049822c1ab2aa662ba242)
-> telemetry.sdk.name: Str(opentelemetry)
-> telemetry.sdk.language: Str(dotnet)
-> telemetry.sdk.version: Str(1.9.0)
-> service.name: Str(helloworld)
-> deployment.environment: Str(otel-derek-1)
-> k8s.node.name: Str(derek-1)
-> k8s.cluster.name: Str(derek-1-cluster)
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope HelloWorldController
LogRecord #0
ObservedTimestamp: 2024-12-20 21:56:28.486804 +0000 UTC
Timestamp: 2024-12-20 21:56:28.486804 +0000 UTC
SeverityText: Information
SeverityNumber: Info(9)
Body: Str(/hello endpoint invoked by {name})
Attributes:
-> name: Str(Kubernetes)
Trace ID: 78db97a12b942c0252d7438d6b045447
Span ID: 5e9158aa42f96db3
Flags: 1
{"kind": "exporter", "data_type": "logs", "name": "debug"}
In this example, we can see that the Trace ID and Span ID were automatically written to the log output by the OpenTelemetry .NET instrumentation. This allows us to correlate logs with traces in Splunk Observability Cloud.
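If we want to locate the collector log record for a particular trace, one option is to search the debug output for its Trace ID. Here is a sketch using the Trace ID from the sample output above:
# Show the log record (plus a few preceding lines) for a given Trace ID
kubectl logs -l component=otel-collector-agent | grep -B 6 '78db97a12b942c0252d7438d6b045447'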
You might remember, though, that when we deploy the OpenTelemetry collector in a K8s cluster using Helm with the log collection option enabled, the collector uses the File Log receiver to automatically capture any container logs. This would result in duplicate logs being captured for our application: one copy exported directly via OTLP by the instrumentation, and a second captured from the container's stdout by the filelog receiver, so each request to our service produces two log entries.
How do we avoid this?
Avoiding Duplicate Logs in K8s
To avoid capturing duplicate logs, we have one of two options:
- We can set the OTEL_LOGS_EXPORTER environment variable to none, to tell the Splunk Distribution of OpenTelemetry .NET to avoid exporting logs to the collector using OTLP.
- We can manage log ingestion using annotations.
Option 1
Setting the OTEL_LOGS_EXPORTER environment variable to none is straightforward. However, the Trace ID and Span ID are not written to the stdout logs generated by the application, which would prevent us from correlating logs with traces.
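For example, one way to set the variable on our existing deployment (assuming it is named helloworld, as elsewhere in this workshop) is with kubectl:
# Setting the environment variable triggers a rollout of the pods
kubectl set env deployment/helloworld OTEL_LOGS_EXPORTER=none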
To restore that correlation, we could define a custom logger, such as the example defined in /home/splunk/workshop/docker-k8s-otel/helloworld/SplunkTelemetryConfigurator.cs. We could include this in our application by updating the Program.cs file as follows:
using SplunkTelemetry;
using Microsoft.Extensions.Logging.Console;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();

// Configure the custom logger, which includes the Trace ID and Span ID
// in the application's stdout log output
SplunkTelemetryConfigurator.ConfigureLogger(builder.Logging);

var app = builder.Build();
app.MapControllers();
app.Run();
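For the change to take effect, the application image would then need to be rebuilt and redeployed. A sketch, with the image name and registry as placeholders for whatever was used in earlier sections:
# Rebuild, push, and roll out the updated image (names are placeholders)
docker build -t <registry>/helloworld:1.1 .
docker push <registry>/helloworld:1.1
kubectl set image deployment/helloworld helloworld=<registry>/helloworld:1.1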
Option 2
Option 2 requires updating the deployment manifest for the application to include an annotation. In our case, we would edit the deployment.yaml file to add the splunk.com/exclude annotation as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld
spec:
  selector:
    matchLabels:
      app: helloworld
  replicas: 1
  template:
    metadata:
      labels:
        app: helloworld
      annotations:
        splunk.com/exclude: "true"
    spec:
      containers:
        ...
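Alternatively, the same annotation can be added in place with kubectl patch (again assuming the deployment is named helloworld); patching the pod template triggers a rollout of the pods:
# Add the exclusion annotation without editing the manifest
kubectl patch deployment helloworld --type merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"splunk.com/exclude":"true"}}}}}'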
Please refer to Managing Log Ingestion by Using Annotations for further details on this option.