Troubleshoot OpenTelemetry Collector Issues

20 minutes  

In the previous section, we added the debug exporter to the collector configuration and made it part of the pipeline for traces and logs. We can see the debug output written to the agent collector logs as expected.

However, traces are no longer sent to o11y cloud. Let’s figure out why and fix it.

Review the Collector Config

Whenever a change to the collector config is made via a values.yaml file, it’s helpful to review the actual configuration applied to the collector by looking at the config map:

kubectl describe cm splunk-otel-collector-otel-agent
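
If you'd rather see the raw YAML than the describe output, the same config map can be dumped directly (the config map name is the same as above):

kubectl get cm splunk-otel-collector-otel-agent -o yaml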

Let’s review the pipelines for logs and traces in the agent collector config. They should look like this:

  pipelines:
    logs:
      exporters:
      - debug
      processors:
      - memory_limiter
      - k8sattributes
      - filter/logs
      - batch
      - resourcedetection
      - resource
      - resource/logs
      - resource/add_environment
      receivers:
      - filelog
      - fluentforward
      - otlp
    ...
    traces:
      exporters:
      - debug
      processors:
      - memory_limiter
      - k8sattributes
      - batch
      - resourcedetection
      - resource
      - resource/add_environment
      receivers:
      - otlp
      - jaeger
      - smartagent/signalfx-forwarder
      - zipkin

Do you see the problem? Only the debug exporter is included in the traces and logs pipelines. The otlphttp and signalfx exporters that were previously part of the traces pipeline are gone, which is why we no longer see traces in o11y cloud. Similarly, the splunk_hec/platform_logs exporter has been removed from the logs pipeline.

How did we know which exporters were included before? To find out, we could have reverted our earlier customizations and then checked the config map to see what was in the traces pipeline originally. Alternatively, we can refer to the examples in the GitHub repo for splunk-otel-collector-chart, which show us the default agent config used by the Helm chart.
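
If Helm is handy, we can also inspect the chart's default values directly. Note that the fully rendered agent pipelines are generated from the chart's templates, so the GitHub examples above remain the most complete reference:

helm show values splunk-otel-collector-chart/splunk-otel-collector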

How did these exporters get removed?

Let’s review the customizations we added to the values.yaml file:

logsEngine: otel
splunkObservability:
  infrastructureMonitoringEventsEnabled: true
agent:
  config:
    receivers:
     ...
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        traces:
          exporters:
            - debug
        logs:
          exporters:
            - debug

When we applied the values.yaml file using helm upgrade, our custom configuration was merged with the chart's default collector configuration. During this merge, YAML sections that contain lists, such as the list of exporters in a pipeline, are replaced entirely by what we included in the values.yaml file (which, in our case, was only the debug exporter).
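
As a minimal illustration of this merge behavior (with the config shortened for clarity), a list supplied in values.yaml replaces the default list rather than appending to it:

# Chart default (simplified)
service:
  pipelines:
    traces:
      exporters:
        - otlphttp
        - signalfx

# Our values.yaml override
service:
  pipelines:
    traces:
      exporters:
        - debug

# Merged result: the exporters list is replaced, not appended to
service:
  pipelines:
    traces:
      exporters:
        - debug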

Let’s Fix the Issue

When customizing an existing pipeline, we therefore need to redefine that part of the configuration in full. Our values.yaml file should be updated as follows:

logsEngine: otel
splunkObservability:
  infrastructureMonitoringEventsEnabled: true
agent:
  config:
    receivers:
     ...
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        traces:
          exporters:
            - otlphttp
            - signalfx
            - debug
        logs:
          exporters:
            - splunk_hec/platform_logs
            - debug

Let’s apply the changes:

helm upgrade splunk-otel-collector \
  --set="splunkObservability.realm=$REALM" \
  --set="splunkObservability.accessToken=$ACCESS_TOKEN" \
  --set="clusterName=$INSTANCE-cluster" \
  --set="environment=otel-$INSTANCE" \
  --set="splunkPlatform.token=$HEC_TOKEN" \
  --set="splunkPlatform.endpoint=$HEC_URL" \
  --set="splunkPlatform.index=splunk4rookies-workshop" \
  -f values.yaml \
splunk-otel-collector-chart/splunk-otel-collector
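
Once the upgrade completes, we can confirm that the agent pods have restarted and are healthy; the label selector here is the same one we'll use shortly to tail the agent logs:

kubectl get pods -l component=otel-collector-agent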

And then check the agent config map:

kubectl describe cm splunk-otel-collector-otel-agent

This time, we should see the full set of exporters defined for both the logs and traces pipelines:

  pipelines:
    logs:
      exporters:
      - splunk_hec/platform_logs
      - debug
      processors:
      ...
    traces:
      exporters:
      - otlphttp
      - signalfx
      - debug
      processors:
      ...

Reviewing the Log Output

The Splunk Distribution of OpenTelemetry .NET automatically exports logs enriched with tracing context from applications that use Microsoft.Extensions.Logging for logging (which our sample app does).

Application logs are enriched with tracing metadata and then exported to a local instance of the OpenTelemetry Collector in OTLP format.

Let’s take a closer look at the logs that were captured by the debug exporter to see if that’s happening.
To tail the collector logs, we can use the following command:

kubectl logs -l component=otel-collector-agent -f
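
In a second terminal, we can generate traffic with a request like the following; the address and port depend on how the service was exposed earlier in the workshop, and the /hello/{name} route is inferred from the log output shown below, so adjust as needed:

curl http://<address-of-your-helloworld-service>/hello/Kubernetes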

Once we’re tailing the logs and have generated some traffic with curl, we should see something like the following:

2024-12-20T21:56:30.858Z	info	Logs	{"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-12-20T21:56:30.858Z	info	ResourceLog #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
Resource attributes:
     -> splunk.distro.version: Str(1.8.0)
     -> telemetry.distro.name: Str(splunk-otel-dotnet)
     -> telemetry.distro.version: Str(1.8.0)
     -> os.type: Str(linux)
     -> os.description: Str(Debian GNU/Linux 12 (bookworm))
     -> os.build_id: Str(6.8.0-1021-aws)
     -> os.name: Str(Debian GNU/Linux)
     -> os.version: Str(12)
     -> host.name: Str(derek-1)
     -> process.owner: Str(app)
     -> process.pid: Int(1)
     -> process.runtime.description: Str(.NET 8.0.11)
     -> process.runtime.name: Str(.NET)
     -> process.runtime.version: Str(8.0.11)
     -> container.id: Str(5bee5b8f56f4b29f230ffdd183d0367c050872fefd9049822c1ab2aa662ba242)
     -> telemetry.sdk.name: Str(opentelemetry)
     -> telemetry.sdk.language: Str(dotnet)
     -> telemetry.sdk.version: Str(1.9.0)
     -> service.name: Str(helloworld)
     -> deployment.environment: Str(otel-derek-1)
     -> k8s.node.name: Str(derek-1)
     -> k8s.cluster.name: Str(derek-1-cluster)
ScopeLogs #0
ScopeLogs SchemaURL: 
InstrumentationScope HelloWorldController 
LogRecord #0
ObservedTimestamp: 2024-12-20 21:56:28.486804 +0000 UTC
Timestamp: 2024-12-20 21:56:28.486804 +0000 UTC
SeverityText: Information
SeverityNumber: Info(9)
Body: Str(/hello endpoint invoked by {name})
Attributes:
     -> name: Str(Kubernetes)
Trace ID: 78db97a12b942c0252d7438d6b045447
Span ID: 5e9158aa42f96db3
Flags: 1
	{"kind": "exporter", "data_type": "logs", "name": "debug"}

In this example, we can see that the Trace ID and Span ID were automatically written to the log output by the OpenTelemetry .NET instrumentation. This allows us to correlate logs with traces in Splunk Observability Cloud.

You might remember, though, that when we deploy the OpenTelemetry Collector in a K8s cluster using Helm with the log collection option enabled, the collector uses the filelog receiver to automatically capture any container logs.

This would result in duplicate logs being captured for our application. For example, in the following screenshot we can see two log entries for each request made to our service:

Screenshot: Duplicate Log Entries

How do we avoid this?

Avoiding Duplicate Logs in K8s

To avoid capturing duplicate logs, we can set the OTEL_LOGS_EXPORTER environment variable to none, which tells the Splunk Distribution of OpenTelemetry .NET not to export logs to the collector using OTLP. We can do this by adding the OTEL_LOGS_EXPORTER environment variable to the deployment.yaml file:

          env:
            - name: PORT
              value: "8080"
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://$(NODE_IP):4318"
            - name: OTEL_SERVICE_NAME
              value: "helloworld"
            - name: OTEL_RESOURCE_ATTRIBUTES 
              value: "deployment.environment=otel-$INSTANCE" 
            - name: OTEL_LOGS_EXPORTER 
              value: "none" 

And then running:

# update the deployment
kubectl apply -f deployment.yaml

Setting the OTEL_LOGS_EXPORTER environment variable to none is straightforward. However, the Trace ID and Span ID are not written to the application's stdout logs by default, which prevents us from correlating logs with traces.

To resolve this, we will need to define a custom logger, such as the example defined in
/home/splunk/workshop/docker-k8s-otel/helloworld/SplunkTelemetryConfigurator.cs.
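
The contents of that file aren't reproduced here, but as a rough, hypothetical sketch of what such a configurator could look like, the standard .NET activity tracking options and the JSON console formatter can be used to surface trace and span IDs in stdout logs (the actual workshop implementation may differ):

using Microsoft.Extensions.Logging;

namespace SplunkTelemetry
{
    public static class SplunkTelemetryConfigurator
    {
        // Hypothetical sketch only; see the workshop file for the real implementation.
        public static void ConfigureLogger(ILoggingBuilder logging)
        {
            logging.ClearProviders();

            // Attach the current trace and span IDs to each log record as scopes
            logging.Configure(options =>
                options.ActivityTrackingOptions =
                    ActivityTrackingOptions.TraceId | ActivityTrackingOptions.SpanId);

            // Write structured JSON to stdout so the collector's filelog receiver
            // picks up the enriched records from the container logs
            logging.AddJsonConsole(options =>
            {
                options.IncludeScopes = true;
                options.TimestampFormat = "yyyy-MM-ddTHH:mm:ss.fffZ ";
            });
        }
    }
}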

We could include this in our application by updating the Program.cs file as follows:

using SplunkTelemetry;
using Microsoft.Extensions.Logging.Console;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllers();

SplunkTelemetryConfigurator.ConfigureLogger(builder.Logging);

var app = builder.Build();

app.MapControllers();

app.Run();

Then we’ll build a new Docker image that includes the custom logging configuration:

cd /home/splunk/workshop/docker-k8s-otel/helloworld 

docker build -t helloworld:1.3 .

And then we’ll import the updated image into Kubernetes:

cd /home/splunk

# Export the image from docker
docker save --output helloworld.tar helloworld:1.3

# Import the image into k3s
sudo k3s ctr images import helloworld.tar
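
Before rolling out the new version, we can optionally confirm that the image was imported into k3s; the grep is just to filter the output:

# Verify the image is available to k3s
sudo k3s ctr images ls | grep helloworld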

Finally, we’ll need to update the deployment.yaml file to use the 1.3 version of the container image:

    spec:
      containers:
        - name: helloworld
          image: docker.io/library/helloworld:1.3

And then apply the changes:

# update the deployment
kubectl apply -f deployment.yaml
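
To confirm the rollout picked up the new image, we can check the deployment status; this assumes the deployment is named helloworld, matching the container name above:

# Wait for the updated pods to become ready
kubectl rollout status deployment/helloworld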

Now we can see that the duplicate log entries have been eliminated, and the remaining log entries are formatted as JSON and include the trace and span IDs:

Screenshot: JSON Format Logs