OpenTelemetry Collector Workshops

Learn the concepts of the OpenTelemetry Collector and how to use it to send data to Splunk Observability Cloud.

Practice setting up the OpenTelemetry Collector configuration from scratch and go though several advanced configuration scenarios's.

Making Your Observability Cloud Native With OpenTelemetry

1 hour Author Robert Castley

Abstract

Organizations getting started with OpenTelemetry may begin by sending data directly to an observability backend. While this works well for initial testing, using the OpenTelemetry collector as part of your observability architecture provides numerous benefits and is recommended for any production deployment.

In this workshop, we will be focusing on using the OpenTelemetry collector and starting with the fundamentals of configuring the receivers, processors, and exporters ready to use with Splunk Observability Cloud. The journey will take attendees from novices to being able to start adding custom components to help solve for their business observability needs for their distributed platform.

Ninja Sections

Throughout the workshop there will be expandable Ninja Sections, these will be more hands on and go into further technical detail that you can explore within the workshop or in your own time.

Please note that the content in these sections may go out of date due to the frequent development being made to the OpenTelemetry project. Links will be provided in the event details are out of sync, please let us know if you spot something that needs updating.

Ninja: Test Me!

By completing this workshop you will officially be an OpenTelemetry Collector Ninja!

Target Audience

This interactive workshop is for developers and system administrators who are interested in learning more about architecture and deployment of the OpenTelemetry Collector.

Prerequisites

Attendees should have a basic understanding of data collection
Command line and vim/vi experience.
A instance/host/VM running Ubuntu 20.04 LTS or 22.04 LTS.
- Minimum requirements are an AWS/EC2 t2.micro (1 CPU, 1GB RAM, 8GB Storage)

Learning Objectives

By the end of this talk, attendees will be able to:

Understand the components of OpenTelemetry
Use receivers, processors, and exporters to collect and analyze data
Identify the benefits of using OpenTelemetry
Build a custom component to solve their business needs

OpenTelemetry Architecture

%%{
  init:{
    "theme":"base",
    "themeVariables": {
      "primaryColor": "#ffffff",
      "clusterBkg": "#eff2fb",
      "defaultLinkColor": "#333333"
    }
  }
}%%

flowchart LR;
    subgraph Collector
    A[OTLP] --> M(Receivers)
    B[JAEGER] --> M(Receivers)
    C[Prometheus] --> M(Receivers)
    end
    subgraph Processors
    M(Receivers) --> H(Filters, Attributes, etc)
    E(Extensions)
    end
    subgraph Exporters
    H(Filters, Attributes, etc) --> S(OTLP)
    H(Filters, Attributes, etc) --> T(JAEGER)
    H(Filters, Attributes, etc) --> U(Prometheus)
    end

Installing OpenTelemetry Collector Contrib

Download the OpenTelemetry Collector Contrib distribution

The first step in installing the Open Telemetry Collector is downloading it. For our lab, we will use the wget command to download the .deb package from the OpenTelemetry Github repository.

Obtain the .deb package for your platform from the OpenTelemetry Collector Contrib releases page

wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.111.0/otelcol-contrib_0.111.0_linux_amd64.deb

Install the OpenTelemetry Collector Contrib distribution

Install the .deb package using dpkg. Take a look at the dpkg Output tab below to see what the example output of a successful install will look like:

sudo dpkg -i otelcol-contrib_0.111.0_linux_amd64.deb

Selecting previously unselected package otelcol-contrib.
(Reading database ... 89232 files and directories currently installed.)
Preparing to unpack otelcol-contrib_0.111.0_linux_amd64.deb ...
Unpacking otelcol-contrib (0.111.0) ...
Setting up otelcol-contrib (0.111.0) ...
Created symlink /etc/systemd/system/multi-user.target.wants/otelcol-contrib.service → /lib/systemd/system/otelcol-contrib.service.

Installing OpenTelemetry Collector Contrib

Confirm the Collector is running

The collector should now be running. We will verify this as root using systemctl command. To exit the status just press q.

sudo systemctl status otelcol-contrib

● otelcol-contrib.service - OpenTelemetry Collector Contrib
     Loaded: loaded (/lib/systemd/system/otelcol-contrib.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2024-10-07 10:27:49 BST; 52s ago
   Main PID: 17113 (otelcol-contrib)
      Tasks: 13 (limit: 19238)
     Memory: 34.8M
        CPU: 155ms
     CGroup: /system.slice/otelcol-contrib.service
             └─17113 /usr/bin/otelcol-contrib --config=/etc/otelcol-contrib/config.yaml

Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: Descriptor:
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]:      -> Name: up
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]:      -> Description: The scraping was successful
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]:      -> Unit:
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]:      -> DataType: Gauge
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: NumberDataPoints #0
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: Timestamp: 2024-10-07 09:28:36.942 +0000 UTC
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: Value: 1.000000
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]:         {"kind": "exporter", "data_type": "metrics", "name": "debug"}

Because we will be making multiple configuration file changes, setting environment variables and restarting the collector, we need to stop the collector service and disable it from starting on boot.

sudo systemctl stop otelcol-contrib && sudo systemctl disable otelcol-contrib

Ninja: Build your own collector using Open Telemetry Collector Builder (ocb)

For this part we will require the following installed on your system:

Golang (latest version)

cd /tmp
wget https://golang.org/dl/go1.20.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.20.linux-amd64.tar.gz

Edit .profile and add the following environment variables:

export GOROOT=/usr/local/go
export GOPATH=$HOME/go
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH

Renew your shell session:

source ~/.profile

Check Go version:

go version

ocb installed
- Download the ocb binary from the project releases and run the following commands:
```
mv ocb_0.80.0_darwin_arm64 /usr/bin/ocb
chmod 755 /usr/bin/ocb
```
  An alternative approach would be to use the golang tool chain to build the binary locally by doing:
```
go install go.opentelemetry.io/collector/cmd/builder@v0.80.0
mv $(go env GOPATH)/bin/builder /usr/bin/ocb
```
(Optional) Docker

Why build your own collector?

The default distribution of the collector (core and contrib) either contains too much or too little in what they have to offer.

It is also not advised to run the contrib collector in your production environments due to the amount of components installed which more than likely are not needed by your deployment.

Benefits of building your own collector?

When creating your own collector binaries, (commonly referred to as distribution), means you build what you need.

The benefits of this are:

Smaller sized binaries
Can use existing go scanners for vulnerabilities
Include internal components that can tie in with your organization

Considerations for building your collector?

Now, this would not be a 🥷 Ninja zone if it didn’t come with some drawbacks:

Go experience is recommended if not required
No Splunk support
Responsibility for distribution and lifecycle management

It is important to note that the project is working towards stability but it does not mean changes made will not break your workflow. The team at Splunk provides increased support and a higher level of stability so they can provide a curated experience helping you with your deployment needs.

The Ninja Zone

Once you have all the required tools installed to get started, you will need to create a new file named otelcol-builder.yaml and we will follow this directory structure:

.
└── otelcol-builder.yaml

Once we have the file created, we need to add a list of components for it to install with some additional metadata.

For this example, we are going to create a builder manifest that will install only the components we need for the introduction config:

dist:
  name: otelcol-ninja
  description: A custom build of the Open Telemetry Collector
  output_path: ./dist

extensions:
- gomod: go.opentelemetry.io/collector/extension/ballastextension v0.80.0
- gomod: go.opentelemetry.io/collector/extension/zpagesextension  v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/httpforwarder v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/healthcheckextension v0.80.0

exporters:
- gomod: go.opentelemetry.io/collector/exporter/loggingexporter v0.80.0
- gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/splunkhecexporter v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/signalfxexporter v0.80.0

processors:
- gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.80.0
- gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.80.0

receivers:
- gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/jaegerreceiver v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/zipkinreceiver v0.80.0

Once the yaml file has been updated for the ocb, then run the following command:

ocb --config=otelcol-builder.yaml

Which leave you with the following directory structure:

├── dist
│   ├── components.go
│   ├── components_test.go
│   ├── go.mod
│   ├── go.sum
│   ├── main.go
│   ├── main_others.go
│   ├── main_windows.go
│   └── otelcol-ninja
└── otelcol-builder.yaml

References

https://opentelemetry.io/docs/collector/custom-collector/

Default configuration

OpenTelemetry is configured through YAML files. These files have default configurations that we can modify to meet our needs. Let’s look at the default configuration that is supplied:

cat /etc/otelcol-contrib/config.yaml

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.
# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks

extensions:
  health_check:
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  opencensus:
    endpoint: 0.0.0.0:55678

  # Collect own metrics
  prometheus:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_binary:
        endpoint: 0.0.0.0:6832
      thrift_compact:
        endpoint: 0.0.0.0:6831
      thrift_http:
        endpoint: 0.0.0.0:14268

  zipkin:
    endpoint: 0.0.0.0:9411

processors:
  batch:

exporters:
  debug:
    verbosity: detailed

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [otlp, opencensus, prometheus]
      processors: [batch]
      exporters: [debug]

    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

  extensions: [health_check, pprof, zpages]

Congratulations! You have successfully downloaded and installed the OpenTelemetry Collector. You are well on your way to becoming an OTel Ninja. But first let’s walk through configuration files and different distributions of the OpenTelemetry Collector.

Note

Splunk does provide its own, fully supported, distribution of the OpenTelemetry Collector. This distribution is available to install from the Splunk GitHub Repository or via a wizard in Splunk Observability Cloud that will build out a simple installation script to copy and paste. This distribution includes many additional features and enhancements that are not available in the OpenTelemetry Collector Contrib distribution.

The Splunk Distribution of the OpenTelemetry Collector is production-tested; it is in use by the majority of customers in their production environments.
Customers that use our distribution can receive direct help from official Splunk support within SLAs.
Customers can use or migrate to the Splunk Distribution of the OpenTelemetry Collector without worrying about future breaking changes to its core configuration experience for metrics and traces collection (OpenTelemetry logs collection configuration is in beta). There may be breaking changes to the Collector’s metrics.

We will now walk through each section of the configuration file and modify it to send host metrics to Splunk Observability Cloud.

OpenTelemetry Collector Extensions

Now that we have the OpenTelemetry Collector installed, let’s take a look at extensions for the OpenTelemetry Collector. Extensions are optional and available primarily for tasks that do not involve processing telemetry data. Examples of extensions include health monitoring, service discovery, and data forwarding.

%%{
  init:{
    "theme": "base",
    "themeVariables": {
      "primaryColor": "#ffffff",
      "clusterBkg": "#eff2fb",
      "defaultLinkColor": "#333333"
    }
  }
}%%

flowchart LR;
    style E fill:#e20082,stroke:#333,stroke-width:4px,color:#fff
    subgraph Collector
    A[OTLP] --> M(Receivers)
    B[JAEGER] --> M(Receivers)
    C[Prometheus] --> M(Receivers)
    end
    subgraph Processors
    M(Receivers) --> H(Filters, Attributes, etc)
    E(Extensions)
    end
    subgraph Exporters
    H(Filters, Attributes, etc) --> S(OTLP)
    H(Filters, Attributes, etc) --> T(JAEGER)
    H(Filters, Attributes, etc) --> U(Prometheus)
    end

OpenTelemetry Collector Extensions

Health Check

Extensions are configured in the same config.yaml file that we referenced in the installation step. Let’s edit the config.yaml file and configure the extensions. Note that the pprof and zpages extensions are already configured in the default config.yaml file. For the purpose of this workshop, we will only be updating the health_check extension to expose the port on all network interfaces on which we can access the health of the collector.

sudo vi /etc/otelcol-contrib/config.yaml

extensions:
  health_check:
    endpoint: 0.0.0.0:13133

Start the collector:

otelcol-contrib --config=file:/etc/otelcol-contrib/config.yaml

This extension enables an HTTP URL that can be probed to check the status of the OpenTelemetry Collector. This extension can be used as a liveness and/or readiness probe on Kubernetes. To learn more about the curl command, check out the curl man page.

Open a new terminal session and SSH into your instance to run the following command:

curl http://localhost:13133

{"status":"Server available","upSince":"2024-10-07T11:00:08.004685295+01:00","uptime":"12.56420005s"}

OpenTelemetry Collector Extensions

Performance Profiler

Performance Profiler extension enables the golang net/http/pprof endpoint. This is typically used by developers to collect performance profiles and investigate issues with the service. We will not be covering this in this workshop.

OpenTelemetry Collector Extensions

zPages

zPages are an in-process alternative to external exporters. When included, they collect and aggregate tracing and metrics information in the background; this data is served on web pages when requested. zPages are an extremely useful diagnostic feature to ensure the collector is running as expected.

ServiceZ gives an overview of the collector services and quick access to the pipelinez, extensionz, and featurez zPages. The page also provides build and runtime information.

Example URL: http://localhost:55679/debug/servicez (change localhost to reflect your own environment).

PipelineZ provides insights on the running pipelines running in the collector. You can find information on type, if data is mutated, and you can also see information on the receivers, processors and exporters that are used for each pipeline.

Example URL: http://localhost:55679/debug/pipelinez (change localhost to reflect your own environment).

ExtensionZ shows the extensions that are active in the collector.

Example URL: http://localhost:55679/debug/extensionz (change localhost to reflect your own environment).

Ninja: Improve data durability with storage extension

For this, we will need to validate that our distribution has the file_storage extension installed. This can be done by running the command otelcol-contrib components which should show results like:

# ... truncated for clarity
extensions:
  - file_storage

buildinfo:
    command: otelcol-contrib
    description: OpenTelemetry Collector Contrib
    version: 0.80.0
receivers:
    - prometheus_simple
    - apache
    - influxdb
    - purefa
    - purefb
    - receiver_creator
    - mongodbatlas
    - vcenter
    - snmp
    - expvar
    - jmx
    - kafka
    - skywalking
    - udplog
    - carbon
    - kafkametrics
    - memcached
    - prometheus
    - windowseventlog
    - zookeeper
    - otlp
    - awsecscontainermetrics
    - iis
    - mysql
    - nsxt
    - aerospike
    - elasticsearch
    - httpcheck
    - k8sobjects
    - mongodb
    - hostmetrics
    - signalfx
    - statsd
    - awsxray
    - cloudfoundry
    - collectd
    - couchdb
    - kubeletstats
    - jaeger
    - journald
    - riak
    - splunk_hec
    - active_directory_ds
    - awscloudwatch
    - sqlquery
    - windowsperfcounters
    - flinkmetrics
    - googlecloudpubsub
    - podman_stats
    - wavefront
    - k8s_events
    - postgresql
    - rabbitmq
    - sapm
    - sqlserver
    - redis
    - solace
    - tcplog
    - awscontainerinsightreceiver
    - awsfirehose
    - bigip
    - filelog
    - googlecloudspanner
    - cloudflare
    - docker_stats
    - k8s_cluster
    - pulsar
    - zipkin
    - nginx
    - opencensus
    - azureeventhub
    - datadog
    - fluentforward
    - otlpjsonfile
    - syslog
processors:
    - resource
    - batch
    - cumulativetodelta
    - groupbyattrs
    - groupbytrace
    - k8sattributes
    - experimental_metricsgeneration
    - metricstransform
    - routing
    - attributes
    - datadog
    - deltatorate
    - spanmetrics
    - span
    - memory_limiter
    - redaction
    - resourcedetection
    - servicegraph
    - transform
    - filter
    - probabilistic_sampler
    - tail_sampling
exporters:
    - otlp
    - carbon
    - datadog
    - f5cloud
    - kafka
    - mezmo
    - skywalking
    - awsxray
    - dynatrace
    - loki
    - prometheus
    - logging
    - azuredataexplorer
    - azuremonitor
    - instana
    - jaeger
    - loadbalancing
    - sentry
    - splunk_hec
    - tanzuobservability
    - zipkin
    - alibabacloud_logservice
    - clickhouse
    - file
    - googlecloud
    - prometheusremotewrite
    - awscloudwatchlogs
    - googlecloudpubsub
    - jaeger_thrift
    - logzio
    - sapm
    - sumologic
    - otlphttp
    - googlemanagedprometheus
    - opencensus
    - awskinesis
    - coralogix
    - influxdb
    - logicmonitor
    - signalfx
    - tencentcloud_logservice
    - awsemf
    - elasticsearch
    - pulsar
extensions:
    - zpages
    - bearertokenauth
    - oidc
    - host_observer
    - sigv4auth
    - file_storage
    - memory_ballast
    - health_check
    - oauth2client
    - awsproxy
    - http_forwarder
    - jaegerremotesampling
    - k8s_observer
    - pprof
    - asapclient
    - basicauth
    - headers_setter

This extension provides exporters the ability to queue data to disk in the event that exporter is unable to send data to the configured endpoint.

In order to configure the extension, you will need to update your config to include the information below. First, be sure to create a /tmp/otel-data directory and give it read/write permissions:

extensions:
...
  file_storage:
    directory: /tmp/otel-data
    timeout: 10s
    compaction:
      directory: /tmp/otel-data
      on_start: true
      on_rebound: true
      rebound_needed_threshold_mib: 5
      rebound_trigger_threshold_mib: 3

# ... truncated for clarity

service:
  extensions: [health_check, pprof, zpages, file_storage]

Why queue data to disk?

This allows the collector to weather network interruptions (and even collector restarts) to ensure data is sent to the upstream provider.

Considerations for queuing data to disk?

There is a potential that this could impact data throughput performance due to disk performance.

References

Configuration Check-in

Now that we’ve covered extensions, let’s check our configuration changes.

Check-inReview your configuration

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  opencensus:
    endpoint: 0.0.0.0:55678

  # Collect own metrics
  prometheus:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_binary:
        endpoint: 0.0.0.0:6832
      thrift_compact:
        endpoint: 0.0.0.0:6831
      thrift_http:
        endpoint: 0.0.0.0:14268

  zipkin:
    endpoint: 0.0.0.0:9411

processors:
  batch:

exporters:
  debug:
    verbosity: detailed

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [otlp, opencensus, prometheus]
      processors: [batch]
      exporters: [debug]

    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

  extensions: [health_check, pprof, zpages]

Now that we have reviewed extensions, let’s dive into the data pipeline portion of the workshop. A pipeline defines a path the data follows in the Collector starting from reception, moving to further processing or modification, and finally exiting the Collector via exporters.

The data pipeline in the OpenTelemetry Collector is made up of receivers, processors, and exporters. We will first start with receivers.

OpenTelemetry Collector Receivers

Welcome to the receiver portion of the workshop! This is the starting point of the data pipeline of the OpenTelemetry Collector. Let’s dive in.

A receiver, which can be push or pull based, is how data gets into the Collector. Receivers may support one or more data sources. Generally, a receiver accepts data in a specified format, translates it into the internal format and passes it to processors and exporters defined in the applicable pipelines.

%%{
  init:{
    "theme":"base",
    "themeVariables": {
      "primaryColor": "#ffffff",
      "clusterBkg": "#eff2fb",
      "defaultLinkColor": "#333333"
    }
  }
}%%

flowchart LR;
    style M fill:#e20082,stroke:#333,stroke-width:4px,color:#fff
    subgraph Collector
    A[OTLP] --> M(Receivers)
    B[JAEGER] --> M(Receivers)
    C[Prometheus] --> M(Receivers)
    end
    subgraph Processors
    M(Receivers) --> H(Filters, Attributes, etc)
    E(Extensions)
    end
    subgraph Exporters
    H(Filters, Attributes, etc) --> S(OTLP)
    H(Filters, Attributes, etc) --> T(JAEGER)
    H(Filters, Attributes, etc) --> U(Prometheus)
    end

OpenTelemetry Collector Receivers

Host Metrics Receiver

The Host Metrics Receiver generates metrics about the host system scraped from various sources. This is intended to be used when the collector is deployed as an agent which is what we will be doing in this workshop.

Let’s update the /etc/otel-contrib/config.yaml file and configure the hostmetrics receiver. Insert the following YAML under the receivers section, taking care to indent by two spaces.

sudo vi /etc/otelcol-contrib/config.yaml

receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      # CPU utilization metrics
      cpu:
      # Disk I/O metrics
      disk:
      # File System utilization metrics
      filesystem:
      # Memory utilization metrics
      memory:
      # Network interface I/O metrics & TCP connection metrics
      network:
      # CPU load metrics
      load:
      # Paging/Swap space utilization and I/O metrics
      paging:
      # Process count metrics
      processes:
      # Per process CPU, Memory and Disk I/O metrics. Disabled by default.
      # process:

OpenTelemetry Collector Receivers

Prometheus Receiver

You will also notice another receiver called prometheus. Prometheus is an open-source toolkit used by the OpenTelemetry Collector. This receiver is used to scrape metrics from the OpenTelemetry Collector itself. These metrics can then be used to monitor the health of the collector.

Let’s modify the prometheus receiver to clearly show that it is for collecting metrics from the collector itself. By changing the name of the receiver from prometheus to prometheus/internal, it is now much clearer as to what that receiver is doing. Update the configuration file to look like this:

prometheus/internal:
  config:
    scrape_configs:
    - job_name: 'otel-collector'
      scrape_interval: 10s
      static_configs:
      - targets: ['0.0.0.0:8888']

Example Dashboard - Prometheus metrics

The following screenshot shows an example dashboard of some of the metrics the Prometheus internal receiver collects from the OpenTelemetry Collector. Here, we can see accepted and sent spans, metrics and log records.

Note

The following screenshot is an out-of-the-box (OOTB) dashboard from Splunk Observability Cloud that allows you to easily monitor your Splunk OpenTelemetry Collector install base.

OpenTelemetry Collector Receivers

Other Receivers

You will notice in the default configuration there are other receivers: otlp, opencensus, jaeger and zipkin. These are used to receive telemetry data from other sources. We will not be covering these receivers in this workshop and they can be left as they are.

Ninja: Create receivers dynamically

To help observe short lived tasks like docker containers, kubernetes pods, or ssh sessions, we can use the receiver creator with observer extensions to create a new receiver as these services start up.

What do we need?

In order to start using the receiver creator and its associated observer extensions, they will need to be part of your collector build manifest.

See installation for the details.

Things to consider?

Some short lived tasks may require additional configuration such as username, and password. These values can be referenced via environment variables, or use a scheme expand syntax such as ${file:./path/to/database/password}. Please adhere to your organisation’s secret practices when taking this route.

The Ninja Zone

There are only two things needed for this ninja zone:

Make sure you have added receiver creater and observer extensions to the builder manifest.
Create the config that can be used to match against discovered endpoints.

To create the templated configurations, you can do the following:

receiver_creator:
  watch_observers: [host_observer]
  receivers:
    redis:
      rule: type == "port" && port == 6379
      config:
        password: ${env:HOST_REDIS_PASSWORD}

For more examples, refer to these receiver creator’s examples.

Configuration Check-in

We’ve now covered receivers, so let’s now check our configuration changes.

Check-inReview your configuration

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.
# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      # CPU utilization metrics
      cpu:
      # Disk I/O metrics
      disk:
      # File System utilization metrics
      filesystem:
      # Memory utilization metrics
      memory:
      # Network interface I/O metrics & TCP connection metrics
      network:
      # CPU load metrics
      load:
      # Paging/Swap space utilization and I/O metrics
      paging:
      # Process count metrics
      processes:
      # Per process CPU, Memory and Disk I/O metrics. Disabled by default.
      # process:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  opencensus:
    endpoint: 0.0.0.0:55678

  # Collect own metrics
  prometheus/internal:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_binary:
        endpoint: 0.0.0.0:6832
      thrift_compact:
        endpoint: 0.0.0.0:6831
      thrift_http:
        endpoint: 0.0.0.0:14268

  zipkin:
    endpoint: 0.0.0.0:9411

processors:
  batch:

exporters:
  debug:
    verbosity: detailed

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [otlp, opencensus, prometheus]
      processors: [batch]
      exporters: [debug]

    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

  extensions: [health_check, pprof, zpages]

Now that we have reviewed how data gets into the OpenTelemetry Collector through receivers, let’s now take a look at how the Collector processes the received data.

Warning

As the /etc/otelcol-contrib/config.yaml is not complete, please do not attempt to restart the collector at this point.

OpenTelemetry Collector Processors

Processors are run on data between being received and being exported. Processors are optional though some are recommended. There are a large number of processors included in the OpenTelemetry contrib Collector.

%%{
  init:{
    "theme":"base",
    "themeVariables": {
      "primaryColor": "#ffffff",
      "clusterBkg": "#eff2fb",
      "defaultLinkColor": "#333333"
    }
  }
}%%

flowchart LR;
    style Processors fill:#e20082,stroke:#333,stroke-width:4px,color:#fff
    subgraph Collector
    A[OTLP] --> M(Receivers)
    B[JAEGER] --> M(Receivers)
    C[Prometheus] --> M(Receivers)
    end
    subgraph Processors
    M(Receivers) --> H(Filters, Attributes, etc)
    E(Extensions)
    end
    subgraph Exporters
    H(Filters, Attributes, etc) --> S(OTLP)
    H(Filters, Attributes, etc) --> T(JAEGER)
    H(Filters, Attributes, etc) --> U(Prometheus)
    end

OpenTelemetry Collector Processors

Batch Processor

By default, only the batch processor is enabled. This processor is used to batch up data before it is exported. This is useful for reducing the number of network calls made to exporters. For this workshop, we will inherit the following defaults which are hard-coded into the Collector:

send_batch_size (default = 8192): Number of spans, metric data points, or log records after which a batch will be sent regardless of the timeout. send_batch_size acts as a trigger and does not affect the size of the batch. If you need to enforce batch size limits sent to the next component in the pipeline see send_batch_max_size.
timeout (default = 200ms): Time duration after which a batch will be sent regardless of size. If set to zero, send_batch_size is ignored as data will be sent immediately, subject to only send_batch_max_size.
send_batch_max_size (default = 0): The upper limit of the batch size. 0 means no upper limit on the batch size. This property ensures that larger batches are split into smaller units. It must be greater than or equal to send_batch_size.

For more information on the Batch processor, see the Batch Processor documentation.

OpenTelemetry Collector Processors

Resource Detection Processor

The resourcedetection processor can be used to detect resource information from the host and append or override the resource value in telemetry data with this information.

By default, the hostname is set to the FQDN if possible, otherwise, the hostname provided by the OS is used as a fallback. This logic can be changed from using using the hostname_sources configuration option. To avoid getting the FQDN and use the hostname provided by the OS, we will set the hostname_sources to os.

processors:
  batch:
  resourcedetection/system:
    detectors: [system]
    system:
      hostname_sources: [os]

If the workshop instance is running on an AWS/EC2 instance we can gather the following tags from the EC2 metadata API (this is not available on other platforms).

cloud.provider ("aws")
cloud.platform ("aws_ec2")
cloud.account.id
cloud.region
cloud.availability_zone
host.id
host.image.id
host.name
host.type

We will create another processor to append these tags to our metrics.

processors:
  batch:
  resourcedetection/system:
    detectors: [system]
    system:
      hostname_sources: [os]
  resourcedetection/ec2:
    detectors: [ec2]

OpenTelemetry Collector Processors

Attributes Processor

The attributes processor modifies attributes of a span, log, or metric. This processor also supports the ability to filter and match input data to determine if they should be included or excluded for specified actions.

It takes a list of actions that are performed in the order specified in the config. The supported actions are:

insert: Inserts a new attribute in input data where the key does not already exist.
update: Updates an attribute in input data where the key does exist.
upsert: Performs insert or update. Inserts a new attribute in input data where the key does not already exist and updates an attribute in input data where the key does exist.
delete: Deletes an attribute from the input data.
hash: Hashes (SHA1) an existing attribute value.
extract: Extracts values using a regular expression rule from the input key to target keys specified in the rule. If a target key already exists, it will be overridden.

We are going to create an attributes processor to insert a new attribute to all our host metrics called participant.name with a value of your name e.g. marge_simpson.

Warning

Ensure you replace INSERT_YOUR_NAME_HERE with your name and also ensure you do not use spaces in your name.

Later on in the workshop, we will use this attribute to filter our metrics in Splunk Observability Cloud.

processors:
  batch:
  resourcedetection/system:
    detectors: [system]
    system:
      hostname_sources: [os]
  resourcedetection/ec2:
    detectors: [ec2]
  attributes/conf:
    actions:
      - key: participant.name
        action: insert
        value: "INSERT_YOUR_NAME_HERE"

Ninja: Using connectors to gain internal insights

One of the most recent additions to the collector was the notion of a connector, which allows you to join the output of one pipeline to the input of another pipeline.

An example of how this is beneficial is that some services emit metrics based on the amount of datapoints being exported, the number of logs containing an error status, or the amount of data being sent from one deployment environment. The count connector helps address this for you out of the box.

Why a connector instead of a processor?

A processor is limited in what additional data it can produce considering it has to pass on the data it has processed making it hard to expose additional information. Connectors do not have to emit the data they receive which means they provide an opportunity to create the insights we are after.

For example, a connector could be made to count the number of logs, metrics, and traces that do not have the deployment environment attribute.

A very simple example with the output of being able to break down data usage by deployment environment.

Considerations with connectors

A connector only accepts data exported from one pipeline and receiver by another pipeline, this means you may have to consider how you construct your collector config to take advantage of it.

References

Configuration Check-in

That’s processors covered, let’s check our configuration changes.

Check-inReview your configuration

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.
# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      # CPU utilization metrics
      cpu:
      # Disk I/O metrics
      disk:
      # File System utilization metrics
      filesystem:
      # Memory utilization metrics
      memory:
      # Network interface I/O metrics & TCP connection metrics
      network:
      # CPU load metrics
      load:
      # Paging/Swap space utilization and I/O metrics
      paging:
      # Process count metrics
      processes:
      # Per process CPU, Memory and Disk I/O metrics. Disabled by default.
      # process:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  opencensus:
    endpoint: 0.0.0.0:55678

  # Collect own metrics
  prometheus/internal:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_binary:
        endpoint: 0.0.0.0:6832
      thrift_compact:
        endpoint: 0.0.0.0:6831
      thrift_http:
        endpoint: 0.0.0.0:14268

  zipkin:
    endpoint: 0.0.0.0:9411

processors:
  batch:
  resourcedetection/system:
    detectors: [system]
    system:
      hostname_sources: [os]
  resourcedetection/ec2:
    detectors: [ec2]
  attributes/conf:
    actions:
      - key: participant.name
        action: insert
        value: "INSERT_YOUR_NAME_HERE"

exporters:
  debug:
    verbosity: detailed

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [otlp, opencensus, prometheus]
      processors: [batch]
      exporters: [debug]

    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

  extensions: [health_check, pprof, zpages]

OpenTelemetry Collector Exporters

An exporter, which can be push or pull-based, is how you send data to one or more backends/destinations. Exporters may support one or more data sources.

For this workshop, we will be using the otlphttp exporter. The OpenTelemetry Protocol (OTLP) is a vendor-neutral, standardised protocol for transmitting telemetry data. The OTLP exporter sends data to a server that implements the OTLP protocol. The OTLP exporter supports both gRPC and HTTP/JSON protocols.

%%{
  init:{
    "theme":"base",
    "themeVariables": {
      "primaryColor": "#ffffff",
      "clusterBkg": "#eff2fb",
      "defaultLinkColor": "#333333"
    }
  }
}%%

flowchart LR;
    style Exporters fill:#e20082,stroke:#333,stroke-width:4px,color:#fff
    subgraph Collector
    A[OTLP] --> M(Receivers)
    B[JAEGER] --> M(Receivers)
    C[Prometheus] --> M(Receivers)
    end
    subgraph Processors
    M(Receivers) --> H(Filters, Attributes, etc)
    E(Extensions)
    end
    subgraph Exporters
    H(Filters, Attributes, etc) --> S(OTLP)
    H(Filters, Attributes, etc) --> T(JAEGER)
    H(Filters, Attributes, etc) --> U(Prometheus)
    end

OpenTelemetry Collector Exporters

OTLP HTTP Exporter

To send metrics over HTTP to Splunk Observability Cloud, we will need to configure the otlphttp exporter.

Let’s edit our /etc/otelcol-contrib/config.yaml file and configure the otlphttp exporter. Insert the following YAML under the exporters section, taking care to indent by two spaces e.g.

We will also change the verbosity of the logging exporter to prevent the disk from filling up. The default of detailed is very noisy.

exporters:
  logging:
    verbosity: normal
  otlphttp/splunk:

Next, we need to define the metrics_endpoint and configure the target URL.

Note

If you are an attendee at a Splunk-hosted workshop, the instance you are using has already been configured with a Realm environment variable. We will reference that environment variable in our configuration file. Otherwise, you will need to create a new environment variable and set the Realm e.g.

export REALM="us1"

The URL to use is https://ingest.${env:REALM}.signalfx.com/v2/datapoint/otlp. (Splunk has Realms in key geographical locations around the world for data residency).

The otlphttp exporter can also be configured to send traces and logs by defining a target URL for traces_endpoint and logs_endpoint respectively. Configuring these is outside the scope of this workshop.

exporters:
  logging:
    verbosity: normal
  otlphttp/splunk:
    metrics_endpoint: https://ingest.${env:REALM}.signalfx.com/v2/datapoint/otlp

By default, gzip compression is enabled for all endpoints. This can be disabled by setting compression: none in the exporter configuration. We will leave compression enabled for this workshop and accept the default as this is the most efficient way to send data.

To send metrics to Splunk Observability Cloud, we need to use an Access Token. This can be done by creating a new token in the Splunk Observability Cloud UI. For more information on how to create a token, see Create a token. The token needs to be of type INGEST.

Note

If you are an attendee at a Splunk-hosted workshop, the instance you are using has already been configured with an Access Token (which has been set as an environment variable). We will reference that environment variable in our configuration file. Otherwise, you will need to create a new token and set it as an environment variable e.g.

export ACCESS_TOKEN=<replace-with-your-token>

The token is defined in the configuration file by inserting X-SF-TOKEN: ${env:ACCESS_TOKEN} under a headers: section:

exporters:
  logging:
    verbosity: normal
  otlphttp/splunk:
    metrics_endpoint: https://ingest.${env:REALM}.signalfx.com/v2/datapoint/otlp
    headers:
      X-SF-TOKEN: ${env:ACCESS_TOKEN}

Configuration Check-in

Now that we’ve covered exporters, let’s check our configuration changes:

Check-inReview your configuration

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.
# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      # CPU utilization metrics
      cpu:
      # Disk I/O metrics
      disk:
      # File System utilization metrics
      filesystem:
      # Memory utilization metrics
      memory:
      # Network interface I/O metrics & TCP connection metrics
      network:
      # CPU load metrics
      load:
      # Paging/Swap space utilization and I/O metrics
      paging:
      # Process count metrics
      processes:
      # Per process CPU, Memory and Disk I/O metrics. Disabled by default.
      # process:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  opencensus:
    endpoint: 0.0.0.0:55678

  # Collect own metrics
  prometheus/internal:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_binary:
        endpoint: 0.0.0.0:6832
      thrift_compact:
        endpoint: 0.0.0.0:6831
      thrift_http:
        endpoint: 0.0.0.0:14268

  zipkin:
    endpoint: 0.0.0.0:9411

processors:
  batch:
  resourcedetection/system:
    detectors: [system]
    system:
      hostname_sources: [os]
  resourcedetection/ec2:
    detectors: [ec2]
  attributes/conf:
    actions:
      - key: participant.name
        action: insert
        value: "INSERT_YOUR_NAME_HERE"

exporters:
  debug:
    verbosity: normal
  otlphttp/splunk:
    metrics_endpoint: https://ingest.${env:REALM}.signalfx.com/v2/datapoint/otlp
    headers:
      X-SF-Token: ${env:ACCESS_TOKEN}

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [otlp, opencensus, prometheus]
      processors: [batch]
      exporters: [debug]

    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

  extensions: [health_check, pprof, zpages]

Of course, you can easily configure the metrics_endpoint to point to any other solution that supports the OTLP protocol.

Next, we need to enable the receivers, processors and exporters we have just configured in the service section of the config.yaml.

OpenTelemetry Collector Service

The Service section is used to configure what components are enabled in the Collector based on the configuration found in the receivers, processors, exporters, and extensions sections.

Info

If a component is configured, but not defined within the Service section then it is not enabled.

The service section consists of three sub-sections:

extensions
pipelines
telemetry

In the default configuration, the extension section has been configured to enable health_check, pprof and zpages, which we configured in the Extensions module earlier.

service:
  extensions: [health_check, pprof, zpages]

So lets configure our Metric Pipeline!

OpenTelemetry Collector Service

Hostmetrics Receiver

If you recall from the Receivers portion of the workshop, we defined the Host Metrics Receiver to generate metrics about the host system, which are scraped from various sources. To enable the receiver, we must include the hostmetrics receiver in the metrics pipeline.

In the metrics pipeline, add hostmetrics to the metrics receivers section.

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [hostmetrics, otlp, opencensus, prometheus]
      processors: [batch]
      exporters: [debug]

OpenTelemetry Collector Service

Prometheus Internal Receiver

Earlier in the workshop, we also renamed the prometheus receiver to reflect that is was collecting metrics internal to the collector, renaming it to prometheus/internal.

We now need to enable the prometheus/internal receiver under the metrics pipeline. Update the receivers section to include prometheus/internal under the metrics pipeline:

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [hostmetrics, otlp, opencensus, prometheus/internal]
      processors: [batch]
      exporters: [debug]

OpenTelemetry Collector Service

Resource Detection Processor

We also added resourcedetection/system and resourcedetection/ec2 processors so that the collector can capture the instance hostname and AWS/EC2 metadata. We now need to enable these two processors under the metrics pipeline.

Update the processors section to include resourcedetection/system and resourcedetection/ec2 under the metrics pipeline:

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [hostmetrics, otlp, opencensus, prometheus/internal]
      processors: [batch, resourcedetection/system, resourcedetection/ec2]
      exporters: [debug]

OpenTelemetry Collector Service

Attributes Processor

Also in the Processors section of this workshop, we added the attributes/conf processor so that the collector will insert a new attribute called participant.name to all the metrics. We now need to enable this under the metrics pipeline.

Update the processors section to include attributes/conf under the metrics pipeline:

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [hostmetrics, otlp, opencensus, prometheus/internal]
      processors: [batch, resourcedetection/system, resourcedetection/ec2, attributes/conf]
      exporters: [debug]

OpenTelemetry Collector Service

OTLP HTTP Exporter

In the Exporters section of the workshop, we configured the otlphttp exporter to send metrics to Splunk Observability Cloud. We now need to enable this under the metrics pipeline.

Update the exporters section to include otlphttp/splunk under the metrics pipeline:

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [hostmetrics, otlp, opencensus, prometheus/internal]
      processors: [batch, resourcedetection/system, resourcedetection/ec2, attributes/conf]
      exporters: [debug, otlphttp/splunk]

Ninja: Observing the collector internals

The collector captures internal signals about its behavior this also includes additional signals from running components. The reason for this is that components that make decisions about the flow of data need a way to surface that information as metrics or traces.

Why monitor the collector?

This is somewhat of a chicken and egg problem of, “Who is watching the watcher?”, but it is important that we can surface this information. Another interesting part of the collector’s history is that it existed before the Go metrics’ SDK was considered stable so the collector exposes a Prometheus endpoint to provide this functionality for the time being.

Considerations

Monitoring the internal usage of each running collector in your organization can contribute a significant amount of new Metric Time Series (MTS). The Splunk distribution has curated these metrics for you and would be able to help forecast the expected increases.

The Ninja Zone

To expose the internal observability of the collector, some additional settings can be adjusted:

service:
  telemetry:
    logs:
      level: <info|warn|error>
      development: <true|false>
      encoding: <console|json>
      disable_caller: <true|false>
      disable_stacktrace: <true|false>
      output_paths: [<stdout|stderr>, paths...]
      error_output_paths: [<stdout|stderr>, paths...]
      initial_fields:
        key: value
    metrics:
      level: <none|basic|normal|detailed>
      # Address binds the promethues endpoint to scrape
      address: <hostname:port>

service:
  telemetry:
    logs: 
      level: info
      encoding: json
      disable_stacktrace: true
      initial_fields:
        instance.name: ${env:INSTANCE}
    metrics:
      address: localhost:8888

References

https://opentelemetry.io/docs/collector/configuration/#service

Final configuration

Check-inReview your final configuration

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.
# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      # CPU utilization metrics
      cpu:
      # Disk I/O metrics
      disk:
      # File System utilization metrics
      filesystem:
      # Memory utilization metrics
      memory:
      # Network interface I/O metrics & TCP connection metrics
      network:
      # CPU load metrics
      load:
      # Paging/Swap space utilization and I/O metrics
      paging:
      # Process count metrics
      processes:
      # Per process CPU, Memory and Disk I/O metrics. Disabled by default.
      # process:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  opencensus:
    endpoint: 0.0.0.0:55678

  # Collect own metrics
  prometheus/internal:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_binary:
        endpoint: 0.0.0.0:6832
      thrift_compact:
        endpoint: 0.0.0.0:6831
      thrift_http:
        endpoint: 0.0.0.0:14268

  zipkin:
    endpoint: 0.0.0.0:9411

processors:
  batch:
  resourcedetection/system:
    detectors: [system]
    system:
      hostname_sources: [os]
  resourcedetection/ec2:
    detectors: [ec2]
  attributes/conf:
    actions:
      - key: participant.name
        action: insert
        value: "INSERT_YOUR_NAME_HERE"

exporters:
  debug:
    verbosity: normal
  otlphttp/splunk:
    metrics_endpoint: https://ingest.${env:REALM}.signalfx.com/v2/datapoint/otlp
    headers:
      X-SF-Token: ${env:ACCESS_TOKEN}

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [hostmetrics, otlp, opencensus, prometheus/internal]
      processors: [batch, resourcedetection/system, resourcedetection/ec2, attributes/conf]
      exporters: [debug, otlphttp/splunk]

    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

  extensions: [health_check, pprof, zpages]

Tip

It is recommended that you validate your configuration file before restarting the collector. You can do this by pasting the contents of your config.yaml file into otelbin.io.

ScreenshotOTelBin

Now that we have a working configuration, let’s start the collector and then check to see what zPages is reporting.

otelcol-contrib --config=file:/etc/otelcol-contrib/config.yaml

Open up zPages in your browser: http://localhost:55679/debug/pipelinez (change localhost to reflect your own environment).

Data Visualisations

Splunk Observability Cloud

Now that we have configured the OpenTelemetry Collector to send metrics to Splunk Observability Cloud, let’s take a look at the data in Splunk Observability Cloud. If you have not received an invite to Splunk Observability Cloud, your instructor will provide you with login credentials.

Before that, let’s make things a little more interesting and run a stress test on the instance. This in turn will light up the dashboards.

sudo apt install stress
while true; do stress -c 2 -t 40; stress -d 5 -t 40; stress -m 20 -t 40; done

Once you are logged into Splunk Observability Cloud, using the left-hand navigation, navigate to Dashboards from the main menu. This will take you to the Teams view. At the top of this view click on All Dashboards :

In the search box, search for OTel Contrib:

Info

If the dashboard does not exist, then your instructor will be able to quickly add it. If you are not attending a Splunk hosted version of this workshop then the Dashboard Group to import can be found at the bottom of this page.

Click on the OTel Contrib Dashboard dashboard to open it, next click in the Participant Name box, at the top of the dashboard, and select the name you configured for participant.name in the config.yaml in the drop-down list or start typing the name to search for it:

You can now see the host metrics for the host upon which you configured the OpenTelemetry Collector.

Download Dashboard Group JSON for importing

OpenTelemetry-Contrib-Dashboard-Group.json (40 KB)

OpenTelemetry Collector Development

Developing a custom component

Building a component for the Open Telemetry Collector requires three key parts:

The Configuration - What values are exposed to the user to configure
The Factory - Make the component using the provided values
The Business Logic - What the component needs to do

For this, we will use the example of building a component that works with Jenkins so that we can track important DevOps metrics of our project(s).

The metrics we are looking to measure are:

Lead time for changes - “How long it takes for a commit to get into production”
Change failure rate - “The percentage of deployments causing a failure in production”
Deployment frequency - “How often a [team] successfully releases to production”
Mean time to recover - “How long does it take for a [team] to recover from a failure in production”

These indicators were identified Google’s DevOps Research and Assesment (DORA)[^1] team to help show performance of a software development team. The reason for choosing Jenkins CI is that we remain in the same Open Source Software ecosystem which we can serve as the example for the vendor managed CI tools to adopt in future.

Instrument Vs Component

There is something to consider when improving level of Observability within your organisation since there are some trade offs that get made.

	Pros	Cons
(Auto) Instrumented	Does not require an external API to be monitored in order to observe the system.	Changing instrumentation requires changes to the project.
	Gives system owners/developers to make changes in their observability.	Requires additional runtime dependancies.
	Understands system context and can corrolate captured data with Exemplars.	Can impact performance of the system.
Component	- Changes to data names or semantics can be rolled out independently of the system’s release cycle.	Breaking API changes require a coordinated release between system and collector.
	Updating/extending data collected is a seemless user facing change.	Captured data semantics can unexpectedly break that does not align with a new system release.
	Does not require the supporting teams to have a deep understanding of observability practice.	Strictly external / exposed information can be surfaced from the system.

OpenTelemetry Collector Development

Project Setup Ninja

Note

The time to finish this section of the workshop can vary depending on experience.

A complete solution can be found here in case you’re stuck or want to follow along with the instructor.

To get started developing the new Jenkins CI receiver, we first need to set up a Golang project. The steps to create your new Golang project is:

Create a new directory named ${HOME}/go/src/jenkinscireceiver and change into it
1. The actual directory name or location is not strict, you can choose your own development directory to make it in.
Initialize the golang module by going go mod init splunk.conf/workshop/example/jenkinscireceiver
1. This will create a file named go.mod which is used to track our direct and indirect dependencies
2. Eventually, there will be a go.sum which is the checksum value of the dependencies being imported.

Check-inReview your go.mod

module splunk.conf/workshop/example/jenkinscireceiver

go 1.20

OpenTelemetry Collector Development

Building The Configuration

The configuration portion of the component is how the user is able to have their inputs over the component, so the values that is used for the configuration need to be:

Intuitive for users to understand what that field controls
Be explicit in what is required and what is optional
Reuse common names and fields
Keep the options simple

---
jenkins_server_addr: hostname
jenkins_server_api_port: 8089
interval: 10m
filter_builds_by:
    - name: my-awesome-build
      status: amber
track:
    values:
        example.metric.1: yes
        example.metric.2: yes
        example.metric.3: no
        example.metric.4: no

---
# Required Values
endpoint: http://my-jenkins-server:8089
auth:
    authenticator: basicauth/jenkins
# Optional Values
collection_interval: 10m
metrics:
    example.metric.1:
        enabled: true
    example.metric.2:
        enabled: true
    example.metric.3:
        enabled: true
    example.metric.4:
        enabled: true

The bad configuration highlights how doing the opposite of the recommendations of configuration practices impacts the usability of the component. It doesn’t make it clear what field values should be, it includes features that can be pushed to existing processors, and the field naming is not consistent with other components that exist in the collector.

The good configuration keeps the required values simple, reuses field names from other components, and ensures the component focuses on just the interaction between Jenkins and the collector.

The code tab shows how much is required to be added by us and what is already provided for us by shared libraries within the collector. These will be explained in more detail once we get to the business logic. The configuration should start off small and will change once the business logic has started to include additional features that is needed.

Write the code

In order to implement the code needed for the configuration, we are going to create a new file named config.go with the following content:

package jenkinscireceiver

import (
    "go.opentelemetry.io/collector/config/confighttp"
    "go.opentelemetry.io/collector/receiver/scraperhelper"

    "splunk.conf/workshop/example/jenkinscireceiver/internal/metadata"
)

type Config struct {
    // HTTPClientSettings contains all the values
    // that are commonly shared across all HTTP interactions
    // performed by the collector.
    confighttp.HTTPClientSettings `mapstructure:",squash"`
    // ScraperControllerSettings will allow us to schedule 
    // how often to check for updates to builds.
    scraperhelper.ScraperControllerSettings `mapstructure:",squash"`
    // MetricsBuilderConfig contains all the metrics
    // that can be configured.
    metadata.MetricsBuilderConfig `mapstructure:",squash"`
}

OpenTelemetry Collector Development

Component Review

To recap the type of component we will need to capture metrics from Jenkins:

The business use case an extension helps solves for are:

Having shared functionality that requires runtime configuration
Indirectly helps with observing the runtime of the collector

See Extensions Overview for more details.

The business use case a receiver solves for:

Fetching data from a remote source
Receiving data from remote source(s)

This is commonly referred to pull vs push based data collection, and you read more about the details in the Receiver Overview.

The business use case a processor solves for is:

Adding or removing data, fields, or values
Observing and making decisions on the data
Buffering, queueing, and reordering

The thing to keep in mind is the data type flowing through a processor needs to forward the same data type to its downstream components. Read through Processor Overview for the details.

The business use case an exporter solves for:

Send the data to a tool, service, or storage

The OpenTelemetry collector does not want to be “backend”, an all-in-one observability suite, but rather keep to the principles that founded OpenTelemetry to begin with; A vendor agnostic Observability for all. To help revisit the details, please read through Exporter Overview.

This is a component type that was missed in the workshop since it is a relatively new addition to the collector, but the best way to think about a connector is that it is like a processor that allows it to be used across different telemetry types and pipelines. Meaning that a connector can accept data as logs, and output metrics, or accept metrics from one pipeline and provide metrics on the data it has observed.

The business case that a connector solves for:

Converting from different telemetry types
- logs to metrics
- traces to metrics
- metrics to logs
Observing incoming data and producing its own data
- Accepting metrics and generating analytical metrics of the data.

There was a brief overview within the Ninja section as part of the Processor Overview, and be sure what the project for updates for new connector components.

From the component overviews, it is clear that developing a pull-based receiver for Jenkins.

OpenTelemetry Collector Development

Designing The Metrics

To help define and export the metrics captured by our receiver, we will be using, mdatagen, a tool developed for the collector that turns YAML defined metrics into code.

---
# Type defines the name to reference the component
# in the configuration file
type: jenkins

# Status defines the component type and the stability level
status:
  class: receiver
  stability:
    development: [metrics]

# Attributes are the expected fields reported
# with the exported values.
attributes:
  job.name:
    description: The name of the associated Jenkins job
    type: string
  job.status:
    description: Shows if the job had passed, or failed
    type: string
    enum:
    - failed
    - success
    - unknown

# Metrics defines all the pontentially exported values from this receiver. 
metrics:
  jenkins.jobs.count:
    enabled: true
    description: Provides a count of the total number of configured jobs
    unit: "{Count}"
    gauge:
      value_type: int
  jenkins.job.duration:
    enabled: true
    description: Show the duration of the job
    unit: "s"
    gauge:
      value_type: int
    attributes:
    - job.name
    - job.status
  jenkins.job.commit_delta:
    enabled: true
    description: The calculation difference of the time job was finished minus commit timestamp
    unit: "s"
    gauge:
      value_type: int
    attributes:
    - job.name
    - job.status

// To generate the additional code needed to capture metrics, 
// the following command to be run from the shell:
//  go generate -x ./...

//go:generate go run github.com/open-telemetry/opentelemetry-collector-contrib/cmd/mdatagen@v0.80.0 metadata.yaml
package jenkinscireceiver

// There is no code defined within this file.

Create these files within the project folder before continuing onto the next section.

Building The Factory

The Factory is a software design pattern that effectively allows for an object, in this case a jenkinscireceiver, to be created dynamically with the provided configuration. To use a more real-world example, it would be going to a phone store, asking for a phone that matches your exact description, and then providing it to you.

Run the following command go generate -x ./... , it will create a new folder, jenkinscireceiver/internal/metadata, that contains all code required to export the defined metrics. The required code is:

package jenkinscireceiver

import (
    "errors"

    "go.opentelemetry.io/collector/component"
    "go.opentelemetry.io/collector/config/confighttp"
    "go.opentelemetry.io/collector/receiver"
    "go.opentelemetry.io/collector/receiver/scraperhelper"

    "splunk.conf/workshop/example/jenkinscireceiver/internal/metadata"
)

func NewFactory() receiver.Factory {
    return receiver.NewFactory(
        metadata.Type,
        newDefaultConfig,
        receiver.WithMetrics(newMetricsReceiver, metadata.MetricsStability),
    )
}

func newMetricsReceiver(_ context.Context, set receiver.CreateSettings, cfg component.Config, consumer consumer.Metrics) (receiver.Metrics, error) {
    // Convert the configuration into the expected type
    conf, ok := cfg.(*Config)
    if !ok {
        return nil, errors.New("can not convert config")
    }
    sc, err := newScraper(conf, set)
    if err != nil {
        return nil, err
    }
    return scraperhelper.NewScraperControllerReceiver(
        &conf.ScraperControllerSettings,
        set,
        consumer,
        scraperhelper.AddScraper(sc),
    )
}

package jenkinscireceiver

import (
    "go.opentelemetry.io/collector/config/confighttp"
    "go.opentelemetry.io/collector/receiver/scraperhelper"

    "splunk.conf/workshop/example/jenkinscireceiver/internal/metadata"
)

type Config struct {
    // HTTPClientSettings contains all the values
    // that are commonly shared across all HTTP interactions
    // performed by the collector.
    confighttp.HTTPClientSettings `mapstructure:",squash"`
    // ScraperControllerSettings will allow us to schedule 
    // how often to check for updates to builds.
    scraperhelper.ScraperControllerSettings `mapstructure:",squash"`
    // MetricsBuilderConfig contains all the metrics
    // that can be configured.
    metadata.MetricsBuilderConfig `mapstructure:",squash"`
}

func newDefaultConfig() component.Config {
    return &Config{
        ScraperControllerSettings: scraperhelper.NewDefaultScraperControllerSettings(metadata.Type),
        HTTPClientSettings:        confighttp.NewDefaultHTTPClientSettings(),
        MetricsBuilderConfig:      metadata.DefaultMetricsBuilderConfig(),
    }
}

package jenkinscireceiver

type scraper struct {}

func newScraper(cfg *Config, set receiver.CreateSettings) (scraperhelper.Scraper, error) {
    // Create a our scraper with our values 
    s := scraper{
        // To be filled in later
    }
    return scraperhelper.NewScraper(metadata.Type, s.scrape)
}

func (scraper) scrape(ctx context.Context) (pmetric.Metrics, error) {
    // To be filled in
    return pmetrics.NewMetrics(), nil
}

---
dist:
  name: otelcol
  description: "Conf workshop collector"
  output_path: ./dist
  version: v0.0.0-experimental

extensions:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/basicauthextension v0.80.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/healthcheckextension v0.80.0

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.80.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/jaegerreceiver v0.80.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.80.0
  - gomod: splunk.conf/workshop/example/jenkinscireceiver v0.0.0
    path: ./jenkinscireceiver

processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.80.0

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/loggingexporter v0.80.0
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.80.0
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.80.0

# This replace is a go directive that allows for redefine
# where to fetch the code to use since the default would be from a remote project.
replaces:
- splunk.conf/workshop/example/jenkinscireceiver => ./jenkinscireceiver

├── build-config.yaml
└── jenkinscireceiver
    ├── go.mod
    ├── config.go
    ├── factory.go
    ├── scraper.go
    └── internal
      └── metadata

Once you have written these files into the project with the expected contents run, go mod tidy, which will fetch all the remote dependencies and update go.mod and generate the go.sum files.

OpenTelemetry Collector Development

Building The Business Logic

At this point, we have a custom component that currently does nothing so we need to add in the required logic to capture this data from Jenkins.

From this point, the steps that we need to take are:

Create a client that connect to Jenkins
Capture all the configured jobs
Report the status of the last build for the configured job
Calculate the time difference between commit timestamp and job completion.

The changes will be made to scraper.go.

To be able to connect to the Jenkins server, we will be using the package, “github.com/yosida95/golang-jenkins”, which provides the functionality required to read data from the jenkins server.

Then we are going to utilise some of the helper functions from the, “go.opentelemetry.io/collector/receiver/scraperhelper” , library to create a start function so that we can connect to the Jenkins server once component has finished starting.

package jenkinscireceiver

import (
    "context"

    jenkins "github.com/yosida95/golang-jenkins"
    "go.opentelemetry.io/collector/component"
    "go.opentelemetry.io/collector/pdata/pmetric"
    "go.opentelemetry.io/collector/receiver"
    "go.opentelemetry.io/collector/receiver/scraperhelper"

    "splunk.conf/workshop/example/jenkinscireceiver/internal/metadata"
)

type scraper struct {
    mb     *metadata.MetricsBuilder
    client *jenkins.Jenkins
}

func newScraper(cfg *Config, set receiver.CreateSettings) (scraperhelper.Scraper, error) {
    s := &scraper{
        mb : metadata.NewMetricsBuilder(cfg.MetricsBuilderConfig, set),
    }
    
    return scraperhelper.NewScraper(
        metadata.Type,
        s.scrape,
        scraperhelper.WithStart(func(ctx context.Context, h component.Host) error {
            client, err := cfg.ToClient(h, set.TelemetrySettings)
            if err != nil {
                return err
            }
            // The collector provides a means of injecting authentication
            // on our behalf, so this will ignore the libraries approach
            // and use the configured http client with authentication.
            s.client = jenkins.NewJenkins(nil, cfg.Endpoint)
            s.client.SetHTTPClient(client)
            return nil
        }),
    )
}

func (s scraper) scrape(ctx context.Context) (pmetric.Metrics, error) {
    // To be filled in
    return pmetric.NewMetrics(), nil
}

This finishes all the setup code that is required in order to initialise a Jenkins receiver.

From this point on, we will be focuses on the scrape method that has been waiting to be filled in. This method will be run on each interval that is configured within the configuration (by default, every minute).

The reason we want to capture the number of jobs configured so we can see the growth of our Jenkins server, and measure of many projects have onboarded. To do this we will call the jenkins client to list all jobs, and if it reports an error, return that with no metrics, otherwise, emit the data from the metric builder.

func (s scraper) scrape(ctx context.Context) (pmetric.Metrics, error) {
    jobs, err := s.client.GetJobs()
    if err != nil {
        return pmetric.Metrics{}, err
    }

    // Recording the timestamp to ensure
    // all captured data points within this scrape have the same value. 
    now := pcommon.NewTimestampFromTime(time.Now())
    
    // Casting to an int64 to match the expected type
    s.mb.RecordJenkinsJobsCountDataPoint(now, int64(len(jobs)))
    
    // To be filled in

    return s.mb.Emit(), nil
}

In the last step, we were able to capture all jobs ands report the number of jobs there was. Within this step, we are going to examine each job and use the report values to capture metrics.

func (s scraper) scrape(ctx context.Context) (pmetric.Metrics, error) {
    jobs, err := s.client.GetJobs()
    if err != nil {
        return pmetric.Metrics{}, err
    }

    // Recording the timestamp to ensure
    // all captured data points within this scrape have the same value. 
    now := pcommon.NewTimestampFromTime(time.Now())
    
    // Casting to an int64 to match the expected type
    s.mb.RecordJenkinsJobsCountDataPoint(now, int64(len(jobs)))
    
    for _, job := range jobs {
        // Ensure we have valid results to start off with
        var (
            build  = job.LastCompletedBuild
            status = metadata.AttributeJobStatusUnknown
        )

        // This will check the result of the job, however,
        // since the only defined attributes are 
        // `success`, `failure`, and `unknown`. 
        // it is assume that anything did not finish 
        // with a success or failure to be an unknown status.

        switch build.Result {
        case "aborted", "not_built", "unstable":
            status = metadata.AttributeJobStatusUnknown
        case "success":
            status = metadata.AttributeJobStatusSuccess
        case "failure":
            status = metadata.AttributeJobStatusFailed
        }

        s.mb.RecordJenkinsJobDurationDataPoint(
            now,
            int64(job.LastCompletedBuild.Duration),
            job.Name,
            status,
        )
    }

    return s.mb.Emit(), nil
}

The final step is to calculate how long it took from commit to job completion to help infer our DORA metrics.

func (s scraper) scrape(ctx context.Context) (pmetric.Metrics, error) {
    jobs, err := s.client.GetJobs()
    if err != nil {
        return pmetric.Metrics{}, err
    }

    // Recording the timestamp to ensure
    // all captured data points within this scrape have the same value. 
    now := pcommon.NewTimestampFromTime(time.Now())
    
    // Casting to an int64 to match the expected type
    s.mb.RecordJenkinsJobsCountDataPoint(now, int64(len(jobs)))
    
    for _, job := range jobs {
        // Ensure we have valid results to start off with
        var (
            build  = job.LastCompletedBuild
            status = metadata.AttributeJobStatusUnknown
        )

        // Previous step here

        // Ensure that the `ChangeSet` has values
        // set so there is a valid value for us to reference
        if len(build.ChangeSet.Items) == 0 {
            continue
        }

        // Making the assumption that the first changeset
        // item is the most recent change.
        change := build.ChangeSet.Items[0]

        // Record the difference from the build time
        // compared against the change timestamp.
        s.mb.RecordJenkinsJobCommitDeltaDataPoint(
            now,
            int64(build.Timestamp-change.Timestamp),
            job.Name,
            status,
        )
    }

    return s.mb.Emit(), nil
}

Once all of these steps have been completed, you now have built a custom Jenkins CI receiver!

Whats next?

There are more than likely features that would be desired from component that you can think of, like:

Can I include the branch name that the job used?
Can I include the project name for the job?
How I calculate the collective job durations for project?
How do I validate the changes work?

Please take this time to play around, break it, change things around, or even try to capture logs from the builds.

Advanced OpenTelemetry Collector

75 minutes Authors Robert Castley, Charity Anderson, Pieter Hagen & Geoff Higginbottom

The goal of this workshop is to help you gain confidence in creating and modifying OpenTelemetry Collector configuration files. You’ll start with a minimal agent.yaml and gateway.yaml file and progressively build them out to handle several advanced, real-world scenarios.

A key focus of this workshop is learning how to configure the OpenTelemetry Collector to store telemetry data locally, rather than sending it to a third-party vendor backend. This approach not only simplifies debugging and troubleshooting but is also ideal for testing and development environments where you want to avoid sending data to production systems.

To make the most of this workshop, you should have:

A basic understanding of the OpenTelemetry Collector and its configuration file structure.
Proficiency in editing YAML files.

Everything in this workshop is designed to run locally, ensuring a hands-on and accessible learning experience. Let’s dive in and start building!

Workshop Overview

During this workshop, we will cover the following topics:

Setting up the agent and gateway locally: Test that metrics, traces, and logs go via the agent to the gateway.
Enhancing agent resilience: Basic configurations for fault tolerance.
Configuring processors:
- Filter out noise by dropping specific spans (e.g., health checks).
- Remove unnecessary tags, and handle sensitive data.
- Transform data using OTTL (OpenTelemetry Transformation Language) in the pipeline before exporting.
Configuring Connectors:
- Route data to different endpoints based on the values received.

By the end of this workshop, you’ll be familiar with configuring the OpenTelemetry Collector for a variety of real-world use cases.

Pre-requisites

5 minutes

Prerequisites

Proficiency in editing YAML files using vi, vim, nano, or your preferred text editor.
Supported Environments:
- A provided Splunk Workshop Instance (preferred). Outbound access to port 2222 is required for ssh access.
- Apple Mac (Apple Silicon). Installation of jq is required - https://jqlang.org/download/

Exercise

Create a directory: In your environment create a new directory and change into it:

mkdir advanced-otel-workshop && \
cd advanced-otel-workshop

We will refer to this directory as [WORKSHOP] for the remainder of the workshop.

Remove any existing OpenTelemetry Collectors

If you have completed the Splunk IM workshop, please ensure you have deleted the collector running in Kubernetes before continuing. This can be done by running the following command:

helm delete splunk-otel-collector

The EC2 instance in that case may also run some services that can interfere with this workshop , so run the following command to make sure they are stopped if present:

kubectl delete ~/workshop/apm/deployment.yaml

Download workshop binaries: Change into your [WORKSHOP] directory and download the OpenTelemetry Collector, Load Generator binaries and setup script:

curl -L https://github.com/signalfx/splunk-otel-collector/releases/download/v0.132.0/otelcol_linux_amd64 -o otelcol && \
curl -L https://github.com/splunk/observability-workshop/raw/refs/heads/main/workshop/ninja/advanced-otel/loadgen/build/loadgen-linux-amd64 -o loadgen && \
curl -L https://github.com/splunk/observability-workshop/raw/refs/heads/main/workshop/ninja/advanced-otel/setup-workshop.sh -o setup-workshop.sh && \
chmod +x setup-workshop.sh

curl -L https://github.com/signalfx/splunk-otel-collector/releases/download/v0.132.0/otelcol_darwin_arm64 -o otelcol && \
curl -L https://github.com/splunk/observability-workshop/raw/refs/heads/main/workshop/ninja/advanced-otel/loadgen/build/loadgen-darwin-arm64 -o loadgen && \
curl -L https://github.com/splunk/observability-workshop/raw/refs/heads/main/workshop/ninja/advanced-otel/setup-workshop.sh -o setup-workshop.sh && \
chmod +x setup-workshop.sh

Run the setup-workshop.sh script which will configure the correct permissions and also create the initial configurations for the Agent and the Gateway:

./setup-workshop.sh

███████╗██████╗ ██╗     ██╗   ██╗███╗   ██╗██╗  ██╗    ██╗
██╔════╝██╔══██╗██║     ██║   ██║████╗  ██║██║ ██╔╝    ╚██╗
███████╗██████╔╝██║     ██║   ██║██╔██╗ ██║█████╔╝      ╚██╗
╚════██║██╔═══╝ ██║     ██║   ██║██║╚██╗██║██╔═██╗      ██╔╝
███████║██║     ███████╗╚██████╔╝██║ ╚████║██║  ██╗    ██╔╝
╚══════╝╚═╝     ╚══════╝ ╚═════╝ ╚═╝  ╚═══╝╚═╝  ╚═╝    ╚═╝

Welcome to the Splunk Advanced OpenTelemetry Workshop!
======================================================

macOS detected. Removing quarantine attributes...
otelcol version v0.126.0
Usage: loadgen [OPTIONS]
Options:
  -base       Send base traces (enabled by default)
  -health     Send health traces
  -security   Send security traces
  -logs       Enable logging of random quotes to quotes.log
  -json       Output logs in JSON format (only applicable with -logs)
  -count      Number of traces or logs to send (default: infinite)
  -h, --help  Display this help message

Example:
  loadgen -health -security -count 10   Send 10 health and security traces
  loadgen -logs -json -count 5          Write 5 random quotes in JSON format to quotes.log
Creating workshop directories...
✓ Created subdirectories:
  ├── 1-agent-gateway
  ├── 2-building-resilience
  ├── 3-dropping-spans
  ├── 4-sensitive-data
  ├── 5-transform-data
  ├── 6-routing-data
  └── 7-sum-count

Creating configuration files for 1-agent-gateway...
Creating OpenTelemetry Collector agent configuration file: 1-agent-gateway/agent.yaml
✓ Configuration file created successfully: 1-agent-gateway/agent.yaml
✓ File size:     4355 bytes

Creating OpenTelemetry Collector gateway configuration file: 1-agent-gateway/gateway.yaml
✓ Configuration file created successfully: 1-agent-gateway/gateway.yaml
✓ File size:     3376 bytes

✓ Completed configuration files for 1-agent-gateway

Creating configuration files for 2-building-resilience...
Creating OpenTelemetry Collector agent configuration file: 2-building-resilience/agent.yaml
✓ Configuration file created successfully: 2-building-resilience/agent.yaml
✓ File size:     4355 bytes

Creating OpenTelemetry Collector gateway configuration file: 2-building-resilience/gateway.yaml
✓ Configuration file created successfully: 2-building-resilience/gateway.yaml
✓ File size:     3376 bytes

✓ Completed configuration files for 2-building-resilience

Workshop environment setup complete!
Configuration files created in the following directories:
  1-agent-gateway/
    ├── agent.yaml
    └── gateway.yaml
  2-building-resilience/
    ├── agent.yaml
    └── gateway.yaml

[WORKSHOP]
├── 1-agent-gateway
├── 2-building-resilience
├── 3-dropping-spans
├── 4-sensitive-data
├── 5-transform-data
├── 6-routing-data
├── 7-sum-count
├── loadgen
├── otelcol
└── setup-workshop.sh

1. Verify Agent Configuration

15 minutes

Welcome! In this section, we’ll begin with a fully functional OpenTelemetry setup that includes both an Agent and a Gateway.

We’ll start by quickly reviewing their configuration files to get familiar with the overall structure and to highlight key sections that control the telemetry pipeline.

Tip

Throughout the workshop, you’ll work with multiple terminal windows. To keep things organized, give each terminal a unique name or color. This will help you easily recognize and switch between them during the exercises.

We will refer to these terminals as: Agent, Gateway, Loadgen, and Test.

Exercise

Create your first terminal window and name it Agent. Navigate to the directory for the first exercise [WORKSHOP]/1-agent-gateway and verify that the required files have been generated:
```
cd 1-agent-gateway
ls -l
```
You should see the following files in the directory. If not, re-run the setup-workshop.sh script as described in the Pre-requisites section:
. ├── agent.yaml └── gateway.yaml

Understanding the Agent configuration

Let’s review the key components of the agent.yaml file used in this workshop. We’ve made some important additions to support metrics, traces, and logs.

Receivers

The receivers section defines how the Agent ingests telemetry data. In this setup, three types of receivers have been configured:

Host Metrics Receiver

hostmetrics:                         # Host Metrics Receiver
  collection_interval: 3600s         # Collection Interval (1hr)
  scrapers:
    cpu:                             # CPU Scraper

Collects CPU usage from the local system every hour. We’ll use this to generate example metric data.

OTLP Receiver (HTTP protocol)

otlp:                                # OTLP Receiver
  protocols:
    http:                            # Configure HTTP protocol
      endpoint: "0.0.0.0:4318"       # Endpoint to bind to

Enables the agent to receive metrics, traces, and logs over HTTP on port 4318. This is used to send data to the collector in future exercises.

FileLog Receiver

filelog/quotes:                      # Receiver Type/Name
  include: ./quotes.log              # The file to read log data from
  include_file_path: true            # Include file path in the log data
  include_file_name: false           # Exclude file name from the log data
  resource:                          # Add custom resource attributes
    com.splunk.source: ./quotes.log  # Source of the log data
    com.splunk.sourcetype: quotes    # Source type of the log data

Enables the agent to tail a local log file (quotes.log) and convert it to structured log events, enriched with metadata such as source and sourceType.

Exporters

Debug Exporter

  debug:                               # Exporter Type
    verbosity: detailed                # Enabled detailed debug output

OTLPHTTP Exporter
```
  otlphttp:                            # Exporter Type
    endpoint: "http://localhost:5318"  # Gateway OTLP endpoint  
```
The debug exporter sends data to the console for visibility and debugging during the workshop while the otlphttp exporter forwards all telemetry to the local Gateway instance.
This dual-export strategy ensures you can see the raw data locally while also sending it downstream for further processing and export.

1.1 Verify Gateway Configuration

The OpenTelemetry Gateway serves as a central hub for receiving, processing, and exporting telemetry data. It sits between your telemetry sources (such as applications and services) and your observability backends like Splunk Observability Cloud.

By centralizing telemetry traffic, the gateway enables advanced features such as data filtering, enrichment, transformation, and routing to one or more destinations. It helps reduce the burden on individual services by offloading telemetry processing and ensures consistent, standardized data across distributed systems.

This makes your observability pipeline easier to manage, scale, and analyze—especially in complex, multi-service environments.

Exercise

Open or create your second terminal window and name it Gateway. Navigate to the first exercise directory [WORKSHOP]/1-agent-gateway then check the contents of the gateway.yaml file.

This file outlines the core structure of the OpenTelemetry Collector as deployed in Gateway mode.

Understanding the Gateway Configuration

Let’s explore the gateway.yaml file that defines how the OpenTelemetry Collector is configured in Gateway mode during this workshop. This Gateway is responsible for receiving telemetry from the Agent, then processing and exporting it for inspection or forwarding.

OTLP Receiver (Custom Port)
```
receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:5318"
```
The port 5318 matches the otlphttp exporter in the Agent configuration, ensuring that all telemetry data sent by the Agent is accepted by the Gateway.

Note

This separation of ports avoids conflicts and keeps responsibilities clear between agent and gateway roles.

File Exporters

The Gateway uses three file exporters to output telemetry data to local files. These exporters are defined as:

exporters:                        # List of exporters
  debug:                          # Debug exporter
    verbosity: detailed           # Enable detailed debug output
  file/traces:                    # Exporter Type/Name
    path: "./gateway-traces.out"  # Path for OTLP JSON output for traces
    append: false                 # Overwrite the file each time
  file/metrics:                   # Exporter Type/Name
    path: "./gateway-metrics.out" # Path for OTLP JSON output for metrics
    append: false                 # Overwrite the file each time
  file/logs:                      # Exporter Type/Name
    path: "./gateway-logs.out"    # Path for OTLP JSON output for logs
    append: false                 # Overwrite the file each time

Each exporter writes a specific signal type to its corresponding file.

These files are created once the gateway is started and will be populated with real telemetry as the agent sends data. You can monitor these files in real time to observe the flow of telemetry through your pipeline.

1.2 Validate & Test Configuration

Now, we can start the Gateway and the Agent, which is configured to automatically send Host Metrics at startup. We do this to verify that data is properly routed from the Agent to the Gateway.

Exercise

Gateway: In the Gateway terminal window, run the following command to start the Gateway:

../otelcol --config=gateway.yaml

If everything is configured correctly, the collector will start and state Everything is ready. Begin running and processing data. in the output, similar to the following:

2025-06-09T09:22:11.944+0100    info    service@v0.126.0/service.go:289 Everything is ready. Begin running and processing data. {"resource": {}}

Once the Gateway is running, it will listen for incoming data on port 5318 and export the received data to the following files:

gateway-traces.out
gateway-metrics.out
gateway-logs.out

Start the Agent: In the Agent terminal window start the agent with the agent configuration:

../otelcol --config=agent.yaml

Verify CPU Metrics:

Check that when the Agent starts, it immediately starts sending CPU metrics.
Both the Agent and the Gateway will display this activity in their debug output. The output should resemble the following snippet:

<snip>
NumberDataPoints #31
Data point attributes:
     -> cpu: Str(cpu3)
     -> state: Str(wait)
StartTimestamp: 2025-07-07 16:49:42 +0000 UTC
Timestamp: 2025-07-09 09:36:21.190226459 +0000 UTC
Value: 77.380000
        {"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "metrics"}

At this stage, the Agent continues to collect CPU metrics once per hour or upon each restart and sends them to the gateway. The Gateway processes these metrics and exports them to a file named gateway-metrics.out. This file stores the exported metrics as part of the pipeline service.

Verify Data arrived at Gateway: To confirm that CPU metrics, specifically for cpu0, have successfully reached the gateway, we’ll inspect the gateway-metrics.out file using the jq command.

The following command filters and extracts the system.cpu.time metric, focusing on cpu0. It displays the metric’s state (e.g., user, system, idle, interrupt) along with the corresponding values.

Open or create your third terminal window and name it Tests. Run the command below in the Tests terminal to check the system.cpu.time metric:

jq '.resourceMetrics[].scopeMetrics[].metrics[] | select(.name == "system.cpu.time") | .sum.dataPoints[] | select(.attributes[0].value.stringValue == "cpu0") | {cpu: .attributes[0].value.stringValue, state: .attributes[1].value.stringValue, value: .asDouble}' gateway-metrics.out

{
  "cpu": "cpu0",
  "state": "user",
  "value": 123407.02
}
{
  "cpu": "cpu0",
  "state": "system",
  "value": 64866.6
}
{
  "cpu": "cpu0",
  "state": "idle",
  "value": 216427.87
}
{
  "cpu": "cpu0",
  "state": "interrupt",
  "value": 0
}

Important

Stop the Agent and the Gateway processes by pressing Ctrl-C in their respective terminals.

2. Building In Resilience

10 minutes

The OpenTelemetry Collector’s FileStorage Extension is a critical component for building a more resilient telemetry pipeline. It enables the Collector to reliably checkpoint in-flight data, manage retries efficiently, and gracefully handle temporary failures without losing valuable telemetry.

With FileStorage enabled, the Collector can persist intermediate states to disk, ensuring that your traces, metrics, and logs are not lost during network disruptions, backend outages, or Collector restarts. This means that even if your network connection drops or your backend becomes temporarily unavailable, the Collector will continue to receive and buffer telemetry, resuming delivery seamlessly once connectivity is restored.

By integrating the FileStorage Extension into your pipeline, you can strengthen the durability of your observability stack and maintain high-quality telemetry ingestion, even in environments where connectivity may be unreliable.

Note

This solution will work for metrics as long as the connection downtime is brief, up to 15 minutes. If the downtime exceeds this, Splunk Observability Cloud might drop data to make sure no data-point is out of order.

For logs, there are plans to implement a full enterprise-ready solution in one of the upcoming Splunk OpenTelemetry Collector releases.

2.1 File Storage Configuration

In this exercise, we will update the extensions: section of the agent.yaml file. This section is part of the OpenTelemetry configuration YAML and defines optional components that enhance or modify the OpenTelemetry Collector’s behavior.

While these components do not process telemetry data directly, they provide valuable capabilities and services to improve the Collector’s functionality.

Exercise

Important

Change ALL terminal windows to the 2-building-resilience directory and run the clear command.

Your directory structure will look like this:

.
├── agent.yaml
└── gateway.yaml

Update the agent.yaml: In the Agent terminal window, add the file_storage extension under the existing health_check extension:

  file_storage/checkpoint:             # Extension Type/Name
    directory: "./checkpoint-dir"      # Define directory
    create_directory: true             # Create directory
    timeout: 1s                        # Timeout for file operations
    compaction:                        # Compaction settings
      on_start: true                   # Start compaction at Collector startup
      # Define compaction directory
      directory: "./checkpoint-dir/tmp"
      max_transaction_size: 65536      # Max. size limit before compaction occurs

Add file_storage to the exporter: Modify the otlphttp exporter to configure retry and queuing mechanisms, ensuring data is retained and resent if failures occur. Add the following under the endpoint: "http://localhost:5318" and make sure the indentation matches endpoint:

    retry_on_failure:
      enabled: true                    # Enable retry on failure
    sending_queue:                     # 
      enabled: true                    # Enable sending queue
      num_consumers: 10                # No. of consumers
      queue_size: 10000                # Max. queue size
      storage: file_storage/checkpoint # File storage extension

Update the services section: Add the file_storage/checkpoint extension to the existing extensions: section and the configuration needs to look like this:

service:
  extensions:
  - health_check
  - file_storage/checkpoint            # Enabled extensions for this collector

Update the metrics pipeline: For this exercise we are going to comment out the hostmetrics receiver from the Metric pipeline to reduce debug and log noise, again the configuration needs to look like this:

    metrics:
      receivers:
      # - hostmetrics                    # Hostmetric reciever (cpu only)
      - otlp

Validate the Agent configuration using otelbin.io. For reference, the metrics: section of your pipelines will look similar to this:

%%{init:{"fontFamily":"monospace"}}%%
graph LR
    %% Nodes
      REC1(&nbsp;&nbsp;otlp&nbsp;&nbsp;<br>fa:fa-download):::receiver
      REC2(filelog<br>fa:fa-download):::receiver
      PRO1(memory_limiter<br>fa:fa-microchip):::processor
      PRO2(resourcedetection<br>fa:fa-microchip):::processor
      PRO3(resource<br>fa:fa-microchip<br>add_mode):::processor
      EXP1(&ensp;debug&ensp;<br>fa:fa-upload):::exporter
      EXP2(otlphttp<br>fa:fa-upload):::exporter
      EXP3(&ensp;file&ensp;<br>fa:fa-upload):::exporter
    %% Links
    subID1:::sub-metrics
    subgraph " "
      subgraph subID1[**Metrics**]
      direction LR
      REC1 --> PRO1
      REC2 --> PRO1
      PRO1 --> PRO2
      PRO2 --> PRO3
      PRO3 --> EXP1
      PRO3 --> EXP3
      PRO3 --> EXP2
      end
    end
classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
classDef sub-metrics stroke:#38bdf8,stroke-width:1px, color:#38bdf8,stroke-dasharray: 3 3;

2.2 Setup environment for Resilience Testing

Next, we will configure our environment to be ready for testing the File Storage configuration.

Exercise

Start the Gateway: In the Gateway terminal window run:

../otelcol --config=gateway.yaml

Start the Agent: In the Agent terminal window run:

../otelcol --config=agent.yaml

Send five test spans: In the Loadgen terminal window run:

../loadgen -count 5

Both the Agent and Gateway should display debug logs, and the Gateway should create a ./gateway-traces.out file.

If everything functions correctly, we can proceed with testing system resilience.

2.3 Simulate Failure

To assess the Agent’s resilience, we’ll simulate a temporary Gateway outage and observe how the Agent handles it:

Exercise

Simulate a network failure: In the Gateway terminal stop the Gateway with Ctrl-C and wait until the gateway console shows that it has stopped. The Agent will continue running, but it will not be able to send data to the gateway. The output in the Gateway terminal should look similar to this:

2025-07-09T10:22:37.941Z        info    service@v0.126.0/service.go:345 Shutdown complete.      {"resource": {}}

Send traces: In the Loadgen terminal window send five more traces using the loadgen.

../loadgen -count 5

Notice that the agent’s retry mechanism is activated as it continuously attempts to resend the data. In the agent’s console output, you will see repeated messages similar to the following:

2025-01-28T14:22:47.020+0100  info  internal/retry_sender.go:126  Exporting failed. Will retry the request after interval.  {"kind": "exporter", "data_type": "traces", "name": "otlphttp", "error": "failed to make an HTTP request: Post \"http://localhost:5318/v1/traces\": dial tcp 127.0.0.1:5318: connect: connection refused", "interval": "9.471474933s"}

Stop the Agent: In the Agent terminal window, use Ctrl-C to stop the agent. Wait until the agent’s console confirms it has stopped:

2025-07-09T10:25:59.344Z        info    service@v0.126.0/service.go:345 Shutdown complete.      {"resource": {}}

Important

When you stop the agent, any metrics, traces, or logs held in memory for retry will be lost. However, because we have configured the FileStorage Extension, all telemetry that has not yet been accepted by the target endpoint are safely checkpointed on disk.

Stopping the agent is a crucial step to clearly demonstrate how the system recovers when the agent is restarted.

2.4 Recovery

In this exercise, we’ll test how the OpenTelemetry Collector recovers from a network outage by restarting the Gateway collector. When the Gateway becomes available again, the Agent will resume sending data from its last check-pointed state, ensuring no data loss.

Exercise

Restart the Gateway: In the Gateway terminal window run:

../otelcol --config=gateway.yaml

Restart the Agent: In the Agent terminal window run:

../otelcol --config=agent.yaml

After the Agent is up and running, the File_Storage extension will detect buffered data in the checkpoint folder. It will start to dequeue the stored spans from the last checkpoint folder, ensuring no data is lost.

Verify the Agent Debug output: Note that the Agent debug output does NOT change and still shows the following line indicating no new data is being exported:

2025-07-11T08:31:58.176Z        info    service@v0.126.0/service.go:289 Everything is ready. Begin running and processing data.   {"resource": {}}

Watch the Gateway Debug output
You should see from the Gateway debug screen, it has started receiving the previously missed traces without requiring any additional action on your part e.g.:

Attributes:
   -> user.name: Str(Luke Skywalker)
   -> user.phone_number: Str(+1555-867-5309)
   -> user.email: Str(george@deathstar.email)
   -> user.password: Str(LOTR>StarWars1-2-3)
   -> user.visa: Str(4111 1111 1111 1111)
   -> user.amex: Str(3782 822463 10005)
   -> user.mastercard: Str(5555 5555 5555 4444)
   -> payment.amount: Double(75.75)
      {"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "traces"}

Check the gateway-traces.out file: Using jq, count the number of traces in the recreated gateway-traces.out. It should match the number you send when the Gateway was down.

jq '.resourceSpans | length | "\(.) resourceSpans found"' gateway-traces.out

"5 resourceSpans found"

Important

Stop the Agent and the Gateway processes by pressing Ctrl-C in their respective terminals.

Conclusion

This exercise demonstrated how to enhance the resilience of the OpenTelemetry Collector by configuring the file_storage extension, enabling retry mechanisms for the otlp exporter, and using a file-backed queue for temporary data storage.

By implementing file-based check-pointing and queue persistence, you ensure the telemetry pipeline can gracefully recover from temporary interruptions, making it a more robust and reliable for production environments.

3. Dropping Spans

5 minutes

In this section, we will explore how to use the Filter Processor to selectively drop spans based on certain conditions.

Specifically, we will drop traces based on the span name, which is commonly used to filter out unwanted spans such as health checks or internal communication traces. In this case, we will be filtering out spans that contain "/_healthz", typically associated with health check requests and usually are quite “noisy”.

Exercise

Important

Change ALL terminal windows to the 3-dropping-spans directory and run the clear command.

Copy *.yaml from the 2-building-resilience directory into 3-dropping-spans. Your updated directory structure will now look like this:

.
├── agent.yaml
└── gateway.yaml

Next, we will configure the filter processor and the respective pipelines.

3.1 Configuration

Exercise

Switch to your Gateway terminal window and open the gateway.yaml file. Update the processors section with the following configuration:

Add a filter processor:
Configure the gateway to exclude spans with the name /_healthz. The error_mode: ignore directive ensures that any errors encountered during filtering are ignored, allowing the pipeline to continue running smoothly. The traces section defines the filtering rules, specifically targeting spans named /_healthz for exclusion.

  filter/health:                       # Defines a filter processor
    error_mode: ignore                 # Ignore errors
    traces:                            # Filtering rules for traces
      span:                            # Exclude spans named "/_healthz"
       - 'name == "/_healthz"'

Add the filter processor to the traces pipeline:
Include the filter/health processor in the traces pipeline. For optimal performance, place the filter as early as possible—right after the memory_limiter and before the batch processor. Here’s how the configuration should look:

  traces:
    receivers:
      - otlp
    processors:
      - memory_limiter
      - filter/health             # Filters data based on rules
      - resource/add_mode
      - batch
    exporters:
      - debug
      - file/traces

This setup ensures that health check related spans (/_healthz) are filtered out early in the pipeline, reducing unnecessary noise in your telemetry data.

Validate the agent configuration using otelbin.io. For reference, the traces: section of your pipelines will look similar to this:

%%{init:{"fontFamily":"monospace"}}%%
graph LR
    %% Nodes
      REC1(&nbsp;&nbsp;otlp&nbsp;&nbsp;<br>fa:fa-download):::receiver
      PRO1(memory_limiter<br>fa:fa-microchip):::processor
      PRO3(resource<br>fa:fa-microchip<br>add_mode):::processor
      PRO4(filter<br>fa:fa-microchip<br>health):::processor
      PRO5(batch<br>fa:fa-microchip):::processor
      EXP1(&ensp;debug&ensp;<br>fa:fa-upload):::exporter
      EXP2(&ensp;&ensp;file&ensp;&ensp;<br>fa:fa-upload<br>traces):::exporter
    %% Links
    subID1:::sub-traces
    subgraph " "
      subgraph subID1[**Traces**]
      direction LR
      REC1 --> PRO1
      PRO1 --> PRO4
      PRO4 --> PRO3
      PRO3 --> PRO5
      PRO5 --> EXP1
      PRO5 --> EXP2
      end
    end
classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
classDef sub-traces stroke:#fbbf24,stroke-width:1px, color:#fbbf24,stroke-dasharray: 3 3;

3.2 Test Filter Processor

To test your configuration, you’ll need to generate some trace data that includes a span named "/_healthz".

Exercise

Start the Gateway: In your Gateway terminal window start the Gateway.

../otelcol --config ./gateway.yaml

Start the Agent: In your Agent terminal window start the Agent.

../otelcol --config ./agent.yaml

Start the Loadgen: In the Loadgen terminal window, execute the following command to start the load generator with health check spans enabled:

../loadgen -health -count 5

The debug output in the Agent terminal will show _healthz spans:

InstrumentationScope healthz 1.0.0
Span #0
   Trace ID       : 0cce8759b5921c8f40b346b2f6e2f4b6
   Parent ID      :
   ID             : bc32bd0e4ddcb174
   Name           : /_healthz
   Kind           : Server
   Start time     : 2025-07-11 08:47:50.938703979 +0000 UTC
   End time       : 2025-07-11 08:47:51.938704091 +0000 UTC
   Status code    : Ok
   Status message : Success

They will not be present in the Gateway debug as they are dropped by the filter processor that was configured earlier.

Verify agent.out: Using jq, in the Test terminal, confirm the name of the spans received by the Agent:

jq -c '.resourceSpans[].scopeSpans[].spans[] | "Span \(input_line_number) found with name \(.name)"' ./agent.out

"Span 1 found with name /movie-validator"
"Span 2 found with name /_healthz"
"Span 3 found with name /movie-validator"
"Span 4 found with name /_healthz"
"Span 5 found with name /movie-validator"
"Span 6 found with name /_healthz"
"Span 7 found with name /movie-validator"
"Span 8 found with name /_healthz"
"Span 9 found with name /movie-validator"
"Span 10 found with name /_healthz"

Check the Gateway Debug output: Using jq confirm the name of the spans received by the Gateway:

jq -c '.resourceSpans[].scopeSpans[].spans[] | "Span \(input_line_number) found with name \(.name)"' ./gateway-traces.out

The gateway-metrics.out file will not contain any spans named /_healthz.

"Span 1 found with name /movie-validator"
"Span 2 found with name /movie-validator"
"Span 3 found with name /movie-validator"
"Span 4 found with name /movie-validator"
"Span 5 found with name /movie-validator"

Tip

To ensure optimal performance with the Filter processor, thoroughly understand your incoming data format and rigorously test your configuration. Use the most specific filtering criteria possible to minimize the risk of inadvertently dropping important data.

This configuration can be extended to filter spans based on various attributes, tags, or custom criteria, enhancing the OpenTelemetry Collector’s flexibility and efficiency for your specific observability requirements.

Important

Stop the Agent and the Gateway processes by pressing Ctrl-C in their respective terminals.

4. Redacting Sensitive Data

10 minutes

In this section, you’ll learn how to configure the OpenTelemetry Collector to remove specific tags and redact sensitive data from telemetry spans. This is crucial for protecting sensitive information such as credit card numbers, personal data, or other security-related details that must be anonymized before being processed or exported.

We’ll walk through configuring key processors in the OpenTelemetry Collector, including:

Attributes Processor: Modifies or removes specific span attributes.
Redaction Processor: Ensures sensitive data is sanitized before being stored or transmitted.

Exercise

Important

Change ALL terminal windows to the 4-sensitive-data directory and run the clear command.

Copy *.yaml from the 3-dropping-spans directory into 4-sensitive-data. Your updated directory structure will now look like this:

.
├── agent.yaml
└── gateway.yaml

4.1 Configuration

In this step, we’ll modify agent.yaml to include the attributes and redaction processors. These processors will help ensure that sensitive data within span attributes is properly handled before being logged or exported.

Previously, you may have noticed that some span attributes displayed in the console contained personal and sensitive data. We’ll now configure the necessary processors to filter out and redact this information effectively.

Attributes:
     -> user.name: Str(George Lucas)
     -> user.phone_number: Str(+1555-867-5309)
     -> user.email: Str(george@deathstar.email)
     -> user.account_password: Str(LOTR>StarWars1-2-3)
     -> user.visa: Str(4111 1111 1111 1111)
     -> user.amex: Str(3782 822463 10005)
     -> user.mastercard: Str(5555 5555 5555 4444)
  {"kind": "exporter", "data_type": "traces", "name": "debug"}

Exercise

Switch to your Agent terminal window and open the agent.yaml file in your editor. We’ll add two processors to enhance the security and privacy of your telemetry data.

1. Add an attributes Processor: The Attributes Processor allows you to modify span attributes (tags) by updating, deleting, or hashing their values. This is particularly useful for obfuscating sensitive information before it is exported.

In this step, we’ll:

Update the user.phone_number attribute to a static value ("UNKNOWN NUMBER").
Hash the user.email attribute to ensure the original email is not exposed.
Delete the user.password attribute to remove it entirely from the span.

  attributes/update:
    actions:                           # Actions
      - key: user.phone_number         # Target key
        action: update                 # Update action
        value: "UNKNOWN NUMBER"        # New value
      - key: user.email                # Target key
        action: hash                   # Hash the email value
      - key: user.password             # Target key
        action: delete                 # Delete the password

2. Add a redaction Processor: The Redaction Processor detects and redacts sensitive data in span attributes based on predefined patterns, such as credit card numbers or other personally identifiable information (PII).

In this step:

We set allow_all_keys: true to ensure all attributes are processed (if set to false, only explicitly allowed keys are retained).
We define blocked_values with regular expressions to detect and redact Visa and MasterCard credit card numbers.
The summary: debug option logs detailed information about the redaction process for debugging purposes.

  redaction/redact:
    allow_all_keys: true               # If false, only allowed keys will be retained
    blocked_values:                    # List of regex patterns to block
      - '\b4[0-9]{3}[\s-]?[0-9]{4}[\s-]?[0-9]{4}[\s-]?[0-9]{4}\b'       # Visa
      - '\b5[1-5][0-9]{2}[\s-]?[0-9]{4}[\s-]?[0-9]{4}[\s-]?[0-9]{4}\b'  # MasterCard
    summary: debug                     # Show debug details about redaction

Update the traces Pipeline: Integrate both processors into the traces pipeline. Make sure that you comment out the redaction processor at first (we will enable it later in a separate exercise). Your configuration should look like this:

    traces:
      receivers:
      - otlp
      processors:
      - memory_limiter
      - attributes/update              # Update, hash, and remove attributes
      #- redaction/redact               # Redact sensitive fields using regex
      - resourcedetection
      - resource/add_mode
      - batch
      exporters:
      - debug
      - file
      - otlphttp

Validate the agent configuration using otelbin.io. For reference, the traces: section of your pipelines will look similar to this:

%%{init:{"fontFamily":"monospace"}}%%
graph LR
    %% Nodes
      REC1(&nbsp;&nbsp;otlp&nbsp;&nbsp;<br>fa:fa-download):::receiver
      PRML(memory_limiter<br>fa:fa-microchip):::processor
      PRRD(resourcedetection<br>fa:fa-microchip):::processor
      PRRS(resource<br>fa:fa-microchip<br>add_mode):::processor
      PRUP(attributes<br>fa:fa-microchip<br>update):::processor
      EXP1(otlphttp<br>fa:fa-upload):::exporter
      EXP2(&ensp;&ensp;debug&ensp;&ensp;<br>fa:fa-upload):::exporter
      EXP3(file<br>fa:fa-upload):::exporter

    %% Links
    subID1:::sub-traces
    subgraph " "
      subgraph subID1[**Traces**]
      direction LR
      REC1 --> PRML
      PRML --> PRUP
      PRUP --> PRRD
      PRRD --> PRRS
      PRRS --> EXP2
      PRRS --> EXP3
      PRRS --> EXP1
      end
    end
classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
classDef sub-traces stroke:#fbbf24,stroke-width:1px, color:#fbbf24,stroke-dasharray: 3 3;

4.2 Test Attribute Processor

In this exercise, we will delete the user.account_password, update the user.phone_number attribute and hash the user.email in the span data before it is exported by the Agent.

Exercise

Start the Gateway: In your Gateway terminal window start the Gateway.

../otelcol --config=gateway.yaml

Start the Agent: In your Agent terminal window start the Agent.

../otelcol --config=agent.yaml

Start the Load Generator: In the Loadgen terminal window start the loadgen:

../loadgen -count 1

Check the debug output: For both the Agent and Gateway confirm that user.account_password has been removed, and both user.phone_number & user.email have been updated:

   -> user.name: Str(George Lucas)
   -> user.phone_number: Str(UNKNOWN NUMBER)
   -> user.email: Str(62d5e03d8fd5808e77aee5ebbd90cf7627a470ae0be9ffd10e8025a4ad0e1287)
   -> payment.amount: Double(51.71)
   -> user.visa: Str(4111 1111 1111 1111)
   -> user.amex: Str(3782 822463 10005)
   -> user.mastercard: Str(5555 5555 5555 4444)

    -> user.name: Str(George Lucas)
    -> user.phone_number: Str(+1555-867-5309)
    -> user.email: Str(george@deathstar.email)
    -> user.password: Str(LOTR>StarWars1-2-3)
    -> user.visa: Str(4111 1111 1111 1111)
    -> user.amex: Str(3782 822463 10005)
    -> user.mastercard: Str(5555 5555 5555 4444)
    -> payment.amount: Double(95.22)

Check file output: Using jq validate that user.account_password has been removed, and user.phone_number & user.email have been updated in gateway-taces.out:

jq '.resourceSpans[].scopeSpans[].spans[].attributes[] | select(.key == "user.password" or .key == "user.phone_number" or .key == "user.email") | {key: .key, value: .value.stringValue}' ./gateway-traces.out

Notice that the user.account_password has been removed, and the user.phone_number & user.email have been updated:

{
  "key": "user.phone_number",
  "value": "UNKNOWN NUMBER"
}
{
  "key": "user.email",
  "value": "62d5e03d8fd5808e77aee5ebbd90cf7627a470ae0be9ffd10e8025a4ad0e1287"
}

Important

Stop the Agent and the Gateway processes by pressing Ctrl-C in their respective terminals.

4.3 Test Redaction Processor

The redaction processor gives precise control over which attributes and values are permitted or removed from telemetry data.

In this exercise, we will redact the user.visa & user.mastercard values in the span data before it is exported by the Agent.

Exercise

Start the Gateway: In your Gateway terminal window start the Gateway.

../otelcol --config=gateway.yaml

Enable the redaction/redact processor: In the Agent terminal window, edit agent.yaml and remove the # we inserted in the previous exercise.

    traces:
      receivers:
      - otlp
      processors:
      - memory_limiter
      - attributes/update              # Update, hash, and remove attributes
      - redaction/redact               # Redact sensitive fields using regex
      - resourcedetection
      - resource/add_mode
      - batch
      exporters:
      - debug
      - file
      - otlphttp

Start the Agent: In your Agent terminal window start the Agent.

../otelcol --config=agent.yaml

Start the Load Generator: In the Loadgen terminal window start the loadgen:

../loadgen -count 1

Check the debug output: For both the Agent and Gateway confirm the values for user.visa & user.mastercard have been updated. Notice user.amex attribute value was NOT redacted because a matching regex pattern was not added to blocked_values

   -> user.name: Str(George Lucas)
   -> user.phone_number: Str(UNKNOWN NUMBER)
   -> user.email: Str(62d5e03d8fd5808e77aee5ebbd90cf7627a470ae0be9ffd10e8025a4ad0e1287)
   -> payment.amount: Double(69.71)
   -> user.visa: Str(****)
   -> user.amex: Str(3782 822463 10005)
   -> user.mastercard: Str(****)
   -> redaction.masked.keys: Str(user.mastercard,user.visa)
   -> redaction.masked.count: Int(2)

    -> user.name: Str(George Lucas)
    -> user.phone_number: Str(+1555-867-5309)
    -> user.email: Str(george@deathstar.email)
    -> user.password: Str(LOTR>StarWars1-2-3)
    -> user.visa: Str(4111 1111 1111 1111)
    -> user.amex: Str(3782 822463 10005)
    -> user.mastercard: Str(5555 5555 5555 4444)
    -> payment.amount: Double(65.54)

Note

By including summary:debug in the redaction processor, the debug output will include summary information about which matching key values were redacted, along with the count of values that were masked.

     -> redaction.masked.keys: Str(user.mastercard,user.visa)
     -> redaction.masked.count: Int(2)

Check file output: Using jq verify that user.visa & user.mastercard have been updated in the gateway-traces.out.

jq '.resourceSpans[].scopeSpans[].spans[].attributes[] | select(.key == "user.visa" or .key == "user.mastercard" or .key == "user.amex") | {key: .key, value: .value.stringValue}' ./gateway-traces.out

Notice that user.amex has not been redacted because a matching regex pattern was not added to blocked_values:

{
  "key": "user.visa",
  "value": "****"
}
{
  "key": "user.amex",
  "value": "3782 822463 10005"
}
{
  "key": "user.mastercard",
  "value": "****"
}

These are just a couple of examples of how attributes and redaction processors can be configured to protect sensitive data.

Important

Stop the Agent and the Gateway processes by pressing Ctrl-C in their respective terminals.

5. Transform Data

10 minutes

The Transform Processor lets you modify telemetry data—logs, metrics, and traces—as it flows through the pipeline. Using the OpenTelemetry Transformation Language (OTTL), you can filter, enrich, and transform data on the fly without touching your application code.

In this exercise we’ll update gateway.yaml to include a Transform Processor that will:

Filter log resource attributes.
Parse JSON structured log data into attributes.
Set log severity levels based on the log message body.

You may have noticed that in previous logs, fields like SeverityText and SeverityNumber were undefined. This is typical of the filelog receiver. However, the severity is embedded within the log body e.g.:

SeverityText: 
SeverityNumber: Unspecified(0)
Body: Str(2025-01-31 15:49:29 [WARN] - Do or do not, there is no try.)

Logs often contain structured data encoded as JSON within the log body. Extracting these fields into attributes allows for better indexing, filtering, and querying. Instead of manually parsing JSON in downstream systems, OTTL enables automatic transformation at the telemetry pipeline level.

Exercise

Important

Change ALL terminal windows to the 5-transform-data directory and run the clear command.

Copy *.yaml from the 4-sensitve-data directory into 5-transform-data. Your updated directory structure will now look like this:

.
├── agent.yaml
└── gateway.yaml

5.1 Configuration

Exercise

Add a transform processor: Switch to your Gateway terminal window and edit the gateway.yaml and add the following transform processor:

  transform/logs:                      # Processor Type/Name
    log_statements:                    # Log Processing Statements
      - context: resource              # Log Context
        statements:                    # List of attribute keys to keep
          - keep_keys(attributes, ["com.splunk.sourcetype", "host.name", "otelcol.service.mode"])

By using the -context: resource key we are targeting the resourceLog attributes of logs.

This configuration ensures that only the relevant resource attributes (com.splunk.sourcetype, host.name, otelcol.service.mode) are retained, improving log efficiency and reducing unnecessary metadata.

Adding a Context Block for Log Severity Mapping: To properly set the severity_text and severity_number fields of a log record, we add a log context block within log_statements. This configuration extracts the level value from the log body, maps it to severity_text, and assigns the corresponding severity_number based on the log level:

      - context: log                   # Log Context
        statements:                    # Transform Statements Array
          - set(cache, ParseJSON(body)) where IsMatch(body, "^\\{")  # Parse JSON log body into a cache object
          - flatten(cache, "")                                        # Flatten nested JSON structure
          - merge_maps(attributes, cache, "upsert")                   # Merge cache into attributes, updating existing keys
          - set(severity_text, attributes["level"])                   # Set severity_text from the "level" attribute
          - set(severity_number, 1) where severity_text == "TRACE"    # Map severity_text to severity_number
          - set(severity_number, 5) where severity_text == "DEBUG"
          - set(severity_number, 9) where severity_text == "INFO"
          - set(severity_number, 13) where severity_text == "WARN"
          - set(severity_number, 17) where severity_text == "ERROR"
          - set(severity_number, 21) where severity_text == "FATAL"

The merge_maps function is used to combine two maps (dictionaries) into one. In this case, it merges the cache object (containing parsed JSON data from the log body) into the attributes map.

Parameters:
- attributes: The target map where the data will be merged.
- cache: The source map containing the parsed JSON data.
- "upsert": This mode ensures that if a key already exists in the attributes map, its value will be updated with the value from cache. If the key does not exist, it will be inserted.

This step is crucial because it ensures that all relevant fields from the log body (e.g., level, message, etc.) are added to the attributes map, making them available for further processing or exporting.

Summary of Key Transformations:

Parse JSON: Extracts structured data from the log body.
Flatten JSON: Converts nested JSON objects into a flat structure.
Merge Attributes: Integrates extracted data into log attributes.
Map Severity Text: Assigns severity_text from the log’s level attribute.
Assign Severity Numbers: Converts severity levels into standardized numerical values.

Important

You should have a single transform processor containing two context blocks: one whose context is for resource and one whose context is for log.

This configuration ensures that log severity is correctly extracted, standardized, and structured for efficient processing.

Tip

This method of mapping all JSON fields to top-level attributes should only be used for testing and debugging OTTL. It will result in high cardinality in a production scenario.

Update the logs pipeline: Add the transform/logs: processor into the logs: pipeline so your configuration looks like this:

    logs:                         # Logs pipeline
      receivers:
      - otlp                      # OTLP receiver
      processors:                 # Processors for logs
      - memory_limiter
      - resource/add_mode
      - transform/logs
      - batch
      exporters:
      - debug                     # Debug exporter
      - file/logs

Validate the agent configuration using https://otelbin.io. For reference, the logs: section of your pipelines will look similar to this:

%%{init:{"fontFamily":"monospace"}}%%
graph LR
    %% Nodes
      REC1(&nbsp;&nbsp;otlp&nbsp;&nbsp;<br>fa:fa-download):::receiver
      PRO1(memory_limiter<br>fa:fa-microchip):::processor
      PRO3(resource<br>fa:fa-microchip<br>add_mode):::processor
      PRO4(transform<br>fa:fa-microchip<br>logs):::processor
      PRO5(batch<br>fa:fa-microchip):::processor
      EXP1(file<br>fa:fa-upload<br>logs):::exporter
      EXP2(&ensp;&ensp;debug&ensp;&ensp;<br>fa:fa-upload):::exporter
    %% Links
    subID1:::sub-logs
    subgraph " "
      subgraph subID1[**Logs**]
      direction LR
      REC1 --> PRO1
      PRO1 --> PRO3
      PRO3 --> PRO4
      PRO4 --> PRO5
      PRO5 --> EXP2
      PRO5 --> EXP1
      end
    end
classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
classDef sub-logs stroke:#34d399,stroke-width:1px, color:#34d399,stroke-dasharray: 3 3;

5.2 Setup Environment

Exercise

Start the Gateway: In the Gateway terminal run:

../otelcol --config=gateway.yaml

Start the Agent: In the Agent terminal run:

../otelcol --config=agent.yaml

Start the Load Generator: In the Loadgen terminal window, execute the following command to start the load generator with JSON enabled:

../loadgen -logs -json -count 5

The loadgen will write 5 log lines to ./quotes.log in JSON format.

5.3 Test Transform Processor

This test verifies that the com.splunk/source and os.type metadata have been removed from the log resource attributes before being exported by the Agent. Additionally, the test ensures that:

The log body is parsed to extract severity information.
- SeverityText and SeverityNumber are set on the LogRecord.
JSON fields from the log body are promoted to log attributes.

This ensures proper metadata filtering, severity mapping, and structured log enrichment before exporting.

Exercise

Check the debug output: For both the Agent and Gateway confirm that com.splunk/source and os.type have been removed:

Resource attributes:
   -> com.splunk.sourcetype: Str(quotes)
   -> host.name: Str(workshop-instance)
   -> otelcol.service.mode: Str(agent)

Resource attributes:
   -> com.splunk.source: Str(./quotes.log)
   -> com.splunk.sourcetype: Str(quotes)
   -> host.name: Str(workshop-instance)
   -> os.type: Str(linux)
   -> otelcol.service.mode: Str(agent)

For both the Agent and Gateway confirm that SeverityText and SeverityNumber in the LogRecord is now defined with the severity level from the log body. Confirm that the JSON fields from the body can be accessed as top-level log Attributes:

<snip>
SeverityText: WARN
SeverityNumber: Warn(13)
Body: Str({"level":"WARN","message":"Your focus determines your reality.","movie":"SW","timestamp":"2025-03-07 11:17:26"})
Attributes:
     -> log.file.path: Str(quotes.log)
     -> level: Str(WARN)
     -> message: Str(Your focus determines your reality.)
     -> movie: Str(SW)
     -> timestamp: Str(2025-03-07 11:17:26)
</snip>

<snip>
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str({"level":"WARN","message":"Your focus determines your reality.","movie":"SW","timestamp":"2025-03-07 11:17:26"})
Attributes:
     -> log.file.path: Str(quotes.log)
</snip>

Check file output: In the new gateway-logs.out file verify the data has been transformed:

jq '[.resourceLogs[].scopeLogs[].logRecords[] | {severityText, severityNumber, body: .body.stringValue}]' gateway-logs.out

[
  {
    "severityText": "DEBUG",
    "severityNumber": 5,
    "body": "{\"level\":\"DEBUG\",\"message\":\"All we have to decide is what to do with the time that is given us.\",\"movie\":\"LOTR\",\"timestamp\":\"2025-03-07 11:56:29\"}"
  },
  {
    "severityText": "WARN",
    "severityNumber": 13,
    "body": "{\"level\":\"WARN\",\"message\":\"The Force will be with you. Always.\",\"movie\":\"SW\",\"timestamp\":\"2025-03-07 11:56:29\"}"
  },
  {
    "severityText": "ERROR",
    "severityNumber": 17,
    "body": "{\"level\":\"ERROR\",\"message\":\"One does not simply walk into Mordor.\",\"movie\":\"LOTR\",\"timestamp\":\"2025-03-07 11:56:29\"}"
  },
  {
    "severityText": "DEBUG",
    "severityNumber": 5,
    "body": "{\"level\":\"DEBUG\",\"message\":\"Do or do not, there is no try.\",\"movie\":\"SW\",\"timestamp\":\"2025-03-07 11:56:29\"}"
  }
]
[
  {
    "severityText": "ERROR",
    "severityNumber": 17,
    "body": "{\"level\":\"ERROR\",\"message\":\"There is some good in this world, and it's worth fighting for.\",\"movie\":\"LOTR\",\"timestamp\":\"2025-03-07 11:56:29\"}"
  }
]

Important

Stop the Agent and the Gateway processes by pressing Ctrl-C in their respective terminals.

6. Routing Data

10 minutes

The Routing Connector in OpenTelemetry is a powerful feature that allows you to direct data (traces, metrics, or logs) to different pipelines/destinations based on specific criteria. This is especially useful in scenarios where you want to apply different processing or exporting logic to subsets of your telemetry data.

For example, you might want to send production data to one exporter while directing test or development data to another. Similarly, you could route certain spans based on their attributes, such as service name, environment, or span name, to apply custom processing or storage logic.

Exercise

Important

Change ALL terminal windows to the 6-routing-data directory and run the clear command.

Copy *.yaml from the 5-transform-data directory into 6-routing-data. Your updated directory structure will now look like this:

.
├── agent.yaml
└── gateway.yaml

Next, we will configure the routing connector and the respective pipelines.

6.1 Configure the Routing Connector

In this exercise, you will configure the Routing Connector in the gateway.yaml. The Routing Connector can route metrics, traces, and logs based on any attributes, we will focus exclusively on trace routing based on the deployment.environment attribute (though any span/log/metirc attribute can be used).

Exercise

Add new file exporters: The routing connector requires different targets for routing. In the Gateway terminal create two new file exporters, file/traces/route1-regular and file/traces/route2-security, to ensure data is directed correctly in the exporters section of the gateway.yaml:

  file/traces/route1-regular:                     # Exporter for regular traces
    path: "./gateway-traces-route1-regular.out"   # Path for saving trace data
    append: false                                 # Overwrite the file each time
  file/traces/route2-security:                    # Exporter for security traces
    path: "./gateway-traces-route2-security.out"  # Path for saving trace data
    append: false                                 # Overwrite the file each time

Enable Routing by adding the routing connector. In OpenTelemetry configuration files, connectors have their own dedicated section, similar to receivers and processors.

Find and uncomment the #connectors: section. Then, add the following below the connectors: section:

  routing:
    default_pipelines: [traces/route1-regular]  # Default pipeline if no rule matches
    error_mode: ignore                          # Ignore errors in routing
    table:                                      # Define routing rules
      # Routes spans to a target pipeline if the resourceSpan attribute matches the rule
      - statement: route() where attributes["deployment.environment"] == "security-applications"
        pipelines: [traces/route2-security]     # Security target pipeline

The default pipeline in the configuration file works at a Catch all. It will be the routing target for any data (spans in our case) that do not match a rule in the routing rules table, In this table you find the pipeline that is the target for any span that matches the following rule: ["deployment.environment"] == "security-applications"

With the routing configuration complete, the next step is to configure the pipelines to apply these routing rules.

6.2 Configuring the Pipelines

Exercise

Update the original traces pipeline to use routing:

To enable routing, update the original traces pipeline to use routing as the only exporter. This ensures all span data is sent through the Routing Connector for evaluation and then onwards to connected pipelines. Also, remove all processors and replace it with an empty array ([]) as this will now behandeld in the traces/route1-regular and traces/route2-security pipelines, allowing for custom behaviour for each route. Your traces: configuration should look like this:
```
traces:                       # Traces pipeline
  receivers:
  - otlp                      # OTLP receiver
  processors: []              # Processors for traces
  exporters:
  - routing
```

Add both the route1-regular and route2-security traces pipelines below the existing traces pipeline:

Configure Route1-regular pipeline: This pipeline will handle all spans that have no match in the routing table in the connector. Notice this uses routing as its only receiver and will recieve data thought its connection from the original traces pipeline.

    traces/route1-regular:         # Default pipeline for unmatched spans
      receivers: 
      - routing                    # Receive data from the routing connector
      processors:
      - memory_limiter             # Memory Limiter Processor
      - resource/add_mode          # Adds collector mode metadata
      - batch
      exporters:
      - debug                      # Debug Exporter 
      - file/traces/route1-regular # File Exporter for unmatched spans

Add the route2-security pipeline: This pipeline processes all spans that do match our rule "[deployment.environment"] == "security-applications" in the the routing rule. This pipeline is also using routing as its receiver. Add this pipline below the traces/route1-regular one.

    traces/route2-security:         # Default pipeline for unmatched spans
      receivers: 
      - routing                     # Receive data from the routing connector
      processors:
      - memory_limiter              # Memory Limiter Processor
      - resource/add_mode           # Adds collector mode metadata
      - batch
      exporters:
      - debug                       # Debug exporter
      - file/traces/route2-security # File exporter for unmatched spans

Validate the agent configuration using otelbin.io. For reference, the traces: section of your pipelines will look similar to this:

%%{init:{"fontFamily":"monospace"}}%%
graph LR
    %% Nodes
      REC1(&nbsp;&nbsp;&nbsp;otlp&nbsp;&nbsp;&nbsp;<br>fa:fa-download):::receiver
      PRO1(memory_limiter<br>fa:fa-microchip):::processor
      PRO2(memory_limiter<br>fa:fa-microchip):::processor
      PRO3(resource<br>fa:fa-microchip<br>add_mode):::processor
      PRO4(resource<br>fa:fa-microchip<br>add_mode):::processor
      PRO5(batch<br>fa:fa-microchip):::processor
      PRO6(batch<br>fa:fa-microchip):::processor
      EXP1(&nbsp;&ensp;debug&nbsp;&ensp;<br>fa:fa-upload):::exporter
      EXP2(&emsp;&emsp;file&emsp;&emsp;<br>fa:fa-upload<br>traces):::exporter
      EXP3(&nbsp;&ensp;debug&nbsp;&ensp;<br>fa:fa-upload):::exporter
      EXP4(&emsp;&emsp;file&emsp;&emsp;<br>fa:fa-upload<br>traces):::exporter
      ROUTE1(&nbsp;routing&nbsp;<br>fa:fa-route):::con-export
      ROUTE2(&nbsp;routing&nbsp;<br>fa:fa-route):::con-receive
      ROUTE3(&nbsp;routing&nbsp;<br>fa:fa-route):::con-receive
    %% Links
    subID1:::sub-traces
    subID2:::sub-traces
    subID3:::sub-traces
    subgraph " "
    direction LR
      subgraph subID1[**Traces**]
      REC1 --> ROUTE1
      end
      subgraph subID2[**Traces/route2-security**]
      ROUTE1 --> ROUTE2
      ROUTE2 --> PRO1
      PRO1 --> PRO3
      PRO3 --> PRO5
      PRO5 --> EXP1
      PRO5 --> EXP2
      end
      subgraph subID3[**Traces/route1-regular**]
      ROUTE1 --> ROUTE3
      ROUTE3 --> PRO2
      PRO2 --> PRO4
      PRO4 --> PRO6
      PRO6 --> EXP3
      PRO6 --> EXP4
      end
    end
classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
classDef sub-traces stroke:#fbbf24,stroke-width:1px, color:#fbbf24,stroke-dasharray: 3 3;

6.3 Test Routing Connector

Exercise

In this section, we will test the routing rule configured for the Gateway. The expected result is that a span generated by the loadgen that match the "[deployment.environment"] == "security-applications" rule will be sent to the gateway-traces-route2-security.out file.

Start the Gateway: In your Gateway terminal window start the Gateway.

../otelcol --config gateway.yaml

Start the Agent: In your Agent terminal window start the Agent.

../otelcol --config agent.yaml

Send a Regular Span: In the Loadgen terminal window send a regular span using the loadgen:

../loadgen -count 1

Both the Agent and Gateway will display debug information. The gateway will also generate a new gateway-traces-route1-regular.out file, as this is now the designated destination for regular spans.

Tip

If you check gateway-traces-route1-regular.out, it will contain the span sent by loadgen. You will also see an empty gateway-traces-route2-security..out file, as the routing configuration creates output files immediately, even if no matching spans have been processed yet.

Send a Security Span: In the Loadgen terminal window send a security span using the security flag:

../loadgen -security -count 1

Again, both the Agent and Gateway should display debug information, including the span you just sent. This time, the Gateway will write a line to the gateway-traces-route2-security.out file, which is designated for spans where the deployment.environment resource attribute matches "security-applications".

jq -c '.resourceSpans[] as $resource | $resource.scopeSpans[].spans[] | {spanId: .spanId, deploymentEnvironment: ($resource.resource.attributes[] | select(.key == "deployment.environment") | .value.stringValue)}' gateway-traces-route2-security.out

{"spanId":"cb799e92e26d5782","deploymentEnvironment":"security-applications"}

You can repeat this scenario multiple times, and each trace will be written to its corresponding output file.

Important

Stop the Agent and the Gateway processes by pressing Ctrl-C in their respective terminals.

Conclusion

In this section, we successfully tested the routing connector in the gateway by sending different spans and verifying their destinations.

Regular spans were correctly routed to gateway-traces-route1-regular.out, confirming that spans without a matching deployment.environment attribute follow the default pipeline.
Security-related spans were routed to gateway-traces-route2-security.out, demonstrating that the routing rule based on "deployment.environment": "security-applications" works as expected.

By inspecting the output files, we confirmed that the OpenTelemetry Collector correctly evaluates span attributes and routes them to the appropriate destinations. This validates that routing rules can effectively separate and direct telemetry data for different use cases.

You can now extend this approach by defining additional routing rules to further categorize spans, metrics, and logs based on different attributes.

7. Create metrics with Count Connector

10 minutes

In this section, we’ll explore how to use the Count Connector to extract attribute values from logs and convert them into meaningful metrics.

Specifically, we’ll use the Count Connector to track the number of “Star Wars” and “Lord of the Rings” quotes appearing in our logs, turning them into measurable data points.

Exercise

Important

Change ALL terminal windows to the 7-sum-count directory and run the clear command.

Copy *.yaml from the 6-routing-data directory into 7-sum-count. Your updated directory structure will now look like this:

.
├── agent.yaml
└── gateway.yaml

Update the agent.yaml to change the frequency that we read logs. Find the filelog/quotes receiver in the agent.yaml and add a poll_interval attribute:

  filelog/quotes:                      # Receiver Type/Name
    poll_interval: 10s                 # Only read every ten seconds

The reason for the delay is that the Count Connector in the OpenTelemetry Collector counts logs only within each processing interval. This means that every time the data is read, the count resets to zero for the next interval. With the default Filelog reciever interval of 200ms, it reads every line the loadgen writes, giving us counts of 1. With this interval we make sure we have multiple entries to count.

The Collector can maintain a running count for each read interval by omitting conditions, as shown below. However, it’s best practice to let your backend handle running counts since it can track them over a longer time period.

Exercise

Add the Count Connector

Include the Count Connector in the connector’s section of your configuration and define the metrics counters we want to use:

connectors:
  count:
    logs:
      logs.full.count:
        description: "Running count of all logs read in interval"
      logs.sw.count:
        description: "StarWarsCount"
        conditions:
        - attributes["movie"] == "SW"
      logs.lotr.count:
        description: "LOTRCount"
        conditions:
        - attributes["movie"] == "LOTR"
      logs.error.count:
        description: "ErrorCount"
        conditions:
        - attributes["level"] == "ERROR"

Explanation of the Metrics Counters
- logs.full.count: Tracks the total number of logs processed during each read interval.
  Since this metric has no filtering conditions, every log that passes through the system is included in the count.
- logs.sw.count Counts logs that contain a quote from a Star Wars movie.
- logs.lotr.count: Counts logs that contain a quote from a Lord of the Rings movie.
- logs.error.count: Represents a real-world scenario by counting logs with a severity level of ERROR for the read interval.
Configure the Count Connector in the pipelines
In the pipeline configuration below, the connector exporter is added to the logs section, while the connector receiver is added to the metrics section.

  pipelines:
    traces:
      receivers:
      - otlp
      processors:
      - memory_limiter
      - attributes/update              # Update, hash, and remove attributes
      - redaction/redact               # Redact sensitive fields using regex
      - resourcedetection
      - resource/add_mode
      - batch
      exporters:
      - debug
      - file
      - otlphttp
    metrics:
      receivers:
      - count                           # Count Connector that receives count metric from logs count exporter in logs pipeline. 
      - otlp
      #- hostmetrics                    # Host Metrics Receiver
      processors:
      - memory_limiter
      - resourcedetection
      - resource/add_mode
      - batch
      exporters:
      - debug
      - otlphttp
    logs:
      receivers:
      - otlp
      - filelog/quotes
      processors:
      - memory_limiter
      - resourcedetection
      - resource/add_mode
      - transform/logs                 # Transform logs processor
      - batch
      exporters:
      - count                          # Count Connector that exports count as a metric to metrics pipeline.
      - debug
      - otlphttp

We count logs based on their attributes. If your log data is stored in the log body instead of attributes, you’ll need to use a Transform processor in your pipeline to extract key/value pairs and add them as attributes.

In this workshop, we’ve already added merge_maps(attributes, cache, "upsert") in the 05-transform-data section. This ensures that all relevant data is included in the log attributes for processing.

When selecting fields to create attributes from, be mindful—adding all fields indiscriminately is generally not ideal for production environments. Instead, choose only the fields that are truly necessary to avoid unnecessary data clutter.

Exercise

Validate the agent configuration using otelbin.io. For reference, the logs and metrics: sections of your pipelines will look like this:

%%{init:{"fontFamily":"monospace"}}%%
graph LR
    %% Nodes
    REC1(otlp<br>fa:fa-download):::receiver
    REC2(filelog<br>fa:fa-download<br>quotes):::receiver
    REC3(otlp<br>fa:fa-download):::receiver
    PRO1(memory_limiter<br>fa:fa-microchip):::processor
    PRO2(memory_limiter<br>fa:fa-microchip):::processor
    PRO3(resource<br>fa:fa-microchip<br>add_mode):::processor
    PRO4(resource<br>fa:fa-microchip<br>add_mode):::processor
    PRO5(batch<br>fa:fa-microchip):::processor
    PRO6(batch<br>fa:fa-microchip):::processor
    PRO7(resourcedetection<br>fa:fa-microchip):::processor
    PRO8(resourcedetection<br>fa:fa-microchip):::processor
    PRO9(transfrom<br>fa:fa-microchip<br>logs):::processor
    EXP1(&nbsp;&ensp;debug&nbsp;&ensp;<br>fa:fa-upload):::exporter
    EXP2(&emsp;&emsp;otlphttp&emsp;&emsp;<br>fa:fa-upload):::exporter
    EXP3(&nbsp;&ensp;debug&nbsp;&ensp;<br>fa:fa-upload):::exporter
    EXP4(&emsp;&emsp;otlphttp&emsp;&emsp;<br>fa:fa-upload):::exporter
    ROUTE1(&nbsp;count&nbsp;<br>fa:fa-route):::con-export
    ROUTE2(&nbsp;count&nbsp;<br>fa:fa-route):::con-receive

    %% Links
    subID1:::sub-logs
    subID2:::sub-metrics
    subgraph " " 
      direction LR
      subgraph subID1[**Logs**]
      direction LR
      REC1 --> PRO1
      REC2 --> PRO1
      PRO1 --> PRO7
      PRO7 --> PRO3
      PRO3 --> PRO9
      PRO9 --> PRO5
      PRO5 --> ROUTE1
      PRO5 --> EXP1
      PRO5 --> EXP2
      end
      
      subgraph subID2[**Metrics**]
      direction LR
      ROUTE1 --> ROUTE2       
      ROUTE2 --> PRO2
      REC3 --> PRO2
      PRO2 --> PRO8
      PRO8 --> PRO4
      PRO4 --> PRO6
      PRO6 --> EXP3
      PRO6 --> EXP4
      end
    end
classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
classDef sub-logs stroke:#34d399,stroke-width:1px, color:#34d399,stroke-dasharray: 3 3;
classDef sub-metrics stroke:#38bdf8,stroke-width:1px, color:#38bdf8,stroke-dasharray: 3 3;

7.1 Testing the Count Connector

Exercise

Start the Gateway:
In the Gateway terminal window run:

../otelcol --config=gateway.yaml

Start the Agent:
In the Agent terminal window run:

../otelcol --config=agent.yaml

Send 12 Logs lines with the Loadgen:
In the Spans terminal window send 12 log lines, they should be read in two intervals. Do this with the following loadgen command:

../loadgen -logs -json -count 12

Both the Agent and Gateway will display debug information, showing they are processing data. Wait until the loadgen completes.

Verify metrics have been generated
As the logs are processed, the Agent generates metrics and forwards them to the Gateway, which then writes them to gateway-metrics.out.

To check if the metrics logs.full.count, logs.sw.count, logs.lotr.count, and logs.error.count are present in the output, run the following jq query:

jq '.resourceMetrics[].scopeMetrics[].metrics[]
    | select(.name == "logs.full.count" or .name == "logs.sw.count" or .name == "logs.lotr.count" or .name == "logs.error.count")
    | {name: .name, value: (.sum.dataPoints[0].asInt // "-")}' gateway-metrics.out

{
  "name": "logs.sw.count",
  "value": "2"
}
{
  "name": "logs.lotr.count",
  "value": "2"
}
{
  "name": "logs.full.count",
  "value": "4"
}
{
  "name": "logs.error.count",
  "value": "2"
}
{
  "name": "logs.error.count",
  "value": "1"
}
{
  "name": "logs.sw.count",
  "value": "2"
}
{
  "name": "logs.lotr.count",
  "value": "6"
}
{
  "name": "logs.full.count",
  "value": "8"
}

Tip

Note: the logs.full.count normally is equal to logs.sw.count + logs.lotr.count, while the logs.error.count will be a random number.

Important

Stop the Agent and the Gateway processes by pressing Ctrl-C in their respective terminals.

7.2 Create metrics with Sum Connector

10 minutes

In this section, we’ll explore how the Sum Connector can extract values from spans and convert them into metrics.

We’ll specifically use the credit card charges from our base spans and leverage the Sum Connector to retrieve the total charges as a metric.

The connector can be used to collect (sum) attribute values from spans, span events, metrics, data points, and log records. It captures each individual value, transforms it into a metric, and passes it along. However, it’s the backend’s job to use these metrics and their attributes for calculations and further processing.

Exercise

Switch to your Agent terminal window and open the agent.yaml file in your editor.

Add the Sum Connector
Include the Sum Connector in the connectors section of your configuration and define the metrics counters:

  sum:
    spans:
       user.card-charge:
        source_attribute: payment.amount
        conditions:
          - attributes["payment.amount"] != "NULL"
        attributes:
          - key: user.name

In the example above, we check for the payment.amount attribute in spans. If it has a valid value, the Sum connector generates a metric called user.card-charge and includes the user.name as an attribute. This enables the backend to track and display a user’s total charges over an extended period, such as a billing cycle.

In the pipeline configuration below, the connector exporter is added to the traces section, while the connector receiver is added to the metrics section.

Exercise

Configure the Count Connector in the pipelines

  pipelines:
    traces:
      receivers:
      - otlp
      processors:
      - memory_limiter
      - attributes/update              # Update, hash, and remove attributes
      - redaction/redact               # Redact sensitive fields using regex
      - resourcedetection
      - resource/add_mode
      - batch
      exporters:
      - debug
      - file
      - otlphttp
      - sum                            # Sum connector which aggregates payment.amount from spans and sends to metrics pipeline
    metrics:
      receivers:
      - sum                            # Receives metrics from the sum exporter in the traces pipeline
      - count                          # Receives count metric from logs count exporter in logs pipeline. 
      - otlp
      #- hostmetrics                   # Host Metrics Receiver
      processors:
      - memory_limiter
      - resourcedetection
      - resource/add_mode
      - batch
      exporters:
      - debug
      - otlphttp
    logs:
      receivers:
      - otlp
      - filelog/quotes
      processors:
      - memory_limiter
      - resourcedetection
      - resource/add_mode
      - transform/logs                 # Transform logs processor
      - batch
      exporters:
      - count                          # Count Connector that exports count as a metric to metrics pipeline.
      - debug
      - otlphttp

Validate the agent configuration using otelbin.io. For reference, the traces and metrics: sections of your pipelines will look like this:

%%{init:{"fontFamily":"monospace"}}%%
graph LR
    %% Nodes
    REC1(otlp<br>fa:fa-download<br> ):::receiver
    REC3(otlp<br>fa:fa-download<br> ):::receiver
    PRO1(memory_limiter<br>fa:fa-microchip<br> ):::processor
    PRO2(memory_limiter<br>fa:fa-microchip<br> ):::processor
    PRO3(resource<br>fa:fa-microchip<br>add_mode):::processor
    PRO4(resource<br>fa:fa-microchip<br>add_mode):::processor
    PRO5(batch<br>fa:fa-microchip<br> ):::processor
    PRO6(batch<br>fa:fa-microchip<br> ):::processor
    PRO7(resourcedetection<br>fa:fa-microchip<br> ):::processor
    PRO8(resourcedetection<br>fa:fa-microchip<br>):::processor

    PROA(attributes<br>fa:fa-microchip<br>redact):::processor
    PROB(redaction<br>fa:fa-microchip<br>update):::processor
    EXP1(&nbsp;&ensp;debug&nbsp;&ensp;<br>fa:fa-upload<br> ):::exporter
    EXP2(&emsp;&emsp;file&emsp;&emsp;<br>fa:fa-upload<br> ):::exporter
    EXP3(&nbsp;&ensp;debug&nbsp;&ensp;<br>fa:fa-upload<br> ):::exporter
    EXP4(&emsp;&emsp;otlphttp&emsp;&emsp;<br>fa:fa-upload<br> ):::exporter
    EXP5(&emsp;&emsp;otlphttp&emsp;&emsp;<br>fa:fa-upload<br> ):::exporter
    ROUTE1(&nbsp;sum&nbsp;<br>fa:fa-route<br> ):::con-export
    ROUTE2(&nbsp;count&nbsp;<br>fa:fa-route<br> ):::con-receive
    ROUTE3(&nbsp;sum&nbsp;<br>fa:fa-route<br> ):::con-receive

    %% Links
    subID1:::sub-traces
    subID2:::sub-metrics
    subgraph " " 
      direction LR
      subgraph subID1[**Traces**]
      direction LR
      REC1 --> PRO1
      PRO1 --> PROA
      PROA --> PROB
      PROB --> PRO7
      PRO7 --> PRO3
      PRO3 --> PRO5 
      PRO5 --> EXP1
      PRO5 --> EXP2
      PRO5 --> EXP5
      PRO5 --> ROUTE1
      end
      
      subgraph subID2[**Metrics**]
      direction LR
      ROUTE1 --> ROUTE3
      ROUTE3 --> PRO2       
      ROUTE2 --> PRO2
      REC3 --> PRO2
      PRO2 --> PRO8
      PRO8 --> PRO4
      PRO4 --> PRO6
      PRO6 --> EXP3
      PRO6 --> EXP4
      end
    end
classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
classDef sub-logs stroke:#34d399,stroke-width:1px, color:#34d399,stroke-dasharray: 3 3;
classDef sub-traces stroke:#fbbf24,stroke-width:1px, color:#fbbf24,stroke-dasharray: 3 3;
classDef sub-metrics stroke:#38bdf8,stroke-width:1px, color:#38bdf8,stroke-dasharray: 3 3;

7.3 Testing the Count Connector

Exercise

Start the Gateway:
In the Gateway terminal window run:

../otelcol --config=gateway.yaml

Start the Agent:
In the Agent terminal window run:

../otelcol --config=agent.yaml

Start the Loadgen:
In the Spans terminal window send 8 spans with the following loadgen command:

../loadgen -count 8

Both the Agent and Gateway will display debug information, showing they are processing data. Wait until the loadgen completes.

Verify the metrics:
While processing the span, the Agent, has generated metrics and passed them on to the Gateway. The Gateway has written them into gateway-metrics.out.

To confirm the presence of user.card-chargeand verify each one has an attribute user.name in the metrics output, run the following jq query:

jq -r '.resourceMetrics[].scopeMetrics[].metrics[] | select(.name == "user.card-charge") | .sum.dataPoints[] | "\(.attributes[] | select(.key == "user.name").value.stringValue)\t\(.asDouble)"' gateway-metrics.out | while IFS=$'\t' read -r name charge; do
    printf "%-20s %s\n" "$name" "$charge"
done

George Lucas         67.49
Frodo Baggins        87.14
Thorin Oakenshield   90.98
Luke Skywalker       51.37
Luke Skywalker       65.56
Thorin Oakenshield   67.5
Thorin Oakenshield   66.66
Peter Jackson        94.39

Important

Stop the Agent and the Gateway processes by pressing Ctrl-C in their respective terminals.

OpenTelemetry Collector Workshops

Subsections of OpenTelemetry Collector Workshops

Making Your Observability Cloud Native With OpenTelemetry

Abstract

Ninja Sections

Target Audience

Prerequisites

Learning Objectives

OpenTelemetry Architecture

Subsections of OpenTelemetry Collector Concepts

Installing OpenTelemetry Collector Contrib

Download the OpenTelemetry Collector Contrib distribution

Install the OpenTelemetry Collector Contrib distribution

Subsections of 1. Installation

Installing OpenTelemetry Collector Contrib

Confirm the Collector is running

Why build your own collector?

Benefits of building your own collector?

Considerations for building your collector?

The Ninja Zone

References

Default configuration

OpenTelemetry Collector Extensions

Subsections of 2. Extensions

OpenTelemetry Collector Extensions

Health Check

OpenTelemetry Collector Extensions

Performance Profiler

OpenTelemetry Collector Extensions

zPages

Why queue data to disk?

Considerations for queuing data to disk?

References

Configuration Check-in

OpenTelemetry Collector Receivers

Subsections of 3. Receivers

OpenTelemetry Collector Receivers

Host Metrics Receiver

OpenTelemetry Collector Receivers

Prometheus Receiver

Example Dashboard - Prometheus metrics

OpenTelemetry Collector Receivers

Other Receivers

What do we need?

Things to consider?

The Ninja Zone

Configuration Check-in

OpenTelemetry Collector Processors

Subsections of 4. Processors

OpenTelemetry Collector Processors

Batch Processor

OpenTelemetry Collector Processors

Resource Detection Processor

OpenTelemetry Collector Processors

Attributes Processor

Why a connector instead of a processor?

Considerations with connectors

References

Configuration Check-in

OpenTelemetry Collector Exporters

Subsections of 5. Exporters

OpenTelemetry Collector Exporters

OTLP HTTP Exporter

Configuration Check-in

OpenTelemetry Collector Service

Subsections of 6. Service

OpenTelemetry Collector Service

Hostmetrics Receiver

OpenTelemetry Collector Service

Prometheus Internal Receiver

OpenTelemetry Collector Service

Resource Detection Processor

OpenTelemetry Collector Service

Attributes Processor

OpenTelemetry Collector Service

OTLP HTTP Exporter

Why monitor the collector?

Considerations

The Ninja Zone

References