Making Your Observability Cloud Native With OpenTelemetry
1 hourAuthor
Robert Castley
Abstract
Organizations getting started with OpenTelemetry may begin by sending data directly to an observability backend. While this works well for initial testing, using the OpenTelemetry collector as part of your observability architecture provides numerous benefits and is recommended for any production deployment.
In this workshop, we will be focusing on using the OpenTelemetry collector and starting with the fundamentals of configuring the receivers, processors, and exporters ready to use with Splunk Observability Cloud. The journey will take attendees from novices to being able to start adding custom components to help solve for their business observability needs for their distributed platform.
Ninja Sections
Throughout the workshop there will be expandable Ninja Sections, these will be more hands on and go into further technical detail that you can explore within the workshop or in your own time.
Please note that the content in these sections may go out of date due to the frequent development being made to the OpenTelemetry project. Links will be provided in the event details are out of sync, please let us know if you spot something that needs updating.
Ninja: Test Me!
By completing this workshop you will officially be an OpenTelemetry Collector Ninja!
Target Audience
This interactive workshop is for developers and system administrators who are interested in learning more about architecture and deployment of the OpenTelemetry Collector.
Prerequisites
Attendees should have a basic understanding of data collection
Command line and vim/vi experience.
A instance/host/VM running Ubuntu 20.04 LTS or 22.04 LTS.
Minimum requirements are an AWS/EC2 t2.micro (1 CPU, 1GB RAM, 8GB Storage)
Learning Objectives
By the end of this talk, attendees will be able to:
Understand the components of OpenTelemetry
Use receivers, processors, and exporters to collect and analyze data
Identify the benefits of using OpenTelemetry
Build a custom component to solve their business needs
Download the OpenTelemetry Collector Contrib distribution
The first step in installing the Open Telemetry Collector is downloading it. For our lab, we will use the wget command to download the .deb package from the OpenTelemetry Github repository.
Selecting previously unselected package otelcol-contrib.
(Reading database ... 89232 files and directories currently installed.)
Preparing to unpack otelcol-contrib_0.111.0_linux_amd64.deb ...
Unpacking otelcol-contrib (0.111.0) ...
Setting up otelcol-contrib (0.111.0) ...
Created symlink /etc/systemd/system/multi-user.target.wants/otelcol-contrib.service → /lib/systemd/system/otelcol-contrib.service.
Subsections of 1. Installation
Installing OpenTelemetry Collector Contrib
Confirm the Collector is running
The collector should now be running. We will verify this as root using systemctl command. To exit the status just press q.
sudo systemctl status otelcol-contrib
● otelcol-contrib.service - OpenTelemetry Collector Contrib
Loaded: loaded (/lib/systemd/system/otelcol-contrib.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2024-10-07 10:27:49 BST; 52s ago
Main PID: 17113 (otelcol-contrib)
Tasks: 13 (limit: 19238)
Memory: 34.8M
CPU: 155ms
CGroup: /system.slice/otelcol-contrib.service
└─17113 /usr/bin/otelcol-contrib --config=/etc/otelcol-contrib/config.yaml
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: Descriptor:
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: -> Name: up
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: -> Description: The scraping was successful
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: -> Unit:
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: -> DataType: Gauge
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: NumberDataPoints #0
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: Timestamp: 2024-10-07 09:28:36.942 +0000 UTC
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: Value: 1.000000
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: {"kind": "exporter", "data_type": "metrics", "name": "debug"}
Because we will be making multiple configuration file changes, setting environment variables and restarting the collector, we need to stop the collector service and disable it from starting on boot.
An alternative approach would be to use the golang tool chain to build the binary locally by doing:
go install go.opentelemetry.io/collector/cmd/builder@v0.80.0
mv $(go env GOPATH)/bin/builder /usr/bin/ocb
(Optional) Docker
Why build your own collector?
The default distribution of the collector (core and contrib) either contains too much or too little in what they have to offer.
It is also not advised to run the contrib collector in your production environments due to the amount of components installed which more than likely are not needed by your deployment.
Benefits of building your own collector?
When creating your own collector binaries, (commonly referred to as distribution), means you build what you need.
The benefits of this are:
Smaller sized binaries
Can use existing go scanners for vulnerabilities
Include internal components that can tie in with your organization
Considerations for building your collector?
Now, this would not be a 🥷 Ninja zone if it didn’t come with some drawbacks:
Go experience is recommended if not required
No Splunk support
Responsibility for distribution and lifecycle management
It is important to note that the project is working towards stability but it does not mean changes made will not break your workflow. The team at Splunk provides increased support and a higher level of stability so they can provide a curated experience helping you with your deployment needs.
The Ninja Zone
Once you have all the required tools installed to get started, you will need to create a new file named otelcol-builder.yaml and we will follow this directory structure:
.
└── otelcol-builder.yaml
Once we have the file created, we need to add a list of components for it to install with some additional metadata.
For this example, we are going to create a builder manifest that will install only the components we need for the introduction config:
OpenTelemetry is configured through YAML files. These files have default configurations that we can modify to meet our needs. Let’s look at the default configuration that is supplied:
# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacksextensions:health_check:pprof:endpoint:0.0.0.0:1777zpages:endpoint:0.0.0.0:55679receivers:otlp:protocols:grpc:endpoint:0.0.0.0:4317http:endpoint:0.0.0.0:4318opencensus:endpoint:0.0.0.0:55678# Collect own metricsprometheus:config:scrape_configs:- job_name:'otel-collector'scrape_interval:10sstatic_configs:- targets:['0.0.0.0:8888']jaeger:protocols:grpc:endpoint:0.0.0.0:14250thrift_binary:endpoint:0.0.0.0:6832thrift_compact:endpoint:0.0.0.0:6831thrift_http:endpoint:0.0.0.0:14268zipkin:endpoint:0.0.0.0:9411processors:batch:exporters:debug:verbosity:detailedservice:pipelines:traces:receivers:[otlp, opencensus, jaeger, zipkin]processors:[batch]exporters:[debug]metrics:receivers:[otlp, opencensus, prometheus]processors:[batch]exporters:[debug]logs:receivers:[otlp]processors:[batch]exporters:[debug]extensions:[health_check, pprof, zpages]
Congratulations! You have successfully downloaded and installed the OpenTelemetry Collector. You are well on your way to becoming an OTel Ninja. But first let’s walk through configuration files and different distributions of the OpenTelemetry Collector.
Note
Splunk does provide its own, fully supported, distribution of the OpenTelemetry Collector. This distribution is available to install from the Splunk GitHub Repository or via a wizard in Splunk Observability Cloud that will build out a simple installation script to copy and paste. This distribution includes many additional features and enhancements that are not available in the OpenTelemetry Collector Contrib distribution.
The Splunk Distribution of the OpenTelemetry Collector is production-tested; it is in use by the majority of customers in their production environments.
Customers that use our distribution can receive direct help from official Splunk support within SLAs.
Customers can use or migrate to the Splunk Distribution of the OpenTelemetry Collector without worrying about future breaking changes to its core configuration experience for metrics and traces collection (OpenTelemetry logs collection configuration is in beta). There may be breaking changes to the Collector’s metrics.
We will now walk through each section of the configuration file and modify it to send host metrics to Splunk Observability Cloud.
OpenTelemetry Collector Extensions
Now that we have the OpenTelemetry Collector installed, let’s take a look at extensions for the OpenTelemetry Collector. Extensions are optional and available primarily for tasks that do not involve processing telemetry data. Examples of extensions include health monitoring, service discovery, and data forwarding.
Extensions are configured in the same config.yaml file that we referenced in the installation step. Let’s edit the config.yaml file and configure the extensions. Note that the pprof and zpages extensions are already configured in the default config.yaml file. For the purpose of this workshop, we will only be updating the health_check extension to expose the port on all network interfaces on which we can access the health of the collector.
This extension enables an HTTP URL that can be probed to check the status of the OpenTelemetry Collector. This extension can be used as a liveness and/or readiness probe on Kubernetes. To learn more about the curl command, check out the curl man page.
Open a new terminal session and SSH into your instance to run the following command:
Performance Profiler extension enables the golang net/http/pprof endpoint. This is typically used by developers to collect performance profiles and investigate issues with the service. We will not be covering this in this workshop.
OpenTelemetry Collector Extensions
zPages
zPages are an in-process alternative to external exporters. When included, they collect and aggregate tracing and metrics information in the background; this data is served on web pages when requested. zPages are an extremely useful diagnostic feature to ensure the collector is running as expected.
ServiceZ gives an overview of the collector services and quick access to the pipelinez, extensionz, and featurez zPages. The page also provides build and runtime information.
PipelineZ provides insights on the running pipelines running in the collector. You can find information on type, if data is mutated, and you can also see information on the receivers, processors and exporters that are used for each pipeline.
Ninja: Improve data durability with storage extension
For this, we will need to validate that our distribution has the file_storage extension installed. This can be done by running the command otelcol-contrib components which should show results like:
# ... truncated for clarityextensions:- file_storage
This extension provides exporters the ability to queue data to disk in the event that exporter is unable to send data to the configured endpoint.
In order to configure the extension, you will need to update your config to include the information below. First, be sure to create a /tmp/otel-data directory and give it read/write permissions:
extensions:...file_storage:directory:/tmp/otel-datatimeout:10scompaction:directory:/tmp/otel-dataon_start:trueon_rebound:truerebound_needed_threshold_mib:5rebound_trigger_threshold_mib:3# ... truncated for clarityservice:extensions:[health_check, pprof, zpages, file_storage]
Why queue data to disk?
This allows the collector to weather network interruptions (and even collector restarts) to ensure data is sent to the upstream provider.
Considerations for queuing data to disk?
There is a potential that this could impact data throughput performance due to disk performance.
# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacksextensions:health_check:endpoint:0.0.0.0:13133pprof:endpoint:0.0.0.0:1777zpages:endpoint:0.0.0.0:55679receivers:otlp:protocols:grpc:endpoint:0.0.0.0:4317http:endpoint:0.0.0.0:4318opencensus:endpoint:0.0.0.0:55678# Collect own metricsprometheus:config:scrape_configs:- job_name:'otel-collector'scrape_interval:10sstatic_configs:- targets:['0.0.0.0:8888']jaeger:protocols:grpc:endpoint:0.0.0.0:14250thrift_binary:endpoint:0.0.0.0:6832thrift_compact:endpoint:0.0.0.0:6831thrift_http:endpoint:0.0.0.0:14268zipkin:endpoint:0.0.0.0:9411processors:batch:exporters:debug:verbosity:detailedservice:pipelines:traces:receivers:[otlp, opencensus, jaeger, zipkin]processors:[batch]exporters:[debug]metrics:receivers:[otlp, opencensus, prometheus]processors:[batch]exporters:[debug]logs:receivers:[otlp]processors:[batch]exporters:[debug]extensions:[health_check, pprof, zpages]
Now that we have reviewed extensions, let’s dive into the data pipeline portion of the workshop. A pipeline defines a path the data follows in the Collector starting from reception, moving to further processing or modification, and finally exiting the Collector via exporters.
The data pipeline in the OpenTelemetry Collector is made up of receivers, processors, and exporters. We will first start with receivers.
OpenTelemetry Collector Receivers
Welcome to the receiver portion of the workshop! This is the starting point of the data pipeline of the OpenTelemetry Collector. Let’s dive in.
A receiver, which can be push or pull based, is how data gets into the Collector. Receivers may support one or more data sources. Generally, a receiver accepts data in a specified format, translates it into the internal format and passes it to processors and exporters defined in the applicable pipelines.
The Host Metrics Receiver generates metrics about the host system scraped from various sources. This is intended to be used when the collector is deployed as an agent which is what we will be doing in this workshop.
Let’s update the /etc/otel-contrib/config.yaml file and configure the hostmetrics receiver. Insert the following YAML under the receivers section, taking care to indent by two spaces.
sudo vi /etc/otelcol-contrib/config.yaml
receivers:hostmetrics:collection_interval:10sscrapers:# CPU utilization metricscpu:# Disk I/O metricsdisk:# File System utilization metricsfilesystem:# Memory utilization metricsmemory:# Network interface I/O metrics & TCP connection metricsnetwork:# CPU load metricsload:# Paging/Swap space utilization and I/O metricspaging:# Process count metricsprocesses:# Per process CPU, Memory and Disk I/O metrics. Disabled by default.# process:
OpenTelemetry Collector Receivers
Prometheus Receiver
You will also notice another receiver called prometheus. Prometheus is an open-source toolkit used by the OpenTelemetry Collector. This receiver is used to scrape metrics from the OpenTelemetry Collector itself. These metrics can then be used to monitor the health of the collector.
Let’s modify the prometheus receiver to clearly show that it is for collecting metrics from the collector itself. By changing the name of the receiver from prometheus to prometheus/internal, it is now much clearer as to what that receiver is doing. Update the configuration file to look like this:
The following screenshot shows an example dashboard of some of the metrics the Prometheus internal receiver collects from the OpenTelemetry Collector. Here, we can see accepted and sent spans, metrics and log records.
Note
The following screenshot is an out-of-the-box (OOTB) dashboard from Splunk Observability Cloud that allows you to easily monitor your Splunk OpenTelemetry Collector install base.
OpenTelemetry Collector Receivers
Other Receivers
You will notice in the default configuration there are other receivers: otlp, opencensus, jaeger and zipkin. These are used to receive telemetry data from other sources. We will not be covering these receivers in this workshop and they can be left as they are.
Ninja: Create receivers dynamically
To help observe short lived tasks like docker containers, kubernetes pods, or ssh sessions, we can use the receiver creator with observer extensions to create a new receiver as these services start up.
What do we need?
In order to start using the receiver creator and its associated observer extensions, they will need to be part of your collector build manifest.
Some short lived tasks may require additional configuration such as username, and password.
These values can be referenced via environment variables,
or use a scheme expand syntax such as ${file:./path/to/database/password}.
Please adhere to your organisation’s secret practices when taking this route.
The Ninja Zone
There are only two things needed for this ninja zone:
Make sure you have added receiver creater and observer extensions to the builder manifest.
Create the config that can be used to match against discovered endpoints.
To create the templated configurations, you can do the following:
receiver_creator:watch_observers:[host_observer]receivers:redis:rule:type == "port" && port == 6379config:password:${env:HOST_REDIS_PASSWORD}
# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacksextensions:health_check:endpoint:0.0.0.0:13133pprof:endpoint:0.0.0.0:1777zpages:endpoint:0.0.0.0:55679receivers:hostmetrics:collection_interval:10sscrapers:# CPU utilization metricscpu:# Disk I/O metricsdisk:# File System utilization metricsfilesystem:# Memory utilization metricsmemory:# Network interface I/O metrics & TCP connection metricsnetwork:# CPU load metricsload:# Paging/Swap space utilization and I/O metricspaging:# Process count metricsprocesses:# Per process CPU, Memory and Disk I/O metrics. Disabled by default.# process:otlp:protocols:grpc:endpoint:0.0.0.0:4317http:endpoint:0.0.0.0:4318opencensus:endpoint:0.0.0.0:55678# Collect own metricsprometheus/internal:config:scrape_configs:- job_name:'otel-collector'scrape_interval:10sstatic_configs:- targets:['0.0.0.0:8888']jaeger:protocols:grpc:endpoint:0.0.0.0:14250thrift_binary:endpoint:0.0.0.0:6832thrift_compact:endpoint:0.0.0.0:6831thrift_http:endpoint:0.0.0.0:14268zipkin:endpoint:0.0.0.0:9411processors:batch:exporters:debug:verbosity:detailedservice:pipelines:traces:receivers:[otlp, opencensus, jaeger, zipkin]processors:[batch]exporters:[debug]metrics:receivers:[otlp, opencensus, prometheus]processors:[batch]exporters:[debug]logs:receivers:[otlp]processors:[batch]exporters:[debug]extensions:[health_check, pprof, zpages]
Now that we have reviewed how data gets into the OpenTelemetry Collector through receivers, let’s now take a look at how the Collector processes the received data.
Warning
As the /etc/otelcol-contrib/config.yaml is not complete, please do not attempt to restart the collector at this point.
OpenTelemetry Collector Processors
Processors are run on data between being received and being exported. Processors are optional though some are recommended. There are a large number of processors included in the OpenTelemetry contrib Collector.
By default, only the batch processor is enabled. This processor is used to batch up data before it is exported. This is useful for reducing the number of network calls made to exporters. For this workshop, we will inherit the following defaults which are hard-coded into the Collector:
send_batch_size (default = 8192): Number of spans, metric data points, or log records after which a batch will be sent regardless of the timeout. send_batch_size acts as a trigger and does not affect the size of the batch. If you need to enforce batch size limits sent to the next component in the pipeline see send_batch_max_size.
timeout (default = 200ms): Time duration after which a batch will be sent regardless of size. If set to zero, send_batch_size is ignored as data will be sent immediately, subject to only send_batch_max_size.
send_batch_max_size (default = 0): The upper limit of the batch size. 0 means no upper limit on the batch size. This property ensures that larger batches are split into smaller units. It must be greater than or equal to send_batch_size.
The resourcedetection processor can be used to detect resource information from the host and append or override the resource value in telemetry data with this information.
By default, the hostname is set to the FQDN if possible, otherwise, the hostname provided by the OS is used as a fallback. This logic can be changed from using using the hostname_sources configuration option. To avoid getting the FQDN and use the hostname provided by the OS, we will set the hostname_sources to os.
If the workshop instance is running on an AWS/EC2 instance we can gather the following tags from the EC2 metadata API (this is not available on other platforms).
cloud.provider ("aws")
cloud.platform ("aws_ec2")
cloud.account.id
cloud.region
cloud.availability_zone
host.id
host.image.id
host.name
host.type
We will create another processor to append these tags to our metrics.
The attributes processor modifies attributes of a span, log, or metric. This processor also supports the ability to filter and match input data to determine if they should be included or excluded for specified actions.
It takes a list of actions that are performed in the order specified in the config. The supported actions are:
insert: Inserts a new attribute in input data where the key does not already exist.
update: Updates an attribute in input data where the key does exist.
upsert: Performs insert or update. Inserts a new attribute in input data where the key does not already exist and updates an attribute in input data where the key does exist.
delete: Deletes an attribute from the input data.
hash: Hashes (SHA1) an existing attribute value.
extract: Extracts values using a regular expression rule from the input key to target keys specified in the rule. If a target key already exists, it will be overridden.
We are going to create an attributes processor to insert a new attribute to all our host metrics called participant.name with a value of your name e.g. marge_simpson.
Warning
Ensure you replace INSERT_YOUR_NAME_HERE with your name and also ensure you do not use spaces in your name.
Later on in the workshop, we will use this attribute to filter our metrics in Splunk Observability Cloud.
One of the most recent additions to the collector was the notion of a connector, which allows you to join the output of one pipeline to the input of another pipeline.
An example of how this is beneficial is that some services emit metrics based on the amount of datapoints being exported, the number of logs containing an error status,
or the amount of data being sent from one deployment environment. The count connector helps address this for you out of the box.
Why a connector instead of a processor?
A processor is limited in what additional data it can produce considering it has to pass on the data it has processed making it hard to expose additional information. Connectors do not have to emit the data they receive which means they provide an opportunity to create the insights we are after.
For example, a connector could be made to count the number of logs, metrics, and traces that do not have the deployment environment attribute.
A very simple example with the output of being able to break down data usage by deployment environment.
Considerations with connectors
A connector only accepts data exported from one pipeline and receiver by another pipeline, this means you may have to consider how you construct your collector config to take advantage of it.
# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacksextensions:health_check:endpoint:0.0.0.0:13133pprof:endpoint:0.0.0.0:1777zpages:endpoint:0.0.0.0:55679receivers:hostmetrics:collection_interval:10sscrapers:# CPU utilization metricscpu:# Disk I/O metricsdisk:# File System utilization metricsfilesystem:# Memory utilization metricsmemory:# Network interface I/O metrics & TCP connection metricsnetwork:# CPU load metricsload:# Paging/Swap space utilization and I/O metricspaging:# Process count metricsprocesses:# Per process CPU, Memory and Disk I/O metrics. Disabled by default.# process:otlp:protocols:grpc:endpoint:0.0.0.0:4317http:endpoint:0.0.0.0:4318opencensus:endpoint:0.0.0.0:55678# Collect own metricsprometheus/internal:config:scrape_configs:- job_name:'otel-collector'scrape_interval:10sstatic_configs:- targets:['0.0.0.0:8888']jaeger:protocols:grpc:endpoint:0.0.0.0:14250thrift_binary:endpoint:0.0.0.0:6832thrift_compact:endpoint:0.0.0.0:6831thrift_http:endpoint:0.0.0.0:14268zipkin:endpoint:0.0.0.0:9411processors:batch:resourcedetection/system:detectors:[system]system:hostname_sources:[os]resourcedetection/ec2:detectors:[ec2]attributes/conf:actions:- key:participant.nameaction:insertvalue:"INSERT_YOUR_NAME_HERE"exporters:debug:verbosity:detailedservice:pipelines:traces:receivers:[otlp, opencensus, jaeger, zipkin]processors:[batch]exporters:[debug]metrics:receivers:[otlp, opencensus, prometheus]processors:[batch]exporters:[debug]logs:receivers:[otlp]processors:[batch]exporters:[debug]extensions:[health_check, pprof, zpages]
OpenTelemetry Collector Exporters
An exporter, which can be push or pull-based, is how you send data to one or more backends/destinations. Exporters may support one or more data sources.
For this workshop, we will be using the otlphttp exporter. The OpenTelemetry Protocol (OTLP) is a vendor-neutral, standardised protocol for transmitting telemetry data. The OTLP exporter sends data to a server that implements the OTLP protocol. The OTLP exporter supports both gRPC and HTTP/JSON protocols.
To send metrics over HTTP to Splunk Observability Cloud, we will need to configure the otlphttp exporter.
Let’s edit our /etc/otelcol-contrib/config.yaml file and configure the otlphttp exporter. Insert the following YAML under the exporters section, taking care to indent by two spaces e.g.
We will also change the verbosity of the logging exporter to prevent the disk from filling up. The default of detailed is very noisy.
Next, we need to define the metrics_endpoint and configure the target URL.
Note
If you are an attendee at a Splunk-hosted workshop, the instance you are using has already been configured with a Realm environment variable. We will reference that environment variable in our configuration file. Otherwise, you will need to create a new environment variable and set the Realm e.g.
exportREALM="us1"
The URL to use is https://ingest.${env:REALM}.signalfx.com/v2/datapoint/otlp. (Splunk has Realms in key geographical locations around the world for data residency).
The otlphttp exporter can also be configured to send traces and logs by defining a target URL for traces_endpoint and logs_endpoint respectively. Configuring these is outside the scope of this workshop.
By default, gzip compression is enabled for all endpoints. This can be disabled by setting compression: none in the exporter configuration. We will leave compression enabled for this workshop and accept the default as this is the most efficient way to send data.
To send metrics to Splunk Observability Cloud, we need to use an Access Token. This can be done by creating a new token in the Splunk Observability Cloud UI. For more information on how to create a token, see Create a token. The token needs to be of type INGEST.
Note
If you are an attendee at a Splunk-hosted workshop, the instance you are using has already been configured with an Access Token (which has been set as an environment variable). We will reference that environment variable in our configuration file. Otherwise, you will need to create a new token and set it as an environment variable e.g.
exportACCESS_TOKEN=<replace-with-your-token>
The token is defined in the configuration file by inserting X-SF-TOKEN: ${env:ACCESS_TOKEN} under a headers: section:
# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacksextensions:health_check:endpoint:0.0.0.0:13133pprof:endpoint:0.0.0.0:1777zpages:endpoint:0.0.0.0:55679receivers:hostmetrics:collection_interval:10sscrapers:# CPU utilization metricscpu:# Disk I/O metricsdisk:# File System utilization metricsfilesystem:# Memory utilization metricsmemory:# Network interface I/O metrics & TCP connection metricsnetwork:# CPU load metricsload:# Paging/Swap space utilization and I/O metricspaging:# Process count metricsprocesses:# Per process CPU, Memory and Disk I/O metrics. Disabled by default.# process:otlp:protocols:grpc:endpoint:0.0.0.0:4317http:endpoint:0.0.0.0:4318opencensus:endpoint:0.0.0.0:55678# Collect own metricsprometheus/internal:config:scrape_configs:- job_name:'otel-collector'scrape_interval:10sstatic_configs:- targets:['0.0.0.0:8888']jaeger:protocols:grpc:endpoint:0.0.0.0:14250thrift_binary:endpoint:0.0.0.0:6832thrift_compact:endpoint:0.0.0.0:6831thrift_http:endpoint:0.0.0.0:14268zipkin:endpoint:0.0.0.0:9411processors:batch:resourcedetection/system:detectors:[system]system:hostname_sources:[os]resourcedetection/ec2:detectors:[ec2]attributes/conf:actions:- key:participant.nameaction:insertvalue:"INSERT_YOUR_NAME_HERE"exporters:debug:verbosity:normalotlphttp/splunk:metrics_endpoint:https://ingest.${env:REALM}.signalfx.com/v2/datapoint/otlpheaders:X-SF-Token:${env:ACCESS_TOKEN}service:pipelines:traces:receivers:[otlp, opencensus, jaeger, zipkin]processors:[batch]exporters:[debug]metrics:receivers:[otlp, opencensus, prometheus]processors:[batch]exporters:[debug]logs:receivers:[otlp]processors:[batch]exporters:[debug]extensions:[health_check, pprof, zpages]
Of course, you can easily configure the metrics_endpoint to point to any other solution that supports the OTLP protocol.
Next, we need to enable the receivers, processors and exporters we have just configured in the service section of the config.yaml.
OpenTelemetry Collector Service
The Service section is used to configure what components are enabled in the Collector based on the configuration found in the receivers, processors, exporters, and extensions sections.
Info
If a component is configured, but not defined within the Service section then it is not enabled.
The service section consists of three sub-sections:
extensions
pipelines
telemetry
In the default configuration, the extension section has been configured to enable health_check, pprof and zpages, which we configured in the Extensions module earlier.
service:extensions:[health_check, pprof, zpages]
So lets configure our Metric Pipeline!
Subsections of 6. Service
OpenTelemetry Collector Service
Hostmetrics Receiver
If you recall from the Receivers portion of the workshop, we defined the Host Metrics Receiver to generate metrics about the host system, which are scraped from various sources. To enable the receiver, we must include the hostmetrics receiver in the metrics pipeline.
In the metrics pipeline, add hostmetrics to the metrics receivers section.
Earlier in the workshop, we also renamed the prometheus receiver to reflect that is was collecting metrics internal to the collector, renaming it to prometheus/internal.
We now need to enable the prometheus/internal receiver under the metrics pipeline. Update the receivers section to include prometheus/internal under the metrics pipeline:
We also added resourcedetection/system and resourcedetection/ec2 processors so that the collector can capture the instance hostname and AWS/EC2 metadata. We now need to enable these two processors under the metrics pipeline.
Update the processors section to include resourcedetection/system and resourcedetection/ec2 under the metrics pipeline:
Also in the Processors section of this workshop, we added the attributes/conf processor so that the collector will insert a new attribute called participant.name to all the metrics. We now need to enable this under the metrics pipeline.
Update the processors section to include attributes/conf under the metrics pipeline:
In the Exporters section of the workshop, we configured the otlphttp exporter to send metrics to Splunk Observability Cloud. We now need to enable this under the metrics pipeline.
Update the exporters section to include otlphttp/splunk under the metrics pipeline:
The collector captures internal signals about its behavior this also includes additional signals from running components.
The reason for this is that components that make decisions about the flow of data need a way to surface that information
as metrics or traces.
Why monitor the collector?
This is somewhat of a chicken and egg problem of, “Who is watching the watcher?”, but it is important that we can surface this information. Another interesting part of the collector’s history is that it existed before the Go metrics’ SDK was considered stable so the collector exposes a Prometheus endpoint to provide this functionality for the time being.
Considerations
Monitoring the internal usage of each running collector in your organization can contribute a significant amount of new Metric Time Series (MTS). The Splunk distribution has curated these metrics for you and would be able to help forecast the expected increases.
The Ninja Zone
To expose the internal observability of the collector, some additional settings can be adjusted:
service:telemetry:logs:level:<info|warn|error>development:<true|false>encoding:<console|json>disable_caller:<true|false>disable_stacktrace:<true|false>output_paths:[<stdout|stderr>, paths...]error_output_paths:[<stdout|stderr>, paths...]initial_fields:key:valuemetrics:level:<none|basic|normal|detailed># Address binds the promethues endpoint to scrapeaddress:<hostname:port>
# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacksextensions:health_check:endpoint:0.0.0.0:13133pprof:endpoint:0.0.0.0:1777zpages:endpoint:0.0.0.0:55679receivers:hostmetrics:collection_interval:10sscrapers:# CPU utilization metricscpu:# Disk I/O metricsdisk:# File System utilization metricsfilesystem:# Memory utilization metricsmemory:# Network interface I/O metrics & TCP connection metricsnetwork:# CPU load metricsload:# Paging/Swap space utilization and I/O metricspaging:# Process count metricsprocesses:# Per process CPU, Memory and Disk I/O metrics. Disabled by default.# process:otlp:protocols:grpc:endpoint:0.0.0.0:4317http:endpoint:0.0.0.0:4318opencensus:endpoint:0.0.0.0:55678# Collect own metricsprometheus/internal:config:scrape_configs:- job_name:'otel-collector'scrape_interval:10sstatic_configs:- targets:['0.0.0.0:8888']jaeger:protocols:grpc:endpoint:0.0.0.0:14250thrift_binary:endpoint:0.0.0.0:6832thrift_compact:endpoint:0.0.0.0:6831thrift_http:endpoint:0.0.0.0:14268zipkin:endpoint:0.0.0.0:9411processors:batch:resourcedetection/system:detectors:[system]system:hostname_sources:[os]resourcedetection/ec2:detectors:[ec2]attributes/conf:actions:- key:participant.nameaction:insertvalue:"INSERT_YOUR_NAME_HERE"exporters:debug:verbosity:normalotlphttp/splunk:metrics_endpoint:https://ingest.${env:REALM}.signalfx.com/v2/datapoint/otlpheaders:X-SF-Token:${env:ACCESS_TOKEN}service:pipelines:traces:receivers:[otlp, opencensus, jaeger, zipkin]processors:[batch]exporters:[debug]metrics:receivers:[hostmetrics, otlp, opencensus, prometheus/internal]processors:[batch, resourcedetection/system, resourcedetection/ec2, attributes/conf]exporters:[debug, otlphttp/splunk]logs:receivers:[otlp]processors:[batch]exporters:[debug]extensions:[health_check, pprof, zpages]
Tip
It is recommended that you validate your configuration file before restarting the collector. You can do this by pasting the contents of your config.yaml file into the OTelBin Configuration Validator tool.
ScreenshotOTelBin
Now that we have a working configuration, let’s start the collector and then check to see what zPages is reporting.
Now that we have configured the OpenTelemetry Collector to send metrics to Splunk Observability Cloud, let’s take a look at the data in Splunk Observability Cloud. If you have not received an invite to Splunk Observability Cloud, your instructor will provide you with login credentials.
Before that, let’s make things a little more interesting and run a stress test on the instance. This in turn will light up the dashboards.
Once you are logged into Splunk Observability Cloud, using the left-hand navigation, navigate to Dashboards from the main menu. This will take you to the Teams view. At the top of this view click on All Dashboards :
In the search box, search for OTel Contrib:
Info
If the dashboard does not exist, then your instructor will be able to quickly add it. If you are not attending a Splunk hosted version of this workshop then the Dashboard Group to import can be found at the bottom of this page.
Click on the OTel Contrib Dashboard dashboard to open it, next click in the Participant Name box, at the top of the dashboard, and select the name you configured for participant.name in the config.yaml in the drop-down list or start typing the name to search for it:
You can now see the host metrics for the host upon which you configured the OpenTelemetry Collector.
Building a component for the Open Telemetry Collector requires three key parts:
The Configuration - What values are exposed to the user to configure
The Factory - Make the component using the provided values
The Business Logic - What the component needs to do
For this, we will use the example of building a component that works with Jenkins so that we can track important DevOps metrics of our project(s).
The metrics we are looking to measure are:
Lead time for changes - “How long it takes for a commit to get into production”
Change failure rate - “The percentage of deployments causing a failure in production”
Deployment frequency - “How often a [team] successfully releases to production”
Mean time to recover - “How long does it take for a [team] to recover from a failure in production”
These indicators were identified Google’s DevOps Research and Assesment (DORA)[^1] team to help
show performance of a software development team. The reason for choosing Jenkins CI is that we remain in the same Open Source Software ecosystem which we can serve as the example for the vendor managed CI tools to adopt in future.
Instrument Vs Component
There is something to consider when improving level of Observability within your organisation
since there are some trade offs that get made.
Pros
Cons
(Auto) Instrumented
Does not require an external API to be monitored in order to observe the system.
Changing instrumentation requires changes to the project.
Gives system owners/developers to make changes in their observability.
Requires additional runtime dependancies.
Understands system context and can corrolate captured data with Exemplars.
Can impact performance of the system.
Component
- Changes to data names or semantics can be rolled out independently of the system’s release cycle.
Breaking API changes require a coordinated release between system and collector.
Updating/extending data collected is a seemless user facing change.
Captured data semantics can unexpectedly break that does not align with a new system release.
Does not require the supporting teams to have a deep understanding of observability practice.
Strictly external / exposed information can be surfaced from the system.
Subsections of 8. Develop
OpenTelemetry Collector Development
Project Setup Ninja
Note
The time to finish this section of the workshop can vary depending on experience.
A complete solution can be found here in case you’re stuck or want to follow
along with the instructor.
To get started developing the new Jenkins CI receiver, we first need to set up a Golang project.
The steps to create your new Golang project is:
Create a new directory named ${HOME}/go/src/jenkinscireceiver and change into it
The actual directory name or location is not strict, you can choose your own development directory to make it in.
Initialize the golang module by going go mod init splunk.conf/workshop/example/jenkinscireceiver
This will create a file named go.mod which is used to track our direct and indirect dependencies
Eventually, there will be a go.sum which is the checksum value of the dependencies being imported.
Check-inReview your go.mod
module splunk.conf/workshop/example/jenkinscireceiver
go 1.20
OpenTelemetry Collector Development
Building The Configuration
The configuration portion of the component is how the user is able to have their inputs over the component,
so the values that is used for the configuration need to be:
Intuitive for users to understand what that field controls
Be explicit in what is required and what is optional
The bad configuration highlights how doing the opposite of the recommendations of configuration practices impacts the usability
of the component. It doesn’t make it clear what field values should be, it includes features that can be pushed to existing processors,
and the field naming is not consistent with other components that exist in the collector.
The good configuration keeps the required values simple, reuses field names from other components, and ensures the component focuses on
just the interaction between Jenkins and the collector.
The code tab shows how much is required to be added by us and what is already provided for us by shared libraries within the collector.
These will be explained in more detail once we get to the business logic. The configuration should start off small and will change
once the business logic has started to include additional features that is needed.
Write the code
In order to implement the code needed for the configuration, we are going to create a new file named config.go with the following content:
packagejenkinscireceiverimport("go.opentelemetry.io/collector/config/confighttp""go.opentelemetry.io/collector/receiver/scraperhelper""splunk.conf/workshop/example/jenkinscireceiver/internal/metadata")typeConfigstruct{// HTTPClientSettings contains all the values
// that are commonly shared across all HTTP interactions
// performed by the collector.
confighttp.HTTPClientSettings`mapstructure:",squash"`// ScraperControllerSettings will allow us to schedule
// how often to check for updates to builds.
scraperhelper.ScraperControllerSettings`mapstructure:",squash"`// MetricsBuilderConfig contains all the metrics
// that can be configured.
metadata.MetricsBuilderConfig`mapstructure:",squash"`}
OpenTelemetry Collector Development
Component Review
To recap the type of component we will need to capture metrics from Jenkins:
The business use case an extension helps solves for are:
Having shared functionality that requires runtime configuration
Indirectly helps with observing the runtime of the collector
This is commonly referred to pull vs push based data collection, and you read more about the details in the Receiver Overview.
The business use case a processor solves for is:
Adding or removing data, fields, or values
Observing and making decisions on the data
Buffering, queueing, and reordering
The thing to keep in mind is the data type flowing through a processor needs to forward
the same data type to its downstream components. Read through Processor Overview for the details.
The business use case an exporter solves for:
Send the data to a tool, service, or storage
The OpenTelemetry collector does not want to be “backend”, an all-in-one observability suite, but rather
keep to the principles that founded OpenTelemetry to begin with; A vendor agnostic Observability for all.
To help revisit the details, please read through Exporter Overview.
This is a component type that was missed in the workshop since it is a relatively new addition to the collector, but the best way to think about a connector is that it is like a processor that allows it to be used across different telemetry types and pipelines. Meaning that a connector can accept data as logs, and output metrics, or accept metrics from one pipeline and provide metrics on the data it has observed.
The business case that a connector solves for:
Converting from different telemetry types
logs to metrics
traces to metrics
metrics to logs
Observing incoming data and producing its own data
Accepting metrics and generating analytical metrics of the data.
There was a brief overview within the Ninja section as part of the Processor Overview,
and be sure what the project for updates for new connector components.
From the component overviews, it is clear that developing a pull-based receiver for Jenkins.
OpenTelemetry Collector Development
Designing The Metrics
To help define and export the metrics captured by our receiver, we will be using, mdatagen, a tool developed for the collector that turns YAML defined metrics into code.
---# Type defines the name to reference the component# in the configuration filetype:jenkins# Status defines the component type and the stability levelstatus:class:receiverstability:development:[metrics]# Attributes are the expected fields reported# with the exported values.attributes:job.name:description:The name of the associated Jenkins jobtype:stringjob.status:description:Shows if the job had passed, or failedtype:stringenum:- failed- success- unknown# Metrics defines all the pontentially exported values from this receiver. metrics:jenkins.jobs.count:enabled:truedescription:Provides a count of the total number of configured jobsunit:"{Count}"gauge:value_type:intjenkins.job.duration:enabled:truedescription:Show the duration of the jobunit:"s"gauge:value_type:intattributes:- job.name- job.statusjenkins.job.commit_delta:enabled:truedescription:The calculation difference of the time job was finished minus commit timestampunit:"s"gauge:value_type:intattributes:- job.name- job.status
// To generate the additional code needed to capture metrics,
// the following command to be run from the shell:
// go generate -x ./...
//go:generate go run github.com/open-telemetry/opentelemetry-collector-contrib/cmd/mdatagen@v0.80.0 metadata.yaml
packagejenkinscireceiver// There is no code defined within this file.
Create these files within the project folder before continuing onto the next section.
Building The Factory
The Factory is a software design pattern that effectively allows for an object, in this case a jenkinscireceiver, to be created dynamically with the provided configuration. To use a more real-world example, it would be going to a phone store, asking for a phone
that matches your exact description, and then providing it to you.
Run the following command go generate -x ./... , it will create a new folder, jenkinscireceiver/internal/metadata, that contains all code required to export the defined metrics. The required code is:
packagejenkinscireceiverimport("errors""go.opentelemetry.io/collector/component""go.opentelemetry.io/collector/config/confighttp""go.opentelemetry.io/collector/receiver""go.opentelemetry.io/collector/receiver/scraperhelper""splunk.conf/workshop/example/jenkinscireceiver/internal/metadata")funcNewFactory()receiver.Factory{returnreceiver.NewFactory(metadata.Type,newDefaultConfig,receiver.WithMetrics(newMetricsReceiver,metadata.MetricsStability),)}funcnewMetricsReceiver(_context.Context,setreceiver.CreateSettings,cfgcomponent.Config,consumerconsumer.Metrics)(receiver.Metrics,error){// Convert the configuration into the expected type
conf,ok:=cfg.(*Config)if!ok{returnnil,errors.New("can not convert config")}sc,err:=newScraper(conf,set)iferr!=nil{returnnil,err}returnscraperhelper.NewScraperControllerReceiver(&conf.ScraperControllerSettings,set,consumer,scraperhelper.AddScraper(sc),)}
packagejenkinscireceiverimport("go.opentelemetry.io/collector/config/confighttp""go.opentelemetry.io/collector/receiver/scraperhelper""splunk.conf/workshop/example/jenkinscireceiver/internal/metadata")typeConfigstruct{// HTTPClientSettings contains all the values
// that are commonly shared across all HTTP interactions
// performed by the collector.
confighttp.HTTPClientSettings`mapstructure:",squash"`// ScraperControllerSettings will allow us to schedule
// how often to check for updates to builds.
scraperhelper.ScraperControllerSettings`mapstructure:",squash"`// MetricsBuilderConfig contains all the metrics
// that can be configured.
metadata.MetricsBuilderConfig`mapstructure:",squash"`}funcnewDefaultConfig()component.Config{return&Config{ScraperControllerSettings:scraperhelper.NewDefaultScraperControllerSettings(metadata.Type),HTTPClientSettings:confighttp.NewDefaultHTTPClientSettings(),MetricsBuilderConfig:metadata.DefaultMetricsBuilderConfig(),}}
packagejenkinscireceivertypescraperstruct{}funcnewScraper(cfg*Config,setreceiver.CreateSettings)(scraperhelper.Scraper,error){// Create a our scraper with our values
s:=scraper{// To be filled in later
}returnscraperhelper.NewScraper(metadata.Type,s.scrape)}func(scraper)scrape(ctxcontext.Context)(pmetric.Metrics,error){// To be filled in
returnpmetrics.NewMetrics(),nil}
---dist:name:otelcoldescription:"Conf workshop collector"output_path:./distversion:v0.0.0-experimentalextensions:- gomod:github.com/open-telemetry/opentelemetry-collector-contrib/extension/basicauthextension v0.80.0- gomod:github.com/open-telemetry/opentelemetry-collector-contrib/extension/healthcheckextension v0.80.0receivers:- gomod:go.opentelemetry.io/collector/receiver/otlpreceiver v0.80.0- gomod:github.com/open-telemetry/opentelemetry-collector-contrib/receiver/jaegerreceiver v0.80.0- gomod:github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.80.0- gomod:splunk.conf/workshop/example/jenkinscireceiver v0.0.0path:./jenkinscireceiverprocessors:- gomod:go.opentelemetry.io/collector/processor/batchprocessor v0.80.0exporters:- gomod:go.opentelemetry.io/collector/exporter/loggingexporter v0.80.0- gomod:go.opentelemetry.io/collector/exporter/otlpexporter v0.80.0- gomod:go.opentelemetry.io/collector/exporter/otlphttpexporter v0.80.0# This replace is a go directive that allows for redefine# where to fetch the code to use since the default would be from a remote project.replaces:- splunk.conf/workshop/example/jenkinscireceiver => ./jenkinscireceiver
Once you have written these files into the project with the expected contents run, go mod tidy, which will fetch all the remote dependencies and update go.mod and generate the go.sum files.
OpenTelemetry Collector Development
Building The Business Logic
At this point, we have a custom component that currently does nothing so we need to add in the required
logic to capture this data from Jenkins.
From this point, the steps that we need to take are:
Create a client that connect to Jenkins
Capture all the configured jobs
Report the status of the last build for the configured job
Calculate the time difference between commit timestamp and job completion.
The changes will be made to scraper.go.
To be able to connect to the Jenkins server, we will be using the package,
“github.com/yosida95/golang-jenkins”,
which provides the functionality required to read data from the jenkins server.
Then we are going to utilise some of the helper functions from the,
“go.opentelemetry.io/collector/receiver/scraperhelper” ,
library to create a start function so that we can connect to the Jenkins server once component has finished starting.
packagejenkinscireceiverimport("context"jenkins"github.com/yosida95/golang-jenkins""go.opentelemetry.io/collector/component""go.opentelemetry.io/collector/pdata/pmetric""go.opentelemetry.io/collector/receiver""go.opentelemetry.io/collector/receiver/scraperhelper""splunk.conf/workshop/example/jenkinscireceiver/internal/metadata")typescraperstruct{mb*metadata.MetricsBuilderclient*jenkins.Jenkins}funcnewScraper(cfg*Config,setreceiver.CreateSettings)(scraperhelper.Scraper,error){s:=&scraper{mb:metadata.NewMetricsBuilder(cfg.MetricsBuilderConfig,set),}returnscraperhelper.NewScraper(metadata.Type,s.scrape,scraperhelper.WithStart(func(ctxcontext.Context,hcomponent.Host)error{client,err:=cfg.ToClient(h,set.TelemetrySettings)iferr!=nil{returnerr}// The collector provides a means of injecting authentication
// on our behalf, so this will ignore the libraries approach
// and use the configured http client with authentication.
s.client=jenkins.NewJenkins(nil,cfg.Endpoint)s.client.SetHTTPClient(client)returnnil}),)}func(sscraper)scrape(ctxcontext.Context)(pmetric.Metrics,error){// To be filled in
returnpmetric.NewMetrics(),nil}
This finishes all the setup code that is required in order to initialise a Jenkins receiver.
From this point on, we will be focuses on the scrape method that has been waiting to be filled in.
This method will be run on each interval that is configured within the configuration (by default, every minute).
The reason we want to capture the number of jobs configured so we can see the growth of our Jenkins server,
and measure of many projects have onboarded. To do this we will call the jenkins client to list all jobs,
and if it reports an error, return that with no metrics, otherwise, emit the data from the metric builder.
func(sscraper)scrape(ctxcontext.Context)(pmetric.Metrics,error){jobs,err:=s.client.GetJobs()iferr!=nil{returnpmetric.Metrics{},err}// Recording the timestamp to ensure
// all captured data points within this scrape have the same value.
now:=pcommon.NewTimestampFromTime(time.Now())// Casting to an int64 to match the expected type
s.mb.RecordJenkinsJobsCountDataPoint(now,int64(len(jobs)))// To be filled in
returns.mb.Emit(),nil}
In the last step, we were able to capture all jobs ands report the number of jobs
there was. Within this step, we are going to examine each job and use the report values
to capture metrics.
func(sscraper)scrape(ctxcontext.Context)(pmetric.Metrics,error){jobs,err:=s.client.GetJobs()iferr!=nil{returnpmetric.Metrics{},err}// Recording the timestamp to ensure
// all captured data points within this scrape have the same value.
now:=pcommon.NewTimestampFromTime(time.Now())// Casting to an int64 to match the expected type
s.mb.RecordJenkinsJobsCountDataPoint(now,int64(len(jobs)))for_,job:=rangejobs{// Ensure we have valid results to start off with
var(build=job.LastCompletedBuildstatus=metadata.AttributeJobStatusUnknown)// This will check the result of the job, however,
// since the only defined attributes are
// `success`, `failure`, and `unknown`.
// it is assume that anything did not finish
// with a success or failure to be an unknown status.
switchbuild.Result{case"aborted","not_built","unstable":status=metadata.AttributeJobStatusUnknowncase"success":status=metadata.AttributeJobStatusSuccesscase"failure":status=metadata.AttributeJobStatusFailed}s.mb.RecordJenkinsJobDurationDataPoint(now,int64(job.LastCompletedBuild.Duration),job.Name,status,)}returns.mb.Emit(),nil}
The final step is to calculate how long it took from
commit to job completion to help infer our DORA metrics.
func(sscraper)scrape(ctxcontext.Context)(pmetric.Metrics,error){jobs,err:=s.client.GetJobs()iferr!=nil{returnpmetric.Metrics{},err}// Recording the timestamp to ensure
// all captured data points within this scrape have the same value.
now:=pcommon.NewTimestampFromTime(time.Now())// Casting to an int64 to match the expected type
s.mb.RecordJenkinsJobsCountDataPoint(now,int64(len(jobs)))for_,job:=rangejobs{// Ensure we have valid results to start off with
var(build=job.LastCompletedBuildstatus=metadata.AttributeJobStatusUnknown)// Previous step here
// Ensure that the `ChangeSet` has values
// set so there is a valid value for us to reference
iflen(build.ChangeSet.Items)==0{continue}// Making the assumption that the first changeset
// item is the most recent change.
change:=build.ChangeSet.Items[0]// Record the difference from the build time
// compared against the change timestamp.
s.mb.RecordJenkinsJobCommitDeltaDataPoint(now,int64(build.Timestamp-change.Timestamp),job.Name,status,)}returns.mb.Emit(),nil}
Once all of these steps have been completed, you now have built a custom Jenkins CI receiver!
Whats next?
There are more than likely features that would be desired from component that you can think of, like:
Can I include the branch name that the job used?
Can I include the project name for the job?
How I calculate the collective job durations for project?
How do I validate the changes work?
Please take this time to play around, break it, change things around, or even try to capture logs from the builds.