OpenTelemetry Collector Workshops

OpenTelemetry Collector Concepts
Learn the concepts of the OpenTelemetry Collector and how to use it to send data to Splunk Observability Cloud.
Advanced OpenTelemetry Collector
Practice configuring the OpenTelemetry Collector from scratch and work through several advanced configuration scenarios.

Making Your Observability Cloud Native With OpenTelemetry

1 hour Author Robert Castley

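Overview

Organizations getting started with OpenTelemetry often begin by sending data directly to an observability backend. That works well for initial testing, but using the OpenTelemetry Collector as part of your observability architecture brings many benefits and is recommended for production deployments.

This workshop focuses on using the OpenTelemetry Collector, starting with the fundamentals of configuring receivers, processors, and exporters for use with Splunk Observability Cloud. Attendees progress from novice to being able to add custom components that help solve the business observability needs of their distributed platform.

Ninja Sections

Throughout the workshop there are expandable Ninja Sections. These are more hands-on and go into further technical detail that you can explore during the workshop or in your own time.

Because the OpenTelemetry project is under frequent development, the content in these sections may go out of date. Links are provided where details are out of sync. Please let us know if you spot something that needs updating.

Ninja: Test Me!

By completing this workshop you will officially be an OpenTelemetry Collector Ninja!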

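Target Audience

This interactive workshop is for developers and system administrators who want to learn more about the architecture and deployment of the OpenTelemetry Collector.

Prerequisites

A basic understanding of data collection
Command-line and vim/vi experience
An instance/host/VM running Ubuntu 20.04 LTS or 22.04 LTS
- The minimum requirement is an AWS/EC2 t2.micro (1 CPU, 1GB RAM, 8GB storage)

Learning Objectives

By the end of this workshop, attendees will be able to:

Understand the components of OpenTelemetry
Use receivers, processors, and exporters to collect and analyze data
Identify the benefits of using OpenTelemetry
Build a custom component that solves a business need

OpenTelemetry Architecture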

%%{
  init:{
    "theme":"base",
    "themeVariables": {
      "primaryColor": "#ffffff",
      "clusterBkg": "#eff2fb",
      "defaultLinkColor": "#333333"
    }
  }
}%%

flowchart LR;
    subgraph Collector
    A[OTLP] --> M(Receivers)
    B[JAEGER] --> M(Receivers)
    C[Prometheus] --> M(Receivers)
    end
    subgraph Processors
    M(Receivers) --> H(Filters, Attributes, etc)
    E(Extensions)
    end
    subgraph Exporters
    H(Filters, Attributes, etc) --> S(OTLP)
    H(Filters, Attributes, etc) --> T(JAEGER)
    H(Filters, Attributes, etc) --> U(Prometheus)
    end

Installing OpenTelemetry Collector Contrib

Download the OpenTelemetry Collector Contrib distribution

The first step in installing the OpenTelemetry Collector is downloading it. In this lab, we use the wget command to download the .deb package from the OpenTelemetry GitHub repository.

Obtain the .deb package for your platform from the OpenTelemetry Collector Contrib releases page.

wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.111.0/otelcol-contrib_0.111.0_linux_amd64.deb

Install the OpenTelemetry Collector Contrib distribution

Install the .deb package using dpkg. Example output from a successful installation is shown after the command.

sudo dpkg -i otelcol-contrib_0.111.0_linux_amd64.deb

Selecting previously unselected package otelcol-contrib.
(Reading database ... 89232 files and directories currently installed.)
Preparing to unpack otelcol-contrib_0.111.0_linux_amd64.deb ...
Unpacking otelcol-contrib (0.111.0) ...
Setting up otelcol-contrib (0.111.0) ...
Created symlink /etc/systemd/system/multi-user.target.wants/otelcol-contrib.service → /lib/systemd/system/otelcol-contrib.service.

Confirm the Collector is running

The Collector should now be running. Verify this with the systemctl command, run as root. Press q to exit the status display.

sudo systemctl status otelcol-contrib

● otelcol-contrib.service - OpenTelemetry Collector Contrib
     Loaded: loaded (/lib/systemd/system/otelcol-contrib.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2024-10-07 10:27:49 BST; 52s ago
   Main PID: 17113 (otelcol-contrib)
      Tasks: 13 (limit: 19238)
     Memory: 34.8M
        CPU: 155ms
     CGroup: /system.slice/otelcol-contrib.service
             └─17113 /usr/bin/otelcol-contrib --config=/etc/otelcol-contrib/config.yaml

Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: Descriptor:
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]:      -> Name: up
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]:      -> Description: The scraping was successful
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]:      -> Unit:
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]:      -> DataType: Gauge
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: NumberDataPoints #0
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: Timestamp: 2024-10-07 09:28:36.942 +0000 UTC
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]: Value: 1.000000
Oct 07 10:28:36 petclinic-rum-testing otelcol-contrib[17113]:         {"kind": "exporter", "data_type": "metrics", "name": "debug"}

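Because this workshop involves changing the configuration file, setting environment variables, and restarting the Collector several times, we need to stop the Collector service and disable it from starting on boot.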

sudo systemctl stop otelcol-contrib && sudo systemctl disable otelcol-contrib

Ninja: Build your own Collector using the OpenTelemetry Collector Builder (ocb)

For this part, you will need the following installed on your system:

Go (latest version)

cd /tmp
wget https://golang.org/dl/go1.20.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.20.linux-amd64.tar.gz

Edit .profile and add the following environment variables:

export GOROOT=/usr/local/go
export GOPATH=$HOME/go
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH

Refresh your shell session:

source ~/.profile

Verify the Go version:

go version

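ocb installed
- Download the ocb binary from the project releases and run the following commands. The asset name below assumes the Ubuntu amd64 host used in this workshop; use the asset that matches your platform.
```
sudo mv ocb_0.80.0_linux_amd64 /usr/bin/ocb
sudo chmod 755 /usr/bin/ocb
```
  Alternatively, you can use the Go toolchain to build the binary locally:
```
go install go.opentelemetry.io/collector/cmd/builder@v0.80.0
sudo mv $(go env GOPATH)/bin/builder /usr/bin/ocb
```
(Optional) Docker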

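Why build your own Collector?

The default distributions of the Collector (core and contrib) contain either too much or too little for what you need.

Running the contrib Collector in production is also not advised, because of the number of components it installs, most of which your deployment is unlikely to need.

Benefits of building your own Collector?

Creating your own Collector binary (commonly referred to as a distribution) means you build only what you need.

The benefits are:

Smaller binaries
Existing Go scanners can be used to check for vulnerabilities
Internal components that tie in with your organization can be included

Considerations for building your own Collector?

Now, this would not be a 🥷 Ninja Zone without some drawbacks:

Go experience is recommended, though not required
No Splunk support
Responsibility for distribution and lifecycle management

It is important to note that, although the project is working towards stability, changes may still break your workflow. The team at Splunk provides a higher level of support and stability, and can offer a curated experience that fits your deployment needs.

The Ninja Zone

Once you have all the required tools installed, create a new file named otelcol-builder.yaml, following this directory structure: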

.
└── otelcol-builder.yaml

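Once the file is created, add the list of components to install, along with some additional metadata.

For this example, we create a builder manifest that installs only the components needed for the introductory configuration: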

dist:
  name: otelcol-ninja
  description: A custom build of the Open Telemetry Collector
  output_path: ./dist

extensions:
- gomod: go.opentelemetry.io/collector/extension/ballastextension v0.80.0
- gomod: go.opentelemetry.io/collector/extension/zpagesextension  v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/httpforwarder v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/healthcheckextension v0.80.0

exporters:
- gomod: go.opentelemetry.io/collector/exporter/loggingexporter v0.80.0
- gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/splunkhecexporter v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/signalfxexporter v0.80.0

processors:
- gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.80.0
- gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.80.0

receivers:
- gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/jaegerreceiver v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.80.0
- gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/zipkinreceiver v0.80.0

Once the YAML file has been updated for ocb, run the following command:

ocb --config=otelcol-builder.yaml

This creates the following directory structure:

├── dist
│   ├── components.go
│   ├── components_test.go
│   ├── go.mod
│   ├── go.sum
│   ├── main.go
│   ├── main_others.go
│   ├── main_windows.go
│   └── otelcol-ninja
└── otelcol-builder.yaml
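
To try the resulting binary, it needs a Collector configuration that references only components compiled into it. The following is a minimal sketch, assuming the manifest above built successfully at v0.80.0. The file name ninja-config.yaml is a hypothetical choice, and the pipeline wiring is illustrative rather than the workshop's final configuration.

```yaml
# ninja-config.yaml (hypothetical file name)
# References only components listed in otelcol-builder.yaml above.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  # The v0.80.0 manifest above includes the logging exporter.
  logging:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
```

Under those assumptions, the binary would be started with ./dist/otelcol-ninja --config=ninja-config.yaml. A component that is referenced in the configuration but missing from the manifest should be rejected at startup, which is a quick way to confirm what the custom build actually contains.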

References

https://opentelemetry.io/docs/collector/custom-collector/

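Default configuration

The OpenTelemetry Collector is configured through YAML files. These files ship with default settings that you can modify to meet your needs. Let's look at the default configuration that is supplied: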

cat /etc/otelcol-contrib/config.yaml

# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.
# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks

extensions:
  health_check:
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  opencensus:
    endpoint: 0.0.0.0:55678

  # Collect own metrics
  prometheus:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_binary:
        endpoint: 0.0.0.0:6832
      thrift_compact:
        endpoint: 0.0.0.0:6831
      thrift_http:
        endpoint: 0.0.0.0:14268

  zipkin:
    endpoint: 0.0.0.0:9411

processors:
  batch:

exporters:
  debug:
    verbosity: detailed

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [otlp, opencensus, prometheus]
      processors: [batch]
      exporters: [debug]

    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

  extensions: [health_check, pprof, zpages]

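Congratulations! You have successfully downloaded and installed the OpenTelemetry Collector. You are well on your way to becoming an OTel Ninja. But first, let's walk through the configuration files and the different distributions of the OpenTelemetry Collector.

Note

Splunk provides its own fully supported distribution of the OpenTelemetry Collector. You can install it from the Splunk GitHub repository, or use the wizard in Splunk Observability Cloud to build a simple installation script that you copy and paste. This distribution includes many additional features and enhancements that are not available in the OpenTelemetry Collector Contrib distribution.

The Splunk Distribution of the OpenTelemetry Collector is production-tested. The majority of customers use it in their production environments.
Customers who use this distribution can receive direct help from official Splunk support within SLAs.
Customers can use or migrate to the Splunk Distribution of the OpenTelemetry Collector without worrying about future breaking changes to its core configuration experience for metrics and traces collection (OpenTelemetry logs collection configuration is in beta). There may be breaking changes to the Collector's own metrics.

We will now walk through each section of the configuration file and modify it to send host metrics to Splunk Observability Cloud.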

OpenTelemetry Collector Extensions

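Now that the OpenTelemetry Collector is installed, let's take a look at its extensions. Extensions are optional and are used primarily for tasks that do not involve processing telemetry data. Examples include health monitoring, service discovery, and data forwarding.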

%%{
  init:{
    "theme": "base",
    "themeVariables": {
      "primaryColor": "#ffffff",
      "clusterBkg": "#eff2fb",
      "defaultLinkColor": "#333333"
    }
  }
}%%

flowchart LR;
    style E fill:#e20082,stroke:#333,stroke-width:4px,color:#fff
    subgraph Collector
    A[OTLP] --> M(Receivers)
    B[JAEGER] --> M(Receivers)
    C[Prometheus] --> M(Receivers)
    end
    subgraph Processors
    M(Receivers) --> H(Filters, Attributes, etc)
    E(Extensions)
    end
    subgraph Exporters
    H(Filters, Attributes, etc) --> S(OTLP)
    H(Filters, Attributes, etc) --> T(JAEGER)
    H(Filters, Attributes, etc) --> U(Prometheus)
    end

Health Check

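Extensions are configured in the same config.yaml file that we referenced in the installation step. Let's edit config.yaml and configure the extensions. Note that the pprof and zpages extensions are already configured in the default config.yaml. For this workshop, we only update the health_check extension so that it exposes its port on all network interfaces, which lets us access the health of the Collector.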

sudo vi /etc/otelcol-contrib/config.yaml

extensions:
  health_check:
    endpoint: 0.0.0.0:13133

Start the Collector:

otelcol-contrib --config=file:/etc/otelcol-contrib/config.yaml

This extension enables an HTTP URL that can be probed to check the status of the OpenTelemetry Collector. It can be used as a liveness or readiness probe on Kubernetes. To learn more about the curl command, see the curl man page.

Open a new terminal session, SSH into your instance, and run the following command:

curl http://localhost:13133

{"status":"Server available","upSince":"2024-10-07T11:00:08.004685295+01:00","uptime":"12.56420005s"}
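
The same endpoint can back Kubernetes probes. The fragment below is a sketch only: the container name is hypothetical, and it assumes the Collector runs from the otel/opentelemetry-collector-contrib image with health_check listening on 0.0.0.0:13133 as configured above.

```yaml
# Fragment of a Pod spec (spec.containers); not a complete manifest.
containers:
  - name: otel-collector            # hypothetical container name
    image: otel/opentelemetry-collector-contrib:0.111.0
    ports:
      - containerPort: 13133        # health_check endpoint
    livenessProbe:
      httpGet:
        path: /                     # same URL as the curl check above
        port: 13133
    readinessProbe:
      httpGet:
        path: /
        port: 13133
      initialDelaySeconds: 5        # illustrative value
```

Binding the extension to 0.0.0.0 rather than localhost matters here, because the kubelet probes the Pod IP from outside the container.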

Performance Profiler

The Performance Profiler extension enables Go's net/http/pprof endpoint. Developers typically use it to collect performance profiles and investigate issues with the service. We do not cover it in this workshop.

zPages

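zPages are an in-process alternative to external exporters. When included, they collect and aggregate tracing and metrics information in the background and serve this data on web pages when requested. zPages are an extremely useful diagnostic feature for confirming that the Collector is running as expected.

ServiceZ gives an overview of the Collector services and quick access to the pipelinez, extensionz, and featurez zPages. The page also provides build and runtime information.

Example URL: http://localhost:55679/debug/servicez (change localhost to suit your environment).

PipelineZ provides insight into the pipelines running in the Collector. It shows each pipeline's type and whether data is mutated, along with the receivers, processors, and exporters used by each pipeline.

Example URL: http://localhost:55679/debug/pipelinez (change localhost to suit your environment).

ExtensionZ shows the extensions that are active in the Collector.

Example URL: http://localhost:55679/debug/extensionz (change localhost to suit your environment).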

Ninja: Improve data durability with the storage extension

For this, make sure that your distribution has the file_storage extension installed. You can check by running the command otelcol-contrib components, which should show results like the following:

# ... truncated for clarity
extensions:
  - file_storage

buildinfo:
    command: otelcol-contrib
    description: OpenTelemetry Collector Contrib
    version: 0.80.0
receivers:
    - prometheus_simple
    - apache
    - influxdb
    - purefa
    - purefb
    - receiver_creator
    - mongodbatlas
    - vcenter
    - snmp
    - expvar
    - jmx
    - kafka
    - skywalking
    - udplog
    - carbon
    - kafkametrics
    - memcached
    - prometheus
    - windowseventlog
    - zookeeper
    - otlp
    - awsecscontainermetrics
    - iis
    - mysql
    - nsxt
    - aerospike
    - elasticsearch
    - httpcheck
    - k8sobjects
    - mongodb
    - hostmetrics
    - signalfx
    - statsd
    - awsxray
    - cloudfoundry
    - collectd
    - couchdb
    - kubeletstats
    - jaeger
    - journald
    - riak
    - splunk_hec
    - active_directory_ds
    - awscloudwatch
    - sqlquery
    - windowsperfcounters
    - flinkmetrics
    - googlecloudpubsub
    - podman_stats
    - wavefront
    - k8s_events
    - postgresql
    - rabbitmq
    - sapm
    - sqlserver
    - redis
    - solace
    - tcplog
    - awscontainerinsightreceiver
    - awsfirehose
    - bigip
    - filelog
    - googlecloudspanner
    - cloudflare
    - docker_stats
    - k8s_cluster
    - pulsar
    - zipkin
    - nginx
    - opencensus
    - azureeventhub
    - datadog
    - fluentforward
    - otlpjsonfile
    - syslog
processors:
    - resource
    - batch
    - cumulativetodelta
    - groupbyattrs
    - groupbytrace
    - k8sattributes
    - experimental_metricsgeneration
    - metricstransform
    - routing
    - attributes
    - datadog
    - deltatorate
    - spanmetrics
    - span
    - memory_limiter
    - redaction
    - resourcedetection
    - servicegraph
    - transform
    - filter
    - probabilistic_sampler
    - tail_sampling
exporters:
    - otlp
    - carbon
    - datadog
    - f5cloud
    - kafka
    - mezmo
    - skywalking
    - awsxray
    - dynatrace
    - loki
    - prometheus
    - logging
    - azuredataexplorer
    - azuremonitor
    - instana
    - jaeger
    - loadbalancing
    - sentry
    - splunk_hec
    - tanzuobservability
    - zipkin
    - alibabacloud_logservice
    - clickhouse
    - file
    - googlecloud
    - prometheusremotewrite
    - awscloudwatchlogs
    - googlecloudpubsub
    - jaeger_thrift
    - logzio
    - sapm
    - sumologic
    - otlphttp
    - googlemanagedprometheus
    - opencensus
    - awskinesis
    - coralogix
    - influxdb
    - logicmonitor
    - signalfx
    - tencentcloud_logservice
    - awsemf
    - elasticsearch
    - pulsar
extensions:
    - zpages
    - bearertokenauth
    - oidc
    - host_observer
    - sigv4auth
    - file_storage
    - memory_ballast
    - health_check
    - oauth2client
    - awsproxy
    - http_forwarder
    - jaegerremotesampling
    - k8s_observer
    - pprof
    - asapclient
    - basicauth
    - headers_setter

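This extension gives exporters the ability to queue data to disk when an exporter is unable to send data to its configured endpoint.

To configure the extension, update your configuration to include the information below. First, create the /tmp/otel-data directory and give it read/write permissions: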

extensions:
...
  file_storage:
    directory: /tmp/otel-data
    timeout: 10s
    compaction:
      directory: /tmp/otel-data
      on_start: true
      on_rebound: true
      rebound_needed_threshold_mib: 5
      rebound_trigger_threshold_mib: 3

# ... truncated for clarity

service:
  extensions: [health_check, pprof, zpages, file_storage]
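
Declaring the extension does not by itself make an exporter persist anything. An exporter has to opt in through its sending queue. A minimal sketch, assuming an otlphttp exporter whose endpoint is a placeholder and not a value from this workshop:

```yaml
exporters:
  otlphttp:
    endpoint: https://otlp.example.invalid   # placeholder endpoint
    sending_queue:
      enabled: true
      # ID of the storage extension that backs the queue on disk.
      storage: file_storage
```

The value of storage has to match an extension ID that is also listed under service.extensions, as in the fragment above.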

Why queue data to disk?

This allows the Collector to weather network interruptions (and even Collector restarts) so that data still reaches the upstream provider.

Considerations for queuing data to disk

Disk performance can limit data throughput.

Configuration check-in

Now that we have covered extensions, let's check our configuration changes.

Check-in: Review your configuration

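# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.
# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks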

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  opencensus:
    endpoint: 0.0.0.0:55678

  # Collect own metrics
  prometheus:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_binary:
        endpoint: 0.0.0.0:6832
      thrift_compact:
        endpoint: 0.0.0.0:6831
      thrift_http:
        endpoint: 0.0.0.0:14268

  zipkin:
    endpoint: 0.0.0.0:9411

processors:
  batch:

exporters:
  debug:
    verbosity: detailed

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [otlp, opencensus, prometheus]
      processors: [batch]
      exporters: [debug]

    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

  extensions: [health_check, pprof, zpages]

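Now that we have reviewed extensions, let's move on to the data pipeline portion of the workshop. A pipeline defines the path data follows in the Collector: it starts at reception, moves through further processing or modification, and finally leaves the Collector through exporters.

The data pipeline in the OpenTelemetry Collector is made up of receivers, processors, and exporters. We start with receivers.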

OpenTelemetry Collector Receivers

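Welcome to the receiver portion of the workshop! This is the starting point of the OpenTelemetry Collector's data pipeline. Let's dive in.

A receiver, which can be push-based or pull-based, is how data gets into the Collector. Receivers may support one or more data sources. Generally, a receiver accepts data in a specified format, translates it into the internal format, and passes it to the processors and exporters defined in the applicable pipelines.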

%%{
  init:{
    "theme":"base",
    "themeVariables": {
      "primaryColor": "#ffffff",
      "clusterBkg": "#eff2fb",
      "defaultLinkColor": "#333333"
    }
  }
}%%

flowchart LR;
    style M fill:#e20082,stroke:#333,stroke-width:4px,color:#fff
    subgraph Collector
    A[OTLP] --> M(Receivers)
    B[JAEGER] --> M(Receivers)
    C[Prometheus] --> M(Receivers)
    end
    subgraph Processors
    M(Receivers) --> H(Filters, Attributes, etc)
    E(Extensions)
    end
    subgraph Exporters
    H(Filters, Attributes, etc) --> S(OTLP)
    H(Filters, Attributes, etc) --> T(JAEGER)
    H(Filters, Attributes, etc) --> U(Prometheus)
    end

Host Metrics Receiver

The Host Metrics Receiver generates metrics about the host system, scraped from various sources. It is intended for use when the Collector is deployed as an agent, which is what we do in this workshop.

Let's update the /etc/otelcol-contrib/config.yaml file and configure the hostmetrics receiver. Insert the following YAML under the receivers section, taking care to indent by two spaces.

sudo vi /etc/otelcol-contrib/config.yaml

receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      # CPU utilization metrics
      cpu:
      # Disk I/O metrics
      disk:
      # File System utilization metrics
      filesystem:
      # Memory utilization metrics
      memory:
      # Network interface I/O metrics & TCP connection metrics
      network:
      # CPU load metrics
      load:
      # Paging/Swap space utilization and I/O metrics
      paging:
      # Process count metrics
      processes:
      # Per process CPU, Memory and Disk I/O metrics. Disabled by default.
      # process:

Prometheus Receiver

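You will also notice another receiver called prometheus. Prometheus is an open-source monitoring toolkit, and the Collector exposes its own metrics in a form that Prometheus can scrape (port 8888 in the default configuration). This receiver scrapes those metrics from the OpenTelemetry Collector itself, and they can then be used to monitor the health of the Collector.

Let's modify the prometheus receiver to make it clear that it collects metrics from the Collector itself. Renaming the receiver from prometheus to prometheus/internal makes it much clearer what the receiver is doing. Update the configuration file to look like this: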

prometheus/internal:
  config:
    scrape_configs:
    - job_name: 'otel-collector'
      scrape_interval: 10s
      static_configs:
      - targets: ['0.0.0.0:8888']
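
Renaming a receiver changes its ID, so every pipeline that lists it has to use the new name too. As a sketch of what the metrics pipeline would look like once that is done (the remaining entries are copied from the default configuration, and the workshop completes the service section later):

```yaml
service:
  pipelines:
    metrics:
      # prometheus -> prometheus/internal, to match the renamed receiver
      receivers: [otlp, opencensus, prometheus/internal]
      processors: [batch]
      exporters: [debug]
```

Until the pipeline is updated, the configuration refers to a receiver ID that no longer exists, which is one reason the Collector should not be restarted partway through these edits.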

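Example dashboard - Prometheus metrics

The following screenshot shows an example dashboard with some of the metrics the Prometheus internal receiver collects from the OpenTelemetry Collector. Here, we can see accepted and sent spans, metrics, and log records.

Note

The following screenshot is an out-of-the-box (OOTB) dashboard from Splunk Observability Cloud that lets you easily monitor your Splunk OpenTelemetry Collector install base.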

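Other Receivers

You will notice that the default configuration contains other receivers: otlp, opencensus, jaeger, and zipkin. These are used to receive telemetry data from other sources. We do not cover these receivers in this workshop, so leave them as they are.

Ninja: Create receivers dynamically

To help observe short-lived tasks such as Docker containers, Kubernetes pods, or SSH sessions, you can use the receiver creator together with observer extensions to create a new receiver as these services start up.

What do we need?

To start using the receiver creator and its associated observer extensions, they need to be part of your Collector build manifest.

See installation for details.

Things to consider

Some short-lived tasks may require additional configuration such as a username and password. These values can be referenced through environment variables, or with scheme expansion syntax such as ${file:./path/to/database/password}. If you take this route, follow your organization's secret management best practices.

The Ninja Zone

There are only two things needed for this Ninja Zone:

Make sure the receiver creator and observer extensions have been added to the builder manifest.
Create the configuration that can be used to match against discovered endpoints.

To create the templated configuration, you can do the following: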

receiver_creator:
  watch_observers: [host_observer]
  receivers:
    redis:
      rule: type == "port" && port == 6379
      config:
        password: ${env:HOST_REDIS_PASSWORD}

For more examples, see the receiver creator's examples.
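
For the template to take effect, the observer has to be enabled as an extension and the receiver creator has to be part of a pipeline. The fragment below sketches that wiring, under the assumption that host_observer and the redis receiver are compiled into your Collector. The debug exporter is used only to make the result visible, and the receiver_creator block itself is the one shown above.

```yaml
extensions:
  # Watches the local host for listening endpoints.
  host_observer:

service:
  # The observer must be listed here as well, or it never starts.
  extensions: [health_check, pprof, zpages, host_observer]
  pipelines:
    metrics:
      receivers: [receiver_creator]
      processors: [batch]
      exporters: [debug]
```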

Configuration check-in

We have now covered receivers, so let's check our configuration changes.

Check-in: Review your configuration

# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.
# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      # CPU utilization metrics
      cpu:
      # Disk I/O metrics
      disk:
      # File System utilization metrics
      filesystem:
      # Memory utilization metrics
      memory:
      # Network interface I/O metrics & TCP connection metrics
      network:
      # CPU load metrics
      load:
      # Paging/Swap space utilization and I/O metrics
      paging:
      # Process count metrics
      processes:
      # Per process CPU, Memory and Disk I/O metrics. Disabled by default.
      # process:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  opencensus:
    endpoint: 0.0.0.0:55678

  # Collect own metrics
  prometheus/internal:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_binary:
        endpoint: 0.0.0.0:6832
      thrift_compact:
        endpoint: 0.0.0.0:6831
      thrift_http:
        endpoint: 0.0.0.0:14268

  zipkin:
    endpoint: 0.0.0.0:9411

processors:
  batch:

exporters:
  debug:
    verbosity: detailed

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [otlp, opencensus, prometheus]
      processors: [batch]
      exporters: [debug]

    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

  extensions: [health_check, pprof, zpages]

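Now that we have reviewed how data gets into the OpenTelemetry Collector through receivers, let's look at how the Collector processes the data it receives.

Warning

Because /etc/otelcol-contrib/config.yaml is not yet complete (for example, the metrics pipeline still lists the old prometheus receiver name), do not restart the Collector at this point.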

OpenTelemetry Collector Processors

Processors run on data between the point where it is received and the point where it is exported. Processors are optional, though some are recommended. The OpenTelemetry contrib Collector includes a large number of processors.

%%{
  init:{
    "theme":"base",
    "themeVariables": {
      "primaryColor": "#ffffff",
      "clusterBkg": "#eff2fb",
      "defaultLinkColor": "#333333"
    }
  }
}%%

flowchart LR;
    style Processors fill:#e20082,stroke:#333,stroke-width:4px,color:#fff
    subgraph Collector
    A[OTLP] --> M(Receivers)
    B[JAEGER] --> M(Receivers)
    C[Prometheus] --> M(Receivers)
    end
    subgraph Processors
    M(Receivers) --> H(Filters, Attributes, etc)
    E(Extensions)
    end
    subgraph Exporters
    H(Filters, Attributes, etc) --> S(OTLP)
    H(Filters, Attributes, etc) --> T(JAEGER)
    H(Filters, Attributes, etc) --> U(Prometheus)
    end

Batch Processor

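By default, only the batch processor is enabled. This processor batches data before it is exported, which helps reduce the number of network calls made to the exporter. For this workshop, we inherit the following defaults, which are hard-coded into the Collector:

send_batch_size (default = 8192): Number of spans, metric data points, or log records after which a batch is sent regardless of the timeout. send_batch_size acts as a trigger and does not affect the size of the batch. If you need to enforce a limit on the batch size sent to the next component in the pipeline, see send_batch_max_size.
timeout (default = 200ms): Time after which a batch is sent regardless of its size. If set to zero, send_batch_size is ignored, because data is sent immediately, subject only to send_batch_max_size.
send_batch_max_size (default = 0): The upper limit of the batch size. 0 means there is no upper limit. This property ensures that larger batches are split into smaller units. It must be greater than or equal to send_batch_size.

For more information on the batch processor, see the Batch Processor documentation.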
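
To make the three settings concrete, the sketch below writes the defaults out explicitly and adds a second, hypothetical instance (batch/capped, with illustrative values) that also limits the size of each outgoing batch. Neither is required for this workshop, which keeps the bare batch entry.

```yaml
processors:
  # Equivalent to the bare "batch:" entry: the defaults written out.
  batch:
    send_batch_size: 8192
    timeout: 200ms
    send_batch_max_size: 0

  # Hypothetical variant that also caps the size of each outgoing batch.
  batch/capped:
    send_batch_size: 1000
    timeout: 5s
    send_batch_max_size: 2000   # must be >= send_batch_size
```

With batch/capped, a batch is flushed as soon as 1000 items have accumulated or 5 seconds have passed, whichever comes first, and anything larger than 2000 items is split before it moves on.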

Resource Detection Processor

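The resourcedetection processor can be used to detect resource information from the host and append or override the resource value in telemetry data with this information.

By default, the hostname is set to the FQDN if possible. Otherwise, the hostname provided by the OS is used as a fallback. This logic can be changed with the hostname_sources configuration option. To avoid looking up the FQDN and use the hostname provided by the OS, set hostname_sources to os.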

processors:
  batch:
  resourcedetection/system:
    detectors: [system]
    system:
      hostname_sources: [os]

If the workshop instance is running on an AWS/EC2 instance, we can gather the following tags from the EC2 metadata API (this is not available on other platforms).

cloud.provider ("aws")
cloud.platform ("aws_ec2")
cloud.account.id
cloud.region
cloud.availability_zone
host.id
host.image.id
host.name
host.type

We will create another processor to append these tags to our metrics.

processors:
  batch:
  resourcedetection/system:
    detectors: [system]
    system:
      hostname_sources: [os]
  resourcedetection/ec2:
    detectors: [ec2]

Attributes Processor

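The attributes processor modifies attributes of a span, log, or metric. It also supports filtering and matching input data to determine whether it should be included in or excluded from the specified actions.

It takes a list of actions, which are performed in the order specified in the configuration. The supported actions are:

insert: Inserts a new attribute in input data where the key does not already exist.
update: Updates an attribute in input data where the key does exist.
upsert: Performs an insert or an update. Inserts a new attribute in input data where the key does not already exist, and updates an attribute in input data where the key does exist.
delete: Deletes an attribute from the input data.
hash: Hashes (SHA1) an existing attribute value.
extract: Extracts values using a regular expression rule from the input key to the target keys specified in the rule. If a target key already exists, it is overwritten.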
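
As an illustration of how several of these actions look side by side, here is a sketch of a single processor instance. The processor name, attribute keys, and values are all hypothetical, and the regular expression is only an example of the named capture groups that extract relies on.

```yaml
processors:
  attributes/example:          # hypothetical processor name
    actions:
      # Add the attribute only when it is not already present.
      - key: deployment.environment
        value: workshop
        action: insert
      # Overwrite the attribute only when it already exists.
      - key: service.tier
        value: gold
        action: update
      # Replace the value with its hash.
      - key: user.email
        action: hash
      # Drop the attribute entirely.
      - key: http.request.header.authorization
        action: delete
      # Copy named capture groups from http.url into new attributes.
      - key: http.url
        pattern: '^(?P<http_protocol>.*)://(?P<http_domain>[^/]*)/(?P<http_path>.*)'
        action: extract
```

Because actions run in the order listed, placing hash or delete before an action that reads the same key changes what that later action sees.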

We will create an attributes processor that inserts a new attribute called participant.name into all our host metrics, with your name as the value, e.g. marge_simpson.

Warning

Make sure you replace INSERT_YOUR_NAME_HERE with your name, and do not use spaces in your name.

Later in the workshop, we will use this attribute to filter our metrics in Splunk Observability Cloud.

processors:
  batch:
  resourcedetection/system:
    detectors: [system]
    system:
      hostname_sources: [os]
  resourcedetection/ec2:
    detectors: [ec2]
  attributes/conf:
    actions:
      - key: participant.name
        action: insert
        value: "INSERT_YOUR_NAME_HERE"

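Ninja: Use connectors to gain internal insights

One of the most recent additions to the Collector is the notion of a connector, which lets you join the output of one pipeline to the input of another.

As an example of where this is beneficial, some services emit metrics based on the number of data points being exported, the number of logs containing an error status, or the amount of data being sent from one deployment environment. The count connector helps address this out of the box.

Why a connector instead of a processor?

A processor has to pass on the data it has processed, which limits the additional data it can produce and makes it hard to expose additional information. A connector does not have to emit the data it receives, which gives it the opportunity to create the insights we are after.

For example, a connector could count the number of logs, metrics, and traces that do not have the deployment environment attribute.

This is a very simple example that would let you break down data usage by deployment environment.

Considerations with connectors

A connector only accepts data exported from one pipeline and received by another. This means you may have to think about how you structure your Collector configuration in order to take advantage of it.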
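
The shape of such a configuration can be sketched with the count connector from the contrib distribution. This is an assumption-laden example: the metric name is hypothetical, the condition is an OTTL expression written from the connector's documented syntax and should be checked against the README for your Collector version, and the debug exporter stands in for a real backend.

```yaml
connectors:
  count:
    logs:
      # Hypothetical metric name; counts log records without the attribute.
      log.count.missing_environment:
        description: Log records with no deployment.environment resource attribute.
        conditions:
          - 'resource.attributes["deployment.environment"] == nil'

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [count]          # the connector acts as an exporter here
    metrics/count:
      receivers: [count]          # and as a receiver here
      exporters: [debug]
```

The connector appears in two pipelines at once, which is the structural consideration described above: the logs pipeline ends at the connector, and a separate metrics pipeline starts from it.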

Configuration check-in

That covers processors, so let's check our configuration changes.

Check-in: Review your configuration

# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.
# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      # CPU utilization metrics
      cpu:
      # Disk I/O metrics
      disk:
      # File System utilization metrics
      filesystem:
      # Memory utilization metrics
      memory:
      # Network interface I/O metrics & TCP connection metrics
      network:
      # CPU load metrics
      load:
      # Paging/Swap space utilization and I/O metrics
      paging:
      # Process count metrics
      processes:
      # Per process CPU, Memory and Disk I/O metrics. Disabled by default.
      # process:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  opencensus:
    endpoint: 0.0.0.0:55678

  # Collect own metrics
  prometheus/internal:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_binary:
        endpoint: 0.0.0.0:6832
      thrift_compact:
        endpoint: 0.0.0.0:6831
      thrift_http:
        endpoint: 0.0.0.0:14268

  zipkin:
    endpoint: 0.0.0.0:9411

processors:
  batch:
  resourcedetection/system:
    detectors: [system]
    system:
      hostname_sources: [os]
  resourcedetection/ec2:
    detectors: [ec2]
  attributes/conf:
    actions:
      - key: participant.name
        action: insert
        value: "INSERT_YOUR_NAME_HERE"

exporters:
  debug:
    verbosity: detailed

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [otlp, opencensus, prometheus]
      processors: [batch]
      exporters: [debug]

    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

  extensions: [health_check, pprof, zpages]

OpenTelemetry Collector Exporters

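An exporter, which can be push-based or pull-based, is how you send data to one or more backends or destinations. An exporter may support one or more data sources.

In this workshop we use the otlphttp exporter. The OpenTelemetry Protocol (OTLP) is a vendor-neutral, standardized protocol for transmitting telemetry data. An OTLP exporter sends data to a server that implements the OTLP protocol. OTLP exporters are available for both gRPC (otlp) and HTTP (otlphttp) transports.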

%%{
  init:{
    "theme":"base",
    "themeVariables": {
      "primaryColor": "#ffffff",
      "clusterBkg": "#eff2fb",
      "defaultLinkColor": "#333333"
    }
  }
}%%

flowchart LR;
    style Exporters fill:#e20082,stroke:#333,stroke-width:4px,color:#fff
    subgraph Collector
    A[OTLP] --> M(Receivers)
    B[JAEGER] --> M(Receivers)
    C[Prometheus] --> M(Receivers)
    end
    subgraph Processors
    M(Receivers) --> H(Filters, Attributes, etc)
    E(Extensions)
    end
    subgraph Exporters
    H(Filters, Attributes, etc) --> S(OTLP)
    H(Filters, Attributes, etc) --> T(JAEGER)
    H(Filters, Attributes, etc) --> U(Prometheus)
    end

OTLP HTTP Exporter

To send metrics to Splunk Observability Cloud over HTTP, we need to configure the otlphttp exporter.

Edit the /etc/otelcol-contrib/config.yaml file and configure the otlphttp exporter. Insert the following YAML under the exporters section, indenting by two spaces.

We will also lower the verbosity of the debug exporter so that the disk does not fill up. The detailed setting currently in the configuration is very verbose.

exporters:
  debug:
    verbosity: normal
  otlphttp/splunk:

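Next, we need to define the metrics_endpoint and set the target URL.

Note

If you are attending a Splunk-hosted workshop, the instance you are using already has the Realm environment variable set. We reference that environment variable in the configuration file. Otherwise, you need to create a new environment variable and set the Realm, for example:

export REALM="us1"

The URL to use is https://ingest.${env:REALM}.signalfx.com/v2/datapoint/otlp. (Splunk has Realms in key geographic locations around the world for data residency.)

The otlphttp exporter can also be configured to send traces and logs by defining target URLs for traces_endpoint and logs_endpoint. Those settings are outside the scope of this workshop.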

exporters:
  debug:
    verbosity: normal
  otlphttp/splunk:
    metrics_endpoint: https://ingest.${env:REALM}.signalfx.com/v2/datapoint/otlp
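
To make the `${env:REALM}` reference more concrete, here is a minimal Go sketch of how such a placeholder could be resolved. It is not the Collector's implementation: `expandEnv` and the `lookup` callback are hypothetical names used only for illustration, and the sketch assumes placeholders always have the form `${env:NAME}`.

```go
package main

import (
	"fmt"
	"regexp"
)

// envRef matches placeholders of the form ${env:NAME}.
var envRef = regexp.MustCompile(`\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}`)

// expandEnv replaces every ${env:NAME} placeholder in s using lookup.
// Placeholders whose variable is not set are left untouched so that the
// problem stays visible. Illustrative helper only, not Collector code.
func expandEnv(s string, lookup func(string) (string, bool)) string {
	return envRef.ReplaceAllStringFunc(s, func(m string) string {
		name := envRef.FindStringSubmatch(m)[1]
		if v, ok := lookup(name); ok {
			return v
		}
		return m
	})
}

func main() {
	// A fixed map stands in for the process environment.
	env := map[string]string{"REALM": "us1"}
	lookup := func(k string) (string, bool) { v, ok := env[k]; return v, ok }
	fmt.Println(expandEnv("https://ingest.${env:REALM}.signalfx.com/v2/datapoint/otlp", lookup))
}
```

In a real program, `os.LookupEnv` could be passed as the `lookup` argument to read from the actual environment.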

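By default, gzip compression is enabled for all endpoints. It can be disabled by setting compression: none in the exporter configuration. In this workshop we leave compression enabled and use the default, since it is the most efficient way to send data.

To send metrics to Splunk Observability Cloud, you need an access token. You can create a new token in the Splunk Observability Cloud UI. For details on how to create a token, see Create a token. The token must be of type INGEST.

Note

If you are attending a Splunk-hosted workshop, the instance you are using already has the access token configured as an environment variable. We reference that environment variable in the configuration file. Otherwise, you need to create a new token and set it as an environment variable, for example:

export ACCESS_TOKEN=<replace-with-your-token>

The token is defined in the configuration file by inserting X-SF-TOKEN: ${env:ACCESS_TOKEN} under a headers: section: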

exporters:
  debug:
    verbosity: normal
  otlphttp/splunk:
    metrics_endpoint: https://ingest.${env:REALM}.signalfx.com/v2/datapoint/otlp
    headers:
      X-SF-TOKEN: ${env:ACCESS_TOKEN}
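
As a rough illustration of what the headers setting means on the wire, the following Go sketch builds (but does not send) an HTTP request that carries the token. `newIngestRequest` is a hypothetical helper, the payload is a placeholder rather than real OTLP data, and the protobuf content type is an assumption based on OTLP/HTTP's default encoding.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// newIngestRequest builds an HTTP POST carrying the access token in the
// X-SF-Token header. The request is only constructed, never sent.
func newIngestRequest(realm, token string, payload []byte) (*http.Request, error) {
	url := fmt.Sprintf("https://ingest.%s.signalfx.com/v2/datapoint/otlp", realm)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-SF-Token", token)
	// Assumption: binary protobuf, the default OTLP/HTTP encoding.
	req.Header.Set("Content-Type", "application/x-protobuf")
	return req, nil
}

func main() {
	req, err := newIngestRequest("us1", "example-token", []byte("placeholder"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Host)
	// HTTP header names are case-insensitive, so this lookup succeeds.
	fmt.Println(req.Header.Get("x-sf-token"))
}
```

Because header names are case-insensitive, X-SF-TOKEN and X-SF-Token refer to the same header.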

Configuration check-in

Now that we have covered exporters, let's review the configuration changes.

Check-in: Review your configuration

# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.
# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      # CPU utilization metrics
      cpu:
      # Disk I/O metrics
      disk:
      # File System utilization metrics
      filesystem:
      # Memory utilization metrics
      memory:
      # Network interface I/O metrics & TCP connection metrics
      network:
      # CPU load metrics
      load:
      # Paging/Swap space utilization and I/O metrics
      paging:
      # Process count metrics
      processes:
      # Per process CPU, Memory and Disk I/O metrics. Disabled by default.
      # process:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  opencensus:
    endpoint: 0.0.0.0:55678

  # Collect own metrics
  prometheus/internal:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_binary:
        endpoint: 0.0.0.0:6832
      thrift_compact:
        endpoint: 0.0.0.0:6831
      thrift_http:
        endpoint: 0.0.0.0:14268

  zipkin:
    endpoint: 0.0.0.0:9411

processors:
  batch:
  resourcedetection/system:
    detectors: [system]
    system:
      hostname_sources: [os]
  resourcedetection/ec2:
    detectors: [ec2]
  attributes/conf:
    actions:
      - key: participant.name
        action: insert
        value: "INSERT_YOUR_NAME_HERE"

exporters:
  debug:
    verbosity: normal
  otlphttp/splunk:
    metrics_endpoint: https://ingest.${env:REALM}.signalfx.com/v2/datapoint/otlp
    headers:
      X-SF-Token: ${env:ACCESS_TOKEN}

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [otlp, opencensus, prometheus]
      processors: [batch]
      exporters: [debug]

    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

  extensions: [health_check, pprof, zpages]

Of course, you can easily point metrics_endpoint at any other solution that supports the OTLP protocol.

Next, we need to enable the receivers, processors, and exporters we have configured in the service section of config.yaml.

OpenTelemetry Collector Service

The service section configures which components are enabled in the Collector, based on the settings defined in the receivers, processors, exporters, and extensions sections.

Info

If a component is configured but not referenced in the service section, it is not enabled.
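
The rule above can be sketched as a small consistency check. The Go code below is illustrative only: `checkPipeline` is a hypothetical helper, not part of the Collector, and the component lists are copied by hand from the receivers defined and referenced in the configuration reviewed earlier.

```go
package main

import (
	"fmt"
	"sort"
)

// checkPipeline compares the component IDs defined in a config section
// with the IDs referenced by service pipelines. It returns the IDs that
// are defined but never referenced (inactive) and the IDs that are
// referenced but never defined (a configuration error).
func checkPipeline(defined, referenced []string) (inactive, undefined []string) {
	def := make(map[string]bool, len(defined))
	for _, id := range defined {
		def[id] = true
	}
	ref := make(map[string]bool, len(referenced))
	for _, id := range referenced {
		if ref[id] {
			continue // already handled this ID
		}
		ref[id] = true
		if !def[id] {
			undefined = append(undefined, id)
		}
	}
	for _, id := range defined {
		if !ref[id] {
			inactive = append(inactive, id)
		}
	}
	sort.Strings(inactive)
	sort.Strings(undefined)
	return inactive, undefined
}

func main() {
	defined := []string{"hostmetrics", "otlp", "opencensus", "prometheus/internal", "jaeger", "zipkin"}
	// Receivers referenced across the traces, metrics, and logs pipelines.
	referenced := []string{"otlp", "opencensus", "jaeger", "zipkin", "prometheus"}
	inactive, undefined := checkPipeline(defined, referenced)
	fmt.Println("inactive:", inactive)
	fmt.Println("undefined:", undefined)
}
```

With these inputs, hostmetrics and prometheus/internal are reported as inactive and prometheus as undefined, which is exactly what the following steps in the service section correct.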

The service section consists of three subsections:

extensions
pipelines
telemetry

In the default configuration, the extensions subsection enables health_check, pprof, and zpages, which we configured earlier in the Extensions module.

service:
  extensions: [health_check, pprof, zpages]

Now let's configure the metrics pipeline!

Hostmetrics Receiver

Recall that in the Receivers section of the workshop we defined the Host Metrics Receiver, which generates metrics about the host system scraped from a variety of sources. To enable this receiver, the hostmetrics receiver must be included in the metrics pipeline.

In the metrics pipeline, add hostmetrics to the receivers list.

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [hostmetrics, otlp, opencensus, prometheus]
      processors: [batch]
      exporters: [debug]

Prometheus Internal Receiver

Earlier in the workshop, we renamed the prometheus receiver to prometheus/internal to reflect that it collects the Collector's own internal metrics.

We now need to enable the prometheus/internal receiver in the metrics pipeline. Update the receivers list of the metrics pipeline to include prometheus/internal:

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [hostmetrics, otlp, opencensus, prometheus/internal]
      processors: [batch]
      exporters: [debug]
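
Component IDs such as prometheus/internal and otlphttp/splunk follow a type/name convention: the part before the slash selects the component type and the optional part after it names the instance. A minimal Go sketch of that split, using a hypothetical `splitComponentID` helper that is not the Collector's own parser:

```go
package main

import (
	"fmt"
	"strings"
)

// splitComponentID splits an ID such as "prometheus/internal" into its
// component type ("prometheus") and optional instance name ("internal").
// An ID without a slash has an empty name.
func splitComponentID(id string) (typ, name string) {
	typ, name, _ = strings.Cut(id, "/")
	return strings.TrimSpace(typ), strings.TrimSpace(name)
}

func main() {
	for _, id := range []string{"prometheus/internal", "otlphttp/splunk", "otlp"} {
		typ, name := splitComponentID(id)
		fmt.Printf("type=%q name=%q\n", typ, name)
	}
}
```

This is why the pipeline must reference the full ID: prometheus and prometheus/internal are different instances of the same type.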

Resource Detection Processor

We also added the resourcedetection/system and resourcedetection/ec2 processors so that the Collector can capture the instance hostname and AWS/EC2 metadata. We now need to enable these two processors in the metrics pipeline.

Update the processors list of the metrics pipeline to include resourcedetection/system and resourcedetection/ec2:

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [hostmetrics, otlp, opencensus, prometheus/internal]
      processors: [batch, resourcedetection/system, resourcedetection/ec2]
      exporters: [debug]

Attributes Processor

In the Processors section of this workshop we also added the attributes/conf processor, so that the Collector inserts a new attribute named participant.name into all metrics. We now need to enable it in the metrics pipeline.

Update the processors list of the metrics pipeline to include attributes/conf:

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [hostmetrics, otlp, opencensus, prometheus/internal]
      processors: [batch, resourcedetection/system, resourcedetection/ec2, attributes/conf]
      exporters: [debug]

OTLP HTTP Exporter

In the Exporters section of the workshop, we configured the otlphttp exporter to send metrics to Splunk Observability Cloud. We now need to enable it in the metrics pipeline.

Update the exporters list of the metrics pipeline to include otlphttp/splunk:

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [hostmetrics, otlp, opencensus, prometheus/internal]
      processors: [batch, resourcedetection/system, resourcedetection/ec2, attributes/conf]
      exporters: [debug, otlphttp/splunk]

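Ninja: Observing the Collector's internals

The Collector captures internal signals about its own behavior, including additional signals from running components. This is because components that make decisions about the flow of data need a way to expose that information as metrics or traces.

Why monitor the Collector?

This is something of a chicken-and-egg problem ("Who watches the watchers?"), but being able to expose this information is important. An interesting aspect of the Collector's history is that it existed before the Go metrics SDK was considered stable, so for the time being the Collector exposes a Prometheus endpoint to provide this functionality.

Considerations

Monitoring the internal usage of every Collector running in your organization can generate a large number of new Metric Time Series (MTS). The Splunk distribution curates these metrics, which helps you forecast the expected increase.

Ninja Zone

To expose the Collector's internal observability, a few additional settings can be adjusted: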

service:
  telemetry:
    logs:
      level: <info|warn|error>
      development: <true|false>
      encoding: <console|json>
      disable_caller: <true|false>
      disable_stacktrace: <true|false>
      output_paths: [<stdout|stderr>, paths...]
      error_output_paths: [<stdout|stderr>, paths...]
      initial_fields:
        key: value
    metrics:
      level: <none|basic|normal|detailed>
      # Address binds the Prometheus endpoint to scrape
      address: <hostname:port>

service:
  telemetry:
    logs:
      level: info
      encoding: json
      disable_stacktrace: true
      initial_fields:
        instance.name: ${env:INSTANCE}
    metrics:
      address: localhost:8888

References

https://opentelemetry.io/docs/collector/configuration/#service

Final configuration

Check-in: Review your final configuration

# To limit exposure to denial of service attacks, change the host in endpoints below from 0.0.0.0 to a specific network interface.
# See https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      # CPU utilization metrics
      cpu:
      # Disk I/O metrics
      disk:
      # File System utilization metrics
      filesystem:
      # Memory utilization metrics
      memory:
      # Network interface I/O metrics & TCP connection metrics
      network:
      # CPU load metrics
      load:
      # Paging/Swap space utilization and I/O metrics
      paging:
      # Process count metrics
      processes:
      # Per process CPU, Memory and Disk I/O metrics. Disabled by default.
      # process:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  opencensus:
    endpoint: 0.0.0.0:55678

  # Collect own metrics
  prometheus/internal:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_binary:
        endpoint: 0.0.0.0:6832
      thrift_compact:
        endpoint: 0.0.0.0:6831
      thrift_http:
        endpoint: 0.0.0.0:14268

  zipkin:
    endpoint: 0.0.0.0:9411

processors:
  batch:
  resourcedetection/system:
    detectors: [system]
    system:
      hostname_sources: [os]
  resourcedetection/ec2:
    detectors: [ec2]
  attributes/conf:
    actions:
      - key: participant.name
        action: insert
        value: "INSERT_YOUR_NAME_HERE"

exporters:
  debug:
    verbosity: normal
  otlphttp/splunk:
    metrics_endpoint: https://ingest.${env:REALM}.signalfx.com/v2/datapoint/otlp
    headers:
      X-SF-Token: ${env:ACCESS_TOKEN}

service:

  pipelines:

    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [debug]

    metrics:
      receivers: [hostmetrics, otlp, opencensus, prometheus/internal]
      processors: [batch, resourcedetection/system, resourcedetection/ec2, attributes/conf]
      exporters: [debug, otlphttp/splunk]

    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

  extensions: [health_check, pprof, zpages]

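Tip

We recommend validating the configuration file before restarting the Collector. You can do this by pasting the contents of config.yaml into otelbin.io.

[Screenshot: OTelBin]

Now that we have a working configuration, let's start the Collector and see what zPages reports.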

otelcol-contrib --config=file:/etc/otelcol-contrib/config.yaml

Open zPages in your browser: http://localhost:55679/debug/pipelinez (change localhost to match your environment).

Data visualization

Splunk Observability Cloud

Now that the OpenTelemetry Collector is configured to send metrics to Splunk Observability Cloud, let's look at the data there. If you have not received an invitation to Splunk Observability Cloud, your instructor will provide login credentials.

Before that, to make things a little more interesting, let's run a stress test on the instance. This will bring the dashboards to life.

sudo apt install stress
while true; do stress -c 2 -t 40; stress -d 5 -t 40; stress -m 20 -t 40; done

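Once you have logged in to Splunk Observability Cloud, use the left-hand navigation to go to Dashboards from the main menu. This takes you to the Teams view. At the top of this view, click All Dashboards.

In the search box, search for OTel Contrib.

Info

If the dashboard does not exist, your instructor can add it quickly. If you are not attending a Splunk-hosted workshop, the dashboard group to import can be found at the bottom of this page.

Click the OTel Contrib Dashboard to open it, then click the Participant Name box at the top of the dashboard and either select the name you set for participant.name in config.yaml from the drop-down list or start typing the name to search for it.

You can now see the host metrics for the host on which you configured the OpenTelemetry Collector.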

Download Dashboard Group JSON for importing

OpenTelemetry-Contrib-Dashboard-Group.json (40 KB)

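OpenTelemetry Collector Development

Developing a custom component

Building a component for the OpenTelemetry Collector requires three key parts:

Configuration - the values exposed to the user for configuration
Factory - creates the component using the provided values
Business Logic - what the component needs to do

As an example, we will build a component that works with Jenkins so that we can track important DevOps metrics for our project.

The metrics we want to measure are:

Lead time for changes - "How long it takes for a commit to get into production"
Change failure rate - "The percentage of deployments causing a failure in production"
Deployment frequency - "How often a [team] successfully releases to production"
Mean time to recover - "How long it takes for a [team] to recover from a failure in production"

These metrics were identified by Google's DevOps Research and Assessment (DORA)[^1] team as indicators of a software development team's performance. We chose Jenkins CI because staying within the same open source software ecosystem lets this component serve as an example for vendor-managed CI tools to adopt in the future.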

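Instrumentation vs. component

There is something to consider when improving the level of observability within your organization, because some trade-offs are involved.

	Pros	Cons
(Auto) instrumentation	No need to watch an external API in order to monitor the system.	Changing the instrumentation requires changes to the project.
	Gives system owners/developers the power to change their observability.	Requires additional runtime dependencies.
	Understands the system context and can correlate captured data with Exemplars.	Can impact the performance of the system.
Component	Changes to data names or semantics can be rolled out independently of the system's release cycle.	Breaking API changes require a coordinated release between the system and the Collector.
	Updating or extending the collected data is a seamless change for users.	The semantics of the captured data can break unexpectedly in ways that do not align with a new system release.
	Does not require the supporting teams to have a deep understanding of observability practices.	Only information exposed externally by the system can be surfaced.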

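Project setup (Ninja)

Note

The time it takes to complete this section of the workshop varies depending on your experience.

If you get stuck or want to follow along with the instructor, a complete solution is available here.

To start developing the new Jenkins CI receiver, we first need to set up a Golang project. The steps to create a new Golang project are:

Create a new directory named ${HOME}/go/src/jenkinscireceiver and change into it
- The exact directory name and location are not strict; you can choose your own development directory.
Run go mod init splunk.conf/workshop/example/jenkinscireceiver to initialize the Go module
- This creates a file named go.mod, which is used to track direct and indirect dependencies.
- Eventually a go.sum file is generated, which holds the checksums of the imported dependencies.

Check-in: Review your go.mod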

module splunk.conf/workshop/example/jenkinscireceiver

go 1.20

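Building the configuration

The configuration part of a component is how the user provides input to it. The values used for configuration should therefore:

Be intuitive, so that users understand what each field controls
Make it explicit what is required and what is optional
Reuse common names and fields
Keep the options simple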

---
jenkins_server_addr: hostname
jenkins_server_api_port: 8089
interval: 10m
filter_builds_by:
    - name: my-awesome-build
      status: amber
track:
    values:
        example.metric.1: yes
        example.metric.2: yes
        example.metric.3: no
        example.metric.4: no

---
# Required Values
endpoint: http://my-jenkins-server:8089
auth:
    authenticator: basicauth/jenkins
# Optional Values
collection_interval: 10m
metrics:
    example.metric.1:
        enabled: true
    example.metric.2:
        enabled: true
    example.metric.3:
        enabled: true
    example.metric.4:
        enabled: true

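The first example above (the bad configuration) shows how doing the opposite of the configuration best practices hurts the usability of a component. It is not clear what the field values should be, it includes features that could be pushed to existing processors, and its field names are inconsistent with other components in the Collector.

The second example (the good configuration) keeps the required values simple, reuses field names from other components, and ensures the component focuses only on the interaction between Jenkins and the Collector.

The code below shows how much we need to add ourselves and how much is already provided by shared libraries within the Collector. These are explained in more detail when we get to the business logic. The configuration starts out small and changes as the business logic grows to include additional features.

Write the code

To implement the code needed for the configuration, create a new file named config.go with the following content: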

package jenkinscireceiver

import (
    "go.opentelemetry.io/collector/config/confighttp"
    "go.opentelemetry.io/collector/receiver/scraperhelper"

    "splunk.conf/workshop/example/jenkinscireceiver/internal/metadata"
)

type Config struct {
    // HTTPClientSettings contains all the values
    // that are commonly shared across all HTTP interactions
    // performed by the collector.
    confighttp.HTTPClientSettings `mapstructure:",squash"`
    // ScraperControllerSettings will allow us to schedule
    // how often to check for updates to builds.
    scraperhelper.ScraperControllerSettings `mapstructure:",squash"`
    // MetricsBuilderConfig contains all the metrics
    // that can be configured.
    metadata.MetricsBuilderConfig `mapstructure:",squash"`
}

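Component review

Let's recap the types of component to decide which one we need in order to capture metrics from Jenkins.

The business use cases an extension helps solve are:

Having shared functionality that requires runtime configuration
Indirectly helping with observing the runtime of the Collector

See the Extensions overview for more details.

The business use cases a receiver solves are:

Fetching data from a remote source
Receiving data from a remote source

These are commonly described as pull-based and push-based data collection; you can read more in the Receiver overview.

The business use cases a processor solves are:

Adding or removing data, fields, or values
Observing the data and making decisions on it
Buffering, queueing, and reordering

Keep in mind that the data type flowing through a processor must be forwarded to downstream components as the same data type. See the Processor overview for more details.

The business use case an exporter solves is:

Sending data to a tool, service, or storage

The OpenTelemetry Collector does not want to be a "backend", that is, an all-in-one observability suite. Instead it keeps to the principle that founded OpenTelemetry: vendor-agnostic observability for all. To revisit the details, read the Exporter overview.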

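The connector is a component type that was skipped in this workshop because it is a relatively new addition to the Collector. The best way to think of a connector is as a processor that can be used across different telemetry types and pipelines. This means a connector can accept data as logs and output metrics, or accept metrics from one pipeline and provide metrics about the data it has observed.

The business cases a connector solves are:

Converting between different telemetry types
- logs to metrics
- traces to metrics
- metrics to logs
Observing incoming data and producing its own data
- accepting metrics and generating analytical metrics about the data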
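
The logs-to-metrics case can be pictured with a very small Go sketch. This is a conceptual illustration only: `logRecord` and `countBySeverity` are hypothetical names, and a real connector is built on the Collector's connector and pdata packages rather than plain structs and maps.

```go
package main

import "fmt"

// logRecord is a hypothetical, minimal log entry.
type logRecord struct {
	Severity string
	Body     string
}

// countBySeverity consumes log records and produces a metric-like count
// per severity, which is the essence of a logs-to-metrics connector:
// one telemetry type goes in, a different one comes out.
func countBySeverity(logs []logRecord) map[string]int64 {
	counts := make(map[string]int64)
	for _, l := range logs {
		counts[l.Severity]++
	}
	return counts
}

func main() {
	logs := []logRecord{
		{Severity: "ERROR", Body: "build failed"},
		{Severity: "INFO", Body: "build started"},
		{Severity: "ERROR", Body: "agent offline"},
	}
	counts := countBySeverity(logs)
	fmt.Println("ERROR:", counts["ERROR"], "INFO:", counts["INFO"])
}
```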

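There was a brief overview in the Ninja section of the Processor overview. Check the project for updates on new connector components.

From this overview of the components, it is clear that we will be developing a pull-based receiver for Jenkins.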

Designing the metrics

To define and export the metrics captured by our receiver, we will use mdatagen, a tool developed for the Collector that turns metrics defined in YAML into code.

---
# Type defines the name to reference the component
# in the configuration file
type: jenkins

# Status defines the component type and the stability level
status:
  class: receiver
  stability:
    development: [metrics]

# Attributes are the expected fields reported
# with the exported values.
attributes:
  job.name:
    description: The name of the associated Jenkins job
    type: string
  job.status:
    description: Shows if the job had passed, or failed
    type: string
    enum:
    - failed
    - success
    - unknown

# Metrics defines all the potentially exported values from this receiver.
metrics:
  jenkins.jobs.count:
    enabled: true
    description: Provides a count of the total number of configured jobs
    unit: "{Count}"
    gauge:
      value_type: int
  jenkins.job.duration:
    enabled: true
    description: Show the duration of the job
    unit: "s"
    gauge:
      value_type: int
    attributes:
    - job.name
    - job.status
  jenkins.job.commit_delta:
    enabled: true
    description: The calculation difference of the time job was finished minus commit timestamp
    unit: "s"
    gauge:
      value_type: int
    attributes:
    - job.name
    - job.status

// To generate the additional code needed to capture metrics,
// the following command to be run from the shell:
//  go generate -x ./...

//go:generate go run github.com/open-telemetry/opentelemetry-collector-contrib/cmd/mdatagen@v0.80.0 metadata.yaml
package jenkinscireceiver

// There is no code defined within this file.

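Create these files (metadata.yaml and the Go file containing the go:generate directive) inside the project folder before continuing to the next section.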

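Building the factory

The factory is a software design pattern that allows an object (in this case a jenkinscireceiver) to be created dynamically from the provided configuration. To use a more real-world example, it is like going to a phone shop, asking for a phone that exactly matches your description, and being handed one.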
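
The phone shop analogy can be written down as a tiny, standard-library-only Go sketch. The `config`, `phone`, and `factory` types are hypothetical and only mirror the shape of the pattern (a default configuration plus a constructor); they are not the Collector's receiver.Factory API.

```go
package main

import (
	"errors"
	"fmt"
)

// config describes what the caller is asking for.
type config struct {
	Model     string
	StorageGB int
}

// phone is the object the factory produces.
type phone struct{ cfg config }

// factory bundles a default configuration with a constructor, which is
// roughly the role a component factory plays.
type factory struct {
	defaultConfig func() config
	create        func(config) (*phone, error)
}

func newPhoneFactory() factory {
	return factory{
		defaultConfig: func() config { return config{Model: "basic", StorageGB: 64} },
		create: func(c config) (*phone, error) {
			if c.StorageGB <= 0 {
				return nil, errors.New("storage must be positive")
			}
			return &phone{cfg: c}, nil
		},
	}
}

func main() {
	f := newPhoneFactory()
	cfg := f.defaultConfig() // start from the defaults
	cfg.StorageGB = 128      // override only what the user asked for
	p, err := f.create(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %dGB\n", p.cfg.Model, p.cfg.StorageGB)
}
```

The same split appears in the receiver code that follows: one function returns the default configuration and another builds the component from whatever configuration it is given.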

Running the go generate -x ./... command creates a new folder, jenkinscireceiver/internal/metadata, containing all the code needed to export the defined metrics. The code we need to write ourselves is as follows:

package jenkinscireceiver

import (
    "context"
    "errors"

    "go.opentelemetry.io/collector/component"
    "go.opentelemetry.io/collector/consumer"
    "go.opentelemetry.io/collector/receiver"
    "go.opentelemetry.io/collector/receiver/scraperhelper"

    "splunk.conf/workshop/example/jenkinscireceiver/internal/metadata"
)

func NewFactory() receiver.Factory {
    return receiver.NewFactory(
        metadata.Type,
        newDefaultConfig,
        receiver.WithMetrics(newMetricsReceiver, metadata.MetricsStability),
    )
}

func newMetricsReceiver(_ context.Context, set receiver.CreateSettings, cfg component.Config, consumer consumer.Metrics) (receiver.Metrics, error) {
    // Convert the configuration into the expected type
    conf, ok := cfg.(*Config)
    if !ok {
        return nil, errors.New("can not convert config")
    }
    sc, err := newScraper(conf, set)
    if err != nil {
        return nil, err
    }
    return scraperhelper.NewScraperControllerReceiver(
        &conf.ScraperControllerSettings,
        set,
        consumer,
        scraperhelper.AddScraper(sc),
    )
}

package jenkinscireceiver

import (
    "go.opentelemetry.io/collector/component"
    "go.opentelemetry.io/collector/config/confighttp"
    "go.opentelemetry.io/collector/receiver/scraperhelper"

    "splunk.conf/workshop/example/jenkinscireceiver/internal/metadata"
)

type Config struct {
    // HTTPClientSettings contains all the values
    // that are commonly shared across all HTTP interactions
    // performed by the collector.
    confighttp.HTTPClientSettings `mapstructure:",squash"`
    // ScraperControllerSettings will allow us to schedule
    // how often to check for updates to builds.
    scraperhelper.ScraperControllerSettings `mapstructure:",squash"`
    // MetricsBuilderConfig contains all the metrics
    // that can be configured.
    metadata.MetricsBuilderConfig `mapstructure:",squash"`
}

func newDefaultConfig() component.Config {
    return &Config{
        ScraperControllerSettings: scraperhelper.NewDefaultScraperControllerSettings(metadata.Type),
        HTTPClientSettings:        confighttp.NewDefaultHTTPClientSettings(),
        MetricsBuilderConfig:      metadata.DefaultMetricsBuilderConfig(),
    }
}

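package jenkinscireceiver

import (
    "context"

    "go.opentelemetry.io/collector/pdata/pmetric"
    "go.opentelemetry.io/collector/receiver"
    "go.opentelemetry.io/collector/receiver/scraperhelper"

    "splunk.conf/workshop/example/jenkinscireceiver/internal/metadata"
)

type scraper struct{}

func newScraper(cfg *Config, set receiver.CreateSettings) (scraperhelper.Scraper, error) {
    // Create our scraper with our values
    s := scraper{
        // To be filled in later
    }
    return scraperhelper.NewScraper(metadata.Type, s.scrape)
}

func (scraper) scrape(ctx context.Context) (pmetric.Metrics, error) {
    // To be filled in
    return pmetric.NewMetrics(), nil
}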

---
dist:
  name: otelcol
  description: "Conf workshop collector"
  output_path: ./dist
  version: v0.0.0-experimental

extensions:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/basicauthextension v0.80.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/healthcheckextension v0.80.0

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.80.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/jaegerreceiver v0.80.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver v0.80.0
  - gomod: splunk.conf/workshop/example/jenkinscireceiver v0.0.0
    path: ./jenkinscireceiver

processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.80.0

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/loggingexporter v0.80.0
  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.80.0
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.80.0

# This replace is a go directive that allows for redefine
# where to fetch the code to use since the default would be from a remote project.
replaces:
- splunk.conf/workshop/example/jenkinscireceiver => ./jenkinscireceiver

├── build-config.yaml
└── jenkinscireceiver
    ├── go.mod
    ├── config.go
    ├── factory.go
    ├── scraper.go
    └── internal
      └── metadata

Once these files have been written into the project with the expected contents, run go mod tidy. This fetches all the remote dependencies, updates go.mod, and generates the go.sum file.

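Building the business logic

At this point we have a custom component that does nothing, so we need to add the logic required to capture data from Jenkins.

From here, the steps we need to take are:

Create a client that connects to Jenkins
Capture all the configured jobs
Report the status of the last build of each configured job
Calculate the time difference between the commit timestamp and job completion

The changes are made to scraper.go.

To connect to the Jenkins server, we use the "github.com/yosida95/golang-jenkins" package, which provides the functionality needed to read data from a Jenkins server.

We then use helper functions from the "go.opentelemetry.io/collector/receiver/scraperhelper" library to create a start function, so that we connect to the Jenkins server once the component has finished starting.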

package jenkinscireceiver

import (
    "context"

    jenkins "github.com/yosida95/golang-jenkins"
    "go.opentelemetry.io/collector/component"
    "go.opentelemetry.io/collector/pdata/pmetric"
    "go.opentelemetry.io/collector/receiver"
    "go.opentelemetry.io/collector/receiver/scraperhelper"

    "splunk.conf/workshop/example/jenkinscireceiver/internal/metadata"
)

type scraper struct {
    mb     *metadata.MetricsBuilder
    client *jenkins.Jenkins
}

func newScraper(cfg *Config, set receiver.CreateSettings) (scraperhelper.Scraper, error) {
    s := &scraper{
        mb: metadata.NewMetricsBuilder(cfg.MetricsBuilderConfig, set),
    }

    return scraperhelper.NewScraper(
        metadata.Type,
        s.scrape,
        scraperhelper.WithStart(func(ctx context.Context, h component.Host) error {
            client, err := cfg.ToClient(h, set.TelemetrySettings)
            if err != nil {
                return err
            }
            // The collector provides a means of injecting authentication
            // on our behalf, so this will ignore the libraries approach
            // and use the configured http client with authentication.
            s.client = jenkins.NewJenkins(nil, cfg.Endpoint)
            s.client.SetHTTPClient(client)
            return nil
        }),
    )
}

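// A pointer receiver is needed here: with a value receiver, s.scrape above
// would bind a copy of the scraper taken before the start function sets
// s.client, leaving the client nil at scrape time.
func (s *scraper) scrape(ctx context.Context) (pmetric.Metrics, error) {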
    // To be filled in
    return pmetric.NewMetrics(), nil
}

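That completes all the setup code needed to initialize the Jenkins receiver.

From here on, we focus on the scrape method, which has been waiting to be filled in. This method runs at the interval set in the configuration (every minute by default).

We want to capture the number of configured jobs so that we can see the growth of our Jenkins server and measure how many projects have been onboarded. To do this, we call the Jenkins client to list all jobs. If an error is reported, we return it without any metrics; otherwise we emit the data from the metrics builder.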

func (s *scraper) scrape(ctx context.Context) (pmetric.Metrics, error) {
    jobs, err := s.client.GetJobs()
    if err != nil {
        return pmetric.Metrics{}, err
    }

    // Recording the timestamp to ensure
    // all captured data points within this scrape have the same value.
    now := pcommon.NewTimestampFromTime(time.Now())

    // Casting to an int64 to match the expected type
    s.mb.RecordJenkinsJobsCountDataPoint(now, int64(len(jobs)))

    // To be filled in

    return s.mb.Emit(), nil
}

In the previous step we captured all the jobs and reported how many there are. In this step we examine each job and use the reported values to capture metrics.

func (s *scraper) scrape(ctx context.Context) (pmetric.Metrics, error) {
    jobs, err := s.client.GetJobs()
    if err != nil {
        return pmetric.Metrics{}, err
    }

    // Recording the timestamp to ensure
    // all captured data points within this scrape have the same value.
    now := pcommon.NewTimestampFromTime(time.Now())

    // Casting to an int64 to match the expected type
    s.mb.RecordJenkinsJobsCountDataPoint(now, int64(len(jobs)))

    for _, job := range jobs {
        // Ensure we have valid results to start off with
        var (
            build  = job.LastCompletedBuild
            status = metadata.AttributeJobStatusUnknown
        )

        // Map the build result onto the attribute values defined in
        // metadata.yaml (`success`, `failed`, and `unknown`). Any build
        // that did not finish with a success or failure is reported
        // as unknown.

        switch build.Result {
        case "aborted", "not_built", "unstable":
            status = metadata.AttributeJobStatusUnknown
        case "success":
            status = metadata.AttributeJobStatusSuccess
        case "failure":
            status = metadata.AttributeJobStatusFailed
        }

        s.mb.RecordJenkinsJobDurationDataPoint(
            now,
            int64(job.LastCompletedBuild.Duration),
            job.Name,
            status,
        )
    }

    return s.mb.Emit(), nil
}
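
The status mapping in the loop above can also be isolated into a small, testable function. The sketch below is a standard-library-only variant and makes one assumption that is not stated in the workshop: that the server may report results in a different letter case, so the input is normalized before matching. `jobStatus` is a hypothetical helper, and the returned strings mirror the enum values in metadata.yaml.

```go
package main

import (
	"fmt"
	"strings"
)

// jobStatus maps a raw build result string to one of the three attribute
// values defined in metadata.yaml: "success", "failed", or "unknown".
// Assumption: letter case and surrounding whitespace are not significant.
func jobStatus(result string) string {
	switch strings.ToLower(strings.TrimSpace(result)) {
	case "success":
		return "success"
	case "failure":
		return "failed"
	default:
		// aborted, not_built, unstable, or anything unexpected
		return "unknown"
	}
}

func main() {
	for _, r := range []string{"SUCCESS", "failure", "aborted", ""} {
		fmt.Printf("%q -> %s\n", r, jobStatus(r))
	}
}
```

Keeping the mapping in its own function makes it easy to cover with unit tests, which is one way to answer the question of how to verify that a change works.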

The final step is to calculate how long it took from the commit to job completion, which helps us infer the DORA metrics.

func (s *scraper) scrape(ctx context.Context) (pmetric.Metrics, error) {
    jobs, err := s.client.GetJobs()
    if err != nil {
        return pmetric.Metrics{}, err
    }

    // Recording the timestamp to ensure
    // all captured data points within this scrape have the same value.
    now := pcommon.NewTimestampFromTime(time.Now())

    // Casting to an int64 to match the expected type
    s.mb.RecordJenkinsJobsCountDataPoint(now, int64(len(jobs)))

    for _, job := range jobs {
        // Ensure we have valid results to start off with
        var (
            build  = job.LastCompletedBuild
            status = metadata.AttributeJobStatusUnknown
        )

        // Previous step here

        // Ensure that the `ChangeSet` has values
        // set so there is a valid value for us to reference
        if len(build.ChangeSet.Items) == 0 {
            continue
        }

        // Making the assumption that the first changeset
        // item is the most recent change.
        change := build.ChangeSet.Items[0]

        // Record the difference from the build time
        // compared against the change timestamp.
        s.mb.RecordJenkinsJobCommitDeltaDataPoint(
            now,
            int64(build.Timestamp-change.Timestamp),
            job.Name,
            status,
        )
    }

    return s.mb.Emit(), nil
}

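With all of these steps complete, you have built a custom Jenkins CI receiver!

Next steps

There are probably more features you would like from this component, for example:

Can you include the branch name that the job used?
Can you include the project name for the job?
How would you calculate the cumulative job duration for a project?
How would you validate that your changes work?

Use this time to play around, break things, change things, or even try capturing logs from the builds.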

Advanced OpenTelemetry Collector

75 minutes Authors Robert Castley, Charity Anderson, Pieter Hagen, & Geoff Higginbottom

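The goal of this workshop is to build your confidence in creating and modifying OpenTelemetry Collector configuration files. You start with minimal agent.yaml and gateway.yaml files and build them up step by step to handle several advanced, real-world scenarios.

A key focus of this workshop is learning how to configure the OpenTelemetry Collector to store telemetry data locally rather than sending it to a third-party vendor backend. This approach simplifies debugging and troubleshooting, and it is also well suited to test and development environments where you want to avoid sending data to production systems.

To get the most out of this workshop, you should have:

A basic understanding of the OpenTelemetry Collector and its configuration file structure
Proficiency in editing YAML files

Everything in this workshop is designed to run locally, providing a hands-on and accessible learning experience. Let's start building!

Workshop overview

This workshop covers the following topics:

Setting up the Agent and Gateway locally: Test that metrics, traces, and logs flow through the Agent to the Gateway.
Making the Agent more resilient: Apply the basic configuration for fault tolerance.
Configuring Processors:
- Filter out noise by dropping specific spans (for example, health checks).
- Remove unwanted tags and handle sensitive data.
- Transform data in the pipeline with OTTL (OpenTelemetry Transformation Language) before exporting.
Configuring Connectors:
- Route data to different endpoints based on the values received.

By the end of this workshop, you will be familiar with configuring the OpenTelemetry Collector for a variety of real-world use cases.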

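Prerequisites

5 minutes

Prerequisites

Proficiency in editing YAML files using vi, vim, nano, or your preferred text editor
Supported environments
- A provided Splunk Workshop instance (recommended). External access to port 2222 is required for ssh access.
- Apple Mac (Apple Silicon). jq must be installed - https://jqlang.org/download/

Exercise

Create a directory: In your environment, create a new directory and change into it: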

mkdir advanced-otel-workshop && \
cd advanced-otel-workshop

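For the rest of this workshop, this directory is referred to as [WORKSHOP].

Remove any existing OpenTelemetry Collectors

If you have completed the Splunk IM workshop, delete the Collector running in Kubernetes before continuing. You can remove it by running the following command:

helm delete splunk-otel-collector

In that case, the EC2 instance may also be running services that can interfere with this workshop, so run the following command to stop them if they exist:

kubectl delete -f ~/workshop/apm/deployment.yaml

Download the workshop binaries: Change into your [WORKSHOP] directory and download the OpenTelemetry Collector, the Load Generator binary, and the setup script. The first set of commands below is for Linux (amd64) instances; the second is for Apple Silicon Macs (darwin arm64):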

curl -L https://github.com/signalfx/splunk-otel-collector/releases/download/v0.136.0/otelcol_linux_amd64 -o otelcol && \
curl -L https://github.com/splunk/observability-workshop/raw/refs/heads/main/workshop/ninja/advanced-otel/loadgen/build/loadgen-linux-amd64 -o loadgen && \
curl -L https://github.com/splunk/observability-workshop/raw/refs/heads/main/workshop/ninja/advanced-otel/setup-workshop.sh -o setup-workshop.sh && \
chmod +x setup-workshop.sh

curl -L https://github.com/signalfx/splunk-otel-collector/releases/download/v0.136.0/otelcol_darwin_arm64 -o otelcol && \
curl -L https://github.com/splunk/observability-workshop/raw/refs/heads/main/workshop/ninja/advanced-otel/loadgen/build/loadgen-darwin-arm64 -o loadgen && \
curl -L https://github.com/splunk/observability-workshop/raw/refs/heads/main/workshop/ninja/advanced-otel/setup-workshop.sh -o setup-workshop.sh && \
chmod +x setup-workshop.sh

Run the setup-workshop.sh script. It sets the correct permissions and also creates the initial configurations for the Agent and the Gateway:

./setup-workshop.sh

███████╗██████╗ ██╗     ██╗   ██╗███╗   ██╗██╗  ██╗    ██╗
██╔════╝██╔══██╗██║     ██║   ██║████╗  ██║██║ ██╔╝    ╚██╗
███████╗██████╔╝██║     ██║   ██║██╔██╗ ██║█████╔╝      ╚██╗
╚════██║██╔═══╝ ██║     ██║   ██║██║╚██╗██║██╔═██╗      ██╔╝
███████║██║     ███████╗╚██████╔╝██║ ╚████║██║  ██╗    ██╔╝
╚══════╝╚═╝     ╚══════╝ ╚═════╝ ╚═╝  ╚═══╝╚═╝  ╚═╝    ╚═╝

Welcome to the Splunk Advanced OpenTelemetry Workshop!
======================================================

macOS detected. Removing quarantine attributes...
otelcol version v0.126.0
Usage: loadgen [OPTIONS]
Options:
  -base       Send base traces (enabled by default)
  -health     Send health traces
  -security   Send security traces
  -logs       Enable logging of random quotes to quotes.log
  -json       Output logs in JSON format (only applicable with -logs)
  -count      Number of traces or logs to send (default: infinite)
  -h, --help  Display this help message

Example:
  loadgen -health -security -count 10   Send 10 health and security traces
  loadgen -logs -json -count 5          Write 5 random quotes in JSON format to quotes.log
Creating workshop directories...
✓ Created subdirectories:
  ├── 1-agent-gateway
  ├── 2-building-resilience
  ├── 3-dropping-spans
  ├── 4-sensitive-data
  ├── 5-transform-data
  ├── 6-routing-data
  └── 7-sum-count

Creating configuration files for 1-agent-gateway...
Creating OpenTelemetry Collector agent configuration file: 1-agent-gateway/agent.yaml
✓ Configuration file created successfully: 1-agent-gateway/agent.yaml
✓ File size:     4355 bytes

Creating OpenTelemetry Collector gateway configuration file: 1-agent-gateway/gateway.yaml
✓ Configuration file created successfully: 1-agent-gateway/gateway.yaml
✓ File size:     3376 bytes

✓ Completed configuration files for 1-agent-gateway

Creating configuration files for 2-building-resilience...
Creating OpenTelemetry Collector agent configuration file: 2-building-resilience/agent.yaml
✓ Configuration file created successfully: 2-building-resilience/agent.yaml
✓ File size:     4355 bytes

Creating OpenTelemetry Collector gateway configuration file: 2-building-resilience/gateway.yaml
✓ Configuration file created successfully: 2-building-resilience/gateway.yaml
✓ File size:     3376 bytes

✓ Completed configuration files for 2-building-resilience

Workshop environment setup complete!
Configuration files created in the following directories:
  1-agent-gateway/
    ├── agent.yaml
    └── gateway.yaml
  2-building-resilience/
    ├── agent.yaml
    └── gateway.yaml

[WORKSHOP]
├── 1-agent-gateway
├── 2-building-resilience
├── 3-dropping-spans
├── 4-sensitive-data
├── 5-transform-data
├── 6-routing-data
├── 7-sum-count
├── loadgen
├── otelcol
└── setup-workshop.sh

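1. Reviewing the Agent Configuration

15 minutes

Welcome! In this section, we start with a fully functional OpenTelemetry setup that includes both an Agent and a Gateway.

We begin with a quick review of the configuration files to get familiar with their overall structure and to highlight the key sections that control the telemetry pipeline.

Tip

You will use multiple terminal windows throughout the workshop. To stay organized, give each terminal a unique name or color. This makes it easy to identify and switch between them during the exercises.

We refer to these terminals as Agent, Gateway, Loadgen, and Test.

Exercise

Create your first terminal window and name it Agent. Change into the directory for the first exercise, [WORKSHOP]/1-agent-gateway, and verify that the required files have been generated: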
```
cd 1-agent-gateway
ls -l
```
You should see the following files in the directory. If not, re-run the setup-workshop.sh script as described in the Prerequisites section:

.
├── agent.yaml
└── gateway.yaml

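Understanding the Agent Configuration

Let's review the key components of the agent.yaml file used in this workshop. It contains some important additions to support metrics, traces, and logs.

Receivers

The receivers section defines how the Agent ingests telemetry data. In this setup, three types of receivers are configured:

Host Metrics Receiver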

hostmetrics:                         # Host Metrics Receiver
  collection_interval: 3600s         # Collection Interval (1hr)
  scrapers:
    cpu:                             # CPU Scraper

Collects CPU utilization from the local system once every hour. We use this to generate sample metric data.

OTLP Receiver (HTTP protocol)

otlp:                                # OTLP Receiver
  protocols:
    http:                            # Configure HTTP protocol
      endpoint: "0.0.0.0:4318"       # Endpoint to bind to

Enables the Agent to receive metrics, traces, and logs over HTTP on port 4318. This is used to send data to the Collector in the upcoming exercises.

FileLog Receiver

filelog/quotes:                      # Receiver Type/Name
  include: ./quotes.log              # The file to read log data from
  include_file_path: true            # Include file path in the log data
  include_file_name: false           # Exclude file name from the log data
  resource:                          # Add custom resource attributes
    com.splunk.source: ./quotes.log  # Source of the log data
    com.splunk.sourcetype: quotes    # Source type of the log data

Enables the Agent to tail a local log file (quotes.log) and convert its lines into structured log events enriched with metadata such as source and sourceType.

Exporter

Debug Exporter

  debug:                               # Exporter Type
    verbosity: detailed                # Enabled detailed debug output

OTLPHTTP Exporter
```
  otlphttp:                            # Exporter Type
    endpoint: "http://localhost:5318"  # Gateway OTLP endpoint
```
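The debug exporter sends data to the console for visibility and debugging during the workshop, while the otlphttp exporter forwards all telemetry to the local Gateway instance.
This dual-export strategy lets you inspect the raw data locally while still sending it downstream for further processing and export.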

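1.1 Reviewing the Gateway Configuration

The OpenTelemetry Gateway acts as a central hub for receiving, processing, and exporting telemetry data. It sits between your telemetry sources (such as applications and services) and your observability backends, such as Splunk Observability Cloud.

By centralizing telemetry traffic, the Gateway enables advanced capabilities such as filtering, enrichment, transformation, and routing of data to one or more destinations. It reduces the load on individual services by offloading telemetry processing from them, and it ensures consistent, standardized data across distributed systems.

This makes your observability pipeline easier to manage, scale, and analyze, particularly in complex multi-service environments.

Exercise

Open or create a second terminal window and name it Gateway. Change into the first exercise directory, [WORKSHOP]/1-agent-gateway, and review the contents of the gateway.yaml file.

This file shows the core structure of an OpenTelemetry Collector deployed in Gateway mode.

Understanding the Gateway Configuration

Let's review the gateway.yaml file, which defines how the OpenTelemetry Collector is configured in Gateway mode for this workshop. This Gateway is responsible for receiving telemetry from the Agent, processing it, and then exporting it for inspection or forwarding.

OTLP Receiver (custom port)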
```
receivers:
  otlp:
    protocols:
      http:
        endpoint: "0.0.0.0:5318"
```
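Port 5318 matches the otlphttp exporter in the Agent configuration, which ensures that all telemetry data sent by the Agent is accepted by the Gateway.

Note

Keeping these ports separate avoids conflicts and keeps the responsibilities of the Agent and Gateway roles clearly divided.

File Exporter

The Gateway uses three file exporters to write telemetry data to local files. These exporters are defined as follows: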

exporters:                        # List of exporters
  debug:                          # Debug exporter
    verbosity: detailed           # Enable detailed debug output
  file/traces:                    # Exporter Type/Name
    path: "./gateway-traces.out"  # Path for OTLP JSON output for traces
    append: false                 # Overwrite the file each time
  file/metrics:                   # Exporter Type/Name
    path: "./gateway-metrics.out" # Path for OTLP JSON output for metrics
    append: false                 # Overwrite the file each time
  file/logs:                      # Exporter Type/Name
    path: "./gateway-logs.out"    # Path for OTLP JSON output for logs
    append: false                 # Overwrite the file each time

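Each exporter writes one specific signal type to its corresponding file.

These files are created when the Gateway starts, and actual telemetry is written to them once the Agent sends data. You can watch these files in real time to observe the flow of telemetry through the pipeline.

1.2 Validating and Testing the Configuration

Next, we start the Gateway and the Agent. The Agent is configured to send Host Metrics automatically at startup, which lets us verify that data is routed correctly from the Agent to the Gateway.

Exercise

Gateway: In the Gateway terminal window, run the following command to start the Gateway: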

../otelcol --config=gateway.yaml

If everything is configured correctly, the Collector starts and reports Everything is ready. Begin running and processing data. in its output, similar to the following:

2025-06-09T09:22:11.944+0100    info    service@v0.126.0/service.go:289 Everything is ready. Begin running and processing data. {"resource": {}}

Once the Gateway is running, it listens for incoming data on port 5318 and exports the data it receives to the following files:

gateway-traces.out
gateway-metrics.out
gateway-logs.out

Start the Agent: In the Agent terminal window, start the Agent with the Agent configuration:

../otelcol --config=agent.yaml

Verify the CPU metrics:

Check that the Agent begins sending CPU metrics as soon as it starts.
Both the Agent and the Gateway show this activity in their debug output. The output should resemble the following snippet:

<snip>
NumberDataPoints #31
Data point attributes:
     -> cpu: Str(cpu3)
     -> state: Str(wait)
StartTimestamp: 2025-07-07 16:49:42 +0000 UTC
Timestamp: 2025-07-09 09:36:21.190226459 +0000 UTC
Value: 77.380000
        {"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "metrics"}

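At this stage, the Agent continues to collect CPU metrics once every hour or on each restart and sends them to the Gateway. The Gateway processes these metrics and exports them to a file named gateway-metrics.out, which stores the metrics exported by the metrics pipeline.

Verify that data arrived at the Gateway: To confirm that the CPU metrics (specifically for cpu0) reached the Gateway, inspect the gateway-metrics.out file with the jq command.

The following command filters and extracts the system.cpu.time metric, focusing on cpu0. It displays each state of the metric (for example, user, system, idle, interrupt) together with the corresponding value.

Open or create a third terminal window and name it Test. In the Test terminal, run the following command to check the system.cpu.time metric: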

jq '.resourceMetrics[].scopeMetrics[].metrics[] | select(.name == "system.cpu.time") | .sum.dataPoints[] | select(.attributes[0].value.stringValue == "cpu0") | {cpu: .attributes[0].value.stringValue, state: .attributes[1].value.stringValue, value: .asDouble}' gateway-metrics.out

{
  "cpu": "cpu0",
  "state": "user",
  "value": 123407.02
}
{
  "cpu": "cpu0",
  "state": "system",
  "value": 64866.6
}
{
  "cpu": "cpu0",
  "state": "idle",
  "value": 216427.87
}
{
  "cpu": "cpu0",
  "state": "interrupt",
  "value": 0
}
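
The same extraction can be done programmatically. The sketch below is a Go counterpart of the jq filter above under one stated assumption: the file holds a stream of OTLP JSON documents with the field names used in that jq expression. The type and function names are hypothetical. Unlike the jq filter, it looks attributes up by key rather than by position, so it does not depend on `cpu` being the first attribute.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// metricsDoc mirrors only the OTLP JSON fields this sketch needs.
type metricsDoc struct {
	ResourceMetrics []struct {
		ScopeMetrics []struct {
			Metrics []struct {
				Name string `json:"name"`
				Sum  struct {
					DataPoints []struct {
						Attributes []struct {
							Key   string `json:"key"`
							Value struct {
								StringValue string `json:"stringValue"`
							} `json:"value"`
						} `json:"attributes"`
						AsDouble float64 `json:"asDouble"`
					} `json:"dataPoints"`
				} `json:"sum"`
			} `json:"metrics"`
		} `json:"scopeMetrics"`
	} `json:"resourceMetrics"`
}

// cpuPoint is one system.cpu.time data point for a single CPU and state.
type cpuPoint struct {
	CPU   string
	State string
	Value float64
}

// cpuTimes decodes a stream of OTLP JSON documents and returns the
// system.cpu.time data points whose "cpu" attribute equals cpu.
func cpuTimes(data []byte, cpu string) ([]cpuPoint, error) {
	var points []cpuPoint
	dec := json.NewDecoder(bytes.NewReader(data))
	for {
		var doc metricsDoc
		if err := dec.Decode(&doc); err == io.EOF {
			return points, nil
		} else if err != nil {
			return nil, err
		}
		for _, rm := range doc.ResourceMetrics {
			for _, sm := range rm.ScopeMetrics {
				for _, m := range sm.Metrics {
					if m.Name != "system.cpu.time" {
						continue
					}
					for _, dp := range m.Sum.DataPoints {
						// Index the attributes by key instead of by position.
						attrs := map[string]string{}
						for _, a := range dp.Attributes {
							attrs[a.Key] = a.Value.StringValue
						}
						if attrs["cpu"] == cpu {
							points = append(points, cpuPoint{CPU: cpu, State: attrs["state"], Value: dp.AsDouble})
						}
					}
				}
			}
		}
	}
}

func main() {
	data, err := os.ReadFile("gateway-metrics.out")
	if err != nil {
		fmt.Println("could not read file:", err)
		return
	}
	points, err := cpuTimes(data, "cpu0")
	if err != nil {
		fmt.Println("could not decode file:", err)
		return
	}
	for _, p := range points {
		fmt.Printf("%s %-10s %.2f\n", p.CPU, p.State, p.Value)
	}
}
```

Run it from the 1-agent-gateway directory after the Gateway has written gateway-metrics.out.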

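Important

Stop the Agent and Gateway processes by pressing Ctrl-C in their respective terminals.

2. Building Resilience

10 minutes

The FileStorage extension of the OpenTelemetry Collector is a key component for building a more resilient telemetry pipeline. It enables the Collector to reliably checkpoint in-flight data, manage retries efficiently, and handle temporary failures gracefully without losing valuable telemetry.

With FileStorage enabled, the Collector can persist intermediate state to disk, so traces, metrics, and logs are not lost during network interruptions, backend outages, or Collector restarts. In other words, if the network connection drops or the backend becomes temporarily unavailable, the Collector keeps receiving and buffering telemetry and seamlessly resumes delivery once the connection is restored.

Integrating the FileStorage extension into your pipeline strengthens the durability of your observability stack and maintains high-quality telemetry ingestion even in environments with unreliable connectivity.

Note

This solution works for metrics as long as the connection downtime is brief (up to 15 minutes). If the downtime exceeds this, Splunk Observability Cloud may drop data to make sure that no data point arrives out of order.

For logs, there are plans to implement a full enterprise-ready solution in a future Splunk OpenTelemetry Collector release.

2.1 Configuring File Storage

In this exercise, we update the extensions: section of the agent.yaml file. This section is part of the OpenTelemetry configuration YAML and defines optional components that extend or modify the behavior of the OpenTelemetry Collector.

These components do not process telemetry data directly, but they provide valuable capabilities and services that improve what the Collector can do.

Exercise

Important

Change all terminal windows to the 2-building-resilience directory and run the clear command.

The directory structure looks like this: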

.
├── agent.yaml
└── gateway.yaml

Update agent.yaml: In the Agent terminal window, add the file_storage extension below the existing health_check extension:

  file_storage/checkpoint:             # Extension Type/Name
    directory: "./checkpoint-dir"      # Define directory
    create_directory: true             # Create directory
    timeout: 1s                        # Timeout for file operations
    compaction:                        # Compaction settings
      on_start: true                   # Start compaction at Collector startup
      # Define compaction directory
      directory: "./checkpoint-dir/tmp"
      max_transaction_size: 65536      # Max. size limit before compaction occurs

Add file_storage to the exporter: Modify the otlphttp exporter to configure its retry and queueing mechanisms, so that data is retained and resent if a failure occurs. Add the following below endpoint: "http://localhost:5318", and make sure the indentation matches that of endpoint:

    retry_on_failure:
      enabled: true                    # Enable retry on failure
    sending_queue:                     #
      enabled: true                    # Enable sending queue
      num_consumers: 10                # No. of consumers
      queue_size: 10000                # Max. queue size
      storage: file_storage/checkpoint # File storage extension

Update the service section: Add the file_storage/checkpoint extension to the existing extensions: section. The configuration should look like this:

service:
  extensions:
  - health_check
  - file_storage/checkpoint            # Enabled extensions for this collector

Update the metrics pipeline: For this exercise, comment out the hostmetrics receiver in the metrics pipeline to reduce noise in the debug output and logs. The configuration should look like this:

    metrics:
      receivers:
      # - hostmetrics                    # Hostmetric receiver (cpu only)
      - otlp

Validate the Agent configuration using otelbin.io. For reference, the metrics: section of your pipelines should look like this:

%%{init:{"fontFamily":"monospace"}}%%
graph LR
    %% Nodes
      REC1(&nbsp;&nbsp;otlp&nbsp;&nbsp;<br>fa:fa-download):::receiver
      PRO1(memory_limiter<br>fa:fa-microchip):::processor
      PRO2(resourcedetection<br>fa:fa-microchip):::processor
      PRO3(resource<br>fa:fa-microchip<br>add_mode):::processor
      EXP1(&ensp;debug&ensp;<br>fa:fa-upload):::exporter
      EXP2(otlphttp<br>fa:fa-upload):::exporter
      EXP3(&ensp;file&ensp;<br>fa:fa-upload):::exporter
    %% Links
    subID1:::sub-metrics
    subgraph " "
      subgraph subID1[**Metrics**]
      direction LR
      REC1 --> PRO1
      PRO1 --> PRO2
      PRO2 --> PRO3
      PRO3 --> EXP1
      PRO3 --> EXP3
      PRO3 --> EXP2
      end
    end
classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
classDef sub-metrics stroke:#38bdf8,stroke-width:1px, color:#38bdf8,stroke-dasharray: 3 3;

2.2 Setting Up the Environment for Resilience Testing

Next, we set up the environment in preparation for testing the File Storage configuration.

Exercise

Start the Gateway: In the Gateway terminal window, run:

../otelcol --config=gateway.yaml

Start the Agent: In the Agent terminal window, run:

../otelcol --config=agent.yaml

Send five test spans: In the Loadgen terminal window, run:

../loadgen -count 5

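Both the Agent and the Gateway should display debug logs, and the Gateway should create the ./gateway-traces.out file.

If everything works as expected, you can move on to testing the resilience of the system.

2.3 Simulating a Failure

To assess the resilience of the Agent, we simulate a temporary Gateway outage and observe how the Agent handles it.

Exercise

Simulate a network failure: In the Gateway terminal, stop the Gateway with Ctrl-C and wait until the Gateway console shows that it has stopped. The Agent keeps running but can no longer send data to the Gateway. The output in the Gateway terminal should look similar to this: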

2025-07-09T10:22:37.941Z        info    service@v0.126.0/service.go:345 Shutdown complete.      {"resource": {}}

Send traces: In the Loadgen terminal window, send five more traces using loadgen:

../loadgen -count 5

Notice that the retry mechanism of the Agent is now active and keeps trying to resend the data. The Agent console output repeatedly shows messages like the following:

2025-01-28T14:22:47.020+0100  info  internal/retry_sender.go:126  Exporting failed. Will retry the request after interval.  {"kind": "exporter", "data_type": "traces", "name": "otlphttp", "error": "failed to make an HTTP request: Post \"http://localhost:5318/v1/traces\": dial tcp 127.0.0.1:5318: connect: connection refused", "interval": "9.471474933s"}

Stop the Agent: In the Agent terminal window, stop the Agent with Ctrl-C. Wait until the Agent console confirms that it has stopped:

2025-07-09T10:25:59.344Z        info    service@v0.126.0/service.go:345 Shutdown complete.      {"resource": {}}

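Important

Stopping the Agent discards any metrics, traces, and logs that are held in memory for retry. However, because we have configured the FileStorage extension, all telemetry that has not yet been accepted by the target endpoint is safely checkpointed on disk.

Stopping the Agent is an essential step for clearly demonstrating how the system recovers when the Agent is restarted.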

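2.4 Recovery

In this exercise, we test how the OpenTelemetry Collector recovers from a network outage by restarting the Gateway Collector. Once the Gateway is available again, the Agent resumes sending data from its last checkpointed state, ensuring that no data is lost.

Exercise

Restart the Gateway: In the Gateway terminal window, run: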

../otelcol --config=gateway.yaml

Restart the Agent: In the Agent terminal window, run:

../otelcol --config=agent.yaml

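Once the Agent is up and running, the FileStorage extension detects the buffered data in the checkpoint folder. It starts to dequeue the stored spans from the last checkpoint folder, ensuring that no data is lost.

Check the Agent debug output: Note that the Agent debug output does not change. It continues to show only the following line, indicating that no new data is passing through its pipelines; the resent spans come from the persisted queue: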

2025-07-11T08:31:58.176Z        info    service@v0.126.0/service.go:289 Everything is ready. Begin running and processing data.   {"resource": {}}

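Check the Gateway debug output: In the Gateway debug screen, you should see that it has started receiving the previously missed traces without any additional action on your part. For example: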

Attributes:
   -> user.name: Str(George Lucas)
   -> user.phone_number: Str(+1555-867-5309)
   -> user.email: Str(george@deathstar.email)
   -> user.password: Str(LOTR>StarWars1-2-3)
   -> user.visa: Str(4111 1111 1111 1111)
   -> user.amex: Str(3782 822463 10005)
   -> user.mastercard: Str(5555 5555 5555 4444)
   -> payment.amount: Double(75.75)
      {"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "traces"}

Check the gateway-traces.out file: Use jq to count the number of traces in the recreated gateway-traces.out. It should match the number you sent while the Gateway was down.

jq '.resourceSpans | length | "\(.) resourceSpans found"' gateway-traces.out

"5 resourceSpans found"

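Important

Stop the Agent and Gateway processes by pressing Ctrl-C in their respective terminals.

Summary

This exercise showed how to make the OpenTelemetry Collector more resilient by configuring the file_storage extension, enabling the retry mechanism of the otlphttp exporter, and using a file-backed queue for temporary data storage.

By implementing file-based checkpointing and queue persistence, you ensure that your telemetry pipeline can recover gracefully from temporary interruptions, making it more robust and reliable in production.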

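3. Dropping Spans

5 minutes

This section explains how to use the Filter Processor to selectively drop spans based on specific conditions.

Specifically, we drop traces based on the span name. This is commonly done to filter out unwanted spans such as health checks or internal communication traces. In this case, we filter out spans containing "/_healthz", which are typically associated with health check requests and tend to be very "noisy".

Exercise

Important

Change all terminal windows to the 3-dropping-spans directory and run the clear command.

Copy *.yaml from the 2-building-resilience directory into 3-dropping-spans. Your updated directory structure should now look like this: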

.
├── agent.yaml
└── gateway.yaml

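Next, we configure the filter processor and the corresponding pipeline.

3.1 Configuration

Exercise

Switch to your Gateway terminal window and open the gateway.yaml file. Update the processors section with the following configuration.

Add a filter processor: Configure the Gateway to exclude spans named /_healthz. The error_mode: ignore directive ignores any errors encountered during filtering, so the pipeline keeps running smoothly. The traces section defines the filtering rules and specifies that spans named /_healthz are to be excluded.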
```
  filter/health:                       # Defines a filter processor
    error_mode: ignore                 # Ignore errors
    traces:                            # Filtering rules for traces
      span:                            # Exclude spans named "/_healthz"
       - 'name == "/_healthz"'
```
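Add the filter processor to the traces pipeline: Add the filter/health processor to the traces pipeline. For best performance, place the filter as early as possible: immediately after memory_limiter and before the batch processor. The configuration should look like this: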
```
  traces:
    receivers:
      - otlp
    processors:
      - memory_limiter
      - filter/health             # Filters data based on rules
      - resource/add_mode
      - batch
    exporters:
      - debug
      - file/traces
```

With this configuration, health check related spans (/_healthz) are filtered out early in the pipeline, which reduces unnecessary noise in your telemetry data.
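
Conceptually, the rule `name == "/_healthz"` is a predicate that is evaluated for every span, and spans for which it is true are dropped. The following minimal sketch shows that behavior in plain Go. The `span` type and `dropByName` function are hypothetical stand-ins for illustration, not the processor's internals.

```go
package main

import "fmt"

// span is a simplified stand-in that carries only the field the rule inspects.
type span struct {
	Name string
}

// dropByName returns the spans whose name does not equal name,
// preserving their original order.
func dropByName(spans []span, name string) []span {
	kept := make([]span, 0, len(spans))
	for _, s := range spans {
		if s.Name == name {
			continue // the condition matched: drop this span
		}
		kept = append(kept, s)
	}
	return kept
}

func main() {
	received := []span{
		{Name: "/movie-validator"},
		{Name: "/_healthz"},
		{Name: "/movie-validator"},
		{Name: "/_healthz"},
	}
	for _, s := range dropByName(received, "/_healthz") {
		fmt.Println("kept span:", s.Name)
	}
}
```

Because the comparison is an exact match, a span named `/_healthz/ready` would be kept, which is one reason to test filter conditions against real data before relying on them.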

Validate the Gateway configuration using otelbin.io. For reference, the traces: section of your pipelines should look like this:

%%{init:{"fontFamily":"monospace"}}%%
graph LR
    %% Nodes
      REC1(&nbsp;&nbsp;otlp&nbsp;&nbsp;<br>fa:fa-download):::receiver
      PRO1(memory_limiter<br>fa:fa-microchip):::processor
      PRO3(resource<br>fa:fa-microchip<br>add_mode):::processor
      PRO4(filter<br>fa:fa-microchip<br>health):::processor
      PRO5(batch<br>fa:fa-microchip):::processor
      EXP1(&ensp;debug&ensp;<br>fa:fa-upload):::exporter
      EXP2(&ensp;&ensp;file&ensp;&ensp;<br>fa:fa-upload<br>traces):::exporter
    %% Links
    subID1:::sub-traces
    subgraph " "
      subgraph subID1[**Traces**]
      direction LR
      REC1 --> PRO1
      PRO1 --> PRO4
      PRO4 --> PRO3
      PRO3 --> PRO5
      PRO5 --> EXP1
      PRO5 --> EXP2
      end
    end
classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
classDef sub-traces stroke:#fbbf24,stroke-width:1px, color:#fbbf24,stroke-dasharray: 3 3;

3.2 Testing the Filter Processor

To test the configuration, you need to generate trace data that includes a span named "/_healthz".

Exercise

Start the Gateway: In your Gateway terminal window, start the Gateway.

../otelcol --config ./gateway.yaml

Start the Agent: In your Agent terminal window, start the Agent.

../otelcol --config ./agent.yaml

Start the Loadgen: In the Loadgen terminal window, run the following command to start the load generator with health check spans enabled:

../loadgen -health -count 5

The debug output in the Agent terminal shows the _healthz spans:

InstrumentationScope healthz 1.0.0
Span #0
   Trace ID       : 0cce8759b5921c8f40b346b2f6e2f4b6
   Parent ID      :
   ID             : bc32bd0e4ddcb174
   Name           : /_healthz
   Kind           : Server
   Start time     : 2025-07-11 08:47:50.938703979 +0000 UTC
   End time       : 2025-07-11 08:47:51.938704091 +0000 UTC
   Status code    : Ok
   Status message : Success

These spans do not appear in the Gateway debug output, because they are dropped by the filter processor configured earlier.

Check agent.out: In the Test terminal, use jq to check the names of the spans received by the Agent:

jq -c '.resourceSpans[].scopeSpans[].spans[] | "Span \(input_line_number) found with name \(.name)"' ./agent.out

"Span 1 found with name /movie-validator"
"Span 2 found with name /_healthz"
"Span 3 found with name /movie-validator"
"Span 4 found with name /_healthz"
"Span 5 found with name /movie-validator"
"Span 6 found with name /_healthz"
"Span 7 found with name /movie-validator"
"Span 8 found with name /_healthz"
"Span 9 found with name /movie-validator"
"Span 10 found with name /_healthz"

Check the Gateway output file: Use jq to check the names of the spans received by the Gateway:

jq -c '.resourceSpans[].scopeSpans[].spans[] | "Span \(input_line_number) found with name \(.name)"' ./gateway-traces.out

The gateway-traces.out file does not contain any spans named /_healthz.

"Span 1 found with name /movie-validator"
"Span 2 found with name /movie-validator"
"Span 3 found with name /movie-validator"
"Span 4 found with name /movie-validator"
"Span 5 found with name /movie-validator"

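Tip

To get the best results from the Filter Processor, make sure you thoroughly understand the format of your incoming data and test your configuration rigorously. Use filter conditions that are as specific as possible to minimize the risk of accidentally dropping important data.

This configuration can be extended to filter spans based on various attributes, tags, or custom conditions, making the OpenTelemetry Collector more flexible and efficient for your specific observability requirements.

Important

Stop the Agent and Gateway processes by pressing Ctrl-C in their respective terminals.

4. Redacting Sensitive Data

10 minutes

In this section, you learn how to configure the OpenTelemetry Collector to remove specific tags and redact sensitive data from telemetry spans. This is important for protecting sensitive information, such as credit card numbers, personal data, or other security-related details, that must be anonymized before it is processed or exported.

We cover the configuration of two key processors in the OpenTelemetry Collector:

Attributes Processor: Modifies or removes specific span attributes.
Redaction Processor: Ensures that sensitive data is sanitized before it is stored or transmitted.

Exercise

Important

Change all terminal windows to the 4-sensitive-data directory and run the clear command.

Copy *.yaml from the 3-dropping-spans directory into 4-sensitive-data. Your updated directory structure should now look like this: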

.
├── agent.yaml
└── gateway.yaml

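4.1 Configuration

In this step, we modify agent.yaml to add the attributes and redaction processors. These processors make sure that sensitive data within span attributes is handled properly before it is logged or exported.

Earlier, you may have noticed that some of the span attributes shown in the console contain personal and sensitive data. We now configure the processors needed to filter and redact this information effectively.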

Attributes:
     -> user.name: Str(George Lucas)
     -> user.phone_number: Str(+1555-867-5309)
     -> user.email: Str(george@deathstar.email)
     -> user.password: Str(LOTR>StarWars1-2-3)
     -> user.visa: Str(4111 1111 1111 1111)
     -> user.amex: Str(3782 822463 10005)
     -> user.mastercard: Str(5555 5555 5555 4444)
  {"kind": "exporter", "data_type": "traces", "name": "debug"}

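Exercise

Switch to your Agent terminal window and open the agent.yaml file in your editor. We add two processors to improve the security and privacy of the telemetry data.

1. Add an attributes processor: The Attributes Processor lets you modify span attributes (tags) by updating, deleting, or hashing their values. This is particularly useful for obfuscating sensitive information before it is exported.

In this step, we:

Update the user.phone_number attribute to a static value ("UNKNOWN NUMBER").
Hash the user.email attribute so that the original email address is not exposed.
Delete the user.password attribute to remove it from the span entirely.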

  attributes/update:
    actions:                           # Actions
      - key: user.phone_number         # Target key
        action: update                 # Update action
        value: "UNKNOWN NUMBER"        # New value
      - key: user.email                # Target key
        action: hash                   # Hash the email value
      - key: user.password             # Target key
        action: delete                 # Delete the password

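2. Add a redaction processor: The Redaction Processor detects and redacts sensitive data in span attributes based on predefined patterns, such as credit card numbers or other personally identifiable information (PII).

In this step:

We set allow_all_keys: true so that all attributes are processed (if set to false, only explicitly allowed keys are retained).
We define regular expressions in blocked_values to detect and redact Visa and MasterCard credit card numbers.
The summary: debug option logs detailed information about the redaction process for debugging purposes.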

  redaction/redact:
    allow_all_keys: true               # If false, only allowed keys will be retained
    blocked_values:                    # List of regex patterns to block
      - '\b4[0-9]{3}[\s-]?[0-9]{4}[\s-]?[0-9]{4}[\s-]?[0-9]{4}\b'       # Visa
      - '\b5[1-5][0-9]{2}[\s-]?[0-9]{4}[\s-]?[0-9]{4}[\s-]?[0-9]{4}\b'  # MasterCard
    summary: debug                     # Show debug details about redaction

Update the traces pipeline: Integrate both processors into the traces pipeline. Comment out the redaction processor at first (we enable it in a later exercise). The configuration should look like this:

    traces:
      receivers:
      - otlp
      processors:
      - memory_limiter
      - attributes/update              # Update, hash, and remove attributes
      #- redaction/redact               # Redact sensitive fields using regex
      - resourcedetection
      - resource/add_mode
      - batch
      exporters:
      - debug
      - file
      - otlphttp

Validate the Agent configuration using otelbin.io. For reference, the traces: section of your pipelines should look like this:

%%{init:{"fontFamily":"monospace"}}%%
graph LR
    %% Nodes
      REC1(&nbsp;&nbsp;otlp&nbsp;&nbsp;<br>fa:fa-download):::receiver
      PRML(memory_limiter<br>fa:fa-microchip):::processor
      PRRD(resourcedetection<br>fa:fa-microchip):::processor
      PRRS(resource<br>fa:fa-microchip<br>add_mode):::processor
      PRUP(attributes<br>fa:fa-microchip<br>update):::processor
      EXP1(otlphttp<br>fa:fa-upload):::exporter
      EXP2(&ensp;&ensp;debug&ensp;&ensp;<br>fa:fa-upload):::exporter
      EXP3(file<br>fa:fa-upload):::exporter

    %% Links
    subID1:::sub-traces
    subgraph " "
      subgraph subID1[**Traces**]
      direction LR
      REC1 --> PRML
      PRML --> PRUP
      PRUP --> PRRD
      PRRD --> PRRS
      PRRS --> EXP2
      PRRS --> EXP3
      PRRS --> EXP1
      end
    end
classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
classDef sub-traces stroke:#fbbf24,stroke-width:1px, color:#fbbf24,stroke-dasharray: 3 3;

4.2 Testing the Attributes Processor

In this exercise, we delete user.password, update the user.phone_number attribute, and hash user.email in the span data before the Agent exports it.

Exercise

Start the Gateway: In your Gateway terminal window, start the Gateway.

../otelcol --config=gateway.yaml

Start the Agent: In your Agent terminal window, start the Agent.

../otelcol --config=agent.yaml

Start the Load Generator: In the Loadgen terminal window, start loadgen:

../loadgen -count 1

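Check the debug output: For both the Agent and the Gateway, confirm that user.password has been removed and that user.phone_number and user.email have been updated. The first block below shows the new debug output; the second shows the original output for comparison: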

   -> user.name: Str(George Lucas)
   -> user.phone_number: Str(UNKNOWN NUMBER)
   -> user.email: Str(62d5e03d8fd5808e77aee5ebbd90cf7627a470ae0be9ffd10e8025a4ad0e1287)
   -> payment.amount: Double(51.71)
   -> user.visa: Str(4111 1111 1111 1111)
   -> user.amex: Str(3782 822463 10005)
   -> user.mastercard: Str(5555 5555 5555 4444)

    -> user.name: Str(George Lucas)
    -> user.phone_number: Str(+1555-867-5309)
    -> user.email: Str(george@deathstar.email)
    -> user.password: Str(LOTR>StarWars1-2-3)
    -> user.visa: Str(4111 1111 1111 1111)
    -> user.amex: Str(3782 822463 10005)
    -> user.mastercard: Str(5555 5555 5555 4444)
    -> payment.amount: Double(95.22)

Check the file output: Use jq to verify that user.password has been removed and that user.phone_number and user.email have been updated in gateway-traces.out:

jq '.resourceSpans[].scopeSpans[].spans[].attributes[] | select(.key == "user.password" or .key == "user.phone_number" or .key == "user.email") | {key: .key, value: .value.stringValue}' ./gateway-traces.out

Notice that user.password has been removed and that user.phone_number and user.email have been updated:

{
  "key": "user.phone_number",
  "value": "UNKNOWN NUMBER"
}
{
  "key": "user.email",
  "value": "62d5e03d8fd5808e77aee5ebbd90cf7627a470ae0be9ffd10e8025a4ad0e1287"
}

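Important

Stop the Agent and Gateway processes by pressing Ctrl-C in their respective terminals.

4.3 Test Redaction Processor

The redaction processor gives precise control over which attributes and values are permitted in, or removed from, telemetry data.

In this exercise, we will redact the user.visa and user.mastercard values in the span data before the Agent exports it.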

Exercise

Start the Gateway: In the Gateway terminal window, start the Gateway.

../otelcol --config=gateway.yaml

Enable the redaction/redact processor: In the Agent terminal window, edit agent.yaml and remove the # that was added in the previous exercise.

    traces:
      receivers:
      - otlp
      processors:
      - memory_limiter
      - attributes/update              # Update, hash, and remove attributes
      - redaction/redact               # Redact sensitive fields using regex
      - resourcedetection
      - resource/add_mode
      - batch
      exporters:
      - debug
      - file
      - otlphttp

Start the Agent: In the Agent terminal window, start the Agent.

../otelcol --config=agent.yaml

Start the Load Generator: In the Loadgen terminal window, start loadgen.

../loadgen -count 1

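Check the debug output: For both the Agent and the Gateway, confirm that the values of user.visa and user.mastercard have been updated. Notice that the user.amex value has not been redacted, because no matching regex pattern was added to blocked_values. The first block below is the new debug output; the second is the original span, shown for comparison.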

   -> user.name: Str(George Lucas)
   -> user.phone_number: Str(UNKNOWN NUMBER)
   -> user.email: Str(62d5e03d8fd5808e77aee5ebbd90cf7627a470ae0be9ffd10e8025a4ad0e1287)
   -> payment.amount: Double(69.71)
   -> user.visa: Str(****)
   -> user.amex: Str(3782 822463 10005)
   -> user.mastercard: Str(****)
   -> redaction.masked.keys: Str(user.mastercard,user.visa)
   -> redaction.masked.count: Int(2)

    -> user.name: Str(George Lucas)
    -> user.phone_number: Str(+1555-867-5309)
    -> user.email: Str(george@deathstar.email)
    -> user.password: Str(LOTR>StarWars1-2-3)
    -> user.visa: Str(4111 1111 1111 1111)
    -> user.amex: Str(3782 822463 10005)
    -> user.mastercard: Str(5555 5555 5555 4444)
    -> payment.amount: Double(65.54)

Note

If you include summary: debug in the redaction processor, the debug output also contains summary information about which matching keys were redacted and how many values were masked.

     -> redaction.masked.keys: Str(user.mastercard,user.visa)
     -> redaction.masked.count: Int(2)

Check the file output: Using jq, verify that user.visa and user.mastercard have been updated in gateway-traces.out.

jq '.resourceSpans[].scopeSpans[].spans[].attributes[] | select(.key == "user.visa" or .key == "user.mastercard" or .key == "user.amex") | {key: .key, value: .value.stringValue}' ./gateway-traces.out

Notice that user.amex has not been redacted, because no matching regex pattern was added to blocked_values.

{
  "key": "user.visa",
  "value": "****"
}
{
  "key": "user.amex",
  "value": "3782 822463 10005"
}
{
  "key": "user.mastercard",
  "value": "****"
}

These are just a few examples of how the attributes and redaction processors can be configured to protect sensitive data.
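
The redaction behaviour seen in this exercise can be modeled roughly as follows. This is a sketch under stated assumptions, not the processor's code: the two regular expressions are hypothetical Visa-like and Mastercard-like patterns (the actual blocked_values are not shown in this section), and the `redact` helper is invented for illustration. The summary keys mirror the redaction.masked.keys and redaction.masked.count attributes from the debug output.

```python
import re

# Hypothetical patterns standing in for the blocked_values list
BLOCKED_VALUES = [
    re.compile(r"\b4[0-9]{3}[\s-]?[0-9]{4}[\s-]?[0-9]{4}[\s-]?[0-9]{4}\b"),       # Visa-like
    re.compile(r"\b5[1-5][0-9]{2}[\s-]?[0-9]{4}[\s-]?[0-9]{4}[\s-]?[0-9]{4}\b"),  # Mastercard-like
]


def redact(attributes):
    """Mask the parts of string values that match a blocked pattern."""
    result = {}
    masked_keys = []
    for key, value in attributes.items():
        new_value = value
        if isinstance(value, str):
            for pattern in BLOCKED_VALUES:
                new_value = pattern.sub("****", new_value)
        if new_value != value:
            masked_keys.append(key)
        result[key] = new_value
    if masked_keys:
        # Summary attributes, modeled on the debug output in this section
        result["redaction.masked.keys"] = ",".join(sorted(masked_keys))
        result["redaction.masked.count"] = len(masked_keys)
    return result


redacted = redact({
    "user.visa": "4111 1111 1111 1111",
    "user.amex": "3782 822463 10005",
    "user.mastercard": "5555 5555 5555 4444",
})
for key, value in redacted.items():
    print(f"{key}: {value}")
```

Because no pattern matches the 15-digit American Express layout, user.amex passes through unchanged, which is the same effect observed in the exercise.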

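Important

Stop the Agent and Gateway processes by pressing Ctrl-C in their respective terminals.

5. Transform Data

10 minutes

The Transform Processor lets you modify telemetry data (logs, metrics, and traces) as it flows through the pipeline. Using the OpenTelemetry Transformation Language (OTTL), you can filter, enrich, and transform data on the fly without changing your application code.

In this exercise, we will update gateway.yaml to add a Transform Processor that will:

Filter log resource attributes.
Parse JSON structured log data into attributes.
Set log severity levels based on the log message body.

You may have noticed that in previous logs, SeverityText and SeverityNumber were undefined. This is typical of the filelog receiver. However, the severity is embedded within the log body, for example: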

SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(2025-01-31 15:49:29 [WARN] - Do or do not, there is no try.)

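Logs often contain structured data encoded as JSON within the log body. Extracting these fields into attributes makes indexing, filtering, and querying more efficient. Instead of manually parsing JSON in downstream systems, OTTL can transform the data automatically at the telemetry pipeline level.

Exercise

Important

Change ALL terminal windows to the 5-transform-data directory and run the clear command.

Copy *.yaml from the 4-sensitive-data directory into 5-transform-data. Your updated directory structure will now look like this: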

.
├── agent.yaml
└── gateway.yaml

5.1 Configuration

Exercise

Add a transform processor: Switch to your Gateway terminal window and edit gateway.yaml to add the following transform processor:

  transform/logs:                      # Processor Type/Name
    log_statements:                    # Log Processing Statements
      - context: resource              # Log Context
        statements:                    # List of attribute keys to keep
          - keep_keys(attributes, ["com.splunk.sourcetype", "host.name", "otelcol.service.mode"])

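By using the context: resource key, we are targeting the resourceLog attributes of the logs.

This configuration ensures that only the relevant resource attributes (com.splunk.sourcetype, host.name, otelcol.service.mode) are retained, which improves log efficiency and reduces unnecessary metadata.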
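
As a rough illustration of what keep_keys does to the resource attributes, here is a minimal Python sketch. The `keep_keys` function below is a hypothetical stand-in that returns a new dictionary; it is only a model of the filtering effect, not OTTL itself. The sample values come from the debug output later in this section.

```python
def keep_keys(attributes, keys):
    """Return only the entries whose key appears in the allow-list."""
    wanted = set(keys)
    return {key: value for key, value in attributes.items() if key in wanted}


resource_attributes = {
    "com.splunk.source": "./quotes.log",
    "com.splunk.sourcetype": "quotes",
    "host.name": "workshop-instance",
    "os.type": "linux",
    "otelcol.service.mode": "agent",
}

kept = keep_keys(
    resource_attributes,
    ["com.splunk.sourcetype", "host.name", "otelcol.service.mode"],
)
print(kept)
```

An allow-list like this drops any attribute that is not named explicitly, so new resource attributes added upstream will not reach the backend until they are added to the list.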

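Add a context block for log severity mapping: To set the severity_text and severity_number fields of a log record properly, add a log context block within log_statements. This configuration extracts the level value from the log body, maps it to severity_text, and assigns the corresponding severity_number based on the log level: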

      - context: log                   # Log Context
        statements:                    # Transform Statements Array
          - set(cache, ParseJSON(body)) where IsMatch(body, "^\\{")  # Parse JSON log body into a cache object
          - flatten(cache, "")                                        # Flatten nested JSON structure
          - merge_maps(attributes, cache, "upsert")                   # Merge cache into attributes, updating existing keys
          - set(severity_text, attributes["level"])                   # Set severity_text from the "level" attribute
          - set(severity_number, 1) where severity_text == "TRACE"    # Map severity_text to severity_number
          - set(severity_number, 5) where severity_text == "DEBUG"
          - set(severity_number, 9) where severity_text == "INFO"
          - set(severity_number, 13) where severity_text == "WARN"
          - set(severity_number, 17) where severity_text == "ERROR"
          - set(severity_number, 21) where severity_text == "FATAL"

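The merge_maps function combines two maps (dictionaries) into one. In this case, it merges the cache object (which contains the JSON data parsed from the log body) into the attributes map.

Parameters:
- attributes: The target map into which the data is merged.
- cache: The source map containing the parsed JSON data.
- "upsert": This mode ensures that if a key already exists in the attributes map, its value is updated with the value from cache. If the key does not exist, it is inserted.

This step is crucial because it ensures that all relevant fields from the log body (e.g., level, message) are added to the attributes map, making them available for further processing or exporting.

Summary of key transformations:

Parse JSON: Extracts structured data from the log body.
Flatten JSON: Converts nested JSON objects into a flat structure.
Merge Attributes: Integrates the extracted data into the log attributes.
Map Severity Text: Assigns severity_text from the log's level attribute.
Assign Severity Numbers: Converts severity levels into standardized numerical values.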
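
The five transformations above can be followed step by step in this small Python model. It is a sketch, not OTTL: `flatten` and `transform_log` are hypothetical helpers, nested keys are assumed to be joined with a dot, and the record is represented as a plain dictionary. The sample body is taken from the debug output later in this section.

```python
import json

SEVERITY_NUMBERS = {"TRACE": 1, "DEBUG": 5, "INFO": 9, "WARN": 13, "ERROR": 17, "FATAL": 21}


def flatten(mapping, prefix=""):
    """Flatten nested dictionaries; nested keys are joined with '.' (assumption)."""
    flat = {}
    for key, value in mapping.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def transform_log(record):
    """Parse a JSON body, merge it into attributes, and set the severity fields."""
    body = record["body"]
    cache = {}
    if isinstance(body, str) and body.startswith("{"):  # mirrors IsMatch(body, "^\\{")
        cache = json.loads(body)                         # Parse JSON
    cache = flatten(cache)                               # Flatten JSON
    record["attributes"].update(cache)                   # Merge Attributes ("upsert")
    level = record["attributes"].get("level")
    if level is not None:
        record["severity_text"] = level                  # Map Severity Text
    number = SEVERITY_NUMBERS.get(record["severity_text"])
    if number is not None:
        record["severity_number"] = number               # Assign Severity Numbers
    return record


record = {
    "severity_text": "",
    "severity_number": 0,
    "body": '{"level":"WARN","message":"Your focus determines your reality.",'
            '"movie":"SW","timestamp":"2025-03-07 11:17:26"}',
    "attributes": {"log.file.path": "quotes.log"},
}
transformed = transform_log(record)
print(transformed["severity_text"], transformed["severity_number"])
print(transformed["attributes"])
```

A plain-text body that does not start with `{` leaves the record unchanged in this model, because the cache stays empty and no level attribute is found.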

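Important

You should have a single transform processor containing two context blocks: one for resource and one for log.

This configuration ensures that log severity is correctly extracted, standardized, and structured for efficient processing.

Tip

This method of mapping all JSON fields to top-level attributes should only be used for testing and debugging OTTL. It will result in high cardinality in a production scenario.

Update the logs pipeline: Add the transform/logs processor to the logs: pipeline so that your configuration looks like this: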

    logs:                         # Logs pipeline
      receivers:
      - otlp                      # OTLP receiver
      processors:                 # Processors for logs
      - memory_limiter
      - resource/add_mode
      - transform/logs
      - batch
      exporters:
      - debug                     # Debug exporter
      - file/logs

Validate the Gateway configuration using https://otelbin.io. For reference, the logs: section of your pipelines will look similar to this:

%%{init:{"fontFamily":"monospace"}}%%
graph LR
    %% Nodes
      REC1(&nbsp;&nbsp;otlp&nbsp;&nbsp;<br>fa:fa-download):::receiver
      PRO1(memory_limiter<br>fa:fa-microchip):::processor
      PRO3(resource<br>fa:fa-microchip<br>add_mode):::processor
      PRO4(transform<br>fa:fa-microchip<br>logs):::processor
      PRO5(batch<br>fa:fa-microchip):::processor
      EXP1(file<br>fa:fa-upload<br>logs):::exporter
      EXP2(&ensp;&ensp;debug&ensp;&ensp;<br>fa:fa-upload):::exporter
    %% Links
    subID1:::sub-logs
    subgraph " "
      subgraph subID1[**Logs**]
      direction LR
      REC1 --> PRO1
      PRO1 --> PRO3
      PRO3 --> PRO4
      PRO4 --> PRO5
      PRO5 --> EXP2
      PRO5 --> EXP1
      end
    end
classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
classDef sub-logs stroke:#34d399,stroke-width:1px, color:#34d399,stroke-dasharray: 3 3;

5.2 Setup Environment

Exercise

Start the Gateway: In the Gateway terminal, run:

../otelcol --config=gateway.yaml

Start the Agent: In the Agent terminal, run:

../otelcol --config=agent.yaml

Start the Load Generator: In the Loadgen terminal window, run the following command to start the load generator with JSON enabled:

../loadgen -logs -json -count 5

The loadgen will write 5 log lines to ./quotes.log in JSON format.

5.3 Test Transform Processor

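This test verifies that the com.splunk.source and os.type metadata have been removed from the log resource attributes before being exported by the Agent. Additionally, the test ensures that:

The log body is parsed to extract severity information.
- SeverityText and SeverityNumber are set on the LogRecord.
JSON fields from the log body are promoted to log attributes.

This ensures proper metadata filtering, severity mapping, and structured log enrichment before exporting.

Exercise

Check the debug output: For both the Agent and the Gateway, confirm that com.splunk.source and os.type have been removed. The first block below is the new debug output; the second is the original, shown for comparison.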

Resource attributes:
   -> com.splunk.sourcetype: Str(quotes)
   -> host.name: Str(workshop-instance)
   -> otelcol.service.mode: Str(agent)

Resource attributes:
   -> com.splunk.source: Str(./quotes.log)
   -> com.splunk.sourcetype: Str(quotes)
   -> host.name: Str(workshop-instance)
   -> os.type: Str(linux)
   -> otelcol.service.mode: Str(agent)

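For both the Agent and the Gateway, confirm that SeverityText and SeverityNumber in the LogRecord are now set from the severity level in the log body. Also confirm that the JSON fields from the body are accessible as top-level log Attributes. The first block below is the new output; the second is the original: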

<snip>
SeverityText: WARN
SeverityNumber: Warn(13)
Body: Str({"level":"WARN","message":"Your focus determines your reality.","movie":"SW","timestamp":"2025-03-07 11:17:26"})
Attributes:
     -> log.file.path: Str(quotes.log)
     -> level: Str(WARN)
     -> message: Str(Your focus determines your reality.)
     -> movie: Str(SW)
     -> timestamp: Str(2025-03-07 11:17:26)
</snip>

<snip>
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str({"level":"WARN","message":"Your focus determines your reality.","movie":"SW","timestamp":"2025-03-07 11:17:26"})
Attributes:
     -> log.file.path: Str(quotes.log)
</snip>

Check the file output: In the new gateway-logs.out file, verify that the data has been transformed:

jq '[.resourceLogs[].scopeLogs[].logRecords[] | {severityText, severityNumber, body: .body.stringValue}]' gateway-logs.out

[
  {
    "severityText": "DEBUG",
    "severityNumber": 5,
    "body": "{\"level\":\"DEBUG\",\"message\":\"All we have to decide is what to do with the time that is given us.\",\"movie\":\"LOTR\",\"timestamp\":\"2025-03-07 11:56:29\"}"
  },
  {
    "severityText": "WARN",
    "severityNumber": 13,
    "body": "{\"level\":\"WARN\",\"message\":\"The Force will be with you. Always.\",\"movie\":\"SW\",\"timestamp\":\"2025-03-07 11:56:29\"}"
  },
  {
    "severityText": "ERROR",
    "severityNumber": 17,
    "body": "{\"level\":\"ERROR\",\"message\":\"One does not simply walk into Mordor.\",\"movie\":\"LOTR\",\"timestamp\":\"2025-03-07 11:56:29\"}"
  },
  {
    "severityText": "DEBUG",
    "severityNumber": 5,
    "body": "{\"level\":\"DEBUG\",\"message\":\"Do or do not, there is no try.\",\"movie\":\"SW\",\"timestamp\":\"2025-03-07 11:56:29\"}"
  }
]
[
  {
    "severityText": "ERROR",
    "severityNumber": 17,
    "body": "{\"level\":\"ERROR\",\"message\":\"There is some good in this world, and it's worth fighting for.\",\"movie\":\"LOTR\",\"timestamp\":\"2025-03-07 11:56:29\"}"
  }
]

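Important

Stop the Agent and Gateway processes by pressing Ctrl-C in their respective terminals.

6. Routing Data

10 minutes

The Routing Connector in OpenTelemetry is a powerful feature that lets you direct data (traces, metrics, or logs) to different pipelines or destinations based on specific criteria. This is especially useful when you want to apply different processing or exporting logic to subsets of your telemetry data.

For example, you might send production data to one exporter while directing test or development data to another. Similarly, you could route certain spans based on attributes such as service name, environment, or span name to apply custom processing or storage logic.

Exercise

Important

Change ALL terminal windows to the 6-routing-data directory and run the clear command.

Copy *.yaml from the 5-transform-data directory into 6-routing-data. Your updated directory structure will now look like this: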

.
├── agent.yaml
└── gateway.yaml

Next, we will configure the Routing Connector and the respective pipelines.

6.1 Configure the Routing Connector

In this exercise, you will configure the Routing Connector in gateway.yaml. The Routing Connector can route metrics, traces, and logs based on any attribute, but here we focus on trace routing based on the deployment.environment attribute (any span, log, or metric attribute could be used instead).

Exercise

Add new file exporters: The routing connector requires different targets to route to. In the Gateway terminal, create two new file exporters, file/traces/route1-regular and file/traces/route2-security, in the exporters section of gateway.yaml so that the data is directed correctly:

  file/traces/route1-regular:                     # Exporter for regular traces
    path: "./gateway-traces-route1-regular.out"   # Path for saving trace data
    append: false                                 # Overwrite the file each time
  file/traces/route2-security:                    # Exporter for security traces
    path: "./gateway-traces-route2-security.out"  # Path for saving trace data
    append: false                                 # Overwrite the file each time

Enable routing: Add the routing connector. In OpenTelemetry configuration files, connectors have their own dedicated section, similar to receivers and processors.

Find and uncomment the #connectors: section. Then add the following below the connectors: section:

  routing:
    default_pipelines: [traces/route1-regular]  # Default pipeline if no rule matches
    error_mode: ignore                          # Ignore errors in routing
    table:                                      # Define routing rules
      # Routes spans to a target pipeline if the resourceSpan attribute matches the rule
      - statement: route() where attributes["deployment.environment"] == "security-applications"
        pipelines: [traces/route2-security]     # Security target pipeline

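The default pipeline in the configuration file acts as a catch-all. It is the routing target for any data (spans, in this case) that does not match a rule in the routing table. The table defines the target pipeline for spans that match the attributes["deployment.environment"] == "security-applications" rule.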
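
The relationship between the routing table and the default pipeline can be expressed as a short Python sketch. This is a simplified model that returns the pipelines of the first matching rule; it is not the connector's implementation, and the `route` function and the lambda-based table are hypothetical.

```python
DEFAULT_PIPELINES = ["traces/route1-regular"]

# Each entry pairs a condition on the resource attributes with target pipelines
ROUTING_TABLE = [
    (
        lambda attrs: attrs.get("deployment.environment") == "security-applications",
        ["traces/route2-security"],
    ),
]


def route(resource_attributes):
    """Return the target pipelines for a span's resource attributes."""
    for matches, pipelines in ROUTING_TABLE:
        if matches(resource_attributes):
            return pipelines
    return DEFAULT_PIPELINES  # catch-all when no rule matches


print(route({"deployment.environment": "security-applications"}))
print(route({"deployment.environment": "production"}))
print(route({}))
```

Spans without a deployment.environment attribute fall through to the default pipeline in the same way as spans with a non-matching value.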

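With the routing configuration complete, the next step is to configure the pipelines that apply these routing rules.

6.2 Configuring the Pipelines

Exercise

Update the original traces pipeline to use routing:

To enable routing, update the original traces pipeline so that routing is its only exporter. This ensures that all span data is evaluated by the Routing Connector and forwarded to the connected pipelines. Also remove all processors and replace them with an empty array ([]); processing is now handled in the traces/route1-regular and traces/route2-security pipelines, which allows custom behavior for each route. Your traces: configuration should look like this: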
```
traces:                       # Traces pipeline
  receivers:
  - otlp                      # OTLP receiver
  processors: []              # Processors for traces
  exporters:
  - routing
```

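Add both the route1-regular and route2-security traces pipelines below the existing traces pipeline:

Configure the route1-regular pipeline: This pipeline handles all spans that have no match in the connector's routing table. Notice that it uses routing as its only receiver and receives data through its connection from the original traces pipeline.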

    traces/route1-regular:         # Default pipeline for unmatched spans
      receivers:
      - routing                    # Receive data from the routing connector
      processors:
      - memory_limiter             # Memory Limiter Processor
      - resource/add_mode          # Adds collector mode metadata
      - batch
      exporters:
      - debug                      # Debug Exporter
      - file/traces/route1-regular # File Exporter for unmatched spans

Add the route2-security pipeline: This pipeline processes all spans that match the attributes["deployment.environment"] == "security-applications" routing rule. It also uses routing as its receiver. Add this pipeline below traces/route1-regular.

    traces/route2-security:         # Pipeline for spans matching the security rule
      receivers:
      - routing                     # Receive data from the routing connector
      processors:
      - memory_limiter              # Memory Limiter Processor
      - resource/add_mode           # Adds collector mode metadata
      - batch
      exporters:
      - debug                       # Debug exporter
      - file/traces/route2-security # File exporter for security spans
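
A connector joins pipelines, so it has to appear as an exporter in one pipeline and as a receiver in another. The following Python sketch checks that wiring on a dictionary that mirrors the pipelines configured in this exercise. The `check_connectors` helper is hypothetical and is no substitute for validating the real file with the Collector or otelbin.io.

```python
def check_connectors(connectors, pipelines):
    """List connectors that are not used on both sides of a pipeline."""
    problems = []
    for name in connectors:
        exported = [p for p, cfg in pipelines.items() if name in cfg.get("exporters", [])]
        received = [p for p, cfg in pipelines.items() if name in cfg.get("receivers", [])]
        if not exported:
            problems.append(f"{name}: not used as an exporter in any pipeline")
        if not received:
            problems.append(f"{name}: not used as a receiver in any pipeline")
    return problems


pipelines = {
    "traces": {
        "receivers": ["otlp"],
        "processors": [],
        "exporters": ["routing"],
    },
    "traces/route1-regular": {
        "receivers": ["routing"],
        "processors": ["memory_limiter", "resource/add_mode", "batch"],
        "exporters": ["debug", "file/traces/route1-regular"],
    },
    "traces/route2-security": {
        "receivers": ["routing"],
        "processors": ["memory_limiter", "resource/add_mode", "batch"],
        "exporters": ["debug", "file/traces/route2-security"],
    },
}

print(check_connectors(["routing"], pipelines))
```

An empty list means the connector is wired on both sides; a missing receiver or exporter entry shows up as a readable message.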

Validate the Gateway configuration using otelbin.io. For reference, the traces: section of your pipelines will look like this:

%%{init:{"fontFamily":"monospace"}}%%
graph LR
    %% Nodes
      REC1(&nbsp;&nbsp;&nbsp;otlp&nbsp;&nbsp;&nbsp;<br>fa:fa-download):::receiver
      PRO1(memory_limiter<br>fa:fa-microchip):::processor
      PRO2(memory_limiter<br>fa:fa-microchip):::processor
      PRO3(resource<br>fa:fa-microchip<br>add_mode):::processor
      PRO4(resource<br>fa:fa-microchip<br>add_mode):::processor
      PRO5(batch<br>fa:fa-microchip):::processor
      PRO6(batch<br>fa:fa-microchip):::processor
      EXP1(&nbsp;&ensp;debug&nbsp;&ensp;<br>fa:fa-upload):::exporter
      EXP2(&emsp;&emsp;file&emsp;&emsp;<br>fa:fa-upload<br>traces):::exporter
      EXP3(&nbsp;&ensp;debug&nbsp;&ensp;<br>fa:fa-upload):::exporter
      EXP4(&emsp;&emsp;file&emsp;&emsp;<br>fa:fa-upload<br>traces):::exporter
      ROUTE1(&nbsp;routing&nbsp;<br>fa:fa-route):::con-export
      ROUTE2(&nbsp;routing&nbsp;<br>fa:fa-route):::con-receive
      ROUTE3(&nbsp;routing&nbsp;<br>fa:fa-route):::con-receive
    %% Links
    subID1:::sub-traces
    subID2:::sub-traces
    subID3:::sub-traces
    subgraph " "
    direction LR
      subgraph subID1[**Traces**]
      REC1 --> ROUTE1
      end
      subgraph subID2[**Traces/route2-security**]
      ROUTE1 --> ROUTE2
      ROUTE2 --> PRO1
      PRO1 --> PRO3
      PRO3 --> PRO5
      PRO5 --> EXP1
      PRO5 --> EXP2
      end
      subgraph subID3[**Traces/route1-regular**]
      ROUTE1 --> ROUTE3
      ROUTE3 --> PRO2
      PRO2 --> PRO4
      PRO4 --> PRO6
      PRO6 --> EXP3
      PRO6 --> EXP4
      end
    end
classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
classDef sub-traces stroke:#fbbf24,stroke-width:1px, color:#fbbf24,stroke-dasharray: 3 3;

6.3 Test Routing Connector

Exercise

In this section, we will test the routing rule configured for the Gateway. The expected result is that a span generated by loadgen that matches the attributes["deployment.environment"] == "security-applications" rule is sent to the gateway-traces-route2-security.out file.

Start the Gateway: In your Gateway terminal window, start the Gateway.

../otelcol --config gateway.yaml

Start the Agent: In your Agent terminal window, start the Agent.

../otelcol --config agent.yaml

Send a regular span: In the Loadgen terminal window, send a regular span using loadgen:

../loadgen -count 1

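Both the Agent and the Gateway will display debug information. The Gateway will also generate a new gateway-traces-route1-regular.out file, as this is now the designated destination for regular spans.

Tip

If you check gateway-traces-route1-regular.out, it will contain the span sent by loadgen. You will also see an empty gateway-traces-route2-security.out file, because the routing configuration creates output files immediately, even if no matching spans have been processed yet.

Send a security span: In the Loadgen terminal window, send a security span using the security flag: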

../loadgen -security -count 1

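Again, both the Agent and the Gateway should display debug information, including the span you just sent. This time, the Gateway writes a line to the gateway-traces-route2-security.out file, which is designated for spans whose deployment.environment resource attribute matches "security-applications".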

jq -c '.resourceSpans[] as $resource | $resource.scopeSpans[].spans[] | {spanId: .spanId, deploymentEnvironment: ($resource.resource.attributes[] | select(.key == "deployment.environment") | .value.stringValue)}' gateway-traces-route2-security.out

{"spanId":"cb799e92e26d5782","deploymentEnvironment":"security-applications"}

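You can repeat this scenario multiple times, and each trace will be written to its corresponding output file.

Important

Stop the Agent and Gateway processes by pressing Ctrl-C in their respective terminals.

Conclusion

In this section, we successfully tested the routing connector in the Gateway by sending different spans and verifying their destinations.

Regular spans were correctly routed to gateway-traces-route1-regular.out, confirming that spans without a matching deployment.environment attribute follow the default pipeline.
Security-related spans were routed to gateway-traces-route2-security.out, demonstrating that the routing rule based on "deployment.environment": "security-applications" works as expected.

By inspecting the output files, we confirmed that the OpenTelemetry Collector correctly evaluates span attributes and routes them to the appropriate destinations. This validates that routing rules can effectively separate and direct telemetry data for different use cases.

You can extend this approach by defining additional routing rules to further categorize spans, metrics, and logs based on different attributes.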

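7. Create metrics with Count Connector

10 minutes

In this section, we will explore how to use the Count Connector to extract attribute values from logs and convert them into meaningful metrics.

Specifically, we will use the Count Connector to track the number of "Star Wars" and "Lord of the Rings" quotes appearing in our logs, turning them into measurable data points.

Exercise

Important

Change ALL terminal windows to the 7-sum-count directory and run the clear command.

Copy *.yaml from the 6-routing-data directory into 7-sum-count. Your updated directory structure will now look like this: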

.
├── agent.yaml
└── gateway.yaml

Update agent.yaml to change how often logs are read. Find the filelog/quotes receiver in agent.yaml and add a poll_interval attribute:

  filelog/quotes:                      # Receiver Type/Name
    poll_interval: 10s                 # Only read every ten seconds

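The reason for the delay is that the Count Connector in the OpenTelemetry Collector counts logs only within each processing interval. This means that every time the data is read, the count is reset to zero for the next interval. With the default Filelog receiver interval of 200ms, every line written by loadgen is read on its own, giving a count of 1. Setting this interval ensures that there are multiple entries to count.

The Collector can maintain a running count for each read interval by omitting conditions, as shown below. However, it is best practice to leave running counts to the backend, which can track them over a longer period.

Exercise

Add the Count Connector

Add the Count Connector to the connectors section of your configuration and define the metric counters to use: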

connectors:
  count:
    logs:
      logs.full.count:
        description: "Running count of all logs read in interval"
      logs.sw.count:
        description: "StarWarsCount"
        conditions:
        - attributes["movie"] == "SW"
      logs.lotr.count:
        description: "LOTRCount"
        conditions:
        - attributes["movie"] == "LOTR"
      logs.error.count:
        description: "ErrorCount"
        conditions:
        - attributes["level"] == "ERROR"

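Details of the metric counters:
- logs.full.count: Tracks the total number of logs processed during each read interval. Since this metric has no filtering conditions, every log that passes through the system is included in the count.
- logs.sw.count: Counts logs that contain a quote from a Star Wars movie.
- logs.lotr.count: Counts logs that contain a quote from a Lord of the Rings movie.
- logs.error.count: Represents a real-world scenario by counting logs with a severity level of ERROR in the read interval.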
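
The four counters can be modeled as conditions evaluated against the attributes of every log record read in one interval. The Python sketch below is illustrative only: `count_batch` and the `COUNTERS` table are hypothetical, and leaving out counters with zero matches is an assumption made for this sketch.

```python
# None means "no condition": every log record is counted
COUNTERS = {
    "logs.full.count": None,
    "logs.sw.count": lambda attrs: attrs.get("movie") == "SW",
    "logs.lotr.count": lambda attrs: attrs.get("movie") == "LOTR",
    "logs.error.count": lambda attrs: attrs.get("level") == "ERROR",
}


def count_batch(log_attributes):
    """Count the log records of one read interval for each counter."""
    counts = {}
    for name, condition in COUNTERS.items():
        matched = sum(1 for attrs in log_attributes if condition is None or condition(attrs))
        if matched > 0:  # assumption: counters without matches are not emitted
            counts[name] = matched
    return counts


# One read interval containing four log records
interval = [
    {"movie": "SW", "level": "WARN"},
    {"movie": "LOTR", "level": "ERROR"},
    {"movie": "SW", "level": "DEBUG"},
    {"movie": "LOTR", "level": "ERROR"},
]
counts = count_batch(interval)
print(counts)
```

Each call starts from zero, which corresponds to the per-interval reset described earlier; summing the results over time is left to the backend.
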

Configure the Count Connector in the pipelines: In the pipeline configuration below, the connector exporter is added to the logs section, while the connector receiver is added to the metrics section.

  pipelines:
    traces:
      receivers:
      - otlp
      processors:
      - memory_limiter
      - attributes/update              # Update, hash, and remove attributes
      - redaction/redact               # Redact sensitive fields using regex
      - resourcedetection
      - resource/add_mode
      - batch
      exporters:
      - debug
      - file
      - otlphttp
    metrics:
      receivers:
      - count                           # Count Connector that receives count metric from logs count exporter in logs pipeline.
      - otlp
      #- hostmetrics                    # Host Metrics Receiver
      processors:
      - memory_limiter
      - resourcedetection
      - resource/add_mode
      - batch
      exporters:
      - debug
      - otlphttp
    logs:
      receivers:
      - otlp
      - filelog/quotes
      processors:
      - memory_limiter
      - resourcedetection
      - resource/add_mode
      - transform/logs                 # Transform logs processor
      - batch
      exporters:
      - count                          # Count Connector that exports count as a metric to metrics pipeline.
      - debug
      - otlphttp

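Logs are counted based on their attributes. If your log data is stored in the log body instead of attributes, you will need to use a Transform processor in your pipeline to extract key/value pairs and add them as attributes.

In this workshop, we already added merge_maps(attributes, cache, "upsert") in the 5-transform-data section. This ensures that all relevant data is included in the log attributes for processing.

Take care when selecting the fields to turn into attributes. Adding all fields indiscriminately is generally not ideal for production environments; choose only the fields that are truly necessary to avoid unnecessary data clutter.

Exercise

Validate the agent configuration using otelbin.io. For reference, the logs and metrics: sections of your pipelines will look like this: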

%%{init:{"fontFamily":"monospace"}}%%
graph LR
    %% Nodes
    REC1(otlp<br>fa:fa-download):::receiver
    REC2(filelog<br>fa:fa-download<br>quotes):::receiver
    REC3(otlp<br>fa:fa-download):::receiver
    PRO1(memory_limiter<br>fa:fa-microchip):::processor
    PRO2(memory_limiter<br>fa:fa-microchip):::processor
    PRO3(resource<br>fa:fa-microchip<br>add_mode):::processor
    PRO4(resource<br>fa:fa-microchip<br>add_mode):::processor
    PRO5(batch<br>fa:fa-microchip):::processor
    PRO6(batch<br>fa:fa-microchip):::processor
    PRO7(resourcedetection<br>fa:fa-microchip):::processor
    PRO8(resourcedetection<br>fa:fa-microchip):::processor
    PRO9(transform<br>fa:fa-microchip<br>logs):::processor
    EXP1(&nbsp;&ensp;debug&nbsp;&ensp;<br>fa:fa-upload):::exporter
    EXP2(&emsp;&emsp;otlphttp&emsp;&emsp;<br>fa:fa-upload):::exporter
    EXP3(&nbsp;&ensp;debug&nbsp;&ensp;<br>fa:fa-upload):::exporter
    EXP4(&emsp;&emsp;otlphttp&emsp;&emsp;<br>fa:fa-upload):::exporter
    ROUTE1(&nbsp;count&nbsp;<br>fa:fa-route):::con-export
    ROUTE2(&nbsp;count&nbsp;<br>fa:fa-route):::con-receive

    %% Links
    subID1:::sub-logs
    subID2:::sub-metrics
    subgraph " "
      direction LR
      subgraph subID1[**Logs**]
      direction LR
      REC1 --> PRO1
      REC2 --> PRO1
      PRO1 --> PRO7
      PRO7 --> PRO3
      PRO3 --> PRO9
      PRO9 --> PRO5
      PRO5 --> ROUTE1
      PRO5 --> EXP1
      PRO5 --> EXP2
      end

      subgraph subID2[**Metrics**]
      direction LR
      ROUTE1 --> ROUTE2
      ROUTE2 --> PRO2
      REC3 --> PRO2
      PRO2 --> PRO8
      PRO8 --> PRO4
      PRO4 --> PRO6
      PRO6 --> EXP3
      PRO6 --> EXP4
      end
    end
classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
classDef sub-logs stroke:#34d399,stroke-width:1px, color:#34d399,stroke-dasharray: 3 3;
classDef sub-metrics stroke:#38bdf8,stroke-width:1px, color:#38bdf8,stroke-dasharray: 3 3;

7.1 Test Count Connector

Exercise

Start the Gateway: In the Gateway terminal window, run:

../otelcol --config=gateway.yaml

Start the Agent: In the Agent terminal window, run:

../otelcol --config=agent.yaml

Send 12 log lines with Loadgen: In the Loadgen terminal window, send 12 log lines. They should be read in two intervals. Run the following loadgen command:

../loadgen -logs -json -count 12

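Both the Agent and the Gateway will display debug information, showing that they are processing data. Wait until the loadgen completes.

Verify that metrics have been generated: As the logs are processed, the Agent generates metrics and forwards them to the Gateway, which then writes them to gateway-metrics.out.

To check whether the metrics logs.full.count, logs.sw.count, logs.lotr.count, and logs.error.count are present in the output, run the following jq query: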

jq '.resourceMetrics[].scopeMetrics[].metrics[]
    | select(.name == "logs.full.count" or .name == "logs.sw.count" or .name == "logs.lotr.count" or .name == "logs.error.count")
    | {name: .name, value: (.sum.dataPoints[0].asInt // "-")}' gateway-metrics.out

{
  "name": "logs.sw.count",
  "value": "2"
}
{
  "name": "logs.lotr.count",
  "value": "2"
}
{
  "name": "logs.full.count",
  "value": "4"
}
{
  "name": "logs.error.count",
  "value": "2"
}
{
  "name": "logs.error.count",
  "value": "1"
}
{
  "name": "logs.sw.count",
  "value": "2"
}
{
  "name": "logs.lotr.count",
  "value": "6"
}
{
  "name": "logs.full.count",
  "value": "8"
}

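Tip

Note: logs.full.count normally equals logs.sw.count + logs.lotr.count, while logs.error.count will be a random number.

Important

Stop the Agent and Gateway processes by pressing Ctrl-C in their respective terminals.

7.2 Create metrics with Sum Connector

10 minutes

In this section, we will explore how the Sum Connector can extract values from spans and convert them into metrics.

Specifically, we will take the credit card charges from our base spans and use the Sum Connector to retrieve the total charges as a metric.

The connector can be used to collect (sum) attribute values from spans, span events, metrics, data points, and log records. It captures each individual value, transforms it into a metric, and passes it along. However, it is the backend's job to use these metrics and their attributes for calculations and further processing.

Exercise

Switch to your Agent terminal window and open the agent.yaml file in your editor.

Add the Sum Connector: Add the Sum Connector to the connectors section of your configuration and define the metric counters: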

  sum:
    spans:
      user.card-charge:
        source_attribute: payment.amount
        conditions:
          - attributes["payment.amount"] != "NULL"
        attributes:
          - key: user.name

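In the example above, we check for the payment.amount attribute in spans. If it has a valid value, the Sum connector generates a metric called user.card-charge and includes user.name as an attribute. This enables the backend to track and display a user's total charges over an extended period, such as a billing cycle.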
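
A rough Python model of this behaviour follows. It is a sketch under assumptions, not the connector's code: the `sum_batch` helper is hypothetical, spans are represented by their attribute dictionaries, and values are grouped by user.name within a single batch only (how the real connector groups values across batches is not modeled).

```python
from collections import defaultdict


def sum_batch(spans, source_attribute="payment.amount", group_by="user.name"):
    """Sum the source attribute per user for one batch of spans."""
    totals = defaultdict(float)
    for attrs in spans:
        value = attrs.get(source_attribute)
        if value is None:
            continue  # spans without a payment amount produce no data point
        totals[attrs.get(group_by)] += float(value)
    return [
        {"name": "user.card-charge", "attributes": {group_by: user}, "value": round(total, 2)}
        for user, total in totals.items()
    ]


batch = [
    {"user.name": "Luke Skywalker", "payment.amount": 51.37},
    {"user.name": "Luke Skywalker", "payment.amount": 65.56},
    {"user.name": "Peter Jackson", "payment.amount": 94.39},
    {"user.name": "Frodo Baggins"},  # no payment.amount attribute
]
data_points = sum_batch(batch)
for point in data_points:
    print(point)
```

Keeping user.name on every data point is what later allows a backend to add up the charges per user over a longer period.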

In the pipeline configuration below, the connector exporter is added to the traces section, while the connector receiver is added to the metrics section.

Exercise

Configure the Sum Connector in the pipelines

  pipelines:
    traces:
      receivers:
      - otlp
      processors:
      - memory_limiter
      - attributes/update              # Update, hash, and remove attributes
      - redaction/redact               # Redact sensitive fields using regex
      - resourcedetection
      - resource/add_mode
      - batch
      exporters:
      - debug
      - file
      - otlphttp
      - sum                            # Sum connector which aggregates payment.amount from spans and sends to metrics pipeline
    metrics:
      receivers:
      - sum                            # Receives metrics from the sum exporter in the traces pipeline
      - count                          # Receives count metric from logs count exporter in logs pipeline.
      - otlp
      #- hostmetrics                   # Host Metrics Receiver
      processors:
      - memory_limiter
      - resourcedetection
      - resource/add_mode
      - batch
      exporters:
      - debug
      - otlphttp
    logs:
      receivers:
      - otlp
      - filelog/quotes
      processors:
      - memory_limiter
      - resourcedetection
      - resource/add_mode
      - transform/logs                 # Transform logs processor
      - batch
      exporters:
      - count                          # Count Connector that exports count as a metric to metrics pipeline.
      - debug
      - otlphttp

Validate the agent configuration using otelbin.io. For reference, the traces and metrics: sections of your pipelines will look like this:

%%{init:{"fontFamily":"monospace"}}%%
graph LR
    %% Nodes
    REC1(otlp<br>fa:fa-download<br> ):::receiver
    REC3(otlp<br>fa:fa-download<br> ):::receiver
    PRO1(memory_limiter<br>fa:fa-microchip<br> ):::processor
    PRO2(memory_limiter<br>fa:fa-microchip<br> ):::processor
    PRO3(resource<br>fa:fa-microchip<br>add_mode):::processor
    PRO4(resource<br>fa:fa-microchip<br>add_mode):::processor
    PRO5(batch<br>fa:fa-microchip<br> ):::processor
    PRO6(batch<br>fa:fa-microchip<br> ):::processor
    PRO7(resourcedetection<br>fa:fa-microchip<br> ):::processor
    PRO8(resourcedetection<br>fa:fa-microchip<br>):::processor

    PROA(attributes<br>fa:fa-microchip<br>update):::processor
    PROB(redaction<br>fa:fa-microchip<br>redact):::processor
    EXP1(&nbsp;&ensp;debug&nbsp;&ensp;<br>fa:fa-upload<br> ):::exporter
    EXP2(&emsp;&emsp;file&emsp;&emsp;<br>fa:fa-upload<br> ):::exporter
    EXP3(&nbsp;&ensp;debug&nbsp;&ensp;<br>fa:fa-upload<br> ):::exporter
    EXP4(&emsp;&emsp;otlphttp&emsp;&emsp;<br>fa:fa-upload<br> ):::exporter
    EXP5(&emsp;&emsp;otlphttp&emsp;&emsp;<br>fa:fa-upload<br> ):::exporter
    ROUTE1(&nbsp;sum&nbsp;<br>fa:fa-route<br> ):::con-export
    ROUTE2(&nbsp;count&nbsp;<br>fa:fa-route<br> ):::con-receive
    ROUTE3(&nbsp;sum&nbsp;<br>fa:fa-route<br> ):::con-receive

    %% Links
    subID1:::sub-traces
    subID2:::sub-metrics
    subgraph " "
      direction LR
      subgraph subID1[**Traces**]
      direction LR
      REC1 --> PRO1
      PRO1 --> PROA
      PROA --> PROB
      PROB --> PRO7
      PRO7 --> PRO3
      PRO3 --> PRO5
      PRO5 --> EXP1
      PRO5 --> EXP2
      PRO5 --> EXP5
      PRO5 --> ROUTE1
      end

      subgraph subID2[**Metrics**]
      direction LR
      ROUTE1 --> ROUTE3
      ROUTE3 --> PRO2
      ROUTE2 --> PRO2
      REC3 --> PRO2
      PRO2 --> PRO8
      PRO8 --> PRO4
      PRO4 --> PRO6
      PRO6 --> EXP3
      PRO6 --> EXP4
      end
    end
classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
classDef sub-logs stroke:#34d399,stroke-width:1px, color:#34d399,stroke-dasharray: 3 3;
classDef sub-traces stroke:#fbbf24,stroke-width:1px, color:#fbbf24,stroke-dasharray: 3 3;
classDef sub-metrics stroke:#38bdf8,stroke-width:1px, color:#38bdf8,stroke-dasharray: 3 3;

7.3 Test Sum Connector

Exercise

Start the Gateway: In the Gateway terminal window, run:

../otelcol --config=gateway.yaml

Start the Agent: In the Agent terminal window, run:

../otelcol --config=agent.yaml

Start the Loadgen: In the Loadgen terminal window, send 8 spans with the following loadgen command:

../loadgen -count 8

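Both the Agent and the Gateway will display debug information, showing that they are processing data. Wait until the loadgen completes.

Verify the metrics: While processing the spans, the Agent generates metrics and passes them on to the Gateway. The Gateway writes them to gateway-metrics.out.

To confirm that user.card-charge is present in the metrics output and that each data point has a user.name attribute, run the following jq query: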

jq -r '.resourceMetrics[].scopeMetrics[].metrics[] | select(.name == "user.card-charge") | .sum.dataPoints[] | "\(.attributes[] | select(.key == "user.name").value.stringValue)\t\(.asDouble)"' gateway-metrics.out | while IFS=$'\t' read -r name charge; do
    printf "%-20s %s\n" "$name" "$charge"
done

George Lucas         67.49
Frodo Baggins        87.14
Thorin Oakenshield   90.98
Luke Skywalker       51.37
Luke Skywalker       65.56
Thorin Oakenshield   67.5
Thorin Oakenshield   66.66
Peter Jackson        94.39

Important

Press Ctrl-C in each terminal to stop the Agent and Gateway processes.
