4. Building In Resilience
10 minutes
The OpenTelemetry Collector’s FileStorage Extension enhances the resilience of your telemetry pipeline by providing reliable checkpointing, managing retries, and handling temporary failures effectively.
With this extension enabled, the OpenTelemetry Collector can store intermediate states on disk, preventing data loss during network disruptions and allowing it to resume operations seamlessly.
Note
This solution will work for metrics as long as the connection downtime is brief (up to 15 minutes). If the downtime exceeds this, Splunk Observability Cloud will drop data because the datapoints arrive out of order.
For logs, there are plans to implement a more enterprise-ready solution in one of the upcoming Splunk OpenTelemetry Collector releases.
Exercise
- Inside the [WORKSHOP] directory, create a new subdirectory named 4-resilience.
- Next, copy all contents from the 3-filelog directory into 4-resilience.
- After copying, remove any *.out and *.log files.
- Change all terminal windows to the [WORKSHOP]/4-resilience directory.
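The steps above can be sketched as a small shell function. This is a minimal sketch, assuming a POSIX-style shell (bash or zsh) on Linux or macOS; the function name setup_resilience_dir and its argument (the path to your [WORKSHOP] directory) are hypothetical and not part of the workshop material. On Windows, the equivalent PowerShell commands would be needed instead.

```shell
# Hypothetical helper: prepares 4-resilience from 3-filelog.
# $1 is the path to your [WORKSHOP] directory (an assumption of this sketch).
setup_resilience_dir() {
  workshop_dir="$1"

  # Create the new subdirectory for this section.
  mkdir -p "$workshop_dir/4-resilience" || return 1

  # Copy all contents of 3-filelog (including hidden files) into 4-resilience.
  cp -R "$workshop_dir/3-filelog/." "$workshop_dir/4-resilience/" || return 1

  # Remove leftover *.out and *.log files from the previous section.
  find "$workshop_dir/4-resilience" -maxdepth 1 -type f \
    \( -name '*.out' -o -name '*.log' \) -delete || return 1

  # Switch the current terminal to the new directory.
  cd "$workshop_dir/4-resilience" || return 1
}
```

After defining the function, call it with the path to your workshop directory, for example setup_resilience_dir "$HOME/workshop" (the path here is only an illustration). The cd only affects the terminal in which the function runs, so the other terminal windows still need to be changed to the new directory by hand.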
Your updated directory structure will now look like this:
WORKSHOP
├── 1-agent
├── 2-gateway
├── 3-filelog
├── 4-resilience
│ ├── agent.yaml
│ ├── gateway.yaml
│ ├── log-gen.sh (or .ps1)
│ └── trace.json
└── otelcol