7. Transform Data

10 minutes  

The Transform Processor lets you modify telemetry data—logs, metrics, and traces—as it flows through the pipeline. Using the OpenTelemetry Transformation Language (OTTL), you can filter, enrich, and transform data on the fly without touching your application code.

In this exercise we’ll update agent.yaml to include a Transform Processor that will:

  • Filter log resource attributes.
  • Parse JSON structured log data into attributes.
  • Set log severity levels based on the log message body.

You may have noticed that in previous logs, fields like SeverityText and SeverityNumber were undefined (this is typical of the filelog receiver). However, the severity is embedded within the log body:

<snip>
LogRecord #0
ObservedTimestamp: 2025-01-31 21:49:29.924017 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText: 
SeverityNumber: Unspecified(0)
Body: Str(2025-01-31 15:49:29 [WARN] - Do or do not, there is no try.)
</snip>

Logs often contain structured data encoded as JSON within the log body. Extracting these fields into attributes allows for better indexing, filtering, and querying. Instead of manually parsing JSON in downstream systems, OTTL enables automatic transformation at the telemetry pipeline level.

Exercise
  • Inside the [WORKSHOP] directory, create a new subdirectory named 7-transform.
  • Next, copy all contents from the 6-sensitve-data directory into 7-transform.
  • After copying, remove any *.out and *.log files.
  • Change all terminal windows to the [WORKSHOP]/7-transform directory.

Your updated directory structure will now look like this:

WORKSHOP
├── 1-agent
├── 2-gateway
├── 3-filelog
├── 4-resilience
├── 5-dropping-spans
├── 6-sensitive-data
├── 7-transform-data
│   ├───checkpoint-dir
│   ├── agent.yaml
│   ├── gateway.yaml
│   ├── health.json
│   ├── log-gen.sh (or .ps1)
│   └── trace.json
└── otelcol
Last Modified Feb 7, 2025

Subsections of 7. Transform Data

7.1 Transform Configuration

Exercise

Switch to your Agent terminal window. Navigate to the [WORKSHOP]/7-transform-data directory and open the agent.yaml file in your editor.

Configure the transform processor and name it /logs: By using the -context: resource key we are targeting the resourceLog attributes of logs.

This configuration ensures that only the relevant resource attributes (com.splunk.sourcetype, host.name, otelcol.service.mode) are retained, improving log efficiency and reducing unnecessary metadata.

  transform/logs:                     # Processor Type/Name
    log_statements:                   # Log Processing Statements
      - context: resource             # Log Context
        statements:                   # List of attribute keys to keep
          - keep_keys(attributes, ["com.splunk.sourcetype", "host.name", "otelcol.service.mode"])

Adding a Context Block for Log Severity Mapping: To properly set the severity_text and severity_number fields of a log record, we add another log context block within log_statements.

This configuration extracts the level value from the log body, maps it to severity_text, and assigns the appropriate severity_number:

      - context: log                  # Log Context
        statements:                   # Transform Statements Array
          - set(cache, ParseJSON(body)) where IsMatch(body, "^\\{")
          - flatten(cache, "")        
          - merge_maps(attributes, cache, "upsert")
          - set(severity_text, attributes["level"])
          - set(severity_number, 1) where severity_text == "TRACE"
          - set(severity_number, 5) where severity_text == "DEBUG"
          - set(severity_number, 9) where severity_text == "INFO"
          - set(severity_number, 13) where severity_text == "WARN"
          - set(severity_number, 17) where severity_text == "ERROR"
          - set(severity_number, 21) where severity_text == "FATAL"

Summary of Key Transformations:

  • Parse JSON: Extracts structured data from the log body.
  • Flatten JSON: Converts nested JSON objects into a flat structure.
  • Merge Attributes: Integrates extracted data into log attributes.
  • Map Severity Text: Assigns severity_text from the log’s level attribute.
  • Assign Severity Numbers: Converts severity levels into standardized numerical values.

You should have a single transform processor containing two context blocks: one for resource and one for log.

This configuration ensures that log severity is correctly extracted, standardized, and structured for efficient processing.

Tip

This method of mapping all JSON fields to top-level attributes should only be used for testing and debugging OTTL. It will result in high cardinality in a production scenario.

Update the logs pipeline: Add the transform/logs: processor into the logs: pipeline:

    logs:
      receivers:
      - otlp                     # OTLP Receiver
      - filelog/quotes           # Filelog Receiver reading quotes.log
      processors:
      - memory_limiter           # Memory Limiter Processor
      - resourcedetection        # Adds system attributes to the data
      - resource/add_mode        # Adds collector mode metadata
      - transform/logs           # Transform Processor to update log lines
      - batch                    # Batch Processor, groups data before send

Validate the agent configuration using otelbin.io. For reference, the logs: section of your pipelines will look similar to this:

%%{init:{"fontFamily":"monospace"}}%%
graph LR
    %% Nodes
      REC1(&nbsp;&nbsp;otlp&nbsp;&nbsp;<br>fa:fa-download):::receiver
      REC2(filelog<br>fa:fa-download<br>quotes):::receiver
      PRO1(memory_limiter<br>fa:fa-microchip):::processor
      PRO2(resourcedetection<br>fa:fa-microchip):::processor
      PRO3(resource<br>fa:fa-microchip<br>add_mode):::processor
      PRO4(transform<br>fa:fa-microchip<br>logs):::processor
      PRO5(batch<br>fa:fa-microchip):::processor
      EXP1(otlphttp<br>fa:fa-upload):::exporter
      EXP2(&ensp;&ensp;debug&ensp;&ensp;<br>fa:fa-upload):::exporter
    %% Links
    subID1:::sub-logs
    subgraph " "
      subgraph subID1[**Logs**]
      direction LR
      REC1 --> PRO1
      REC2 --> PRO1
      PRO1 --> PRO2
      PRO2 --> PRO3
      PRO3 --> PRO4
      PRO4 --> PRO5
      PRO5 --> EXP2
      PRO5 --> EXP1
      end
    end
classDef receiver,exporter fill:#8b5cf6,stroke:#333,stroke-width:1px,color:#fff;
classDef processor fill:#6366f1,stroke:#333,stroke-width:1px,color:#fff;
classDef con-receive,con-export fill:#45c175,stroke:#333,stroke-width:1px,color:#fff;
classDef sub-logs stroke:#34d399,stroke-width:1px, color:#34d399,stroke-dasharray: 3 3;
Last Modified Feb 13, 2025

7.2 Setup Environment

Exercise

Start the Log Generator: In the Test terminal window, navigate to the [WORKSHOP]/7-transform-data directory and start the appropriate log-gen script for your system. We want to work with structured JSON logs, so add the -json flag.

./log-gen.sh -json

The script will begin writing lines to a file named ./quotes.log, while displaying a single line of output in the console.

Writing logs to quotes.log. Press Ctrl+C to stop.

Start the Gateway: In the Gateway terminal window navigate to the [WORKSHOP]/7-transform-data directory and run:

../otelcol --config=gateway.yaml

Start the Agent: In the Agent terminal window navigate to the [WORKSHOP]/7-transform-data directory and run:

../otelcol --config=agent.yaml
Last Modified Feb 7, 2025

7.3 Test Transform Processor

This test verifies that the com.splunk/source and os.type metadata have been removed from the log resource attributes before being exported by the Agent. Additionally, the test ensures that:

  1. The log body is parsed to extract severity information.
    • SeverityText and SeverityNumber are set on the LogRecord.
  2. JSON fields from the log body are promoted to log attributes.

This ensures proper metadata filtering, severity mapping, and structured log enrichment before export.

Exercise

Check the debug output: For both the Agent and Gateway confirm that com.splunk/source and os.type have been removed:

  Resource attributes:
   -> com.splunk.sourcetype: Str(quotes)
   -> host.name: Str(YOUR_HOST_NAME)
   -> otelcol.service.mode: Str(agent)
  Resource attributes:
   -> com.splunk.sourcetype: Str(quotes)
   -> com.splunk/source: Str(./quotes.log)
   -> host.name: Str(YOUR_HOST_NAME)
   -> os.type: Str(YOUR_OS)
   -> otelcol.service.mode: Str(agent)

Check the debug output: For both the Agent and Gateway confirm that SeverityText and SeverityNumber in the LogRecord is now defined with the severity level from the log body. Confirm that the JSON fields from the body can be accessed as top-level log Attributes:

LogRecord #0
ObservedTimestamp: 2025-01-31 21:49:29.924017 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText: WARN
SeverityNumber: Warn(13)
Body: Str(2025-01-31 15:49:29 [WARN] - Do or do not, there is no try.)
Attributes:
    -> log.file.path: Str(quotes.log)
    -> timestamp: Str(2025-01-31 15:49:29)
    -> level: Str(WARN)
    -> message: Str(Do or do not, there is no try.)
Trace ID:
Span ID:
Flags: 0
  {"kind": "exporter", "data_type": "logs", "name": "debug"}
LogRecord #0
ObservedTimestamp: 2025-01-31 21:49:29.924017 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText: 
SeverityNumber: Unspecified(0)
Body: Str(2025-01-31 15:49:29 [WARN] - Do or do not, there is no try.)
Attributes:
    -> log.file.path: Str(quotes.log)
Trace ID:
Span ID:
Flags: 0
  {"kind": "exporter", "data_type": "logs", "name": "debug"}

Check file output: In the new gateway-logs.out file verify the data has been transformed:

      "resource": {
        "attributes": [
          {
            "key": "com.splunk.sourcetype",
            "value": {
              "stringValue": "quotes"
            }
          },
          {
            "key": "host.name",
            "value": {
              "stringValue": "YOUR_HOST_NAME"
            }
          },
          {
            "key": "otelcol.service.mode",
            "value": {
              "stringValue": "agent"
            }
          }
        ]
      },
      "scopeLogs": [
        {
          "scope": {},
          "logRecords": [
            {
              "observedTimeUnixNano": "1738360169924017000",
              "severityText": "WARN",
              "body": {
                "stringValue": "2025-01-31 15:49:29 [WARN] - Do or do not, there is no try."
              },
              "attributes": [
                {
                  "key": "log.file.path",
                  "value": {
                    "stringValue": "quotes.log"
                  }
                },
                {
                  "key": "timestamp",
                  "value": {
                    "stringValue": "2025-01-31 15:49:29"
                  }
                },
                {
                  "key": "level",
                  "value": {
                    "stringValue": "WARN"
                  }
                },
                {
                  "key": "message",
                  "value": {
                    "stringValue": "Do or do not, there is no try."
                  }
                }
              ],
              "traceId": "",
              "spanId": ""
            }
          ]
        }
      ]
      "resource": {
        "attributes": [
          {
            "key": "com.splunk.sourcetype",
            "value": {
              "stringValue": "quotes"
            }
          },
          {
            "key": "com.splunk.source",
            "value": {
              "stringValue": "./quotes.log"
            }
          },
          {
            "key": "host.name",
            "value": {
              "stringValue": "YOUR_HOST_NAME"
            }
          },
          {
            "key": "os.type",
            "value": {
              "stringValue": "YOUR_OS"
            }
          },
          {
            "key": "otelcol.service.mode",
            "value": {
              "stringValue": "agent"
            }
          }
        ]
      },
      "scopeLogs": [
        {
          "scope": {},
          "logRecords": [
            {
              "observedTimeUnixNano": "1738349801265812000",
              "body": {
                "stringValue": "2025-01-31 12:56:41 [INFO] - There is some good in this world, and it's worth fighting for."
              },
              "attributes": [
                {
                  "key": "log.file.path",
                  "value": {
                    "stringValue": "quotes.log"
                  }
                }
              ],
              "traceId": "",
              "spanId": ""
            }
          ]
        }
      ]