UNIX and Linux df logs: Reduce log size and convert to TSV format

Disclaimer

By using SPL2 templates for data processing (the “templates”), you understand and agree that templates are provided “as is”. Splunk disclaims any and all warranties, express or implied, including without limitation the implied warranties of merchantability, fitness for a particular purpose and warranties arising out of course of dealing or usage of trade or by statute or in law. Splunk specifically does not warrant that templates will meet your requirements, the operation or output of templates will be error-free, accurate, reliable, complete or uninterrupted.

Use case

Reduce the size of Unix and Linux df logs by removing unnecessary fields and converting the logs to TSV format, while maintaining compatibility with the Splunk Common Information Model (CIM).

Version 0.4.1

Version 0.4.1 of the UNIX and Linux df logs: Reduce log size and convert to TSV format template was released on March 19, 2025.

Template details

Compatibility

This template is compatible with Splunk Add-on for Unix and Linux v9.2.0 and v10.0.0.

Template description

This is a sample pipeline that reduces the size of df logs coming from the Splunk Add-on for Unix and Linux while preserving compatibility with the Splunk Common Information Model (CIM). This pipeline processes data with the df source type and performs the following:

Removes unnecessary data from the logs.
Converts the logs to TSV format.
Optionally updates the values of the source and sourcetype event fields.

Supported sourcetype

This pipeline works only on complete df events that include the header row and all subsequent rows.
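The completeness requirement can be illustrated with a small Python sketch. It is not part of the template: the function name `is_complete_df_event`, the tab separator, and the assumption that the header row names a `Used` column are illustrative choices, and the sample events are hypothetical.

```python
def is_complete_df_event(raw: str, sep: str = "\t") -> bool:
    """Hypothetical check: does the event hold a header row plus data rows?"""
    # Split into non-empty rows, then split each row into columns.
    rows = [line.split(sep) for line in raw.split("\n") if line.strip()]
    if len(rows) < 2:
        return False  # a lone header row or a lone data row is incomplete
    header = rows[0]
    if "Used" not in header:
        return False  # assumption: the first row is the header and names "Used"
    # Every data row should have one value per header column.
    return all(len(row) == len(header) for row in rows[1:])


complete = "Filesystem\tSize\tUsed\tAvail\n/dev/sda1\t50G\t20G\t30G"
fragment = "/dev/sda1\t50G\t20G\t30G"  # data row without its header row
print(is_complete_df_event(complete))
print(is_complete_df_event(fragment))
```

A fragment such as the second sample lacks the header row, so a header-driven pipeline has no way to locate the columns it needs to process.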

Template outline

The template consists of several custom functions followed by a pipeline that uses these functions.

Functions

The following table lists all functions and their configuration options.

Function name	Description	Configuration options
`extract_split_rows`	Splits the logs into rows using the newline character (`\n`) as the delimiter and further splits the rows into columns using the tab character (`\t`) or a custom delimiter.	`$row_split_separator`: The delimiter for splitting rows into columns (default: `\t`).
`find_header_indexes`	Assigns an index number to each field in the header row. Other functions in this pipeline use these indexes to access the log contents.	No configuration options.
`rows_to_delimiter_separated`	Transforms the logs from JSON array format to TSV format.	`$delimiter`: The delimiter to use between columns (default: `\t`).
`remove_used_column`	Removes the “Used” column from the final output.	No configuration options.
`apply_transformations`	Groups the previously defined custom functions together to apply transformations to the logs.	No configuration options.
`update_source_and_source_type`	Appends a suffix to the values in the `source` and `sourcetype` fields, if those fields exist in the event.	`$suffix`: The suffix to append to the `source` and `sourcetype` values.
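To make the first two functions more concrete, here is a Python sketch of what they are described as doing. It is an analogy, not the template's SPL2 code: the parameter name `row_split_separator` mirrors `$row_split_separator`, and the sample event and header names are hypothetical.

```python
def extract_split_rows(raw: str, row_split_separator: str = "\t") -> list[list[str]]:
    # Split the event into rows on "\n", then split each row into columns.
    return [row.split(row_split_separator) for row in raw.split("\n") if row]


def find_header_indexes(rows: list[list[str]]) -> dict[str, int]:
    # Treat the first row as the header and map each field name to its index.
    return {name: index for index, name in enumerate(rows[0])}


raw = "Filesystem\tSize\tUsed\tAvail\n/dev/sda1\t50G\t20G\t30G"
rows = extract_split_rows(raw)
indexes = find_header_indexes(rows)
print(rows[1][indexes["Avail"]])
```

Looking up a column by header name rather than by a fixed position keeps the later steps working even if the column order in the header row changes.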

Pipeline

The pipeline has the following outline:

Splits the logs into rows and columns using the extract_split_rows function.
Assigns indexes to the log fields using the find_header_indexes function.
Removes the “Used” column using the remove_used_column function.
Transforms the logs into TSV format using the rows_to_delimiter_separated function.
Optionally updates the source and sourcetype fields using the update_source_and_source_type function.
Sends the transformed logs to the destination.
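The first four steps of this outline can be modeled end to end in Python. This is a hedged sketch, not the template's implementation: the function names echo the template's functions for readability, `process_df_event` is a hypothetical wrapper, the sample event is hypothetical, and the source/sourcetype update and the send-to-destination steps are left out.

```python
def remove_used_column(rows: list[list[str]], indexes: dict[str, int]) -> list[list[str]]:
    # Drop the "Used" column from every row, if the header has one.
    used = indexes.get("Used")
    if used is None:
        return rows
    return [row[:used] + row[used + 1:] for row in rows]


def rows_to_delimiter_separated(rows: list[list[str]], delimiter: str = "\t") -> str:
    # Join columns with the delimiter and rows with newlines.
    return "\n".join(delimiter.join(row) for row in rows)


def process_df_event(raw: str, row_split_separator: str = "\t", delimiter: str = "\t") -> str:
    # 1. Split into rows and columns (cf. extract_split_rows).
    rows = [row.split(row_split_separator) for row in raw.split("\n") if row]
    # 2. Index the header fields (cf. find_header_indexes).
    indexes = {name: i for i, name in enumerate(rows[0])}
    # 3. Remove the "Used" column (cf. remove_used_column).
    rows = remove_used_column(rows, indexes)
    # 4. Serialize to delimiter-separated text (cf. rows_to_delimiter_separated).
    return rows_to_delimiter_separated(rows, delimiter)


raw = "Filesystem\tSize\tUsed\tAvail\n/dev/sda1\t50G\t20G\t30G"
print(process_df_event(raw))
```

Removing the column before serializing means the dropped values never reach the output text at all.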

Configuration instructions

For significant license savings, use the remove_used_column function to remove the unnecessary “Used” column. If you do not want to apply this function, comment it out by prefixing it with //. Commenting it out lets you retain data that the function would otherwise remove.
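Assuming ingest-based licensing, where savings track the number of bytes indexed, the effect of keeping or disabling the column removal can be measured on a sample event. In the sketch below, the hypothetical `remove_used` flag stands in for commenting the remove_used_column step in or out; the sample event is hypothetical as well.

```python
def apply_transformations(raw: str, remove_used: bool = True) -> str:
    # Hypothetical Python stand-in for the template's apply_transformations.
    rows = [row.split("\t") for row in raw.split("\n") if row]
    if remove_used and "Used" in rows[0]:
        # Models leaving remove_used_column active in the pipeline.
        used = rows[0].index("Used")
        rows = [row[:used] + row[used + 1:] for row in rows]
    return "\n".join("\t".join(row) for row in rows)


raw = "Filesystem\tSize\tUsed\tAvail\n/dev/sda1\t50G\t20G\t30G"
reduced = apply_transformations(raw)
kept = apply_transformations(raw, remove_used=False)
print(len(kept), len(reduced))
```

On real df events with many filesystems, the savings scale with the number of data rows, because each row loses one value and one delimiter.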

Optionally, use the update_source_and_source_type function to append a suffix to the values of the source and sourcetype fields. The suffix helps you distinguish modified logs from unmodified ones.
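The suffix behavior described for update_source_and_source_type can be sketched in Python as follows. The event is modeled as a plain dictionary, and the `:reduced` suffix and field values are hypothetical examples rather than template defaults.

```python
def update_source_and_source_type(event: dict[str, str], suffix: str) -> dict[str, str]:
    # Return a copy so the original event is left untouched.
    updated = dict(event)
    for field in ("source", "sourcetype"):
        # Only fields that already exist in the event are modified.
        if field in updated:
            updated[field] = updated[field] + suffix
    return updated


event = {"_raw": "Filesystem\tSize\tAvail", "source": "df", "sourcetype": "df"}
print(update_source_and_source_type(event, ":reduced"))
```

Events that lack a source or sourcetype field pass through unchanged, which matches the "if those fields exist in the event" condition in the function table.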

Notes

Ensure that the df source type is properly configured in your Splunk deployment.
Customize the pipeline as needed to meet your specific requirements.
If using the update_source_and_source_type function, ensure that the suffixed source type is configured in your Splunk platform deployment with appropriate time extraction and line-breaking settings.