UNIX and Linux cpu logs: Reduce log size and convert to TSV format

Disclaimer

By using SPL2 templates for data processing (the “templates”), you understand and agree that templates are provided “as is”. Splunk disclaims any and all warranties, express or implied, including without limitation the implied warranties of merchantability, fitness for a particular purpose and warranties arising out of course of dealing or usage of trade or by statute or in law. Splunk specifically does not warrant that templates will meet your requirements, the operation or output of templates will be error-free, accurate, reliable, complete or uninterrupted.

Use case

Reduce the size of Unix and Linux CPU logs by removing unnecessary fields and converting logs into a tab-separated values (TSV) format while maintaining compatibility with the Splunk Common Information Model (CIM).

Version 0.4.2

Version 0.4.2 of the UNIX and Linux cpu logs: Reduce log size and convert to TSV format template was released on March 19, 2025.


Template details

Compatibility

This template is compatible with Splunk Add-on for Unix and Linux v9.2.0 and v10.0.0.

Template description

This is a sample pipeline that reduces the size of cpu logs coming from the Splunk Add-on for Unix and Linux while preserving compatibility with the Splunk Common Information Model (CIM). This pipeline takes data that has the cpu source type and then does the following (a Python sketch of the delimiter replacement and trailing-zero removal follows the list):

  1. Replaces the repeated space characters used as delimiters in the logs with a different string. By default, a tab character (\t) is used.
  2. Removes unnecessary data from the logs.
  3. Converts the logs into tab-separated values (TSV) format.
  4. Optionally updates the values of the source and sourcetype event fields.
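To make this concrete, the following Python sketch shows what the delimiter replacement and trailing-zero removal do to a representative cpu event. The sample event and the regular expressions are illustrative assumptions, not the template's actual SPL2 code:

```python
import re

# Representative cpu event from the Splunk Add-on for Unix and Linux
# (illustrative sample; the exact field set varies by platform).
raw = (
    "CPU    pctUser    pctSystem    pctIowait    pctIdle\n"
    "all    2.50       1.00         0.10         96.40\n"
    "0      3.00       1.50         0.00         95.50"
)

delimiter = "\t"  # the template's default replacement delimiter

# replace_multiple_whitespaces: collapse runs of two or more spaces.
text = re.sub(r" {2,}", delimiter, raw)

# remove_trailing_zeros: 1.00 becomes 1, 2.50 becomes 2.5, and so on.
text = re.sub(r"(\d+)\.0+(?=\s|$)", r"\1", text)
text = re.sub(r"(\d+\.\d*?)0+(?=\s|$)", r"\1", text)

print(text)  # tab-separated columns with shortened numeric values
```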

Supported sourcetype

This pipeline supports the cpu source type from the Splunk Add-on for Unix and Linux. It works only on complete cpu events that include the header row and all subsequent rows.

Template outline

The template consists of several custom functions followed by a pipeline that uses these functions.

Functions

The following list describes each function, including its configuration options. A Python sketch of the splitting and filtering behavior follows the list.

replace_multiple_whitespaces
  Description: Changes the delimiter in the logs from repeated space characters to a single tab character (\t). You can choose a different replacement delimiter by configuring the $delimiter parameter in the function definition.
  Configuration options: $delimiter: The replacement delimiter to use instead of repeated space characters (default: \t).

remove_trailing_zeros
  Description: Removes trailing zeros from decimal values in the logs. For example, 10.00 is transformed into 10.
  Configuration options: None.

extract_split_rows
  Description: Splits the logs into rows using the newline character (\n) as the delimiter, and further splits the rows into columns using the tab character (\t) or a custom delimiter.
  Configuration options: $row_split_separator: The delimiter for splitting rows into columns (default: \t).

find_header_indexes
  Description: Assigns a number to each log field. Other functions in this pipeline use these numbers as indexes for accessing the log contents.
  Configuration options: None.

drop_events_pctIdle_greater_than_30
  Description: Removes rows where the value of the pctIdle field is greater than or equal to 30.
  Configuration options: None.

drop_events_pctIdle_greater_than_50
  Description: Removes rows where the value of the pctIdle field is greater than or equal to 50.
  Configuration options: None.

drop_events_CPU_not_equal_to_all
  Description: Removes rows where the value of the CPU field is not “all” or “CPU”.
  Configuration options: None.

rows_to_delimiter_separated
  Description: Transforms the logs from JSON array format to TSV format.
  Configuration options: $delimiter: The delimiter to use between columns (default: \t).

update_source_and_source_type
  Description: Appends a suffix to the values in the source and sourcetype fields, if those fields exist in the event.
  Configuration options: $suffix: The suffix to append to the source and sourcetype values.
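The splitting, header-indexing, and filtering functions are easiest to understand together. The following Python sketch mirrors their combined behavior on the tab-delimited output of the earlier snippet; the logic is an assumption based on the descriptions above, not the template's SPL2 source:

```python
# Tab-delimited event, as produced by the earlier sketch.
text = (
    "CPU\tpctUser\tpctSystem\tpctIowait\tpctIdle\n"
    "all\t2.5\t1\t0.1\t96.4\n"
    "0\t3\t1.5\t0\t95.5"
)

# extract_split_rows: split into rows on \n, then into columns on \t.
rows = [row.split("\t") for row in text.split("\n")]

# find_header_indexes: map each header field name to its column index.
idx = {name: i for i, name in enumerate(rows[0])}

def keep(row):
    cpu = row[idx["CPU"]]
    if cpu == "CPU":   # always keep the header row
        return True
    if cpu != "all":   # drop_events_CPU_not_equal_to_all
        return False
    # drop_events_pctIdle_greater_than_50 (the _30 variant uses 30 instead).
    return float(row[idx["pctIdle"]]) < 50

filtered = [row for row in rows if keep(row)]

# rows_to_delimiter_separated: join the surviving rows back into TSV.
result = "\n".join("\t".join(row) for row in filtered)
```

With this sample, the per-core row is removed by the CPU filter and the mostly idle “all” row (pctIdle 96.4) is removed by the pctIdle filter, leaving only the header row; on a busy host, the “all” row would survive.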

Pipeline

The pipeline has the following outline (a consolidated sketch of the same ordering follows the list):

  1. Replaces delimiters using the replace_multiple_whitespaces function.
  2. Removes trailing zeros using the remove_trailing_zeros function.
  3. Splits the logs into rows and columns using the extract_split_rows function, and maps field names to column indexes using the find_header_indexes function.
  4. Removes unnecessary rows using the drop functions described in Log reduction options.
  5. Converts the remaining rows back into TSV format using the rows_to_delimiter_separated function.
  6. Optionally updates the source and sourcetype fields using the update_source_and_source_type function.
  7. Sends the transformed logs to the destination.
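As a consolidated view of this ordering, here is a single hypothetical Python function that strings the same stages together with only the default minimal filter enabled. It is a sketch of the pipeline's effect, not the template's SPL2 implementation, and "_reduced" is an example suffix rather than a template default:

```python
import re

def reduce_cpu_event(event: dict, suffix: str = "", delimiter: str = "\t") -> dict:
    """Illustrative end-to-end pass; comments name the template functions."""
    text = re.sub(r" {2,}", delimiter, event["_raw"])           # replace_multiple_whitespaces
    text = re.sub(r"(\d+)\.0+(?=\s|$)", r"\1", text)            # remove_trailing_zeros
    text = re.sub(r"(\d+\.\d*?)0+(?=\s|$)", r"\1", text)
    rows = [r.split(delimiter) for r in text.split("\n")]       # extract_split_rows
    idx = {name: i for i, name in enumerate(rows[0])}           # find_header_indexes
    rows = [r for r in rows
            if r[idx["CPU"]] in ("CPU", "all")]                 # drop_events_CPU_not_equal_to_all
    event["_raw"] = "\n".join(delimiter.join(r) for r in rows)  # rows_to_delimiter_separated
    if suffix:                                                  # update_source_and_source_type
        for field in ("source", "sourcetype"):
            if field in event:
                event[field] += suffix
    return event

# Example: passing {"_raw": ..., "source": "cpu", "sourcetype": "cpu"} with suffix
# "_reduced" yields sourcetype "cpu_reduced" and a TSV _raw with only the
# header and "all" rows retained.
```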

Configuration instructions

Log reduction options

Minimal filtering

For minimal filtering, you can use the drop_events_CPU_not_equal_to_all function, which is enabled by default. This approach ensures that data continues to flow through the pipeline while still trimming license usage: only the header row and the aggregate “all” row are kept, and per-CPU rows are dropped.

Maximal filtering

For significant license savings, use the drop_events_pctIdle_greater_than_50 and drop_events_pctIdle_greater_than_30 functions to remove unnecessary rows (those for mostly idle CPUs) from the logs. You can comment out any of these functions that you do not want to apply by prefixing it with //, allowing you to retain specific data that might otherwise be removed or modified.

Optionally, use the update_source_and_source_type function to append a suffix to the source and sourcetype fields, helping to distinguish modified logs from unmodified ones.


Notes

  • Ensure that the cpu source type is properly configured in your Splunk deployment.
  • Customize the pipeline as needed to meet your specific requirements.
  • If using the update_source_and_source_type function, ensure that the suffixed source type is configured in your Splunk platform deployment with appropriate time extraction and line-breaking settings.