UNIX and Linux vmstat logs: Reduce log size and convert to tab-separated key-value pair format¶
Disclaimer¶
By using SPL2 templates for data processing (the “templates”), you understand and agree that templates are provided “as is”. Splunk disclaims any and all warranties, express or implied, including without limitation the implied warranties of merchantability, fitness for a particular purpose and warranties arising out of course of dealing or usage of trade or by statute or in law. Splunk specifically does not warrant that templates will meet your requirements, the operation or output of templates will be error-free, accurate, reliable, complete or uninterrupted.
Use case¶
Reduce the size of Unix and Linux `vmstat` logs by removing unnecessary rows, replacing redundant or irrelevant values, and optimizing log storage. Extract essential fields while maintaining compatibility with the Splunk Common Information Model (CIM).
Version 0.3.1¶
Version 0.3.1 of the "UNIX and Linux vmstat logs: Reduce log size and convert to tab-separated key-value pair format" template was released on November 25, 2024.
Template details¶
Compatibility¶
This template is compatible with Splunk Add-on for Unix and Linux v9.2.0 and v10.0.0.
Template description¶
This is a sample pipeline that reduces the size of `vmstat` logs coming from the Splunk Add-on for Unix and Linux while preserving compatibility with the Splunk Common Information Model (CIM). This pipeline processes data with the `vmstat` source type and performs the following:
- Replaces repeated space characters used as delimiters in the logs with a single tab character (`\t`) or a custom delimiter.
- Removes unnecessary rows and replaces redundant or irrelevant values in specific fields.
- Converts the logs into tab-separated values (TSV) format.
- Optionally updates the values of the `source` and `sourcetype` event fields.
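To make the space-to-tab replacement concrete, here is an illustrative Python sketch of the same idea (the template itself is written in SPL2; the sample field names below are taken from this page and the event content is hypothetical):

```python
import re

# Hypothetical vmstat event: a header row and a value row,
# padded with runs of spaces as vmstat output typically is.
raw = ("waitThreads   pgSwapOut   loadAvg1mi\n"
       "  0.00         0.00        1.20")

# Collapse each run of spaces into a single tab, mirroring
# what the replace_multiple_whitespaces function describes.
tsv = "\n".join(re.sub(r" +", "\t", line.strip()) for line in raw.split("\n"))
print(tsv)
```

Replacing multi-space padding with a single-character delimiter is where much of the size reduction comes from, since `vmstat` output aligns its columns with long runs of spaces.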
Supported sourcetype¶
This pipeline only works on complete `vmstat` events that include the header row and all subsequent rows.
Template outline¶
The template consists of several custom functions followed by a pipeline that uses these functions.
Functions¶
The following table shows all functions, including possible configuration options.
| Function name | Description | Configuration options |
|---|---|---|
| `replace_multiple_whitespaces` | Replaces repeated space characters in the logs with a single tab character (`\t`) or a custom delimiter. | `$delimiter`: The replacement delimiter to use instead of repeated space characters (default: `\t`). |
| `remove_trailing_zeros` | Removes trailing zeros from decimal values. For example, `10.00` is transformed into `10`. | No configuration options. |
| `extract_split_rows` | Splits the logs into rows using the newline character (`\n`) as the delimiter and further splits the rows into columns using the tab character (`\t`) or a custom delimiter. | `$row_split_separator`: The delimiter for splitting rows into columns (default: `\t`). |
| `find_header_indexes` | Assigns a number to each log field. Other functions in this pipeline use these numbers as indexes for accessing the log contents. | No configuration options. |
| `remove_fields_pgPageIn_pgPageOut` | Removes the fields `pgPageIn_PS`, `pgPageOut`, and `pgPageOut_PS`. | No configuration options. |
| `remove_fields_loadAvg1mi_interrupts` | Removes the fields `loadAvg1mi` and `interrupts`. | No configuration options. |
| `remove_fields_memFreePct_memUsedPct` | Removes the fields `memFreePct` and `memUsedPct`. | No configuration options. |
| `replace_waitThreads` | Removes the field `waitThreads` if it has the value `0.00`. | No configuration options. |
| `replace_pgSwapOut` | Removes the field `pgSwapOut` if it has the value `0.00`. | No configuration options. |
| `rows_to_delimiter_separated` | Transforms the logs from JSON array format to tab-separated key-value pair format. | `$delimiter`: The delimiter to use between columns (default: `\t`). |
| `apply_transformations` | Groups the previously defined custom functions together to apply transformations to the logs. | No configuration options. |
| `update_source_and_source_type` | Appends a suffix to the values in the `source` and `sourcetype` fields, if those fields exist in the event. | `$suffix`: The suffix to append to the `source` and `sourcetype` values. |
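As an illustration of the behavior the table describes for `remove_trailing_zeros`, the same logic can be sketched in Python (illustrative only; the template's actual function is SPL2):

```python
import re

def remove_trailing_zeros(value: str) -> str:
    # Only touch decimal values; pass everything else through.
    if re.fullmatch(r"\d+\.\d+", value):
        # Strip trailing zeros, and the dot too if nothing is left
        # after it: "10.00" -> "10", "3.50" -> "3.5".
        return re.sub(r"\.?0+$", "", value)
    return value
```

For instance, `remove_trailing_zeros("10.00")` returns `"10"`, while an integer value such as `"100"` is left unchanged.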
Pipeline¶
This pipeline has the following outline:

- Replaces repeated space characters with tabs using the `replace_multiple_whitespaces` function.
- Removes trailing zeros from numeric fields using the `remove_trailing_zeros` function.
- Splits the logs into rows and columns using the `extract_split_rows` function.
- Assigns indexes to the log fields using the `find_header_indexes` function.
- Removes specific fields using the following functions:
  - `remove_fields_pgPageIn_pgPageOut`
  - `remove_fields_loadAvg1mi_interrupts`
  - `remove_fields_memFreePct_memUsedPct`
- Replaces redundant or irrelevant values in specific fields using the following functions:
  - `replace_waitThreads`
  - `replace_pgSwapOut`
- Transforms the logs into tab-separated key-value pair format using the `rows_to_delimiter_separated` function.
- Optionally updates the `source` and `sourcetype` fields using the `update_source_and_source_type` function.
- Sends the transformed logs to the destination.
Configuration instructions¶
For significant license savings, use the `remove_fields_*` and `replace_*` functions to remove unnecessary fields and replace redundant or irrelevant values in specific fields. You can comment out any of these functions you do not want to apply by prefixing it with `//`, allowing you to retain specific data that might otherwise be removed or modified.

Optionally, use the `update_source_and_source_type` function to append a suffix to the `source` and `sourcetype` fields, helping to distinguish modified logs from unmodified ones.
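The suffix behavior amounts to simple string concatenation on two event fields; a minimal Python sketch, assuming an event represented as a dict and an example suffix of `_reduced` (both hypothetical):

```python
def update_source_and_source_type(event: dict, suffix: str) -> dict:
    # Append the suffix only when the field exists in the event,
    # matching the "if those fields exist" condition in the table.
    for field in ("source", "sourcetype"):
        if field in event:
            event[field] = event[field] + suffix
    return event

event = {"source": "vmstat", "sourcetype": "vmstat"}
update_source_and_source_type(event, "_reduced")
print(event["sourcetype"])  # vmstat_reduced
```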
Notes¶
- Ensure that the `vmstat` source type is properly configured in your Splunk deployment.
- Customize the pipeline as needed to meet your specific requirements.
- If using the `update_source_and_source_type` function, ensure that the suffixed source type is configured in your Splunk platform deployment with appropriate time extraction and line-breaking settings.