Palo Alto Networks logs: Reduce log size

Disclaimer

By using SPL2 templates for data processing (the “templates”), you understand and agree that templates are provided “as is”. Splunk disclaims any and all warranties, express or implied, including without limitation the implied warranties of merchantability, fitness for a particular purpose and warranties arising out of course of dealing or usage of trade or by statute or in law. Splunk specifically does not warrant that templates will meet your requirements, the operation or output of templates will be error-free, accurate, reliable, complete or uninterrupted.

Use case

Reduce the size of Palo Alto Networks logs by removing unnecessary fields. Extract recommended event fields.

Template details

Compatibility

This template is compatible with Splunk Add-on for Palo Alto Networks v1.0.1 and v2.0.0.

Template description

This is a sample pipeline that reduces the size of Palo Alto Networks logs and extracts a few recommended event fields while preserving compatibility with the Splunk Common Information Model (CIM). This pipeline takes data that has a sourcetype starting with “pan:” or “pan_” and then does the following:

  1. Removes unnecessary fields from the logs. You can choose which fields to remove by adjusting the configuration of the custom functions defined in this module. For more information, see the comment titled configuration instructions for the “TRANSFORM” custom functions.
  2. Extracts the host, _time, source, and index fields.

Supported sourcetypes

This template partitions by sourcetype matching regex:

  • /pan[:|_]/i, which means that this pipeline processes any sourcetype that starts with “pan:” or “pan_”.

In the pipeline, the actual sourcetype is extracted from the event contents. The following detailed sourcetypes are supported:

  • pan:system
  • pan:config
  • pan:userid
  • pan:hipmatch
  • pan:decryption
  • pan:threat
  • pan:traffic
  • pan:correlation
  • pan:globalprotect

Events that don’t match any of the supported sourcetypes are passed through the pipeline and the sourcetype is not changed.
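The partitioning check described above can be sketched in Python. This is only an illustrative stand-in for the template's SPL2 regex match; the function name below is hypothetical:

```python
import re

# The template's partitioning regex is /pan[:|_]/i. Note that the
# character class matches ":", "|", or "_" after "pan", case-insensitively.
PAN_PATTERN = re.compile(r"pan[:|_]", re.IGNORECASE)

def is_pan_sourcetype(sourcetype: str) -> bool:
    """Return True if the sourcetype starts with a Palo Alto Networks prefix."""
    return PAN_PATTERN.match(sourcetype) is not None

print(is_pan_sourcetype("pan:traffic"))  # True
print(is_pan_sourcetype("PAN_threat"))   # True (case-insensitive)
print(is_pan_sourcetype("cisco:asa"))    # False
```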

Template outline

The template consists of a few custom functions followed by a pipeline that uses these functions.

Functions

The following list describes all functions, including possible configuration options.

  • _set_event_fields: Extracts the host, index, _time, and source event fields if they are missing. The values of these fields are based on the log contents. Configuration options: By default, this function sets several recommended destination indexes for the logs if the index field is missing. You must ensure these indexes exist in the Splunk platform or change them to existing index names.
  • _replace_escaped_comma, _replace_escaped_commas: These functions replace commas inside quotation marks with #SEP#. This temporary change is required for field extractions and is reverted by the _postprocess function. Configuration options: No customization is available. This will be replaced by proper CSV parsing in a future update.
  • _preprocess: Preprocesses logs to remove unwanted fields. It temporarily replaces escaped commas with #SEP#, splits each log by non-escaped commas, and extracts the host, index, _time, and source fields. Configuration options: None.
  • transform_[type]: Groups the previously defined custom functions for processing a specific log type (for example, config logs). Additional high-level operations may be added. Configuration options: For traffic logs, this function drops start events by default, which can be disabled. See the configuration instructions for more details.
  • _pan_[type]_filtering_min: Sets a small, default selection of system log fields to empty values. Configuration options: Fields listed in this function are replaced with empty values. For example, "{0}" /*future_use1*/, "", clears the field at index 0 to save space.
  • _pan_[type]_filtering_max: Sets a large selection of system log fields to empty values, including those from the _pan_[type]_filtering_min function. Configuration options: Fields listed here are replaced with empty values. For example, "{17}" /*devicegroup_level1*/, "", clears the field at index 17. Because this function references the min function, customizations may be needed there as well.
  • _is_pan_sourcetype: Checks whether the extracted sourcetype value is a Palo Alto Networks source type. Configuration options: This function lists all currently supported types. More types can be added if support is provided.
  • extract_sourcetype: Extracts the sourcetype for each log based on the value of the _log_type field. Configuration options: None.
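The temporary #SEP# substitution performed by the _replace_escaped_comma functions can be illustrated with a simplified Python sketch. The template itself is SPL2; the function names and the sample log line below are purely illustrative:

```python
import re

SEP = "#SEP#"

def replace_quoted_commas(line: str) -> str:
    # Replace commas inside double-quoted segments with #SEP# so that a
    # later split on "," only breaks on real field boundaries.
    return re.sub(r'"[^"]*"', lambda m: m.group(0).replace(",", SEP), line)

def restore_commas(field: str) -> str:
    # The _postprocess step reverts the temporary substitution.
    return field.replace(SEP, ",")

raw = 'alert,"10.0.0.1, 10.0.0.2",allow'
fields = [restore_commas(f) for f in replace_quoted_commas(raw).split(",")]
print(fields)  # ['alert', '"10.0.0.1, 10.0.0.2"', 'allow']
```

The quoted field survives the split intact, which is the property the field extractions in _preprocess depend on.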

Pipeline

The pipeline has the following stages:

  1. Extract the sourcetype.
  2. Branch based on the sourcetype.
  3. Apply the associated transform function in each branch; events that don't match any branch pass through unchanged.
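The branch-and-pass-through behavior can be sketched as a dispatch table in Python. This is a conceptual illustration only; the transform body and event shape are hypothetical placeholders, not the template's SPL2 code:

```python
# Hypothetical placeholder for a transform_[type] function.
def transform_traffic(event: dict) -> dict:
    event["filtered"] = True  # stands in for the _pan_traffic_filtering_* call
    return event

# One entry per supported sourcetype branch.
TRANSFORMS = {"pan:traffic": transform_traffic}

def run_pipeline(event: dict) -> dict:
    # Events whose sourcetype has no registered transform pass through unchanged.
    transform = TRANSFORMS.get(event.get("sourcetype"))
    return transform(event) if transform else event

print(run_pipeline({"sourcetype": "pan:traffic"}))  # transformed
print(run_pipeline({"sourcetype": "other:log"}))    # passed through unchanged
```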

Configuration instructions

Each transform function has two pan_filtering functions available for removing different numbers of fields from the logs. Review each transform function definition and choose which pan_filtering function to use.

Be aware of the following considerations when deciding which pan_filtering function to use:

  • pan_*_filtering_min - These “min” functions reduce the log size without losing any significant information. Minimally filtered data continues to flow through the pipeline, and when ingested into a Splunk index, the effect on license saving is minimal. When you use this function for filtering, there is no need to preserve a copy of the raw data by sending it to a separate storage (such as Amazon S3).
  • pan_*_filtering_max - These “max” functions reduce the log size by filtering out any data that is not needed for Splunk security detections. When this filtered data is ingested into a Splunk index, the effect on license saving is significant. When you use this function for filtering, it is strongly recommended that you preserve a copy of the raw data by sending it to a separate storage (such as Amazon S3).

See the following sections for examples.

Configuration example scenarios

Scenario 1: Change all my filters to max

To achieve maximum license savings, you can change all your filters to the “max” version. This removes all fields that are potentially not required for typical security detections. You should preserve a copy of the raw data by sending it to a separate storage location (such as Amazon S3).

Perform the following steps to change all filtering to max:

  1. For each transform function, change the used filtering function from _pan_*_filtering_min to _pan_*_filtering_max.
  2. Execute the pipeline preview and confirm that the reduction rate is higher for all events.
  3. Save the changes.

Scenario 2: Change only traffic filtering to max

Perform the following steps to change traffic filtering to max:

  1. For the transform_traffic function, change the filtering function from _pan_traffic_filtering_min to _pan_traffic_filtering_max.
  2. Execute the pipeline preview and confirm that the reduction rate is higher for traffic events.
  3. Save the changes.

Scenario 3: Filter additional fields on top of the default filtering

Perform the following steps to filter additional fields:

  1. Depending on which filtering functions are used, add the additional fields to the _pan_*_filtering_min or _pan_*_filtering_max functions.
  2. Inside the json_set(_ts_arr, ...) call, add the additional fields to be removed. For example, to remove the field at index 5, add "{5}" /*field_name*/, "", to the list. Field indexes are zero-based and can be checked in the PAN documentation.
  3. Execute the pipeline preview and confirm that the field is removed.
  4. Save the changes.
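The effect of adding an entry like "{5}" /*field_name*/, "" can be sketched in Python. This is a conceptual illustration of clearing fields by zero-based index, not the template's actual json_set call:

```python
# The log is split into a zero-based array of fields; each index listed in
# the filtering function is overwritten with an empty string to shrink the event.
def clear_fields(fields: list[str], indexes: set[int]) -> list[str]:
    return ["" if i in indexes else value for i, value in enumerate(fields)]

fields = "a,b,c,d,e,f,g".split(",")
# Clearing indexes 0 and 5 mirrors entries such as "{0}" ... and "{5}" ...
print(clear_fields(fields, {0, 5}))  # ['', 'b', 'c', 'd', 'e', '', 'g']
```

Because the field positions themselves are preserved, downstream extractions that rely on field order keep working.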

Scenario 4: Filter fewer fields

Perform the following steps to filter fewer fields:

  1. Depending on which filtering functions are used, modify the _pan_*_filtering_min function, the _pan_*_filtering_max function, or both.
  2. Inside the json_set(_ts_arr, ...) call, remove the entries for fields that should be kept. For example, to keep the field at index 5, remove "{5}" /*field_name*/, "", from the list in the respective function.
  3. Execute the pipeline preview and confirm that the field is not removed.
  4. Save the changes.