Palo Alto Networks logs: Reduce log size

Disclaimer

By using SPL2 templates for data processing (the “templates”), you understand and agree that templates are provided “as is”. Splunk disclaims any and all warranties, express or implied, including without limitation the implied warranties of merchantability, fitness for a particular purpose and warranties arising out of course of dealing or usage of trade or by statute or in law. Splunk specifically does not warrant that templates will meet your requirements, the operation or output of templates will be error-free, accurate, reliable, complete or uninterrupted.

Use case

Reduce the size of Palo Alto Networks logs by removing unnecessary fields. Extract recommended event fields.

Template details

Compatibility

This template is compatible with Splunk Add-on for Palo Alto Networks v1.0.1 and v2.0.0.

Template description

This is a sample pipeline that reduces the size of Palo Alto Networks logs and extracts a few recommended event fields while preserving compatibility with the Splunk Common Information Model (CIM). This pipeline takes data that has a sourcetype starting with “pan:” or “pan_” and then does the following:

  1. Removes unnecessary fields from the logs. You can choose which fields to remove by adjusting the configuration of the custom functions defined in this module. For more information, see the comment titled configuration instructions for the “TRANSFORM” custom functions.
  2. Extracts the host, _time, source, and index fields.

Supported sourcetypes

This template partitions by sourcetype matching regex:

  • /pan[:|_]/i, which means that this pipeline processes any sourcetype that starts with “pan:” or “pan_”.

In the pipeline, the actual sourcetype is extracted from the event contents. The following detailed sourcetypes are supported:

  • pan:system
  • pan:config
  • pan:userid
  • pan:hipmatch
  • pan:decryption
  • pan:threat
  • pan:traffic
  • pan:correlation
  • pan:globalprotect

Events that don’t match any of the supported sourcetypes are passed through the pipeline and the sourcetype is not changed.
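The partitioning check described above can be sketched in Python. This is only an illustrative stand-in for the template's SPL2 regex match; the function name below is hypothetical:

```python
import re

# The template's partitioning regex is /pan[:|_]/i. Note that the
# character class matches ":", "|", or "_" after "pan", case-insensitively.
PAN_PATTERN = re.compile(r"pan[:|_]", re.IGNORECASE)

def is_pan_sourcetype(sourcetype: str) -> bool:
    """Return True if the sourcetype starts with a Palo Alto Networks prefix."""
    return PAN_PATTERN.match(sourcetype) is not None

print(is_pan_sourcetype("pan:traffic"))  # True
print(is_pan_sourcetype("PAN_threat"))   # True (case-insensitive)
print(is_pan_sourcetype("cisco:asa"))    # False
```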

Template outline

The template consists of a few custom functions followed by a pipeline that uses these functions.

Functions

The following list describes all functions, including possible configuration options.

  • _set_event_fields: Extracts the host, index, _time, and source event fields if they are missing. The values of these fields are based on the log contents. Configuration options: By default, this function sets several recommended destination indexes for the logs if the index field is missing. You must ensure these indexes exist in the Splunk platform or change them to existing index names.
  • _replace_escaped_comma, _replace_escaped_commas: These functions replace commas inside quotation marks with #SEP#. This temporary change is required for field extractions and is reverted by the _postprocess function. Configuration options: No customization is available. This will be replaced by proper CSV parsing in a future update.
  • _preprocess: Preprocesses logs to remove unwanted fields. It temporarily replaces escaped commas with #SEP#, splits each log by non-escaped commas, and extracts the host, index, _time, and source fields. Configuration options: None.
  • transform_[type]: Groups the previously defined custom functions for processing a specific log type (for example, config logs). Additional high-level operations may be added. Configuration options: For traffic logs, this function drops start events by default, which can be disabled. See the configuration instructions for more details.
  • _pan_[type]_filtering_min: Sets a small, default selection of system log fields to empty values. Configuration options: Fields listed in this function are replaced with empty values. For example, "{0}" /*future_use1*/, "", clears the field at index 0 to save space.
  • _pan_[type]_filtering_max: Sets a large selection of system log fields to empty values, including those from the _pan_[type]_filtering_min function. Configuration options: Fields listed here are replaced with empty values. For example, "{17}" /*devicegroup_level1*/, "", clears the field at index 17. Because this function references the min function, customizations may be needed there as well.
  • _is_pan_sourcetype: Checks whether the extracted sourcetype value is a Palo Alto Networks source type. Configuration options: This function lists all currently supported types. More types can be added if support is provided.
  • extract_sourcetype: Extracts the sourcetype for each log based on the value of the _log_type field. Configuration options: None.
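The temporary #SEP# substitution performed by the _replace_escaped_comma functions can be illustrated with a simplified Python sketch. The template itself is SPL2; the function names and the sample log line below are purely illustrative:

```python
import re

SEP = "#SEP#"

def replace_quoted_commas(line: str) -> str:
    # Replace commas inside double-quoted segments with #SEP# so that a
    # later split on "," only breaks on real field boundaries.
    return re.sub(r'"[^"]*"', lambda m: m.group(0).replace(",", SEP), line)

def restore_commas(field: str) -> str:
    # The _postprocess step reverts the temporary substitution.
    return field.replace(SEP, ",")

raw = 'alert,"10.0.0.1, 10.0.0.2",allow'
fields = [restore_commas(f) for f in replace_quoted_commas(raw).split(",")]
print(fields)  # ['alert', '"10.0.0.1, 10.0.0.2"', 'allow']
```

The quoted field survives the split intact, which is the property the field extractions in _preprocess depend on.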

Pipeline

The pipeline has the following stages:

  1. Extract the sourcetype.
  2. Branch based on the sourcetype.
  3. Apply the associated transform function in each branch; events that don't match any branch pass through unchanged.
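The branch-and-pass-through behavior can be sketched as a dispatch table in Python. This is a conceptual illustration only; the transform body and event shape are hypothetical placeholders, not the template's SPL2 code:

```python
# Hypothetical placeholder for a transform_[type] function.
def transform_traffic(event: dict) -> dict:
    event["filtered"] = True  # stands in for the _pan_traffic_filtering_* call
    return event

# One entry per supported sourcetype branch.
TRANSFORMS = {"pan:traffic": transform_traffic}

def run_pipeline(event: dict) -> dict:
    # Events whose sourcetype has no registered transform pass through unchanged.
    transform = TRANSFORMS.get(event.get("sourcetype"))
    return transform(event) if transform else event

print(run_pipeline({"sourcetype": "pan:traffic"}))  # transformed
print(run_pipeline({"sourcetype": "other:log"}))    # passed through unchanged
```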

Configuration instructions

Each transform function has two pan_filtering functions available for removing different numbers of fields from the logs. Review each transform function definition and choose which pan_filtering function to use.

Be aware of the following considerations when deciding which pan_filtering function to use:

  • pan_*_filtering_min - These “min” functions reduce the log size without losing any significant information. Minimally filtered data continues to flow through the pipeline, and when ingested into a Splunk index, the effect on license saving is minimal. When you use this function for filtering, there is no need to preserve a copy of the raw data by sending it to a separate storage (such as Amazon S3).
  • pan_*_filtering_max - These “max” functions reduce the log size by filtering out any data that is not needed for Splunk security detections. When this filtered data is ingested into a Splunk index, the effect on license saving is significant. When you use this function for filtering, it is strongly recommended that you preserve a copy of the raw data by sending it to a separate storage (such as Amazon S3).

See the following sections for examples.

Configuration example scenarios

Scenario 1: Change all my filters to max

To achieve maximum license savings, you can change all your filters to the “max” version. This removes all fields that are potentially not required for typical security detections. You should preserve a copy of the raw data by sending it to a separate storage location (such as Amazon S3).

Perform the following steps to change all filtering to max:

  1. For each transform function, change the used filtering function from _pan_*_filtering_min to _pan_*_filtering_max.
  2. Execute the pipeline preview and confirm that the reduction rate is higher for all events.
  3. Save the changes.

Scenario 2: Change only traffic filtering to max

Perform the following steps to change traffic filtering to max:

  1. For the transform_traffic function, change the filtering function from _pan_traffic_filtering_min to _pan_traffic_filtering_max.
  2. Execute the pipeline preview and confirm that the reduction rate is higher for traffic events.
  3. Save the changes.

Scenario 3: Filter additional fields on top of the default filtering

Perform the following steps to filter additional fields:

  1. Depending on which filtering functions are used, add the additional fields to the _pan_*_filtering_min or _pan_*_filtering_max functions.
  2. Inside the json_set(_ts_arr, ...) call, add the additional fields to be removed. For example, to remove the field at index 5, add "{5}" /*field_name*/, "", to the list. Field indexes are zero-based and can be checked in the PAN documentation.
  3. Execute the pipeline preview and confirm that the field is removed.
  4. Save the changes.
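The effect of adding an entry like "{5}" /*field_name*/, "" can be sketched in Python. This is a conceptual illustration of clearing fields by zero-based index, not the template's actual json_set call:

```python
# The log is split into a zero-based array of fields; each index listed in
# the filtering function is overwritten with an empty string to shrink the event.
def clear_fields(fields: list[str], indexes: set[int]) -> list[str]:
    return ["" if i in indexes else value for i, value in enumerate(fields)]

fields = "a,b,c,d,e,f,g".split(",")
# Clearing indexes 0 and 5 mirrors entries such as "{0}" ... and "{5}" ...
print(clear_fields(fields, {0, 5}))  # ['', 'b', 'c', 'd', 'e', '', 'g']
```

Because the field positions themselves are preserved, downstream extractions that rely on field order keep working.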

Scenario 4: Filter fewer fields

Perform the following steps to filter fewer fields:

  1. Depending on which filtering functions are used, modify the _pan_*_filtering_min function, the _pan_*_filtering_max function, or both.
  2. Inside the json_set(_ts_arr, ...) call, remove the entries for fields that should be kept. For example, to keep the field at index 5, remove "{5}" /*field_name*/, "", from the list in the respective function.
  3. Execute the pipeline preview and confirm that the field is not removed.
  4. Save the changes.