Palo Alto Networks logs: Reduce log size¶
Disclaimer: BY USING SPL2 TEMPLATES FOR DATA PROCESSING (THE “TEMPLATES”), YOU UNDERSTAND AND AGREE THAT TEMPLATES ARE PROVIDED “AS IS”. SPLUNK DISCLAIMS ANY AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND WARRANTIES ARISING OUT OF COURSE OF DEALING OR USAGE OF TRADE OR BY STATUTE OR IN LAW. SPLUNK SPECIFICALLY DOES NOT WARRANT THAT TEMPLATES WILL MEET YOUR REQUIREMENTS, THE OPERATION OR OUTPUT OF TEMPLATES WILL BE ERROR-FREE, ACCURATE, RELIABLE, COMPLETE OR UNINTERRUPTED.
Use case¶
Reduce the size of Palo Alto Networks logs by removing unnecessary fields. Then, extract recommended event fields.
Template details¶
Compatibility¶
This template is compatible with Splunk Add-on for Palo Alto Networks v1.0.1 and v2.0.0.
Template description¶
This is a sample pipeline that reduces the size of Palo Alto Networks logs and extracts a few recommended event fields while preserving compatibility with the Splunk Common Information Model (CIM). This pipeline takes data that has a source type starting with “pan:” or “pan_” and then does the following:
- Removes unnecessary fields from the logs. You can choose which fields to remove by adjusting the configuration of the custom functions defined in this module. For more information, see the comment titled configuration instructions for the “TRANSFORM” custom functions.
- Extracts the host, _time, source, and index fields.
Supported sourcetypes¶
This template partitions by sourcetype matching regex:
/pan[:|_]/i
which means that this pipeline processes any sourcetype that starts with “pan:” or “pan_”.
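As an illustration only (the template itself partitions in SPL2), the partition regex behaves like the following Python check. Note that the pattern is unanchored, so it matches “pan:” or “pan_” anywhere in the sourcetype string:

```python
import re

# The partition pattern from the template, expressed in Python.
PARTITION = re.compile(r"pan[:|_]", re.IGNORECASE)

def is_pan_sourcetype(sourcetype: str) -> bool:
    return PARTITION.search(sourcetype) is not None

print(is_pan_sourcetype("pan:traffic"))   # True
print(is_pan_sourcetype("PAN_threat"))    # True (case-insensitive)
print(is_pan_sourcetype("cisco:asa"))     # False
```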
Within the pipeline, the actual sourcetype is extracted based on the event. The following detailed sourcetypes are supported:
- pan:system
- pan:config
- pan:userid
- pan:hipmatch
- pan:decryption
- pan:threat
- pan:traffic
- pan:correlation
- pan:globalprotect

Events that do not match any of the above sourcetypes are passed through the pipeline with their sourcetype unchanged.
Template outline¶
The template consists of a few custom functions followed by a pipeline that uses these functions.
Functions¶
The following table shows all functions, including possible configuration options.
| Function name | Description | Configuration options |
|---|---|---|
| `_set_event_fields` | This function extracts the following event fields if they are missing: `host`, `index`, `_time`, and `source`. The values of those fields are based on the log contents. | By default, this function sets several recommended destination indexes for the logs if the `index` field is missing. You must either make sure that these indexes exist in the Splunk platform before you try to send the logs to them, or change the recommended indexes to the names of different indexes that exist. |
| `_replace_escaped_comma` (`_replace_escaped_commas`) | These functions replace commas inside quotation marks with `#SEP#`. This is a temporary change that is required to support field extractions and removals, and it is reverted by the `_postprocess` custom function that is defined later in the pipeline. | No customization options. In the future, this will be replaced by proper CSV parsing. |
| `_preprocess` | This function preprocesses the logs so that unwanted fields can be removed from them. It temporarily replaces escaped commas with `#SEP#`, splits each log into a collection of values using non-escaped commas as the delimiter, and extracts these event fields: `host`, `index`, `_time`, and `source`. | - |
| `transform_[type]`, where `type` is one of the supported sourcetypes | This function groups together the previously defined custom functions for processing logs of the given type. | These functions may also apply additional high-level operations. For example, the traffic transform drops start events, which can be disabled. For more information, see the Configuration instructions section. |
| `_pan_[type]_filtering_min` | This function sets a small selection of log fields to empty values (the default proposition). | The fields listed in this function are replaced with empty values in the event's array. For example, the entry `"{0}" /*future_use1*/, "",` means that the field at index 0 is cleared to save space. |
| `_pan_[type]_filtering_max` | This function sets a large selection of log fields to empty values, including the fields that are emptied out by the corresponding `_pan_[type]_filtering_min` function. | The fields listed in this function are replaced with empty values in the event's array. For example, the entry `"{17}" /*devicegroup_level1*/, "",` means that the field at index 17 is cleared to save space. This function calls the min function, so you must customize some of the fields there. |
| `_is_pan_sourcetype` | This function checks whether or not the extracted sourcetype value is a Palo Alto Networks sourcetype. | This function lists the currently supported types. More types can be added if support is provided. |
| `extract_sourcetype` | This function extracts the sourcetype for each log based on the value of the field `_log_type`. | - |
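To make the preprocess/postprocess round trip concrete, here is an illustrative Python sketch of the behavior described above. The template itself implements this in SPL2; only the `#SEP#` separator and the quoted-comma handling are taken from the descriptions here, and the exact SPL2 implementation may differ:

```python
import re

SEP = "#SEP#"

def replace_escaped_commas(raw: str) -> str:
    # Temporarily replace commas inside quotation marks with #SEP#,
    # mirroring _replace_escaped_comma(s).
    return re.sub(r'"([^"]*)"',
                  lambda m: '"' + m.group(1).replace(",", SEP) + '"',
                  raw)

def preprocess(raw: str) -> list:
    # Split the log into a value array on the remaining (non-escaped)
    # commas, mirroring _preprocess.
    return replace_escaped_commas(raw).split(",")

def postprocess(values: list) -> str:
    # Rejoin the array and restore the original commas,
    # mirroring _postprocess.
    return ",".join(values).replace(SEP, ",")

log = '1,2022/01/01 00:00:00,"a,quoted,value",TRAFFIC'
arr = preprocess(log)
assert postprocess(arr) == log   # lossless round trip
```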
Pipeline¶
The pipeline outline has the following stages:
- Extract the sourcetype.
- Branch based on the sourcetype.
- Apply the associated transform function in each branch; pass all other events through unchanged.
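Conceptually, the branching stage works like the following Python sketch. This is illustrative only: the `TRANSFORMS` table and the transform bodies are hypothetical stand-ins, and the real pipeline implements this in SPL2:

```python
def extract_sourcetype(event: dict) -> str:
    # In the template, the detailed sourcetype is derived from _log_type.
    log_type = event.get("_log_type", "")
    return "pan:" + log_type.lower() if log_type else event.get("sourcetype", "")

# Hypothetical per-type transforms standing in for transform_[type].
TRANSFORMS = {
    "pan:traffic": lambda e: {**e, "transformed": "traffic"},
    "pan:threat":  lambda e: {**e, "transformed": "threat"},
    # ... one entry per supported detailed sourcetype
}

def route(event: dict) -> dict:
    sourcetype = extract_sourcetype(event)
    transform = TRANSFORMS.get(sourcetype)
    # Events without a matching transform pass through unchanged.
    return transform(event) if transform else event
```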
Configuration instructions¶
Each `transform` function has two `pan_filtering` functions available for removing different amounts of fields from the logs. Review each `transform` function definition and choose which `pan_filtering` function to use.

Be aware of the following considerations when deciding which `pan_filtering` function to use:

- `pan_*_filtering_min` - These “min” functions reduce the log size without losing any significant information. Minimally filtered data continues to flow through the pipeline, and when ingested into a Splunk index, the effect on license saving is minimal. When you use this function for filtering, there is no need to preserve a copy of the raw data by sending it to separate storage (such as Amazon S3).
- `pan_*_filtering_max` - These “max” functions reduce the log size by filtering out any data that is not needed for Splunk security detections. When this filtered data is ingested into a Splunk index, the effect on license saving is significant. When you use this function for filtering, it is strongly recommended that you preserve a copy of the raw data by sending it to separate storage (such as Amazon S3).
See the following sections for examples.
Configuration example scenarios¶
Scenario 1: Change all my filters to max¶
To achieve maximum license savings, you can change all your filters to the “max” version. This removes all fields that are potentially not required for typical security detections. You should preserve a copy of the raw data by sending it to separate storage (such as Amazon S3).
Perform the following steps to change all filtering to max:
- For each `transform` function, change the filtering function from `_pan_*_filtering_min` to `_pan_*_filtering_max`.
- Execute the pipeline preview and confirm that the reduction rate is higher for all events.
- Save the changes.
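The relationship between the two filter levels can be sketched as follows. This is an illustrative Python sketch with example index values (taken from the function table above); the real functions are SPL2:

```python
# Zero-based field indexes to clear; example values only.
MIN_FIELDS = {0}                 # e.g. future_use1
MAX_FIELDS = MIN_FIELDS | {17}   # max clears everything min does, plus more

def fields_to_clear(level: str) -> set:
    return MAX_FIELDS if level == "max" else MIN_FIELDS

# Switching every transform from "min" to "max" clears strictly more fields,
# which is why the reduction rate should rise for all events in the preview.
assert fields_to_clear("min") <= fields_to_clear("max")
```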
Scenario 2: Change only traffic filtering to max¶
Perform the following steps to change traffic filtering to max:
- For the `transform_traffic` function, change the filtering function from `_pan_traffic_filtering_min` to `_pan_traffic_filtering_max`.
- Execute the pipeline preview and confirm that the reduction rate is higher for traffic events.
- Save the changes.
Scenario 3: Filter additional fields on top of the default filtering¶
Perform the following steps to filter additional fields:
- Depending on the filtering functions in use, add the additional fields in the `_pan_*_filtering_min` or `_pan_*_filtering_max` functions.
- Inside `json_set(_ts_arr,` add the additional fields to be removed. For example, to remove the field at index 5, add `"{5}" /*field_name*/, "",` to the list. Field indexes are zero-based and can be checked in the PAN documentation.
- Execute the pipeline preview and confirm that the field is removed.
- Save the changes.
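The effect of an entry such as `"{5}" /*field_name*/, "",` can be illustrated in Python: the position is emptied rather than deleted, so later fields keep their zero-based indexes. This is only a sketch of the behavior described above, not the SPL2 implementation:

```python
def clear_fields(values: list, indexes: set) -> list:
    # Set the chosen zero-based positions to empty strings; the array
    # length is preserved so the other field indexes stay stable.
    return ["" if i in indexes else v for i, v in enumerate(values)]

event = ["f0", "f1", "f2", "f3", "f4", "f5", "f6"]
print(clear_fields(event, {5}))
# ['f0', 'f1', 'f2', 'f3', 'f4', '', 'f6']
```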
Scenario 4: Filter fewer fields¶
Perform the following steps to filter fewer fields:
- Depending on the filtering functions in use, modify `_pan_*_filtering_min`, `_pan_*_filtering_max`, or both.
- Inside `json_set(_ts_arr,` remove the fields that should not be cleared. For example, to keep the field at index 5, remove `"{5}" /*field_name*/, "",` from the list in the respective function.
- Execute the pipeline preview and confirm that the field is not removed.
- Save the changes.