Performance reference for the Splunk Add-on for CrowdStrike¶
This page provides reference information about Splunk’s performance testing for the Splunk Add-on for CrowdStrike.
Performance results should be used as reference information and do not represent performance in all environments. Many factors impact performance results, including:
- file size
- file compression
- event size
- deployment architecture
- hardware
When preparing instances to ingest CrowdStrike data, consider Compute Optimized instances: ingestion is mostly CPU intensive and benefits from higher clock speeds. Reserve 4-6 cores for each pipeline. Ingesting larger volumes of CrowdStrike data also requires more indexer resources; one ingestion pipeline is sufficient to feed up to three indexing pipelines.
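As a rough sizing example, a 16-vCPU node such as the c5.4xlarge used in the tests below supports roughly two to four ingestion pipelines at 4-6 cores per pipeline.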
To increase the number of ingest pipelines in Splunk Cloud Victoria, contact the Splunk Cloud support team to request an exception.
Hardware and software environment¶
The throughput data and conclusions provided in this topic are based on performance testing using the add-on's modular input functionality.
Instance type | c5.4xlarge |
---|---|
Memory | 32 GB |
Compute Units (ECU) | 73 |
vCPU | 16 |
Measured performance data using HF/IDM¶
Data throughput depends on the S3 key size, the number of events, and the number of SQS-based S3 inputs configured on heavy forwarders (HF) and inputs data managers (IDM).
Data input | Compressed batch size (GB) | Number of HF/IDM | Inputs per HF/IDM | Managed S3 consumers | Indexers | Batch processing time (min) | Max throughput (KB/s) | Max throughput (GB/day) |
---|---|---|---|---|---|---|---|---|
SQS-based S3 | 2.2 | 1 | 1 | - | 3 | 20 | 50,000 | 4,100 |
SQS-based manager | 2.2 | 1 | 1 | 1 | 3 | 37 | 28,000 | 2,300 |
SQS-based manager | 2.2 | 1 | 1 | 2 | 3 | 20 | 51,000 | 4,200 |
SQS-based manager | 2.2 | 1 | 1 | 4 | 3 | 11 | 92,000 | 7,600 |
SQS-based manager | 2.2 | 1 | 1 | 8 | 3 | 8 | 134,500 | 11,000 |
Configure parallelism on data collection nodes for SQS-based S3 input¶
To ensure that your solution scales properly, set the parallelIngestionPipelines value to match the number of SQS-based S3 inputs per HF/IDM/SH. For example, if each HF/IDM/SH has four inputs, set parallelIngestionPipelines = 4 in $SPLUNK_HOME/etc/system/local/server.conf.
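As a minimal sketch, assuming four SQS-based S3 inputs on the node (parallelIngestionPipelines is a standard server.conf setting under the [general] stanza):

```ini
# $SPLUNK_HOME/etc/system/local/server.conf
# Match the pipeline count to the number of SQS-based S3 inputs
# configured on this HF/IDM (four in this example).
[general]
parallelIngestionPipelines = 4
```

Restart the Splunk instance after changing this setting so the additional pipelines are created.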
Adding more inputs does not improve performance if this parameter is set lower than the number of inputs. By default the parameter is set to 1, so do not add more than a single SQS-based S3 input per heavy forwarder or IDM with the default parallelism configuration. Each instance is managed separately, so check the configuration on all data collection nodes.
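One way to verify the effective value on each node is Splunk's btool utility (a sketch assuming a default $SPLUNK_HOME path and shell access to the node):

```
# Show the effective setting and the conf file it comes from
$SPLUNK_HOME/bin/splunk btool server list general --debug | grep parallelIngestionPipelines
```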
In Splunk Cloud Platform Victoria, search heads serve as heavy forwarders. Splunk Cloud version 8.2.2201 and later uses a different input replication process: CrowdStrike FDR SQS-based S3 consumer inputs are configured globally and replicated to each member of the search head cluster, so you do not need to manage each search head separately. To add more than a single input, contact the Splunk Cloud support team to increase parallelIngestionPipelines on the search heads.
When a single SQS message consists of multiple files, a single modular input processes them sequentially, so one input stays busy even if others are idle. In this case, use instances with higher clock speeds. CrowdStrike creates a new event batch every 7-10 minutes, and in heavily loaded environments event batches with hundreds of files are common. A batch containing, for example, 300-400 files can take 4-7 hours to process, meaning that one input is busy during this time and cannot ingest the next generated batch. Make sure you have enough inputs to start processing the next batch as it appears.
Configure parallelism on data collection nodes for SQS-based manager input¶
With the SQS-based manager input, parallelism is achieved by adding more managed consumer inputs. For any parallelIngestionPipelines configuration on the ingestion node, you run a single SQS-based manager input, but you can configure it with multiple consumers. Configure at least two consumers, and add more to match the number of configured parallelIngestionPipelines.
Scaling forwarders horizontally¶
Add more forwarders to safely increase throughput in a heavy-load environment. To achieve daily throughput above 10 TB per day, configure two heavy forwarders with four inputs on each instance. Processing on the indexers can become a bottleneck; forwarding data to at least six indexers achieves better performance. If adding a new ingestion node does not increase the throughput of the whole stack, or other nodes show lower throughput per instance, it most likely means there are not enough indexers in the stack.
Cloud Stack recommendations¶
For Enterprise Cloud Platform search head instances, CPU processing is shared between search and ingestion, and more resources are reserved for search. To achieve throughput of 10 TB/day without impacting the search experience, create at least six search heads (c5.4xlarge) and six indexers (i3.8xlarge). When using a Splunk Classic stack with a single IDM, use at least a c6i.12xlarge instance with eight inputs and matching parallelization to achieve similar throughput.
Enabling index-time host resolution¶
Depending on the data, host resolution at index time can slow data ingestion by 5-35%. Performing host resolution at index time consumes CPU on the forwarder. Monitor your data ingestion before turning on this feature, and configure CrowdStrike FDR host information sync if your ingestion node reaches CPU limits. If you observe delays during ingestion, consider upgrading your setup or turning off this feature.