Performance reference for the Splunk Add-on for CrowdStrike¶
This page provides reference information about Splunk’s performance testing for the Splunk Add-on for CrowdStrike.
Performance results should be used as reference information and do not represent performance in all environments. Many factors impact performance results, including:
- file size
- file compression
- event size
- deployment architecture
- hardware
When preparing instances to ingest CrowdStrike data, consider Compute Optimized instances: ingestion is mostly CPU intensive and benefits from higher clock speeds. Reserve 4-6 cores for each pipeline. Ingesting larger volumes of CrowdStrike data also requires more indexer resources; one ingestion pipeline is sufficient to feed up to three indexing pipelines.
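As a rough sizing example, a 16-vCPU node such as the c5.4xlarge used in the tests below supports roughly two to four ingestion pipelines at 4-6 cores per pipeline.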
To increase the number of ingest pipelines in Splunk Cloud Victoria, contact the Splunk Cloud support team to request an exception.
Hardware and software environment¶
The throughput data and conclusions provided in this topic are based on performance testing using the add-on's modular input functionality.
Instance type | c5.4xlarge |
---|---|
Memory | 32 GB |
Compute Units (ECU) | 73 |
vCPU | 16 |
Measured performance data using HF/IDM¶
Data throughput depends on the S3 key size, the number of events, and the number of SQS-based S3 inputs configured on heavy forwarders (HF) and inputs data managers (IDM).
Data input | Compressed batch size (GB) | Number of HF/IDM | Inputs per HF/IDM | Managed S3 consumers | Indexers | Batch processing time (min) | Max throughput (KB/s) | Max throughput (GB/day) |
---|---|---|---|---|---|---|---|---|
SQS-based S3 | 2.2 | 1 | 1 | - | 3 | 20 | 50,000 | 4,100 |
SQS-based manager | 2.2 | 1 | 1 | 1 | 3 | 37 | 28,000 | 2,300 |
SQS-based manager | 2.2 | 1 | 1 | 2 | 3 | 20 | 51,000 | 4,200 |
SQS-based manager | 2.2 | 1 | 1 | 4 | 3 | 11 | 92,000 | 7,600 |
SQS-based manager | 2.2 | 1 | 1 | 8 | 3 | 8 | 134,500 | 11,000 |
Configure parallelism on data collection nodes for SQS-based S3 input¶
To ensure that your solution scales properly, set the parallelIngestionPipelines value to match the number of SQS-based S3 inputs per HF/IDM/SH. For example, if each HF/IDM/SH has four inputs, set parallelIngestionPipelines = 4 in $SPLUNK_HOME/etc/system/local/server.conf.
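As a minimal sketch, assuming four SQS-based S3 inputs on the node (parallelIngestionPipelines is a standard server.conf setting under the [general] stanza):

```ini
# $SPLUNK_HOME/etc/system/local/server.conf
# Match the pipeline count to the number of SQS-based S3 inputs
# configured on this HF/IDM (four in this example).
[general]
parallelIngestionPipelines = 4
```

Restart the Splunk instance after changing this setting so the additional pipelines are created.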
Adding more inputs does not improve performance if this parameter is set lower than the number of inputs. By default the parameter is set to 1, so do not add more than a single SQS-based S3 input per heavy forwarder or IDM with the default parallelism configuration. Each instance is managed separately, so check the configuration on all data collection nodes.
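One way to verify the effective value on each node is Splunk's btool utility (a sketch assuming a default $SPLUNK_HOME path and shell access to the node):

```
# Show the effective setting and the conf file it comes from
$SPLUNK_HOME/bin/splunk btool server list general --debug | grep parallelIngestionPipelines
```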
In Splunk Cloud Platform Victoria, search heads serve as heavy forwarders. Splunk Cloud version 8.2.2201 and later uses a different input replication process: CrowdStrike FDR SQS-based S3 consumer inputs are configured globally and replicated to each member of the search head cluster, so you do not need to manage each search head separately. To add more than a single input, contact the Splunk Cloud support team to increase parallelIngestionPipelines on the search heads.
When a single SQS message consists of multiple files, a single modular input processes them sequentially, so one input stays busy even if others are idle. In this case, use instances with higher clock speeds. CrowdStrike creates a new event batch every 7-10 minutes, and in heavily loaded environments event batches with hundreds of files are common. A batch containing, for example, 300-400 files can take 4-7 hours to process, meaning that one input is busy during this time and cannot ingest the next generated batch. Make sure you have enough inputs to start processing the next batch as it appears.
Configure parallelism on data collection nodes for SQS-based manager input¶
With the SQS-based manager input, parallelism is achieved by adding more managed consumer inputs. For any parallelIngestionPipelines configuration on the ingestion node, you run a single SQS-based manager input, but you can configure it with multiple consumers. Configure at least two consumers, and add more to match the number of configured parallelIngestionPipelines.
Scaling forwarders horizontally¶
Add more forwarders to safely increase throughput in a heavy-load environment. To achieve daily throughput above 10 TB per day, configure two heavy forwarders with four inputs on each instance. Processing on the indexers can become a bottleneck; forwarding data to at least six indexers achieves better performance. If adding a new ingestion node does not increase the throughput of the whole stack, or other nodes show lower throughput per instance, it most likely means there are not enough indexers in the stack.
Cloud Stack recommendations¶
For Enterprise Cloud Platform search head instances, CPU processing is shared between search and ingestion, and more resources are reserved for search. To achieve throughput of 10 TB/day without impacting the search experience, create at least six search heads (c5.4xlarge) and six indexers (i3.8xlarge). When using a Splunk Classic stack with a single IDM, use at least a c6i.12xlarge instance with eight inputs and matching parallelization to achieve similar throughput.
Enabling index-time host resolution¶
Depending on the data, host resolution at index time can slow data ingestion by 5-35%. Performing host resolution at index time consumes CPU on the forwarder. Monitor your data ingestion before turning on this feature, and configure CrowdStrike FDR host information sync if your ingestion node reaches CPU limits. If you observe delays during ingestion, consider upgrading your setup or turning off this feature.