Performance reference for the Splunk Add-on for AWS data inputs
Many factors impact throughput performance. The rate at which the Splunk Add-on for AWS ingests input data varies depending on a number of variables: deployment topology, number of keys in a bucket, file size, file compression format, number of events in a file, event size, and hardware and networking conditions.
This section provides measured throughput data achieved under certain operating conditions, and draws rough conclusions and guidelines from those performance testing results for tuning AWS add-on throughput performance. Use the information here as a basis for estimating and optimizing the AWS add-on throughput performance in your own production environment. Because performance varies based on user characteristics, application usage, server configurations, and other factors, specific performance results cannot be guaranteed. Consult Splunk Support for accurate performance tuning and sizing.
Reference hardware and software environment
The throughput data and conclusions provided here are based on performance testing using Splunk platform instances (dedicated heavy forwarders and indexers) running on the following environment:
Property | Value |
---|---|
Instance type | M4 Quadruple Extra Large (m4.4xlarge) |
Memory | 64 GB |
Compute Units (ECU) | 53.5 |
vCPU | 16 |
Storage (GB) | 0 (EBS only) |
Architecture | 64-bit |
EBS Optimized (Max Bandwidth) | 2,000 Mbps |
Network performance | High |
The following settings are configured in outputs.conf on the heavy forwarder:
useACK = true
maxQueueSize = 15MB
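For context, a complete outputs.conf stanza with these settings might look like the following sketch. The output group name and indexer addresses are placeholders, not part of the tested environment; only useACK and maxQueueSize reflect the values used in testing.

```
[tcpout]
defaultGroup = aws_indexers

[tcpout:aws_indexers]
# Placeholder indexer addresses; point these at your own indexer tier
server = 10.0.0.11:9997,10.0.0.12:9997
# Values used in the performance tests
useACK = true
maxQueueSize = 15MB
```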
Measured performance data
The throughput data provided here is the maximum performance for each single input achieved in performance testing under specific operating conditions and is subject to change when any of the hardware and software variables changes. Use this data as a rough reference only.
Single-input max throughput
Data Input | Sourcetype | Max Throughput (KB/s) | Max EPS (events/s) | Max Throughput (GB/day) |
---|---|---|---|---|
Generic S3 | aws:elb:accesslogs (plain text, syslog, event size 250B, S3 key size 2MB) | 17,000 | 86,000 | 1,470 |
Generic S3 | aws:cloudtrail (gz, json, event size 720B, S3 key size 2MB) | 11,000 | 35,000 | 950 |
Incremental S3 | aws:elb:accesslogs (plain text, syslog, event size 250B, S3 key size 2MB) | 11,000 | 43,000 | 950 |
Incremental S3 | aws:cloudtrail (gz, json, event size 720B, S3 key size 2MB) | 7,000 | 10,000 | 600 |
SQS-based S3 | aws:elb:accesslogs (plain text, syslog, event size 250B, S3 key size 2MB) | 12,000 | 50,000 | 1,000 |
SQS-based S3 | aws:elb:accesslogs (gz, syslog, event size 250B, S3 key size 2MB) | 24,000 | 100,000 | 2,000 |
SQS-based S3 | aws:cloudtrail (gz, json, event size 720B, S3 key size 2MB) | 13,000 | 19,000 | 1,100 |
CloudWatch logs | aws:cloudwatchlog:vpcflow | 1,000 | 6,700 | 100 |
CloudWatch (ListMetric, 10,000 metrics) | aws:cloudwatch | 240 (metrics/s) | N/A | N/A |
CloudTrail | aws:cloudtrail (gz, json, sqs=1000, 9K events/key) | 5,000 | 7,000 | 400 |
Kinesis | aws:cloudwatchlog:vpcflow (json, 10 shards) | 15,000 | 125,000 | 1,200 |
SQS | aws:sqs (json, event size 2.8K) | N/A | 160 | N/A |
Multi-input max throughput
The following throughput data was measured with multiple inputs configured on a heavy forwarder in an indexer cluster distributed environment.
Configuring more AWS accounts increases CPU usage and lowers throughput performance due to increased API calls. It is recommended that you consolidate AWS accounts when configuring the Splunk Add-on for AWS.
Data Input | Sourcetype | Max Throughput (KB/s) | Max EPS (events/s) | Max Throughput (GB/day) |
---|---|---|---|---|
Generic S3 | aws:elb:accesslogs (plain text, syslog, event size 250B, S3 key size 2MB) | 23,000 | 108,000 | 1,980 |
Generic S3 | aws:cloudtrail (gz, json, event size 720B, S3 key size 2MB) | 45,000 | 130,000 | 3,880 |
Incremental S3 | aws:elb:accesslogs (plain text, syslog, event size 250B, S3 key size 2MB) | 34,000 | 140,000 | 2,930 |
Incremental S3 | aws:cloudtrail (gz, json, event size 720B, S3 key size 2MB) | 45,000 | 65,000 | 3,880 |
SQS-based S3 | aws:elb:accesslogs (plain text, syslog, event size 250B, S3 key size 2MB) | 35,000 | 144,000 | 3,000 |
SQS-based S3 | aws:elb:accesslogs (gz, syslog, event size 250B, S3 key size 2MB) | 42,000 | 190,000 | 3,600 |
SQS-based S3 | aws:cloudtrail (gz, json, event size 720B, S3 key size 2MB) | 45,000 | 68,000 | 3,900 |
CloudWatch logs | aws:cloudwatchlog:vpcflow | 1,000 | 6,700 | 100 |
CloudWatch (ListMetric) | aws:cloudwatch (10,000 metrics) | 240 (metrics/s) | N/A | N/A |
CloudTrail | aws:cloudtrail (gz, json, sqs=100, 9K events/key) | 20,000 | 15,000 | 1,700 |
Kinesis | aws:cloudwatchlog:vpcflow (json, 10 shards) | 18,000 | 154,000 | 1,500 |
SQS | aws:sqs (json, event size 2.8K) | N/A | 670 | N/A |
Max inputs benchmark per heavy forwarder
The following input number ceiling was measured with multiple inputs configured on a heavy forwarder in an indexer cluster distributed environment, where CPU and memory resources were utilized to their fullest.
If you have a smaller event size, fewer keys per bucket, or more available CPU and memory resources in your environment, you can configure more inputs than the maximum input number indicated in the table.
Data Input | Sourcetype | Format | Number of Keys/Bucket | Event Size | Max Inputs |
---|---|---|---|---|---|
S3 | aws:s3 | zip, syslog | 100K | 100B | 300 |
S3 | aws:cloudtrail | gz, json | 1,300K | 1KB | 30 |
Incremental S3 | aws:cloudtrail | gz, json | 1,300K | 1KB | 20 |
SQS-based S3 | aws:cloudtrail, aws:config | gz, json | 1,000K | 1KB | 50 |
Memory usage benchmark for generic S3 inputs
Event Size | Number of Events per Key | Total Number of Keys | Archive Type | Number of Inputs | Memory Used |
---|---|---|---|---|---|
1K | 1,000 | 10,000 | zip | 20 | 20G |
1K | 1,000 | 1,000 | zip | 20 | 12G |
1K | 1,000 | 10,000 | zip | 10 | 18G |
100B | 1,000 | 10,000 | zip | 10 | 15G |
Performance tuning and sizing guidelines
If you do not achieve the expected AWS data ingestion throughput, follow these steps to tune the throughput performance:
1. Identify the bottleneck in your system that prevents it from achieving a higher level of throughput performance. The bottleneck in AWS data ingestion may lie in one of the following components:
   - The Splunk Add-on for AWS: its capacity to pull in AWS data through API calls
   - Heavy forwarder: its capacity to parse and forward data to the indexer tier, which involves the throughput of the parsing, merging, and typing pipelines
   - Indexer: the indexing pipeline throughput

   To troubleshoot indexing performance on the heavy forwarder and indexer, refer to Troubleshooting indexing performance in the Capacity Planning Manual. A chain is only as strong as its weakest link: the capacity of the bottleneck component is the capacity of the entire system. Only by identifying and tuning the performance of the bottleneck component can you improve the overall system performance.
2. Tune the performance of the bottleneck component. If the bottleneck lies in the heavy forwarders or indexers, refer to the Summary of performance recommendations in the Capacity Planning Manual. If the bottleneck lies in the Splunk Add-on for AWS, adjust the following key factors, which usually impact AWS data input throughput:
   - Parallelization settings: To achieve optimal throughput performance, you can set the parallelIngestionPipelines value to 2 in server.conf if your resource capacity permits. For information about parallelIngestionPipelines, see Parallelization settings in the Splunk Enterprise Capacity Planning Manual. A minimal server.conf sketch appears at the end of this section.
   - AWS data inputs: When there is no shortage of resources, adding more inputs in the add-on increases throughput, but it also consumes more memory and CPU. Increase the number of inputs to improve throughput until memory or CPU runs short. If you are using SQS-based S3 inputs, you can horizontally scale out data collection by configuring more inputs on multiple heavy forwarders to consume messages from the same SQS queue.
   - Number of keys in a bucket: For both the Generic S3 and Incremental S3 inputs, the number of keys (or objects) in a bucket affects initial data collection performance. The first time a Generic or Incremental S3 input collects data from a bucket, the more keys the bucket contains, the longer the list operation takes and the more memory is consumed. A large number of keys in a bucket requires a large amount of memory during initial data collection and limits the number of inputs you can configure in the add-on. If applicable, use a log file prefix to divide the keys in a bucket into smaller groups and configure different inputs to ingest them separately, as illustrated in the inputs.conf sketch at the end of this section. For information about how to configure inputs to use a log file prefix, see Add an S3 input for Splunk Add-on for AWS. For SQS-based S3 inputs, the number of keys in a bucket is not a primary factor, because data collection can be horizontally scaled out based on messages consumed from the same SQS queue.
   - File format: Compressed files consume much more memory than plain text files.
3. When you have resolved the bottleneck, check whether the improved performance meets your requirements. If not, repeat the previous steps to identify the next bottleneck in the system and address it until you achieve the expected overall throughput performance.
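As noted in the parallelization settings factor above, the following server.conf sketch enables two ingestion pipeline sets on the heavy forwarder. The setting is shown in the [general] stanza; each additional pipeline set consumes more CPU and memory, so apply it only if your resource capacity permits, and verify the behavior for your Splunk Enterprise version in the Capacity Planning Manual.

```
[general]
# Run two ingestion pipeline sets; each pipeline set consumes additional CPU and memory
parallelIngestionPipelines = 2
```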
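To illustrate the log file prefix approach described in the "Number of keys in a bucket" factor, the following inputs.conf sketch splits one large bucket across two Generic S3 inputs by key prefix. The account name, bucket name, and prefixes are hypothetical, and the stanza and parameter names (aws_s3, aws_account, bucket_name, key_name) are assumptions about the add-on's generic S3 input; confirm them against your add-on version's inputs.conf.spec, or configure the inputs in Splunk Web as described in Add an S3 input for Splunk Add-on for AWS.

```
# Hypothetical example: each input lists only the keys under its own prefix,
# which shortens the initial list operation and reduces per-input memory use.
[aws_s3://elb_logs_us_east_1]
aws_account = aws-prod
bucket_name = my-elb-logs
key_name = AWSLogs/123456789012/elasticloadbalancing/us-east-1/
sourcetype = aws:elb:accesslogs

[aws_s3://elb_logs_us_west_2]
aws_account = aws-prod
bucket_name = my-elb-logs
key_name = AWSLogs/123456789012/elasticloadbalancing/us-west-2/
sourcetype = aws:elb:accesslogs
```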