Performance reference for the Splunk Add-on for AWS data inputs
Many factors impact throughput performance. The rate at which the Splunk Add-on for AWS ingests input data varies depending on a number of variables: deployment topology, number of keys in a bucket, file size, file compression format, number of events in a file, event size, and hardware and networking conditions.
This section provides measured throughput data achieved under certain operating conditions, and draws rough conclusions and guidelines from those performance testing results for tuning AWS add-on throughput performance. Use the information here as a basis for estimating and optimizing the AWS add-on throughput performance in your own production environment. Because performance varies based on user characteristics, application usage, server configurations, and other factors, specific performance results cannot be guaranteed. Consult Splunk Support for accurate performance tuning and sizing.
Reference hardware and software environment
The throughput data and conclusions provided here are based on performance testing using Splunk platform instances (dedicated heavy forwarders and indexers) running on the following environment:
Property | Value |
---|---|
Instance type | M4 Quadruple Extra Large (m4.4xlarge) |
Memory | 64 GB |
Compute Units (ECU) | 53.5 |
vCPU | 16 |
Storage (GB) | 0 (EBS only) |
Architecture | 64-bit |
EBS Optimized (Max Bandwidth) | 2,000 Mbps |
Network performance | High |
The following settings are configured in outputs.conf on the heavy forwarder:
useACK = true
maxQueueSize = 15MB
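For context, a complete outputs.conf stanza with these settings might look like the following sketch. The output group name and indexer addresses are placeholders, not part of the tested environment; only useACK and maxQueueSize reflect the values used in testing.

```
[tcpout]
defaultGroup = aws_indexers

[tcpout:aws_indexers]
# Placeholder indexer addresses; point these at your own indexer tier
server = 10.0.0.11:9997,10.0.0.12:9997
# Values used in the performance tests
useACK = true
maxQueueSize = 15MB
```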
Measured performance data
The throughput data provided here is the maximum performance for each single input achieved in performance testing under specific operating conditions and is subject to change when any of the hardware and software variables changes. Use this data as a rough reference only.
Single-input max throughput
Data Input | Sourcetype | Max Throughput (KB/s) | Max EPS (events/s) | Max Throughput (GB/day) |
---|---|---|---|---|
Generic S3 | aws:elb:accesslogs (plain text, syslog, event size 250B, S3 key size 2MB) | 17,000 | 86,000 | 1,470 |
Generic S3 | aws:cloudtrail (gz, json, event size 720B, S3 key size 2MB) | 11,000 | 35,000 | 950 |
Incremental S3 | aws:elb:accesslogs (plain text, syslog, event size 250B, S3 key size 2MB) | 11,000 | 43,000 | 950 |
Incremental S3 | aws:cloudtrail (gz, json, event size 720B, S3 key size 2MB) | 7,000 | 10,000 | 600 |
SQS-based S3 | aws:elb:accesslogs (plain text, syslog, event size 250B, S3 key size 2MB) | 12,000 | 50,000 | 1,000 |
SQS-based S3 | aws:elb:accesslogs (gz, syslog, event size 250B, S3 key size 2MB) | 24,000 | 100,000 | 2,000 |
SQS-based S3 | aws:cloudtrail (gz, json, event size 720B, S3 key size 2MB) | 13,000 | 19,000 | 1,100 |
CloudWatch logs | aws:cloudwatchlog:vpcflow | 1,000 | 6,700 | 100 |
CloudWatch (ListMetric, 10,000 metrics) | aws:cloudwatch | 240 (metrics/s) | N/A | N/A |
CloudTrail | aws:cloudtrail (gz, json, sqs=1000, 9K events/key) | 5,000 | 7,000 | 400 |
Kinesis | aws:cloudwatchlog:vpcflow (json, 10 shards) | 15,000 | 125,000 | 1,200 |
SQS | aws:sqs (json, event size 2.8K) | N/A | 160 | N/A |
Multi-input max throughput
The following throughput data was measured with multiple inputs configured on a heavy forwarder in an indexer cluster distributed environment.
Configuring more AWS accounts increases CPU usage and lowers throughput performance due to increased API calls. It is recommended that you consolidate AWS accounts when configuring the Splunk Add-on for AWS.
Data Input | Sourcetype | Max Throughput (KB/s) | Max EPS (events/s) | Max Throughput (GB/day) |
---|---|---|---|---|
Generic S3 | aws:elb:accesslogs (plain text, syslog, event size 250B, S3 key size 2MB) | 23,000 | 108,000 | 1,980 |
Generic S3 | aws:cloudtrail (gz, json, event size 720B, S3 key size 2MB) | 45,000 | 130,000 | 3,880 |
Incremental S3 | aws:elb:accesslogs (plain text, syslog, event size 250B, S3 key size 2MB) | 34,000 | 140,000 | 2,930 |
Incremental S3 | aws:cloudtrail (gz, json, event size 720B, S3 key size 2MB) | 45,000 | 65,000 | 3,880 |
SQS-based S3 | aws:elb:accesslogs (plain text, syslog, event size 250B, S3 key size 2MB) | 35,000 | 144,000 | 3,000 |
SQS-based S3 | aws:elb:accesslogs (gz, syslog, event size 250B, S3 key size 2MB) | 42,000 | 190,000 | 3,600 |
SQS-based S3 | aws:cloudtrail (gz, json, event size 720B, S3 key size 2MB) | 45,000 | 68,000 | 3,900 |
CloudWatch logs | aws:cloudwatchlog:vpcflow | 1,000 | 6,700 | 100 |
CloudWatch (ListMetric) | aws:cloudwatch (10,000 metrics) | 240 (metrics/s) | N/A | N/A |
CloudTrail | aws:cloudtrail (gz, json, sqs=100, 9K events/key) | 20,000 | 15,000 | 1,700 |
Kinesis | aws:cloudwatchlog:vpcflow (json, 10 shards) | 18,000 | 154,000 | 1,500 |
SQS | aws:sqs (json, event size 2.8K) | N/A | 670 | N/A |
Max inputs benchmark per heavy forwarder
The following input number ceiling was measured with multiple inputs configured on a heavy forwarder in an indexer cluster distributed environment, where CPU and memory resources were utilized to their fullest.
If you have a smaller event size, fewer keys per bucket, or more available CPU and memory resources in your environment, you can configure more inputs than the maximum input number indicated in the table.
Data Input | Sourcetype | Format | Number of Keys/Bucket | Event Size | Max Inputs |
---|---|---|---|---|---|
S3 | aws:s3 | zip, syslog | 100K | 100B | 300 |
S3 | aws:cloudtrail | gz, json | 1,300K | 1KB | 30 |
Incremental S3 | aws:cloudtrail | gz, json | 1,300K | 1KB | 20 |
SQS-based S3 | aws:cloudtrail, aws:config | gz, json | 1,000K | 1KB | 50 |
Memory usage benchmark for generic S3 inputs
Event Size | Number of Events per Key | Total Number of Keys | Archive Type | Number of Inputs | Memory Used |
---|---|---|---|---|---|
1K | 1,000 | 10,000 | zip | 20 | 20G |
1K | 1,000 | 1,000 | zip | 20 | 12G |
1K | 1,000 | 10,000 | zip | 10 | 18G |
100B | 1,000 | 10,000 | zip | 10 | 15G |
Performance tuning and sizing guidelines
If you do not achieve the expected AWS data ingestion throughput, follow these steps to tune the throughput performance:
1. Identify the bottleneck in your system that prevents it from achieving a higher level of throughput performance. The bottleneck in AWS data ingestion may lie in one of the following components:
   - The Splunk Add-on for AWS: its capacity to pull in AWS data through API calls
   - Heavy forwarder: its capacity to parse and forward data to the indexer tier, which involves the throughput of the parsing, merging, and typing pipelines
   - Indexer: the indexing pipeline throughput

   To troubleshoot indexing performance on the heavy forwarder and indexer, refer to Troubleshooting indexing performance in the Capacity Planning Manual. A chain is only as strong as its weakest link: the capacity of the bottleneck component is the capacity of the entire system. Only by identifying and tuning the performance of the bottleneck component can you improve the overall system performance.
2. Tune the performance of the bottleneck component. If the bottleneck lies in the heavy forwarders or indexers, refer to the Summary of performance recommendations in the Capacity Planning Manual. If the bottleneck lies in the Splunk Add-on for AWS, adjust the following key factors, which usually impact AWS data input throughput:
   - Parallelization settings: To achieve optimal throughput performance, you can set the parallelIngestionPipelines value to 2 in server.conf if your resource capacity permits. For information about parallelIngestionPipelines, see Parallelization settings in the Splunk Enterprise Capacity Planning Manual. A minimal server.conf sketch appears at the end of this section.
   - AWS data inputs: When there is no shortage of resources, adding more inputs in the add-on increases throughput, but it also consumes more memory and CPU. Increase the number of inputs to improve throughput until memory or CPU runs short. If you are using SQS-based S3 inputs, you can horizontally scale out data collection by configuring more inputs on multiple heavy forwarders to consume messages from the same SQS queue.
   - Number of keys in a bucket: For both the Generic S3 and Incremental S3 inputs, the number of keys (or objects) in a bucket affects initial data collection performance. The first time a Generic or Incremental S3 input collects data from a bucket, the more keys the bucket contains, the longer the list operation takes and the more memory is consumed. A large number of keys in a bucket requires a large amount of memory during initial data collection and limits the number of inputs you can configure in the add-on. If applicable, use a log file prefix to divide the keys in a bucket into smaller groups and configure different inputs to ingest them separately, as illustrated in the inputs.conf sketch at the end of this section. For information about how to configure inputs to use a log file prefix, see Add an S3 input for Splunk Add-on for AWS. For SQS-based S3 inputs, the number of keys in a bucket is not a primary factor, because data collection can be horizontally scaled out based on messages consumed from the same SQS queue.
   - File format: Compressed files consume much more memory than plain text files.
3. When you have resolved the bottleneck, check whether the improved performance meets your requirements. If not, repeat the previous steps to identify the next bottleneck in the system and address it until you achieve the expected overall throughput performance.
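As noted in the parallelization settings factor above, the following server.conf sketch enables two ingestion pipeline sets on the heavy forwarder. The setting is shown in the [general] stanza; each additional pipeline set consumes more CPU and memory, so apply it only if your resource capacity permits, and verify the behavior for your Splunk Enterprise version in the Capacity Planning Manual.

```
[general]
# Run two ingestion pipeline sets; each pipeline set consumes additional CPU and memory
parallelIngestionPipelines = 2
```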
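To illustrate the log file prefix approach described in the "Number of keys in a bucket" factor, the following inputs.conf sketch splits one large bucket across two Generic S3 inputs by key prefix. The account name, bucket name, and prefixes are hypothetical, and the stanza and parameter names (aws_s3, aws_account, bucket_name, key_name) are assumptions about the add-on's generic S3 input; confirm them against your add-on version's inputs.conf.spec, or configure the inputs in Splunk Web as described in Add an S3 input for Splunk Add-on for AWS.

```
# Hypothetical example: each input lists only the keys under its own prefix,
# which shortens the initial list operation and reduces per-input memory use.
[aws_s3://elb_logs_us_east_1]
aws_account = aws-prod
bucket_name = my-elb-logs
key_name = AWSLogs/123456789012/elasticloadbalancing/us-east-1/
sourcetype = aws:elb:accesslogs

[aws_s3://elb_logs_us_west_2]
aws_account = aws-prod
bucket_name = my-elb-logs
key_name = AWSLogs/123456789012/elasticloadbalancing/us-west-2/
sourcetype = aws:elb:accesslogs
```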