Performance reference for the Kinesis input in the Splunk Add-on for AWS

This page provides reference information about Splunk's performance testing of the Kinesis input in the Splunk Add-on for AWS. The testing was performed on version 4.0.0, when the Kinesis input was first introduced. Use this information to tune the performance of your own Kinesis data collection tasks.

Note

Many factors impact performance results, including file size, file compression, event size, deployment architecture, and hardware. These results represent reference information and do not represent performance in all environments.

Summary

While results in different environments vary, Splunk’s performance testing of the Kinesis input showed the following:

Each Kinesis input can handle up to 6 MB/s of data, with a daily ingestion volume of 500 GB.
Adding shards improves performance only slightly. Use 3 shards for large streams.

Testing architecture

The performance of the Kinesis input was tested on a single-instance Splunk Enterprise 6.4.0 deployment running on an m4.4xlarge AWS EC2 instance, sized so that CPU, memory, storage, and network did not introduce bottlenecks. The instance specifications were as follows:

Instance type	M4 Quadruple Extra Large (m4.4xlarge)
Memory	64 GB
ECU	53.5
Cores	16
Storage	0 GB (EBS only)
Architecture	64-bit
Network performance	High
EBS Optimized: Max Bandwidth	250 MB/s

Test scenario

The test ingested high-volume VPC flow logs through a Kinesis stream, using the following parameters:

Shard counts: 3, 5, and 10
Event size: 120 bytes per event
Number of events: 20,000,000
Compression: gzip
Initial stream position: TRIM_HORIZON
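
To get a feel for the size of this payload, the following minimal sketch builds synthetic 120-byte records and gzip-compresses a small batch. The record layout and the helper names (`make_event`, `make_batch`) are illustrative assumptions, not the generator used in Splunk's testing.

```python
import gzip

EVENT_SIZE = 120            # bytes per event, from the test scenario
TOTAL_EVENTS = 20_000_000   # number of events, from the test scenario


def make_event(seq: int) -> bytes:
    """Return one synthetic flow-log-like record of exactly EVENT_SIZE bytes.

    The field layout is made up for illustration; the trailing newline
    counts toward the 120 bytes.
    """
    body = f"2 123456789012 eni-0abc1234 10.0.0.1 10.0.0.2 443 {seq % 65536} 6 10 840 ACCEPT OK"
    line = body.ljust(EVENT_SIZE - 1, "-") + "\n"
    return line.encode("ascii")


def make_batch(count: int) -> bytes:
    """Concatenate `count` synthetic events into one uncompressed payload."""
    return b"".join(make_event(i) for i in range(count))


if __name__ == "__main__":
    raw = make_batch(1000)
    packed = gzip.compress(raw)
    print(f"raw batch: {len(raw)} bytes, gzip: {len(packed)} bytes")
    # Uncompressed size of the whole test set, in decimal gigabytes.
    print(f"full test set: {EVENT_SIZE * TOTAL_EVENTS / 1e9:.1f} GB uncompressed")
```

The padded records here compress far better than real flow logs would, so treat the gzip ratio as a toy figure. The uncompressed total (20,000,000 × 120 bytes = 2.4 GB) follows directly from the parameters above.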

AWS reports that each shard is limited to 5 read transactions per second, up to a maximum read rate of 2 MB per second. Thus, with 10 shards, the theoretical upper limit for reads is 20 MB per second.
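
The per-shard limits scale linearly with shard count. As a small sketch of that arithmetic for the tested shard counts (the function name `theoretical_read_limits` is a hypothetical helper, not part of the add-on or of any AWS SDK):

```python
SHARD_READ_MB_PER_S = 2      # per-shard read rate limit cited above
SHARD_READ_CALLS_PER_S = 5   # per-shard read transactions per second cited above


def theoretical_read_limits(shards: int) -> dict:
    """Return the aggregate read ceilings for a stream with `shards` shards."""
    return {
        "shards": shards,
        "max_mb_per_s": shards * SHARD_READ_MB_PER_S,
        "max_read_calls_per_s": shards * SHARD_READ_CALLS_PER_S,
    }


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(theoretical_read_limits(n))
```

For 3, 5, and 10 shards, this gives read ceilings of 6, 10, and 20 MB per second respectively.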

Test results

At peak, the data ingestion rate reached 6 million events per minute (100,000 events per second). With each event measuring 120 bytes, this corresponds to a maximum throughput on the order of 10 MB/s (100,000 × 120 bytes = 12 MB/s of raw event data).

For a single Kinesis modular input, the average throughput was 6 MB/s, resulting in approximately 500 GB of daily ingestion.
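
The throughput figures above can be cross-checked with a few lines of arithmetic. This sketch assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^3 MB); the function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def events_to_mb_per_s(events_per_s: float, event_bytes: int) -> float:
    """Convert an event rate into MB/s, assuming 1 MB = 10**6 bytes."""
    return events_per_s * event_bytes / 1_000_000


def daily_volume_gb(mb_per_s: float) -> float:
    """Convert a sustained MB/s rate into GB/day, assuming 1 GB = 10**3 MB."""
    return mb_per_s * SECONDS_PER_DAY / 1_000


if __name__ == "__main__":
    peak_events_per_s = 6_000_000 / 60           # 6 million events per minute
    print(events_to_mb_per_s(peak_events_per_s, 120))  # raw peak rate in MB/s
    print(daily_volume_gb(6))                    # sustained 6 MB/s over one day
```

Under these assumptions, the peak event rate works out to 12 MB/s of raw event data, somewhat above the rounded 10 MB/s figure, and a sustained 6 MB/s works out to about 518 GB per day, consistent with the approximate 500 GB daily volume.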

Reducing the shard count from 10 to 3 decreased throughput by about 10%.

During testing, resource usage on the instance was as follows:

Normalized CPU usage of approximately 30%
Python memory usage of approximately 700 MB

The indexer is the largest consumer of CPU, and the modular input is the largest consumer of memory.

Note

AWS throws a ProvisionedThroughputExceededException if a call returns 10 MB of data and subsequent calls are made within the next 5 seconds. During testing with 3 shards, this error occurred only once every 1 to 5 minutes.
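
A common way for a custom consumer to cope with this kind of throttling is to retry with exponential backoff. The sketch below illustrates the pattern only; it is not how the add-on handles the error. `ThrottledError` is a local stand-in for the AWS exception, and `call_with_backoff` and its parameters are hypothetical names.

```python
import time


class ThrottledError(Exception):
    """Local stand-in for ProvisionedThroughputExceededException (hypothetical)."""


def call_with_backoff(fetch, max_attempts=5, base_delay=1.0, max_delay=5.0,
                      sleep=time.sleep):
    """Call `fetch()` and retry with exponential backoff when it is throttled.

    `fetch` is any zero-argument callable that raises ThrottledError when the
    read is throttled. The delay doubles on each retry and is capped at
    `max_delay`; the last failure is re-raised to the caller.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_fetch():
        # Simulate two throttled reads followed by a successful one.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ThrottledError()
        return ["record-1", "record-2"]

    print(call_with_backoff(flaky_fetch, base_delay=0.01))
```

The 5-second cap on the delay is chosen to match the 5-second window described in the note. With boto3, `fetch` could wrap a `get_records` call and the `except` clause could catch `client.exceptions.ProvisionedThroughputExceededException` instead of the stand-in class; that wiring is an assumption and is not shown here.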