Configure Kinesis inputs for the Splunk Add-on for AWS¶
Complete the steps to configure Kinesis inputs for the Splunk Add-on for Amazon Web Services (AWS):
- You must manage accounts for the add-on as a prerequisite. See Manage accounts for the Splunk Add-on for AWS.
- Configure AWS services for the Kinesis input.
- Configure AWS permissions for the Kinesis input.
- (Optional) Configure VPC Interface Endpoints for STS and Kinesis services from your AWS Console if you want to use private endpoints for data collection and authentication. For more information, see the Interface VPC endpoints (AWS PrivateLink) topic in the Amazon Virtual Private Cloud documentation.
- Configure Kinesis inputs either through Splunk Web or configuration files.
Kinesis is the recommended input type for collecting VPC Flow Logs. This input type also supports the collection of custom data types through Kinesis streams.
This data source is available only in a subset of AWS regions. For a full list of supported regions, see https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/.
The Kinesis data input only supports gzip compression or plaintext data. It cannot ingest data with other encodings, nor can it ingest data with a mix of gzip and plaintext in the same input. Create separate Kinesis inputs for gzip data and plaintext data.
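Because a single input cannot mix gzip and plaintext records, it can help to check how a sample record is encoded before deciding which input it belongs to. Here is a minimal sketch using only the Python standard library; the helper name `is_gzip` is illustrative and not part of the add-on:

```python
import gzip

def is_gzip(payload: bytes) -> bool:
    """Return True if the payload starts with the gzip magic bytes (0x1f 0x8b)."""
    return payload[:2] == b"\x1f\x8b"

# Example: a gzip-compressed record versus a plaintext record.
compressed = gzip.compress(b'{"version": 2, "account-id": "123456789012"}')
plaintext = b'{"version": 2, "account-id": "123456789012"}'

print(is_gzip(compressed))  # True -> belongs in the gzip input
print(is_gzip(plaintext))   # False -> belongs in the plaintext input
```

Records that fail both checks (for example, data in another compression format) cannot be ingested by this input at all.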
See the Performance for the Kinesis input in the Splunk Add-on for AWS section of this page for reference data to enhance the performance of your own Kinesis data collection task.
Configure AWS permissions for the Kinesis input¶
Required permissions for Amazon Kinesis:
- Get*
- DescribeStream
- ListStreams
See the following sample inline policy to configure Kinesis input permissions:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "kinesis:Get*",
                    "kinesis:DescribeStream",
                    "kinesis:ListStreams"
                ],
                "Resource": "*"
            }
        ]
    }
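Before attaching the policy, you can sanity-check that the document grants all three required Kinesis actions by inspecting the JSON programmatically. A sketch using only the standard library (the variable names are illustrative):

```python
import json

# The sample inline policy from above.
policy = json.loads("""
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kinesis:Get*",
                "kinesis:DescribeStream",
                "kinesis:ListStreams"
            ],
            "Resource": "*"
        }
    ]
}
""")

required = {"kinesis:Get*", "kinesis:DescribeStream", "kinesis:ListStreams"}

# Collect every action granted by an Allow statement.
granted = {
    action
    for stmt in policy["Statement"]
    if stmt["Effect"] == "Allow"
    for action in stmt["Action"]
}

missing = required - granted
print("missing actions:", missing or "none")
```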
Configure a Kinesis input using Splunk Web¶
To configure inputs in Splunk Web:
- Click Splunk Add-on for AWS in the navigation bar on Splunk Web home.
- Choose one of the following menu paths depending on which data type you want to collect:
- Create New Input > VPC Flow Logs > Kinesis
- Create New Input > Others > Kinesis
Use the following table to complete the fields for the new input in Splunk Web or the corresponding arguments in aws_kinesis_tasks.conf:

| Argument in configuration file | Field in Splunk Web | Description |
| --- | --- | --- |
| account | AWS Account | The AWS account or EC2 IAM role the Splunk platform uses to access your Kinesis data. In Splunk Web, select an account from the drop-down list. In aws_kinesis_tasks.conf, enter the friendly name of one of the AWS accounts that you configured on the Configuration page or the name of the automatically discovered EC2 IAM role. |
| aws_iam_role | Assume Role | The IAM role to assume. See Manage accounts for the Splunk Add-on for AWS. |
| region | AWS Region | The AWS region that contains the Kinesis streams. In aws_kinesis_tasks.conf, enter the region ID. See https://docs.aws.amazon.com/general/latest/gr/rande.html#d0e371. |
| private_endpoint_enabled | Use Private Endpoints | Select the checkbox to use private endpoints of the AWS Security Token Service (STS) and Amazon Kinesis services for authentication and data collection. In aws_kinesis_tasks.conf, set this argument to 1 to enable private endpoints. |
| kinesis_private_endpoint_url | Private Endpoint (Kinesis) | The private endpoint (interface VPC endpoint) of your Kinesis service, which you can configure from your AWS console. |
| sts_private_endpoint_url | Private Endpoint (STS) | The private endpoint (interface VPC endpoint) of your STS service, which you can configure from your AWS console. |
| stream_names | Stream Names | The name of the Kinesis stream or streams to collect. Leave blank to collect from all streams available in the region. |
| encoding | Encoding | The encoding of the stream data. Set to gzip for gzip-compressed data, or leave blank for plaintext data. |
| init_stream_position | Initial Stream Position | LATEST or TRIM_HORIZON. LATEST starts data collection from the point at which the input is enabled. TRIM_HORIZON starts collecting from the oldest data record. |
| format | Record Format | CloudWatchLogs or none. If you choose CloudWatchLogs, the add-on parses the data in CloudWatchLogs format. |
| metric_index_flag | Use Metric Index? | Whether to use a metric index or an event index. The default value is No (use an event index). This field is visible only when creating VPC Flow Logs > Kinesis inputs. |
| sourcetype | Source type | A source type for the events. Use aws:cloudwatchlogs:vpcflow if you are collecting VPC Flow Logs, or aws:kinesis if you are collecting any other Kinesis data. |
| index | Index | The index name where the Splunk platform puts the Kinesis data. The default is main. |
Configure a Kinesis input using configuration files¶
To configure the input using configuration files, create $SPLUNK_HOME/etc/apps/Splunk_TA_aws/local/aws_kinesis_tasks.conf using the following template:
[<name>]
account = <value>
aws_iam_role = <value>
region = <value>
private_endpoint_enabled = <value>
kinesis_private_endpoint_url = <value>
sts_private_endpoint_url = <value>
stream_names = <value>
encoding = <value>
init_stream_position = <value>
format = <value>
sourcetype = <value>
index = <value>
metric_index_flag = <value>
Here is an example stanza that collects Kinesis data for all streams available in the region:
[splunkapp2:us-east-1]
account = splunkapp2
region = us-east-1
encoding =
init_stream_position = LATEST
index = aws
format = CloudWatchLogs
sourcetype = aws:kinesis
metric_index_flag = 0
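Because aws_kinesis_tasks.conf uses standard .conf stanza syntax, you can validate a stanza before deploying it. A sketch that parses the example stanza above with Python's built-in configparser (the validation logic is illustrative, not part of the add-on):

```python
import configparser

# The example stanza from above, as it would appear in aws_kinesis_tasks.conf.
stanza_text = """
[splunkapp2:us-east-1]
account = splunkapp2
region = us-east-1
encoding =
init_stream_position = LATEST
index = aws
format = CloudWatchLogs
sourcetype = aws:kinesis
metric_index_flag = 0
"""

parser = configparser.ConfigParser()
parser.read_string(stanza_text)

task = parser["splunkapp2:us-east-1"]
print(task["region"])                 # us-east-1
print(task["init_stream_position"])   # LATEST
print(task["encoding"] == "")         # True: blank encoding means plaintext data
```

Note that stream_names is absent from the stanza, which is why this input collects data from all streams available in the region.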
Performance for the Kinesis input in the Splunk Add-on for AWS¶
This section provides reference information about performance testing of the Kinesis input in the Splunk Add-on for AWS. Testing was performed on version 4.0.0, when the Kinesis input was first introduced. You can use this information to enhance the performance of your own Kinesis data collection tasks.
Many factors impact performance results, including file size, file compression, event size, deployment architecture, and hardware. These results represent reference information and do not represent performance in all environments.
Summary¶
While results in different environments will vary, the performance testing of the Kinesis input showed the following:
- Each Kinesis input can handle up to 6 MB/s of data, with a daily ingestion volume of 500 GB.
- More shards can slightly improve performance. Three shards are recommended for large streams.
Testing architecture¶
Splunk tested the performance of the Kinesis input using a single-instance Splunk Enterprise 6.4.0 on an m4.4xlarge AWS EC2 instance to ensure CPU, memory, storage, and network did not introduce any bottlenecks. See the following instance specs:
| Instance type | M4 Quadruple Extra Large (m4.4xlarge) |
| --- | --- |
| Memory | 64 GB |
| ECU | 53.5 |
| Cores | 16 |
| Storage | 0 GB (EBS only) |
| Architecture | 64-bit |
| Network performance | High |
| EBS Optimized: Max Bandwidth | 250 MB/s |
Test scenario¶
Splunk tested the following parameters to target the use case of high-volume VPC flow logs ingested through a Kinesis stream:
- Shard numbers: 3, 5, and 10 shards
- Event size: 120 bytes per event
- Number of events: 20,000,000
- Compression: gzip
- Initial stream position: TRIM_HORIZON
AWS reports that each shard is limited to 5 read transactions per second, up to a maximum read rate of 2 MB per second. Thus, with 10 shards, the theoretical upper limit is 20 MB per second.
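The per-shard limit makes the theoretical ceiling easy to compute: the aggregate read rate scales linearly with shard count at 2 MB/s per shard. A quick check of the figures above:

```python
PER_SHARD_READ_MBPS = 2  # AWS per-shard read limit, in MB/s

def max_read_mbps(shards: int) -> int:
    """Theoretical aggregate read ceiling for a stream with the given shard count."""
    return shards * PER_SHARD_READ_MBPS

for shards in (3, 5, 10):
    print(f"{shards} shards -> {max_read_mbps(shards)} MB/s ceiling")
# 10 shards -> 20 MB/s, matching the theoretical upper limit stated above.
```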
Test results¶
Splunk observed a data ingestion rate of 6 million events per minute at peak, or 100,000 events per second. Because each event is 120 bytes, this indicates a maximum throughput of 12 MB/s.
Splunk observed an average throughput of 6 MB/s for a single Kinesis modular input, or a daily ingestion throughput of approximately 500 GB.
After reducing the shard number from 10 shards to 3 shards, Splunk observed a throughput downgrade of approximately 10%.
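The observed averages are internally consistent: sustaining 6 MB/s for a full day yields roughly 500 GB of daily ingestion, as a quick calculation shows.

```python
avg_mbps = 6                      # observed average throughput, in MB/s
seconds_per_day = 24 * 60 * 60    # 86,400 seconds

daily_gb = avg_mbps * seconds_per_day / 1000  # MB -> GB (decimal units)
print(f"{daily_gb:.0f} GB/day")   # ~518 GB/day, i.e. approximately 500 GB
```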
During testing, Splunk observed the following resource usage on the instance:
- Normalized CPU usage of approximately 30%
- Python memory usage of approximately 700 MB
The indexer is the largest consumer of CPU, and the modular input is the largest consumer of memory.
AWS throws a ProvisionedThroughputExceededException if a call returns 10 MB of data and subsequent calls are made within the next 5 seconds. While testing with three shards, Splunk observed this error only once every 1 to 5 minutes.