Configure Kinesis inputs for the Splunk Add-on for AWS¶
Complete the steps to configure Kinesis inputs for the Splunk Add-on for Amazon Web Services (AWS):
- You must manage accounts for the add-on as a prerequisite. See Manage accounts for the Splunk Add-on for AWS.
- Configure AWS services for the Kinesis input.
- Configure AWS permissions for the Kinesis input.
- (Optional) Configure VPC Interface Endpoints for STS and Kinesis services from your AWS Console if you want to use private endpoints for data collection and authentication. For more information, see the Interface VPC endpoints (AWS PrivateLink) topic in the Amazon Virtual Private Cloud documentation.
- Configure Kinesis inputs either through Splunk Web or configuration files.
Kinesis is the recommended input type for collecting VPC Flow Logs. This input type also supports the collection of custom data types through Kinesis streams.
This data source is available only in a subset of AWS regions. For a full list of supported regions, see https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/.
The Kinesis data input only supports gzip compression or plaintext data. It cannot ingest data with other encodings, nor can it ingest data with a mix of gzip and plaintext in the same input. Create separate Kinesis inputs for gzip data and plaintext data.
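Because a single input cannot mix gzip and plaintext records, it can help to check how a sample record is encoded before deciding which input it belongs to. Here is a minimal sketch using only the Python standard library; the helper name `is_gzip` is illustrative and not part of the add-on:

```python
import gzip

def is_gzip(payload: bytes) -> bool:
    """Return True if the payload starts with the gzip magic bytes (0x1f 0x8b)."""
    return payload[:2] == b"\x1f\x8b"

# Example: a gzip-compressed record versus a plaintext record.
compressed = gzip.compress(b'{"version": 2, "account-id": "123456789012"}')
plaintext = b'{"version": 2, "account-id": "123456789012"}'

print(is_gzip(compressed))  # True -> belongs in the gzip input
print(is_gzip(plaintext))   # False -> belongs in the plaintext input
```

Records that fail both checks (for example, data in another compression format) cannot be ingested by this input at all.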
See the Performance for the Kinesis input in the Splunk Add-on for AWS section of this page for reference data to enhance the performance of your own Kinesis data collection task.
Configure AWS permissions for the Kinesis input¶
Required permissions for Amazon Kinesis:
- Get*
- DescribeStream
- ListStreams
See the following sample inline policy to configure Kinesis input permissions:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "kinesis:Get*",
                    "kinesis:DescribeStream",
                    "kinesis:ListStreams"
                ],
                "Resource": "*"
            }
        ]
    }
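Before attaching the policy, you can sanity-check that the document grants all three required Kinesis actions by inspecting the JSON programmatically. A sketch using only the standard library (the variable names are illustrative):

```python
import json

# The sample inline policy from above.
policy = json.loads("""
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kinesis:Get*",
                "kinesis:DescribeStream",
                "kinesis:ListStreams"
            ],
            "Resource": "*"
        }
    ]
}
""")

required = {"kinesis:Get*", "kinesis:DescribeStream", "kinesis:ListStreams"}

# Collect every action granted by an Allow statement.
granted = {
    action
    for stmt in policy["Statement"]
    if stmt["Effect"] == "Allow"
    for action in stmt["Action"]
}

missing = required - granted
print("missing actions:", missing or "none")
```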
Configure a Kinesis input using Splunk Web¶
To configure inputs in Splunk Web:
- Click Splunk Add-on for AWS in the navigation bar on Splunk Web home.
- Choose one of the following menu paths depending on which data type you want to collect:
- Create New Input > VPC Flow Logs > Kinesis
- Create New Input > Others > Kinesis
Use the following table to complete the fields for the new input in Splunk Web or the corresponding arguments in aws_kinesis_tasks.conf:

| Argument in configuration file | Field in Splunk Web | Description |
| --- | --- | --- |
| account | AWS Account | The AWS account or EC2 IAM role the Splunk platform uses to access your Kinesis data. In Splunk Web, select an account from the drop-down list. In aws_kinesis_tasks.conf, enter the friendly name of one of the AWS accounts that you configured on the Configuration page or the name of the automatically discovered EC2 IAM role. |
| aws_iam_role | Assume Role | The IAM role to assume. See Manage accounts for the Splunk Add-on for AWS. |
| region | AWS Region | The AWS region that contains the Kinesis streams. In aws_kinesis_tasks.conf, enter the region ID. See https://docs.aws.amazon.com/general/latest/gr/rande.html#d0e371. |
| private_endpoint_enabled | Use Private Endpoints | Select the checkbox to use private endpoints of the AWS Security Token Service (STS) and Amazon Kinesis services for authentication and data collection. In aws_kinesis_tasks.conf, set this argument to 1 to enable private endpoints. |
| kinesis_private_endpoint_url | Private Endpoint (Kinesis) | The private endpoint (interface VPC endpoint) of your Kinesis service, which you can configure from your AWS console. |
| sts_private_endpoint_url | Private Endpoint (STS) | The private endpoint (interface VPC endpoint) of your STS service, which you can configure from your AWS console. |
| stream_names | Stream Names | The name of the Kinesis stream or streams to collect. Leave blank to collect from all streams available in the region. |
| encoding | Encoding | The encoding of the stream data. Set to gzip for gzip-compressed data, or leave blank for plaintext data. |
| init_stream_position | Initial Stream Position | LATEST or TRIM_HORIZON. LATEST starts data collection from the point at which the input is enabled. TRIM_HORIZON starts collecting from the oldest data record. |
| format | Record Format | CloudWatchLogs or none. If you choose CloudWatchLogs, the add-on parses the data in CloudWatchLogs format. |
| metric_index_flag | Use Metric Index? | Whether to use a metric index or an event index. The default value is No (use an event index). This field is visible only when creating VPC Flow Logs > Kinesis inputs. |
| sourcetype | Source type | A source type for the events. Use aws:cloudwatchlogs:vpcflow if you are collecting VPC Flow Logs, or aws:kinesis if you are collecting any other Kinesis data. |
| index | Index | The index name where the Splunk platform puts the Kinesis data. The default is main. |
Configure a Kinesis input using configuration files¶
To configure the input using configuration files, create $SPLUNK_HOME/etc/apps/Splunk_TA_aws/local/aws_kinesis_tasks.conf using the following template:
[<name>]
account = <value>
aws_iam_role = <value>
region = <value>
private_endpoint_enabled = <value>
kinesis_private_endpoint_url = <value>
sts_private_endpoint_url = <value>
stream_names = <value>
encoding = <value>
init_stream_position = <value>
format = <value>
sourcetype = <value>
index = <value>
metric_index_flag = <value>
Here is an example stanza that collects Kinesis data for all streams available in the region:
[splunkapp2:us-east-1]
account = splunkapp2
region = us-east-1
encoding =
init_stream_position = LATEST
index = aws
format = CloudWatchLogs
sourcetype = aws:kinesis
metric_index_flag = 0
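Because aws_kinesis_tasks.conf uses standard .conf stanza syntax, you can validate a stanza before deploying it. A sketch that parses the example stanza above with Python's built-in configparser (the validation logic is illustrative, not part of the add-on):

```python
import configparser

# The example stanza from above, as it would appear in aws_kinesis_tasks.conf.
stanza_text = """
[splunkapp2:us-east-1]
account = splunkapp2
region = us-east-1
encoding =
init_stream_position = LATEST
index = aws
format = CloudWatchLogs
sourcetype = aws:kinesis
metric_index_flag = 0
"""

parser = configparser.ConfigParser()
parser.read_string(stanza_text)

task = parser["splunkapp2:us-east-1"]
print(task["region"])                 # us-east-1
print(task["init_stream_position"])   # LATEST
print(task["encoding"] == "")         # True: blank encoding means plaintext data
```

Note that stream_names is absent from the stanza, which is why this input collects data from all streams available in the region.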
Performance for the Kinesis input in the Splunk Add-on for AWS¶
This section provides reference information about performance testing of the Kinesis input in the Splunk Add-on for AWS. Testing was performed on version 4.0.0, when the Kinesis input was first introduced. You can use this information to enhance the performance of your own Kinesis data collection tasks.
Many factors impact performance results, including file size, file compression, event size, deployment architecture, and hardware. These results represent reference information and do not represent performance in all environments.
Summary¶
While results in different environments will vary, the performance testing of the Kinesis input showed the following:
- Each Kinesis input can handle up to 6 MB/s of data, with a daily ingestion volume of 500 GB.
- More shards can slightly improve performance. Three shards are recommended for large streams.
Testing architecture¶
Splunk tested the performance of the Kinesis input using a single-instance Splunk Enterprise 6.4.0 on an m4.4xlarge AWS EC2 instance to ensure CPU, memory, storage, and network did not introduce any bottlenecks. See the following instance specs:
| Instance type | M4 Quadruple Extra Large (m4.4xlarge) |
| --- | --- |
| Memory | 64 GB |
| ECU | 53.5 |
| Cores | 16 |
| Storage | 0 GB (EBS only) |
| Architecture | 64-bit |
| Network performance | High |
| EBS Optimized: Max Bandwidth | 250 MB/s |
Test scenario¶
Splunk tested the following parameters to target the use case of high-volume VPC flow logs ingested through a Kinesis stream:
- Shard numbers: 3, 5, and 10 shards
- Event size: 120 bytes per event
- Number of events: 20,000,000
- Compression: gzip
- Initial stream position: TRIM_HORIZON
AWS reports that each shard is limited to 5 read transactions per second, up to a maximum read rate of 2 MB per second. Thus, with 10 shards, the theoretical upper limit is 20 MB per second.
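The per-shard limit makes the theoretical ceiling easy to compute: the aggregate read rate scales linearly with shard count at 2 MB/s per shard. A quick check of the figures above:

```python
PER_SHARD_READ_MBPS = 2  # AWS per-shard read limit, in MB/s

def max_read_mbps(shards: int) -> int:
    """Theoretical aggregate read ceiling for a stream with the given shard count."""
    return shards * PER_SHARD_READ_MBPS

for shards in (3, 5, 10):
    print(f"{shards} shards -> {max_read_mbps(shards)} MB/s ceiling")
# 10 shards -> 20 MB/s, matching the theoretical upper limit stated above.
```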
Test results¶
Splunk observed a data ingestion rate of 6 million events per minute at peak, or 100,000 events per second. Because each event is 120 bytes, this indicates a maximum throughput of 12 MB/s.
Splunk observed an average throughput of 6 MB/s for a single Kinesis modular input, or a daily ingestion throughput of approximately 500 GB.
After reducing the shard number from 10 shards to 3 shards, Splunk observed a throughput downgrade of approximately 10%.
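The observed averages are internally consistent: sustaining 6 MB/s for a full day yields roughly 500 GB of daily ingestion, as a quick calculation shows.

```python
avg_mbps = 6                      # observed average throughput, in MB/s
seconds_per_day = 24 * 60 * 60    # 86,400 seconds

daily_gb = avg_mbps * seconds_per_day / 1000  # MB -> GB (decimal units)
print(f"{daily_gb:.0f} GB/day")   # ~518 GB/day, i.e. approximately 500 GB
```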
During testing, Splunk observed the following resource usage on the instance:
- Normalized CPU usage of approximately 30%
- Python memory usage of approximately 700 MB
The indexer is the largest consumer of CPU, and the modular input is the largest consumer of memory.
AWS throws a ProvisionedThroughputExceededException if a call returns 10 MB of data and subsequent calls are made within the next 5 seconds. While testing with three shards, Splunk observed this error only once every 1 to 5 minutes.