Configure Kinesis inputs for the Splunk Add-on for AWS

Complete the steps to configure Kinesis inputs for the Splunk Add-on for Amazon Web Services (AWS):

  1. Manage accounts for the add-on as a prerequisite. See Manage accounts for the Splunk Add-on for AWS.
  2. Configure AWS services for the Kinesis input.
  3. Configure AWS permissions for the Kinesis input.
  4. (Optional) Configure VPC Interface Endpoints for STS and Kinesis services from your AWS Console if you want to use private endpoints for data collection and authentication. For more information, see the Interface VPC endpoints (AWS PrivateLink) topic in the Amazon Virtual Private Cloud documentation.
  5. Configure Kinesis inputs either through Splunk Web or configuration files.

Kinesis is the recommended input type for collecting VPC Flow Logs. This input type also supports the collection of custom data types through Kinesis streams.

This data source is available only in a subset of AWS regions. For a full list of supported regions, see https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/.

The Kinesis data input only supports gzip compression or plaintext data. It cannot ingest data with other encodings, nor can it ingest data with a mix of gzip and plaintext in the same input. Create separate Kinesis inputs for gzip data and plaintext data.
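
Because a single input cannot mix gzip and plaintext records, it can help to sample a stream before you create the input. The following is a minimal sketch, not part of the add-on, that assumes you already have raw record payloads as bytes and that gzip data can be recognized by its standard two-byte magic header. The function names are hypothetical.

```python
import gzip

# Standard gzip magic number (first two bytes of any gzip member).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(payload: bytes) -> bool:
    """Return True if the payload starts with the gzip magic number."""
    return payload[:2] == GZIP_MAGIC


def decode_record(payload: bytes) -> str:
    """Decompress the payload if it is gzip, then decode it as UTF-8 text."""
    if is_gzip(payload):
        payload = gzip.decompress(payload)
    return payload.decode("utf-8")


def classify_sample(records) -> str:
    """Classify sampled payloads as 'gzip', 'plaintext', 'mixed', or 'empty'."""
    kinds = {"gzip" if is_gzip(record) else "plaintext" for record in records}
    if not kinds:
        return "empty"
    if len(kinds) > 1:
        return "mixed"
    return kinds.pop()
```

If a sample from one stream is classified as mixed, that stream does not fit a single Kinesis input as described above.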

See the "Performance for the Kinesis input in the Splunk Add-on for AWS" section of this page for reference data that can help you tune the performance of your own Kinesis data collection tasks.

Configure AWS permissions for the Kinesis input

Required permissions for Amazon Kinesis:

  • Get*
  • DescribeStream
  • ListStreams

See the following sample inline policy to configure Kinesis input permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kinesis:Get*",
                "kinesis:DescribeStream",
                "kinesis:ListStreams"
            ],
            "Resource": "*"
        }
    ]
}
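
To sanity-check a policy document before attaching it, you could compare its allowed actions against the permissions listed above. The following sketch is a simplified local check, not an AWS API call. It only looks at Allow statements and ignores Deny statements, conditions, and resources. The helper names are hypothetical, and GetRecords and GetShardIterator are used here as representative actions covered by Get*.

```python
import json
from fnmatch import fnmatchcase

# Representative actions for the permissions listed above (an assumption:
# GetRecords and GetShardIterator stand in for the Get* wildcard).
REQUIRED_ACTIONS = [
    "kinesis:GetRecords",
    "kinesis:GetShardIterator",
    "kinesis:DescribeStream",
    "kinesis:ListStreams",
]

SAMPLE_POLICY = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["kinesis:Get*", "kinesis:DescribeStream", "kinesis:ListStreams"],
            "Resource": "*"
        }
    ]
}
"""


def _as_list(value):
    """IAM allows a single item or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def missing_actions(policy: dict, required=REQUIRED_ACTIONS) -> list:
    """Return the required actions that no Allow statement covers."""
    patterns = []
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") == "Allow":
            patterns.extend(a.lower() for a in _as_list(statement.get("Action", [])))
    # Action names are compared case-insensitively; '*' acts as a wildcard.
    return [
        action for action in required
        if not any(fnmatchcase(action.lower(), pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    print(missing_actions(json.loads(SAMPLE_POLICY)))
```

An empty list means every listed action is covered by at least one Allow statement in the document.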

Configure a Kinesis input using Splunk Web

To configure inputs in Splunk Web:

  1. Click Splunk Add-on for AWS in the navigation bar on Splunk Web home.
  2. Choose one of the following menu paths depending on which data type you want to collect:
  • Create New Input > VPC Flow Logs > Kinesis
  • Create New Input > Others > Kinesis

Use the following table to complete the fields for the new input in the .conf file or in Splunk Web:

Argument in configuration file

Field in Splunk Web

Description

account

AWS Account

The AWS account or EC2 IAM role the Splunk platform uses to access your Kinesis data. In Splunk Web, select an account from the drop-down list. In aws_kinesis_tasks.conf, enter the friendly name of one of the AWS accounts that you configured on the Configuration page or the name of the automatically discovered EC2 IAM role.

aws_iam_role

Assume Role

The IAM role to assume. See Manage accounts for the Splunk Add-on for AWS.

region

AWS Region

The AWS region that contains the Kinesis streams. In aws_kinesis_tasks.conf, enter the region ID. See https://docs.aws.amazon.com/general/latest/gr/rande.html#d0e371.

private_endpoint_enabled

Use Private Endpoints

Select the checkbox to use private endpoints of the AWS Security Token Service (STS) and Amazon Kinesis services for authentication and data collection. In aws_kinesis_tasks.conf, enter 0 to disable or 1 to enable the use of private endpoints.

kinesis_private_endpoint_url

Private Endpoint (Kinesis)

Private Endpoint (Interface VPC Endpoint) of your Kinesis service, which can be configured from your AWS console.
Supported formats:
<protocol>://vpce-<endpoint_id>-<unique_id>.kinesis.<region>.vpce.amazonaws.com
<protocol>://vpce-<endpoint_id>-<unique_id>-<availability_zone>.kinesis.<region>.vpce.amazonaws.com

sts_private_endpoint_url

Private Endpoint (STS)

Private Endpoint (Interface VPC Endpoint) of your STS service, which can be configured from your AWS console.
Supported formats:
<protocol>://vpce-<endpoint_id>-<unique_id>.sts.<region>.vpce.amazonaws.com

stream_names

Stream Names

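The name of the Kinesis stream to collect data from. The example stanza on this page omits this argument to collect data from all streams available in the region.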

encoding

Encoding

The encoding of the stream data. Set to gzip or leave blank, which defaults to Base64. All stream data that you collect in a single input must have the same encoding. If you are collecting VPC Flow Logs data through this input, encoding is typically gzip.

init_stream_position

Initial Stream Position

LATEST or TRIM_HORIZON. LATEST starts data collection from the point the input is enabled. TRIM_HORIZON starts collecting with the oldest data record.

format

Record Format

CloudWatchLogs or none. If you choose CloudWatchLogs, this add-on parses the data in CloudWatchLogs format.

metric_index_flag

Use Metric Index?

Whether to use a metric index or an event index. The default value is No (use an event index). This field is only visible when creating VPC Flow Logs > Kinesis inputs.

sourcetype

Source type

A source type for the events.
If you are indexing VPC Flow Log data through Kinesis:

  1. If using event index, the sourcetype value is aws:cloudwatchlogs:vpcflow.
  2. If using metric index, the sourcetype value is aws:cloudwatchlogs:vpcflow:metric.
Enter aws:kinesis if you are collecting any other Kinesis data.

index

Index

The index name where the Splunk platform puts the Kinesis data. The default is main.
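
The sourcetype rules in the table can be summarized in a few lines of code. The following sketch is only an illustration of that decision. The function name and its parameters are hypothetical and are not part of the add-on.

```python
def kinesis_sourcetype(is_vpc_flow_logs: bool, use_metric_index: bool = False) -> str:
    """Pick the sourcetype value described in the table for a Kinesis input."""
    if not is_vpc_flow_logs:
        # Any other Kinesis data uses the generic source type.
        return "aws:kinesis"
    if use_metric_index:
        return "aws:cloudwatchlogs:vpcflow:metric"
    return "aws:cloudwatchlogs:vpcflow"


if __name__ == "__main__":
    print(kinesis_sourcetype(is_vpc_flow_logs=True, use_metric_index=True))
```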

Configure a Kinesis input using configuration files

To configure the input using configuration files, create $SPLUNK_HOME/etc/apps/Splunk_TA_aws/local/aws_kinesis_tasks.conf using the following template:

[<name>]
account = <value>
aws_iam_role = <value>
region = <value>
private_endpoint_enabled = <value>
kinesis_private_endpoint_url = <value>
sts_private_endpoint_url = <value>
stream_names = <value>
encoding = <value>
init_stream_position = <value>
format = <value>
sourcetype = <value>
index = <value>
metric_index_flag = <value>

Here is an example stanza that collects Kinesis data for all streams available in the region:

[splunkapp2:us-east-1]
account = splunkapp2
region = us-east-1
encoding =
init_stream_position = LATEST
index = aws
format = CloudWatchLogs
sourcetype = aws:kinesis
metric_index_flag = 0
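
Before deploying a hand-edited file, you might want to check that the enumerated arguments hold one of the values described in the table. The following sketch uses Python's standard configparser module and is not a Splunk tool. The allowed values for format and metric_index_flag are assumptions: the table lists "CloudWatchLogs or none" without saying how none is written, and it shows only 0 for metric_index_flag.

```python
import configparser

# Allowed values taken from the argument table on this page.
ALLOWED_VALUES = {
    "encoding": {"", "gzip"},
    "init_stream_position": {"LATEST", "TRIM_HORIZON"},
    "private_endpoint_enabled": {"0", "1"},
    # Assumption: "none" is written literally, or the value is left blank.
    "format": {"CloudWatchLogs", "none", ""},
    # Assumption: 1 selects the metric index, mirroring the 0 in the example.
    "metric_index_flag": {"0", "1"},
}

EXAMPLE_CONF = """
[splunkapp2:us-east-1]
account = splunkapp2
region = us-east-1
encoding =
init_stream_position = LATEST
index = aws
format = CloudWatchLogs
sourcetype = aws:kinesis
metric_index_flag = 0
"""


def find_problems(conf_text: str) -> dict:
    """Return {stanza name: [problem descriptions]} for out-of-range values."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep argument names exactly as written
    parser.read_string(conf_text)
    problems = {}
    for stanza in parser.sections():
        issues = []
        for key, allowed in ALLOWED_VALUES.items():
            if key in parser[stanza]:
                value = parser[stanza][key].strip()
                if value not in allowed:
                    issues.append(f"{key} = {value!r} is not one of {sorted(allowed)}")
        if issues:
            problems[stanza] = issues
    return problems


if __name__ == "__main__":
    print(find_problems(EXAMPLE_CONF))
```

The check only covers arguments with a fixed set of values. It does not confirm that the account, region, or stream names exist.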

Performance for the Kinesis input in the Splunk Add-on for AWS

This section provides reference information about performance testing of the Kinesis input in the Splunk Add-on for AWS. The testing was performed on version 4.0.0, when the Kinesis input was first introduced. You can use this information to tune the performance of your own Kinesis data collection tasks.

Many factors impact performance results, including file size, file compression, event size, deployment architecture, and hardware. These results represent reference information and do not represent performance in all environments.

Summary

While results in different environments will vary, the performance testing of the Kinesis input showed the following:

  • Each Kinesis input can handle up to 6 MB/s of data, with a daily ingestion volume of 500 GB.
  • More shards can slightly improve the performance. Three shards are recommended for large streams.

Testing architecture

Splunk tested the performance of the Kinesis input using a single-instance Splunk Enterprise 6.4.0 on an m4.4xlarge AWS EC2 instance to ensure CPU, memory, storage, and network did not introduce any bottlenecks. See the following instance specs:

Instance type M4 Quadruple Extra Large (m4.4xlarge)
Memory 64 GB
ECU 53.5
Cores 16
Storage 0 GB (EBS only)
Architecture 64-bit
Network performance High
EBS Optimized: Max Bandwidth 250 MB/s

Test scenario

Splunk tested the following parameters to target the use case of high-volume VPC flow logs ingested through a Kinesis stream:

  • Shard numbers: 3, 5, and 10 shards
  • Event size: 120 bytes per event
  • Number of events: 20,000,000
  • Compression: gzip
  • Initial stream position: TRIM_HORIZON

AWS reports that each shard is limited to 5 read transactions per second, up to a maximum read rate of 2 MB per second. Thus, with 10 shards, the theoretical upper limit is 20 MB per second.
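
The per-shard limits above scale linearly with the shard count. As a quick illustration, the following sketch computes the theoretical read ceiling for the shard counts used in the test. The constant and function names are hypothetical.

```python
# Per-shard read limits as stated above.
READ_MB_PER_SECOND_PER_SHARD = 2
READ_CALLS_PER_SECOND_PER_SHARD = 5


def theoretical_read_limit_mb_per_second(shards: int) -> int:
    """Upper bound on aggregate read throughput for a stream, in MB/s."""
    return shards * READ_MB_PER_SECOND_PER_SHARD


def max_read_calls_per_second(shards: int) -> int:
    """Upper bound on aggregate read transactions per second."""
    return shards * READ_CALLS_PER_SECOND_PER_SHARD


if __name__ == "__main__":
    for shards in (3, 5, 10):
        print(shards, theoretical_read_limit_mb_per_second(shards))
```

For the tested configurations, this gives ceilings of 6, 10, and 20 MB per second for 3, 5, and 10 shards.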

Test results

Splunk observed a data ingestion rate of 6 million events per minute at peak, which is 100,000 events per second. Because each event is 120 bytes, this indicates a maximum throughput of approximately 12 MB/s (100,000 events per second × 120 bytes).

Splunk observed an average throughput of 6 MB/s for a single Kinesis modular input, or a daily ingestion throughput of approximately 500 GB.
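
The unit conversions behind these figures can be reproduced with a few lines of arithmetic. The following sketch assumes decimal units (1 MB = 1,000,000 bytes and 1 GB = 1,000 MB), which is an assumption about how the figures were rounded. The unrounded product of the peak event rate and the event size works out to 12,000,000 bytes per second, and a sustained 6 MB/s comes to about 518 GB per day, in line with the approximate 500 GB figure.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 86_400


def events_per_second(events_per_minute: float) -> float:
    """Convert a per-minute event rate to a per-second rate."""
    return events_per_minute / SECONDS_PER_MINUTE


def bytes_per_second(events_per_sec: float, event_size_bytes: int) -> float:
    """Raw data rate implied by an event rate and a fixed event size."""
    return events_per_sec * event_size_bytes


def daily_volume_gb(mb_per_second: float) -> float:
    """Daily volume in GB for a sustained rate in MB/s (decimal units)."""
    return mb_per_second * SECONDS_PER_DAY / 1000


if __name__ == "__main__":
    peak_eps = events_per_second(6_000_000)
    print(peak_eps, bytes_per_second(peak_eps, 120), daily_volume_gb(6))
```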

After reducing the number of shards from 10 to 3, Splunk observed a throughput decrease of approximately 10%.

During testing, Splunk observed the following resource usage on the instance:

  • Normalized CPU usage of approximately 30%
  • Python memory usage of approximately 700 MB

The indexer is the largest consumer of CPU, and the modular input is the largest consumer of memory.

AWS throws a ProvisionedThroughputExceededException if a call returns 10 MB of data and subsequent calls are made within the next 5 seconds. While testing with three shards, Splunk observed this error only once every 1 to 5 minutes.
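
If you write your own consumer for the same stream, a common way to cope with this kind of throttling is to retry with exponential backoff. The following is a generic sketch of that technique, not a description of how the add-on handles the error. ThroughputExceeded is a hypothetical stand-in for the AWS exception, and fetch is any callable that performs one read.

```python
import time


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying with exponential backoff when it is throttled.

    The delay doubles after each throttled attempt: base_delay, 2x, 4x, ...
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The sleep parameter is injectable so that the delay logic can be exercised in a test without actually waiting.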