Configure Inputs for the Splunk Add-on for AWS

Input configuration overview

You can use the Splunk Add-on for AWS to collect data from AWS. For each supported data type, one or more input types are provided for data collection.

Users adding new inputs must have a role with the admin_all_objects capability enabled.

Follow these steps to plan and perform your AWS input configuration:

1. In the tables on this page, click an input type to go to its configuration details.
2. Follow the steps described in the configuration details to complete the configuration.

Supported Data Types and Corresponding AWS Input Types

The following matrix lists all the data types that can be collected using the Splunk Add-on for AWS and the corresponding input types that you can configure to collect this data.

For some data types, the Splunk Add-on for AWS lets you choose from multiple input types based on your requirements, for example, whether you need to collect historical logs or only newly created logs. SQS-based S3 is the best practice input type for every data type that it can collect.

Data Type	Source type	Supported Input Types	Best practice Input Type
Billing	`aws:billing`	Billing	Billing
CloudWatch	`aws:cloudwatch`	CloudWatch	CloudWatch
CloudFront Access Logs	`aws:cloudfront:accesslogs`	Generic S3, Incremental S3, SQS-based S3	SQS-based S3
Config	`aws:config`, `aws:config:notification`	SQS-based S3, AWS Config	SQS-based S3
Config Rules	`aws:config:rule`	Config Rules	Config Rules
Description	`aws:description`	Description	Description
ELB Access Logs	`aws:elb:accesslogs`	SQS-based S3, Generic S3, Incremental S3	SQS-based S3
Inspector	`aws:inspector`	Inspector	Inspector
CloudTrail	`aws:cloudtrail`	SQS-based S3, Generic S3, Incremental S3	SQS-based S3
S3 Access Logs	`aws:s3:accesslogs`	SQS-based S3, Generic S3, Incremental S3	SQS-based S3
VPC Flow Logs	`aws:cloudwatchlogs:vpcflow`, `aws:cloudwatchlogs:vpcflow:metric`	SQS-based S3, CloudWatch Logs, Kinesis	SQS-based S3
SQS	`aws:sqs`	SQS	SQS
Others	Custom sourcetypes	SQS-based S3, Generic S3, CloudWatch Logs, Kinesis, SQS	SQS-based S3

AWS Input types

The Splunk Add-on for AWS provides two categories of input types to gather useful data from your AWS environment:

Dedicated (single-purpose) input types, each designed to ingest one specific data type.
Multi-purpose input types, which collect multiple data types from S3 buckets.

Some data types can be ingested using either a dedicated input type or a multi-purpose input type. For example, CloudTrail logs can be collected using any of the following input types: CloudTrail, S3, or SQS-based S3. The SQS-based S3 input type is the recommended option because it is more scalable and provides higher ingestion performance.

Dedicated Input types

To ingest a specific type of log, configure the corresponding dedicated input designed to collect the log type. Click the input type name in the following table for instructions on how to configure it.

Input	Description
AWS Config	Configuration snapshots, historical configuration data, and change notifications from the AWS Config service.
Config Rules	Compliance details, compliance summary, and evaluation status of your AWS Config Rules.
Inspector	Assessment Runs and Findings data from the Amazon Inspector service.
CloudTrail	AWS API call history from the AWS CloudTrail service.
CloudWatch Logs	Logs from the CloudWatch Logs service, including VPC Flow Logs. VPC Flow Logs allow you to capture IP traffic flow data for the network interfaces in your resources.
CloudWatch	Performance and billing metrics from the AWS CloudWatch service.
Description	Metadata about your AWS environment.
Billing	Billing data from the billing reports that you collect in the Billing & Cost Management console.
Kinesis	Data from your Kinesis streams. Note: It is a best practice to collect VPC flow logs and CloudWatch logs through Kinesis streams. However, the AWS Kinesis input has the following limitations: (1) Multiple inputs collecting data from a single stream cause duplicate events in the Splunk platform. (2) The input does not monitor dynamic shard repartitioning: when a shard is split or merged, the add-on cannot automatically discover and collect data from the new shards until it is restarted, so after you repartition shards you must restart your data collection node. You can also collect data from Kinesis streams using the Splunk Add-on for Amazon Kinesis Firehose. The Splunk Add-on for Amazon Kinesis Firehose simplifies some of the configuration steps, but the same limitations about collecting data from streams apply. For more information, see About the Splunk Add-on for Amazon Kinesis Firehose.
SQS	Data from your AWS SQS queues.

Multi-purpose Input types

Configure multi-purpose inputs to ingest supported log types.

Use the SQS-based S3 input type to collect its supported log types. If you are already collecting logs using generic S3 inputs, you can still create SQS-based S3 inputs and migrate your existing generic S3 inputs to the new inputs. For detailed migration steps, see Migrate from the S3 input to the SQS-based input in this manual.

If the log types you want to collect are not supported by the SQS-based input type, use the generic S3 input type instead.

Read the multi-purpose input types comparison table to view the differences between the multi-purpose S3 collection input types.
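
The selection guidance above can be sketched as a small decision helper. This is an illustration only: the function name, the log type labels, and the `need_historical` flag are hypothetical and not part of the add-on. The rules are read from the comparison table in this manual (SQS-based S3 cannot ingest historical logs, and incremental S3 covers four AWS log types).

```python
# Log types handled by each multi-purpose input, per the comparison table.
# The string labels are hypothetical names chosen for this sketch.
INCREMENTAL_S3_TYPES = {"cloudtrail", "s3_access", "cloudfront_access", "elb_access"}
SQS_BASED_S3_TYPES = INCREMENTAL_S3_TYPES | {"config", "custom"}


def choose_input_type(log_type: str, need_historical: bool = False) -> str:
    """Suggest a multi-purpose input type (illustrative helper, not add-on code)."""
    # SQS-based S3 is the best practice choice, but it only sees newly created logs.
    if log_type in SQS_BASED_S3_TYPES and not need_historical:
        return "SQS-based S3"
    # For historical data, incremental S3 performs better where it applies.
    if log_type in INCREMENTAL_S3_TYPES:
        return "Incremental S3"
    # Generic S3 accepts any log type, at lower ingestion performance.
    return "Generic S3"
```

For example, `choose_input_type("cloudtrail")` suggests the SQS-based S3 input, while `choose_input_type("cloudtrail", need_historical=True)` falls back to the incremental S3 input.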

Click the input type name in the table below for instructions on how to configure it.

Input	Description
SQS-based S3 (best practice)	A more scalable and higher-performing alternative to the generic and incremental S3 inputs. The SQS-based S3 input polls messages from an SQS queue that subscribes to SNS notification events from AWS services, then collects the corresponding log files (generic log data, CloudTrail API call history, Config logs, and access logs) from your S3 buckets in real time. Unlike the other S3 input types, the SQS-based S3 input type takes advantage of the SQS visibility timeout setting, so you can configure multiple inputs to scale out data collection from the same folder in an S3 bucket without ingesting duplicate data. The SQS-based S3 input also switches automatically to multipart, in-parallel transfers when a file is over a specific size threshold, which prevents timeout errors caused by large files.
Generic S3	General-purpose input type that can collect any log type from S3 buckets: CloudTrail API call history, access logs, and even custom non-AWS logs. The generic S3 input lists all the objects in the bucket and examines the modified date of each file every time it runs to pull uncollected data from an S3 bucket. When the number of objects in a bucket is large, this can be a very time-consuming process with low throughput.
Incremental S3	The incremental S3 input type collects four AWS service log types. CloudTrail Logs: The add-on searches for the CloudTrail logs under `<bucket_name>/<log_file_prefix>/AWSLogs/<Account ID>/CloudTrail/<Region ID>/<YYYY/MM/DD>/<file_name>.json.gz`. ELB Access Logs: The add-on searches for the ELB access logs under `<bucket_name>/<log_file_prefix>/AWSLogs/<Account ID>/elasticloadbalancing/<Region ID>/<YYYY/MM/DD>/<file_name>.log.gz`. S3 Access Logs: The add-on searches for the S3 access logs under `<bucket_name>/<log_file_prefix><YYYY-mm-DD-HH-MM-SS><UniqueString>`. CloudFront Access Logs: The add-on searches for the CloudFront access logs under `<bucket_name>/<log_file_prefix><distributionID><YYYY/MM/DD>.<UniqueID>.gz`. The incremental S3 input lists and retrieves only objects that have not yet been ingested from a bucket by comparing the datetime information included in filenames against the checkpoint record, which significantly improves ingestion performance.
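
The checkpoint comparison described for the incremental S3 input can be pictured with a minimal sketch. It is not the add-on's implementation: the function names are hypothetical, it handles only the CloudTrail key layout shown above, and it compares at day granularity using the `<YYYY/MM/DD>` path segment, which is a simplification.

```python
import re
from datetime import date

# Matches the tail of a CloudTrail object key laid out as
# <log_file_prefix>/AWSLogs/<Account ID>/CloudTrail/<Region ID>/<YYYY/MM/DD>/<file_name>.json.gz
CLOUDTRAIL_KEY = re.compile(
    r"AWSLogs/(?P<account>\d+)/CloudTrail/(?P<region>[^/]+)/"
    r"(?P<y>\d{4})/(?P<m>\d{2})/(?P<d>\d{2})/[^/]+\.json\.gz$"
)


def key_date(key):
    """Return the date embedded in a CloudTrail key, or None if the key does not match."""
    match = CLOUDTRAIL_KEY.search(key)
    if match is None:
        return None
    return date(int(match["y"]), int(match["m"]), int(match["d"]))


def keys_after_checkpoint(keys, checkpoint):
    """Keep only the keys whose embedded date is later than the checkpoint date."""
    selected = []
    for key in keys:
        embedded = key_date(key)
        if embedded is not None and embedded > checkpoint:
            selected.append(key)
    return selected
```

Because the date is read from the key itself, no per-object metadata lookup is needed to decide whether an object is new, which is the reason the table gives for the improved ingestion performance.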

Multi-purpose Input types comparison table

Metric	Generic S3	Incremental S3	SQS-based S3 (best practice)
Supported log types	Any log type, including non-AWS custom logs.	4 AWS services log types: CloudTrail logs, S3 access logs, CloudFront access logs, ELB access logs.	5 AWS services log types (Config logs, CloudTrail logs, S3 access logs, CloudFront access logs, ELB access logs), as well as non-AWS custom logs.
Data collection method	Lists all objects in the bucket and compares modified date against the checkpoint.	Directly retrieves AWS log files whose filenames are distinguished by datetime.	Decodes SQS messages and ingests corresponding logs from the S3 bucket.
Ingestion performance	Low	High	High
Can ingest historical logs (logs generated in the past)?	Yes	Yes	No
Scalable?	No	No	Yes. You can scale out data collection by configuring multiple inputs to ingest logs from the same S3 bucket without creating duplicate data.
Fault-tolerant?	No. Each generic S3 input is a single point of failure.	No. Each incremental S3 input is a single point of failure.	Yes. Takes advantage of the SQS visibility timeout setting. Any SQS message not successfully processed in time by the SQS-based S3 input reappears in the queue and is retrieved and processed again. In addition, data collection can be horizontally scaled out so that if one SQS-based S3 input fails, other inputs can continue to pick up messages from the SQS queue and ingest the corresponding data from the S3 bucket.
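
To make the "decodes SQS messages" step in the table more concrete, the following sketch extracts bucket and object key pairs from a message body. It is an assumption-laden illustration, not the add-on's code: the function name is hypothetical, and it handles only standard S3 event notifications, optionally wrapped in an SNS envelope. Notifications that other services such as CloudTrail or Config publish in their own formats are not covered.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_sqs_body(body):
    """Return (bucket, key) pairs named by an S3 event notification in an SQS message body."""
    payload = json.loads(body)
    # When the queue subscribes to an SNS topic without raw message delivery,
    # the S3 event arrives as a JSON string in the envelope's "Message" field.
    if "Message" in payload and "Records" not in payload:
        payload = json.loads(payload["Message"])
    pairs = []
    for record in payload.get("Records", []):
        s3 = record.get("s3")
        if s3 is None:
            continue  # not an S3 event record
        # Object keys in S3 event notifications are URL-encoded.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs
```

A collector built this way would fetch each returned object from S3 and delete the SQS message only after successful ingestion, so that an unprocessed message reappears once its visibility timeout expires.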