Configure SQS-Based S3 inputs for the Splunk Add-on for AWS¶

Complete the steps to configure SQS-Based S3 inputs for the Splunk Add-on for Amazon Web Services (AWS):

Prerequisites: You must manage accounts for the add-on. See Manage accounts for the Splunk Add-on for AWS.

Configure AWS services for the SQS-Based S3 input.
Configure AWS permissions for the SQS-Based S3 input.
(Optional) Configure VPC Interface Endpoints for STS, SQS, and S3 services from your AWS Console if you want to use private endpoints for data collection and authentication. For more information, see https://docs.aws.amazon.com/vpc/latest/privatelink/create-interface-endpoint.html.
Configure SQS-Based S3 inputs either through Splunk Web or configuration files.

Configuration prerequisites¶

Delimited Files parsing prerequisites if parse_csv_with_header is enabled

The SQS-Based S3 custom data types input processes delimited files (.csv, .psv, .tsv) according to the status of the fields parse_csv_with_header and parse_csv_with_delimiter.
- When parse_csv_with_header is enabled, all files ingested by the input, whether delimited or not, are processed as if they were delimited files, with the value of parse_csv_with_delimiter used to split the fields. The first line of each file is treated as the header.
- When parse_csv_with_header is disabled, events are indexed line by line without any CSV processing.
The field parse_csv_with_delimiter is a comma by default, but you can change it to a different delimiter. The delimiter can be any single character except an alphanumeric character, a single quote, or a double quote.

This data input supports the following compression types:
- A single delimited file, or delimited files in ZIP, GZIP, TAR, or TAR.GZ formats.
Ensure that each delimited file contains a header. The CSV parsing functionality takes the first non-empty line of the file as a header before parsing.
Ensure that all files have a carriage return at the end of each file. Otherwise, the last line of the CSV file will not be indexed.
Ensure there are no duplicate values in the header of the CSV files to avoid missing data.
Some illegal character sequences raise a UnicodeDecodeError, for example VI,Visa,Cabela�s.

Note

Starting in version 6.3.0 of the Splunk Add-on for AWS, the VPC Flow log extraction format has been updated to include v3-v5 fields. Before upgrading to versions 6.3.0 and higher of the Splunk Add-on for AWS, Splunk platform deployments ingesting AWS VPC Flow Logs must update the log format in AWS VPC to include v3-v5 fields in order to ensure successful field extractions. For more information on updating the log format in AWS VPC, see the Create a flow log section of the Work with flow logs topic in the AWS documentation. For more information on the list of v1-v5 fields to add in the given order when selecting Custom Format, or selecting Custom Format and Select All, see the Available fields section of the Logging IP traffic using VPC Flow Logs topic in the AWS documentation.

Processing outcomes

The end result after CSV parsing is a JSON object with the header values mapped to the subsequent row values.
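
The mapping from header values to row values can be pictured with a short Python sketch. This is only an illustration of the outcome described above, not the add-on's implementation; the function name and the sample data are hypothetical.

```python
import csv
import io
import json


def delimited_text_to_json_events(text, delimiter=","):
    """Map each data row to a JSON object keyed by the header values."""
    # Drop empty lines so that the first non-empty line becomes the header.
    lines = [line for line in text.splitlines() if line.strip()]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter=delimiter)
    return [json.dumps(row) for row in reader]


# Hypothetical sample file content with a header line and two rows.
sample = "code,network,merchant\nVI,Visa,Acme\nMC,Mastercard,Globex\n"
events = delimited_text_to_json_events(sample)
for event in events:
    print(event)
```

Each printed line is one JSON object per data row, which is the shape of event the text above describes.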

Configure AWS services for the SQS-Based S3 input¶

Configure SQS-Based S3 inputs to collect events¶

Configure SQS-Based S3 inputs to collect the following events:

CloudFront Access Logs
Config
ELB Access logs
CloudTrail
S3 Access Logs
VPC Flow Logs
Transit Gateway Flow Logs
Custom data types

AWS Service Configuration prerequisites¶

Before you configure SQS-Based S3 inputs, perform the following tasks:

Create an SQS Queue to receive notifications and a second SQS Queue to serve as a dead letter queue.
Create an SNS Topic.
Configure S3 to send notifications for All object create events to an SNS Topic. This lets S3 notify the add-on that new events were written to the S3 bucket.
Subscribe the main SQS Queue to the corresponding SNS Topic.
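
To make the S3 to SNS to SQS chain more concrete, the following Python sketch shows how an S3 event can be unwrapped from an SQS message body. It assumes the standard S3 event notification JSON and an SNS subscription without raw message delivery; the function name and sample values are hypothetical, and this is not the add-on's own code.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_sqs_body(body):
    """Return (bucket, key) pairs found in an SQS message body."""
    payload = json.loads(body)
    # When SNS delivers to SQS without raw message delivery, the S3 event
    # is a JSON string stored in the "Message" field of the SNS envelope.
    if payload.get("Type") == "Notification" and "Message" in payload:
        payload = json.loads(payload["Message"])
    pairs = []
    for record in payload.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        # Object keys are URL-encoded in S3 event notifications.
        key = unquote_plus(s3.get("object", {}).get("key", ""))
        if bucket and key:
            pairs.append((bucket, key))
    return pairs


# Hypothetical event for a newly created object.
s3_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "AWSLogs/my+file.json.gz"},
            },
        }
    ]
}
sns_envelope = json.dumps({"Type": "Notification", "Message": json.dumps(s3_event)})
print(s3_objects_from_sqs_body(sns_envelope))
```

Messages that carry no "Records" list, such as the test event S3 sends when notifications are first configured, yield an empty list in this sketch.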

Best practices¶

Keep the following in mind as you configure your inputs:

The SQS-Based S3 input only collects AWS service logs that meet the following criteria:
- Near-real time
- Newly created
- Stored in S3 buckets
- Have event notifications sent to SQS

Events that occurred in the past, or events with no notifications sent through SNS to SQS end up in the Dead Letter Queue (DLQ), and no corresponding event is created by the Splunk Add-on for AWS. To collect historical logs stored into S3 buckets, use the generic S3 input instead. The S3 input lets you set the initial scan time parameter to collect data generated after a specified time in the past.

To collect the same types of logs from multiple S3 buckets, even across regions, set up one input to collect data from all the buckets. To do this, configure these buckets to send notifications to the same SQS queue from which the SQS-Based S3 input polls messages.
To achieve high throughput data ingestion from an S3 bucket, configure multiple SQS-Based S3 inputs for the S3 bucket to scale out data collection.
After configuring an SQS-Based S3 input, you might need to wait a few minutes before new events are ingested and can be searched. Also, a more verbose logging level increases data ingestion time. Debug mode is extremely verbose and is not recommended on production systems.
The SQS-based input allows you to ingest data from S3 buckets by optimizing the API calls made by the add-on and relying on SQS/SNS to collect events upon receipt of notification.
The SQS-Based S3 input is stateless, which means that when multiple inputs are collecting data from the same bucket, if one input goes down, the other inputs continue to collect data and take over the load from the failed input. This lets you enhance fault tolerance by configuring multiple inputs to collect data from the same bucket.
The SQS-Based S3 input supports signature validation. If S3 notifications are set up to send through SNS, AWS will create a signature for every message. The SQS-Based S3 input will validate each message with the associated certificate, provided by AWS. For more information, see the Verifying the signatures of Amazon SNS messages topic in the AWS documentation.
If any messages with a signature are received, all following messages will require valid SNS signatures, no matter your input’s SNS signature setting.
Set up a Dead Letter Queue for the SQS queue used by the input to store invalid messages. For information about SQS Dead Letter Queues and how to configure them, see the Amazon SQS dead-letter queues topic in the AWS documentation.
Configure the SQS visibility timeout to prevent multiple inputs from receiving and processing messages in a queue more than once. Set your SQS visibility timeout to 5 minutes or longer. If the visibility timeout for a message is reached before the message is fully processed by the SQS-Based S3 input, the message reappears in the queue and is retrieved and processed again, resulting in duplicate data.

For information about SQS visibility timeout and how to configure it, see the Amazon SQS visibility timeout topic in the AWS documentation.
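
As a rough sketch of the two queue settings discussed above, the following Python snippet builds the attribute mapping for the main queue. The mapping has the shape that the boto3 SQS client call `set_queue_attributes(QueueUrl=..., Attributes=...)` accepts; the function name, the queue ARN, and the `max_receive_count` value are hypothetical choices for illustration.

```python
import json

MIN_VISIBILITY_TIMEOUT_S = 300  # 5 minutes, the minimum suggested above


def main_queue_attributes(dlq_arn, visibility_timeout_s=300, max_receive_count=5):
    """Build the Attributes mapping for the main SQS queue."""
    if visibility_timeout_s < MIN_VISIBILITY_TIMEOUT_S:
        raise ValueError("visibility timeout should be 300 seconds or longer")
    redrive_policy = {
        "deadLetterTargetArn": dlq_arn,
        # Hypothetical value: number of receives before a message is moved
        # to the dead letter queue.
        "maxReceiveCount": max_receive_count,
    }
    return {
        # SQS attribute values are passed as strings.
        "VisibilityTimeout": str(visibility_timeout_s),
        "RedrivePolicy": json.dumps(redrive_policy),
    }


attributes = main_queue_attributes("arn:aws:sqs:ap-south-1:111111111111:example-dlq")
print(attributes)
```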

Supported message types for the SQS-Based S3 input¶

The following message types are supported by the SQS-Based S3 input:

ConfigurationHistoryDeliveryCompleted
ConfigurationSnapshotDeliveryCompleted

Configure AWS permissions for the SQS-Based S3 input¶

Configure AWS permissions¶

AWS Service	Permissions
SQS	`GetQueueUrl` `ReceiveMessage` `SendMessage` `DeleteMessage` `ChangeMessageVisibility` `GetQueueAttributes` `ListQueues`
S3	`GetObject` (if Bucket Versioning is disabled) `GetObjectVersion` (if Bucket Versioning is enabled).
KMS	`Decrypt`

See the following sample inline policy to configure input permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sqs:GetQueueUrl",
                "sqs:ReceiveMessage",
                "sqs:SendMessage",
                "sqs:DeleteMessage",
                "sqs:ChangeMessageVisibility",
                "sqs:GetQueueAttributes",
                "sqs:ListQueues",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "kms:Decrypt"
            ],
            "Resource": "*"
        }
    ]
}

For more information and sample policies, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/security_iam_service-with-iam.html.
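
The sample policy above allows every listed action on all resources. If you prefer to narrow it, the following Python sketch generates a variant scoped to one queue, one bucket, and optionally one KMS key. It is an assumption-laden illustration rather than a policy published for the add-on: the function name and ARNs are hypothetical, the bucket ARN assumes the standard `aws` partition, and `sqs:ListQueues` is left on `*` on the assumption that it cannot be limited to a single queue. Test any narrowed policy against your inputs before relying on it.

```python
import json


def scoped_input_policy(queue_arn, bucket_name, kms_key_arn=None):
    """Build an inline policy limited to the given queue, bucket, and key."""
    statements = [
        {
            "Effect": "Allow",
            "Action": [
                "sqs:GetQueueUrl",
                "sqs:ReceiveMessage",
                "sqs:SendMessage",
                "sqs:DeleteMessage",
                "sqs:ChangeMessageVisibility",
                "sqs:GetQueueAttributes",
            ],
            "Resource": queue_arn,
        },
        # Assumption: listing queues is account-wide, so it stays on "*".
        {"Effect": "Allow", "Action": "sqs:ListQueues", "Resource": "*"},
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:GetObjectVersion"],
            # Assumption: standard "aws" partition (not GovCloud or China).
            "Resource": f"arn:aws:s3:::{bucket_name}/*",
        },
    ]
    if kms_key_arn:
        statements.append(
            {"Effect": "Allow", "Action": "kms:Decrypt", "Resource": kms_key_arn}
        )
    return {"Version": "2012-10-17", "Statement": statements}


policy = scoped_input_policy(
    "arn:aws:sqs:ap-south-1:111111111111:cloudtrail-data-queue",
    "example-bucket",
)
print(json.dumps(policy, indent=4))
```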

Configure SNS policy to receive notifications from S3 buckets¶

See the following sample inline SNS policy to allow your S3 bucket to send notifications to an SNS topic:

    {
        "Version": "2008-10-17",
        "Id": "example-ID",
        "Statement": [
            {
                "Sid": "example-statement-ID",
                "Effect": "Allow",
                "Principal": {"AWS":"*" },
                "Action": ["SNS:Publish"],
                "Resource": "<SNS-topic-ARN>",
                "Condition": {
                    "ArnLike":
                    {
                        "aws:SourceArn": "arn:aws:s3:*:*:<bucket-name>"
                    }
                }
            }
        ]
    }

For more information and sample policies, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/security_iam_service-with-iam.html.

Configure AWS services for SNS alerts¶

If you plan to use the SQS-Based S3 input, you must enable Amazon S3 bucket events to send notification messages to an SQS queue whenever the events occur. This queue cannot be a first-in-first-out (FIFO) queue. For instructions on setting up S3 bucket event notifications, see the AWS documentation:
- https://docs.aws.amazon.com/AmazonS3/latest/UG/SettingBucketNotifications.html
- https://docs.aws.amazon.com/AmazonS3/latest/userguide/EventNotifications.html
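
As an illustration of the notification setup, the following Python sketch builds a bucket notification configuration for All object create events that targets an SNS topic, and checks that a queue is not FIFO. The configuration has the shape accepted by the boto3 S3 client call `put_bucket_notification_configuration(Bucket=..., NotificationConfiguration=...)`. The function names, ARNs, and prefix are hypothetical, and the FIFO check relies on the AWS convention that FIFO queue names end in `.fifo`.

```python
def topic_notification_config(topic_arn, prefix=None):
    """Build an S3 notification configuration for object-created events."""
    config = {"TopicArn": topic_arn, "Events": ["s3:ObjectCreated:*"]}
    if prefix:
        # Optional: only notify for keys under the given prefix.
        config["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"TopicConfigurations": [config]}


def is_fifo_queue(queue_url):
    """FIFO queue names end with the .fifo suffix."""
    return queue_url.rstrip("/").endswith(".fifo")


notification = topic_notification_config(
    "arn:aws:sns:ap-south-1:111111111111:example-topic", prefix="AWSLogs/"
)
print(notification)
print(is_fifo_queue("https://sqs.ap-south-1.amazonaws.com/111111111111/cloudtrail-data-queue"))
```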

Configure an SQS-Based S3 input using Splunk Web¶

To configure inputs in Splunk Web, click Splunk Add-on for AWS in the navigation bar on Splunk Web home, then choose one of the following menu paths depending on which data type you want to collect:

Create New Input > CloudTrail > SQS-Based S3
Create New Input > CloudFront Access Log > SQS-Based S3
Create New Input > Config > SQS-Based S3
Create New Input > ELB Access Logs > SQS-Based S3
Create New Input > S3 Access Logs > SQS-Based S3
Create New Input > VPC Flow Logs > SQS-Based S3
Create New Input > Transit Gateway Flow Logs > SQS-Based S3
Create New Input > Custom Data Type > SQS-Based S3
Create New Input > Custom Data Type > SQS-Based S3 > Delimited Files S3 File Decoder

Note

You must have the admin_all_objects capability enabled for your role in order to add new inputs.

Choose the menu path that corresponds to the data type you want to collect. The system automatically sets the source type and displays the relevant field settings on the subsequent configuration page.

Use the following table to complete the fields for the new input in the .conf file or in Splunk Web:

Argument in configuration file	Field in Splunk Web	Description
`aws_account`	AWS Account	The AWS account or EC2 IAM role the Splunk platform uses to access the keys in your S3 buckets. In Splunk Web, select an account from the drop-down list. In inputs.conf, enter the friendly name of one of the AWS accounts that you configured on the Configuration page or the name of the automatically discovered EC2 IAM role. If the region of the AWS account you select is GovCloud, you might encounter errors such as “Failed to load options for S3 Bucket”. In that case, manually add the AWS GovCloud endpoint in the S3 Host Name field. See https://docs.aws.amazon.com/govcloud-us/latest/UserGuide/using-govcloud-endpoints.html for more information.
`aws_iam_role`	Assume Role	The IAM role to assume, see Manage AWS IAM Roles for Splunk Add-on for AWS.
`using_dlq`	Force using DLQ (Recommended)	Select the checkbox to require the DLQ (Dead Letter Queue) check for the SQS queue. Clear the checkbox to remove the DLQ check for ingestion of specific data. In inputs.conf, enter `0` or `1` to respectively disable or enable the check. The default value is `1`.
`sqs_queue_region`	AWS Region	AWS region that the SQS queue is in.
`private_endpoint_enabled`	Use Private Endpoints	Check the checkbox to use private endpoints of the AWS Security Token Service (STS) and Amazon Simple Storage Service (S3) for authentication and data collection. In inputs.conf, enter `0` or `1` to respectively disable or enable use of private endpoints.
`sqs_private_endpoint_url`	Private Endpoint (SQS)	Private Endpoint (Interface VPC Endpoint) of your SQS service, which you can configure from your AWS console. Supported Formats: `<protocol>://vpce-<endpoint_id>-<unique_id>.sqs.<region>.vpce.amazonaws.com` or `<protocol>://vpce-<endpoint_id>-<unique_id>-<availability_zone>.sqs.<region>.vpce.amazonaws.com`
`sqs_sns_validation`	SNS Signature Validation	SNS validation of your SQS messages, which you can configure from your AWS console. If selected, all messages are validated. If unselected, messages are not validated until a signed message is received. Thereafter, all messages are validated for an SNS signature. For new SQS-Based S3 inputs, this feature is enabled by default. Supported Formats: `1` is enabled, `0` is disabled. Default is `1`.
`parse_firehose_error_data`	Parse Firehose Error Data	Parse raw data (all events) or failed Kinesis Firehose stream error data to the Splunk HTTP Event Collector (HEC). Error data is decoded for failed Kinesis Firehose streams. For new SQS-Based S3 inputs, this feature is disabled by default. Versions 7.4.0 and higher of this add-on support the collection of data in the default uncompressed text format. Supported Formats: `1` is enabled, `0` is disabled. Default is `0`.
`s3_private_endpoint_url`	Private Endpoint (S3)	Private Endpoint (Interface VPC Endpoint) of your S3 service, which you can configure from your AWS console. Supported Formats: `<protocol>://bucket.vpce-<endpoint_id>-<unique_id>.s3.<region>.vpce.amazonaws.com` or `<protocol>://bucket.vpce-<endpoint_id>-<unique_id>-<availability_zone>.s3.<region>.vpce.amazonaws.com`
`sts_private_endpoint_url`	Private Endpoint (STS)	Private Endpoint (Interface VPC Endpoint) of your STS service, which you can configure from your AWS console. Supported Formats: `<protocol>://vpce-<endpoint_id>-<unique_id>.sts.<region>.vpce.amazonaws.com` or `<protocol>://vpce-<endpoint_id>-<unique_id>-<availability_zone>.sts.<region>.vpce.amazonaws.com`
`sqs_queue_url`	SQS Queue Name	The SQS queue URL.
`sqs_batch_size`	SQS Batch Size	The maximum number of messages to pull from the SQS queue in one batch. Enter an integer between 1 and 10 inclusive. Set a larger value for small files, and a smaller value for large files. The default SQS batch size is 10. If you are dealing with large files and your system memory is limited, set this to a smaller value.
`s3_file_decoder`	S3 File Decoder	The decoder to use to parse the corresponding log files. The decoder is set according to the Data Type you select. If you select a Custom Data Type, choose one from `Config`, `Cloudtrail`, `S3 Access Logs`, `CloudFront Access Logs`, `ELB Access Logs`, `VPC Flow Logs`, `Delimited Files`, `Custom Logs`, `Amazon Security Lake` or `Transit Gateway Flow Logs`.
`metric_index_flag`	Use Metric Index?	Whether to use metric index or event index. The default value is No (use event index). This field is only visible when creating VPC Flow Logs -> SQS based S3 inputs.
`sourcetype`	Source Type	The source type for the events to collect, automatically filled in based on the decoder chosen for the input. For the VPC Flow Logs -> SQS based S3 inputs: If using event index, the sourcetype value is `aws:cloudwatchlogs:vpcflow`. If using metric index, the sourcetype value is `aws:cloudwatchlogs:vpcflow:metric`. This add-on does not support custom sourcetypes for `Cloudtrail`, `Config`, `ELB Access Logs`, `S3 Access Logs`, `VPC Flow Logs`, `Transit Gateway Flow Logs`, and `CloudFront Access Logs`.
`interval`	Interval	The length of time in seconds between two data collection runs. The default is 300 seconds.
`index`	Index	The index name where the Splunk platform puts the SQS-Based S3 data. The default is main.
`parse_csv_with_header`	Parse all files as CSV	If selected, all files are parsed as delimited files, with the first line of each file treated as the header. Clear this checkbox for delimited files without a header. For new SQS-Based S3 inputs, this feature is disabled by default. Supported Formats: `1` is enabled, `0` is disabled. Default is `0`.
`parse_csv_with_delimiter`	CSV field delimiter	The delimiter must be one character. The character cannot be alphanumeric, a single quote, or a double quote. By default, the delimiter is a comma. For tab-delimited files, enter `\t` as the CSV field delimiter. For single-space-delimited files, enter `\s` as the CSV field delimiter.
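
The delimiter rules in the table can be expressed as a small check. The following Python sketch is a hypothetical helper, not part of the add-on; it only encodes the constraints stated above, including the `\t` and `\s` shorthands.

```python
def resolve_delimiter(value):
    """Translate the configured value into a one-character delimiter."""
    # The shorthands \t and \s stand for a tab and a single space.
    aliases = {r"\t": "\t", r"\s": " "}
    delimiter = aliases.get(value, value)
    if len(delimiter) != 1:
        raise ValueError("delimiter must be exactly one character")
    if delimiter.isalnum() or delimiter in ("'", '"'):
        raise ValueError("delimiter cannot be alphanumeric or a quote character")
    return delimiter


print(repr(resolve_delimiter(",")))
print(repr(resolve_delimiter(r"\t")))
print(repr(resolve_delimiter("|")))
```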

Configure an SQS-Based S3 input using configuration files¶

When you configure inputs manually in inputs.conf, create a stanza using the following template and add it to $SPLUNK_HOME/etc/apps/Splunk_TA_aws/local/inputs.conf. If the file or path does not exist, create it.

    [aws_sqs_based_s3://<stanza_name>]
    aws_account = <value>
    using_dlq = <value>
    private_endpoint_enabled = <value>
    sqs_private_endpoint_url = <value>
    s3_private_endpoint_url = <value>
    sts_private_endpoint_url = <value>
    parse_firehose_error_data = <value>
    interval = <value>
    s3_file_decoder = <value>
    sourcetype = <value>
    sqs_batch_size = <value>
    sqs_queue_region = <value>
    sqs_queue_url = <value>
    metric_index_flag = <value>

The following example shows an input stanza for an SQS-Based S3 input that collects CloudTrail data:

[aws_sqs_based_s3://cloudtrail_with_sqs_based_s3]
aws_account = splunkapp2
interval = 300
metric_index_flag = 0
parse_csv_with_delimiter = ,
parse_csv_with_header = 0
parse_firehose_error_data = 0
private_endpoint_enabled = 0
s3_file_decoder = CloudTrail
sourcetype = aws:cloudtrail
sqs_batch_size = 10
sqs_queue_region = ap-south-1
sqs_queue_url = https://sqs.ap-south-1.amazonaws.com/111111111111/cloudtrail-data-queue
sqs_sns_validation = 1
using_dlq = 1

Valid values for s3_file_decoder are CustomLogs, CloudTrail, ELBAccessLogs, CloudFrontAccessLogs, S3AccessLogs, Config, DelimitedFilesDecoder, TransitGatewayFlowLogs.

If you want to ingest custom logs other than the natively supported AWS log types, you must set s3_file_decoder = CustomLogs. This setting lets you ingest custom logs into the Splunk platform instance, but it does not parse the data. To process custom logs into meaningful events, you need to perform additional configurations in props.conf and transforms.conf to parse the collected data to meet your specific requirements.

The CustomLogs S3 parser supports the following archived file formats: .tar, .tar.gz, .tar.bz2, .tgz, .gz, .gzip, and .zip.

For more information on these settings, see /README/inputs.conf.spec under your add-on directory.
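
If you generate stanzas with a script, a sketch along the following lines might help catch mistakes before the file is written. It uses Python's standard configparser module, takes the decoder list from the valid values given above, and applies the 1 to 10 range for sqs_batch_size from the settings table. The function name is hypothetical, and inputs.conf.spec remains the authoritative reference.

```python
import configparser
import io

# Decoder names as listed in the text above.
VALID_DECODERS = {
    "CustomLogs", "CloudTrail", "ELBAccessLogs", "CloudFrontAccessLogs",
    "S3AccessLogs", "Config", "DelimitedFilesDecoder", "TransitGatewayFlowLogs",
}


def render_stanza(name, settings):
    """Validate a few settings and render an aws_sqs_based_s3 stanza."""
    if settings.get("s3_file_decoder") not in VALID_DECODERS:
        raise ValueError("unsupported s3_file_decoder")
    batch_size = int(settings.get("sqs_batch_size", 10))
    if not 1 <= batch_size <= 10:
        raise ValueError("sqs_batch_size must be between 1 and 10 inclusive")
    # Interpolation is disabled so that values are written verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser[f"aws_sqs_based_s3://{name}"] = {k: str(v) for k, v in settings.items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


stanza_text = render_stanza(
    "cloudtrail_with_sqs_based_s3",
    {
        "aws_account": "splunkapp2",
        "interval": 300,
        "s3_file_decoder": "CloudTrail",
        "sourcetype": "aws:cloudtrail",
        "sqs_batch_size": 10,
        "sqs_queue_region": "ap-south-1",
        "sqs_queue_url": "https://sqs.ap-south-1.amazonaws.com/111111111111/cloudtrail-data-queue",
        "using_dlq": 1,
    },
)
print(stanza_text)
```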

Sourcetype mapping with selected S3 file decoder¶

S3 File Decoder	Sourcetype	CIM Compliance
Config	`aws:config`	Yes
CloudTrail	`aws:cloudtrail`	Yes
S3 Access Logs	`aws:s3:accesslogs`	Yes
CloudFront Access Logs	`aws:cloudfront:accesslogs`	Yes
ELB Access Logs	`aws:elb:accesslogs`	Yes
VPC Flow Logs	`aws:cloudwatchlogs:vpcflow`	Yes
Delimited Files	`aws:s3:csv`	No. Basic configuration is provided for the JSON data.
Custom Logs	`aws:s3`	Yes
Amazon Security Lake	`aws:asl`	No
Transit Gateway Flow Logs	`aws:transitgateway:flowlogs`	Yes

Configure an SQS based S3 input for CrowdStrike Falcon Data Replicator (FDR) events using Splunk Web¶

To configure an SQS based S3 input for CrowdStrike Falcon Data Replicator (FDR) events, perform the following steps:

On the Inputs page, select “Create New Input” > “Custom Data Type” > “SQS-Based S3”.
Select your AWS account from the AWS Account dropdown list.
Uncheck the check box Force Using DLQ (Recommended).
Select the region in which the SQS Queue is present from the AWS Region dropdown.
In the SQS Queue Name box, enter the full SQS queue URL. This creates an option for the SQS queue URL in the dropdown menu.
Select the newly created SQS queue URL option from the SQS Queue Name dropdown menu.
Use the table in the Configure an SQS-Based S3 input using Splunk Web section of this topic to add any additional configuration file arguments.
Save your changes.

Migrate from the generic S3 input to the SQS-Based S3 input¶

Use the SQS-Based S3 input type for real-time data collection from S3 buckets because it is scalable and provides better ingestion performance than the other S3 input types.

If you are already using a generic S3 input to collect data, use the following steps to switch to the SQS-Based S3 input:

Perform prerequisite configurations of AWS services:
a. Set up an SQS queue with a Dead Letter Queue and proper visibility timeout configured. See https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html.
b. Set up the S3 bucket with the S3 key prefix, if specified, from which you are collecting data to send notifications to the SQS queue. See Configure Alerts for the Splunk Add-on for AWS.
Add an SQS-Based S3 input using the SQS queue you just configured. After the setup, make sure the new input is enabled and starts collecting data from the bucket.
a. Edit your old generic S3 input and set the End Date/Time field to the current system time to phase it out.
b. Wait until all the task executions of the old input are complete. As a best practice, wait at least double your polling frequency.
c. Disable the old generic S3 input.
d. Run the following searches to delete any duplicate events collected during the transition:

For CloudTrail events:

Search 1

index=<index_name> sourcetype=aws:cloudtrail
| streamstats count by source, eventID
| search count > 1
| eval indexed_time=strftime(_indextime, "%+")
| eval dup_id=source.eventID.indexed_time
| table dup_id
| outputcsv dupes.csv

Search 2

index=<index_name> sourcetype=aws:cloudtrail
| eval indexed_time=strftime(_indextime, "%+")
| eval dup_id=source.eventID.indexed_time
| search [|inputcsv dupes.csv |format "(" "" "" "" "OR" ")"]
| delete

For S3 access logs:

Search 1

index=<index_name> sourcetype=aws:s3:accesslogs
| streamstats count by source, request_id
| search count > 1
| eval indexed_time=strftime(_indextime, "%+")
| eval dup_id=source.request_id.indexed_time
| table dup_id
| outputcsv dupes.csv

Search 2

index=<index_name> sourcetype=aws:s3:accesslogs
| eval indexed_time=strftime(_indextime, "%+")
| eval dup_id=source.request_id.indexed_time
| search [|inputcsv dupes.csv| format "(" "" "" "" "OR" ")"]
| delete

For CloudFront access logs:

Search 1

index=<index_name> sourcetype=aws:cloudfront:accesslogs
| streamstats count by source, x_edge_request_id
| search count > 1
| eval indexed_time=strftime(_indextime, "%+")
| eval dup_id=source.x_edge_request_id.indexed_time
| table dup_id
| outputcsv dupes.csv

Search 2

index=<index_name> sourcetype=aws:cloudfront:accesslogs
| eval indexed_time=strftime(_indextime, "%+")
| eval dup_id=source.x_edge_request_id.indexed_time
| search [|inputcsv dupes.csv | format "(" "" "" "" "OR" ")"]
| delete

For Classic Load Balancer (ELB) access logs:

Because events do not have unique IDs, use the hash function to remove duplication.

Search 1

index=<index_name> sourcetype=aws:elb:accesslogs
| eval hash=sha256(_raw)
| streamstats count by source, hash
| search count > 1
| eval indexed_time=strftime(_indextime, "%+")
| eval dup_id=source.hash.indexed_time
| table dup_id
| outputcsv dupes.csv

Search 2

index=<index_name> sourcetype=aws:elb:accesslogs
| eval hash=sha256(_raw)
| eval indexed_time=strftime(_indextime, "%+")
| eval dup_id=source.hash.indexed_time
| search [|inputcsv dupes.csv | format "(" "" "" "" "OR" ")"]
| delete

Optionally, delete the old generic S3 input.
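
The idea behind the searches above, counting events per source and unique ID and treating every occurrence after the first as a duplicate, can be sketched in Python. This is only an illustration of the logic with made-up events; it does not replace the searches, and the function name is hypothetical. When no ID field exists, as with ELB access logs, a SHA-256 hash of the raw event stands in for the ID.

```python
import hashlib


def duplicate_events(events, id_field=None):
    """Return events whose (source, id) pair has already been seen."""
    seen = set()
    duplicates = []
    for event in events:
        if id_field is not None:
            event_id = event[id_field]
        else:
            # No unique ID available: hash the raw event text instead.
            event_id = hashlib.sha256(event["_raw"].encode("utf-8")).hexdigest()
        key = (event["source"], event_id)
        if key in seen:
            duplicates.append(event)
        else:
            seen.add(key)
    return duplicates


# Hypothetical CloudTrail-style events: the second one repeats the first.
cloudtrail_events = [
    {"source": "s3://example-bucket/a.json.gz", "eventID": "1"},
    {"source": "s3://example-bucket/a.json.gz", "eventID": "1"},
    {"source": "s3://example-bucket/a.json.gz", "eventID": "2"},
]
print(len(duplicate_events(cloudtrail_events, id_field="eventID")))

# Hypothetical ELB-style events without an ID field.
elb_events = [
    {"source": "s3://example-bucket/elb.log", "_raw": "line one"},
    {"source": "s3://example-bucket/elb.log", "_raw": "line one"},
]
print(len(duplicate_events(elb_events)))
```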

Automatically scale data collection with SQS-Based S3 inputs¶

With the SQS-Based S3 input type, you can take full advantage of the auto-scaling capability of the AWS infrastructure to scale out data collection by configuring multiple inputs to ingest logs from the same S3 bucket without creating duplicate events. This is particularly useful if you are ingesting logs from a very large S3 bucket and hit a bottleneck in your data collection inputs.

Create an AWS auto scaling group for your heavy forwarder instances where the SQS-Based S3 inputs are running. To create an auto-scaling group, you can either specify a launch configuration or create an AMI to provision new EC2 instances that host heavy forwarders, and use a bootstrap script to install the Splunk Add-on for AWS and configure SQS-Based S3 inputs. For detailed information about the auto-scaling group and how to create it, see https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-groups.html.
Set CloudWatch alarms for one of the following Amazon SQS metrics:
- ApproximateNumberOfMessagesVisible: The number of messages available for retrieval from the queue.
- ApproximateAgeOfOldestMessage: The approximate age (in seconds) of the oldest non-deleted message in the queue.
For instructions on setting CloudWatch alarms for Amazon SQS metrics, see https://docs.aws.amazon.com/cdk/v1/guide/how-to-set-cw-alarm.html
Use the CloudWatch alarm as a trigger to provision new heavy forwarder instances with SQS-Based S3 inputs configured to consume messages from the same SQS queue to improve ingestion performance.
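
As a sketch of the alarm step, the following Python snippet assembles parameters for an alarm on ApproximateNumberOfMessagesVisible. The keys match the parameters of the boto3 CloudWatch client call `put_metric_alarm`; the function name, threshold, period, evaluation periods, and the scaling policy ARN are hypothetical values that you would tune for your own queue.

```python
def sqs_backlog_alarm(queue_name, scale_out_policy_arn, threshold=1000):
    """Build CloudWatch alarm parameters for a growing SQS backlog."""
    return {
        "AlarmName": f"{queue_name}-backlog",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Average",
        # Hypothetical tuning: two consecutive 5-minute periods above threshold.
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        # The alarm action points at the auto scaling policy to trigger.
        "AlarmActions": [scale_out_policy_arn],
    }


alarm = sqs_backlog_alarm(
    "cloudtrail-data-queue",
    "arn:aws:autoscaling:ap-south-1:111111111111:scalingPolicy:example",
)
print(alarm)
```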
