Data Generator

To ingest samples into Splunk, the plugin takes pytest-splunk-addon-data.conf as input. Sample generation and ingestion take place before the test cases are executed. Index-time test cases require several pieces of metadata about each sample file, so pytest-splunk-addon-data.conf must be created and provided to the pytest command.

Generate sample files

pytest-splunk-addon supports two sample file formats:

1. Text format

  • Sample events can be generated from live events by replacing field values with tokens. Each token name should be enclosed in ##.
  • Token replacement settings for these samples should be added to pytest-splunk-addon-data.conf to generate events.
  • For example,
    Example live event:
    <111> 2020-02-12T03:27:09+10:00 10.0.0.3 RT_FLOW: RT_FLOW_SESSION_CREATE: action=allowed transport=ICMP dest=10.0.0.1 dest_ip=10.0.0.1 dest_port=5048 dest_zone=DUMMY_ZONE app=app1
    
    Sample created from live event:
    <111> ##timestamp_token## ##src_ip_token## RT_FLOW: RT_FLOW_SESSION_CREATE: action=##action_token## transport=##transport_type_token## dest=##dest_ip_token## dest_ip=##dest_ip_token## dest_port=##dest_port_token## dest_zone=##dest_zone_token## app=##app_token##
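
The tokenization step above can be sketched in Python. This is only an illustration of turning the example live event into the tokenized sample; the `FIELD_TOKENS` mapping and the `tokenize` helper are hypothetical names and are not part of the plugin.

```python
import re

LIVE_EVENT = (
    "<111> 2020-02-12T03:27:09+10:00 10.0.0.3 RT_FLOW: RT_FLOW_SESSION_CREATE: "
    "action=allowed transport=ICMP dest=10.0.0.1 dest_ip=10.0.0.1 "
    "dest_port=5048 dest_zone=DUMMY_ZONE app=app1"
)

# Hypothetical mapping from key=value field names to the token that replaces the value.
FIELD_TOKENS = {
    "action": "##action_token##",
    "transport": "##transport_type_token##",
    "dest": "##dest_ip_token##",
    "dest_ip": "##dest_ip_token##",
    "dest_port": "##dest_port_token##",
    "dest_zone": "##dest_zone_token##",
    "app": "##app_token##",
}


def tokenize(event):
    """Replace the syslog header values and the key=value values with tokens."""
    # Header layout assumed here: "<priority> <timestamp> <host> ..."
    event = re.sub(
        r"^(<\d+>) \S+ \S+",
        r"\1 ##timestamp_token## ##src_ip_token##",
        event,
    )
    for key, token in FIELD_TOKENS.items():
        # \b keeps "dest=" from matching inside "dest_ip=" and similar keys.
        event = re.sub(rf"\b{re.escape(key)}=\S+", f"{key}={token}", event)
    return event


sample = tokenize(LIVE_EVENT)
```

In practice the sample is usually edited by hand; a script like this is only useful when many live events share the same layout.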
    

2. XML format

  • Generate sample xml file using the format provided here
  • With the XML format, requirement tests can also be executed for the event, covering the fields listed in cim_fields and other_mappings
  • For example,
Example live event:
<111> 2020-02-12T03:27:09+10:00 10.0.0.3 RT_FLOW: RT_FLOW_SESSION_CREATE: action=allowed transport=ICMP dest=10.0.0.1 dest_ip=10.0.0.1 dest_port=5048 dest_zone=DUMMY_ZONE app=app1
Sample xml created from live event:
<?xml version="1.0" encoding="UTF-8"?>
<device>
     <vendor>Test</vendor>
     <product>Test product</product>
     <version id="8.21.0"/>
     <event code="" name="SyslogEvent" format="syslog" >
          <transport type="syslog" host="10.0.0.3"/>
          <source>
               <comment/>
          </source>
          <raw>
               <![CDATA[<111> 2020-02-12T03:27:09+10:00 10.0.0.3 RT_FLOW: RT_FLOW_SESSION_CREATE: action=allowed transport=ICMP dest=10.0.0.1 dest_ip=10.0.0.1 dest_port=5048 dest_zone=DUMMY_ZONE app=app1]]>
          </raw>
          <cim>
               <models>
                    <model>Network Traffic</model>
               </models>
               <cim_fields>
                    <field name="action" value="allowed"/>
                    <field name="dest" value="10.0.0.1"/>
                    <field name="dest_ip" value="10.0.0.1"/>
                    <field name="dest_port" value="5048"/>
                    <field name="dest_zone" value="DUMMY_ZONE"/>
                    <field name="app" value="app1"/>
                    <field name="transport" value="ICMP"/>
               </cim_fields>
               <missing_recommended_fields>
                    <field>bytes</field>
                    <field>bytes_in</field>
                    <field>bytes_out</field>
                    <field>dvc</field>
                    <field>rule</field>
                    <field>session_id</field>
                    <field>src</field>
                    <field>src_ip</field>
                    <field>src_port</field>
                    <field>src_zone</field>
                    <field>src_interface</field>
                    <field>user</field>
                    <field>protocol</field>
                    <field>vendor_product</field>
               </missing_recommended_fields>
               <exceptions>
                    <field name="vendor_product" value="Incorrect vendor product" reason="testing exceptions"/>
               </exceptions>
          </cim>
          <other_mappings>
               <field name="event_name" value="RT_FLOW_SESSION_CREATE"/>
          </other_mappings>
     </event>
</device>
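
To make the structure of the XML sample concrete, here is a minimal sketch that reads a trimmed copy of it with Python's standard `xml.etree.ElementTree`. The `read_event` helper and the shape of the dictionary it returns are assumptions for illustration; the plugin's own parser may work differently.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample above (XML declaration and some fields omitted).
SAMPLE_XML = """\
<device>
  <vendor>Test</vendor>
  <product>Test product</product>
  <event code="" name="SyslogEvent" format="syslog">
    <transport type="syslog" host="10.0.0.3"/>
    <raw><![CDATA[<111> 2020-02-12T03:27:09+10:00 10.0.0.3 RT_FLOW: RT_FLOW_SESSION_CREATE: action=allowed dest_port=5048]]></raw>
    <cim>
      <models><model>Network Traffic</model></models>
      <cim_fields>
        <field name="action" value="allowed"/>
        <field name="dest_port" value="5048"/>
      </cim_fields>
      <missing_recommended_fields>
        <field>bytes</field>
        <field>src</field>
      </missing_recommended_fields>
    </cim>
    <other_mappings>
      <field name="event_name" value="RT_FLOW_SESSION_CREATE"/>
    </other_mappings>
  </event>
</device>
"""


def read_event(event):
    """Collect the parts of one <event> node that the tests refer to."""
    transport = event.find("transport")
    return {
        "raw": event.findtext("raw", default="").strip(),
        "host": transport.get("host") if transport is not None else None,
        "models": [m.text for m in event.iterfind("cim/models/model")],
        "cim_fields": {
            f.get("name"): f.get("value")
            for f in event.iterfind("cim/cim_fields/field")
        },
        "missing_recommended_fields": [
            f.text for f in event.iterfind("cim/missing_recommended_fields/field")
        ],
        "other_mappings": {
            f.get("name"): f.get("value")
            for f in event.iterfind("other_mappings/field")
        },
    }


root = ET.fromstring(SAMPLE_XML)
events = [read_event(event) for event in root.iterfind("event")]
```

Each `name`/`value` pair under cim_fields and other_mappings is a field expectation that a requirement test can check against the ingested event.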

XML Transport Node vs. Conf File Settings

The XML <transport> node and pytest-splunk-addon-data.conf settings serve different purposes. Understanding this distinction is important to avoid confusion:

Setting | Source | Purpose
--- | --- | ---
<transport type=""> | XML file | Field tests only - Used for syslog header stripping in field extraction tests. NOT used for ingestion.
<transport host=""> | XML file | Requirement tests - Overrides conf host for requirement test searches.
<transport source=""> | XML file | Requirement tests - Overrides conf source for requirement test searches.
<transport sourcetype=""> | XML file | Field tests only - Used in field extraction test searches. NOT used for ingestion.
input_type | Conf file | Ingestion - Controls how events are parsed and ingested into Splunk.
host | Conf file | Ingestion - Base host value for events. Defaults to sample file name.
sourcetype | Conf file | Ingestion - The sourcetype assigned when ingesting events.
sourcetype_to_search | Conf file | All test searches - The sourcetype used in search queries.

Key Points:

  1. The conf file’s input_type always controls how events are ingested
  2. The XML’s <transport type=""> only affects field test behavior (e.g., stripping syslog headers)
  3. Use sourcetype_to_search in the conf file when your add-on transforms the sourcetype at index-time (via TRANSFORMS)
  4. The XML’s host and source attributes override conf values only for requirement tests

pytest-splunk-addon-data.conf.spec

Default Values:

[default]
host_type = plugin
input_type = default
index = main
sourcetype = pytest-splunk-addon
source = pytest-splunk-addon:{{input_type}}
sourcetype_to_search = {{sourcetype}}
sample_count = 1
requirement_test_sample = 1
timestamp_type = event
count = 0
earliest = now
latest = now
timezone = 0000
breaker = {{regex}}
host_prefix = {{host_prefix}}

[<sample file name>]

  • The stanza name can be a sample file name or a regex that matches multiple sample files.
  • The sample files should be located in the samples folder under the Add-on package.
  • Example1: [sample_file.samples] would collect samples from file sample_file.samples
  • Example2: [sample_*.samples] would collect samples from both sample_file.samples and sample_sample.samples.

Ingestion Settings

These settings control how events are ingested into Splunk.

sourcetype = <sourcetype>

  • Purpose: Sourcetype assigned when ingesting events into Splunk
  • This is the sourcetype that Splunk receives at index-time
  • If your add-on uses TRANSFORMS to change the sourcetype, this should be the original sourcetype before transformation

source = <source>

  • Purpose: Source assigned when ingesting events into Splunk
  • default value: pytest-splunk-addon:{{input_type}}

input_type = modinput | scripted_input | syslog_tcp | file_monitor | windows_input | uf_file_monitor | default

  • Purpose: Controls how events are parsed and ingested into Splunk
  • This determines how sample files are processed:
    • modinput, windows_input: One event per line in the sample file
    • file_monitor, scripted_input, syslog_tcp, syslog_udp, default: Entire file as a single event (unless breaker is specified)
  • The ingestion method is chosen to match how data flows in production for accurate index-time testing
  • For example, if sourcetype “alert” is ingested through syslog in production, use input_type=syslog_tcp

Note: This is different from the XML’s <transport type=""> which only affects field test behavior (syslog header stripping). The conf file’s input_type always controls actual ingestion.

Warning: The uf_file_monitor input_type works only with splunk-type=docker.
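
The event-splitting rules described for input_type can be sketched as follows. This is a simplified illustration, not the plugin's code: `split_events` is a hypothetical helper, and it assumes the text matched by breaker is discarded, which the description above does not specify.

```python
import re

# Input types described above as "one event per line".
LINE_PER_EVENT = {"modinput", "windows_input"}


def split_events(raw_sample, input_type="default", breaker=None):
    """Split the contents of a sample file into individual events."""
    if breaker:
        # Assumption: the breaker match itself is dropped from the events.
        # Note that re.split also returns the text of any capture groups.
        parts = re.split(breaker, raw_sample)
        return [part.strip() for part in parts if part.strip()]
    if input_type in LINE_PER_EVENT:
        return [line for line in raw_sample.splitlines() if line.strip()]
    # file_monitor, scripted_input, syslog_tcp, syslog_udp, default
    return [raw_sample]


three_lines = "event one\nevent two\nevent three"
per_line = split_events(three_lines, "modinput")
whole_file = split_events(three_lines, "file_monitor")
broken = split_events("first event---second event", "file_monitor", breaker=r"-{3}")
```

Here `per_line` holds three events, `whole_file` holds one, and `broken` holds two.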

index = <index>

  • Purpose: The index where events are ingested
  • The index must be configured beforehand
  • If the index is not available, data will not be ingested and a warning will be printed
  • Custom index is not supported for syslog_tcp or syslog_udp

host = <host>

  • Purpose: Base host value assigned when ingesting events
  • If not specified, defaults to the sample file name
  • When host_type = plugin, the plugin appends _{count} to make each event’s host unique (e.g., myhost_1, myhost_2)
  • Can be overridden per-event in XML samples via <transport host="..."> for requirement tests
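
As a small illustration of the host naming described above, the sketch below builds the per-event host values for host_type = plugin. `plugin_hosts` is a hypothetical helper; starting the counter at 1 is taken from the myhost_1, myhost_2 example.

```python
def plugin_hosts(sample_name, event_count, host=None):
    """Return one unique host per event, in the form <base>_<count>."""
    # The base host defaults to the sample file name when host is not set.
    base = host or sample_name
    return [f"{base}_{count}" for count in range(1, event_count + 1)]


hosts = plugin_hosts("sample_file.samples", 2, host="myhost")
default_hosts = plugin_hosts("sample_file.samples", 2)
```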

Search Settings

These settings control how tests search for events in Splunk.

sourcetype_to_search = <sourcetype>

  • Purpose: The sourcetype used in search queries during tests
  • Use this when your add-on transforms the sourcetype at index-time via TRANSFORMS
  • Example: If you ingest with sourcetype=raw:data but TRANSFORMS changes it to sourcetype=parsed:data, set:
    • sourcetype = raw:data (for ingestion)
    • sourcetype_to_search = parsed:data (for searching)
  • If not specified, defaults to the value of sourcetype
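
The fallback from sourcetype_to_search to sourcetype can be sketched with Python's standard `configparser`. The stanza names and the `search_sourcetype` helper are made up for this example, and the plugin may read the conf file in a different way.

```python
import configparser

CONF = """
[raw_events.samples]
sourcetype = raw:data
sourcetype_to_search = parsed:data

[plain_events.samples]
sourcetype = plain:data
"""


def search_sourcetype(stanza):
    """Return the sourcetype that test searches should use for a stanza."""
    return stanza.get("sourcetype_to_search", fallback=stanza["sourcetype"])


# Only "=" is treated as the key/value delimiter because values contain ":".
parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
parser.read_string(CONF)
resolved = {name: search_sourcetype(parser[name]) for name in parser.sections()}
```

With this input, the first stanza is searched as parsed:data while the second falls back to plain:data.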

Test Behavior Settings

These settings control test generation and execution.

host_type = plugin | event

  • Purpose: Determines how the host field is assigned
  • plugin: The plugin generates unique hosts with format “stanza_{count}” to identify events
  • event: The host is extracted from a token using “token.<n>.field = host”

sample_count = <count>

  • Purpose: Number of events present in the sample file
  • Used to calculate total events generated from the sample file
  • If input_type = modinput, do not provide this parameter (each line is an event)

requirement_test_sample = 1

  • Purpose: Enables requirement tests for XML sample files
  • When set to 1, the plugin parses the XML format and runs requirement tests using cim_fields and other_mappings
  • Only supported with XML sample files

expected_event_count = <count>

  • Purpose: Expected number of events for index-time line-breaking tests
  • Calculated as: (events in sample) × (number of values in replacementType=all tokens)
  • For example, if the sample has 3 lines and a token has replacementType=all with 2 values, then 6 events are generated
  • Optional - if not provided, calculated automatically

timestamp_type = plugin | event

  • Purpose: Determines how _time is assigned for index-time tests
  • plugin: The plugin assigns timestamp during ingestion
  • event: Timestamp is extracted from the event; requires a token with token.<n>.field = _time

breaker = <regex>

  • Purpose: Regex pattern to split sample file into multiple events
  • Optional - if not provided, events are parsed according to input_type

host_prefix = <host_prefix>

  • Purpose: Prefix for host field identification when using SC4S ingestion

Token replacement settings

The following replacementType -> replacement values are supported

ReplacementType | Replacement
--- | ---
static | <string>
timestamp | <strptime>
random | ipv4
random | ipv6
random | mac
random | guid
random | integer[<start>:<end>]
random | float[<start.numzerosforprecision>:<end.numzerosforprecision>]
random | list[<"," separated list>]
random | hex([integer])
random | file[<replacement file name, CSV file supported>:<column number / CSV header>]
random | dest["host", "ipv4", "ipv6", "fqdn"]
random | src["host", "ipv4", "ipv6", "fqdn"]
random | host["host", "ipv4", "ipv6", "fqdn"]
random | dvc["host", "ipv4", "ipv6", "fqdn"]
random | user["name", "email", "domain_user", "distinquised_name"]
random | url["full", "ip_host", "fqdn_host", "path", "query", "protocol"]
random | email
random | src_port
random | dest_port
file | <replacement file name, CSV file supported>:<column number / CSV header>
all | integer[<start>:<end>]
all | list[<"," separated list>]
all | file[<replacement file name, CSV file supported>:<column number / CSV header>]

token.<n>.token = <regular expression>

  • “n” is a number starting at 0, and increasing by 1.
  • PCRE expression used to identify segment for replacement.
  • If one or more capture groups are present the replacement will be performed on group 1.
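
The capture-group rule above can be illustrated with a short sketch. It uses Python's `re` module rather than a PCRE engine, and `replace_token` is a hypothetical helper rather than a plugin function.

```python
import re


def replace_token(event, token_regex, replacement):
    """Replace group 1 of each match, or the whole match if there are no groups."""
    pattern = re.compile(token_regex)

    def substitute(match):
        if pattern.groups == 0:
            return replacement
        start, end = match.span(1)
        if start == -1:
            # Group 1 did not take part in this match; leave the text unchanged.
            return match.group(0)
        offset = match.start()
        text = match.group(0)
        return text[: start - offset] + replacement + text[end - offset:]

    return pattern.sub(substitute, event)


# No capture group: the whole token is replaced.
whole = replace_token("user=##User## id=7", r"##User##", "alice")
# One capture group: only the captured value is replaced, "src=" is kept.
grouped = replace_token("src=10.0.0.3 dest=10.0.0.1", r"src=(\S+)", "192.0.2.10")
```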

token.<n>.replacementType = static | timestamp | random | all | file

  • “n” is a number starting at 0, and increasing by 1.
  • For static, the token will be replaced with the value specified in the replacement setting.
  • For timestamp, the token will be replaced with the strptime specified in the replacement setting. Strptime directive: https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
  • For random, the token will be replaced with a randomly picked type-aware value
  • For all, a new event is generated for each possible replacement value, and the token is replaced with that value. Use this when a token replacement contains multiple templates/values that are all important and should each be ingested at least once. The number of events is multiplied by the number of values in the replacement. For example, if the sample contains 3 lines and a token replacement has a list of 2 values, then 6 events will be generated. If a replacement does not support replacementType="all", the plugin falls back to replacementType="random" by default.
  • For file, the token will be replaced with a random value retrieved from a file specified in the replacement setting.
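
The difference between the all and random replacement types can be sketched as below. The helpers `expand_all` and `replace_random` are illustrative names, and the sample lines are invented; the sketch only reproduces the "3 lines × 2 values = 6 events" arithmetic described above.

```python
import random


def expand_all(events, token, values):
    """replacementType=all: emit one copy of every event per replacement value."""
    return [event.replace(token, value) for event in events for value in values]


def replace_random(events, token, values, rng):
    """replacementType=random: keep the event count, pick one value per event."""
    return [event.replace(token, rng.choice(values)) for event in events]


sample_lines = [
    "proc=##Name## id=1",
    "proc=##Name## id=2",
    "proc=##Name## id=3",
]
values = ["abc.exe", "def.exe"]

all_events = expand_all(sample_lines, "##Name##", values)
random_events = replace_random(sample_lines, "##Name##", values, random.Random(0))
```

`all_events` contains six events (every line paired with every value), while `random_events` still contains three.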

token.<n>.replacement = <string> | <strptime> | ["list","of","values"] | guid | ipv4 | ipv6 | mac | integer[<start>:<end>] | float[<start>:<end>] | hex(<i>) | <file name> | <file name>:<column number> | host | src | dest | dvc | user | url | email | src_port | dest_port

  • “n” is a number starting at 0, and increasing by 1.

  • For <string>, the token will be replaced with the value specified.

  • For <strptime>, a strptime formatted string to replace the timestamp with

  • For guid, the token will be replaced with a random GUID value.

  • For ipv4, the token will be replaced with a random valid IPv4 Address (e.g. 10.10.200.1).

  • For ipv6, the token will be replaced with a random valid IPv6 Address (e.g. c436:4a57:5dea:1035:7194:eebb:a210:6361).

  • For mac, the token will be replaced with a random valid MAC Address (e.g. 6e:0c:51:c6:c6:3a).

  • For integer[<start>:<end>], the token will be replaced with a random integer between the start and end values, where <start> is a number greater than 0 and <end> is a number greater than 0 and greater than or equal to <start>. For replacementType=all, one event will be generated for each integer value within the range <start> to <end>.

  • For float[<start>:<end>], the token will be replaced with a random float between start and end values where <end> is a number greater than or equal to <start>. For floating point numbers, precision will be based off the precision specified in <start>. For example, if we specify 1.0, precision will be one digit, if we specify 1.0000, precision will be four digits.

  • For hex(<i>), the token will be replaced with i number of Hexadecimal characters [0-9A-F] where “i” is a number greater than 0.

  • For list, the token will be replaced with a random member of the JSON list provided. For replacementType=all, one event will be generated for each value within the list.

  • For <replacement file name>, the token will be replaced with a random line in the replacement file.

    • Replacement file name should be a fully qualified path (i.e. $SPLUNK_HOME/etc/apps/windows/samples/users.list).
    • Windows separators should be written as double backslashes “\\” (i.e. $SPLUNK_HOME\\etc\\apps\\windows\\samples\\users.list).
    • Unix separators will work on Windows and vice-versa.
    • Column numbers in mvfile references are indexed at 1, meaning the first column is column 1, not 0.
  • For host["host", "ipv4", "ipv6", "fqdn"], 4 types of host replacement are supported. Either one or multiple from the list can be provided to randomly replace the token.

    • For host["host"], the token will be replaced with a sequential host value with pattern "host_sample_host_<number>".
    • For host["ipv4"], the token will be replaced with a random valid IPv4 Address.
    • For host["ipv6"], the token will be replaced with a random valid IPv6 Address from fdee:1fe4:2b8c:3264:0:0:0:0 range.
    • For host["fqdn"], the token will be replaced with a sequential fqdn value with pattern "host_sample_host.sample_domain<number>.com".
  • For src["host", "ipv4", "ipv6", "fqdn"], 4 types of src replacement are supported. Either one or multiple from the list can be provided to randomly replace the token.

    • For src["host"], the token will be replaced with a sequential host value with pattern "src_sample_host_<number>".
    • For src["ipv4"], the token will be replaced with a random valid IPv4 Address from 10.1.0.0 range.
    • For src["ipv6"], the token will be replaced with a random valid IPv6 Address from fdee:1fe4:2b8c:3261:0:0:0:0 range.
    • For src["fqdn"], the token will be replaced with a sequential fqdn value with pattern "src_sample_host.sample_domain<number>.com".
  • For dest["host", "ipv4", "ipv6", "fqdn"], 4 types of dest replacement are supported. Either one or multiple from the list can be provided to randomly replace the token.

    • For dest["host"], the token will be replaced with a sequential host value with pattern "dest_sample_host_<number>".
    • For dest["ipv4"], the token will be replaced with a random valid IPv4 Address from 10.100.0.0 range.
    • For dest["ipv6"], the token will be replaced with a random valid IPv6 Address from fdee:1fe4:2b8c:3262:0:0:0:0 range.
    • For dest["fqdn"], the token will be replaced with a sequential fqdn value with pattern "dest_sample_host.sample_domain<number>.com".
  • For dvc["host", "ipv4", "ipv6", "fqdn"], 4 types of dvc replacement are supported. Either one or multiple from the list can be provided to randomly replace the token.

    • For dvc["host"], the token will be replaced with a sequential host value with pattern "dvc_sample_host_<number>".
    • For dvc["ipv4"], the token will be replaced with a random valid IPv4 Address from 172.16.0-50.0 range.
    • For dvc["ipv6"], the token will be replaced with a random valid IPv6 Address from fdee:1fe4:2b8c:3263:0:0:0:0 range.
    • For dvc["fqdn"], the token will be replaced with a sequential fqdn value with pattern "dvc_sample_host.sample_domain<number>.com".
  • For user["name", "email", "domain_user", "distinquised_name"], 4 types of user replacement are supported. Either one or multiple from the list can be provided to randomly replace the token.

    • For user["name"], the token will be replaced with a random name with pattern "user<number>".
    • For user["email"], the token will be replaced with a random email with pattern "user<number>@email.com".
    • For user["domain_user"], the token will be replaced with a random domain user with pattern sample_domain.com\user<number>.
    • For user["distinquised_name"], the token will be replaced with a distinguished name with pattern CN=user<number>.
  • For url["full", "ip_host", "fqdn_host", "path", "query", "protocol"], 6 types of url replacement are supported. Either one or multiple from the list can be provided to randomly replace the token.

    • For url["ip_host"], the replacement url will contain an IP-based address.
    • For url["fqdn_host"], the replacement url will contain an fqdn address.
    • For url["path"], the replacement url will contain a path with pattern "<path>".
    • For url["query"], the replacement url will contain a query with pattern "?<query>=<value>".
    • For url["protocol"], the replacement url will contain a protocol with pattern "<https or http>://".
    • For url["full"], the replacement url contains all the parts mentioned above, i.e. ip_host, fqdn_host, path, query, protocol.
    • Example 1: url["ip_host", "path", "query"] will be replaced with pattern <ip_address>/<path>?<query>=<value>
    • Example 2: url["fqdn_host", "path", "protocol"] will be replaced with pattern <https or http>://<fqdn_address>/<path>
    • Example 3: url["ip_host", "fqdn_host", "path", "query", "protocol"] will be replaced with pattern <https or http>://<ip_address or fqdn_address>/<path>?<query>=<value>
    • Example 4: url["full"] will be replaced the same way as example 3.
  • For email, the token will be replaced with a random email. If the same sample has a user token as well, the email and user tokens will be replaced with correlated values.

  • For src_port, the token will be replaced with a random source port value between 4000 and 5000

  • For dest_port, the token will be replaced with a random dest port value from (80,443,25,22,21)
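
A few of the random replacements above can be sketched in Python to show the intended formats, in particular how the float precision follows the digits written in <start>. The helper names are hypothetical, and treating both range bounds as inclusive is an assumption; this is not the plugin's implementation.

```python
import random


def random_integer(start, end, rng):
    """integer[<start>:<end>]: a random integer, bounds assumed inclusive."""
    return str(rng.randint(start, end))


def random_float(start, end, rng):
    """float[<start>:<end>]: precision is taken from the text of <start>."""
    # "1.0" has one digit after the point, "1.0000" has four.
    precision = len(start.split(".")[1]) if "." in start else 0
    value = rng.uniform(float(start), float(end))
    return f"{value:.{precision}f}"


def random_hex(length, rng):
    """hex(<i>): i hexadecimal characters from [0-9A-F]."""
    return "".join(rng.choice("0123456789ABCDEF") for _ in range(length))


rng = random.Random(42)  # seeded only to make the sketch repeatable
size = random_integer(1, 10, rng)
mem_used_pct = random_float("1.0", "99.0", rng)
precise = random_float("1.0000", "99.0000", rng)
reference_id = random_hex(8, rng)
```

The float bounds are passed as strings because the precision depends on how <start> is written, not on its numeric value.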

token.<n>.field = <field_name>

  • “n” is a number starting at 0, and increasing by 1.
  • Assign the field_name for which the tokenized value will be extracted.
  • For this key field, the index time test cases will be generated.
  • Make sure props.conf contains extractions to extract the value from the field.
  • If this parameter is not provided, the default value will be the same as the token name.

NOTE: Make sure the token name is not the same as any of the key field values.

Example

[sample_file.samples]

sourcetype = juniper:junos:secintel:structured
sourcetype_to_search = juniper:junos:secintel:structured
source = pytest-splunk-addon:syslog_tcp
host_type = plugin
input_type = syslog_tcp
index = main
timestamp_type = event
sample_count = 10

token.0.token = (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)
token.0.replacementType = timestamp
token.0.replacement = %Y-%m-%dT%H:%M:%S

token.1.token = ##token1##
token.1.replacementType = static
token.1.replacement = sample_value

token.2.token = ##Src_Addr##
token.2.replacementType = random
token.2.replacement = src["ipv4"]
token.2.field = src

token.3.token = ##Dest_Addr##
token.3.replacementType = random
token.3.replacement = dest["ipv4"]

token.4.token = ##Src_Port##
token.4.replacementType = random
token.4.replacement = src_port
token.4.field = src_port

token.5.token = ##Dest_Port##
token.5.replacementType = random
token.5.replacement = dest_port

token.6.token = ##dvc##
token.6.replacementType = random
token.6.replacement = dvc["fqdn","host"]
token.6.field = dvc

token.7.token = ##User##
token.7.replacementType = random
token.7.replacement = user["name"]

token.8.token = ##HTTP_Host##
token.8.replacementType = random
token.8.replacement = host["fqdn"]

token.9.token = ##ReferenceIDhex##
token.9.replacementType = random
token.9.replacement = hex(8)

token.10.token = ##Ip##
token.10.replacementType = random
token.10.replacement = ipv4

token.11.token = ##Ipv6##
token.11.replacementType = random
token.11.replacement = ipv6

token.12.token = ##Name##
token.12.replacementType = random
token.12.replacement = list["abc.exe","def.exe","efg.exe"]

token.13.token = ##Name##
token.13.replacementType = all
token.13.replacement = list["abc.exe","def.exe","efg.exe"]

token.14.token = ##email##
token.14.replacementType = random
token.14.replacement = email

token.15.token = ##mac##
token.15.replacementType = random
token.15.replacement = mac

token.16.token = ##memUsedPct##
token.16.replacementType = random
token.16.replacement = float[1.0:99.0]

token.17.token = ##guid##
token.17.replacementType = random
token.17.replacement = guid

token.18.token = ##size##
token.18.replacementType = random
token.18.replacement = integer[1:10]

token.19.token = ##integer_all##
token.19.replacementType = all
token.19.replacement = integer[1:5]

token.20.token = ##url##
token.20.replacementType = random
token.20.replacement = url["ip_host", "fqdn_host", "path", "query", "protocol"]

token.21.token = ##DHCP_HOST##
token.21.replacementType = random
token.21.replacement = file[/path/linux.host.sample]

token.22.token = ##DHCP_HOST_all##
token.22.replacementType = all
token.22.replacement = file[/path/linux.host.sample]
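
To show how the token.<n>.* settings in a stanza like the one above group together, here is a minimal sketch that reads a shortened copy of the example with Python's standard `configparser`. The `collect_tokens` helper is hypothetical, and the plugin may parse the file differently.

```python
import configparser
import re
from collections import defaultdict

# Shortened copy of the example stanza above.
CONF = r"""
[sample_file.samples]
sourcetype = juniper:junos:secintel:structured
input_type = syslog_tcp

token.0.token = (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)
token.0.replacementType = timestamp
token.0.replacement = %Y-%m-%dT%H:%M:%S

token.2.token = ##Src_Addr##
token.2.replacementType = random
token.2.replacement = src["ipv4"]
token.2.field = src
"""

TOKEN_KEY = re.compile(r"^token\.(\d+)\.(token|replacementType|replacement|field)$")


def collect_tokens(stanza):
    """Group token.<n>.<setting> keys into one dict per token number."""
    tokens = defaultdict(dict)
    for key, value in stanza.items():
        match = TOKEN_KEY.match(key)
        if match:
            tokens[int(match.group(1))][match.group(2)] = value
    return dict(sorted(tokens.items()))


# interpolation=None keeps "%Y" literal; optionxform=str keeps "replacementType" case.
parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
parser.optionxform = str
parser.read_string(CONF)
tokens = collect_tokens(parser["sample_file.samples"])
```

A grouping like this makes it easy to spot a token that is missing its replacementType or replacement setting.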