Parse Messages¶
Prerequisites
Before reading this section, make sure you are familiar with Sources and Read First.
This section covers how to create block parser. Every application block references a block parser that defines how to process the matched message. Within the parser, you extract fields and set Splunk metadata.
Structure¶
block parser <parser-name>() {
channel {
parser { <parsing expressions> };
rewrite { <setting new fields> };
};
};
A block parser contains a channel { } with two stages:
- Parsing (optional) — extract fields from the message using
kv-parser,csv-parser,regexp-parser,json-parser,date-parser, orsyslog-parser. - Rewrite — set Splunk destination metadata (index, sourcetype, vendor, product, template).
Rewrite functions¶
There are two rewrite functions for setting Splunk metadata:
r_set_splunk_dest_default — sets all base Splunk metadata. Includes index, sourcetype, vendor, product, and optionally source and template.
rewrite {
r_set_splunk_dest_default(
index("netops")
sourcetype("alcatel:switch")
vendor("alcatel")
product("switch")
template("t_hdr_msg")
);
};
r_set_splunk_dest_update_v2 — overrides specific fields already set by r_set_splunk_dest_default. It accepts index, source, sourcetype, class, and template options. You can also use the condition option for a conditional expression.
rewrite {
r_set_splunk_dest_update_v2(
sourcetype('citrix:netscaler:appfw')
condition(message(':(\s+\S+)?\s+APPFW(\s+\S+){3}\s+:'))
);
};
Templates¶
The template parameter in r_set_splunk_dest_default controls what part of the message is forwarded to Splunk. Templates are defined in package/etc/conf.d/conflib/_common/t_templates.conf. The most common ones:
| Template | Content | Use case |
|---|---|---|
t_hdr_msg |
${MSGHDR}${MESSAGE} |
Default for most parsers |
t_msg_only |
${MSGONLY} |
When header is not needed (e.g. Palo Alto) |
t_program_msg |
${PROGRAM}[${PID}]: ${MESSAGE} |
Program with PID and message |
t_hdr_sdata_msg |
${MSGHDR}${MSGID} ${SDATA} ${MESSAGE} |
RFC5424 with structured data |
t_kv_values |
.values.*, .SDATA.*, .metadata.* as key=value pairs |
When used, Splunk can automatically extract those values from the message |
t_json_values |
.values.*, .SDATA.*, .metadata.* as a JSON object |
Same as t_kv_values but JSON format |
t_json_values_msg |
Same as t_json_values + original message field |
When extracted fields and the raw original message are both needed |
Parsing methods¶
Parsers in syslog-ng control how incoming log messages are broken down and structured for further processing and field extraction. They analyze the message content, extracting meaningful data into named fields that can later be used in Splunk. Different log sources and formats require different parsing strategies:
kv-parser — use when logs contain key/value pairs (key=value).
Options:
prefix()— string prepended to every extracted key name. Prevents collisions with built-in syslog-ng macros and keeps parsed fields in their own namespace. For example, withprefix(".values."), a keysrc=10.0.0.1becomes the field.values.src.template()— specifies which part of the message to parse. By default,kv-parseroperates on${MESSAGE}.pair-separator()— custom delimiter between key=value pairs (default is whitespace).value-separator()— custom separator between key and value (default is=).
# parsing HDR + MSG of the log and prepending parsed values with `.values.`
parser {
kv-parser(prefix(".values.") template("$(template t_hdr_msg)"));
};
csv-parser — use when logs are consistently delimited with stable column order.
Options:
columns()— comma-separated list of column names. Each column maps positionally to a delimited field in the message. Names become field keys (prefixed ifprefix()is set).prefix()— string prepended to each column name (e.g.,prefix(".values.")turns column"src"into.values.src).delimiters()— character(s) used to split the message into columns (e.g.,','for CSV,'\t'for TSV). By default, space is used.quote-pairs()— characters used for quoting values that contain the delimiter (e.g.,'""'for double-quote pairs).template()— specifies which part of the message to parse. By default,csv-parseroperates on${MESSAGE}.flags()— parsing behavior flags:escape-double-char— treat doubled characters as a literal char inside a value, e.g., use,,to escape a single comma,.greedy— assign all remaining text to the last column instead of discarding it.drop-invalid— drop the message if the number of fields does not match the column count.
parser {
csv-parser(
columns("col1","col2","col3","col4")
prefix(".values.")
delimiters(',')
quote-pairs('""')
flags(escape-double-char)
);
};
regexp-parser — use to parse message content with regular expressions.
Options:
patterns()— one or more regular expressions with named capture groups ((?<name>...)). Each named group becomes a field.prefix()— string prepended to each captured group name.template()— which part of the message to match against (default is${MESSAGE}).
parser {
regexp-parser(
template("${MESSAGE}")
patterns('^(?<field1>\d+) (?<field2>[^ ]+) (?<field3>.*)')
prefix(".parsed.")
);
};
json-parser — use when logs are JSON-formatted.
Options:
prefix()— string prepended to each JSON key. Nested JSON objects are flattened with dots (e.g.,{"event":{"src":"10.0.0.1"}}withprefix(".values.")becomes.values.event.src).template()— which field to parse as JSON (default is${MESSAGE}).
parser {
json-parser(prefix('.values.'));
};
date-parser — use when the timestamp needs to be explicitly parsed from a field, e.g., from a non-syslog message.
Options:
format()— one or more strptime format strings to try in order (e.g.,'%s.%f'for epoch with fractional seconds,'%Y-%m-%dT%H:%M:%S%z'for ISO 8601). Multiple formats can be passed as a comma-separated list; the first one that matches wins.template()— which field contains the timestamp string to parse.
parser {
date-parser(
format('%s.%f', '%s')
template("${.tmp.timestamp}")
);
};
syslog-parser — re-parses a reconstructed syslog line. Used in almost-syslog parsers when the original message has a non-standard header that needs to be normalized first with regexp-parser and then re-parsed.
Options:
template()— the string to parse as a syslog message. Typically composed from.tmp.*fields extracted by a priorregexp-parser.flags()— parsing behavior flags:assume-utf8— assume the message is UTF-8 encoded without verification.guess-timezone— attempt to guess the timezone if not explicitly present in the timestamp.no-header— parse only the PRI field; put the rest into${MSG}.
# after regexp-parser extracted parts into .tmp.* fields
parser {
syslog-parser(
flags(assume-utf8, guess-timezone)
template("${.tmp.pri} $S_ISODATE ${.tmp.message}")
);
};
Extracted fields in Splunk¶
In most cases, Splunk extracts fields from the message at search time, using TAs or props.conf/transforms.conf files.
However, sometimes — due to the message format or the unavailability of a TA — Splunk cannot extract fields from the log body.
In that case, you can use SC4S parsers to extract the necessary fields and send them to Splunk.
There are two ways fields extracted by a parser can appear in Splunk search results.
1. Via the event body template¶
Extract fields into the .values.* namespace and set template('t_kv_values') or template('t_json_values'). These templates serialize fields from .values.*, .SDATA.*, and .metadata.* into the event body.
block parser app-syslog-vendor_product() {
channel {
parser {
kv-parser(prefix(".values."));
};
rewrite {
r_set_splunk_dest_default(
index('netfw')
sourcetype('vendor:product')
vendor('vendor')
product('product')
template('t_json_values')
);
};
};
};
If you extract into a different namespace (e.g. .parsed.*, .tmp.*), those fields will not be picked up by these templates, either use .values.* as the prefix or write a custom template.
2. Via fields.* mapping¶
Extract fields directly into the fields.* namespace. SC4S includes all fields.* name-value pairs in the HEC payload as indexed fields.
parser {
kv-parser(prefix("fields."));
};
When to use raw message templates with field extraction parsers¶
Some parsers extract fields into a temporary namespace purely for SC4S routing decisions, branching on field values to set the correct sourcetype or index, without needing those fields in Splunk. In that case, templates like t_hdr_msg or t_msg_only are appropriate, and the raw message is forwarded as-is.
Rewrite operations¶
Beyond r_set_splunk_dest_default and r_set_splunk_dest_update_v2, block parsers commonly use these rewrite operations:
set() — sets a field to a value. Supports macro expansion and conditions.
rewrite {
# copy a parsed field into HOST
set("${.values.hostname}", value("HOST"));
# conditional set
set("new_value", value("PROGRAM") condition(program('old_value' type(string))));
};
subst() — performs string substitution on a field value. Supports regex.
rewrite {
# strip leading ": " from MESSAGE
subst('^: ', "", value("MESSAGE"));
# strip path prefix from PROGRAM (e.g., "/usr/bin/app" -> "app")
subst('^\/(?:[^\/]+\/)+', "", value("PROGRAM"));
# global flag to replace all occurrences
subst('\t', " ", value("MESSAGE") flags(global));
};
unset() — removes a field entirely.
rewrite {
unset(value("PROGRAM"));
unset(value("PID"));
};
r_set_dest_splunk_null_queue — tags the message for dropping (null queue). Used in post-filters to discard noise or incomplete events.
rewrite(r_set_dest_splunk_null_queue);
map-value-pairs — remaps existing name-value pairs to a different set of names.
# map all .values.* to .SDATA.sc4sfields@27389.*, e.g., .values.src -> .SDATA.sc4sfields@27389.src
rewrite {
map-value-pairs(
key('.values.*' rekey(shift-levels(2) add-prefix(".SDATA.sc4sfields@27389.")))
);
};
Examples¶
Simple parser — no field extraction, just sets Splunk metadata:
block parser app-syslog-alcatel_switch() {
channel {
rewrite {
r_set_splunk_dest_default(
index('netops')
sourcetype('alcatel:switch')
vendor("alcatel")
product("switch")
template('t_hdr_msg')
);
};
};
};
Parser with field extraction — extracts key/value pairs, validates a required field:
block parser app-syslog-vendor_product() {
channel {
parser {
kv-parser(prefix(".values.") template("$(template t_hdr_msg)"));
};
filter {
"${.values.some_required_field}" ne ""
};
rewrite {
r_set_splunk_dest_default(
index('netfw')
sourcetype('vendor:product')
vendor("vendor")
product("product")
template('t_kv_values')
);
};
};
};
Parser with conditional branches — routes different message types to different sourcetypes:
block parser app-syslog-vendor_product() {
channel {
rewrite {
r_set_splunk_dest_default(
index("netops")
sourcetype('vendor:log')
vendor("vendor")
product('product')
template('t_msg_only')
);
};
if (message(',TRAFFIC,' type(string) flags(substring))) {
parser {
csv-parser(
delimiters(chars('') strings('|'))
columns('version', 'device_vendor', 'device_product', 'device_version', 'device_event_class', 'name', 'severity', 'ext')
prefix('.metadata.cef.')
flags(greedy));
};
rewrite {
r_set_splunk_dest_update_v2(
index('netfw')
class('traffic')
sourcetype('vendor:traffic')
);
};
} elif (message(',SYSTEM,' type(string) flags(substring))) {
parser { csv-parser(columns(...) prefix(".values.") delimiters(',')); };
rewrite {
r_set_splunk_dest_update_v2(
index('netops')
class('system')
sourcetype('vendor:system')
);
};
} else { };
};
};