Architecture¶
Two-Package Model¶
The SAP LogServ solution for Splunk is delivered as two separately installable packages:
| Package | App ID | Purpose |
|---|---|---|
| Data TA | `splunk_ta_sap_logserv` | Data collection, index-time filtering, deployment server automation, configuration UI; ships the `indexes.conf` for `sap_logserv_logs` (SAP data) and `_ai_assistant_audit` (AI Assistant audit log) |
| LogServ App | `splunk_app_sap_logserv` | Dashboards, AI Assistant, Environment Topology view, search-time field extractions, macros |
The Data TA handles everything that happens at index time: ingesting data from S3, routing events to the correct sourcetype, applying index-time filters, and defining the two indexes the solution writes to. It includes Python scripts, REST handlers, a configuration UI built with Splunk’s UCC framework, and a `default/indexes.conf` so Splunk auto-creates `sap_logserv_logs` and `_ai_assistant_audit` on first install. Both index names are macro-configurable via `sap_logserv_idx_macro` (SAP data) and `sap_logserv_audit_idx_macro` (audit log); customers who rename either index update the matching macro definition. The `_ai_assistant_audit` index is required for the AI Assistant’s audit log to function — without it, audit events have no destination index.
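As a concrete illustration, the macro definitions could look like the following in `macros.conf` (the macro names come from this page; the stanza contents are an assumption about the shipped defaults):

```ini
# macros.conf (illustrative -- the shipped definitions may differ)
[sap_logserv_idx_macro]
definition = index=sap_logserv_logs

[sap_logserv_audit_idx_macro]
definition = index=_ai_assistant_audit
```

A customer who renames the SAP data index would change only the first `definition` line; searches and dashboards that reference the macro pick up the new index automatically.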
The LogServ App handles everything that happens at search time: field extractions, field aliases, computed fields, and the dashboards you use to visualize and analyze the data. It contains no Python code and no data collection components.
Why two packages?
The split follows Splunk best practices for distributed deployments. The Data TA runs at the ingest tier (Heavy Forwarders for distributed deployments, plus the Indexer where the bundled indexes.conf provisions storage) and the LogServ App runs on the Search Head where users interact with dashboards. Keeping these tiers separate means search-only logic (field extractions, dashboards, the AI Assistant chat panel) doesn’t bloat the forwarders, and ingest logic (sourcetype routing, REST handlers, UCC config UI) doesn’t leak into the Search Head.
Install Matrix¶
Where you install each package depends on your Splunk topology:
| Topology | Data TA | LogServ App |
|---|---|---|
| Single instance | Same instance | Same instance |
| DS + HFs + on-prem SH | Deployment Server + each HF + Indexer | Search Head only |
| DS + HFs + Splunk Cloud | Deployment Server + each HF (Splunk Cloud admin handles the indexer tier — Data TA installed there provides the index defs) | Splunk Cloud SH only |
Important
- The Data TA is never installed directly on Heavy Forwarders when using a Deployment Server – the DS distributes it automatically.
- The LogServ App is never installed on Heavy Forwarders or the Deployment Server.
- On Splunk Cloud, the customer’s Splunk Cloud admin handles the indexer tier separately. The Data TA installed on that indexer provides the bundled index definitions.
- For single-instance deployments, both packages are installed on the same instance and Splunk merges their configurations at runtime.
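For the Deployment Server case, distribution uses Splunk’s standard serverclass mechanism. A minimal `serverclass.conf` sketch, assuming a hypothetical server class name and hostname pattern:

```ini
# serverclass.conf on the Deployment Server (names illustrative)
[serverClass:sap_logserv_hf]
# Match the Heavy Forwarders by hostname pattern
whitelist.0 = hf-*

[serverClass:sap_logserv_hf:app:splunk_ta_sap_logserv]
stateOnClient = enabled
restartSplunkd = true
```

With a stanza like this in place, placing the Data TA package in `deployment-apps/` is enough; the DS pushes it to every matching HF.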
Data Flow¶
The diagram below shows how SAP LogServ data flows from the SAP ECS environment into Splunk:

    SAP ECS Environment
             |
             v
    SAP LogServ S3 Bucket (SAP-managed)
             |
             v  (S3 event notifications via SQS)
    Customer's AWS Account
    +-----------------------------------------------+
    | Destination S3 Bucket --> SQS Queue           |
    | (S3 events trigger SQS messages)              |
    +-----------------------------------------------+
             |
             v  (Splunk AWS Add-on reads from SQS)
    Splunk Heavy Forwarders
    +-----------------------------------------------+
    | 1. Ingest NDJSON from S3 via SQS              |
    | 2. Route to sourcetype (TRANSFORMS)           |
    | 3. Apply index-time filters (nullQueue)       |
    | 4. Forward to indexer                         |
    +-----------------------------------------------+
             |
             v
    Splunk Indexer
    +-----------------------------------------------+
    | Stores events in sap_logserv_logs index       |
    +-----------------------------------------------+
             |
             v
    Splunk Search Head
    +-----------------------------------------------+
    | LogServ App: dashboards + field extractions   |
    +-----------------------------------------------+
Index-Time Filtering¶
The Data TA provides built-in index-time filtering that lets you control which log types are indexed. Filtering happens on the Heavy Forwarders using TRANSFORMS-based queue routing:
- Include patterns – Only ingest log types that match the pattern (e.g., `linux/*` to include only Linux logs)
- Exclude patterns – Drop specific log types (e.g., `linux/cron` to exclude cron logs)
- Days in past – Drop data older than a specified number of days, based on the S3 object path date
Filtered events are routed to nullQueue and never consume Splunk license. Filter settings are configured on the Deployment Server and pushed to Heavy Forwarders automatically.
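Mechanically, an exclude filter is the standard Splunk nullQueue pattern. A sketch, assuming a hypothetical sourcetype and stanza name (the Data TA generates its actual stanzas from the filter settings):

```ini
# props.conf (illustrative)
[sap:logserv:generic]
TRANSFORMS-logserv_filter = logserv_exclude_cron

# transforms.conf -- events whose source matches the pattern are routed
# to nullQueue and discarded before indexing (no license usage).
[logserv_exclude_cron]
SOURCE_KEY = MetaData:Source
REGEX = linux/cron
DEST_KEY = queue
FORMAT = nullQueue
```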
See Configuring Filters for detailed setup instructions.
Sourcetype Routing¶
All SAP LogServ data arrives in a single generic format. The Data TA examines each event’s metadata during index-time parsing and routes it to the appropriate Splunk sourcetype. Routing is defined in transforms.conf using regex-based sourcetype assignment.
Two routing strategies are used:
- Source-path matching – For log types with unique `source` field values (e.g., `/var/log/messages` for Linux syslog, `/var/log/squid/access.log` for Squid proxy)
- Classification field matching – For SAP application logs that share similar source paths, routing matches the `clz_dir` and `clz_subdir` fields in the NDJSON envelope. When the same `clz_subdir` value appears under multiple `clz_dir` paths (e.g., `audit` exists under both `abap/` and `scc/`), compound lookahead regexes match both fields simultaneously to avoid collisions.
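As an illustration of the second strategy, a compound-lookahead routing stanza could look like this (the stanza name and regex are assumptions; the field and sourcetype names come from this page):

```ini
# transforms.conf -- both lookaheads must match the raw NDJSON envelope,
# so "audit" under abap/ routes differently than "audit" under scc/.
[logserv_route_abap_audit]
REGEX = (?=.*"clz_dir"\s*:\s*"abap")(?=.*"clz_subdir"\s*:\s*"audit")
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::sap:abap:audit
```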
For the complete list of supported log types and their sourcetype mappings, see Supported Log Types.
v0.0.5.0 React App Architecture¶
The LogServ App was rewritten as a React application in v0.0.5.0. The Data TA architecture is unchanged from v0.0.4.x — only the UI tier changed.
Stack:
- Build pipeline: webpack-based bundle build atop `@splunk/webpack-configs`. Each Splunk app page resolves to a single React bundle.
- UI primitives: `@splunk/react-ui` (forms, tables, modals), `@splunk/visualizations` (charts), `@xyflow/react` (Topology graph), `styled-components` for theming.
- State management: React context for cross-cutting concerns (AI Assistant, time range, refresh ticker); `useState`/`useReducer` for component-local state. No Redux.
- Data fetching: a custom `useSearch` hook wraps `@splunk/search-job` for SPL dispatches; results expose `rows` / `loading` / `error` to consuming components.
- Routing: React Router 7 with `HashRouter` so URLs survive Splunk Web’s app-namespace routing; query strings carry time-range hydration (`?earliest=...&latest=...`).
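To make the time-range hydration concrete, here is a small, self-contained sketch of the kind of parsing the routing layer would do. The function name and default values are hypothetical, not the app’s actual API:

```typescript
// Hypothetical helper: extract the time range carried in a HashRouter URL
// such as "#/overview?earliest=-24h&latest=now".
function parseTimeRange(hash: string): { earliest: string; latest: string } {
  // Everything after "?" in the hash fragment is the query string.
  const query = hash.split("?")[1] ?? "";
  const params = new URLSearchParams(query);
  return {
    earliest: params.get("earliest") ?? "-24h", // assumed default window
    latest: params.get("latest") ?? "now",
  };
}
```

Because the range lives in the URL, a pasted link reopens the dashboard at the same time window.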
Build-time feature flags:
The build supports compile-time variants. The first such flag is `TEMPLATES_ONLY`: when set, the resulting bundle has the AI Assistant’s free-form / LLM-driven flow disabled at compile time, not at runtime — there is no runtime setting that could re-enable it. See the Templates-only Build page for the user-facing implications.
Static-asset cache busting:
Splunk Web cache-busts the React bundle’s static-asset URLs using the integer `build` field in the `[install]` stanza of `app.conf`. Every meaningful code change bumps this number; without a bump, browsers serve stale bytes after a deploy. The three-part SemVer `version` in the `[id]` stanza is independent of the build number and changes only on user-facing version bumps.
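In `app.conf` terms (the values shown are illustrative):

```ini
# app.conf -- the integer build busts the static-asset cache;
# the SemVer version tracks user-facing releases independently.
[install]
build = 17

[id]
version = 0.0.5.0
```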
AI Assistant Architecture¶
The AI Assistant is a chat-style panel embedded in the LogServ App’s React UI that lets analysts run pre-canned investigations and free-form prompts against their Splunk data. It has two distinct paths and a strong privacy invariant.
Privacy invariant — type-system-enforced, not policy-enforced:

    User question --> AI vendor --> AI picks tools --> MCP server --> Splunk
                          |                                             |
                          +<---------- Hidden<MCPToolResult> <---------+
                          |
                          v   sanitize chokepoint: count + timing only (Tier 1)
                          |                        or aggregated metadata (Tier 2)
                          v
                AI synthesizes narrative reply
Tool results from the Splunk MCP Server are typed `Hidden<MCPToolResult>` in TypeScript. The compiler refuses to put a `Hidden<T>` value into the outbound vendor payload — the only way to convert it is via `sanitize(hidden, summarizer)`, which forces the caller to provide a non-data summary. The summarizer is gated by the active privacy tier:
- Tier 0 (Ollama, future) — air-gapped local LLM; no vendor traffic at all.
- Tier 1 (default) — summary is `count` + `execution_time` only. AI sees no values.
- Tier 2 (admin opt-in) — summary adds aggregated metadata (per-column cardinality, top-N values + counts for categorical columns, min/max/avg/sum for numeric columns, time range when `_time` is present). Still no raw rows.
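A minimal sketch of how such a type-level chokepoint can be built in TypeScript. This illustrates the technique, not the app’s actual source; the names mirror the ones above:

```typescript
// A unique symbol keeps the payload unreachable outside this module:
// no type in the outbound vendor payload can accept a Hidden<T>.
const HIDDEN = Symbol("hidden");

interface Hidden<T> {
  readonly [HIDDEN]: T;
}

// Wrap raw tool results at the MCP boundary.
function hide<T>(value: T): Hidden<T> {
  return { [HIDDEN]: value };
}

// Tier 1 summary shape: counts and timing only, never row values.
interface Tier1Summary {
  count: number;
  execution_time_ms: number;
}

// The single chokepoint: the only way to turn a Hidden<T> into something
// sendable is to supply a summarizer that maps it to a non-data summary.
function sanitize<T, S>(hidden: Hidden<T>, summarizer: (value: T) => S): S {
  return summarizer(hidden[HIDDEN]);
}

// Usage: MCP tool results are hidden at the boundary...
const toolResult = hide([{ host: "hana01" }, { host: "hana02" }]);

// ...and only the summary ever reaches the vendor payload.
const summary: Tier1Summary = sanitize(toolResult, (rows) => ({
  count: rows.length,
  execution_time_ms: 42,
}));
```

Because `HIDDEN` is module-private, code outside this module cannot reach the wrapped rows at all; it can only call `sanitize` and receive the summary.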
Two paths:
- Predefined prompts (no LLM call): the user opens the prompt browser and clicks one of the 48 cataloged prompts. The orchestrator dispatches the saved search via the Splunk MCP Server, renders the result tile in the right pane, and appends a static interpretation + suggested-next-steps card. No vendor LLM is invoked. This is the path used in the templates-only build.
- Free-form prompts (LLM-driven): the user types a natural-language question. The orchestrator sends the system primer + user message + tool definitions to the active vendor (Anthropic / OpenAI / Azure OpenAI / AWS Bedrock). The vendor picks tools, the orchestrator dispatches them in parallel via MCP, the vendor sees only the privacy-tier summary, and the vendor synthesizes a narrative response.
Audit log:
Every AI Assistant action — both paths — writes audit events to a dedicated `_ai_assistant_audit` index. Categories include `local_only` (canned-prompt dispatches), `vendor_tier1` / `vendor_tier2` (LLM calls with token counts and a USD cost estimate), `security_blocked_spl`, `user_prompt_jailbreak_flag`, `session_tool_cap_hit`, `daily_spend_cap_hit`, `audit_forwarder_failure`, plus three legal-acknowledgement categories. The Audit Log tab in Settings provides an in-app browser; an optional HEC forwarder can stream events to a separate Splunk / SIEM destination for tamper evidence.
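For example, a search along these lines would summarize vendor spend from the audit index (the index name and category values come from this page; the cost field name is an assumption about the event schema):

```
index=_ai_assistant_audit category IN (vendor_tier1, vendor_tier2)
| stats count as llm_calls, sum(estimated_cost_usd) as total_usd by category
```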
Splunk MCP Server prerequisite:
The AI Assistant requires Splunk MCP Server (Splunkbase App 7931) v1.1.0 or later, installed on the same search head as the LogServ App. Cookie auth from the same Splunk Web session works by default on HTTP-only Splunk; an optional bearer token can be configured for OAuth-strict environments. See Splunk MCP Setup for end-to-end configuration, troubleshooting, and the auto-mint roadmap.
Environment Topology Architecture¶
The Environment Topology view (also reachable via the URL slug `integration-topology` for backward compatibility) is a graph-based visualization of SAP systems, integration partners, and endpoints across the SAP landscape. It is implemented as a React component on top of `@xyflow/react`.
Data sources:
The view assembles its node + edge inventory from a union of six SPL searches against the existing sourcetypes (no new ingest required):
- `sap:abap:gateway` — RFC peer/local IPs (`P=<peer>` / `L=<local>` fields)
- `sap:abap:icm` — ICM peer/local IPs
- `sap:hana:tracelogs` — HANA host + tenant SID extracted from the source path (`/usr/sap/<HANA_SID>/HDB<inst>/<host>/trace/DB_<TENANT_SID>/`)
- `sap:saprouter` — peer hostname extracted from the parens after `host <ip>/<service> (<resolved.host>)`
- `linux_messages_syslog` with osquery cpu_brand events — host inventory (CPU, RAM, OS, region, AZ, instance ID)
- Default Splunk `host` field — fallback for hosts not surfaced in the above
Self-derived IP→SID inventory:
The “which IP belongs to which SID” mapping is derived from a multi-source union SPL search with a `mvcount(sids)=1` filter: a host whose sourcetypes all agree on a single SAP SID is unambiguously attributed; otherwise it is surfaced as “unknown”. Resolution depends on what your data exposes: unique hostname/IP appearances across multiple SAP sourcetypes (HANA tracelogs, ABAP gateway `L=`, ICM peer fields, saprouter peer hostnames) attribute cleanly, while shared NAT IPs and external partners typically remain unknown. Additional inventory sources can be added by appending another union arm — the inventory framework is extensible per customer without new ingest.
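The attribution step itself reduces to an SPL shape like the following, where each union arm has already emitted `(ip, sid)` pairs for its sourcetype (a simplified sketch; the real search unions six arms with sourcetype-specific extractions):

```
| stats values(sid) as sids by ip
| where mvcount(sids) = 1
| eval sid = mvindex(sids, 0)
```

Hosts whose sources disagree (or never report a SID) fail the `mvcount` filter and fall through to “unknown”.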
Saved layouts:
User-arranged graph layouts are persisted in the Splunk KV Store collection `logserv_topology_layouts`. The schema (currently v4) carries node positions, panel state, viewport zoom and pan, enabled integration types, the selected node, the active right-sidebar tab, and snap mode. Layouts are saved per user by name (an admin can save a default layout that other users see; users can save their own variants). Schema migration is in-memory: v1 / v2 / v3 records still load.
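For orientation, a saved v4 record might look roughly like this. All field names below are illustrative guesses at the shape, not the actual schema:

```json
{
  "schemaVersion": 4,
  "name": "default",
  "owner": "admin",
  "positions": { "node:PRD": { "x": 120, "y": 80 } },
  "viewport": { "zoom": 0.75, "x": 0, "y": 0 },
  "enabledIntegrationTypes": ["rfc", "http"],
  "selectedNode": "node:PRD",
  "activeSidebarTab": "details",
  "snapMode": true,
  "panelState": { "legend": "collapsed" }
}
```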
Live mode auto-refresh:
The toolbar’s Live mode toggle drives a 30-second auto-refresh that re-runs all SPL queries on the topology view; saved layouts are preserved across ticks. Live mode coexists with the per-dashboard auto-refresh picker — currently both contribute additively to the refresh nonce; consolidation is planned for a future release.