
Free-form Prompts

v0.0.5 release: this entire feature is disabled pending review

The free-form / LLM-driven path is disabled at compile time in the v0.0.5 release pending internal review. Chat input is greyed out, Send button is disabled, model picker and Power Mode toggle are hidden, and the Provider Credentials Settings tab is hidden. None of the behavior described on this page is reachable in a v0.0.5 build. This page is preserved as the design reference for the future release that re-enables the LLM path. See Templates-only Build for the build mechanism and OWASP LLM Top 10 Compliance for the security posture under review.

Free-form prompts are the LLM-driven path through the AI Assistant. The user types a natural-language question; the orchestrator sends a system primer + the question + tool definitions to one of four supported LLM providers; the vendor picks tools, the orchestrator dispatches them via the Splunk MCP Server, and the vendor synthesizes a narrative response from the privacy-tier-bounded summaries. The narrative ends up in the chat panel on the left; the actual data lands in tool-result tiles on the right.

This path requires a configured LLM provider credential, is governed by the active privacy tier, and is disabled at compile time in the Templates-only build variant.

The Four Supported Providers

| Provider | API Endpoint | Auth | Models |
|---|---|---|---|
| Anthropic | api.anthropic.com (direct) | API key | Claude Opus 4.7 / Sonnet 4.6 / Haiku 4.5 |
| OpenAI | api.openai.com (direct) | API key | GPT-4o / GPT-5 family (per OpenAI catalog) |
| Azure OpenAI | Customer’s Azure deployment URL | Azure auth + URL pattern | Per Azure deployment configuration |
| AWS Bedrock | Bedrock API (Claude on Bedrock) | Bedrock API Keys (no SigV4 signing) | Claude on Bedrock — same models as Anthropic direct |

The active provider and default model are admin-configured in Settings → General. A per-user model picker in the chat panel’s privacy banner lets users switch within the active provider’s models[] list.
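As a rough illustration, the admin-configured settings above might map to a shape like the following. This is a sketch only; the interface and function names are assumptions, not the app's actual schema.

```typescript
// Hypothetical shape of the admin-configured provider settings (illustrative only).
interface ProviderConfig {
  provider: "anthropic" | "openai" | "azure_openai" | "bedrock";
  defaultModel: string;   // admin-chosen default model
  models: string[];       // the list the per-user picker switches within
  endpoint?: string;      // e.g., the customer's Azure deployment URL
  apiKey: string;         // stored server-side, never exposed to the browser
}

// The per-user picker only narrows the choice inside the active provider's list.
function resolveModel(config: ProviderConfig, userChoice?: string): string {
  return userChoice && config.models.includes(userChoice)
    ? userChoice
    : config.defaultModel;
}
```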

For AWS Bedrock customers using IAM-only credentials (no Bedrock API Keys available), a future release will add a server-side proxy that signs requests with SigV4. See Auto-Mint MCP Token Roadmap. This is currently on the roadmap and has not yet shipped.

The Free-form Flow

   User types question  -->  Orchestrator builds system primer + tools + history
                                                          |
                                                          v
                                       LLM vendor (streamed response)
                                                          |
                                  AI emits zero or more tool_use blocks
                                                          |
                                                          v
                          For each tool_use, orchestrator dispatches via MCP
                                                          |
                                              MCP returns Hidden<MCPToolResult>
                                                          |
                                  sanitize() chokepoint -> Tier 1 / Tier 2 summary
                                                          |
                                                          v
                                  AI continues, may emit more tool_use blocks
                                                          |
                          (loop until AI emits a final assistant_text response)
                                                          |
                                                          v
                                  Narrative renders in chat; tiles render in right pane

Tool dispatch is parallelized — when the AI emits multiple tool_use blocks in a single turn (e.g., dispatching 5 saved searches at once for a “find the top issues” question), the orchestrator dispatches all of them concurrently via Promise.all. The slowest one bounds the turn latency, not the sum.
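A minimal sketch of that fan-out, assuming a `dispatchTool` helper and simplified tool_use / result shapes (both placeholders, not the orchestrator's real API):

```typescript
// Simplified shapes for illustration only.
interface ToolUse { id: string; name: string; input: Record<string, unknown> }
interface ToolResult { toolUseId: string; summary: string }

async function dispatchToolUses(
  toolUses: ToolUse[],
  dispatchTool: (t: ToolUse) => Promise<ToolResult>,
): Promise<ToolResult[]> {
  // All tool_use blocks from one turn are dispatched concurrently;
  // turn latency is bounded by the slowest dispatch, not the sum.
  return Promise.all(toolUses.map((t) => dispatchTool(t)));
}
```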

The Two MCP Tools the AI Sees

The AI’s tool definitions on every free-form request are:

splunk_run_saved_search

Dispatch a saved search from the LogServ App’s catalog (one of the 48 prompts described in Predefined Prompts).

| Arg | Type | Description |
|---|---|---|
| name | string (required) | Saved-search name from the catalog (e.g., logserv_hana_failed_auth) |
| earliest_time | string (optional) | Splunk earliest token (e.g., -24h, -7d). Falls back to the dashboard’s TimeRange picker. |
| latest_time | string (optional) | Splunk latest token (e.g., now). |
| render_hint | string (optional) | One of table / timechart / kpi / pie. Falls back to the catalog’s per-prompt renderHint. |
| top_n | integer (optional, default 10, max 50) | Width of categorical aggregates the AI receives in the Tier 2 summary. The AI passes this when the user asks for “top 25 X” or similar. |

splunk_run_query

Dispatch ad-hoc SPL written by the AI. Used when no saved search fits the user’s question.

| Arg | Type | Description |
|---|---|---|
| query | string (required) | SPL string. Must start with the LogServ macro `sap_logserv_idx_macro` (the AI’s primer enforces this) and use only read-only commands. |
| earliest_time | string (optional) | Same as above. |
| latest_time | string (optional) | Same as above. |
| render_hint | string (optional) | Same as above. |
| top_n | integer (optional) | Same as above. |

SPL static-analysis guard (LLM06 — Excessive Agency): the orchestrator runs every splunk_run_query SPL through a guard that blocks collect, outputlookup, outputcsv, delete, sendalert, sendemail, script, run, tscollect. Blocked SPL produces a synthetic error tool_result + a security_blocked_spl audit event. The AI sees the error and can recover by writing a different query.
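A minimal sketch of such a guard, assuming a simple scan of pipeline command positions; the shipped guard's parsing may be more sophisticated.

```typescript
// Command names blocked by the guard, per the list above.
const BLOCKED_SPL_COMMANDS = [
  "collect", "outputlookup", "outputcsv", "delete", "sendalert",
  "sendemail", "script", "run", "tscollect",
];

function findBlockedSplCommand(spl: string): string | undefined {
  // Look only at pipeline command positions (the first token after each `|`),
  // so a field literally named "delete_count" does not trip the guard.
  const commands = spl
    .split("|")
    .slice(1)
    .map((seg) => seg.trim().split(/\s+/)[0]?.toLowerCase());
  return BLOCKED_SPL_COMMANDS.find((blocked) => commands.includes(blocked));
}
// A hit produces a synthetic error tool_result plus a security_blocked_spl
// audit event, as described above; the AI can recover with a different query.
```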

The System Primer

A system-message prelude is sent on every free-form request. The primer teaches the AI:

  • A data-boundary rule that distinguishes customer data from instructions (mitigates LLM01 / LLM04)
  • The catalog of 48 saved searches (so the AI prefers splunk_run_saved_search when a saved search fits)
  • The LogServ data model (sourcetypes + key fields) for ad-hoc SPL
  • The read-only-operators list for splunk_run_query
  • The time-window reasoning rules that kick in for severity claims
  • Synthesis rules — lettered findings, severity dots, citation format

Two primer variants ship — one for each cloud tier — and the active tier picks which one is sent. For the primer’s full architecture (boundary block, primer constants, per-variant content), see AI Assistant Implementation Reference.
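Conceptually, variant selection is a lookup on the active tier, roughly as sketched below; the constant and function names are placeholders for the per-variant content described in the implementation reference.

```typescript
type PrivacyTier = "tier1" | "tier2";

// Placeholder primer bodies; the real per-variant text is documented elsewhere.
const TIER1_PRIMER_TEXT = "<Tier 1 primer body>";
const TIER2_PRIMER_TEXT = "<Tier 2 (PII-redacted) primer body>";

const PRIMER_BY_TIER: Record<PrivacyTier, string> = {
  tier1: TIER1_PRIMER_TEXT,
  tier2: TIER2_PRIMER_TEXT,
};

function buildSystemPrimer(activeTier: PrivacyTier): string {
  // The boundary block, catalog, and synthesis rules live inside the primer text;
  // only the variant changes per tier.
  return PRIMER_BY_TIER[activeTier];
}
```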

Citation + Drill-Down Chips in the Narrative

The AI’s narrative response uses a citation format [→ saved_search_name] to attribute each finding to its dispatched tool. Each citation in the chat becomes a clickable scroll-to-tile span (clicking it scrolls the right pane to the matching tile). The parser also auto-appends ↗ Dashboard chips (one per related OOTB dashboard) and a ↗ Run SPL chip on the same line — see Drill-down Chips.

Severity markers ([severity:critical|high|medium|low]) are rendered as glossy colored dots inline next to the finding’s alpha letter. The finding format is enforced by the primer:

A. [severity:high] Cross-stack auth failures concentrated on Windows.
[→ logserv_cross_stack_auth_failures] 7 of the top-10 failing-stack rows are
Windows; one user account hit 4,732 cumulative attempts — verify-query (-24h)
confirmed 412 of those landed today, ~17/hr, an active rate.
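For illustration, the citation and severity markers could be pulled out of the narrative with patterns like these; the regexes and function names are assumptions, not the shipped parser.

```typescript
// Matches citations such as [→ logserv_cross_stack_auth_failures].
const CITATION_RE = /\[→ ([a-z0-9_]+)\]/g;
// Matches severity markers such as [severity:high].
const SEVERITY_RE = /\[severity:(critical|high|medium|low)\]/g;

function extractCitations(narrative: string): string[] {
  return [...narrative.matchAll(CITATION_RE)].map((m) => m[1]);
}

function extractSeverities(narrative: string): string[] {
  return [...narrative.matchAll(SEVERITY_RE)].map((m) => m[1]);
}
// Each extracted citation becomes a clickable scroll-to-tile span in the chat panel.
```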

Per-User Rate Limiting

Free-form prompts are subject to a per-user rolling-1-hour rate limit, configurable in Settings → General (default 30 prompts/hour, 0 = disabled). Canned-prompt dispatches are intentionally NOT rate-limited — they bypass the AI vendor entirely (no token cost) and are bounded by Splunk’s own search-quota controls.

When a user hits the cap, the next prompt:

  1. Renders the user message in chat as usual (preserves history continuity).
  2. Surfaces a system_notice in chat: “Rate limit reached: N prompts in the last hour. Try again at HH:MM.”
  3. Records a rate_limited_prompt audit event with the user’s identity, the cap value, and the timestamp.
  4. Does NOT invoke the LLM vendor.

The cap is per-user, not per-session — opening a new browser tab doesn’t reset it. Maps to LLM10 — Unbounded Consumption.
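A rolling-window check of this kind can be sketched as follows, assuming an in-memory per-user timestamp list; the shipped limiter's storage and naming are not specified on this page.

```typescript
const WINDOW_MS = 60 * 60 * 1000; // rolling 1-hour window
const promptTimestamps = new Map<string, number[]>();

function isRateLimited(userId: string, capPerHour: number): boolean {
  if (capPerHour === 0) return false; // 0 = disabled
  const now = Date.now();
  // Drop timestamps that have aged out of the window.
  const recent = (promptTimestamps.get(userId) ?? []).filter(
    (t) => now - t < WINDOW_MS,
  );
  promptTimestamps.set(userId, recent);
  if (recent.length >= capPerHour) return true; // refuse: no vendor call is made
  recent.push(now);
  return false;
}
```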

Token-Usage Audit + USD Cost Estimate

Every Tier 1 / Tier 2 vendor call records a vendor_tier1 or vendor_tier2 audit event with:

  • Provider (anthropic / openai / azure_openai / bedrock)
  • Model
  • Input tokens, output tokens, total tokens
  • USD cost estimate (per-vendor pricing table — see AI Assistant Implementation Reference)
  • Outbound bytes
  • Prompt length (chars)
  • Number of tool turns in this dispatch
  • For Tier 2: number of PII redactions applied

The Audit Log tab in Settings provides aggregate views; the USD cost estimate is a sticker price (not the customer’s negotiated rate, which we don’t know) and should be treated as an order-of-magnitude indicator. See Audit Log.
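The estimate itself is simple arithmetic over a per-model pricing table, roughly as below; the rate fields are placeholders, not the app's pricing data.

```typescript
// Illustrative per-million-token rates; real values come from the pricing table
// in the AI Assistant Implementation Reference.
interface ModelRate { inputPerMTok: number; outputPerMTok: number }

function estimateUsd(
  rate: ModelRate,
  inputTokens: number,
  outputTokens: number,
): number {
  return (
    (inputTokens / 1_000_000) * rate.inputPerMTok +
    (outputTokens / 1_000_000) * rate.outputPerMTok
  );
}
```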

Daily Spend Cap

To prevent runaway vendor spend, the orchestrator enforces a per-app-instance daily USD spend cap (configurable in Settings, default $X / day; the admin sets it according to the organization's risk tolerance). When a vendor call would push cumulative cost over the cap, the orchestrator:

  1. Refuses the dispatch.
  2. Records a daily_spend_cap_hit audit event.
  3. Surfaces a system_notice: “Daily spend cap reached ($X.XX of $Y.YY budgeted). Resets at 00:00 UTC. Contact your admin to raise the cap.”

The cap is cumulative-cost-based, not request-count-based — a single high-token-count free-form turn counts against the budget the same way many cheap turns do. Maps to LLM10 — Unbounded Consumption.
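A minimal sketch of that check, assuming a per-instance accumulator that resets at 00:00 UTC; names and the notice wording are illustrative.

```typescript
let spentTodayUsd = 0; // reset daily at 00:00 UTC (reset logic omitted)

function checkDailySpendCap(
  estimatedCallUsd: number,
  capUsd: number,
): { allowed: boolean; notice?: string } {
  if (spentTodayUsd + estimatedCallUsd > capUsd) {
    // Refuse the dispatch; a daily_spend_cap_hit audit event and the
    // system_notice described above accompany this refusal.
    return {
      allowed: false,
      notice:
        `Daily spend cap reached ($${spentTodayUsd.toFixed(2)} of ` +
        `$${capUsd.toFixed(2)} budgeted). Resets at 00:00 UTC.`,
    };
  }
  spentTodayUsd += estimatedCallUsd;
  return { allowed: true };
}
```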

Streaming + Abort

The vendor response is streamed token-by-token. The chat-side rendering shows partial responses as they arrive, including partial tool_use blocks. The status indicator switches between three states:

  • streaming — AI is generating text or about to emit a tool_use
  • tool_executing — orchestrator is dispatching a tool the AI requested
  • idle — turn complete

A red Stop button appears in the chat input toolbar during streaming / tool_executing states. Clicking it aborts the in-flight vendor call (closes the SSE stream) and any pending tool dispatches. The conversation history preserves what was emitted up to the abort point so the user can iterate from there.
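One way to wire this up is a single AbortController shared by the vendor stream and any pending tool dispatches, as sketched below; the real orchestrator's plumbing may differ.

```typescript
const turnController = new AbortController();

async function streamVendorTurn(url: string, body: unknown): Promise<Response> {
  return fetch(url, {
    method: "POST",
    body: JSON.stringify(body),
    signal: turnController.signal, // aborting closes the streamed response
  });
}

function onStopClicked(): void {
  // Cancels the in-flight vendor call and any tool dispatches that were
  // passed the same signal; history up to the abort point is preserved.
  turnController.abort();
}
```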

Jailbreak Pattern Detection

Every user prompt is run through a jailbreak-pattern analyzer before dispatch. The analyzer is flag-and-proceed: a match fires a user_prompt_jailbreak_flag audit event but does NOT block the prompt. The defense-in-depth chain (type-system enforcement + Tier 2 sanitizer + tool-result sentinel + primer + vendor-side defenses) already covers the threat; the analyzer adds SOC observability for the user-prompt vector. Each flag captures a hash of the prompt (so SOC can correlate without archiving plain text), which patterns matched, and a character-class fingerprint to surface unusual encodings.

For the analyzer’s pattern groups + audit event schema, see AI Assistant Implementation Reference. Maps to LLM01 — Prompt Injection.
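A flag-and-proceed analyzer of this shape can be sketched as follows; the patterns, hashing choice, and fingerprint below are illustrative stand-ins for the real pattern groups and audit schema.

```typescript
import { createHash } from "node:crypto";

// Example pattern groups only; the real groups are listed in the implementation reference.
const JAILBREAK_PATTERNS: Record<string, RegExp> = {
  instruction_override: /ignore (all|previous) instructions/i,
  role_escape: /you are now (?:DAN|an? unrestricted)/i,
};

function analyzePrompt(prompt: string): {
  matchedGroups: string[];
  promptSha256: string;
  nonAsciiRatio: number;
} {
  const matchedGroups = Object.entries(JAILBREAK_PATTERNS)
    .filter(([, re]) => re.test(prompt))
    .map(([group]) => group);
  return {
    matchedGroups,
    // Hash lets the SOC correlate flags without archiving the plain-text prompt.
    promptSha256: createHash("sha256").update(prompt).digest("hex"),
    // Crude character-class fingerprint to surface unusual encodings.
    nonAsciiRatio: prompt.length
      ? [...prompt].filter((c) => c.charCodeAt(0) > 127).length / prompt.length
      : 0,
  };
}
// A non-empty matchedGroups fires user_prompt_jailbreak_flag; the prompt is dispatched either way.
```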

Session Tool-Call Cap

In addition to the per-user rate limit, every chat session has a per-session tool-dispatch cap to prevent infinite tool loops (a misbehaving model emitting tool_use after tool_use without ever producing a final assistant_text). The counter resets on chat clear. When the cap is exceeded, the dispatch is refused with a synthetic error tool_result, and a session_tool_cap_hit audit event records the hit. Maps to LLM06 — Excessive Agency.
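A minimal sketch of the per-session counter follows; the cap value is an assumed example, since the actual limit is not stated on this page.

```typescript
const SESSION_TOOL_CAP = 40; // assumed example value, not the shipped limit

let sessionToolCalls = 0;

function allowToolDispatch(): boolean {
  if (sessionToolCalls >= SESSION_TOOL_CAP) {
    // Refused: the orchestrator returns a synthetic error tool_result and
    // records a session_tool_cap_hit audit event.
    return false;
  }
  sessionToolCalls += 1;
  return true;
}

function onChatClear(): void {
  sessionToolCalls = 0; // the counter resets on chat clear
}
```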