Guide for data sources and getting data in
Important Data Sources
The OT environment combines traditional and legacy IT technologies (e.g. firewalls, servers, workstations) with OT-specific technologies (e.g. PLCs, RTUs). The following table outlines common data sources that should be integrated to provide full functionality with the existing OT Security Add-on, along with their related data models.
The following data sources often produce value and are recommended for collection:
Data Source | Criticality to App | Data Models |
---|---|---|
Windows Security Events | Critical | Authentication, Change |
LDAP (e.g. Active Directory) | Critical | Authentication, Change |
Firewall Traffic & System Logs | Critical | Network Traffic, Network Session, Change, Authentication |
OT Security Solutions | High | Authentication, Intrusion Detection, OT Asset, Vulnerability |
Endpoint Protection | High | OT Asset, Malware |
Network Traffic & System Logs | Medium | Network Traffic, Network Session, Change, Authentication |
Patching Logs | Medium | Updates, OT Asset |
Host Information (Application, Services, OS) | Low | Inventory, Change, OT Asset, OT Software |
Other Important Information
A number of data sources may provide contextual value around the OT environment. These data sources can help populate the macros and lookups that the OT Add-on leverages and ensure that the dashboards report accurately. While not all of these data sources are required, having them can enhance reporting.
Data Sources | Value |
---|---|
OT VLANs and Subnets | In many organizations, the use of particular subnets is common practice for OT environments. This information can be used for purposes such as identifying sites, distinguishing prohibited from allowed network traffic, and locating devices based on subnet. |
Identification of Perimeter Assets and Characteristics | Within the OT Add-on, several macros attempt to identify devices classified as perimeter or boundary devices. Identifying these assets based on asset type, IP, or other characteristics is essential for the perimeter monitoring dashboards. |
List of prohibited traffic types and flow direction | The OT Prohibited Traffic dashboards rely on a simplified lookup that identifies all traffic that should be explicitly prohibited (for example, http or https), including direction (inbound vs outbound). This lookup is crucial for identifying suspicious traffic. |
List of internal IP ranges (IT and OT) | This list is used to identify the environment of traffic sources and destinations, as well as traffic that might be outside an organization. A list of known IP ranges can help in configuring macros that identify OT, IT, or external traffic. |
Normal operating hours per site_id | Several reports attempt to identify activity outside normal working hours for sites. Having the normal operating hours, including time offsets from GMT, helps those reports accurately report activity after normal working hours. |
List of permitted External Media devices | Information on the external media devices permitted within the OT environment, including parameters such as host restrictions, helps in identifying allowed and prohibited activity. |
National Vulnerability Database - CVEs | CVE definitions from the National Vulnerability Database (NVD) are used to correlate against detected vulnerabilities and to identify potential vulnerabilities. Various plugins exist to pull this data into Splunk. |
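
The contextual lists above are easier to reason about with a concrete shape in mind. The following is a minimal Python sketch, not the add-on's own macros or lookups, of how internal IP ranges, a prohibited traffic list, and per-site operating hours might be evaluated. Every range, traffic entry, site name, and function name here is a hypothetical value chosen for illustration.

```python
from datetime import datetime, time, timedelta, timezone
from ipaddress import ip_address, ip_network

# Hypothetical ranges for illustration only; substitute your own.
OT_RANGES = [ip_network("10.20.0.0/16")]
IT_RANGES = [ip_network("10.0.0.0/8"), ip_network("192.168.0.0/16")]

# Hypothetical prohibited (application, direction) pairs.
PROHIBITED = {("http", "outbound"), ("https", "outbound")}

# Hypothetical operating hours per site_id: (open, close, offset from GMT in hours).
SITE_HOURS = {"Johnson Refinery": (time(7, 0), time(18, 0), -6)}


def classify_ip(addr: str) -> str:
    """Return 'OT', 'IT', or 'External'. OT is checked first because the
    example OT range is nested inside the broader IT range."""
    ip = ip_address(addr)
    if any(ip in net for net in OT_RANGES):
        return "OT"
    if any(ip in net for net in IT_RANGES):
        return "IT"
    return "External"


def is_prohibited(app: str, src: str, dest: str) -> bool:
    """Flag traffic whose application and direction are on the prohibited list.
    Direction is taken relative to the OT environment."""
    src_zone, dest_zone = classify_ip(src), classify_ip(dest)
    if src_zone == "OT" and dest_zone != "OT":
        direction = "outbound"
    elif src_zone != "OT" and dest_zone == "OT":
        direction = "inbound"
    else:
        direction = "internal"
    return (app.lower(), direction) in PROHIBITED


def is_after_hours(site_id: str, event_time: datetime) -> bool:
    """Check a timezone-aware timestamp against the site's local operating hours."""
    open_t, close_t, offset = SITE_HOURS[site_id]
    local = event_time.astimezone(timezone(timedelta(hours=offset)))
    return not (open_t <= local.time() < close_t)
```

In this sketch the GMT offset is stored with the operating hours so that an event timestamp can be converted to site-local time before the comparison, which is the reason the table above asks for offsets alongside the hours.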
Getting Data In (OT Specific Considerations)
In OT environments, the use of agents and similar mechanisms may not be approved by the system vendor. The following table outlines various mechanisms to pull in data from OT environments based on customer implementations. For more details
Method | Data Source | Notes | Access Type |
---|---|---|---|
Universal Forwarder | Host-based logs including Windows events, applications installed, service configuration, ICS logs, performance monitoring | Offers more versatility and control over the data types collected from hosts | Agent based |
Existing agents | Depends on the agent, but could include malware, security events, and asset information | Examples include (but are not limited to) Snare, Endpoint Protection, WhatsUp Gold, SCCM, and SCOM | Agent based |
SFTP/FTP | Typically text-based logs | SFTP (preferred) and FTP can be used to export logs periodically to systems that use a Universal Forwarder to forward the logs | Agent based |
Windows Event Forwarding | Windows Events | Can be used to collect Security, System, Application, or other specific Windows events | Configuration |
Syslog | Network and firewall logs, netflow, security alerts from other products | Best practice to leverage a syslog server rather than sending directly to Splunk | Configuration |
Zeek | Networking information | Zeek can collect information on network activity such as network traffic and in some cases may support industrial protocols | Configuration |
OT Security Solutions | Asset information, alerts, vulnerabilities | Most OT Security Solutions can send asset, alert, and vulnerability information to Splunk, although this may change as capabilities mature. In most cases they provide this information via syslog and REST APIs | Remote |
REST APIs | Depends on the application, but could include malware, security events, vulnerability information, and asset information | Common mechanism leveraged by OT Security Products to collect alerts, asset info, and vulnerabilities | Remote |
DBConnect | ICS Logs, Alarm Information, Configuration, Patching Info, Host Based Information | Leveraged by data historians, patching solutions, ICS systems, and other systems | Remote |
WMI Collector | OS components, process & service information, applications, user accounts, security settings | Consideration should be given to scaling of WMI for large environments | Remote |
HTTP Event Collector (HEC) | System health and state information | Newer products are providing methods to send data via HEC, but availability depends on the application | Remote |
For more general information about indexing data in Splunk Enterprise, please refer to the following documentation: Getting Data In.
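
As an illustration of the HEC method listed in the table, the following Python sketch builds (but does not send) a request for the HEC event endpoint. The host name, token, sourcetype, and the `build_hec_request` helper are hypothetical placeholders; the endpoint path and the `Authorization: Splunk <token>` header follow the standard HEC conventions.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype, index=None):
    """Build a POST request for the HEC event endpoint without sending it."""
    payload = {"event": event, "sourcetype": sourcetype}
    if index:
        # Index is optional; when omitted, the token's default index applies.
        payload["index"] = index
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical host, token, and sourcetype, shown for illustration only.
request = build_hec_request(
    "https://splunk.example.com:8088",
    "00000000-0000-0000-0000-000000000000",
    {"device": "plc-01", "state": "running"},
    sourcetype="ot:health",
)
```

Passing the resulting object to `urllib.request.urlopen(request)` would submit the event. In an OT network the sender would typically be a system already approved to communicate with Splunk rather than the controller itself.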
Integrating with specific OT Data Models
Splunk for OT Security includes several data models that can be leveraged to automatically generate asset lookups. In addition, OT partners of Splunk should populate any hardware and software data captured or created by their add-ons to these data models.
Two data models have been created to facilitate populating assets into Splunk for Enterprise Security. The most critical model for asset information in the Splunk OT Security Solution is the OT Asset data model, contained in the Splunk for OT Security app. This data model is designed to be used with hardware assets such as servers, PLCs, and workstations, and contains all fields in the OT Asset Framework. An additional data model, OT Software Asset, is used to populate additional information regarding firmware, operating system, and software present on each OT asset. Together, data from both models can be combined to provide additional context around an asset as well as the components installed on it.
The OT Add-on for Splunk has specific requirements for the values and formats of some ES Asset Framework fields. These fields are used to tag and identify assets as belonging to OT systems or specific classifications. The following table outlines these requirements:
Field | Restriction/Format | Sample |
---|---|---|
asset_system | Asset systems are often collections of sites and may refer to a grouping of assets. While not required, this field is suggested for filtering purposes. | Western Operations |
asset_type | The asset type classifies the purpose or function of the asset | Historian |
category | The static text "ot" (without quotes) is used broadly to denote which assets are part of the OT Environment. | ot \| windows \| nerc |
classification | Classifications related to specific frameworks should follow the format - <framework>:<value> | cip:high \| cip:BCA |
site_id | Ideally this should be populated with the name of the facility or site where the asset resides. It is used on multiple dashboards as a filter. | Johnson Refinery |
zone | Purdue zone mappings should follow the format - purdue:level<level #> | purdue:level3 |
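
The format rules in the table can be checked before asset records are loaded. The following Python sketch is one possible validator, not part of the add-on. It assumes that multi-value fields are pipe-delimited as in the sample column and that a Purdue level is a number that may carry a decimal part; `validate_asset` and both patterns are hypothetical names introduced for this example.

```python
import re

# Assumed pattern for <framework>:<value>, with no spaces or extra colons.
CLASSIFICATION_RE = re.compile(r"^[^:\s]+:[^:\s]+$")
# Assumed pattern for purdue:level<level #>, allowing levels such as 3 or 3.5.
ZONE_RE = re.compile(r"^purdue:level\d+(\.\d+)?$")


def split_values(raw: str) -> list:
    """Split a pipe-delimited field into trimmed, non-empty values."""
    return [part.strip() for part in raw.split("|") if part.strip()]


def validate_asset(asset: dict) -> list:
    """Return a list of format problems found in one asset record."""
    problems = []
    if "ot" not in split_values(asset.get("category", "")):
        problems.append("category should include the static text 'ot'")
    for value in split_values(asset.get("classification", "")):
        if not CLASSIFICATION_RE.match(value):
            problems.append(f"classification '{value}' is not <framework>:<value>")
    zone = asset.get("zone", "").strip()
    if zone and not ZONE_RE.match(zone):
        problems.append(f"zone '{zone}' is not purdue:level<level #>")
    return problems


sample = {
    "category": "ot|windows|nerc",
    "classification": "cip:high|cip:BCA",
    "zone": "purdue:level3",
}
problems = validate_asset(sample)  # empty list for a well-formed record
```

Returning a list of problems rather than raising on the first one lets a single pass report every formatting issue in a record.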