Identifying Polling and Walk Issues¶
Check when SNMP WALK was executed last time for the device¶
- Configure Splunk OpenTelemetry Collector for Kubernetes or Configure Docker Logs for Splunk.
- Go to your Splunk and execute search:
index="em_logs" "Sending due task" "sc4snmp;<IP_ADDRESS>;walk"
and replacewith the pertinent IP Address.
“Empty SNMP response message” problem¶
If you see the following line in the worker’s logs:
[2022-01-04 11:44:22,553: INFO/ForkPoolWorker-1] Task splunk_connect_for_snmp.snmp.tasks.walk[8e62fc62-569c-473f-a765-ff92577774e5] retry: Retry in 3489s: SnmpActionError('An error of SNMP isWalk=True for a host 192.168.10.20 occurred: Empty SNMP response message')
worker.ignoreEmptyVarbinds
parameter to values.yaml
and set it to true.
An example configuration for a worker in values.yaml
is:
worker:
ignoreEmptyVarbinds: true
“OID not increasing” problem¶
In case you see the following line in worker’s logs:
[2022-01-04 11:44:22,553: INFO/ForkPoolWorker-1] Task splunk_connect_for_snmp.snmp.tasks.walk[8e62fc62-569c-473f-a765-ff92577774e5] retry: Retry in 3489s: SnmpActionError('An error of SNMP isWalk=True for a host 192.168.10.20 occurred: OID not increasing')
worker.ignoreNotIncreasingOid
array to values.yaml
and fill with the addresses of hosts where the problem appears.
An example configuration for a worker in values.yaml
is:
worker:
ignoreNotIncreasingOid:
- "127.0.0.1:164"
- "127.0.0.6"
If you put in only the IP address (for example, 127.0.0.1
), then errors will be ignored for all of its devices (like 127.0.0.1:161
,
127.0.0.1:163
…). If you put the IP address and host as {host}:{port}
, that means the error will be ignored only for this device.
Walking a device takes too much time¶
See Configure small walk profile to enable the small walk functionality.
An error of SNMP isWalk=True blocks traffic on the SC4SNMP instance¶
If you see many An error of SNMP isWalk=True
errors in your logs, that means that there is a connection problem
with the hosts you are polling from.
Walk will retry multiple times, which will eventually cause a worker to be blocked while it retries. In that case, you might want to limit
the maximum retry time. You can do this by setting the variable worker.walkRetryMaxInterval
, for example:
worker:
walkRetryMaxInterval: 60
With the previous configuration, ‘walk’ will retry exponentially from 30 seconds until it reaches 60 seconds. The default value for worker.walkRetryMaxInterval
is 180.
SNMP Rollover¶
The Rollover problem is due to a finite stored integer value (especially when the value is 32-bit).
When it reaches its maximum, it gets rolled down to 0 again. This causes a strange drop in Analytics data.
The most common case of this issue is interface speed on high speed ports. As a solution to this problem, SNMPv2 SMI defined a new object type, counter64, for 64-bit counters, see https://www.cisco.com/c/en/us/support/docs/ip/simple-network-management-protocol-snmp/26007-faq-snmpcounter.html.
Not all the devices support it, but if they do, poll the counter64 type OID instead of the counter32 one.
For example, use ifHCInOctets
instead of ifInOctets
.
If 64-bit counter is not supported on your device, you can write your own Splunk queries that calculate the shift based on the maximum integer value and the current state. The same works for values large enough that they don’t fit into a 64-bit value. An example for an appropriate Splunk query would be the following:
| streamstats current=f last(ifInOctets) as p_ifInOctets last(ifOutOctets) as p_ifOutOctets by ifAlias
| eval in_delta=(ifInOctets - p_ifInOctets)
| eval out_delta=(ifOutOctets - p_ifOutOctets)
| eval max=pow(2,64)
| eval out = if(out_delta<0,((max+out_delta)*8/(5*60*1000*1000*1000)),(out_delta)*8/(5*60*1000*1000*1000))
| timechart span=5m avg(in) AS in, avg(out) AS out by ifAlias
Polling authentication errors¶
Unknown USM user¶
In case of polling SNMPv3 devices, Unknown USM user
error suggests wrong username. Verify
that the kubernetes secret with the correct username has been created (SNMPv3 configuration).
Wrong SNMP PDU digest¶
In case of polling SNMPv3 devices, Wrong SNMP PDU digest
error suggests wrong authentication key. Verify
that the kubernetes secret with the correct authentication key has been created (SNMPv3 configuration).
No SNMP response received before timeout¶
No SNMP response received before timeout
error might have several root causes. Some of them are:
- wrong device IP or port
- SNMPv2c wrong community string
- SNMPv3 wrong privacy key
“Field is immutable” error during helm upgrade¶
microk8s helm3 upgrade --install snmp -f values.yaml splunk-connect-for-snmp/charts/splunk-connect-for-snmp/ --namespace=sc4snmp --create-namespace
Error: UPGRADE FAILED: cannot patch "snmp-splunk-connect-for-snmp-inventory" with kind Job: Job.batch "snmp-splunk-connect-for-snmp-inventory" is invalid: (...) : field is immutable
The immutable error is due to the limitation placed on an inventory job. As the SC4SNMP requires several checks before applying updates, it is designed to allow changes in the inventory task after 5 minutes.
The status of the inventory can be checked with the following command:
microk8s kubectl -n sc4snmp get pods | grep inventory
If the changes are required to be applied immediately, the previous inventory job can be deleted with the following command:
microk8s kubectl delete job/snmp-splunk-connect-for-snmp-inventory -n sc4snmp
“The following profiles have invalid configuration” or “The following groups have invalid configuration” errors¶
Following errors are examples of wrong configuration:
The following groups have invalid configuration and won't be used: ['group1']. Please check indentation and keywords spelling inside mentioned groups configuration.
The following profiles have invalid configuration and won't be used: ['standard_profile', 'walk_profile']. Please check indentation and keywords spelling inside mentioned profiles configuration.
- kubernetes: Configuring profiles or Configuring Groups
- docker: Scheduler configuration
sections to check how the correct configuration should look like.