Worker configuration¶
The worker is responsible for the actual execution of polling, processing trap messages, and sending data to Splunk.
Worker types¶
SC4SNMP has three types of workers:

- The `poller` worker consumes all tasks related to polling.
- The `trap` worker consumes all trap-related tasks produced by the trap service.
- The `sender` worker handles sending data to Splunk. You always need at least one sender running.
Configuration¶
Worker configuration is kept in the worker section of values.yaml:
```yaml
worker:
  poller:
    replicaCount: 2
    concurrency: 4
    prefetch: 1
    autoscaling:
      enabled: false
      minReplicas: 2
      maxReplicas: 10
      targetCPUUtilizationPercentage: 80
    resources:
      limits:
        cpu: 500m
      requests:
        cpu: 250m
  trap:
    replicaCount: 2
    resolveAddress:
      enabled: false
      cacheSize: 500
      cacheTTL: 1800
    concurrency: 4
    prefetch: 30
    autoscaling:
      enabled: false
      minReplicas: 2
      maxReplicas: 10
      targetCPUUtilizationPercentage: 80
    resources:
      limits:
        cpu: 500m
      requests:
        cpu: 250m
  sender:
    replicaCount: 1
    concurrency: 4
    prefetch: 30
    autoscaling:
      enabled: false
      minReplicas: 2
      maxReplicas: 10
      targetCPUUtilizationPercentage: 80
    resources:
      limits:
        cpu: 500m
      requests:
        cpu: 250m
  livenessProbe:
    enabled: false
    exec:
      command:
        - sh
        - -c
        - test $(($(date +%s) - $(stat -c %Y /tmp/worker_heartbeat))) -lt 10
    initialDelaySeconds: 80
    periodSeconds: 10
  readinessProbe:
    enabled: false
    exec:
      command:
        - sh
        - -c
        - test -e /tmp/worker_ready
    initialDelaySeconds: 30
    periodSeconds: 5
  taskTimeout: 2400
  walkRetryMaxInterval: 180
  walkMaxRetries: 5
  ignoreNotIncreasingOid: []
  logLevel: "INFO"
  disableMongoDebugLogging: true
  podAntiAffinity: soft
  udpConnectionTimeout: 3
  ignoreEmptyVarbinds: false
```
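The default liveness probe passes only if the mtime of `/tmp/worker_heartbeat` is less than 10 seconds old. The sketch below illustrates that contract from the worker's side: a process that periodically touches the heartbeat file, and a check mirroring the probe's `stat`-based test. This is an illustration of the pattern, not SC4SNMP's actual worker code; the helper names are hypothetical.

```python
import os
import time

# Path used by the default probe command in values.yaml.
HEARTBEAT_FILE = "/tmp/worker_heartbeat"


def touch_heartbeat(path: str = HEARTBEAT_FILE) -> None:
    """Update the heartbeat file's mtime; a healthy worker would call
    this on every iteration of its main loop."""
    with open(path, "a"):
        os.utime(path, None)


def is_alive(path: str = HEARTBEAT_FILE, max_age_seconds: int = 10) -> bool:
    """Mirror of the probe command: healthy if the heartbeat file exists
    and was touched within the last max_age_seconds."""
    try:
        age = time.time() - os.stat(path).st_mtime
    except FileNotFoundError:
        return False
    return age < max_age_seconds
```

If a worker's main loop hangs, the heartbeat file stops being touched, the probe starts failing after `initialDelaySeconds`, and Kubernetes restarts the container.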
To apply changes, run the upgrade command:
```shell
microk8s helm3 upgrade --install snmp -f values.yaml splunk-connect-for-snmp/splunk-connect-for-snmp --namespace=sc4snmp --create-namespace
```
Worker parameters¶
| Variable | Description | Default |
|---|---|---|
| worker.poller.replicaCount | Number of poller worker replicas | 2 |
| worker.poller.concurrency | Minimum number of threads in a poller worker pod | 4 |
| worker.poller.prefetch | Number of tasks consumed from the queue at once | 1 |
| worker.poller.autoscaling.enabled | Enabling autoscaling for poller worker pods | false |
| worker.poller.autoscaling.minReplicas | Minimum number of running poller worker pods when autoscaling is enabled | 2 |
| worker.poller.autoscaling.maxReplicas | Maximum number of running poller worker pods when autoscaling is enabled | 10 |
| worker.poller.autoscaling.targetCPUUtilizationPercentage | CPU % threshold that must be exceeded on poller worker pods to spawn another replica | 80 |
| worker.poller.resources.limits | The resources limits for poller worker container | cpu: 500m |
| worker.poller.resources.requests | The requested resources for poller worker container | cpu: 250m |
| worker.trap.replicaCount | Number of trap worker replicas | 2 |
| worker.trap.concurrency | Minimum number of threads in a trap worker pod | 4 |
| worker.trap.prefetch | Number of tasks consumed from the queue at once | 30 |
| worker.trap.resolveAddress.enabled | Enable reverse dns lookup of the IP address of the processed trap | false |
| worker.trap.resolveAddress.cacheSize | Maximum number of reverse dns lookup result records stored in cache | 500 |
| worker.trap.resolveAddress.cacheTTL | Time to live of the cached reverse dns lookup record in seconds | 1800 |
| worker.trap.autoscaling.enabled | Enabling autoscaling for trap worker pods | false |
| worker.trap.autoscaling.minReplicas | Minimum number of running trap worker pods when autoscaling is enabled | 2 |
| worker.trap.autoscaling.maxReplicas | Maximum number of running trap worker pods when autoscaling is enabled | 10 |
| worker.trap.autoscaling.targetCPUUtilizationPercentage | CPU % threshold that must be exceeded on trap worker pods to spawn another replica | 80 |
| worker.trap.resources.limits | The resource limits for trap worker pod | cpu: 500m |
| worker.trap.resources.requests | The requested resources for trap worker pod | cpu: 250m |
| worker.sender.replicaCount | The number of sender worker replicas | 1 |
| worker.sender.concurrency | Minimum number of threads in a sender worker pod | 4 |
| worker.sender.prefetch | Number of tasks consumed from the queue at once | 30 |
| worker.sender.autoscaling.enabled | Enabling autoscaling for sender worker pods | false |
| worker.sender.autoscaling.minReplicas | Minimum number of running sender worker pods when autoscaling is enabled | 2 |
| worker.sender.autoscaling.maxReplicas | Maximum number of running sender worker pods when autoscaling is enabled | 10 |
| worker.sender.autoscaling.targetCPUUtilizationPercentage | CPU % threshold that must be exceeded on sender worker pods to spawn another replica | 80 |
| worker.sender.resources.limits | The resource limits for sender worker pod | cpu: 500m |
| worker.sender.resources.requests | The requested resources for sender worker pod | cpu: 250m |
| worker.livenessProbe.enabled | Whether the liveness probe is enabled | false |
| worker.livenessProbe.exec.command | The exec command for the liveness probe to run in the container | Check values.yaml |
| worker.livenessProbe.initialDelaySeconds | Number of seconds after the container has started before liveness probe is initiated | 80 |
| worker.livenessProbe.periodSeconds | Frequency of performing the probe in seconds | 10 |
| worker.readinessProbe.enabled | Whether the readiness probe should be turned on or not | false |
| worker.readinessProbe.exec.command | The exec command for the readiness probe to run in the container | Check values.yaml |
| worker.readinessProbe.initialDelaySeconds | Number of seconds after the container has started before readiness probe is initiated | 30 |
| worker.readinessProbe.periodSeconds | Frequency of performing the probe in seconds | 5 |
| worker.taskTimeout | Task timeout in seconds when process takes a long time | 2400 |
| worker.walkRetryMaxInterval | Maximum time interval between walk attempts | 180 |
| worker.walkMaxRetries | Maximum number of walk retries | 5 |
| worker.ignoreNotIncreasingOid | Hosts for which "OID not increasing" errors are ignored, specified as an array | [] |
| worker.logLevel | Logging level, possible options: DEBUG, INFO, WARNING, ERROR, CRITICAL, or FATAL | INFO |
| worker.disableMongoDebugLogging | Disable extensive MongoDB and pymongo debug logging on SC4SNMP workers | true |
| worker.udpConnectionTimeout | Timeout for SNMP operations in seconds | 3 |
| worker.ignoreEmptyVarbinds | Ignores “Empty SNMP response message” in responses | false |
| worker.podAntiAffinity | Pod anti-affinity policy for worker pods; see the Kubernetes documentation on inter-pod anti-affinity | soft |
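Helm merges overrides onto the chart defaults, so you only need to specify the parameters you change. As a minimal illustrative example (the values here are arbitrary, not recommendations), a `values.yaml` that raises poller throughput and turns on debug logging could look like:

```yaml
worker:
  poller:
    concurrency: 8
    prefetch: 2
  logLevel: "DEBUG"
```

All other parameters keep the defaults shown above.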
In Docker Compose deployments, worker configuration is set via environment variables in `.env`. The key variables for each worker type are:
```
# Workers configuration
WALK_RETRY_MAX_INTERVAL=180
WALK_MAX_RETRIES=5
METRICS_INDEXING_ENABLED=false
POLL_BASE_PROFILES=true
IGNORE_NOT_INCREASING_OIDS=
WORKER_LOG_LEVEL=INFO
WORKER_DISABLE_MONGO_DEBUG_LOGGING=true
UDP_CONNECTION_TIMEOUT=3
MAX_OID_TO_PROCESS=70
MAX_REPETITIONS=10

# Worker Poller
WORKER_POLLER_CONCURRENCY=4
PREFETCH_POLLER_COUNT=1
WORKER_POLLER_REPLICAS=2
WORKER_POLLER_CPU_LIMIT=1
WORKER_POLLER_MEMORY_LIMIT=500M
WORKER_POLLER_CPU_RESERVATIONS=0.5
WORKER_POLLER_MEMORY_RESERVATIONS=250M
ENABLE_WORKER_POLLER_SECRETS=false

# Worker Sender
WORKER_SENDER_CONCURRENCY=4
PREFETCH_SENDER_COUNT=30
WORKER_SENDER_REPLICAS=1
WORKER_SENDER_CPU_LIMIT=1
WORKER_SENDER_MEMORY_LIMIT=500M
WORKER_SENDER_CPU_RESERVATIONS=0.5
WORKER_SENDER_MEMORY_RESERVATIONS=250M

# Worker Trap
WORKER_TRAP_CONCURRENCY=4
PREFETCH_TRAP_COUNT=30
RESOLVE_TRAP_ADDRESS=false
MAX_DNS_CACHE_SIZE_TRAPS=500
TTL_DNS_CACHE_TRAPS=1800
WORKER_TRAP_REPLICAS=2
WORKER_TRAP_CPU_LIMIT=1
WORKER_TRAP_MEMORY_LIMIT=500M
WORKER_TRAP_CPU_RESERVATIONS=0.5
WORKER_TRAP_MEMORY_RESERVATIONS=250M
```
To apply changes, recreate the worker containers:
```shell
sudo docker compose up -d
```
Worker parameters¶
General¶
| Variable | Description |
|---|---|
| WALK_RETRY_MAX_INTERVAL | Maximum time interval between walk attempts |
| WALK_MAX_RETRIES | Maximum number of walk retries |
| METRICS_INDEXING_ENABLED | Append the OID index to metrics; details can be found in the append OID index part to the metrics documentation |
| POLL_BASE_PROFILES | Enable polling base profiles (with IF-MIB and SNMPv2-MIB) |
| IGNORE_NOT_INCREASING_OIDS | Hosts for which "OID not increasing" errors are ignored, for example `IGNORE_NOT_INCREASING_OIDS=127.0.0.1:164,127.0.0.6` |
| WORKER_LOG_LEVEL | Logging level of the workers; possible options: DEBUG, INFO, WARNING, ERROR, CRITICAL, or FATAL |
| WORKER_DISABLE_MONGO_DEBUG_LOGGING | Disable extensive MongoDB and pymongo debug logging on SC4SNMP workers |
| UDP_CONNECTION_TIMEOUT | Timeout in seconds for SNMP operations |
| MAX_OID_TO_PROCESS | Some SNMP agents cannot accept more than a certain number of OIDs per request; if the "TooBig" error is visible in logs, decrease MAX_OID_TO_PROCESS |
| MAX_REPETITIONS | Number of next OIDs requested in the response for each varbind in a single request |
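To make the effect of `MAX_REPETITIONS` concrete: in an SNMP GETBULK request with no non-repeating varbinds, the agent may return up to `max-repetitions` "next" OIDs for each requested varbind, so the response size grows multiplicatively. The helper below is a hypothetical illustration of that upper bound, not part of SC4SNMP.

```python
def getbulk_response_upper_bound(num_varbinds: int, max_repetitions: int) -> int:
    """Upper bound on the number of varbinds in one GETBULK response
    (assuming non-repeaters = 0): each requested varbind may yield up to
    max_repetitions successor OIDs."""
    return num_varbinds * max_repetitions


# With the default MAX_REPETITIONS=10, polling 5 varbinds in one request
# can return up to 50 varbinds in a single response.
upper = getbulk_response_upper_bound(5, 10)
```

This is why a large `MAX_REPETITIONS` combined with a large `MAX_OID_TO_PROCESS` can trigger "TooBig" errors on agents with small response buffers.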
Worker Poller¶
| Variable | Description |
|---|---|
| WORKER_POLLER_CONCURRENCY | Minimum number of threads in the poller container |
| PREFETCH_POLLER_COUNT | Number of tasks consumed from the queue at once in the poller container |
| WORKER_POLLER_REPLICAS | Number of Docker replicas of the worker poller container |
| WORKER_POLLER_CPU_LIMIT | CPU limit for the worker poller container |
| WORKER_POLLER_MEMORY_LIMIT | Memory limit for the worker poller container |
| WORKER_POLLER_CPU_RESERVATIONS | Dedicated CPU resources for the worker poller container |
| WORKER_POLLER_MEMORY_RESERVATIONS | Dedicated memory resources for the worker poller container |
| ENABLE_WORKER_POLLER_SECRETS | Enable usage of secrets for the poller |
Worker Sender¶
| Variable | Description |
|---|---|
| WORKER_SENDER_CONCURRENCY | Minimum number of threads in the sender container |
| PREFETCH_SENDER_COUNT | Number of tasks consumed from the queue at once in the sender container |
| WORKER_SENDER_REPLICAS | Number of Docker replicas of the worker sender container |
| WORKER_SENDER_CPU_LIMIT | CPU limit for the worker sender container |
| WORKER_SENDER_MEMORY_LIMIT | Memory limit for the worker sender container |
| WORKER_SENDER_CPU_RESERVATIONS | Dedicated CPU resources for the worker sender container |
| WORKER_SENDER_MEMORY_RESERVATIONS | Dedicated memory resources for the worker sender container |
Worker Trap¶
| Variable | Description |
|---|---|
| WORKER_TRAP_CONCURRENCY | Minimum number of threads in the trap container |
| PREFETCH_TRAP_COUNT | Number of tasks consumed from the queue at once in the trap container |
| RESOLVE_TRAP_ADDRESS | Use reverse DNS lookup for the trap IP address and send the hostname to Splunk |
| MAX_DNS_CACHE_SIZE_TRAPS | If RESOLVE_TRAP_ADDRESS is set to true, the maximum number of records in the cache |
| TTL_DNS_CACHE_TRAPS | If RESOLVE_TRAP_ADDRESS is set to true, the time to live of a cached record in seconds |
| WORKER_TRAP_REPLICAS | Number of Docker replicas of the worker trap container |
| WORKER_TRAP_CPU_LIMIT | CPU limit for the worker trap container |
| WORKER_TRAP_MEMORY_LIMIT | Memory limit for the worker trap container |
| WORKER_TRAP_CPU_RESERVATIONS | Dedicated CPU resources for the worker trap container |
| WORKER_TRAP_MEMORY_RESERVATIONS | Dedicated memory resources for the worker trap container |
Worker scaling¶
You can adjust worker pods in two ways: set a fixed value in replicaCount, or enable autoscaling, which scales pods automatically.
Real life scenario: I use SC4SNMP for only trap monitoring¶
If you do not use polling at all, set worker.poller.replicaCount to 0.
To monitor traps, adjust worker.trap.replicaCount depending on your needs and worker.sender.replicaCount to send traps to Splunk. Usually, you need significantly fewer sender pods than trap pods.
```yaml
worker:
  trap:
    replicaCount: 4
  sender:
    replicaCount: 1
  poller:
    replicaCount: 0
```
With autoscaling:
```yaml
worker:
  trap:
    autoscaling:
      enabled: true
      minReplicas: 4
      maxReplicas: 10
      targetCPUUtilizationPercentage: 80
  sender:
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 5
      targetCPUUtilizationPercentage: 80
  poller:
    replicaCount: 0
```
In the example above, both trap and sender pods are autoscaled. After an upgrade, minReplicas pods are created first, and additional pods are spawned only when CPU usage exceeds targetCPUUtilizationPercentage, which is 80% by default. This keeps resource usage adjusted to the actual load.
After the helm upgrade process, you will see horizontalpodautoscaler in microk8s kubectl get all -n sc4snmp:
```
NAME                                                                         REFERENCE                                               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/snmp-mibserver                           Deployment/snmp-mibserver                               1%/80%    1         3         1          97m
horizontalpodautoscaler.autoscaling/snmp-splunk-connect-for-snmp-worker-sender   Deployment/snmp-splunk-connect-for-snmp-worker-sender   1%/80%    2         5         2          28m
horizontalpodautoscaler.autoscaling/snmp-splunk-connect-for-snmp-worker-trap     Deployment/snmp-splunk-connect-for-snmp-worker-trap     1%/80%    4         10        4          28m
```
If you see <unknown>/80% in the TARGETS section instead of the CPU percentage, you probably do not have the metrics-server add-on enabled.
Enable it using microk8s enable metrics-server.
Real life scenario: I have a significant delay in polling¶
Sometimes, when polling is configured to run frequently on many devices, the workers become overloaded and data delivery to Splunk is delayed. To avoid this, scale the poller and sender pods. Because of walk cycles (a walk is a costly operation that only runs once in a while), poller workers need extra resources for short periods, so enabling autoscaling is recommended.
See the following example of values.yaml with autoscaling:
```yaml
worker:
  trap:
    autoscaling:
      enabled: true
      minReplicas: 4
      maxReplicas: 10
      targetCPUUtilizationPercentage: 80
  sender:
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 5
      targetCPUUtilizationPercentage: 80
  poller:
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 20
      targetCPUUtilizationPercentage: 80
```
Remember that the system cannot scale itself infinitely; the resources you can allocate are finite. By default, every worker is configured with the following resources:
```yaml
resources:
  limits:
    cpu: 500m
  requests:
    cpu: 250m
```
I have autoscaling enabled and experience problems with Mongo and Redis pod¶
If the MongoDB and Redis pods are crashing and some pods are stuck in a Pending state, you have exhausted your resources and SC4SNMP cannot scale further. Decrease the maxReplicas values on the workers so that the total demand does not exceed the available CPU.
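A quick way to sanity-check your maxReplicas settings is to compute the worst-case CPU demand: each worker type can grow to maxReplicas pods, each allowed up to its CPU limit. The helper below is a hypothetical back-of-the-envelope calculator, not part of SC4SNMP; the numbers are taken from the autoscaling example above with the default 500m limit.

```python
def worst_case_cpu_millicores(worker_limits: dict[str, tuple[int, int]]) -> int:
    """Sum maxReplicas * per-pod CPU limit (in millicores) over worker types,
    giving the CPU the cluster must be able to provide if every worker
    type scales to its maximum."""
    return sum(replicas * limit for replicas, limit in worker_limits.values())


# maxReplicas from the autoscaling example, default 500m CPU limit per pod:
demand = worst_case_cpu_millicores({
    "trap": (10, 500),    # maxReplicas: 10
    "sender": (5, 500),   # maxReplicas: 5
    "poller": (20, 500),  # maxReplicas: 20
})
# demand is 17500 millicores, i.e. 17.5 CPU cores in the worst case.
```

If that worst-case figure exceeds the CPU available after MongoDB, Redis, and the other SC4SNMP pods take their share, lower maxReplicas accordingly.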
I do not know how to set autoscaling parameters and how many replicas I need¶
The best way to see if pods are overloaded is to run the following command:
```
microk8s kubectl top pods -n sc4snmp

NAME                                                          CPU(cores)   MEMORY(bytes)
snmp-mibserver-7f879c5b7c-nnlfj                               1m           3Mi
snmp-mongodb-869cc8586f-q8lkm                                 18m          225Mi
snmp-redis-master-0                                           10m          2Mi
snmp-splunk-connect-for-snmp-scheduler-558dccfb54-nb97j       2m           136Mi
snmp-splunk-connect-for-snmp-trap-5878f89bbf-24wrz            2m           129Mi
snmp-splunk-connect-for-snmp-trap-5878f89bbf-z9gd5            2m           129Mi
snmp-splunk-connect-for-snmp-worker-poller-599c7fdbfb-cfqjm   260m         354Mi
snmp-splunk-connect-for-snmp-worker-poller-599c7fdbfb-ztf7l   312m         553Mi
snmp-splunk-connect-for-snmp-worker-sender-579f796bbd-vmw88   14m          257Mi
snmp-splunk-connect-for-snmp-worker-trap-5474db6fc6-46zhf     3m           259Mi
snmp-splunk-connect-for-snmp-worker-trap-5474db6fc6-mjtpv     4m           259Mi
```
Here you can see how much CPU and memory each pod uses. If the CPU usage is close to 500m, which is the default limit for one pod, either enable autoscaling (or increase maxReplicas), or increase replicaCount with autoscaling off.
See Horizontal Autoscaling and Scaling with Microk8s for more information.
Worker scaling is controlled via replica and resource variables in .env. To disable a worker type, set its replica count to 0:
```
WORKER_POLLER_REPLICAS=0
WORKER_TRAP_REPLICAS=4
WORKER_SENDER_REPLICAS=1
```
Docker Compose does not support autoscaling. To handle higher load, increase the replica counts and concurrency values manually.
Reverse DNS lookup in trap worker¶
If you want to see the hostname instead of the IP address of the incoming traps in Splunk:
```yaml
worker:
  trap:
    resolveAddress:
      enabled: true
      cacheSize: 500
      cacheTTL: 1800
```
Set in .env:
```
RESOLVE_TRAP_ADDRESS=true
MAX_DNS_CACHE_SIZE_TRAPS=500
TTL_DNS_CACHE_TRAPS=1800
```
The trap worker uses an in-memory cache to store the results of reverse DNS lookups. If you restart the worker, the cache is cleared.
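The mechanism described above can be sketched as follows: a size-bounded TTL cache in front of a reverse DNS lookup, falling back to the raw IP when resolution fails. This is an illustration of the caching behavior configured by `cacheSize`/`cacheTTL` (or `MAX_DNS_CACHE_SIZE_TRAPS`/`TTL_DNS_CACHE_TRAPS`), not SC4SNMP's actual implementation.

```python
import socket
import time


class TTLCache:
    """Minimal in-memory cache with a max size and per-entry time to live."""

    def __init__(self, max_size: int = 500, ttl: float = 1800.0):
        self.max_size = max_size
        self.ttl = ttl
        self._data: dict[str, tuple[float, str]] = {}  # key -> (stored_at, value)

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._data[key]  # entry expired
            return None
        return value

    def put(self, key: str, value: str) -> None:
        if len(self._data) >= self.max_size:
            # Evict the oldest entry to stay within cacheSize.
            oldest = min(self._data, key=lambda k: self._data[k][0])
            del self._data[oldest]
        self._data[key] = (time.monotonic(), value)


def resolve_trap_address(ip: str, cache: TTLCache) -> str:
    """Return the hostname for a trap source IP, using the cache and
    falling back to the raw IP when the lookup fails."""
    cached = cache.get(ip)
    if cached is not None:
        return cached
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except OSError:
        hostname = ip  # reverse lookup failed; keep the address as-is
    cache.put(ip, hostname)
    return hostname
```

Because the cache lives in process memory, every worker restart starts with an empty cache, so the first trap from each device triggers a fresh lookup.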