Finetune SC4S¶
This section provides guidance on improving SC4S performance by tuning configuration settings.
Apply these settings to your infrastructure, run the performance tests after each adjustment, and retain the changes that result in performance improvements.
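For example, you can generate synthetic load with syslog-ng's loggen tool before and after each change and compare the measured rates. The host name, port, rate, and duration below are placeholders for your environment:
# Send roughly 5,000 messages per second over UDP for 60 seconds to the SC4S host
loggen --inet --dgram --rate=5000 --interval=60 sc4s.example.com 514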
Disable features that reduce performance¶
Some SC4S features can negatively impact performance. If you experience performance issues, consider disabling settings such as the name cache and message grouping:
Edit /opt/sc4s/env_file:
SC4S_USE_NAME_CACHE=no
SC4S_SOURCE_VMWARE_VSPHERE_GROUPMSG=no
Restart SC4S for the changes to take effect.
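The exact restart command depends on how you deployed SC4S; for the common systemd-managed container deployment it is typically:
sudo systemctl restart sc4s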
Dedicated sc4s instances¶
If one of the log sources produces a large percentage of the overall traffic, create an additional dedicated SC4S service on a separate host.
Tune the receiving buffer¶
Increasing the receive buffer allows the kernel to queue more incoming data before the application processes it, reducing packet loss during traffic bursts. This requires changes at both the OS level and within SC4S. Start tuning the receive buffer by increasing the kernel buffer limits at the OS level. Perform the following steps to change the buffer size:
- Edit /etc/sysctl.conf and set the receive buffer size to 512 MB:
net.core.rmem_default = 536870912
net.core.rmem_max = 536870912
- Apply the changes:
sudo sysctl -p
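To confirm that the new limits are active, you can query the kernel parameters directly:
sysctl net.core.rmem_max net.core.rmem_default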
Next, configure SC4S to use the larger buffer:
- Add the following line to /opt/sc4s/env_file:
SC4S_SOURCE_TCP_SO_RCVBUFF=536870912
- Apply the same buffer tuning to each syslog transport you have enabled:
SC4S_SOURCE_TCP_SO_RCVBUFF=536870912 # Generic syslog over TCP
SC4S_SOURCE_UDP_SO_RCVBUFF=536870912 # Generic syslog over UDP
SC4S_SOURCE_RFC5426_SO_RCVBUFF=536870912 # RFC 5426 (syslog over UDP)
SC4S_SOURCE_RFC6587_SO_RCVBUFF=536870912 # RFC 6587 (syslog over TCP)
SC4S_SOURCE_RFC5425_SO_RCVBUFF=536870912 # RFC 5425 (syslog over TLS)
- Restart SC4S for the changes to take effect.
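To verify that the larger buffer is applied to the listening sockets, you can inspect socket memory on the host. This assumes SC4S uses host networking and the default syslog port 514:
# -m adds socket memory details (skmem); rb shows the configured receive buffer size
ss -ulmpn 'sport = :514'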
Additional considerations¶
- Sending buffer: In rare cases, you may also need to increase the sending buffer size by modifying net.core.wmem_max and net.core.wmem_default using the same approach; see the example after this list.
- Buffer size limits: Setting buffers too large can actually decrease performance. Start with the recommended values and adjust based on your testing results.
- Hardware constraints: Network driver limitations should be considered when tuning these values. Consult your NIC documentation for maximum supported buffer sizes.
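If you do need to raise the sending buffer, the sysctl changes mirror the receive buffer ones; the 512 MB value below is only an example:
net.core.wmem_default = 536870912
net.core.wmem_max = 536870912
Apply the changes with sudo sysctl -p.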
Impact on performance¶
Tests performed in a lab environment show that tuning the receive buffer can improve performance, depending on the connection type.
TCP connections
- Receive buffer tuning produced the best results for TCP connections.
- In some cases, message throughput doubled.
UDP connections
The performance impact was smaller, especially at higher throughputs.
Two important observations were made:
- The best results are achieved with a 64 MB receive buffer.
- According to syslog documentation and blog posts, higher receive buffer values reduce the probability of small UDP packet losses at relatively low message throughput.
Observed behavior:
- In our environment, we observed message losses of approximately 0.2% at a throughput of 9,000 msg/sec.
- These losses disappeared after increasing the receive buffer to 64 MB.
For UDP connections you should:
- Start with a relatively small receive buffer value (for example, the default 16 MB or 64 MB).
- Increase the value only if needed.
- First try other tuning options, such as eBPF, before increasing the receive buffer.
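To check whether UDP packets are being dropped because the receive buffer is too small, watch the kernel's UDP counters on the SC4S host; this uses standard Linux tooling and is not SC4S-specific:
# Growing "packet receive errors" or "receive buffer errors" counters indicate an undersized buffer
netstat -su | grep -i errors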
Tune static input window size¶
The input window provides flow control at the application level. Syslog-ng uses this feature to temporarily buffer messages when outputs are slow. The mechanism works by pulling messages from the kernel's receive buffer and placing them into an application buffer.
- The window size defines how many messages this internal buffer can hold.
- Each message fetched from the kernel buffer increases the window counter.
- Each message successfully forwarded to the output decreases the counter.
To change the window size, modify the following options in /opt/sc4s/env_file:
for UDP:
SC4S_SOURCE_UDP_IW_USE=yes
SC4S_SOURCE_UDP_IW_SIZE=1000000
for TCP:
You can also modify the input window size for TCP. Note, however, that this option is enabled by default and the input window size is already set to the relatively high value of 20,000,000.
SC4S_SOURCE_TCP_IW_SIZE=1000000 # a lower value than the default
Restart SC4S for the changes to take effect.
In the example above, the input window can store up to 1,000,000 messages. Note that increasing the window size will increase the application’s memory usage.
For UDP, if the output becomes slow and this window fills up, syslog‑ng will stop reading from the kernel buffer. As a result, the kernel buffer will begin to fill, and once it becomes full, incoming UDP packets will be dropped by the kernel.
A single UDP message can be up to approximately 1 KB. With a window size of 1,000,000 messages, this may require up to 1 GB of additional memory for buffering.
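Because the extra buffering shows up as process memory, monitor the memory usage of the SC4S container after raising the window size. For a container-based deployment, a simple check looks like this (docker stats is equivalent for Docker deployments):
# Stream CPU and memory usage for running containers, including SC4S
podman stats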
Impact on performance
Increasing the window size does not improve the baseline throughput of SC4S, but it allows SC4S to absorb a higher message rate for a limited time, until the input window fills up. Keep in mind that this also adds latency before events arrive in Splunk.
For example, in our UDP test scenario on an EC2 machine with 16 threads and eBPF enabled, with 150,000 messages per second arriving over the span of 30 seconds, we reduced the 50% loss rate to 0% by setting the input window to 1,000,000 messages (about 1 GB of buffer memory). Similarly, in the test scenario with 350,000 messages per second, the loss rate without the input window was around 80%, which corresponds to only about 75,000 messages per second ingested. With the input window enabled, we lowered this to 61.49%, or almost 150,000 messages per second. However, once the buffer fills up, there is no further improvement in performance.
The default value for TCP input window size is already large enough and in most cases does not require further tuning.
Fetch limit¶
When increasing the input window size, you may also need to increase the fetch limit. The fetch limit controls the maximum number of messages retrieved from the source in a single read operation.
- Too high: Should not exceed the input window size, as this would fill the entire buffer in one read cycle.
- Too low: Results in underutilizing the buffer capacity, requiring more read cycles to process the same volume.
To change the fetch limit, modify the following options in /opt/sc4s/env_file:
for TCP:
SC4S_SOURCE_TCP_FETCH_LIMIT=1000
for UDP:
SC4S_SOURCE_UDP_FETCH_LIMIT=1000
The default value is 1000.
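As a rough illustration only (values are examples, not recommendations), a configuration that raises the UDP input window together with the fetch limit might look like this in /opt/sc4s/env_file:
SC4S_SOURCE_UDP_IW_USE=yes
SC4S_SOURCE_UDP_IW_SIZE=1000000
SC4S_SOURCE_UDP_FETCH_LIMIT=10000 # well below the window size, but large enough to drain the kernel buffer efficiently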
Disk buffering¶
To prevent message loss during HEC connection outages, consider enabling Disk Buffering. This feature temporarily stores messages on disk when the destination is unavailable.
Switch to SC4S Lite¶
Parsing syslog messages can be a CPU-intensive task. During the parsing process, each syslog message goes through multiple parsing rules until a match is found. Some log messages follow longer parsing paths than others, and some parsers use regular expressions, which can be slow.
If you are familiar with your log sources, consider performing an A/B test and switching to SC4S Lite, which includes only the parsers for the vendors you require. Although artificial performance tests may not fully reflect the impact of this change, you may observe an increase in the capacity of your syslog layer when operating with real-world data.
Finetune for UDP traffic¶
Increase the number of UDP sockets¶
By default, SC4S uses a single UDP socket per port. Increasing the number of sockets allows traffic to be distributed across multiple CPU threads using the kernel’s SO_REUSEPORT feature.
Default behavior (without eBPF)
The kernel assigns packets to sockets based on a hash of the source IP and port. This means:
- Consistent routing: All packets from the same sender go to the same socket, preserving message order.
- Potential imbalance: If a few senders generate most of the traffic, their packets may all land on the same socket, leaving other sockets underutilized.
With eBPF enabled
eBPF provides true per-packet load balancing, where each packet is randomly distributed across all sockets regardless of sender. This results in:
- Even workload: Traffic is distributed more uniformly across CPU threads.
- No ordering guarantee: Packets from the same sender may be processed out of order.
Configuration
Add the following to /opt/sc4s/env_file:
SC4S_SOURCE_LISTEN_UDP_SOCKETS=32
Set this value based on the number of CPU cores available. Start with a value equal to four times your core count and adjust based on performance testing. Restart SC4S for the changes to take effect.
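To confirm that multiple sockets are bound to the same UDP port through SO_REUSEPORT, list the listeners on the host. This assumes host networking and the default syslog port 514:
# Each listed socket is one of the SO_REUSEPORT listeners
ss -ulpn 'sport = :514'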
Impact on performance
Increasing the number of UDP sockets provides the best results when data comes from multiple sources, as it allows better utilization of CPU cores. If the data comes from a single source (same IP and port), this setting has only a minimal impact on performance. In this case, consider enabling eBPF instead.
Enable eBPF¶
Find more information in the About eBPF section.
- Verify that your host supports eBPF.
- Ensure your container is running in privileged mode.
- Update the configuration in /opt/sc4s/env_file:
SC4S_SOURCE_LISTEN_UDP_SOCKETS=32
SC4S_ENABLE_EBPF=yes
SC4S_EBPF_NO_SOCKETS=32 # to achieve the best results, set to the number of threads x 4
- Restart SC4S for the changes to take effect.
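A quick way to check the first prerequisite is to look for BPF support in the running kernel's build configuration; the full requirements are covered in the About eBPF section:
# CONFIG_BPF=y and CONFIG_BPF_SYSCALL=y indicate eBPF support in the kernel
grep -E 'CONFIG_BPF(_SYSCALL)?=' /boot/config-$(uname -r)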
Impact on performance
You can achieve the best results on multithreaded machines. This is demonstrated in the performance test results for different EC2 instances (m5.4xlarge, m5.2xlarge, and m5.xlarge):
Note
Performance may vary depending on the version and the specifics of your environment.
| Receiver / drop rate at EPS (msgs/sec) | 4,500 | 9,000 | 27,000 | 50,000 | 150,000 | 350,000 |
|---|---|---|---|---|---|---|
| Default SC4S - 16 threads | 0% | 0% | 59.4% | 79.18% | 93.18% | 96.88% |
| eBPF on - 16 threads | 0% | 0% | 0% | 0% | 49.79% | 81.10% |
| Default SC4S - 8 threads | 0% | 0.17% | 57.52% | 77.61% | 93.17% | 97.57% |
| eBPF on - 8 threads | 0% | 0% | 0% | 25.45% | 75.76% | 90.16% |
| Default SC4S - 4 threads | 0% | 0.24% | 66.62% | 82.66% | 94.31% | 97.92% |
| eBPF on - 4 threads | 0% | 0% | 31.49% | 67.02% | 93.80% | 96.22% |
Finetune for TCP traffic¶
Parallelize TCP processing¶
- Update /opt/sc4s/env_file:
SC4S_ENABLE_PARALLELIZE=yes
SC4S_PARALLELIZE_NO_PARTITION=4
- Restart SC4S for the changes to take effect.
The parallelize option distributes messages from a single TCP stream across multiple concurrent threads, which is beneficial in production environments with a single high-volume TCP source. With multiple TCP connections, this option does not improve performance and may add overhead. Its impact is most noticeable on machines with many threads.
| SC4S parallelize | Loggen TCP connections | Average rate (msg/sec) |
|---|---|---|
| off | 1 | 12,393 |
| on (16 threads) | 1 | 35,543 |
| off | 10 | 68,240 |
| on (16 threads) | 10 | 75,556 |
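The Loggen column in the table refers to syslog-ng's loggen load generator; a comparable single-connection test could look like this, with host, port, and rate as placeholders:
# One persistent TCP connection at roughly 50,000 messages per second for 60 seconds
loggen --inet --stream --rate=50000 --interval=60 --active-connections=1 sc4s.example.com 514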