As with most asynchronous design patterns, troubleshooting can be an arduous task. However, there are some built-in utilities to that you can employ to make your lives easier.
The most important step in troubleshooting is to validate the environment your container runs in. Please ensure that the following questions are answered:
Please refer to the setup/installation page for comprehensive documentation on what is required on the host before launching your Splunk container. However, you may also want to verify that:
If you find your container gets into a state when it crashes immediately on boot, or it hits a crash-loop backoff failure mode, you may need to do some live debugging. Here is some advice to help you proceed and figure out exactly why your container keeps crashing
The logs are by far the best way to analyze what went wrong with your deployment or container. Because we rely heavily on the splunk-ansible project underneath the hood of this image, it’s very likely that your container hit a specific failure in the Ansible level. Grabbing the container’s stdout/stderr will tell you at what point and at what play this exception occurred.
To check for container logs, you can run the following command:
$ docker logs <container_name/container_id>
Alternatively, it’s also possible to stream logs so you can watch what happens during the provisioning process in real-time:
$ docker logs --follow <container_name/container_id>
$ docker logs -f <container_name/container_id>
If your container is still running but in a bad state, you can try to debug by putting yourself within the context of that process.
To gain interactive shell access to the container’s runtime as the splunk user, you can run:
$ docker exec -it -u splunk <container_name/container_id> /bin/bash
Once inside the container, you can install additional packages with the default package manager:
$ microdnf install <package_name>
Please note that the package installer microdnf
is specific to the redhat-8 operating system. When building other operating systems, please research and use the recommended package manager. You can refer to base/<operating_system>/install.sh
to see package installation examples for different operating systems.
There are some built-in environment variables to assist with troubleshooting. Please be aware that when using these variables, it is possible for sensitive keys and information to be shown on the container’s stdout/stderr. If you are using any custom logging driver or solution that persists this information, we recommend disabling it for the duration of this debug session.
The two primary environment variables that may be of assistance are DEBUG
and ANSIBLE_EXTRA_FLAGS
.
In order to use DEBUG
, create a container and define DEBUG=true
as so:
$ docker run -it -e DEBUG=true -e SPLUNK_START_ARGS=--accept-license -e SPLUNK_PASSWORD=<password> splunk/splunk:latest
ansible-playbook 2.7.7
config file = /opt/ansible/ansible.cfg
configured module search path = [u'/opt/ansible/library', u'/opt/ansible/apps/library', u'/opt/ansible/ansible_commands']
ansible python module location = /usr/lib/python2.7/dist-packages/ansible
executable location = /usr/bin/ansible-playbook
python version = 2.7.13 (default, Sep 26 2018, 18:42:22) [GCC 6.3.0 20170516]
{
"_meta": {
"hostvars": {
"localhost": {
"ansible_connection": "local"
}
}
},
...
If you check the container logs for this particular container, you’ll notice the entire object generated by the dynamic inventory script environ.py
is printed out at the top. This is particularly useful if you need to validate that certain variables are defined accordingly and map to exactly what has been explicitly set. It also displays the Python version and Ansible version used, which will greatly help if you plan on submitting a GitHub issue to make it easy for others to reproduce.
The ANSIBLE_EXTRA_FLAGS
is another help environment variable that can be used to display more information or output from Ansible. For instance, one practical application of this could be:
$ docker run -it -e "ANSIBLE_EXTRA_FLAGS=-vv" -e SPLUNK_START_ARGS=--accept-license -e SPLUNK_PASSWORD=<password> splunk/splunk:latest
ansible-playbook 2.7.7
config file = /opt/ansible/ansible.cfg
configured module search path = [u'/opt/ansible/library', u'/opt/ansible/apps/library', u'/opt/ansible/ansible_commands']
ansible python module location = /usr/lib/python2.7/dist-packages/ansible
executable location = /usr/bin/ansible-playbook
python version = 2.7.13 (default, Sep 26 2018, 18:42:22) [GCC 6.3.0 20170516]
Using /opt/ansible/ansible.cfg as config file
/opt/ansible/inventory/environ.py did not meet host_list requirements, check plugin documentation if this is unexpected
PLAYBOOK: site.yml ************************************************************
1 plays in site.yml
PLAY [Run default Splunk provisioning] ****************************************
Thursday 21 February 2019 00:50:55 +0000 (0:00:00.036) 0:00:00.036 *****
TASK [Gathering Facts] ********************************************************
task path: /opt/ansible/site.yml:2
ok: [localhost]
META: ran handlers
Thursday 21 February 2019 00:50:56 +0000 (0:00:01.148) 0:00:01.185 *****
With the above, you’ll notice how much more rich and verbose the Ansible output becomes, simply by adding more verbosity to the actual Ansible execution.
The no-provision
is a fairly useless supported command - after launching the container, it won’t run Ansible so Splunk will not get installed or even setup. Instead, it tails a file to keep the instance up and running.
This no-provision
keyword is an argument that gets passed into the container’s entrypoint script, so you can use it in the following manner:
$ docker run -it -e SPLUNK_START_ARGS=--accept-license -e SPLUNK_PASSWORD=<password> -e SPLUNK_HEC_TOKEN=abcd1234 --name spldebug splunk/splunk:latest no-provision
While by itself this seems fairly useless, it can be a great way to debug and troubleshoot Ansible problems locally. In the case above, we’ve set a SPLUNK_HEC_TOKEN
environment variable. Let’s go inside of this container and kick off Ansible manually, and make sure that the HEC token is set properly and that Splunk uses it:
$ docker exec -it spldebug bash
ansible@5f60f3164e69:/$ echo $SPLUNK_HEC_TOKEN
abcd1234
ansible@5f60f3164e69:/$ /sbin/entrypoint.sh start
PLAY [Run default Splunk provisioning] ****************************************
Thursday 21 February 2019 01:09:50 +0000 (0:00:00.034) 0:00:00.034 *****
TASK [Gathering Facts] ********************************************************
ok: [localhost]
A Splunk diagnostic file (diag) is a dump of a Splunk environment that shows how the instance is configured and how it has been operating. If you plan on working with Splunk Support, you may be requested to generate a diag for them to assist you. For more information on what is contained in a diag, refer to this topic.
Generating a diag is only an option if:
docker exec
into said containerTo create this diag, run the following command:
$ docker exec -it -u splunk <container_name/container_id> "${SPLUNK_HOME}/bin/splunk diag"
Additionally, if your Docker container/hosts have access to https://www.splunk.com you can now send the file directly to Splunk Support by using the following command:
$ docker exec -it -u splunk <container_name/container_id> "${SPLUNK_HOME}/bin/splunk diag --upload --case-number=<case_num> --upload-user=<your_splunk_id> --upload-password=<passwd> --upload-description='Monday diag, as requested'"
However, if you don’t have direct access, you can manually copy the diag back to your host via docker cp
:
$ docker cp <container_name/container_id>:/opt/splunk/<filename> <location on your local machine>
If you require additional assistance, please see the support guidelines on how to reach out to Splunk Support with issues or questions.