Subsections of Lambda Tracing
Setup
Prerequisites
Observability Workshop Instance
The Observability Workshop is most often completed on a Splunk-issued and preconfigured EC2 instance running Ubuntu.
Your workshop instructor will provide you with the credentials to your assigned workshop instance.
Your instance should have the following environment variables already set:
- ACCESS_TOKEN
- REALM
- These are the Splunk Observability Cloud Access Token and Realm for your workshop.
- They will be used by the OpenTelemetry Collector to forward your data to the correct Splunk Observability Cloud organization.
Note
Alternatively, you can deploy a local observability workshop instance using Multipass.
AWS Command Line Interface (awscli)
The AWS Command Line Interface, or awscli, is a command-line tool used to interact with AWS resources. In this workshop, it is used by certain scripts to interact with the resources you'll deploy.
Your Splunk-issued workshop instance should already have the awscli installed.
Check if the aws command is installed on your instance with the following command:
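which aws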
- The expected output would be /usr/local/bin/aws
If the aws command is not installed on your instance, run the following command:
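One way to install it is with the official AWS CLI v2 installer (your instructor may point you to a different method):
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install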
Terraform
Terraform is an Infrastructure as Code (IaC) platform, used to deploy, manage and destroy resources by defining them in configuration files. Terraform employs HCL (the HashiCorp Configuration Language) to define those resources, and supports multiple providers for various platforms and technologies.
We will be using Terraform at the command line in this workshop to deploy the following resources:
- AWS API Gateway
- Lambda Functions
- Kinesis Stream
- CloudWatch Log Groups
- S3 Bucket
- and other supporting resources
Your Splunk-issued workshop instance should already have terraform installed.
Check if the terraform command is installed on your instance:
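which terraform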
- The expected output would be /usr/local/bin/terraform
If the terraform command is not installed on your instance, follow Terraform’s recommended installation commands listed below:
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform
Workshop Directory (o11y-lambda-workshop)
The Workshop Directory o11y-lambda-workshop is a repository that contains all the configuration files and scripts to complete both the auto-instrumentation and manual instrumentation of the example Lambda-based application we will be using today.
Confirm you have the workshop directory in your home directory:
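ls ~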
- The expected output would include o11y-lambda-workshop
If the o11y-lambda-workshop directory is not in your home directory, clone it with the following command:
git clone https://github.com/gkono-splunk/o11y-lambda-workshop.git
AWS
The AWS CLI requires that you have credentials to be able to access and manage resources deployed by its services. Both Terraform and the Python scripts in this workshop require these credentials to perform their tasks.
Configure the awscli with the access key ID, secret access key and region for this workshop:
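aws configure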
- This command should provide a prompt similar to the one below:
AWS Access Key ID [None]: XXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Default region name [None]: us-east-1
Default output format [None]:
If the awscli is not yet configured on your instance, run the aws configure command shown above and provide the values your instructor gives you.
Terraform supports the passing of variables to ensure sensitive or dynamic data is not hard-coded in your .tf configuration files, as well as to make those values reusable throughout your resource definitions.
In our workshop, Terraform requires variables for deploying the Lambda functions with the right values for the OpenTelemetry Lambda layer, for the Splunk Observability Cloud ingest values, and to make your environment and resources unique and immediately recognizable.
Terraform variables are defined in the following manner:
- Define the variables in your main.tf file or a variables.tf file
- Set the values for those variables in either of the following ways:
- setting environment variables at the host level, using the same variable names as in their definitions, prefixed with TF_VAR_
- setting the values for your variables in a terraform.tfvars file
- passing the values as arguments when running terraform apply
We will be using a combination of variables.tf and terraform.tfvars files to set our variables in this workshop.
- Using either vi or nano, open the terraform.tfvars file in either the auto or manual directory
vi ~/o11y-lambda-workshop/auto/terraform.tfvars
- Set the variables with their values. Replace the CHANGEME placeholders with those provided by your instructor.
o11y_access_token = "CHANGEME"
o11y_realm = "CHANGEME"
otel_lambda_layer = ["CHANGEME"]
prefix = "CHANGEME"
- Ensure you change only the placeholders, leaving the quotes and brackets intact, where applicable.
- The prefix is a unique identifier you can choose for yourself, to make your resources distinct from other participants’ resources. We suggest using a short form of your name, for example.
- Also, please use only lowercase letters for the prefix. Certain resources in AWS, such as S3, will throw an error if you use uppercase letters.
- Save your file and exit the editor.
- Finally, copy the terraform.tfvars file you just edited to the other directory.
cp ~/o11y-lambda-workshop/auto/terraform.tfvars ~/o11y-lambda-workshop/manual
- We do this because we will be using the same values for both the auto-instrumentation and manual instrumentation portions of the workshop
File Permissions
While all other files are fine as they are, the send_message.py script in both the auto and manual directories will have to be executed as part of our workshop. As a result, it needs to have the appropriate permissions to run as expected. Follow these instructions to set them.
First, ensure you are in the o11y-lambda-workshop directory:
cd ~/o11y-lambda-workshop
Next, run the following command to set executable permissions on the send_message.py script:
sudo chmod 755 auto/send_message.py manual/send_message.py
Now that we've squared away the prerequisites, we can get started with the workshop!
Auto-Instrumentation
The first part of our workshop will demonstrate how auto-instrumentation with OpenTelemetry allows the OpenTelemetry Collector to auto-detect what language your function is written in, and start capturing traces for those functions.
The Auto-Instrumentation Workshop Directory & Contents
First, let us take a look at the o11y-lambda-workshop/auto directory, and some of its files. This is where all the content for the auto-instrumentation portion of our workshop resides.
The auto Directory
The main.tf file
- Take a closer look at the main.tf file:
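cat ~/o11y-lambda-workshop/auto/main.tf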
Workshop Questions
- Can you identify which AWS resources are being created by this template?
- Can you identify where OpenTelemetry instrumentation is being set up?
- Hint: study the lambda function definitions
- Can you determine which instrumentation information is being provided by the environment variables we set earlier?
You should see a section where the environment variables for each lambda function are being set.
environment {
variables = {
SPLUNK_ACCESS_TOKEN = var.o11y_access_token
SPLUNK_REALM = var.o11y_realm
OTEL_SERVICE_NAME = "producer-lambda"
OTEL_RESOURCE_ATTRIBUTES = "deployment.environment=${var.prefix}-lambda-shop"
AWS_LAMBDA_EXEC_WRAPPER = "/opt/nodejs-otel-handler"
KINESIS_STREAM = aws_kinesis_stream.lambda_streamer.name
}
}
By using these environment variables, we are configuring our auto-instrumentation in a few ways:
We are setting environment variables to inform the OpenTelemetry collector of which Splunk Observability Cloud organization we would like to have our data exported to.
SPLUNK_ACCESS_TOKEN = var.o11y_access_token
SPLUNK_REALM = var.o11y_realm
We are also setting variables that help OpenTelemetry identify our function/service, as well as the environment/application it is a part of.
OTEL_SERVICE_NAME = "producer-lambda" # consumer-lambda in the case of the consumer function
OTEL_RESOURCE_ATTRIBUTES = "deployment.environment=${var.prefix}-lambda-shop"
We are setting an environment variable that lets OpenTelemetry know what wrappers it needs to apply to our function’s handler so as to capture trace data automatically, based on our code language.
AWS_LAMBDA_EXEC_WRAPPER = "/opt/nodejs-otel-handler"
In the case of the producer-lambda function, we are setting an environment variable to let the function know what Kinesis Stream to put our record to.
KINESIS_STREAM = aws_kinesis_stream.lambda_streamer.name
These values are sourced from the environment variables we set in the Prerequisites section, as well as resources that will be deployed as a part of this Terraform configuration file.
You should also see an argument for setting the Splunk OpenTelemetry Lambda layer on each function:
layers = var.otel_lambda_layer
The OpenTelemetry Lambda layer is a package that contains the libraries and dependencies necessary to collect, process, and export telemetry data for Lambda functions at the moment of invocation.
While there is a general OTel Lambda layer that has all the libraries and dependencies for all OpenTelemetry-supported languages, there are also language-specific Lambda layers, to help make your function even more lightweight.
- You can see the relevant Splunk OpenTelemetry Lambda layer ARNs (Amazon Resource Name) and latest versions for each AWS region HERE
The producer.mjs file
Next, let's take a look at the producer-lambda function code:
- Run the following command to view the contents of the producer.mjs file:
cat ~/o11y-lambda-workshop/auto/handler/producer.mjs
- This NodeJS module contains the code for the producer function.
- Essentially, this function receives a message, and puts that message as a record to the targeted Kinesis Stream
Deploying the Lambda Functions & Generating Trace Data
Now that we are familiar with the contents of our auto directory, we can deploy the resources for our workshop, and generate some trace data from our Lambda functions.
In order to deploy the resources defined in the main.tf file, you first need to make sure that Terraform is initialized in the same folder as that file.
Ensure you are in the auto directory:
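pwd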
- The expected output would be ~/o11y-lambda-workshop/auto
If you are not in the auto directory, run the following command:
cd ~/o11y-lambda-workshop/auto
Run the following command to initialize Terraform in this directory
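terraform init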
- This command will create a number of elements in the same folder:
  - .terraform.lock.hcl file: to record the providers it will use to provide resources
  - .terraform directory: to store the provider configurations
- In addition to the above files, when terraform is run using the apply subcommand, the terraform.tfstate file will be created to track the state of your deployed resources.
- These enable Terraform to manage the creation, state and destruction of resources, as defined within the main.tf file of the auto directory.
Deploy the Lambda functions and other AWS resources
Once we’ve initialized Terraform in this directory, we can go ahead and deploy our resources.
First, run the terraform plan command to ensure that Terraform will be able to create your resources without encountering any issues.
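terraform plan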
- This will result in a plan to deploy resources and output some data, which you can review to ensure everything will work as intended.
- Do note that a number of the values shown in the plan will be known post-creation, or are masked for security purposes.
Next, run the terraform apply command to deploy the Lambda functions and other supporting resources from the main.tf file:
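terraform apply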
Send some traffic to the producer-lambda URL (base_url)
To start getting some traces from our deployed Lambda functions, we would need to generate some traffic. We will send a message to our producer-lambda function's endpoint, which should be put as a record into our Kinesis Stream, and then pulled from the Stream by the consumer-lambda function.
Ensure you are in the auto directory:
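pwd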
- The expected output would be ~/o11y-lambda-workshop/auto
If you are not in the auto directory, run the following command:
cd ~/o11y-lambda-workshop/auto
The send_message.py script is a Python script that will take input at the command line, add it to a JSON dictionary, and send it to your producer-lambda function's endpoint repeatedly, as part of a while loop.
Run the send_message.py script as a background process
- It requires the --name and --superpower arguments
nohup ./send_message.py --name CHANGEME --superpower CHANGEME &
- You should see an output similar to the following if your message is successful
[1] 79829
user@host manual % appending output to nohup.out
- The two most important bits of information here are:
  - The process ID on the first line (79829 in the case of my example), and
  - The appending output to nohup.out message
- The nohup command ensures the script will not hang up when sent to the background. It also captures the curl output from our command in a nohup.out file in the same folder as the one you're currently in.
- The & tells our shell process to run this process in the background, thus freeing our shell to run other commands.
Next, check the contents of the response.logs file, to ensure your output confirms your requests to your producer-lambda endpoint are successful:
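cat response.logs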
- You should see the following output among the lines printed to your screen if your message is successful:
{"message": "Message placed in the Event Stream: {prefix}-lambda_stream"}
- If unsuccessful, you will see:
{"message": "Internal server error"}
Important
If this occurs, ask one of the workshop facilitators for assistance.
View the Lambda Function Logs
Next, let’s take a look at the logs for our Lambda functions.
To view your producer-lambda logs, check the producer.logs file:
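cat producer.logs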
To view your consumer-lambda logs, check the consumer.logs file:
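cat consumer.logs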
Examine the logs carefully.
Workshop Question
- Do you see OpenTelemetry being loaded? Look out for the lines with splunk-extension-wrapper
- Consider running head -n 50 producer.logs or head -n 50 consumer.logs to see the splunk-extension-wrapper being loaded.
Splunk APM, Lambda Functions & Traces
The Lambda functions should be generating a sizeable amount of trace data, which we would need to take a look at. Through the combination of environment variables and the OpenTelemetry Lambda layer configured in the resource definition for our Lambda functions, we should now be ready to view our functions and traces in Splunk APM.
View your Environment name in the Splunk APM Overview
Let's start by making sure that Splunk APM is aware of our Environment from the trace data it is receiving. This is the deployment.environment we set as part of the OTEL_RESOURCE_ATTRIBUTES variable on our Lambda function definitions in main.tf. It was also one of the outputs from the terraform apply command we ran earlier.
In Splunk Observability Cloud:
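- Click on the APM button in the Main Menu.
- Check the Environment: dropdown for your environment name. Based on the deployment.environment value we set, it should take the form {prefix}-lambda-shop, using the prefix you set in terraform.tfvars.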
Note
It may take a few minutes for your traces to appear in Splunk APM. Try hitting refresh on your browser until you find your environment name in the list of environments.
View your Environment’s Service Map
Once you’ve selected your Environment name from the Environment drop down, you can take a look at the Service Map for your Lambda functions.
- Click the Service Map button on the right side of the APM Overview page. This will take you to your Service Map view.
You should be able to see the producer-lambda function and the call it is making to the Kinesis Stream to put your record.
Workshop Question
What about your consumer-lambda function?
Explore the Traces from your Lambda Functions
- Click the Traces button to view the Trace Analyzer.
On this page, we can see the traces that have been ingested from the OpenTelemetry Lambda layer of your producer-lambda function.
- Select a trace from the list to examine by clicking on its hyperlinked Trace ID.
We can see that the producer-lambda function is putting a record into the Kinesis Stream. But the action of the consumer-lambda function is missing!
This is because the trace context is not being propagated. At the time of this workshop, the Kinesis service does not support trace context propagation out of the box. Our distributed trace stops at the Kinesis service, and because its context isn't automatically propagated through the stream, we can't see any further.
Not yet, at least…
Let’s see how we work around this in the next section of this workshop. But before that, let’s clean up after ourselves!
Clean Up
The resources we deployed as part of this auto-instrumentation exercise need to be cleaned up. Likewise, the script that was generating traffic against our producer-lambda endpoint needs to be stopped, if it's still running. Follow the steps below to clean up.
Kill the send_message.py script
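If the script is still running, stop it using the process ID you noted from the nohup output (you can find it again with pgrep -f send_message.py). For example, replacing 79829 with your own process ID:
kill -9 79829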
Destroy all AWS resources
Terraform is great at managing the state of our resources individually, and as a deployment. It can even update deployed resources with any changes to their definitions. But to start afresh, we will destroy the resources and redeploy them as part of the manual instrumentation portion of this workshop.
Please follow these steps to destroy your resources:
Ensure you are in the auto directory:
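pwd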
- The expected output would be ~/o11y-lambda-workshop/auto
If you are not in the auto directory, run the following command:
cd ~/o11y-lambda-workshop/auto
Destroy the Lambda functions and other AWS resources you deployed earlier:
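terraform destroy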
- Respond yes when you see the Enter a value: prompt
- This will result in the resources being destroyed, leaving you with a clean environment
This process will leave you with the files and directories created as a result of our activity. Do not worry about those.
Manual Instrumentation
The second part of our workshop will focus on demonstrating how manual instrumentation with OpenTelemetry empowers us to enhance telemetry collection. More specifically, in our case, it will enable us to propagate trace context data from the producer-lambda function to the consumer-lambda function, thus enabling us to see the relationship between the two functions, even across the Kinesis Stream, which currently does not support automatic context propagation.
The Manual Instrumentation Workshop Directory & Contents
Once again, we will first start by taking a look at our operating directory, and some of its files. This time, it will be the o11y-lambda-workshop/manual directory. This is where all the content for the manual instrumentation portion of our workshop resides.
The manual directory
Workshop Question
Do you see any difference between this directory and the auto directory when you first started?
Compare auto and manual files
Let’s make sure that all these files that LOOK the same, are actually the same.
Compare the main.tf files in the auto and manual directories:
diff ~/o11y-lambda-workshop/auto/main.tf ~/o11y-lambda-workshop/manual/main.tf
- There is no difference! (Well, there shouldn’t be. Ask your workshop facilitator to assist you if there is)
Now, let's compare the producer.mjs files:
diff ~/o11y-lambda-workshop/auto/handler/producer.mjs ~/o11y-lambda-workshop/manual/handler/producer.mjs
- There’s quite a few differences here!
You may wish to view the entire file and examine its content:
cat ~/o11y-lambda-workshop/manual/handler/producer.mjs
- Notice how we are now importing some OpenTelemetry objects directly into our function to handle some of the manual instrumentation tasks we require.
import { context, propagation, trace, } from "@opentelemetry/api";
- We are importing the following objects from @opentelemetry/api to propagate our context in our producer function:
  - trace: to get the tracer we use to start our spans
  - context: to access the currently active context
  - propagation: to inject that context into a carrier object
Finally, compare the consumer.mjs files:
diff ~/o11y-lambda-workshop/auto/handler/consumer.mjs ~/o11y-lambda-workshop/manual/handler/consumer.mjs
Propagating the Trace Context from the Producer Function
The below code executes the following steps inside the producer function:
- Get the tracer for this trace
- Initialize a context carrier object
- Inject the context of the active span into the carrier object
- Modify the record we are about to put on our Kinesis stream to include the carrier that will carry the active span's context to the consumer
...
import { context, propagation, trace, } from "@opentelemetry/api";
...
const tracer = trace.getTracer('lambda-app');
...
return tracer.startActiveSpan('put-record', async(span) => {
let carrier = {};
propagation.inject(context.active(), carrier);
const eventBody = Buffer.from(event.body, 'base64').toString();
const data = "{\"tracecontext\": " + JSON.stringify(carrier) + ", \"record\": " + eventBody + "}";
console.log(
`Record with Trace Context added:
${data}`
);
try {
await kinesis.send(
new PutRecordCommand({
StreamName: streamName,
PartitionKey: "1234",
Data: data,
}),
message = `Message placed in the Event Stream: ${streamName}`
)
...
span.end();
The below code executes the following steps inside the consumer function:
- Extract the context that we obtained from producer-lambda into a carrier object.
- Extract the tracer from the current context.
- Start a new span with the tracer within the extracted context.
- Bonus: Add extra attributes to your span, including custom ones with the values from your message!
- Once completed, end the span.
import { propagation, trace, ROOT_CONTEXT } from "@opentelemetry/api";
...
const carrier = JSON.parse( message ).tracecontext;
const parentContext = propagation.extract(ROOT_CONTEXT, carrier);
const tracer = trace.getTracer(process.env.OTEL_SERVICE_NAME);
const span = tracer.startSpan("Kinesis.getRecord", undefined, parentContext);
span.setAttribute("span.kind", "server");
const body = JSON.parse( message ).record;
if (body.name) {
span.setAttribute("custom.tag.name", body.name);
}
if (body.superpower) {
span.setAttribute("custom.tag.superpower", body.superpower);
}
...
span.end();
Now let's see the difference this makes!
Deploying Lambda Functions & Generating Trace Data
Now that we know how to apply manual instrumentation to the functions and services we wish to capture trace data for, let's go about deploying our Lambda functions again, and generating traffic against our producer-lambda endpoint.
Seeing as we’re in a new directory, we will need to initialize Terraform here once again.
Ensure you are in the manual directory:
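pwd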
- The expected output would be ~/o11y-lambda-workshop/manual
If you are not in the manual directory, run the following command:
cd ~/o11y-lambda-workshop/manual
Run the following command to initialize Terraform in this directory
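terraform init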
Deploy the Lambda functions and other AWS resources
Let’s go ahead and deploy those resources again as well!
Run the terraform plan command, ensuring there are no issues.
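terraform plan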
Follow up with the terraform apply command to deploy the Lambda functions and other supporting resources from the main.tf file:
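terraform apply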
As you can tell, aside from the first portion of the base_url and the log group ARNs, the output should be largely the same as when you ran the auto-instrumentation portion of this workshop up to this same point.
Send some traffic to the producer-lambda endpoint (base_url)
Once more, we will send our name and superpower as a message to our endpoint. This will then be added to a record in our Kinesis Stream, along with our trace context.
Ensure you are in the manual directory:
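pwd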
- The expected output would be ~/o11y-lambda-workshop/manual
If you are not in the manual directory, run the following command:
cd ~/o11y-lambda-workshop/manual
Run the send_message.py script as a background process:
nohup ./send_message.py --name CHANGEME --superpower CHANGEME &
Next, check the contents of the response.logs file for successful calls to our producer-lambda endpoint:
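cat response.logs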
You should see the following output among the lines printed to your screen if your message is successful:
{"message": "Message placed in the Event Stream: hostname-eventStream"}
If unsuccessful, you will see:
{"message": "Internal server error"}
Important
If this occurs, ask one of the workshop facilitators for assistance.
View the Lambda Function Logs
Let’s see what our logs look like now.
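As before, check the producer.logs and consumer.logs files, for example:
cat producer.logs
cat consumer.logs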
Examine the logs carefully.
Workshop Question
Do you notice the difference?
Copy the Trace ID from the consumer-lambda logs
This time around, we can see that the consumer-lambda log group is logging our message as a record together with the tracecontext that we propagated.
To copy the Trace ID:
- Take a look at one of the Kinesis Message logs. Within it, there is a data dictionary
- Take a closer look at data to see the nested tracecontext dictionary
- Within the tracecontext dictionary, there is a traceparent key-value pair
- The traceparent key-value pair holds the Trace ID we seek
  - There are 4 groups of values, separated by -. The Trace ID is the 2nd group of characters
- Copy the Trace ID, and save it. We will need it for a later step in this workshop
Splunk APM, Lambda Functions and Traces, Again!
In order to see the result of our context propagation outside of the logs, we’ll once again consult the Splunk APM UI.
View your Lambda Functions in the Splunk APM Service Map
Let’s take a look at the Service Map for our environment in APM once again.
In Splunk Observability Cloud:
Click on the APM button in the Main Menu.
Select your APM Environment from the Environment: dropdown.
Click the Service Map button on the right side of the APM Overview page. This will take you to your Service Map view.
Note
Reminder: It may take a few minutes for your traces to appear in Splunk APM. Try hitting refresh on your browser until you find your environment name in the list of environments.
- You should be able to see the producer-lambda and consumer-lambda functions linked by the propagated context this time!
Explore a Lambda Trace by Trace ID
Next, we will take another look at a trace related to our Environment.
- Paste the Trace ID you copied from the consumer function's logs into the View Trace ID search box under Traces and click Go.
Note
The Trace ID was a part of the trace context that we propagated.
You can read up on two of the most common propagation standards:
- W3C
- B3
Workshop Question
Which one are we using?
- The Splunk Distribution of OpenTelemetry JS, which supports our NodeJS functions, defaults to the W3C standard
Workshop Question
Bonus Question: What happens if we mix and match the W3C and B3 headers?
Click on the consumer-lambda span.
Workshop Question
Can you find the attributes from your message?
Clean Up
We are finally at the end of our workshop. Kindly clean up after yourself!
Kill the send_message.py script
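As before, stop the send_message.py script if it is still running, replacing 79829 with your own process ID from the nohup output:
kill -9 79829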
Destroy all AWS resources
Terraform is great at managing the state of our resources individually, and as a deployment. It can even update deployed resources with any changes to their definitions. But as we are now done with the workshop, we will destroy the resources to leave ourselves with a clean environment.
Please follow these steps to destroy your resources:
Ensure you are in the manual directory:
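pwd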
- The expected output would be ~/o11y-lambda-workshop/manual
If you are not in the manual directory, run the following command:
cd ~/o11y-lambda-workshop/manual
Destroy the Lambda functions and other AWS resources you deployed earlier:
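terraform destroy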
- Respond yes when you see the Enter a value: prompt
- This will result in the resources being destroyed, leaving you with a clean environment
Conclusion
Congratulations on finishing the Lambda Tracing Workshop! You have seen how we can complement auto-instrumentation with manual steps to have the producer-lambda function's context be sent to the consumer-lambda function via a record in a Kinesis stream. This allowed us to build the expected Distributed Trace, and to contextualize the relationship between both functions in Splunk APM.
You can now build out a trace manually by linking two different functions together. This comes in handy when your auto-instrumentation, or 3rd-party systems, do not support context propagation out of the box, or when you wish to add custom attributes to a trace for more relevant trace analysis.