This workshop will equip you to build a distributed trace for a small serverless application that runs on AWS Lambda, producing and consuming a message via AWS Kinesis.
First, we will see how OpenTelemetry’s auto-instrumentation captures traces and exports them to your target of choice.
Then, we will see how we can enable context propagation with manual instrumentation.
For this workshop, Splunk has prepared an Ubuntu Linux instance in AWS/EC2, pre-configured for you. To get access to that instance, please visit the URL provided by the workshop leader.
Setup
Prerequisites
Observability Workshop Instance
The Observability Workshop uses the Splunk4Ninjas - Observability workshop template in Splunk Show,
which provides a pre-configured EC2 instance running Ubuntu.
Your workshop instructor will provide you with the credentials to your assigned workshop instance.
Your instance should have the following environment variables already set:
ACCESS_TOKEN
REALM
These are the Splunk Observability Cloud Access Token and Realm for your workshop.
They will be used by the OpenTelemetry Collector to forward your data to the correct Splunk Observability Cloud organization.
Note
Alternatively, you can deploy a local observability workshop instance using Multipass.
AWS Command Line Interface (awscli)
The AWS Command Line Interface, or awscli, is a command-line tool used to interact with AWS resources. In this workshop, it is used by certain scripts to interact with the resources you’ll deploy.
Your Splunk-issued workshop instance should already have the awscli installed.
Check if the aws command is installed on your instance with the following command:
which aws
The expected output would be /usr/local/bin/aws
If the aws command is not installed on your instance, run the following command:
sudo apt install awscli
Terraform
Terraform is an Infrastructure as Code (IaC) platform, used to deploy, manage and destroy resources by defining them in configuration files. Terraform employs HCL to define those resources, and supports multiple providers for various platforms and technologies.
We will be using Terraform at the command line in this workshop to deploy the following resources:
AWS API Gateway
Lambda Functions
Kinesis Stream
CloudWatch Log Groups
S3 Bucket
and other supporting resources
Your Splunk-issued workshop instance should already have terraform installed.
Check if the terraform command is installed on your instance:
which terraform
The expected output would be /usr/local/bin/terraform
If the terraform command is not installed on your instance, follow Terraform’s recommended installation commands listed below:
The Workshop Directory lambda is a repository that contains all the configuration files and scripts to complete both the auto-instrumentation and manual instrumentation of the example Lambda-based application we will be using today.
Confirm you have the workshop directory in your home directory:
cd ~/workshop && ls
The expected output would include lambda
AWS & Terraform Variables
AWS
Note to the workshop instructor: create a new user in the target AWS account called lambda-workshop-user.
Ensure it has full permissions to perform the required actions via Terraform. Create an access key for the lambda-workshop-user
user and share the Access Key ID and Secret Access Key with the workshop participants. Delete the user
when the workshop is complete.
The AWS CLI requires credentials to access and manage resources in your AWS account. Both Terraform and the Python scripts in this workshop require these credentials to perform their tasks.
Configure the awscli with the access key ID, secret access key and region for this workshop:
aws configure
This command should provide a prompt similar to the one below:
AWS Access Key ID [None]: XXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Default region name [None]: us-east-1
Default output format [None]:
If the awscli is not configured on your instance, run the following command and provide the values given to you by your instructor.
aws configure
Create an IAM Role (Workshop Instructor Only)
Note to the workshop instructor: This step only needs to be completed once, as the IAM role created
in this step will be shared by all workshop participants:
cd ~/workshop/lambda/iam_role
terraform init
terraform plan
terraform apply
Note to the workshop instructor: After the workshop is complete, clean up the role as follows:
cd ~/workshop/lambda/iam_role
terraform destroy
Terraform
Terraform supports the passing of variables to ensure sensitive or dynamic data is not hard-coded in your .tf configuration files, as well as to make those values reusable throughout your resource definitions.
In our workshop, Terraform requires variables for deploying the Lambda functions with the right values for the OpenTelemetry Lambda layer, for the ingest values for Splunk Observability Cloud, and to make your environment and resources unique and immediately recognizable.
Terraform variables are defined in the following manner:
Define the variables in your main.tf file or a variables.tf
Set the values for those variables in either of the following ways:
setting environment variables at the host level, with the same variable names as in their definition, and with TF_VAR_ as a prefix
setting the values for your variables in a terraform.tfvars file
passing the values as arguments when running terraform apply
We will be using a combination of variables.tf and terraform.tfvars files to set our variables in this workshop.
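As a minimal sketch of the first option above, the snippet below shows how variable names map to TF_VAR_-prefixed environment variables. The variable name prefix comes from this workshop; the helper tf_var_env and the value jdoe are hypothetical, used only for illustration.

```python
def tf_var_env(variables):
    """Map Terraform variable names to TF_VAR_-prefixed environment variable names."""
    return {f"TF_VAR_{name}": str(value) for name, value in variables.items()}


# Placeholder value only; use the values provided by your instructor.
example = {"prefix": "jdoe"}
env = tf_var_env(example)

# Print shell export lines that could be pasted into a terminal.
for key, value in env.items():
    print(f'export {key}="{value}"')
```

This workshop does not rely on this mechanism; the terraform.tfvars file keeps the values next to the configuration, which is easier to copy between the auto and manual directories.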
Using either vi or nano, open the terraform.tfvars file in either the auto or manual directory
vi ~/workshop/lambda/auto/terraform.tfvars
Set the variables with their values. Replace the CHANGEME placeholders with those provided by your instructor.
Ensure you change only the placeholders, leaving the quotes and brackets intact, where applicable.
For the otel_lambda_layer, use the value for us-east-1 found here
The prefix is a unique identifier you can choose for yourself, to make your resources distinct from other participants’ resources. We suggest using a short form of your name, for example.
Also, please use only lowercase letters for the prefix. Certain resources in AWS, such as S3, will throw an error if you use uppercase letters.
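If you want to double-check a prefix before using it, a small check like the one below may help. It is only a sketch of the lowercase-letters rule described above; the function name is_valid_prefix and the sample values are hypothetical, and it deliberately rejects digits and punctuation to stay on the safe side.

```python
import re


def is_valid_prefix(prefix):
    """Return True when the prefix is non-empty and made of lowercase ASCII letters only."""
    return re.fullmatch(r"[a-z]+", prefix) is not None


# Try a few candidate prefixes.
for candidate in ["jdoe", "JDoe", "j_doe"]:
    print(candidate, is_valid_prefix(candidate))
```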
Save your file and exit the editor.
Finally, copy the terraform.tfvars file you just edited to the other directory.
We do this as we will be using the same values for both the auto-instrumentation and manual instrumentation portions of the workshop.
File Permissions
While all other files are fine as they are, the send_message.py script in both the auto and manual directories will have to be executed as part of our workshop. As a result, it needs to have the appropriate permissions to run as expected. Follow these instructions to set them.
First, ensure you are in the lambda directory:
cd ~/workshop/lambda
Next, run the following command to set executable permissions on the send_message.py script:
Now that we’ve squared off the prerequisites, we can get started with the workshop!
Auto-Instrumentation
The first part of our workshop will demonstrate how auto-instrumentation with OpenTelemetry allows the OpenTelemetry Collector to auto-detect what language your function is written in, and start capturing traces for those functions.
The Auto-Instrumentation Workshop Directory & Contents
First, let us take a look at the workshop/lambda/auto directory, and some of its files. This is where all the content for the auto-instrumentation portion of our workshop resides.
The auto Directory
Run the following command to get into the workshop/lambda/auto directory:
cd ~/workshop/lambda/auto
Inspect the contents of this directory:
ls
The output should include the following files and directories:
By using these environment variables, we are configuring our auto-instrumentation in a few ways:
We are setting environment variables to inform the OpenTelemetry collector of which Splunk Observability Cloud organization we would like to have our data exported to.
We are also setting variables that help OpenTelemetry identify our function/service, as well as the environment/application it is a part of.
OTEL_SERVICE_NAME = "producer-lambda" # consumer-lambda in the case of the consumer function
OTEL_RESOURCE_ATTRIBUTES = "deployment.environment=${var.prefix}-lambda-shop"
We are setting an environment variable that lets OpenTelemetry know what wrappers it needs to apply to our function’s handler so as to capture trace data automatically, based on our code language.
These values are sourced from the environment variables we set in the Prerequisites section, as well as resources that will be deployed as a part of this Terraform configuration file.
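To make the shape of OTEL_RESOURCE_ATTRIBUTES concrete, here is a minimal sketch that splits a comma-separated list of key=value pairs into a dictionary. The helper parse_resource_attributes and the prefix jdoe are hypothetical; this is not how the Lambda layer itself reads the variable.

```python
def parse_resource_attributes(raw):
    """Split a comma-separated key=value string into a dictionary."""
    attributes = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue  # skip empty or malformed entries
        key, value = pair.split("=", 1)
        attributes[key.strip()] = value.strip()
    return attributes


prefix = "jdoe"  # hypothetical prefix, standing in for var.prefix
raw = f"deployment.environment={prefix}-lambda-shop"
attrs = parse_resource_attributes(raw)
print(attrs["deployment.environment"])
```

The printed value is the environment name you would later look for in Splunk APM.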
You should also see an argument for setting the Splunk OpenTelemetry Lambda layer on each function:
layers = var.otel_lambda_layer
The OpenTelemetry Lambda layer is a package that contains the libraries and dependencies necessary to collect, process and export telemetry data for Lambda functions at the moment of invocation.
While there is a general OTel Lambda layer that has all the libraries and dependencies for all OpenTelemetry-supported languages, there are also language-specific Lambda layers, to help make your function even more lightweight.
You can see the relevant Splunk OpenTelemetry Lambda layer ARNs (Amazon Resource Name) and latest versions for each AWS region HERE
The producer.mjs file
Next, let’s take a look at the producer-lambda function code:
Run the following command to view the contents of the producer.mjs file:
cat ~/workshop/lambda/auto/handler/producer.mjs
This NodeJS module contains the code for the producer function.
Essentially, this function receives a message, and puts that message as a record to the targeted Kinesis Stream
Deploying the Lambda Functions & Generating Trace Data
Now that we are familiar with the contents of our auto directory, we can deploy the resources for our workshop, and generate some trace data from our Lambda functions.
Initialize Terraform in the auto directory
In order to deploy the resources defined in the main.tf file, you first need to make sure that Terraform is initialized in the same folder as that file.
Ensure you are in the auto directory:
pwd
The expected output would be ~/workshop/lambda/auto
If you are not in the auto directory, run the following command:
cd ~/workshop/lambda/auto
Run the following command to initialize Terraform in this directory
terraform init
This command will create a number of elements in the same folder:
.terraform.lock.hcl file: to record the providers it will use to provide resources
.terraform directory: to store the provider configurations
In addition to the above files, when terraform is run using the apply subcommand, the terraform.tfstate file will be created to track the state of your deployed resources.
These enable Terraform to manage the creation, state and destruction of resources, as defined within the main.tf file of the auto directory
Deploy the Lambda functions and other AWS resources
Once we’ve initialized Terraform in this directory, we can go ahead and deploy our resources.
First, run the terraform plan command to ensure that Terraform will be able to create your resources without encountering any issues.
terraform plan
This will result in a plan to deploy resources and output some data, which you can review to ensure everything will work as intended.
Do note that a number of the values shown in the plan will be known post-creation, or are masked for security purposes.
Next, run the terraform apply command to deploy the Lambda functions and other supporting resources from the main.tf file:
terraform apply
Respond yes when you see the Enter a value: prompt
Terraform outputs are defined in the outputs.tf file.
These outputs will be used programmatically in other parts of our workshop, as well.
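One way a script could consume these outputs is by parsing the JSON printed by terraform output -json, where each output name maps to an object with a value field. The sketch below works on a sample string; the helper output_values and the URL are placeholders, and the workshop scripts may read the outputs differently.

```python
import json


def output_values(terraform_output_json):
    """Reduce `terraform output -json` text to a name -> value dictionary."""
    parsed = json.loads(terraform_output_json)
    return {name: details["value"] for name, details in parsed.items()}


# Sample text shaped like the output of `terraform output -json`.
sample = json.dumps({
    "base_url": {
        "sensitive": False,
        "type": "string",
        "value": "https://example.invalid/producer",  # placeholder URL
    },
})

values = output_values(sample)
print(values["base_url"])
```

In practice, the input text could come from running terraform output -json in the auto directory, for example through Python's subprocess.run with capture_output=True and text=True.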
Send some traffic to the producer-lambda URL (base_url)
To start getting some traces from our deployed Lambda functions, we would need to generate some traffic. We will send a message to our producer-lambda function’s endpoint, which should be put as a record into our Kinesis Stream, and then pulled from the Stream by the consumer-lambda function.
Ensure you are in the auto directory:
pwd
The expected output would be ~/workshop/lambda/auto
If you are not in the auto directory, run the following command
cd ~/workshop/lambda/auto
The send_message.py script is a Python script that will take input at the command line, add it to a JSON dictionary, and send it to your producer-lambda function’s endpoint repeatedly, as part of a while loop.
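The behavior described above can be sketched roughly as follows. This is not the actual send_message.py: the field names name and superpower, the function names, and the stand-in fake_send are assumptions, and the network call is replaced by a plain function so the example runs anywhere.

```python
import json
import time


def build_payload(name, superpower):
    """Put the command-line input into a JSON document."""
    return json.dumps({"name": name, "superpower": superpower})


def send_repeatedly(send, payload, count, delay_seconds=0.0):
    """Call send(payload) in a while loop and collect the responses."""
    responses = []
    sent = 0
    while sent < count:
        responses.append(send(payload))
        sent += 1
        time.sleep(delay_seconds)
    return responses


def fake_send(payload):
    """Stand-in for an HTTP POST to the producer-lambda endpoint."""
    return {"message": "received " + payload}


payload = build_payload("Ada", "debugging")
responses = send_repeatedly(fake_send, payload, count=3)
print(len(responses))
```

Passing the sender in as a function keeps the loop easy to try out without a deployed endpoint; a real sender would post the payload to the base_url output.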
Run the send_message.py script as a background process
You should see an output similar to the following if your message is successful
[1] 79829
user@host manual % appending output to nohup.out
The two most important bits of information here are:
The process ID on the first line (79829 in the case of my example), and
The appending output to nohup.out message
The nohup command ensures the script will not hang up when sent to the background. It also captures the curl output from our command in a nohup.out file in the same folder as the one you’re currently in.
The & tells our shell process to run this process in the background, thus freeing our shell to run other commands.
Next, check the contents of the response.logs file, to ensure your output confirms your requests to your producer-lambda endpoint are successful:
cat response.logs
You should see the following output among the lines printed to your screen if your message is successful:
{"message": "Message placed in the Event Stream: {prefix}-lambda_stream"}
If unsuccessful, you will see:
{"message": "Internal server error"}
Important
If this occurs, ask one of the workshop facilitators for assistance.
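If the file is long, counting the two kinds of responses can be quicker than reading it line by line. The sketch below assumes one JSON object per line, which may not match the exact layout of your response.logs; the helper summarize_responses and the sample lines are hypothetical.

```python
import json

SUCCESS_PREFIX = "Message placed in the Event Stream"
FAILURE_MESSAGE = "Internal server error"


def summarize_responses(lines):
    """Count successful and failed responses among JSON log lines."""
    summary = {"ok": 0, "failed": 0}
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(record, dict):
            continue
        message = str(record.get("message", ""))
        if message.startswith(SUCCESS_PREFIX):
            summary["ok"] += 1
        elif message == FAILURE_MESSAGE:
            summary["failed"] += 1
    return summary


sample_lines = [
    '{"message": "Message placed in the Event Stream: jdoe-lambda_stream"}',
    '{"message": "Internal server error"}',
    "",
]
summary = summarize_responses(sample_lines)
print(summary)
```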
View the Lambda Function Logs
Next, let’s take a look at the logs for our Lambda functions.
To view your producer-lambda logs, check the producer.logs file:
cat producer.logs
To view your consumer-lambda logs, check the consumer.logs file:
cat consumer.logs
Examine the logs carefully.
Workshop Question
Do you see OpenTelemetry being loaded? Look out for the lines with splunk-extension-wrapper
Consider running head -n 50 producer.logs or head -n 50 consumer.logs to see the splunk-extension-wrapper being loaded.
Splunk APM, Lambda Functions & Traces
The Lambda functions should be generating a sizeable amount of trace data, which we would need to take a look at. Through the combination of environment variables and the OpenTelemetry Lambda layer configured in the resource definition for our Lambda functions, we should now be ready to view our functions and traces in Splunk APM.
View your Environment name in the Splunk APM Overview
Let’s start by making sure that Splunk APM is aware of our Environment from the trace data it is receiving. This is the deployment.environment value we set as part of the OTEL_RESOURCE_ATTRIBUTES variable on our Lambda function definitions in main.tf. It was also one of the outputs from the terraform apply command we ran earlier.
In Splunk Observability Cloud:
Click on the APM Button from the Main Menu on the left. This will take you to the Splunk APM Overview.
Select your APM Environment from the Environment: dropdown.
Your APM environment should be in the PREFIX-lambda-shop format, where the PREFIX is the prefix value you set in the terraform.tfvars file in the Prerequisites section.
Note
It may take a few minutes for your traces to appear in Splunk APM. Try hitting refresh on your browser until you find your environment name in the list of environments.
View your Environment’s Service Map
Once you’ve selected your Environment name from the Environment drop down, you can take a look at the Service Map for your Lambda functions.
Click the Service Map Button on the right side of the APM Overview page. This will take you to your Service Map view.
You should be able to see the producer-lambda function and the call it is making to the Kinesis Stream to put your record.
Workshop Question
What about your consumer-lambda function?
Explore the Traces from your Lambda Functions
Click the Traces button to view the Trace Analyzer.
On this page, we can see the traces that have been ingested from the OpenTelemetry Lambda layer of your producer-lambda function.
Select a trace from the list to examine by clicking on its hyperlinked Trace ID.
We can see that the producer-lambda function is putting a record into the Kinesis Stream. But the action of the consumer-lambda function is missing!
This is because the trace context is not being propagated. Trace context propagation is not supported out-of-the-box by the Kinesis service at the time of this workshop. Our distributed trace stops at the Kinesis service, and because its context isn’t automatically propagated through the stream, we can’t see any further.
Not yet, at least…
Let’s see how we work around this in the next section of this workshop. But before that, let’s clean up after ourselves!
Clean Up
The resources we deployed as part of this auto-instrumentation exercise need to be cleaned up. Likewise, the script that was generating traffic against our producer-lambda endpoint needs to be stopped, if it’s still running. Follow the steps below to clean up.
Kill the send_message
If the send_message.py script is still running, stop it with the following commands:
fg
This brings your background process to the foreground.
Next you can hit [CONTROL-C] to kill the process.
Destroy all AWS resources
Terraform is great at managing the state of our resources individually, and as a deployment. It can even update deployed resources with any changes to their definitions. But to start afresh, we will destroy the resources and redeploy them as part of the manual instrumentation portion of this workshop.
Please follow these steps to destroy your resources:
Ensure you are in the auto directory:
pwd
The expected output would be ~/workshop/lambda/auto
If you are not in the auto directory, run the following command:
cd ~/workshop/lambda/auto
Destroy the Lambda functions and other AWS resources you deployed earlier:
terraform destroy
Respond yes when you see the Enter a value: prompt
This will result in the resources being destroyed, leaving you with a clean environment.
This process will leave you with the files and directories created as a result of our activity. Do not worry about those.
Manual Instrumentation
The second part of our workshop will focus on demonstrating how manual instrumentation with OpenTelemetry empowers us to enhance telemetry collection. More specifically, in our case, it will enable us to propagate trace context data from the producer-lambda function to the consumer-lambda function, thus enabling us to see the relationship between the two functions, even across the Kinesis Stream, which currently does not support automatic context propagation.
The Manual Instrumentation Workshop Directory & Contents
Once again, we will start by taking a look at our operating directory, and some of its files. This time, it will be the workshop/lambda/manual directory. This is where all the content for the manual instrumentation portion of our workshop resides.
The manual directory
Run the following command to get into the workshop/lambda/manual directory:
cd ~/workshop/lambda/manual
Inspect the contents of this directory with the ls command:
ls
The output should include the following files and directories:
Here also, there are a few differences of note. Let’s take a closer look
cat handler/consumer.mjs
In this file, we are importing the following @opentelemetry/api objects:
propagation
trace
ROOT_CONTEXT
We use these to extract the trace context that was propagated from the producer function.
We then add new span attributes, based on our name and superpower, to the extracted trace context.
Propagating the Trace Context from the Producer Function
The below code executes the following steps inside the producer function:
Get the tracer for this trace
Initialize a context carrier object
Inject the context of the active span into the carrier object
Modify the record we are about to put on our Kinesis stream to include the carrier that will carry the active span’s context to the consumer
...
import { context, propagation, trace, } from "@opentelemetry/api";
...
const tracer = trace.getTracer('lambda-app');
...
  return tracer.startActiveSpan('put-record', async(span) => {
    let carrier = {};
    propagation.inject(context.active(), carrier);
    const eventBody = Buffer.from(event.body, 'base64').toString();
    const data = "{\"tracecontext\": " + JSON.stringify(carrier) + ", \"record\": " + eventBody + "}";
    console.log(
      `Record with Trace Context added:
      ${data}`
    );

    try {
      await kinesis.send(
        new PutRecordCommand({
          StreamName: streamName,
          PartitionKey: "1234",
          Data: data,
        }),
        message = `Message placed in the Event Stream: ${streamName}`
      )
...
    span.end();
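To illustrate what ends up in the record, the sketch below builds a carrier holding a W3C traceparent value and wraps it together with the message, mirroring the tracecontext and record keys used by the producer. It does not use the OpenTelemetry API; the helper names, the random IDs, and the sample message are assumptions made for illustration only.

```python
import json
import secrets


def make_traceparent(trace_id, span_id, sampled=True):
    """Format a W3C traceparent value: version-traceid-spanid-flags."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def wrap_record(carrier, event_body):
    """Place the trace context next to the original message, as the producer does."""
    return json.dumps({"tracecontext": carrier, "record": json.loads(event_body)})


trace_id = secrets.token_hex(16)  # 32 hexadecimal characters
span_id = secrets.token_hex(8)    # 16 hexadecimal characters
carrier = {"traceparent": make_traceparent(trace_id, span_id)}

data = wrap_record(carrier, '{"name": "Ada", "superpower": "debugging"}')
print(data)
```

In the real function, propagation.inject fills the carrier from the active span, so the IDs are never generated by hand.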
Extracting Trace Context in the Consumer Function
The below code executes the following steps inside the consumer function:
Extract the context that we obtained from producer-lambda into a carrier object.
Extract the tracer from current context.
Start a new span with the tracer within the extracted context.
Bonus: Add extra attributes to your span, including custom ones with the values from your message!
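The data-handling side of these steps can be sketched as below: decode the base64 data of a Kinesis record, then separate the carrier from the message. The helper extract_carrier and the sample values are hypothetical, and the sketch stops short of creating a span; in the consumer function, the carrier would be handed to propagation.extract together with ROOT_CONTEXT.

```python
import base64
import json


def extract_carrier(kinesis_record):
    """Decode a Kinesis record and split it into (carrier, message)."""
    raw = base64.b64decode(kinesis_record["kinesis"]["data"]).decode("utf-8")
    envelope = json.loads(raw)
    return envelope["tracecontext"], envelope["record"]


# Build a sample record shaped like the ones Lambda receives from Kinesis.
envelope = {
    "tracecontext": {
        "traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
    },
    "record": {"name": "Ada", "superpower": "debugging"},
}
encoded = base64.b64encode(json.dumps(envelope).encode("utf-8")).decode("ascii")
event_record = {"kinesis": {"data": encoded}}

carrier, message = extract_carrier(event_record)

# Values that could be attached to the new span as custom attributes.
span_attributes = {"name": message["name"], "superpower": message["superpower"]}
print(carrier["traceparent"])
print(span_attributes)
```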
Deploying Lambda Functions & Generating Trace Data
Now that we know how to apply manual instrumentation to the functions and services we wish to capture trace data for, let’s go about deploying our Lambda functions again, and generating traffic against our producer-lambda endpoint.
Initialize Terraform in the manual directory
Seeing as we’re in a new directory, we will need to initialize Terraform here once again.
Ensure you are in the manual directory:
pwd
The expected output would be ~/workshop/lambda/manual
If you are not in the manual directory, run the following command:
cd ~/workshop/lambda/manual
Run the following command to initialize Terraform in this directory
terraform init
Deploy the Lambda functions and other AWS resources
Let’s go ahead and deploy those resources again as well!
Run the terraform plan command, ensuring there are no issues.
terraform plan
Follow up with the terraform apply command to deploy the Lambda functions and other supporting resources from the main.tf file:
terraform apply
Respond yes when you see the Enter a value: prompt
As you can tell, aside from the first portion of the base_url and the log group ARNs, the output should be largely the same as when you ran the auto-instrumentation portion of this workshop up to this same point.
Send some traffic to the producer-lambda endpoint (base_url)
Once more, we will send our name and superpower as a message to our endpoint. This will then be added to a record in our Kinesis Stream, along with our trace context.
Ensure you are in the manual directory:
pwd
The expected output would be ~/workshop/lambda/manual
If you are not in the manual directory, run the following command:
cd ~/workshop/lambda/manual
Run the send_message.py script as a background process:
Next, check the contents of the response.logs file for successful calls to our producer-lambda endpoint:
cat response.logs
You should see the following output among the lines printed to your screen if your message is successful:
{"message": "Message placed in the Event Stream: hostname-eventStream"}
If unsuccessful, you will see:
{"message": "Internal server error"}
Important
If this occurs, ask one of the workshop facilitators for assistance.
View the Lambda Function Logs
Let’s see what our logs look like now.
Check the producer.logs file:
cat producer.logs
And the consumer.logs file:
cat consumer.logs
Examine the logs carefully.
Workshop Question
Do you notice the difference?
Copy the Trace ID from the consumer.logs file
This time around, we can see that the consumer-lambda log group is logging our message as a record together with the tracecontext that we propagated.
To copy the Trace ID:
Take a look at one of the Kinesis Message logs. Within it, there is a data dictionary
Take a closer look at data to see the nested tracecontext dictionary
Within the tracecontext dictionary, there is a traceparent key-value pair
The traceparent key-value pair holds the Trace ID we seek
There are 4 groups of values, separated by -. The Trace ID is the 2nd group of characters
Copy the Trace ID, and save it. We will need it for a later step in this workshop
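If you prefer not to pick the value out by eye, the steps above can be sketched in a few lines. The helper parse_traceparent is hypothetical and the sample value is illustrative; paste your own traceparent string in its place.

```python
def parse_traceparent(traceparent):
    """Split a traceparent value into its four dash-separated groups."""
    parts = traceparent.split("-")
    if len(parts) != 4:
        raise ValueError("expected 4 groups separated by '-'")
    version, trace_id, parent_id, flags = parts
    return {
        "version": version,
        "trace_id": trace_id,    # the 2nd group is the Trace ID
        "parent_id": parent_id,
        "flags": flags,
    }


sample = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
fields = parse_traceparent(sample)
print(fields["trace_id"])
```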
Splunk APM, Lambda Functions and Traces, Again!
In order to see the result of our context propagation outside of the logs, we’ll once again consult the Splunk APM UI.
View your Lambda Functions in the Splunk APM Service Map
Let’s take a look at the Service Map for our environment in APM once again.
In Splunk Observability Cloud:
Click on the APM Button in the Main Menu.
Select your APM Environment from the Environment: dropdown.
Click the Service Map Button on the right side of the APM Overview page. This will take you to your Service Map view.
Note
Reminder: It may take a few minutes for your traces to appear in Splunk APM. Try hitting refresh on your browser until you find your environment name in the list of environments.
Workshop Question
Notice the difference?
You should be able to see the producer-lambda and consumer-lambda functions linked by the propagated context this time!
Explore a Lambda Trace by Trace ID
Next, we will take another look at a trace related to our Environment.
Paste the Trace ID you copied from the consumer function’s logs into the View Trace ID search box under Traces and click Go
Note
The Trace ID was a part of the trace context that we propagated.
You can read up on two of the most common propagation standards:
W3C
B3
The Splunk Distribution of OpenTelemetry JS, which supports our NodeJS functions, defaults to the W3C standard.
Workshop Question
Bonus Question: What happens if we mix and match the W3C and B3 headers?
Click on the consumer-lambda span.
Workshop Question
Can you find the attributes from your message?
Clean Up
We are finally at the end of our workshop. Kindly clean up after yourself!
Kill the send_message
If the send_message.py script is still running, stop it with the following commands:
fg
This brings your background process to the foreground.
Next you can hit [CONTROL-C] to kill the process.
Destroy all AWS resources
Terraform is great at managing the state of our resources individually, and as a deployment. It can even update deployed resources with any changes to their definitions. But since we have reached the end of the workshop, we will destroy the resources to leave a clean environment.
Please follow these steps to destroy your resources:
Ensure you are in the manual directory:
pwd
The expected output would be ~/workshop/lambda/manual
If you are not in the manual directory, run the following command:
cd ~/workshop/lambda/manual
Destroy the Lambda functions and other AWS resources you deployed earlier:
terraform destroy
Respond yes when you see the Enter a value: prompt
This will result in the resources being destroyed, leaving you with a clean environment.
Conclusion
Congratulations on finishing the Lambda Tracing Workshop! You have seen how we can complement auto-instrumentation with manual steps to have the producer-lambda function’s context be sent to the consumer-lambda function via a record in a Kinesis stream. This allowed us to build the expected Distributed Trace, and to contextualize the relationship between both functions in Splunk APM.
You can now build out a trace manually by linking two different functions together. This comes in handy when your auto-instrumentation, or 3rd-party systems, do not support context propagation out of the box, or when you wish to add custom attributes to a trace for more relevant trace analysis.