Build a Distributed Trace in Lambda and Kinesis

45 minutes   Author Katie Hymers

This workshop shows you how a distributed trace is constructed for a small serverless application that runs on AWS Lambda and produces and consumes a message via AWS Kinesis.

We will see how auto-instrumentation can be complemented with manual steps that force the Producer function's context to be sent to the Consumer function via a Record put on a Kinesis stream.

For this workshop, Splunk has prepared a pre-configured Ubuntu Linux instance in AWS/EC2 for you.

To get access to the instance that you will be using in the workshop, visit the URL provided by the workshop leader.

Last Modified Sep 19, 2024

Setup

This lab will make a tracing superhero out of you!

In this lab you will learn how a distributed trace is constructed for a small serverless application that runs on AWS Lambda, producing and consuming your message via AWS Kinesis.

[Figure: lab architecture]

Pre-requisites

You should already have the lab content available on your EC2 lab host.

Ensure that this lab's required folder, o11y-lambda-lab, is in your home directory:

cd ~ && ls
o11y-lambda-lab
Note

If you don’t see it, fetch the lab contents by running the following command:

git clone https://github.com/kdroukman/o11y-lambda-lab.git

Set Environment Variables

In your Splunk Observability Cloud Organisation (Org), obtain your Access Token and Realm values.

Reset your environment variables from the earlier lab. Note that this lab may use different names, so make sure your variables match the environment variable names below.

export ACCESS_TOKEN=CHANGE_ME
export REALM=CHANGE_ME
export PREFIX=$INSTANCE

Update Auto-instrumentation serverless template

Update your auto-instrumentation Serverless template to include the new values from the environment variables.

cat ~/o11y-lambda-lab/auto/serverless_unset.yml | envsubst > ~/o11y-lambda-lab/auto/serverless.yml
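envsubst replaces references to environment variables, such as $ACCESS_TOKEN or ${ACCESS_TOKEN}, with their current values. The sketch below approximates that behaviour in Node.js. It assumes the template uses placeholders like ${ACCESS_TOKEN}, ${REALM} and ${PREFIX}; the template fragment and the `substituteEnv` helper are hypothetical. Serverless Framework references such as ${self:custom.prefix} are not valid variable names, so they pass through unchanged. This is why they still appear in the rendered serverless.yml.

```javascript
// Hypothetical sketch approximating what envsubst does to serverless_unset.yml.
// Only ${NAME} and $NAME references with valid variable names are replaced;
// variables that are not set become empty strings, as with GNU envsubst.
function substituteEnv(template, env) {
  return template.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)/g,
    (match, braced, bare) => env[braced ?? bare] ?? ""
  );
}

// Assumed template fragment; the real file may differ.
const template = [
  "custom:",
  "  accessToken: ${ACCESS_TOKEN}",
  "  realm: ${REALM}",
  "  prefix: ${PREFIX}",
  "  environment: ${self:custom.prefix}-apm-lambda", // left untouched
].join("\n");

const rendered = substituteEnv(template, {
  ACCESS_TOKEN: "my-token",
  REALM: "us1",
  PREFIX: "abcd",
});
console.log(rendered);
```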

Examine the output of the updated serverless.yml contents (you may need to scroll up to the relevant section).

cat ~/o11y-lambda-lab/auto/serverless.yml
# USER SET VALUES =====================              
custom: 
  accessToken: <updated to your Access Token>
  realm: <updated to your Realm>
  prefix: <updated to your Hostname>
#====================================== 

Update Manual instrumentation template

Update your manual instrumentation Serverless template to include the new values from the environment variables.

cat ~/o11y-lambda-lab/manual/serverless_unset.yml | envsubst > ~/o11y-lambda-lab/manual/serverless.yml

Examine the output of the updated serverless.yml contents (you may need to scroll up to the relevant section).

cat ~/o11y-lambda-lab/manual/serverless.yml
# USER SET VALUES =====================              
custom: 
  accessToken: <updated to your Access Token>
  realm: <updated to your Realm>
  prefix: <updated to your Hostname>
#====================================== 

Set your AWS Credentials

You will be provided with AWS Access Key ID and AWS Secret Access Key values. Substitute these values in place of AWS_ACCESS_KEY_ID and AWS_ACCESS_KEY_SECRET in the command below:

sls config credentials --provider aws --key AWS_ACCESS_KEY_ID --secret AWS_ACCESS_KEY_SECRET

This command will create a file ~/.aws/credentials with your AWS Credentials populated.
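The credentials file is INI-style. It typically contains a [default] profile holding aws_access_key_id and aws_secret_access_key. The sketch below parses such a file so you can check that a profile was written. It prints only key names, never secret values. `parseCredentials` is a hypothetical helper and is not a full INI parser.

```javascript
// Minimal sketch: parse an AWS shared credentials file to confirm that
// `sls config credentials` wrote a profile. Not a complete INI parser.
const fs = require("fs");
const os = require("os");
const path = require("path");

function parseCredentials(text) {
  const profiles = {};
  let current = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#") || line.startsWith(";")) continue;
    const section = line.match(/^\[(.+)\]$/);
    if (section) {
      current = section[1].trim(); // e.g. "default"
      profiles[current] = {};
    } else if (current !== null && line.includes("=")) {
      const idx = line.indexOf("=");
      profiles[current][line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
    }
  }
  return profiles;
}

const credentialsPath = path.join(os.homedir(), ".aws", "credentials");
if (fs.existsSync(credentialsPath)) {
  const profiles = parseCredentials(fs.readFileSync(credentialsPath, "utf8"));
  for (const [name, keys] of Object.entries(profiles)) {
    // Print which keys are present, not their values
    console.log(`${name}: ${Object.keys(keys).join(", ")}`);
  }
} else {
  console.log(`No credentials file found at ${credentialsPath}`);
}
```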

Note that we are using sls here. This is the command-line interface of the Serverless Framework, a framework for developing and deploying AWS Lambda functions. We will be using this command throughout the lab.

Now you are set up and ready to go!

Auto-Instrumentation

Navigate to the auto directory that contains auto-instrumentation code.

cd ~/o11y-lambda-lab/auto

Inspect the contents of the files in this directory. Take a look at the serverless.yml template.

cat serverless.yml
Workshop Question
  • Can you identify which AWS entities are being created by this template?
  • Can you identify where OpenTelemetry instrumentation is being set up?
  • Can you determine which instrumentation information is being provided by the Environment Variables?

You should see the Splunk OpenTelemetry Lambda layer being added to each function.

layers:
      - arn:aws:lambda:us-east-1:254067382080:layer:splunk-apm:70

You can see the relevant layer ARNs (Amazon Resource Name) and latest versions for each AWS region here: https://github.com/signalfx/lambda-layer-versions/blob/main/splunk-apm/splunk-apm.md
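A layer version ARN encodes the partition, region, account ID, layer name and version. The layer's region must match the region you deploy to. The sketch below splits the ARN from this template into its parts. `parseLayerArn` is a hypothetical helper, and it assumes the standard layer-version ARN layout shown in its comment.

```javascript
// Sketch: split a Lambda layer version ARN into its parts.
// Assumed layout: arn:<partition>:lambda:<region>:<account-id>:layer:<layer-name>:<version>
function parseLayerArn(arn) {
  const parts = arn.split(":");
  if (parts.length !== 8 || parts[0] !== "arn" || parts[2] !== "lambda" || parts[5] !== "layer") {
    throw new Error(`Not a Lambda layer version ARN: ${arn}`);
  }
  const [, partition, , region, accountId, , name, version] = parts;
  return { partition, region, accountId, name, version: Number(version) };
}

const layer = parseLayerArn("arn:aws:lambda:us-east-1:254067382080:layer:splunk-apm:70");
console.log(layer);
```

If you deploy to a different region, pick the ARN listed for that region in the table linked above rather than editing the region field by hand.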

You should also see a section where the environment variables are set.

environment:
  AWS_LAMBDA_EXEC_WRAPPER: /opt/nodejs-otel-handler
  OTEL_RESOURCE_ATTRIBUTES: deployment.environment=${self:custom.prefix}-apm-lambda
  OTEL_SERVICE_NAME: consumer-lambda
  SPLUNK_ACCESS_TOKEN: ${self:custom.accessToken}
  SPLUNK_REALM: ${self:custom.realm}

We use these environment variables to configure and enrich our auto-instrumentation.

Here we provide only the minimum information: the location of the Node.js wrapper in the Splunk APM layer, the environment name, the service name, and our Splunk Org credentials. We are sending trace data directly to Splunk Observability Cloud. Alternatively, you could export traces to an OpenTelemetry Collector set up in Gateway mode.
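OTEL_RESOURCE_ATTRIBUTES is a comma-separated list of key=value pairs that become resource attributes on every span. The deployment.environment attribute is what Splunk APM shows as the environment name. The simplified sketch below shows how such a value is interpreted; it does not cover every edge case in the OpenTelemetry specification. `parseResourceAttributes` is a hypothetical helper, and the value "abcd-apm-lambda" assumes your prefix rendered as "abcd".

```javascript
// Simplified sketch: interpret OTEL_RESOURCE_ATTRIBUTES as key=value pairs.
// Values may be percent-encoded, so they are decoded here.
function parseResourceAttributes(value) {
  const attributes = {};
  for (const pair of (value || "").split(",")) {
    const idx = pair.indexOf("=");
    if (idx <= 0) continue; // skip empty or malformed entries
    attributes[pair.slice(0, idx).trim()] = decodeURIComponent(pair.slice(idx + 1).trim());
  }
  return attributes;
}

// Inside a function you would read process.env.OTEL_RESOURCE_ATTRIBUTES;
// here an example value is used instead.
const attrs = parseResourceAttributes("deployment.environment=abcd-apm-lambda");
console.log(attrs["deployment.environment"]);
```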

Take a look at the function code.

cat handler.js
Workshop Question
  • Can you identify the code for the producer function?
  • Can you identify the code for the consumer function?

Notice there is no mention of Splunk or OpenTelemetry in the code. We are adding the instrumentation using the Lambda layer and Environment Variables only.

Deploy your Lambdas

Run the following command to deploy your Lambda Functions:

sls deploy
Deploying hostname-lambda-lab to stage dev (us-east-1)
...
...
endpoint: POST - https://randomstring.execute-api.us-east-1.amazonaws.com/dev/producer
functions:
  producer: hostname-lambda-lab-dev-producer (1.6 kB)
  consumer: hostname-lambda-lab-dev-consumer (1.6 kB)

This command follows the instructions in your serverless.yml template to create your Lambda functions and your Kinesis stream. Note that it may take 1-2 minutes to execute.

Note

serverless.yml is a Serverless Framework configuration that the framework translates into an AWS CloudFormation template. CloudFormation is an infrastructure-as-code service from AWS. You can read more about it here - https://aws.amazon.com/cloudformation/

Check the details of your serverless functions:

sls info

Take note of your endpoint value:

[Figure: endpoint value in the sls info output]

Send some Traffic

Use the curl command to send a payload to your producer function. Note the command option -d is followed by your message payload.

Try changing the value of name to your name and telling the Lambda function about your superpower. Replace YOUR_ENDPOINT with the endpoint from your previous step.

curl -d '{ "name": "CHANGE_ME", "superpower": "CHANGE_ME" }' YOUR_ENDPOINT

For example:

curl -d '{ "name": "Kate", "superpower": "Distributed Tracing" }' https://xvq043lj45.execute-api.us-east-1.amazonaws.com/dev/producer

You should see the following output if your message is successful:

{"message":"Message placed in the Event Stream: hostname-eventSteam"}

If unsuccessful, you will see:

{"message": "Internal server error"}

If this occurs, ask one of the lab facilitators for assistance.

If you see a success message, generate more load: re-send that message 5+ times. You should keep seeing a success message after each send.
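If you prefer not to repeat the curl command by hand, the sketch below sends the same payload several times and records each response. It assumes Node.js 18+ for the built-in fetch. PRODUCER_ENDPOINT is a hypothetical environment variable holding the endpoint from sls info, and `sendMessages` is a hypothetical helper. The fetch implementation can be swapped out, which makes the helper testable without network access.

```javascript
// Minimal sketch: POST the same payload to the producer endpoint several times.
// Assumes Node.js 18+ (built-in fetch).
async function sendMessages(endpoint, payload, count, fetchImpl = globalThis.fetch) {
  const results = [];
  for (let i = 0; i < count; i++) {
    const response = await fetchImpl(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    });
    // Keep status and raw body so failures ("Internal server error") are visible
    results.push({ status: response.status, body: await response.text() });
  }
  return results;
}

const endpoint = process.env.PRODUCER_ENDPOINT; // hypothetical variable
if (endpoint) {
  sendMessages(endpoint, { name: "Kate", superpower: "Distributed Tracing" }, 5)
    .then((results) => results.forEach((r) => console.log(r.status, r.body)))
    .catch((err) => console.error(err));
}
```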

Check the Lambda log output:

Producer function logs:

sls logs -f producer

Consumer function logs:

sls logs -f consumer

Examine the logs carefully.

Workshop Question

Do you see OpenTelemetry being loaded? Look out for lines with splunk-extension-wrapper.

Lambdas in Splunk APM

Now it’s time to check how your Lambda traffic has been captured in Splunk APM.

Select APM from the Main Menu and then select your APM Environment. Your APM environment should be in the format $INSTANCE-apm-lambda, where $INSTANCE is the four-letter name of your lab host. (Check it by looking at your command prompt, or by running echo $INSTANCE.)

Note

It may take a few minutes for your traces to appear in Splunk APM. Try hitting refresh in your browser until you find your environment name in the list of environments.

[Figure: selecting your APM environment]

Select Explore to open the Service Map and see the dependencies between your Lambda functions.

[Figure: Explore the Service Map]

You should be able to see the producer-lambda and the call it makes to the Kinesis service.

Workshop Question

What about your consumer-lambda?

[Figure: service map showing producer-lambda calling Kinesis]

Click into Traces and examine some traces that contain producer function calls and some that contain consumer function calls.

[Figure: producer-lambda trace]

We can see the producer-lambda putting a Record on the Kinesis stream. But the consumer-lambda's activity is disconnected!

This is because the Trace Context is not being propagated.

At the time of writing, the Kinesis service does not propagate trace context automatically out of the box. Our Distributed Trace stops at the inferred Kinesis service, and we can't see the propagation any further.

Not yet…

Let’s see how we work around this in the next section of this lab.

Manual Instrumentation

Navigate to the manual directory that contains the manually instrumented code.

cd ~/o11y-lambda-lab/manual

Inspect the contents of the files in this directory. Take a look at the serverless.yml template.

cat serverless.yml
Workshop Question

Do you see any difference from the same file in your auto directory?

You can try to compare them with a diff command:

diff ~/o11y-lambda-lab/auto/serverless.yml ~/o11y-lambda-lab/manual/serverless.yml 
19c19
< #======================================    
---
> #======================================   

There is no difference! (Well, there shouldn’t be. Ask your lab facilitator to assist you if there is)

Now compare handler.js with the same file in the auto directory using the diff command:

diff ~/o11y-lambda-lab/auto/handler.js ~/o11y-lambda-lab/manual/handler.js 

Look at all these differences!

You may wish to view the entire file with cat handler.js command and examine its content.

Notice how we are now importing some OpenTelemetry libraries directly into our function to handle the manual instrumentation tasks we require.

const otelapi  = require('@opentelemetry/api');
const otelcore = require('@opentelemetry/core');

We are using https://www.npmjs.com/package/@opentelemetry/api to manipulate the tracing logic in our functions, and https://www.npmjs.com/package/@opentelemetry/core to access the Propagator objects that we use to propagate our context manually.

Inject Trace Context in Producer Function

The code below executes the following steps inside the Producer function:

  1. Get the current Active Span.
  2. Create a Propagator.
  3. Initialize a context carrier object.
  4. Inject the context of the active span into the carrier object.
  5. Modify the record we are about to put on our Kinesis stream to include the carrier that will carry the active span’s context to the consumer.
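
// 1. Get the span that auto-instrumentation created for this invocation
const activeSpan = otelapi.trace.getSpan(otelapi.context.active());
// 2. Create a W3C Trace Context propagator
const propagator = new otelcore.W3CTraceContextPropagator();
// 3. Initialize an empty carrier that will receive the trace context
const carrier = {};
// 4. Inject the active span's context into the carrier
propagator.inject(
  otelapi.trace.setSpanContext(otelapi.ROOT_CONTEXT, activeSpan.spanContext()),
  carrier,
  otelapi.defaultTextMapSetter
);
// 5. Wrap the original request body together with the trace context
const data = "{\"tracecontext\": " + JSON.stringify(carrier) + ", \"record\":" + event.body + "}";
console.log(`Record with Trace Context added:
  ${data}`);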
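To see what the carrier ends up holding, here is a dependency-free sketch of the W3C traceparent value that a W3C Trace Context propagator writes. The layout is version "00", a 32-hex-character trace ID, a 16-hex-character parent (span) ID and 2-hex-character trace flags. The propagator may also add a tracestate entry when one exists. In the Lambda, the IDs come from the active span; here they are random stand-ins. `makeTraceparent` and `buildRecord` are hypothetical helpers. `buildRecord` serializes the whole object with JSON.stringify instead of concatenating strings, but it produces the same record shape as the data string above.

```javascript
// Dependency-free sketch of what the producer adds to the Kinesis record.
const { randomBytes } = require("crypto");

// traceparent = version "-" trace-id "-" parent-id "-" trace-flags
function makeTraceparent(traceId, spanId, sampled) {
  return `00-${traceId}-${spanId}-${sampled ? "01" : "00"}`;
}

// Hypothetical helper mirroring the `data` string built in the producer
function buildRecord(carrier, eventBody) {
  return JSON.stringify({ tracecontext: carrier, record: JSON.parse(eventBody) });
}

const traceId = randomBytes(16).toString("hex"); // 32 hex characters
const spanId = randomBytes(8).toString("hex");   // 16 hex characters
const carrier = { traceparent: makeTraceparent(traceId, spanId, true) };
const data = buildRecord(carrier, '{ "name": "Kate", "superpower": "Distributed Tracing" }');
console.log(data);
```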

Extract Trace Context in Consumer Function

The code below executes the following steps inside the Consumer function:

  1. Read the trace context that the Producer added to the record into a carrier object.
  2. Create a Propagator.
  3. Extract the Producer's span context from the carrier object as the Consumer function's parent context.
  4. Start a new span with the parent context.
  5. Bonus: Add extra attributes to your span, including custom ones with the values from your message!
  6. Once completed, end the span.
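
// `message` is the decoded Kinesis record payload, for example:
// Buffer.from(record.kinesis.data, 'base64').toString('utf8')
const payload = JSON.parse(message);
// 1. Read the carrier that the Producer added to the record
const carrier = payload.tracecontext;
// 2. Create the same W3C Trace Context propagator used by the Producer
const propagator = new otelcore.W3CTraceContextPropagator();
// 3. Extract the Producer's span context as the parent context
const parentContext = propagator.extract(otelapi.ROOT_CONTEXT, carrier, otelapi.defaultTextMapGetter);
// 4. Start a new span as a child of the Producer's span
const tracer = otelapi.trace.getTracer(process.env.OTEL_SERVICE_NAME);
const span = tracer.startSpan("Kinesis.getRecord", undefined, parentContext);
span.setAttribute("span.kind", "server");
// 5. Bonus: add custom attributes from the message body
const body = payload.record;
if (body.name) {
  span.setAttribute("custom.tag.name", body.name);
}
if (body.superpower) {
  span.setAttribute("custom.tag.superpower", body.superpower);
}
// ... the function does some work ...
// 6. End the span once the work is complete
span.end();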
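The consumer side can also be sketched without any OpenTelemetry dependency. Kinesis delivers each record's data base64-encoded in event.Records[i].kinesis.data. After decoding, the traceparent fields name the Producer's trace ID and span ID, which is what the extracted parent context carries. `parseTraceparent` and `readRecords` are hypothetical helpers. The sample event is synthetic and much smaller than a real Kinesis event. The traceparent value is the example used in the W3C specification.

```javascript
// Dependency-free sketch of the consumer side: decode each Kinesis record and
// read the traceparent fields that the producer injected (simplified validation).
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(value) {
  const m = TRACEPARENT.exec(value || "");
  if (!m) return null;
  const [, version, traceId, parentSpanId, flags] = m;
  // All-zero trace or span IDs are invalid per the W3C spec
  if (/^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return null;
  return { version, traceId, parentSpanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

function readRecords(event) {
  return event.Records.map((r) => {
    const message = Buffer.from(r.kinesis.data, "base64").toString("utf8");
    const { tracecontext, record } = JSON.parse(message);
    return { parent: parseTraceparent(tracecontext && tracecontext.traceparent), record };
  });
}

// Synthetic event for illustration; real events carry more fields
const sampleMessage = JSON.stringify({
  tracecontext: { traceparent: "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" },
  record: { name: "Kate", superpower: "Distributed Tracing" },
});
const event = { Records: [{ kinesis: { data: Buffer.from(sampleMessage).toString("base64") } }] };
console.log(readRecords(event));
```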

Now let’s see the difference this makes.

Redeploy Lambdas

Re-deploy your Lambdas

While remaining in your manual directory, run the following commands to re-deploy your Lambda functions:

sls deploy -f producer
Deploying function producer to stage dev (us-east-1)

✔ Function code deployed (6s)
Configuration did not change. Configuration update skipped. (6s)
sls deploy -f consumer
Deploying function consumer to stage dev (us-east-1)

✔ Function code deployed (6s)
Configuration did not change. Configuration update skipped. (6s)

Note that this deployment updates only the code within each function. The configuration has not changed, so its update is skipped.

Check the details of your serverless functions:

sls info

Your endpoint value should remain the same:

[Figure: endpoint value after redeploying]

Send some Traffic again

Use the curl command to send a payload to your producer function. Note the command option -d is followed by your message payload.

Try changing the value of name to your name and telling the Lambda function about your superpower. Replace YOUR_ENDPOINT with the endpoint from your previous step.

curl -d '{ "name": "CHANGE_ME", "superpower": "CHANGE_ME" }' YOUR_ENDPOINT

For example:

curl -d '{ "name": "Kate", "superpower": "Distributed Tracing" }' https://xvq043lj45.execute-api.us-east-1.amazonaws.com/dev/producer

You should see the following output if your message is successful:

{"message":"Message placed in the Event Stream: hostname-eventSteam"}

If unsuccessful, you will see:

{"message": "Internal server error"}

If this occurs, ask one of the lab facilitators for assistance.

If you see a success message, generate more load: re-send that message 5+ times. You should keep seeing a success message after each send.

Check the Lambda log output:

sls logs -f producer
sls logs -f consumer

Examine the logs carefully.

Workshop Question

Do you notice the difference?

Note that we are logging our Record together with the Trace Context that we have added to it. Copy one of the underlined sub-sections of your traceparent value and save it for later.

[Figure: Lambda logs showing the record with its trace context]

Updated Lambdas in Splunk APM

Navigate back to APM in Splunk Observability Cloud.

Go back to your Service Dependency map.

Workshop Question

Notice the difference?

[Figure: updated service map]

You should be able to see the consumer-lambda now clearly connected to the producer-lambda.

Remember the value you copied from your producer logs? If you need to fetch one again, run the sls logs -f producer command on your EC2 lab host.

Take that value, and paste it into trace search:

[Figure: trace search]

Click on Go and you should be able to find the logged Trace:

[Figure: the logged trace]

Notice that the Trace ID is part of the trace context that we propagated.

You can read up on the two common propagation standards:

  1. W3C: https://www.w3.org/TR/trace-context/#traceparent-header
  2. B3: https://github.com/openzipkin/b3-propagation#overall-process
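To compare the two formats side by side, the sketch below writes the same span context as a W3C traceparent header, a B3 single header and B3 multiple headers. The trace ID and span ID values are identical in each case; only the header names and layout differ. `toW3C`, `toB3Single` and `toB3Multi` are hypothetical helpers. They cover only the fields shown and omit optional parts such as the B3 parent span ID and debug flag.

```javascript
// Sketch contrasting W3C and B3 encodings of the same span context.
// W3C:       traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}
// B3 single: b3: {TraceId}-{SpanId}-{SamplingState}
function toW3C({ traceId, spanId, sampled }) {
  return { traceparent: `00-${traceId}-${spanId}-${sampled ? "01" : "00"}` };
}

function toB3Single({ traceId, spanId, sampled }) {
  return { b3: `${traceId}-${spanId}-${sampled ? "1" : "0"}` };
}

function toB3Multi({ traceId, spanId, sampled }) {
  return { "X-B3-TraceId": traceId, "X-B3-SpanId": spanId, "X-B3-Sampled": sampled ? "1" : "0" };
}

const ctx = { traceId: "4bf92f3577b34da6a3ce929d0e0e4736", spanId: "00f067aa0ba902b7", sampled: true };
console.log(toW3C(ctx));
console.log(toB3Single(ctx));
console.log(toB3Multi(ctx));
```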
Workshop Question

Which one are we using?

It should be self-explanatory from the Propagator we are creating in the functions.

Workshop Question

Bonus Question: What happens if we mix and match the W3C and B3 headers?

Expand the consumer-lambda span.

Workshop Question

Can you find the attributes from your message?

[Figure: consumer-lambda span attributes]

Summary

Before you Go

Please clean up your lab using the following command:

sls remove

Conclusion

Congratulations on finishing the lab. You have seen how we complement auto-instrumentation with manual steps to force the Producer function's context to be sent to the Consumer function via a Record put on a Kinesis stream. This allowed us to build the expected Distributed Trace.

[Figure: final architecture with the connected trace]

You can now build out a Trace manually by linking two different functions together. This is very powerful when your auto-instrumentation, or third-party systems, do not support context propagation out of the box.