How to invoke Lambda parallelly with MSK trigger?
Why we are writing this blog when they already have proper documentation on each AWS’s service, because of the highlighted text below;
So, let’s start with the introduction & requirement and then we will proceed with the demonstration.
We are from the FinTech family and the FinTech world works on a scale. Back in July 2021, We at Hexaview were working on a new project to manage millions of reconciliation and rebalance transactions.
The ask was:
- Each recon/rebalance transaction should be immutable (a separate process that comes up, does the job, and gets killed).
- Few million reconciliations are expected in a few minutes.
- There could be traffic spikes (up to a few hundred requests per second).
So, if you see our requirements we needed the system which was capable of doing a million reconciliations in few minutes. Our organization is based on AWS cloud and in AWS we have multiple services like EC2, ECS, EKS, Lambda, Fargate, etc. The main problem was rebalancing transactions.
Due to high scale and time-critical recon and rebalance transactions (where traffic spikes are normal), a queue-based solution was preferred. Multiple POC’s were executed comparing the queuing system and the compute services.
We picked two Amazon services MSK and Lambda to create our system.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that makes it easy for you to build and run applications that use Apache Kafka to process streaming and event data.
AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes. The main task was to run multiple lambda invocations in parallel based on predefined project criteria/attributes and MSK triggers.
Proposed Architecture
How does Lambda work with MSK?
In 2020 lambda started supporting MSK triggers. In simple terms, whenever a message would come into a Kafka topic, a Lambda would be triggered to take care of that message from the MSK itself.
Lambda reads the messages from the MSK in batches and provides these to your function as an event payload. The maximum batch size is configurable. (The default batch size is 100 messages.) From the immutability perspective, we kept the batch size as 1 (1 lambda = 1 rebalance)
So, At this point everything seems perfect we have MSK for queuing and lambda which comes with a lot of features like you do not need to think about infra and auto-scaling.
The result from our first attempt:
We started producing messages on MSK topic and lambda was configured with MSK trigger but the problem was lambda function was not getting invoked parallelly.
So, we started reading AWS documents then this paragraph came into the picture “Every 15 minutes, Lambda evaluates the consumer offset lag of all the partitions in the topic. If the lag is too high, the partition is receiving messages faster than Lambda can process them. If necessary, Lambda adds or removes consumers from the topic.” From this point, we changed our approach and we were able to invoke lambda parallelly.
Note: In this blog, we are not discussing NETWORK, INFRA, DATA, or SECURITY.
Let’s start our demonstration “How to invoke lambda parallelly with MSK trigger”.
Test Bed
Lambda Configurations
- Runtime: Python 3.6
- Memory: 128 MB
Trigger Configuration
- Trigger type: MSK
- Batch size: 1
- Starting Position: LATEST
MSK Configuration
- Broker Type: kafka.t3.small
- Broker Per Zone: 1
- Number of Brokers: 2
- 1 Lambda Invocation = 1 Recon/Rebalance immutable process
There were some variables in the test;
- Partitions per topic: 10(case1), 20(case2), 100(case3)
- Time taken by each lambda (3sec)
Test Case(s)
We took constant message creation rate in all scenarios.
CASE 1
Two simple lines of code were used for lambda function to execute this scenario and graph is showing the result:
Code Block
“Import time
time.sleep(3)”
Calculation and value
Message Production Rate = 2 messages per second for 2000 seconds (Total of 4000 messages sent in 2000 seconds at a consistent rate)
Message Consumption Rate = lambda rate. Each lambda took exactly 3 seconds
Number of Partitions = 10
20 * 3 = 60 …It means only one lambda executing at any point in time.
Conclusion Summary
- For the first 15 minutes, only one Lambda runs at any point in time.
- After 15 minutes, parallel lambdas were invoked (with Max invocation <= number of partitions).
CASE 2
5 messages per second, messages sent duration: 2000 seconds, number of partitions: 20
Message Production Rate = 5 messages per second for 2000 seconds
Message Consumption Rate = lambda rate. Each lambda took exactly 3 seconds
CASE 3
5 messages per second, messages sent duration: 2000 seconds, number of partitions: 100
Message Production Rate = 5 messages per second for 2000 seconds
Message Consumption Rate = lambda rate. Each lambda took exactly 3 seconds
So, as we can see, the entire Lambda Execution process can be split into three phases.
- Tear Up Phase: First 15 minutes
- Max Throughput Phase
- Cool Down Phase: Last 15 minutes.
Number of Portfolios that can be processed in first 15 minutes = 900/ Average time taken by a Portfolio.
Maximum Number of Portfolios that can be processed per minute after 15 minutes = Number of Partitions 60 / Average time taken by a Portfolio (This can vary as per the input message flight rate).
Calm Down Time ~ 15 minutes
Conclusions
We ran the test in multiple cases and conclusions are as follows:
- During the first 15 minutes, only one lambda can run in parallel irrespective of the number of partitions in a topic.
- Hence, the number of lambdas that can be executed in first 15 minutes = 900 / AverageTimeTakenByLambdas(sec).
- Every 13-15 minutes, Lambda trigger revalidates the incoming message rate and increases the parallel lambdas invocations.
- After 15 minutes, the maximum number of lambdas that can be invoked = number of partitions.
For more exciting work that Hexaview does, please get in touch with us.
References
Partition: https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html
MSK: https://docs.aws.amazon.com/lambda/latest/dg/with-msk.html