Concurrency Compared: AWS Lambda, AWS App Runner, and AWS Fargate
Concurrency is one of the core principles of modern computing. When concurrency is combined with the cloud, it becomes even more powerful. In this article you’ll learn how concurrency works across three of the compute options available on Amazon Web Services.
What is concurrency?
Concurrency is when a compute system is working on multiple things at once. For example, if your computer is running multiple programs at once, then it has multiple concurrent processes sharing CPU time. If a single application process is handling multiple web requests at once, or working on multiple jobs off a queue in parallel, then the application itself is doing concurrent work.
When I first started coding about 15 years ago, it was common to implement concurrency at the operating system level instead of within the application. Many web applications were written in PHP. They used a process pool such as FastCGI Process Manager. A web request that was sent to the server would be handed off to its own PHP process from the pool. That process would work on that single request. If multiple requests came in at the same time, then multiple PHP processes would be launched in parallel. Still, each process would only work on a single request at a time. The server was capable of handling concurrent requests by context switching between the PHP processes. The operating system scheduler would keep track of all the PHP processes and switch out which one was actively running on the CPU, giving each process its fair share of CPU time when needed.
Today there are a lot more tools for concurrency. This includes modern programming languages that have powerful asynchronous concurrency mechanics built in, and compute services that help simplify concurrency. Let’s look at a few of the cloud compute services on AWS and how concurrency applies to them.
AWS Lambda: Automatic concurrency via many small instances of your code
To understand how AWS Lambda works we can refer to the documentation:
When a function is first invoked, the Lambda service creates an instance of the function and runs the handler method to process the event. After completion, the function remains available for a period of time to process subsequent events. If other events arrive while the function is busy, Lambda creates more instances of the function to handle these requests concurrently.
Source: Scaling and Concurrency in Lambda
From this explanation we can see that each instance of your Lambda function is working on a single event at a time. While working on an event the function is considered busy so any concurrent events that arrive must go to another instance of the function. Each time a new instance of the function must be created there is a short “cold start” delay. The duration of this cold start depends on the size of your code and the runtime that you are using. In many cases it will not be noticeable.
The following diagram shows how this works, and how Lambda scales out the number of function instances on the fly when there are multiple concurrent requests coming in, requiring parallel processing:
This makes AWS Lambda’s concurrency model similar in some ways to those old school PHP process managers. In both cases concurrency is achieved by launching more processes in parallel. A single process is only working on a single event or request at a time. However, there are some modern twists to AWS Lambda as well. Again from the documentation:
Execution environments are isolated from one another using several container-like technologies built into the Linux kernel, along with AWS proprietary isolation technologies.
Source: Lambda Isolation Technologies
One of the classic problems of using process level concurrency was the challenge of isolating processes from each other. You had to count on the operating system to do the right thing when it came to distributing CPU time and system resources to processes. One process could be consuming too many resources and impacting the performance of other processes running on the same machine. When you use AWS Lambda it takes care of this problem for you.
Last but not least, one of the classic problems of PHP applications was that you had to manage the number of PHP processes per server, and you had to scale out the number of servers for running those processes. The huge benefit of AWS Lambda is that none of this has to be managed by you either.
With AWS Lambda you get isolated execution environments automatically. AWS Lambda manages the number of execution environments and the capacity that is used to power them. All you have to do is supply your code and send events to AWS Lambda to invoke that code.
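To make the model concrete, here is a minimal, hypothetical Python handler. Anything defined at module scope runs once per execution environment during the cold start, and because each instance processes only one event at a time, simple module-level state is never touched concurrently within a single instance:

```python
# handler.py -- a hypothetical Lambda handler, for illustration only.
import json
import time

# Module scope runs once per execution environment (the cold start).
# Warm invocations on the same instance reuse these objects.
INSTANCE_STARTED_AT = time.time()
invocations_on_this_instance = 0

def handler(event, context):
    # Only one event is processed at a time per instance, so this counter
    # is never incremented concurrently within a single instance.
    global invocations_on_this_instance
    invocations_on_this_instance += 1
    return {
        "statusCode": 200,
        "body": json.dumps({
            "instanceAgeSeconds": round(time.time() - INSTANCE_STARTED_AT, 1),
            "invocationsOnThisInstance": invocations_on_this_instance,
        }),
    }
```

Under a burst of concurrent requests you would see many instances, each reporting a small invocation count, rather than one instance reporting a large one.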
AWS App Runner: Set and enforce concurrency limits for your application process
AWS App Runner is a serverless engine for containerized applications, which autoscales as you receive requests. You supply a containerized application that listens on a port, and App Runner gives you back a URL. You can send traffic to that URL and get a response from your application. App Runner manages the number of containers that are powering the application, and the amount of concurrent traffic each container receives.
From the documentation:
AWS App Runner automatically scales compute resources (instances) up or down for your App Runner application. Automatic scaling provides adequate request handling when incoming traffic is high, and reduces your cost when traffic slows down.
The primary dimension used for scaling in AWS App Runner is concurrency. You specify how many concurrent requests you want a particular instance of your application container to receive. Then AWS App Runner automatically distributes incoming requests to instances of your application container while staying at (or below) that target concurrency.
AWS App Runner observes how many concurrent requests each container instance is processing. Incoming requests go into a queue, which can absorb large bursts of requests and then distribute them to container instances at the intended concurrency rate. This ensures that a container instance won’t get overloaded.
If the queue gets too full, the request is retried against another instance of the container. If the burst of traffic is extremely large, and that container’s queue is also full, then AWS App Runner will prioritize the performance and availability of existing container instances. It will return a 429 Too Many Requests status code response to some of the requests until enough application instances are launched to serve all of the traffic. This allows you to keep application latency low and predictable, while shedding excess traffic bursts.
In AWS App Runner the individual container instances are isolated from each other. However, each container instance can process many concurrent requests, up to the limit that you specify.
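As a rough sketch of how that per-instance concurrency target can be set, the snippet below uses boto3’s App Runner client to create an auto scaling configuration and attach it to an existing service. The configuration name, service ARN, and numbers are placeholders:

```python
# A minimal sketch using boto3; assumes AWS credentials are configured.
import boto3

apprunner = boto3.client("apprunner")

# Target 100 concurrent requests per container instance, with bounds
# on how few or how many instances App Runner may run.
asc = apprunner.create_auto_scaling_configuration(
    AutoScalingConfigurationName="my-asc",  # placeholder name
    MaxConcurrency=100,  # concurrent requests per instance before scaling out
    MinSize=1,           # instances kept provisioned even at low traffic
    MaxSize=25,          # upper bound on scale out
)

# Attach the configuration to an existing App Runner service (placeholder ARN).
apprunner.update_service(
    ServiceArn="arn:aws:apprunner:us-east-1:123456789012:service/my-service/1234abcd",
    AutoScalingConfigurationArn=asc["AutoScalingConfiguration"][
        "AutoScalingConfigurationArn"
    ],
)
```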
AWS Fargate: You manage concurrency and scaling
AWS Fargate is serverless compute for running containers, but it leaves the concurrency and scaling up to you. In order to run your web application on AWS Fargate you use Elastic Container Service (ECS) to define a service that uses AWS Fargate as capacity. The ECS control plane helps you capture metrics such as CPU consumption, and it helps automatically keep a load balancer like Application Load Balancer in sync as your containers are started and stopped on AWS Fargate. However, it is up to you to use the metrics to define your own scaling rules. You can create scaling rules based on metrics that ECS captures, such as application CPU or memory consumption. Or you can create scaling rules based on metrics from the load balancer, such as concurrent requests or request latency. You can even create custom scaling metrics powered by your application itself. This gives you maximum control over the scaling and concurrency of your application.
AWS Fargate has strong isolation between the containers. However, AWS Fargate is similar to AWS App Runner in that each container can serve many concurrent requests. The difference is that there is no built-in concurrency limit as there is in AWS App Runner.
This means that if your load balancer receives a large spike of traffic then the requests will be distributed across all the available containers. When using an AWS Application Load Balancer there are two available routing algorithms:
- Round robin - Equally distributes requests across the available containers
- Least outstanding requests - Attempts to balance the concurrency by ensuring that each container is serving a similar number of concurrent requests.
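The routing algorithm is a target group attribute, so switching to least outstanding requests is a one-call change. A minimal boto3 sketch, with a placeholder target group ARN:

```python
# A minimal sketch using boto3; the target group ARN is a placeholder.
import boto3

elbv2 = boto3.client("elbv2")

# Switch from the default round robin algorithm to least outstanding
# requests, so containers with fewer in-flight requests get new traffic.
elbv2.modify_target_group_attributes(
    TargetGroupArn=(
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
        "targetgroup/my-service/0123456789abcdef"
    ),
    Attributes=[
        {"Key": "load_balancing.algorithm.type", "Value": "least_outstanding_requests"},
    ],
)
```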
In both cases, if there is a large spike of traffic, then containers running in AWS Fargate will receive a share of the traffic spike, even if that share is bigger than they can handle. Most application frameworks can handle such traffic spikes in the short term by using a bit more memory. Clients will see an increase in server-side latency, though, as the application runs out of CPU time and requests start queueing up in memory.
To recover from the situation you can use autoscaling that monitors the CPU metric for the containers. When containers exceed a threshold for CPU consumption you can increase the number of containers so that the requests are distributed across a greater number of containers.
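For example, a target tracking policy that tries to keep average CPU around 70% could look like the following boto3 sketch; the cluster name, service name, and thresholds are placeholders to adjust for your workload:

```python
# A minimal sketch using boto3's Application Auto Scaling client.
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the ECS service's desired count as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",  # placeholder cluster/service
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=20,
)

# Add or remove containers to keep average CPU utilization near 70%.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 120,
    },
)
```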
Unlike with AWS App Runner, there is no built-in load shedding in the form of returning 429 status codes. This has both benefits and drawbacks, depending on your application. One of the challenges of managing your own scaling is that a significant traffic spike may overload your application to the point that your load balancer health checks begin to fail as well. Failing health checks cause application containers to be terminated and restarted, which further exacerbates the problems caused by the original traffic spike.
Depending on your traffic patterns and the value of your traffic you may prefer the AWS Fargate model of making a best effort to serve traffic, even if the response time is longer than expected. Or you may prefer the AWS App Runner method of shedding excessive load under high concurrency spikes to keep response times low and predictable. Or you could choose to implement a hybrid of the two approaches by implementing load shedding within your own application while it is running on AWS Fargate.
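As a sketch of that hybrid approach, here is a small, hypothetical WSGI middleware that counts in-flight requests and starts returning 429 once a configurable threshold is crossed. The limit of 100 is an assumption you would tune to your container size:

```python
# A hypothetical load shedding middleware for any WSGI application.
import threading
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIServer, make_server

class LoadShedder:
    """Wraps a WSGI app and rejects requests once too many are in flight."""

    def __init__(self, app, max_in_flight=100):  # 100 is an assumed limit
        self.app = app
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.lock = threading.Lock()

    def __call__(self, environ, start_response):
        with self.lock:
            if self.in_flight >= self.max_in_flight:
                # Shed load: respond immediately instead of queueing in memory.
                start_response("429 Too Many Requests", [("Retry-After", "1")])
                return [b"overloaded, please retry\n"]
            self.in_flight += 1
        try:
            return self.app(environ, start_response)
        finally:
            with self.lock:
                self.in_flight -= 1

def app(environ, start_response):
    # Placeholder application; your real request handler goes here.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]

class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    daemon_threads = True  # handle each request on its own thread

if __name__ == "__main__":
    make_server("", 8080, LoadShedder(app),
                server_class=ThreadingWSGIServer).serve_forever()
```

In practice you would apply the same idea using whatever framework or middleware layer your application already has.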
Comparing concurrent compute options on AWS
With these three compute options in mind, let’s compare how they handle concurrency:
Attribute | AWS Lambda | AWS App Runner | AWS Fargate |
---|---|---|---|
Concurrency | Single concurrent request per Lambda function instance, but many separate Lambda function instances | Multiple concurrent requests per container, enforces a configurable hard limit such as 100 concurrent reqs/container | Multiple concurrent requests per container, no built-in limits on concurrency per container |
Scaling | Fully managed by AWS Lambda, default limit of 1000 concurrent executions. Scale out more function instances in under a second. | Fully managed by App Runner. Configure a concurrency limit per containerized process. Scale out more container instances in less than 1 min. | Managed by you. Scale out more container instances based on your desired metric: CPU, concurrency, or a custom metric. Scale out in less than 1 min. |
Pricing | Pay per ms of time for each individual execution. No charge when there are no executions. | Pay per second, per App Runner container, based on CPU and memory size. Container price per second is the same when serving one request or many concurrent requests. There is a discounted price when there are no requests and the CPU is not active. | Pay per second per Fargate task, based on CPU and memory. Static price for the container whether it is serving requests or not. |
Traffic Ingress | Launch your own API Gateway or load balancer and pay for it separately | Fully managed, load balancer ingress included in AWS App Runner cost | Launch your own API Gateway, load balancer, or App Mesh service mesh and pay for it separately. |
Code Format | ZIP file or container image | Container Image, or Git repo | Container Image |
So which one should I use for my application?
Ultimately this decision is up to you, but here are a few scenarios and the compute option that I’d personally choose for that scenario:
Scenario | My Choice | Why? |
---|---|---|
Periodic background job such as rebuilding the HTML for the homepage of my site with new info | AWS Lambda | The compute only runs for a couple seconds once per minute. It is also asynchronous so I don’t care about cold start delays. |
Application that receives large spikes of traffic at unexpected times, frequently going from low traffic to high traffic | AWS Lambda | The function instances on AWS Lambda will scale out to handle spikes, and scale in when spikes are done. Cold starts will likely be far less impactful than high latency from sudden traffic spikes. |
Application where each request must be isolated from each other, such as rendering a screenshot of an HTML page | AWS Lambda | The per request isolation of AWS Lambda is perfect for keeping these potentially malicious jobs isolated from each other. |
Business application that receives high traffic during the day, but no traffic at night. | AWS App Runner | During the day App Runner will scale out, and I’ll benefit from the simplified pricing model which includes the ingress, but I get automatic savings during the night. |
Development environment that serves low traffic from developers who are testing code during business hours | AWS App Runner | The application will cost less after hours when the devs stop working. Again, the built-in ingress makes AWS App Runner super simple for the devs as well.
Application that receives consistent, extremely high traffic with a predictable pattern. | AWS Fargate | I can predict the amount of containers that will be needed, and have each of them serving many concurrent requests at any given time, for an overall savings on compute cost. |
Busy production application where there are “hot” request paths being fetched by many concurrent clients | AWS Fargate | Because containers in AWS Fargate can be sized such that they are serving many hundreds of concurrent requests, you can take maximum advantage of in-memory caching, downstream fetch debouncing, and other optimizations.
These scenarios are in no way intended to be prescriptive. As always, there are many tradeoffs to consider. But hopefully these descriptions of concurrency in AWS Lambda, AWS App Runner, and AWS Fargate can help you to make an informed decision about which compute option will work best for your application.
As always if you have questions or comments please message @nathankpeck on LinkedIn if you’d like to chat!
If you are interested in learning more about the code patterns that you can utilize to take advantage of concurrency on AWS App Runner and AWS Fargate then please read: “Concurrency Deep Dive: Code Strategies for High Traffic Applications”