Nathan Peck
Senior Developer Advocate for Generative AI at Amazon Web Services
Aug 31, 2017 3 min watch

Amazon ECS: Autoscaling

I just published a video on the official Amazon Web Services YouTube channel explaining how to autoscale a service in Amazon Elastic Container Service:

 

Script:

Hey all, I’m Nathan Peck, and I’m a developer advocate for EC2 Container Service.

In the last couple of segments we took an initial look at the core concepts of ECS, and then covered how to get network traffic to your containers using an Application Load Balancer, which is automatically managed by ECS.

Once you have your containers running on ECS and HTTP requests flowing into them, the next step is scaling. Most services will see variable request volume throughout the day and week. Fortunately, ECS has autoscaling features built in that are designed to help your ECS-managed services react to variable conditions.

In this panel you can see that ECS has been capturing statistics such as CPU and memory load for my service. These stats are all being piped into CloudWatch, and here you can see that I have created alarms based on those stats, and those alarms are tied to scaling actions.

So when CPU usage for this service is greater than 80%, ECS will automatically launch another task to distribute the load across, but if the CPU usage dips below 20%, it will start removing tasks from this service to free up capacity on the cluster for other purposes. Let’s see what this autoscaling looks like when the service is under load.
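The threshold rule described above can be sketched in a few lines of Python. This is only an illustrative model of the decision, not how ECS or CloudWatch implement it: the 80% and 20% thresholds come from the script, while the `ScalingPolicy` class, the `desired_task_count` function, the one-task step size, and the `min_tasks` / `max_tasks` bounds are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy object; the field names are not an AWS API."""
    scale_out_cpu: float = 80.0  # add a task above this CPU % (from the script)
    scale_in_cpu: float = 20.0   # remove a task below this CPU % (from the script)
    min_tasks: int = 1           # assumed lower bound
    max_tasks: int = 10          # assumed upper bound


def desired_task_count(current_tasks: int, avg_cpu_percent: float,
                       policy: ScalingPolicy) -> int:
    """Return the task count the service should move toward."""
    if avg_cpu_percent > policy.scale_out_cpu:
        # High CPU alarm: launch one more task, capped at the maximum.
        return min(current_tasks + 1, policy.max_tasks)
    if avg_cpu_percent < policy.scale_in_cpu:
        # Low CPU alarm: remove one task, but never go below the minimum.
        return max(current_tasks - 1, policy.min_tasks)
    # Between the two thresholds nothing changes.
    return current_tasks


if __name__ == "__main__":
    policy = ScalingPolicy()
    for cpu in (85.0, 50.0, 10.0):
        print(cpu, desired_task_count(3, cpu, policy))
```

The gap between the two thresholds is what keeps the service from flapping: a CPU reading of 50% leaves the task count alone rather than triggering either alarm.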

In this dashboard I created you can see an overview of all the stats for all the services in the cluster, in one place. You can see from this request count graph at the top that I’m currently running a load test that ramps up every 30 minutes to approximately 40k requests per minute.

But this total request volume is being distributed across a number of different services, so this service is doing roughly 9k requests per minute, at the same time that this one is only doing 4k requests per minute. And some of these services are doing more CPU-intensive jobs. For example, the auth service has to do 10-round bcrypt hashing of password plaintexts in order to validate login attempts. That is a very CPU-intensive task. Each service is going to have a different level of CPU demand that grows at a different rate as the load test progresses.

Each service has its own autoscaling policies that operate independently, and that’s what you can see from this graph of CPU usage per service. The graph looks somewhat sawtooth. Each time the CPU for a particular service crosses the threshold, it drops back down as ECS launches more containers to distribute the load across. If the request volume drops back down, it will cause the CPU usage metric to drop as well, and ECS will respond by scaling back the number of containers.

This autoscaling integrates seamlessly with the Application Load Balancer to avoid dropped requests. When ECS is scaling your service down, it transitions one or more containers into a DRAINING state so that the load balancer stops sending new requests to that container. Once the container has finished handling all in-flight connections, ECS stops the container.

In summary, ECS has autoscaling abilities that give you another level of autoscaling beyond just the number of instances. ECS autoscaling is ideal for use with reserved instances. To get the most value out of your reserved instances you want to run them at all times, but you also want to get the best utilization out of them. By scaling services up and down, ECS can ensure that your reserved instances are being well utilized throughout the day by different workloads, perhaps web traffic during the day and background batch processing at night.
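To make the scale-down sequence described above more concrete, here is a toy model of draining, written as a sketch under stated assumptions. It is not the ECS or Application Load Balancer implementation: the `Task` class, the `route_request`, `begin_drain`, and `finish_request` helpers, and the least-in-flight routing rule are all hypothetical. Only the DRAINING state name and the ordering (stop new traffic first, stop the container once in-flight work is done) come from the script.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for one container behind the load balancer."""
    name: str
    inflight: int = 0
    state: str = "RUNNING"  # RUNNING -> DRAINING -> STOPPED


def route_request(tasks: list) -> Task:
    """Send a new request to a RUNNING task; DRAINING tasks get no new traffic."""
    candidates = [t for t in tasks if t.state == "RUNNING"]
    if not candidates:
        raise RuntimeError("no RUNNING task available")
    # Assumed routing rule: pick the task with the fewest in-flight requests.
    target = min(candidates, key=lambda t: t.inflight)
    target.inflight += 1
    return target


def begin_drain(task: Task) -> None:
    """Mark a task for removal; it stops immediately if it is already idle."""
    task.state = "STOPPED" if task.inflight == 0 else "DRAINING"


def finish_request(task: Task) -> None:
    """Complete one request; a drained task stops once its last request ends."""
    task.inflight -= 1
    if task.state == "DRAINING" and task.inflight == 0:
        task.state = "STOPPED"


if __name__ == "__main__":
    a, b = Task("a", inflight=2), Task("b")
    begin_drain(a)                  # scale-in picks task "a"
    print(route_request([a, b]).name)
    finish_request(a)
    finish_request(a)
    print(a.state)
```

The point of the ordering is that no request is ever cut off: the task being removed keeps serving what it already accepted, and only new requests are steered elsewhere.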

And best of all, ECS automates everything so that it’s hands-off, with no developer intervention required.