Auto Scaling services on Swarm — I

Jun 4, 2021

We have all used containers. But one of the basic requirements of any client today is optimal costs. And that means keeping your infrastructure as lean as you can, as much as you can.

The background:

It's the night before Black Friday. You're logging out of work with the peace of mind that your services are all configured with replica sets and threshold limits. Black Friday will be a walk in the park. And it is. What you don't realize is that you will need to scale your services back down once the weekend is over. And now you're on monitoring duty over the weekend. What if we could find a way to auto-scale the services according to customer demand? That would make your life a lot easier.

The problem:

Without auto-scaling, each service is set to a fixed replica count, which will be maintained by Docker Swarm. Once Black Friday (and Cyber Monday) is over, you have extra containers running for no reason. This results in extra costs even when the services are not in use. Similarly, manual intervention is needed when the client site usage is high and services need to be scaled up.

The solution:

Implement Docker Flow Monitor along with Prometheus and Jenkins.

Prometheus will constantly check each service against a predefined threshold. If the threshold is reached, it will send an alert to Alert Manager.

The Alert Manager will send API requests to Jenkins with parameters for scaling up/down.

Jenkins will trigger a new job which will scale the service according to the request it received from Alert Manager.
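To make the last step concrete, here is a minimal sketch of the arithmetic such a scaling job might perform: apply the requested change to the current replica count and keep the result within bounds. The function name next_replicas and the minimum/maximum limits are assumptions made for illustration; they are not taken from the actual Jenkins job.

```shell
# Hypothetical helper (not part of the real job): compute the replica count
# after applying a scale delta, clamped to an assumed [min, max] range.
# Arguments: current replicas, delta (e.g. 1 or -1), minimum, maximum.
next_replicas() {
  current=$1; delta=$2; min=$3; max=$4
  new=$((current + delta))
  # Never drop below the minimum or grow past the maximum.
  if [ "$new" -lt "$min" ]; then new=$min; fi
  if [ "$new" -gt "$max" ]; then new=$max; fi
  echo "$new"
}

# Possible usage on a Swarm manager (commented out so the sketch runs anywhere):
# docker service scale go-demo_main="$(next_replicas 3 1 2 5)"
```

Clamping is a design choice in this sketch: without a lower bound, a run of "scale down" alerts could take a service to zero replicas.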

Value addition:

The major benefit of this approach is a more efficient system in which services are automatically scaled up or down depending on resource demand, ultimately reducing overhead costs and downtime and eliminating the need for constant manual monitoring.

Let's get started.

Tools used:

● Docker

● Docker-Machine

● Docker Swarm

● Docker Flow Monitor

● Docker Flow Proxy

● Docker Flow Swarm Listener

● Prometheus

● Jenkins

● A demo service (written in Go)

Assumption:

A Swarm cluster is running with 3 nodes (named swarm-1, swarm-2 and swarm-3). If you do not have a cluster set up, please create a new one by following the instructions below.

Creating a Swarm cluster

Create a Swarm cluster consisting of three nodes created with Docker Machine.

git clone https://github.com/soumyadp/docker-flow.git
cd docker-flow
./scripts/dm-swarm.sh
eval $(docker-machine env swarm-1)

We cloned the repository and executed the dm-swarm.sh script, which created the cluster. Finally, we used the eval command to tell our local Docker client to use the remote Docker engine on swarm-1. To go back to using your local Docker host, simply run:

eval $(docker-machine env -u)

To check that all the nodes are up and running, list them from a manager node with docker node ls.
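If you would rather script the node check than read the listing by eye, something along these lines could work. It is only a sketch: count_ready_nodes is a made-up helper name, and it assumes the input is one node status per line, as produced by docker node ls with a --format template that prints only the Status column.

```shell
# Hypothetical helper: count how many lines on stdin are exactly "Ready".
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function from failing in that case.
count_ready_nodes() {
  grep -c '^Ready$' || true
}

# Possible usage against the cluster (commented out so the sketch runs anywhere):
# docker node ls --format '{{.Status}}' | count_ready_nodes
```

For the three-node cluster used here, you would expect the count to be 3 once every node has joined.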

Deploying Docker Flow Proxy (DFP) and Docker Flow Swarm Listener (DFSL)

Proxy is not strictly necessary. We’re using it only as a convenient way to get a single access point to the cluster instead of opening a different port for each publicly accessible service.

docker network create -d overlay proxy
docker stack deploy \
-c stacks/docker-flow-proxy-mem.yml \
Proxy

The stack deployed two services: proxy and swarm-listener.

From now on, the proxy will be notified whenever a service is deployed or updated, as long as it has the com.df.notify label set to true.

If you already have a Swarm cluster and do not need the proxy setup, start here:

Deploying Docker Flow Monitor and Alertmanager

The next stack defines Docker Flow Monitor and Alertmanager. Before we deploy the stack, we should create the monitor network that will allow Prometheus to scrape metrics from exporters and instrumented services.

docker network create -d overlay monitor

Next, we’ll create the Alert Manager configuration as a Docker secret. That way we won’t need to create a new image with configuration or mount a volume.

echo "route:
  group_by: [service,scale]
  repeat_interval: 5m
  group_interval: 5m
  receiver: 'slack'
  routes:
  - match:
      service: 'go-demo_main'
      scale: 'up'
    receiver: 'jenkins-go-demo_main-up'
  - match:
      service: 'go-demo_main'
      scale: 'down'
    receiver: 'jenkins-go-demo_main-down'
receivers:
  - name: 'slack'
    slack_configs:
      - send_resolved: true
        title: '[{{ .Status | toUpper }}] {{ .GroupLabels.service }} service is in danger!'
        title_link: 'http://$(docker-machine ip swarm-1)/monitor/alerts'
        text: '{{ .CommonAnnotations.summary}}'
        api_url: 'https://hooks.slack.com/services/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'
  - name: 'jenkins-go-demo_main-up'
    webhook_configs:
      - send_resolved: false
        url: 'http://$(docker-machine ip swarm-1)/jenkins/job/service-scale/buildWithParameters?token=DevOps22&service=go-demo_main&scale=1'
  - name: 'jenkins-go-demo_main-down'
    webhook_configs:
      - send_resolved: false
        url: 'http://$(docker-machine ip swarm-1)/jenkins/job/service-scale/buildWithParameters?token=DevOps22&service=go-demo_main&scale=-1'
" | docker secret create alert_manager_config -

The default receiver is slack. As a result, any alert that does not match one of the routes will be sent to Slack.

The routes section defines two match entries. If the alert label service is set to go-demo_main and the label scale is ‘up’, the receiver will be jenkins-go-demo_main-up. Similarly, when the same service is associated with an alert but the scale label is set to ‘down’, the receiver will be jenkins-go-demo_main-down.

There are three receivers. The slack receiver sends notifications to Slack. As stated before, it’s used only for alerts that do not match one of the routes. Both jenkins-go-demo_main-up and jenkins-go-demo_main-down send a POST request to the Jenkins job service-scale. The only difference between the two is the scale parameter. One sets it to 1, indicating that the go-demo_main service should be scaled up by one replica, and the other sets it to -1, indicating that the service should be scaled down by one.
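The routing and receiver logic described here can be mimicked in a few lines of shell, which may help when reasoning about where a given alert ends up. This is only an illustration of the rules, not how Alertmanager is implemented; pick_receiver and jenkins_scale_url are hypothetical helper names, and the host address in the usage comment is a placeholder.

```shell
# Illustrative only: mirror the two routes plus the default receiver.
# Arguments: value of the "service" label, value of the "scale" label.
pick_receiver() {
  case "$1/$2" in
    go-demo_main/up)   echo "jenkins-go-demo_main-up" ;;
    go-demo_main/down) echo "jenkins-go-demo_main-down" ;;
    *)                 echo "slack" ;;   # anything unmatched falls back to Slack
  esac
}

# Illustrative only: build the URL a Jenkins webhook receiver would call.
# Arguments: host, service name, scale delta (1 or -1), trigger token.
jenkins_scale_url() {
  host=$1; service=$2; scale=$3; token=$4
  printf 'http://%s/jenkins/job/service-scale/buildWithParameters?token=%s&service=%s&scale=%s\n' \
    "$host" "$token" "$service" "$scale"
}

# Possible manual trigger for testing (commented out; 192.0.2.10 is a placeholder):
# curl -X POST "$(jenkins_scale_url 192.0.2.10 go-demo_main 1 DevOps22)"
```

Note that an alert for go-demo_main without a scale label falls through to the slack receiver, since a route only matches when all of its labels match.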

Now we can deploy the monitor stack:

DOMAIN=$(docker-machine ip swarm-1) \
docker stack deploy \
-c stacks/docker-flow-monitor-slack.yml \
monitor

Confirm that the monitor stack is up and running, for example with docker stack ps monitor.

Wait for all the replicas to start before proceeding.
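Waiting for replicas can also be scripted. The sketch below assumes the input is one "running/desired" pair per line (for example 1/1 or 0/3), which is the shape of the Replicas column in docker service ls; all_replicas_ready is a hypothetical helper name.

```shell
# Hypothetical helper: read "running/desired" pairs from stdin and succeed
# only if every service has all of its desired replicas running.
all_replicas_ready() {
  while IFS=/ read -r running desired; do
    [ -n "$running" ] || continue   # skip blank lines
    desired=${desired%% *}          # drop any trailing annotation after the count
    [ "$running" = "$desired" ] || return 1
  done
  return 0
}

# Possible usage on a Swarm manager (commented out so the sketch runs anywhere):
# until docker service ls --format '{{.Replicas}}' | all_replicas_ready; do
#   sleep 5
# done
```

The same check applies to the other stacks deployed in this walkthrough.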

Deploying Exporters

Prometheus is a pull-based system. It scrapes exporters and stores metrics in its internal database. We’ll deploy the exporter stack defined in stacks/exporters-tutorial.yml. It contains two services: cadvisor and node-exporter.

docker stack deploy \
-c stacks/exporters-tutorial.yml \
Exporter

Wait for all the replicas to start before proceeding.

Deploying Jenkins

The Jenkins image we’ll run already has all the plugins baked in. The administrative user and password will be retrieved from Docker secrets. A job that will scale up and scale down services is also defined inside the image.

echo "admin" | \
docker secret create jenkins-user -

echo "admin" | \
docker secret create jenkins-pass -

export SLACK_IP=$(ping \
-c 1 soumyadp.slack.com \
| awk -F'[()]' '/PING/{print $2}')

docker stack deploy \
-c stacks/jenkins-scale.yml jenkins

Wait for all replicas to be up and running. Once done, go to this URL:

http://$(docker-machine ip swarm-1)/jenkins/job/service-scale/configure

Note: If you are doing this POC on an Ubuntu server with no GUI, you will need to forward a port on your server to a port on the swarm-1 VM. To do that, run:

VBoxManage modifyvm "swarm-1" --natpf1 "myproxy,tcp,,88,,80"

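Here, myproxy is simply the rule name, 88 is the port on your Ubuntu server, and 80 is the port on the swarm-1 VM. Note that modifyvm only applies to a powered-off VM; if swarm-1 is already running, VBoxManage controlvm "swarm-1" natpf1 "myproxy,tcp,,88,,80" adds the same rule.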

Enter ‘admin’ as both the username and the password to log in to Jenkins.

This concludes the first part of this extensive walkthrough. In this part, we set up all the different components and tools we will need to assemble our snappy new auto-scaling orchestrator.

In the second part, we will set up Jenkins, Alert Manager and Prometheus.

The second part is now live here