Auto Scaling services on Swarm — II

DevBizOps
Jun 30, 2021

Explanation of the services and triggers

The first half of the job is relatively straightforward. The job should be executed inside a prod agent (short for production). It defines two parameters. One holds the name of the service that should be scaled. The other expects the number of replicas that should be added or removed. If the value is positive, the service will be up-scaled. A negative value means that it should be de-scaled.

The job defines only one stage, called Scale. Inside it is a single step defined inside a script block. It executes the docker service inspect command and retrieves the current number of replicas. It also retrieves the scaleMin and scaleMax labels to discover the limits that should be applied to scaling. Without them, we would run the risk of scaling to infinity or de-scaling to zero replicas.

The desired number of replicas (newReplicas) is obtained by adding the scale parameter to the current number of replicas.

Once all the variables are set, it evaluates whether scaling would breach the thresholds defined with scaleMin and scaleMax. If it would, it throws an error which, later on in the post section of the pipeline, results in a message to Slack. If neither threshold would be breached, a simple docker service scale command is executed.
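As an illustration only, the logic inside that script step boils down to something like the following shell commands (the com.df. label prefix and the messages are assumptions here; in the actual job this lives inside the pipeline's script block, and the error is what ends up triggering the Slack notification):

SERVICE=go-demo_main   # value of the service parameter
SCALE=1                # value of the scale parameter (negative to de-scale)

# Current number of replicas, and the scaling limits stored as service labels
REPLICAS=$(docker service inspect $SERVICE --format '{{.Spec.Mode.Replicated.Replicas}}')
SCALE_MIN=$(docker service inspect $SERVICE --format '{{index .Spec.Labels "com.df.scaleMin"}}')
SCALE_MAX=$(docker service inspect $SERVICE --format '{{index .Spec.Labels "com.df.scaleMax"}}')

NEW_REPLICAS=$((REPLICAS + SCALE))

if [ "$NEW_REPLICAS" -gt "$SCALE_MAX" ]; then
  echo "$SERVICE is already scaled to the maximum of $SCALE_MAX replicas"; exit 1
elif [ "$NEW_REPLICAS" -lt "$SCALE_MIN" ]; then
  echo "$SERVICE is already de-scaled to the minimum of $SCALE_MIN replicas"; exit 1
else
  docker service scale $SERVICE=$NEW_REPLICAS
fi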

Open the service-scale activity screen: http://$(docker-machine ip swarm-1)/jenkins/blue/organizations/jenkins/service-scale/activity

Now click the Run button. A few moments later, you’ll see that the build failed. Don’t panic. That is expected; a first run is typically needed for Jenkins to pick up the job’s parameters, so it will not fail again for the same reason.

This completes the setup from a Docker perspective.

Creating an instrumented service

Now we need to create an instrumented service, one that exposes a resp_time metric recording response times, labeled with the service name, response code, and whatever else is required. The code of the whole service is in a single file, main.go. We’re using Go only to demonstrate how instrumentation works; you can implement similar principles in almost any programming language. Hopefully, you will have no problem understanding the logic behind it even if Go is not your programming language of choice.
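As a rough sketch of what such instrumentation can look like with the Prometheus Go client (the metric name, labels, and port below are illustrative, not copied from the actual main.go):

package main

import (
    "net/http"
    "strconv"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// Histogram of response times, labeled with the service name, path, and response code.
var respTime = prometheus.NewHistogramVec(prometheus.HistogramOpts{
    Name:    "resp_time",
    Help:    "Response time in seconds",
    Buckets: []float64{0.025, 0.1, 0.5, 1},
}, []string{"service", "path", "code"})

func hello(w http.ResponseWriter, r *http.Request) {
    start := time.Now()
    // If a delay query parameter is provided, sleep for that many milliseconds.
    if delay, err := strconv.Atoi(r.URL.Query().Get("delay")); err == nil {
        time.Sleep(time.Duration(delay) * time.Millisecond)
    }
    w.Write([]byte("hello, world!\n"))
    respTime.WithLabelValues("go-demo", r.URL.Path, "200").Observe(time.Since(start).Seconds())
}

func main() {
    prometheus.MustRegister(respTime)
    http.HandleFunc("/demo/hello", hello)
    http.Handle("/metrics", promhttp.Handler()) // the endpoint Prometheus scrapes
    http.ListenAndServe(":8080", nil)
}

The delay parameter will come in handy later, when we force slow responses to trigger up-scaling.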

Now, we deploy the last stack. It will be the service we’re hoping to scale based on response time metrics.

docker stack deploy \
-c stacks/go-demo-instrument-alert-short.yml \
go-demo

Inside the go-demo-instrument-alert-short.yml we just deployed, the main service carries a set of deploy labels. They look roughly like this (the values below are illustrative; the exact names and values are in the file itself):
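  main:
    # (image and the rest of the service definition omitted)
    deploy:
      resources:
        limits:
          memory: 10M
      labels:
        - com.df.servicePath=/demo
        - com.df.port=8080
        - com.df.scaleMin=2
        - com.df.scaleMax=4
        - com.df.alertName.1=memlimit
        - com.df.alertIf.1=@service_mem_limit:0.8
        - com.df.alertFor.1=5m
        - com.df.alertName.2=resp_time_above
        - com.df.alertIf.2=@resp_time_above:0.1,5m,0.99
        - com.df.alertName.3=resp_time_below
        - com.df.alertIf.3=@resp_time_below:0.025,5m,0.75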

The servicePath and port labels will be used by Docker Flow Proxy to configure itself and start forwarding requests coming to /demo to the go-demo service.

Jenkins uses the scaleMin and scaleMax labels to decide whether the service should be scaled or whether the number of replicas has already reached the limits.

The alertName, alertIf, and alertFor labels are the key to scaling. They define Prometheus alerts. The first one (memlimit) creates an alert that fires if memory usage is over 80% of the memory limit, which is set to 10MB. Additionally, the alertFor label tells Prometheus to fire the alert only if the condition persists for more than 5 minutes.

The second (resp_time_above) defines an alert that fires if the proportion of responses in the 0.1 seconds bucket (100 milliseconds or faster) drops below 99% (0.99) for over five minutes (5m). In other words, it fires when too many responses are slow. Similarly, the resp_time_below alert fires if the proportion of responses in the 0.025 seconds bucket (25 milliseconds or faster) stays above 75% (0.75) for over five minutes (5m), meaning responses are so fast that we can probably get by with fewer replicas. In both cases, we’re using AlertIf Parameter Shortcuts that are expanded into full Prometheus expressions.
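To make the shortcut idea concrete, the resp_time_above shortcut (with the 0.1 bucket, 5m, and 0.99 parameters) expands, roughly, into a Prometheus expression of this shape (the metric and job names below are illustrative; the real ones depend on what the instrumented service exposes):

sum(rate(resp_time_bucket{job="go-demo_main", le="0.1"}[5m])) /
sum(rate(resp_time_count{job="go-demo_main"}[5m]))
    < 0.99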

Now, open Prometheus’s alerts screen: http://$(docker-machine ip swarm-1)/monitor/alerts

You should see three alerts that correspond to the three alertName labels defined in the main service of the go-demo stack. Docker Flow Swarm Listener detected the new service and sent those labels to Docker Flow Monitor which, in turn, converted them into Prometheus configuration.

Let’s confirm that the go-demo stack is up-and-running

docker stack ps -f desired-state=running go-demo

You should see three replicas of the go-demo_main and one replica of the go-demo_db service. If that’s not the case, please wait a while longer and repeat the docker stack ps command.

Also, confirm that all the targets of the service are indeed registered.

Open http://$(docker-machine ip swarm-1)/monitor/targets

You should see two or three targets, depending on whether Prometheus has already sent the alert to de-scale the service.

Automatically Scaling Services

Go back to the Prometheus alerts screen: http://$(docker-machine ip swarm-1)/monitor/alerts

By this time, the godemo_main_resp_time_below alert should be red. The go-demo service periodically pings itself and the response is faster than the twenty-five milliseconds limit we set. As a result, Prometheus fired the alert to Alertmanager. It evaluated the service and scale labels and decided that it should send a POST request to Jenkins with parameters service=go-demo_main&scale=-1.

We can confirm that the process worked by opening the Jenkins service-scale activity screen: http://$(docker-machine ip swarm-1)/jenkins/blue/organizations/jenkins/service-scale/activity

You should see that the new build was executed and, hopefully, it’s green. If more than ten minutes passed, you might see a third build as well. If that’s the case, we’ll ignore it for now.

Please click the second (green) build, followed by a click on the last step named Print Message. The output should say that go-demo_main was scaled from 3 to 2 replicas.

Let’s double-check that’s what truly happened.

docker stack ps -f desired-state=running go-demo

As you can see, Prometheus used metrics to deduce that we have more replicas in the system than we really need since they respond very fast. As a result, it fired an alert to Alertmanager, which triggered a Jenkins build, and our service was scaled down from three to two replicas.

If you take a closer look at the Alertmanager configuration, you’ll notice that both the repeat_interval and the group_interval are set to five minutes. If Prometheus continues firing the alert, Alertmanager will repeat the same process ten minutes later.
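A heavily simplified sketch of such a configuration could look like the following (the receiver names, token placeholder, Jenkins address, and label values are assumptions; the point is the five-minute group_interval and repeat_interval, plus a webhook receiver that calls the service-scale job's buildWithParameters endpoint with the service and scale parameters):

route:
  group_by: [service, scale]
  group_interval: 5m
  repeat_interval: 5m
  receiver: default
  routes:
    - match:
        service: go-demo_main
        scale: down
      receiver: jenkins-scale-down

receivers:
  - name: default
    # catch-all receiver for alerts we do not act on (details omitted)
  - name: jenkins-scale-down
    webhook_configs:
      - send_resolved: false
        url: "http://<jenkins-address>/jenkins/job/service-scale/buildWithParameters?token=<token>&service=go-demo_main&scale=-1"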

To check the up-scaling process, we’ll send requests that will result in high response times and observe the behaviour of the system. A simple loop of curl requests will do:

for i in {1..30}; do
  DELAY=$((RANDOM % 6000))
  curl "http://$(docker-machine ip swarm-1)/demo/hello?delay=$DELAY"
done

If the service receives the delay parameter, it goes to sleep for the specified number of milliseconds. The above loop sends thirty requests with a random delay between 0 and 6000 milliseconds.

Now we can take a look at the alerts: http://$(docker-machine ip swarm-1)/monitor/alerts

The godemo_main_resp_time_above alert turned red, indicating that the threshold was reached and Prometheus fired an alert to Alertmanager. If everything went as planned, Alertmanager should have sent a request to Jenkins.

Confirm that it indeed happened: http://$(docker-machine ip swarm-1)/jenkins/blue/organizations/jenkins/service-scale/activity

You should see a new build. Please click it. The last step with the Print Message header should state that go-demo_main was scaled from 2 to 3 replicas.

Also, confirm that the number of replicas indeed scaled to three by taking a look at the stack processes.
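The same command as before will show the stack’s processes:

docker stack ps -f desired-state=running go-demo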

To remove the demo cluster and free your resources:

docker-machine rm -f swarm-1 swarm-2 swarm-3

Conclusion

This was a simple (albeit long!) example of a system that automatically scales and de-scales services. We have successfully built our own self-sufficient system that features not only the auto-healing provided by Docker Swarm but also auto-scaling based on scraped metrics, using Prometheus and Jenkins.
