Some time ago, Vilmate was involved with a high-load web project. It had been launched back in 2012, and since then, multiple teams had been taking turns working on the web application. Team members were coming and going, and each new squad tried to leverage the latest technology trends as they continued software development.
In the end, we had a project that was partly rendered on the back-end, partly in Angular, and there was even a part rendered on the back-end in Angular. Some background tasks were executed via message brokers, others via Cron.
The deployment process was not flawless either. We were inconsistently executing both Fabric and Ansible scripts. Overall, it could take two hours or more to complete a deployment, including the time spent running tests. As is often the case with projects like ours, it was far too expensive, and its maintenance came at a steep monthly price for the client.
So, it was high time to make some radical decisions about the existing problems. First things first, we began by dockerizing the app. It was a significant amount of work, and here’s what we managed to do. We created Docker multi-stage builds, so our new build pipeline had four stages:
The first, base stage was the Node.js Docker image running scripts like npm install and npm run build.
The second stage inherited from the Nginx Docker image, copying files from the first stage and serving the static JS files in return.
The third, base stage was the Python Docker image, where all the required dependencies were installed.
The fourth stage inherited from the third one; it copied the application code and the collected static files from the first stage.
Also, by running the same Dockerfile targeting the fourth stage, we produced a back-end image that contained all needed files and dependencies but excluded node_modules and the pip cache. As a result, this image was smaller: it contained only the files required to run the back-end and nothing more.
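The four stages above can be sketched roughly as follows. This is a minimal illustration, not the project’s actual Dockerfile; the image tags, directory paths, and script names are all assumptions:

```dockerfile
# Stage 1: build front-end assets with Node.js
FROM node:12 AS frontend-build
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Stage 2: Nginx image serving the static JS files built in stage 1
FROM nginx:stable AS static
COPY --from=frontend-build /app/dist /usr/share/nginx/html

# Stage 3: Python base with all required back-end dependencies
FROM python:3.8 AS backend-deps
WORKDIR /srv
COPY requirements.txt .
RUN pip install -r requirements.txt

# Stage 4: back-end image with the code and the collected static files,
# but without node_modules or the pip cache
FROM backend-deps AS backend
COPY . /srv
COPY --from=frontend-build /app/dist /srv/static
```

Because stage 4 inherits only from the Python stage and copies just the built artifacts out of stage 1, the Node.js toolchain never ends up in the final back-end image.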
We also took care of the developers: by adding the necessary build-time variables (--build-arg), we made it easier for them to run the same environment in development mode.
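In practice, targeting a stage and passing build-time variables looks something like this; the target names, tags, and the APP_ENV variable are hypothetical examples, not the project’s actual values:

```
# Full back-end image for production
docker build --target backend -t registry.example.com/app/backend:latest .

# Same Dockerfile in development mode, switched via a build-time variable
docker build --target backend --build-arg APP_ENV=development -t app/backend:dev .
```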
As a result of these changes, it became easy to maintain a simple build environment on a CI server that built Docker images and pushed them to the private registry.
We used Drone CI, which, among other things, could cache the needed layers. Thus, by specifying a new entrypoint or command, we could launch an image, run the tests, and receive the results directly in CI.
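A Drone pipeline along these lines could look like the sketch below. The registry URL, secret names, and test command are placeholders, not the project’s real configuration:

```yaml
kind: pipeline
type: docker
name: default

steps:
  - name: test
    image: registry.example.com/app/backend:latest
    # Overriding the command lets CI run the test suite
    # inside the same image that will later be deployed
    commands:
      - python manage.py test

  - name: publish
    image: plugins/docker
    settings:
      registry: registry.example.com
      repo: registry.example.com/app/backend
      username:
        from_secret: registry_user
      password:
        from_secret: registry_password
```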
What about security?
With a stable environment and faster CI in place, we used Rancher and Swarm for container management. Rancher allowed us to update containers via CI, connect and use physical nodes, and balance requests between nodes from any of the containers.
As the entry point, we used HAProxy (High Availability Proxy) working in reverse-proxy mode, configured to balance requests between containers. At this level, we also wrote a script that tracked requests by client IP address and blocked clients whose request rate exceeded 100 per second, denying them before they even reached the application.
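The post describes a custom script, but the same per-IP rate limiting can also be expressed directly in HAProxy configuration. This is a sketch of that approach, with assumed names, ports, and back-end addresses:

```
frontend fe_app
    bind *:80
    # Track each client IP's request rate over a 1-second window
    stick-table type ip size 100k expire 30s store http_req_rate(1s)
    http-request track-sc0 src
    # Deny clients exceeding 100 requests per second before they
    # ever reach the application containers
    http-request deny if { sc_http_req_rate(0) gt 100 }
    default_backend be_app

backend be_app
    balance roundrobin
    server app1 10.0.0.11:8000 check
    server app2 10.0.0.12:8000 check
```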
In front of HAProxy, we placed Elastic Load Balancing (ELB) with a geolocation-based routing policy, which in turn proxied requests to HAProxy. There was also a Cloudflare Load Balancer responsible for A-records and diagnostics.
Thus, we created an infrastructure that was closed off from direct outside access. It could only be reached through a number of reverse proxies that provided security at different levels.
All in all, we had several subnets:
- The Rancher network that had its own internal DNS service
- The AWS network that provided an endpoint to Rancher and ELB
- The Cloudflare network that could proxy requests to ELB
With this structure in place, we no longer needed assigned IP addresses or static physical nodes at all.
We moved physical nodes away from on-demand instances and configured the launch and connection of these nodes to Rancher. As soon as a new node was connected, Rancher launched containers on it on its own, balancing the load across the containers on the network.
Rancher also provided a webhook service that let us increase the number of running containers, with the load balanced between them automatically. This raised the number of requests we could process.
When the load needed balancing, our webhook doubled the number of running containers, thus providing enough workers to process the requests. AWS, in turn, responded to the increased load through Spot instance requests, raising both the quantity and the quality of nodes. When the load decreased, the number of containers went down too, easing memory and CPU usage, while AWS scaled the Spot instances back. All of this provided significant cost efficiencies.
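The scaling behavior described above can be sketched as a small decision function. The thresholds, bounds, and doubling/halving policy here are illustrative assumptions, not the project’s actual webhook implementation:

```python
def desired_scale(current: int, load: float,
                  high: float = 0.8, low: float = 0.3,
                  minimum: int = 2, maximum: int = 64) -> int:
    """Decide how many containers should run.

    `load` is the fraction of worker capacity in use. The count is
    doubled when load is high, halved when load is low, and clamped
    to the [minimum, maximum] range either way.
    """
    if load >= high:
        target = current * 2
    elif load <= low:
        target = current // 2
    else:
        target = current
    return max(minimum, min(maximum, target))
```

A webhook handler would call this on each load report and ask the orchestrator to scale the service to the returned count.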
Finally, we stopped using services like ElastiCache, CodeDeploy, EC2 On-Demand, RDS, Elastic IP, etc.
At the moment, we are proud to have created an infrastructure that can scale up and down based on the application load. We’ve optimized costs for our client and ensured that the runtime environment was easy for the developers to work with.
© 2020, Vilmate LLC