forgot password?
by Wolfram Hempel (@wolframhempel)
Fri Jul 27 2018

Scaling webapps for newbs & non-techies

This guide summarizes the basic principles of scaling - from a single server to a web app capable of serving millions of users. It is aimed at newcomers and non-developers working in technical fields - so if you've just deployed your multi-cloud-terraform-vpn-setup, this one is not for you.

For everyone else though: Let's get started!


You've just finished building your website, online store, social network or whatever else it is you are up to, put it online and things are going splendidly: A few hundred visitors pass by your site every day, requests are answered quickly, orders are processed immediately and everything is humming along nicely.

But then something terrible happens: You become successful!

More and more users are flooding in, thousands, tens of thousands, every hour, minute, second... What's fantastic news for your business is bad news for your infrastructure; because now, it needs to scale. That means it needs to be able to:

  • serve more costumers simultaneously
  • be always available without downtimes
  • serve users around the globe

How scaling works

A few years ago, this post would have started with a discussion of "vertical" vs "horizontal" scaling (also called scaling up vs scaling out). In a nutshell, vertical scaling means running the same thing on a more powerful computer whereas horizontal scaling means running many processes in parallel.

Today, almost no one is scaling up / vertically anymore. The reasons are simple:

  • computers get exponentially more expensive the more powerful they are
  • a single computer can only be so fast, putting a hard limit on how far one can scale vertically
  • multi-core CPUs mean that even a single computer is effectively parallel- so why not parallelize from the start?

Alright, horizontal scaling it is! But what are the required steps?

1. A single server + database

This is probably how your backend looks initially. A single application server running your business logic and a database that stores data for the long run. Things are nice and simple, but the only way for this setup to cater for higher demand is to run it on a beefier computer - not good.

2. Adding a Reverse Proxy

An initial step to prepare your architecture for larger scale is to add a "reverse proxy". Think of it as the reception desk in a hotel. Sure, you could just let guests go directly to their rooms - but really, what you want is an intermediary that checks if a guest is allowed to enter, has all her papers in order and is on the way to a room that actually exists. And should a room be closed, you want someone telling the guest in a friendly voice, rather than just having them run into limbo. That's exactly what a reverse proxy does. A proxy, in general, is just a process that receives and forwards requests. Usually, these requests would go from our server out to the internet. This time, however, the request comes from the internet and needs to be routed to our server, so we call it a "reverse proxy".

Such a proxy fulfills a number of tasks:

  • Health Checks make sure that our actual server is still up and running
  • Routing forwards a request to the right endpoint
  • Authentication makes sure that a user is actually permissioned to access the server
  • Firewalling ensure that users only have access to the parts of our network they are allowed to use ... and more

3. Introducing a Load Balancer

Most "Reverse Proxies" have one more trick up their sleeve: they can also act as load balancers. A load balancer is a simple concept: Imagine a hundred users are ready to pay in your online shop in a given minute. Unfortunately, your payment server can only process 50 payments in the same time. The solution? You simply run two payment servers at once.

A load balancer's job is now to split incoming payment requests between these two servers. User one goes left, user two goes right, user three goes left and so on.

And what do you do if five-hundred users want to pay at once? Exactly, you scale to ten payment servers and leave it to the load balancer to distribute the incoming requests between them.

4. Growing your Database

Using a load balancer allows us to split the load between many servers. But can you spot the problem? While we can utilize tens, hundreds or even thousands of servers to process our requests, they all store and retrieve data from the same database.

So can't we just scale the database the same way? Unfortunately not. The problem here is consistency. All parts of our system need to agree on the data they are using. Inconsistent data leads to all sorts of hairy problems - orders being executed multiple times, two payments of $90 being deducted from an account holding $100 and so on... so how do we grow our database while ensuring consistency?

The first thing we can do is split it into multiple parts. One part is exclusively responsible for receiving data and storing it, all other parts are responsible for retrieving the stored data. This solution is sometimes called a Master/Slave setup or Write with Read-replicas. The assumption is that servers read from the database way more often than they write to it. The good thing about this solution is that consistency is guaranteed since data is only ever written to a single instance and flows from there in one direction, from write to read. The downside is that we still only have a single database instance to write to. That's ok for small to medium web projects, but if you run Facebook it won't do. We'll look into further steps to scale our DB in chapter 9.

5. Microservices

Until now, we've dealt with only one server that did everything: handle payments, orders, inventory, serve the website, manage user accounts and so on.

That's not necessarily a bad thing - a single server means lower complexity and thus less headache for our developers. But as scale increases, things start getting complicated and inefficient:

  • different parts of our server are utilized to different extents - for every user login there's probably a few hundred pageviews to handle and assets to serve, yet all is done by the same server
  • our development team grows with our app - but as more developers work on the same server they are more likely to step on each other's toes.
  • Having only a single server means that everything has to be done and dusted whenever we want to take a new version live. This results in dangerous interdependencies whenever one team quickly wants to release an update, but another team is only half done with their work.

The solution to these challenges is an architectural paradigm that has taken the dev world by storm: Microservices. The idea is simple - break your server down into functional units and deploy them as individual, interconnected mini-servers. This has a number of benefits:

  • each service can be scaled individually, enabling us to better adjust to demand
  • development teams can work independently, each being responsible for their own microservice's lifecycle (creation, deployment, updating etc.)
  • each microservice can use its own resources, e.g. its own database further reducing the problem described in 4.)

6. Caching & Content Delivery Networks

What's better than working more efficiently? Not having to work at all! A large portion of our web app consists of static assets - bits that never change such as images, javascript and css files, pre-rendered landing pages for certain products and so on. Rather than recalculating or re-serving these assets on every request, we can use a "cache" - a small store that simply remembers the last result and hands it out to everyone interested without bothering the underlying server.

A cache's bigger brother is called a "Content Delivery Network" or CDN for short - a huge array of caches placed all around the world. This allows us to serve content to our users from a store physically close to them, rather than shipping our data across the globe every time.

7. Message Queues

Have you ever been to an amusement park? Did you just walk up to a ticket counter to buy your ticket? Probably not - chances are you've ended up waiting in a queue. Government institutions, post offices, and amusement park entrances are all great examples of a concept called "sub-capacity parallelism" - yes, they are parallel: multiple ticket booths sell tickets simultaneously - but there never seem to be enough to serve everyone instantly and as a result, queues start to form.

The same concept is used for large web apps. Hundreds of thousands of images are uploaded to Instagram, Facebook and Co every minute, each of which needs to be processed, resized, analyzed and tagged - a time-consuming process. So rather than making the user wait until their upload has gone through all these steps. the server receiving the image only does three things:

  • it stores the raw, unprocessed image
  • it confirms the upload to the user
  • it adds a virtual sticky-note to a large pile, specifying what needs to be done

From here on this note is picked up by any number of other servers, each fulfilling one of its tasks, ticking it off and putting the note back onto the pile - until our todo list is complete. The system managing this pile of notes is called a "message queue". Using such a queue has a number of advantages:

  • it decouples tasks and processors. Sometimes a lot of images need to be processed, sometimes only a few. Sometimes a lot of processors are available, sometimes its just a couple. By simply adding tasks to a backlog rather than processing them directly we ensure that our system stays responsive and no tasks get lost.
  • it allows us to scale on demand. Starting up more processors takes time - so by the time a lot of users tries to upload images, it's already too late. By adding our tasks to a queue we can defer the need to provision additional capacities to process them

Alright, if we followed all the steps above our system is now ready to serve a sizable amount of traffic. But what can we do if we want to go big - really big? Well, there are still a few options left:

8. Sharding, Sharding, Sharding

What's sharding? Alright, take a deep breath. Are you ready? Here it goes:

"Sharding is a technique of parallelizing an application's stacks by separating them into multiple units, each responsible for a certain key or namespace"

...phew. So what exactly does that mean? It's actually quite simple: Need to serve Facebook profiles for 2 billion users? Break your architecture down into e.g. 26 Mini-Facebooks, each serving users with a different letter of the alphabet. Aaron Abrahams? You'll be served by stack A. Zacharias Zuckerberg? Stack Z it is...

Sharding doesn't have to be based on letters but can be based on any number of factors, e.g. location, usage frequency (power-users are routed to the good hardware) and so on. You can shard servers, databases or almost any aspect of your stack this way, depending on your needs.

9. Load-balancing the load-balancer

A single load balancer only gets you so far - and even if you start buying some incredibly powerful (and incredibly expensive) hardware load balancers there is a hard limit to how many requests they can handle.

Fortunately, there is a worldwide, decentralized and incredibly stable layer that can be used to load-balance traffic even before it hits our load balancers. And the best thing? It's absolutely free. This layer is the "Domain Name System" - or DNS for short. The worldwide registry mapping domain names such as "" to IPs such as "". This registry allows us to specify multiple IPs per domain name, each leading to a different load balancer.

Phew, this was a lot to take in. Thanks for holding out with me for so long. I hope there was something useful in this blog post for you. But - if you are working in any IT related field there was probably one question burning on your chest throughout reading this text: "What about cloud services?"

Cloud Computing / Serverless

But what about cloud services? Well - it is 2018 and the cheapest and most efficient solution to a lot of the problems described above is pretty clear:

Don't solve them at all.

Instead, leave it to your cloud provider to give you a system that simply scales to your demands without you having to worry about its intricacies.

Arcentry, for instance, doesn't do any of the things described above (apart from the write/read split for its DB), but simply leaves it to Amazon Web Service's Lambda functions to run its logic - no servers, no hassle.

But the cloud isn't something you simply turn on and all your problems are solved - it comes with its own set of challenges and trade-offs. Stay tuned for the next article in this series to learn more about "the cloud for newbs and non-techies"