Let's talk about Load Balancers (Part 1)

Hello again. Yeah, it’s been years since I last showed up, and I’m back to present another topic, all because of AI. We need to trust the fundamentals, because we are the ones who will maintain all the infrastructure and code created by an AI (which was trained on our disorganized code in thousands of repos on GH).

So there is no one to blame, because we are the ones who dig our own grave when we trust and train AI. At least we still have some critical thinking grounded in fundamental knowledge, as I’ll present today.

Our old friend Load Balancer

Load balancers are older than you might think and are a very important component of system design, as they help distribute requests and traffic across multiple servers. The goal of an LB is to ensure availability and reliability and to prevent any single server from becoming overloaded, thereby reducing downtime.

How does an LB work?

The LB receives a request from a client.
The LB evaluates the request and chooses which server should receive it. This is done using a predefined algorithm that takes into account factors such as server capacity, response time, number of active connections, and geographic location.
The LB forwards the received traffic to the selected server.
The server processes the request and sends a response back to the LB.
The LB receives the response from the server and sends it to the client that made the request.

Load Balancer workflow diagram
Basically a traffic cop who never takes a coffee break
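
The five steps above can be sketched in a few lines of Python. This is only a toy model, not a real proxy: the `Backend` class, the `pick_server` function, and the server names are hypothetical, and the selection rule here simply takes the first server in the pool.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical backend server that reports which request it handled."""
    name: str

    def handle(self, request: str) -> str:
        return f"{self.name} handled {request}"


def pick_server(pool: list[Backend], request: str) -> Backend:
    # Step 2: placeholder selection rule (always the first server).
    # A real LB would plug a load balancing algorithm in here.
    return pool[0]


def load_balance(pool: list[Backend], request: str) -> str:
    # Step 1: the LB receives the request (the function argument).
    server = pick_server(pool, request)  # Step 2: choose a server
    response = server.handle(request)    # Steps 3-4: forward it, get the response
    return response                      # Step 5: send it back to the client


pool = [Backend("server-a"), Backend("server-b")]
print(load_balance(pool, "GET /"))  # server-a handled GET /
```

Everything interesting happens inside `pick_server`, which is exactly the part the algorithms in this article change.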

Key concepts

People often ask this in interviews, so make a note of it.

Load Balancer: A device or software that distributes network traffic across multiple servers based on predefined rules or algorithms.
Backend Servers: The servers that receive and process requests forwarded by the LB. Also known as a server pool or server farm.
Load Balancing Algorithm: The method used by the load balancer to determine how to distribute incoming traffic among the backend servers.
Health Checks: Periodic tests performed by the LB to determine the availability and performance of backend servers. Servers with problems are removed from the server pool until they recover.
Session Persistence: A technique used to ensure that subsequent requests from the same client are directed to the same backend server, maintaining session state and providing a consistent user experience.
SSL/TLS Termination: The process of decrypting SSL/TLS-encrypted traffic at the load balancer level, offloading the decryption burden from backend servers and enabling centralized SSL/TLS management.
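
To make the health check idea concrete, here is a minimal sketch. It assumes the checks have already run and only shows the "remove from the pool until recovered" part; the `healthy_pool` function and the server names are made up for illustration.

```python
def healthy_pool(last_check: dict[str, bool]) -> list[str]:
    """Return only the servers whose most recent health check passed."""
    return [name for name, is_up in last_check.items() if is_up]


# Hypothetical results of the latest health check for each server.
last_check = {"server-a": True, "server-b": False, "server-c": True}
print(healthy_pool(last_check))  # ['server-a', 'server-c']

# Once server-b passes a check again, it rejoins the pool.
last_check["server-b"] = True
print(healthy_pool(last_check))  # ['server-a', 'server-b', 'server-c']
```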

LB Algorithms

The primary goal of an LB algorithm is to ensure efficient use of available resources, improve overall system performance, and maintain high availability and reliability. So, choose your dish based on the flavor of the context you are working in.

1. Round Robin

It simply assigns a request to the first server, then moves on to the second, third, and so on, and after reaching the last server, it starts again at the first.
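
A minimal sketch of that rotation, assuming three hypothetical servers named A, B, and C:

```python
from itertools import cycle

servers = ["A", "B", "C"]
rotation = cycle(servers)  # endless A, B, C, A, B, C, ...


def next_server() -> str:
    """Return the next server in the fixed rotation."""
    return next(rotation)


picks = [next_server() for _ in range(7)]
print(picks)  # ['A', 'B', 'C', 'A', 'B', 'C', 'A']
```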

Pros

Easy to implement and understand
Equal distribution of requests, each one gets a turn in a fixed order
Works well when all servers have the same capacity

Cons

No Load Awareness: Since all servers are treated equally, regardless of their status, the current load or capacity of each server is not taken into account.
No Session Affinity: Subsequent requests from the same client may be redirected to different servers, which is very problematic for persistence in stateful applications.
Performance issues: If servers differ in capacity, weaker ones may be overloaded while stronger ones sit underused.
Predictable Distribution Pattern: It can potentially be exploited by attackers who observe traffic patterns and can find vulnerabilities in specific servers by predicting which server will handle requests.

Use Cases

Suitable for environments where all servers have similar capacity and performance.
Works well for stateless applications

Round Robin workflow diagram
Round Robin works like a queue: each server gets its turn, no favoritism.

2. Least Connections

It assigns requests to the server with the fewest active connections at the time of the request. This ensures a more balanced distribution of load across servers, especially in environments where traffic is unpredictable and request processing times vary.
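
Here is a small sketch of the idea, assuming the LB keeps a counter of open connections per server. The function names and the starting numbers are invented for the example.

```python
def pick_least_connections(active: dict[str, int]) -> str:
    # min() scans in insertion order, so ties go to the server listed first.
    return min(active, key=active.get)


def route(active: dict[str, int]) -> str:
    """Pick a server for a new request and count the new connection."""
    server = pick_least_connections(active)
    active[server] += 1
    return server


def finish(active: dict[str, int], server: str) -> None:
    """Call when a connection closes so the counter stays accurate."""
    active[server] -= 1


active = {"A": 3, "B": 1, "C": 2}
picks = [route(active) for _ in range(3)]
print(picks)   # ['B', 'B', 'C']
print(active)  # {'A': 3, 'B': 3, 'C': 3}
```

Note that the counters only stay meaningful if `finish` is called for every closed connection, which is the state overhead this algorithm brings with it.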

Pros

Efficient for different server configurations
Better utilization of the servers: As it takes into account the current load on each server.
Dynamic Distribution: Adapts to changing traffic patterns and server loads, ensuring no single server becomes a bottleneck.

Cons

More complex: Compared to simpler algorithms, such as Round Robin, as it requires real-time monitoring of active connections.
State overhead: As it requires maintaining the state of active connections
Connection spikes: In contexts where the connection duration is short, servers may experience rapid spikes in the number of connections, leading to frequent rebalancing

Use Cases

Suitable for servers with different capacities and workloads, requiring dynamic load distribution
Works well for apps with unpredictable traffic, ensuring that no server becomes overloaded
Very effective for apps with long-lived connections or sessions, as it helps distribute active sessions more evenly.

While Round Robin doesn’t consider the current load and distributes requests in a fixed cyclic order, Least Connections distributes requests based on the current load, directing new requests to the server with the fewest active connections.

Least Connections workflow diagram
The least busy server gets the next request.

3. Weighted Round Robin

WRR is an enhanced version of Round Robin. It assigns weights to each server based on its capacity or performance, distributing incoming requests proportionally according to these weights. This ensures that more powerful servers handle a larger share of the load, while less powerful servers handle a smaller share.
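
One simple way to sketch this is to repeat each server in the rotation as many times as its weight. The weights below are made up, and this naive version sends the heavier server its requests in a burst.

```python
from itertools import cycle

# Hypothetical weights: A is assumed to be three times as powerful as B.
weights = {"A": 3, "B": 1}

# Expand the weights into one rotation: ['A', 'A', 'A', 'B'].
schedule = [server for server, weight in weights.items() for _ in range(weight)]
rotation = cycle(schedule)

picks = [next(rotation) for _ in range(8)]
print(picks)  # ['A', 'A', 'A', 'B', 'A', 'A', 'A', 'B']
```

Production load balancers usually interleave the picks more smoothly instead of sending three requests in a row to A, but the proportions end up the same.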

Pros

Better use of resources, as high-capacity servers process more requests
Easily adjustable to accommodate changes or additions of new servers.
Optimize overall system performance by avoiding overloading less powerful servers.

Cons

Setting appropriate weights for each server can be challenging and requires accurate performance metrics.
It doesn’t consider real-time server load.

Use Cases

Ideal for environments with different processing capacities, ensuring efficient use of resources
Suitable for web apps where different servers have varying performance properties
Useful in database clusters where some nodes have greater processing power and can handle more queries.

Weighted Round Robin workflow diagram
Servers with more power receive more requests in the rotation.

4. Weighted Least Connections

It is a smart way to share work between servers, and it looks at two things:

Which server is less busy right now
Which server is stronger

Big, strong servers get more work, while busy servers get a break. It’s like a manager giving more tasks to seniors, but only if they’re not overloaded.
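
A common way to combine the two signals is to divide active connections by the server weight and pick the lowest ratio. The sketch below assumes that rule; the connection counts and weights are invented.

```python
def pick_weighted_least_connections(
    active: dict[str, int], weights: dict[str, int]
) -> str:
    """Pick the server with the fewest active connections per unit of weight."""
    return min(active, key=lambda server: active[server] / weights[server])


active = {"A": 4, "B": 2, "C": 3}   # open connections right now
weights = {"A": 4, "B": 1, "C": 2}  # hypothetical capacity weights

# Ratios: A = 4/4 = 1.0, B = 2/1 = 2.0, C = 3/2 = 1.5
print(pick_weighted_least_connections(active, weights))  # A
```

A wins even though it has the most open connections, because relative to its capacity it is the least loaded.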

Pros

Dynamic real-time load balancing on each server, ensuring balanced distribution of requests
Better resource utilization, as it takes into account the capacity of each server
Flexibility to handle servers with different configurations

Cons

More complex compared to Round Robin and Least Connections
Requires the LB to track both active connections and server weights, as well as requiring accurate performance metrics.

Use Cases

Ideal for environments where servers have different capacities and workloads
Suitable for high-traffic applications
Also useful for database clusters

Weighted Least Connections workflow diagram
Strong servers do more work, unless they are already busy.

5. IP Hash

It decides on the server using the client’s IP address. The load balancer does some calculations with the IP and says:

“Ah, you belong to this server,” and the IP always goes to the same server.

This means that the server always remembers the client, no drama.

A simple example would be:

You have three servers (A, B, and C), and the client has the IP address 192.168.1.10.

The load balancer converts this IP address into a number (for example, a hash taken modulo the number of servers), and if the result is 2, it sends the request to server C, counting from 0.

The next time this client returns, it will be the same IP address and the same server.
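
The example can be reproduced with a small sketch. As a stand-in for a real hash function, it assumes the simplest possible "calculation": the integer value of the IPv4 address modulo the number of servers.

```python
import ipaddress

servers = ["A", "B", "C"]


def pick_by_ip(client_ip: str) -> str:
    """Map a client IP to a server; the same IP always gives the same answer."""
    ip_number = int(ipaddress.ip_address(client_ip))  # 192.168.1.10 -> 3232235786
    return servers[ip_number % len(servers)]          # 3232235786 % 3 == 2


print(pick_by_ip("192.168.1.10"))  # C
print(pick_by_ip("192.168.1.10"))  # C again, no state needed
```

Python's built-in `hash()` is deliberately avoided here: for strings it is randomized per process by default, so the mapping would change every time the LB restarts.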

Pros

Session persistence: Same IP > Same server
Easy to use, no need to track connections.
Deterministic, as the result for a given IP is always the same.

Cons

If many users share the same IP or hash to the same value, one server gets stressed while the others relax.
Adding or removing a server means that many users can suddenly go to a different server.
It doesn’t care whether the server is tired or overloaded; it just follows the IP.

Use Cases

Stateful apps, like shopping carts and logged-in user sessions.
Clients in different regions with consistent routing.

IP Hash workflow diagram
The same client IP always goes to the same server.

6. Least Response Time

It sends the request to the fastest server at the moment, not the most powerful or least busy one.

How it works

The load balancer checks the response speed of each server.
A new request arrives and is forwarded to the server with the shortest response time.
If a server slows down, it receives less traffic. If it speeds up again, it receives more.
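
A rough sketch of this loop, assuming the LB keeps a moving average of observed response times per server. The class name, the smoothing factor, and the timings are all hypothetical.

```python
class ResponseTimeTracker:
    """Keeps an exponential moving average of response times per server."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha  # how much weight the newest sample gets
        self.avg_ms: dict[str, float] = {}

    def record(self, server: str, elapsed_ms: float) -> None:
        if server not in self.avg_ms:
            self.avg_ms[server] = elapsed_ms  # first sample sets the average
        else:
            old = self.avg_ms[server]
            self.avg_ms[server] = self.alpha * elapsed_ms + (1 - self.alpha) * old

    def pick(self) -> str:
        """Return the server with the lowest average response time."""
        return min(self.avg_ms, key=self.avg_ms.get)


tracker = ResponseTimeTracker()
tracker.record("A", 100)
tracker.record("B", 40)
tracker.record("C", 80)
print(tracker.pick())  # B

tracker.record("B", 200)  # B slows down: average becomes 0.5*200 + 0.5*40 = 120
print(tracker.pick())  # C
```

The moving average is one way to soften the "small network issues" problem listed in the cons below, since a single slow response does not fully replace the history.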

Pros

Requests go to the fastest server > users are satisfied.
Reacts automatically when servers slow down or speed up.
Fast servers work harder, slow servers rest.

Cons

More complex, as it requires monitoring and metrics, not just simple functions.
Measuring response time adds some overhead of its own.
Small network issues can cause a server to appear slow for a moment, and traffic can fluctuate too much.

Use Cases

Real-time apps such as games, streaming, and trading platforms.
APIs and web services, when fast response times are more important than session memory.
Great when server performance fluctuates throughout the day.

Least Response Time workflow diagram
Requests go to the server that responds the fastest.

7. Random

Yes, it’s exactly what it sounds like: the LB chooses a server at random.

How it works

You have servers A, B, and C.
A request comes in.
The LB randomly chooses a server. Over time, if luck is fair, each server will receive approximately the same number of requests.
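
A quick sketch to see the "over time" part in action. The seed is fixed only so the example is repeatable; a real LB would not do that.

```python
import random
from collections import Counter

servers = ["A", "B", "C"]
rng = random.Random(42)  # fixed seed for a repeatable demo only


def pick_random() -> str:
    """Choose any server with equal probability."""
    return rng.choice(servers)


counts = Counter(pick_random() for _ in range(3000))
print(counts)  # roughly 1000 per server; exact numbers depend on the seed
```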

Pros

Very simple, easy to understand, and easy to configure.
Doesn’t track load, speed, or connection, which means less overhead.
Good with randomness, traffic spreads out over time.

Cons

Not intelligent, does not know if a server is slow or overloaded.
A server may receive many requests in a row.
The same user may access different servers (terrible for login sessions).
Random traffic makes attack patterns (such as DDoS) more difficult to detect.

Use cases

All servers are similar
Each request is independent (stateless), with no memory requirements
Simple systems when you want something fast and don’t need sophisticated logic

Random workflow diagram
The server is chosen randomly for each request.

8. Least Bandwidth

Sends traffic to the server that is currently using the least network data.

If a server is busy downloading files, it gets a break.
If a server has spare bandwidth, it gets the next request.

How it works

The LB checks how much bandwidth each server is using.

A new request arrives.
The request goes to the server that is using the least bandwidth. This keeps network traffic balanced.
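
As a sketch, assume the LB knows how many bytes each server sent during the last 10 seconds and turns that into Mbps. The byte counts and the window length are made-up numbers.

```python
def mbps(bytes_sent: int, window_seconds: float) -> float:
    """Convert bytes sent during a time window into megabits per second."""
    return bytes_sent * 8 / window_seconds / 1_000_000


def pick_least_bandwidth(bytes_in_window: dict[str, int], window_seconds: float) -> str:
    """Pick the server currently pushing the least traffic."""
    usage = {s: mbps(b, window_seconds) for s, b in bytes_in_window.items()}
    return min(usage, key=usage.get)


# Hypothetical bytes sent by each server in the last 10 seconds.
bytes_last_10s = {"A": 500_000_000, "B": 50_000_000, "C": 250_000_000}

print(mbps(bytes_last_10s["A"], 10))             # 400.0
print(pick_least_bandwidth(bytes_last_10s, 10))  # B
```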

Pros

Dynamic and intelligent: adjusts in real time based on network usage
Prevents network overload: helps keep any single server from being saturated with too much data
Better use of resources: servers share the load more evenly.

Cons

Requires constant monitoring of bandwidth
Bandwidth measurement consumes some resources
Small bandwidth spikes can cause traffic fluctuations

Use cases

High-bandwidth applications such as streaming, file downloads, and large data transfers
CDNs: fast content delivery without network bottlenecks
Real-time systems, where low latency and smooth traffic are really important

Least Bandwidth workflow diagram
Traffic is routed to the server with the lowest bandwidth usage.

9. Custom load

Instead of using a fixed rule, you tell the LB what “busy” really means for your application.

Think of it as a personalized meal plan: not everyone eats the same food.

How it works

Choose what to monitor, for example, CPU, memory, disk, or specific application numbers.
A monitoring system constantly checks these metrics.
Create your own rules, such as sending less traffic if the CPU is high or sending more traffic if memory is free.
Real-time adjustment: traffic automatically changes based on your rules.
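
Here is one possible shape for such rules, as a sketch only. The metrics, the `load_score` formula, and its weights are all invented; the point is that "busy" becomes a number you define yourself.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    cpu: float        # fraction of CPU in use, 0.0 to 1.0
    memory: float     # fraction of memory in use, 0.0 to 1.0
    queue_depth: int  # app-specific signal: jobs waiting in a queue


def load_score(m: Metrics) -> float:
    # Hypothetical rule: CPU matters most, then memory, then the queue
    # (capped so a huge queue cannot dominate the score).
    return 0.5 * m.cpu + 0.3 * m.memory + 0.2 * min(m.queue_depth / 100, 1.0)


def pick_custom(metrics: dict[str, Metrics]) -> str:
    """Send the next request to the server with the lowest custom score."""
    return min(metrics, key=lambda server: load_score(metrics[server]))


metrics = {
    "A": Metrics(cpu=0.9, memory=0.4, queue_depth=10),  # score about 0.59
    "B": Metrics(cpu=0.3, memory=0.8, queue_depth=50),  # score about 0.49
    "C": Metrics(cpu=0.5, memory=0.5, queue_depth=5),   # score about 0.41
}
print(pick_custom(metrics))  # C
```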

Pros

Works exactly the way you want it to.
Better use of resources by using many signals, not just one.
Great for complex and constantly changing systems.

Cons

Complex: more logic = more things to manage
Monitoring cost: monitoring many metrics consumes resources
Easy to get wrong: bad rules = bad traffic decisions.

Use cases

Complex applications with different behaviors and bottlenecks
Highly dynamic systems where load changes quickly and frequently
Special needs where standard algorithms are not sufficient.

Yeah, that’s part 1 of 3. I know that maybe nobody is reading this, but tbh, that’s ok. At the end of the day I’m improving my writing and sharpening my knowledge of a fundamental topic, and that already makes it worth it.

I appreciate your time reading this boring article, and just remember, being good at something isn’t a sprint but a marathon that you have to work on every day.

(I’m probably saying it to myself… but maybe it helps you too.)

See ya!
