Hello again. Yeah, it’s been years since I last showed up, and I’m back to present another topic, all because of AI. We need to trust the fundamentals, because we’re the ones who will maintain all the infrastructure and code created by an AI (which learned from our disorganized code in thousands of repos on GH).
So there is no one to blame, because we are the ones who dig our own grave when we trust and train AI. At least we still have some critical thinking grounded in fundamental knowledge, as I’ll present today.
Our old friend Load Balancer
Load balancers are older than you might think and are a very important component of system design, as they help distribute requests and traffic across multiple servers. The goal of an LB is to ensure availability and reliability and to prevent a single server from becoming overloaded, thereby reducing downtime.
How does an LB work?
- The LB receives a request from a client.
- The LB evaluates the request and chooses which server should receive it. This is done using a predefined algorithm that takes into account factors such as server capacity, response time, number of active connections, and geographic location.
- The LB forwards the received traffic to the selected server.
- The server processes the request and sends a response back to the LB.
- The LB receives the response from the server and sends it to the client that made the request.

Basically a traffic cop who never takes a coffee break
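The five steps above can be sketched in a few lines of Python. This is only a toy model, not how any real load balancer is built: `make_backend`, `LoadBalancer`, and the `choose` callback are names I made up for illustration, and the "servers" are plain functions instead of network endpoints.

```python
from typing import Callable, List

# A backend is modeled as a plain function: request in, response out.
Backend = Callable[[str], str]


def make_backend(name: str) -> Backend:
    """Build a fake backend server (hypothetical, for illustration only)."""
    return lambda request: f"{name} handled {request}"


class LoadBalancer:
    def __init__(self, backends: List[Backend],
                 choose: Callable[[List[Backend], str], Backend]) -> None:
        self.backends = backends
        self.choose = choose  # the balancing algorithm, plugged in from outside

    def handle(self, request: str) -> str:
        # Step 1: the request arrives (this method call).
        backend = self.choose(self.backends, request)  # Step 2: pick a server.
        response = backend(request)  # Steps 3-4: forward it, the server answers.
        return response  # Step 5: send the response back to the client.


# Trivial "algorithm" that always picks the first backend.
lb = LoadBalancer([make_backend("A"), make_backend("B")],
                  choose=lambda backends, request: backends[0])
reply = lb.handle("GET /")
print(reply)  # A handled GET /
```

The interesting part is the `choose` function: every algorithm discussed later in this article is just a different way of making that one decision.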
Key concepts
People often ask this in interviews, so make a note of it.
- Load Balancer: A device or software that distributes network traffic across multiple servers based on predefined rules or algorithms.
- Backend Servers: The servers that receive and process requests forwarded by the LB. Also known as a server pool or server farm.
- Load Balancing Algorithm: The method used by the load balancer to determine how to distribute incoming traffic among the backend servers.
- Health Checks: Periodic tests performed by the LB to determine the availability and performance of backend servers. Servers with problems are removed from the server pool until they recover.
- Session Persistence: A technique used to ensure that subsequent requests from the same client are directed to the same backend server, maintaining session state and providing a consistent user experience.
- SSL/TLS Termination: The process of decrypting SSL/TLS-encrypted traffic at the load balancer level, offloading the decryption burden from backend servers and enabling centralized SSL/TLS management.
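To make the health check idea a bit more concrete, here is a minimal sketch. It assumes the probe is just a function returning True or False; in a real setup it might be an HTTP or TCP check. The names `healthy_pool` and `status` are mine, not from any specific product.

```python
from typing import Callable, Dict, List


def healthy_pool(servers: List[str], probe: Callable[[str], bool]) -> List[str]:
    """Keep only the servers whose health probe succeeds."""
    return [server for server in servers if probe(server)]


# Simulated probe results; a real probe might be an HTTP or TCP check.
status: Dict[str, bool] = {"A": True, "B": False, "C": True}
servers = ["A", "B", "C"]

pool = healthy_pool(servers, probe=lambda s: status[s])
print(pool)  # ['A', 'C']

status["B"] = True  # B recovers, so the next check puts it back in the pool.
pool_after_recovery = healthy_pool(servers, probe=lambda s: status[s])
print(pool_after_recovery)  # ['A', 'B', 'C']
```

Whatever algorithm the LB uses, it would normally pick only from the healthy pool.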
LB Algorithms
The primary goal of an LB algorithm is to ensure efficient use of available resources, improve overall system performance, and maintain high availability and reliability. So, choose your dish based on the flavor of the context you are working in.
1. Round Robin
It simply assigns a request to the first server, then moves on to the second, third, and so on. After reaching the last server, it starts again at the first.
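A minimal sketch of that rotation, assuming three servers named A, B, and C (the names and the `next_server` helper are only for illustration):

```python
from itertools import cycle

servers = ["A", "B", "C"]
rotation = cycle(servers)  # endless A, B, C, A, B, C, ...


def next_server() -> str:
    """Return the next server in the fixed rotation."""
    return next(rotation)


picks = [next_server() for _ in range(5)]
print(picks)  # ['A', 'B', 'C', 'A', 'B']
```

Notice that nothing in here looks at the servers themselves. That is where both the simplicity and the weaknesses come from.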
Pros
- Easy to implement and understand
- Equal distribution of requests, each one gets a turn in a fixed order
- Works well when all servers have the same capacity
Cons
- No Load Awareness: All servers are treated equally, so the current load or capacity of each server is not taken into account.
- No Session Affinity: Subsequent requests from the same client may be directed to different servers, which is very problematic for persistence in stateful applications.
- Performance issues: If servers differ in capacity, the weaker ones can get overloaded while the stronger ones sit underused.
- Predictable Distribution Pattern: Attackers who observe traffic patterns can predict which server will handle a request and use that to target vulnerabilities in specific servers.
Use Cases
- Suitable for environments where all servers have similar capacity and performance.
- Works well for stateless applications

Round Robin works like a queue: each server gets its turn, no favoritism.
2. Least Connections
It assigns requests to the server with the fewest active connections at the time of the request. This ensures a more balanced distribution of load across servers, especially in environments where traffic is unpredictable and request processing times vary.
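Here is a small sketch of the idea, assuming the LB keeps a simple counter of open connections per server. The `LeastConnections` class and its `acquire`/`release` methods are hypothetical names; real load balancers track this internally.

```python
from typing import Dict, List


class LeastConnections:
    def __init__(self, servers: List[str]) -> None:
        # Number of currently open connections per server.
        self.active: Dict[str, int] = {server: 0 for server in servers}

    def acquire(self) -> str:
        """Pick the server with the fewest active connections (ties: first one)."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call this when a connection to the server is closed."""
        self.active[server] -= 1


lc = LeastConnections(["A", "B", "C"])
first, second, third = lc.acquire(), lc.acquire(), lc.acquire()
print(first, second, third)  # A B C

lc.release("B")        # B finishes its request early...
fourth = lc.acquire()  # ...so B is now the least busy server.
print(fourth)  # B
```

The `active` dictionary is exactly the state overhead mentioned in the cons: the LB has to keep it accurate for every open and closed connection.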
Pros
- Efficient for different server configurations
- Better utilization of the servers: It takes into account the current load on each server.
- Dynamic Distribution: Adapts to changing traffic patterns and server loads, ensuring no single server becomes a bottleneck.
Cons
- More complex: Compared to simpler algorithms such as Round Robin, it requires real-time monitoring of active connections.
- State overhead: As it requires maintaining the state of active connections
- Connection spikes: In contexts where the connection duration is short, servers may experience rapid spikes in the number of connections, leading to frequent rebalancing
Use Cases
- Suitable for servers with different capacities and workloads, requiring dynamic load distribution
- Works well for apps with unpredictable traffic, ensuring that no server becomes overloaded
- Very effective for apps with long-lived connections or sessions, as it helps distribute active sessions more evenly.
While Round Robin doesn’t consider the current load and distributes requests in a fixed cyclic order, Least Connections distributes requests based on the current load, directing new requests to the server with the fewest active connections.

The least busy server gets the next request.
3. Weighted Round Robin
WRR is an enhanced version of Round Robin. It assigns weights to each server based on its capacity or performance, distributing incoming requests proportionally according to these weights. This ensures that more powerful servers handle a larger share of the load, while less powerful servers handle a smaller share.
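The simplest way I know to picture this is to repeat each server in the rotation as many times as its weight. The sketch below assumes made-up weights (A is three times stronger than B) and a hypothetical `weighted_rotation` helper. Real load balancers usually interleave the picks more smoothly instead of sending a burst to the same server, but the proportions end up the same.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, Iterator


def weighted_rotation(weights: Dict[str, int]) -> Iterator[str]:
    """Repeat each server `weight` times, then cycle through that list forever."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


rotation = weighted_rotation({"A": 3, "B": 1})  # assumed weights
picks = [next(rotation) for _ in range(8)]
print(picks)  # ['A', 'A', 'A', 'B', 'A', 'A', 'A', 'B']

share = Counter(picks)
print(share["A"], share["B"])  # 6 2
```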
Pros
- Better use of resources, as high-capacity servers process more requests
- Easily adjustable to accommodate changes or additions of new servers.
- Optimize overall system performance by avoiding overloading less powerful servers.
Cons
- Setting appropriate weights for each server can be challenging and requires accurate performance metrics.
- It doesn’t consider real-time server load.
Use Cases
- Ideal for environments with different processing capacities, ensuring efficient use of resources
- Suitable for web apps where different servers have varying performance properties
- Useful in database clusters where some nodes have greater processing power and can handle more queries.

Servers with more power receive more requests in the rotation.
4. Weighted Least Connections
It is a smart way to share work between servers, and it looks at two things:
- Which server is less busy right now
- Which server is stronger
Big, strong servers get more work, while busy servers get a break. It’s like a manager giving more tasks to seniors, but only if they’re not overloaded.
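One common way to combine the two signals is to divide the active connections by the weight and pick the lowest ratio. The sketch below assumes that approach; the weights, connection counts, and the `pick_weighted_least` name are invented for the example.

```python
from typing import Dict


def pick_weighted_least(active: Dict[str, int], weights: Dict[str, int]) -> str:
    """Pick the server with the lowest connections-per-weight ratio."""
    return min(active, key=lambda server: active[server] / weights[server])


weights = {"A": 4, "B": 1}  # assumed: A is four times stronger than B

# A has more connections, but relative to its capacity it is less busy.
quiet = pick_weighted_least({"A": 3, "B": 1}, weights)  # 3/4 vs 1/1
print(quiet)  # A

# Once A is loaded beyond its share, the smaller server gets the request.
busy = pick_weighted_least({"A": 8, "B": 1}, weights)  # 8/4 vs 1/1
print(busy)  # B
```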
Pros
- Dynamic real-time load balancing on each server, ensuring balanced distribution of requests
- For better resource utilization, takes into account the capacity of each server
- Flexibility to handle servers with different configurations
Cons
- More complex compared to Round Robin and Least Connections
- Requires the LB to track both active connections and server weights, as well as requiring accurate performance metrics.
Use Cases
- Ideal for environments where servers have different capacities and workloads
- Suitable for high-traffic applications
- Also useful for database clusters

Strong servers do more work, unless they are already busy.
5. IP Hash
It decides on the server using the client’s IP address. The load balancer does some calculations with the IP and says:
“Ah, you belong to this server,” and the IP always goes to the same server.
This means that the server always remembers the client, no drama.
A simple example would be:
You’ve three servers (A, B, and C), and the client has the IP address 192.168.1.10.
The load balancer converts this IP address into a number, and if the result is 2, it sends the request to server C.
The next time this client returns, it will be the same IP address and the same server.
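The example above can be reproduced with a very naive "hash": take the IP as an integer and use the remainder of dividing by the number of servers. This is just a sketch under that assumption. Real load balancers use a proper hash function, and `server_for` is a name I made up.

```python
import ipaddress

servers = ["A", "B", "C"]


def server_for(client_ip: str) -> str:
    """Map a client IP to a server; the same IP always gives the same answer."""
    # Turn the IP into an integer, then map it onto the server list.
    ip_number = int(ipaddress.ip_address(client_ip))
    index = ip_number % len(servers)
    return servers[index]


first_visit = server_for("192.168.1.10")
second_visit = server_for("192.168.1.10")
print(first_visit, second_visit)  # C C
```

For 192.168.1.10 the integer is 3232235786, and 3232235786 % 3 is 2, which is server C, just like in the example. It also shows the weak spot: change `len(servers)` and the remainder changes for most clients.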
Pros
- Session persistence: Same IP > Same server
- Easy to use, no need to track connections.
- Deterministic, as the same IP always produces the same result.
Cons
- If many users share the same IP or their IPs hash to the same server, one server gets stressed while the others relax.
- Adding or removing a server changes the mapping, so many users suddenly go to a different server.
- It doesn’t care if the server is tired or overloaded, it just follows the IP.
Use Cases
- Stateful apps, like shopping carts and logged-in user sessions.
- Clients in different regions with consistent routing.

The same client IP always goes to the same server.
6. Least Response Time
It sends the request to the fastest server at the moment, not the most powerful or least busy one.
How it works
- The load balancer checks the response speed of each server.
- A new request arrives and is forwarded to the server with the shortest response time.
- If a server slows down, it receives less traffic. If it speeds up again, it receives more.
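A rough sketch of those steps, assuming the LB keeps a moving average of response times per server. The smoothing factor `alpha`, the millisecond values, and the `LeastResponseTime` class are all assumptions for illustration; real products measure and smooth this in their own ways.

```python
from typing import Dict


class LeastResponseTime:
    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha  # how much weight the newest sample gets (assumed)
        self.avg_ms: Dict[str, float] = {}  # smoothed response time per server

    def record(self, server: str, elapsed_ms: float) -> None:
        """Update the moving average with a newly measured response time."""
        if server not in self.avg_ms:
            self.avg_ms[server] = elapsed_ms
        else:
            old = self.avg_ms[server]
            self.avg_ms[server] = (1 - self.alpha) * old + self.alpha * elapsed_ms

    def pick(self) -> str:
        """Pick the server with the lowest average response time so far."""
        return min(self.avg_ms, key=self.avg_ms.get)


lrt = LeastResponseTime()
lrt.record("A", 100.0)
lrt.record("B", 40.0)
lrt.record("C", 80.0)
fastest = lrt.pick()
print(fastest)  # B

lrt.record("B", 400.0)  # B slows down: its average becomes 220.0
fastest_now = lrt.pick()
print(fastest_now)  # C
```

The moving average is one way to soften the "small network issues" problem listed in the cons below: a single slow response moves the number, but does not instantly dominate it.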
Pros
- Requests go to the fastest server > users are satisfied.
- Reacts automatically when servers slow down or speed up.
- Fast servers work harder, slow servers rest.
Cons
- More complex, as it requires monitoring and metrics, not just simple functions.
- Measuring response time adds some overhead of its own.
- Small network issues can cause a server to appear slow for a moment, and traffic can fluctuate too much.
Use Cases
- Real-time apps such as games, streaming, and trading platforms.
- APIs and web services, when fast response times are more important than session memory.
- Great when server performance fluctuates throughout the day.

Requests go to the server that responds the fastest.
7. Random
Yes, it’s exactly what it sounds like: the LB chooses a server at random.
How it works
- You have servers A, B, and C.
- A request comes in.
- The LB randomly chooses a server. Over time, if luck is fair, each server will receive approximately the same number of requests.
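In Python this is basically a one-liner with `random.choice`. The sketch below seeds the generator only so the demo is repeatable; a real LB would not do that. The server names and the `pick_random` helper are made up.

```python
import random
from collections import Counter

servers = ["A", "B", "C"]
rng = random.Random(42)  # seeded only so this demo is repeatable


def pick_random() -> str:
    """Choose any server with equal probability."""
    return rng.choice(servers)


# Over many requests, the counts should end up roughly equal.
counts = Counter(pick_random() for _ in range(3000))
print(counts)
```

Each server should land somewhere near 1000 requests, but never exactly. That is the "if luck is fair" part.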
Pros
- Very simple, easy to understand, and easy to configure.
- Doesn’t track load, speed, or connection, which means less overhead.
- Good with randomness, traffic spreads out over time.
Cons
- Not intelligent, does not know if a server is slow or overloaded.
- A server may receive many requests in a row.
- The same user may access different servers. (terrible for login sessions).
- Random traffic makes attack patterns (such as DDoS) more difficult to detect.
Use cases
- All servers are similar
- Each request is independent (stateless), with no memory requirements
- Simple systems when you want something fast and don’t need sophisticated logic

The server is chosen randomly for each request.
8. Least Bandwidth
Sends traffic to the server that is currently using the least network data.
- If a server is busy serving large downloads, it gets a break.
- If a server has spare bandwidth, it gets the next request.
How it works
- The LB checks how much bandwidth each server is using.
- A new request arrives.
- The request goes to the server that is using the least bandwidth. This keeps network traffic balanced.
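A minimal sketch of the selection step, assuming the LB already knows the current throughput of each server in Mbps. The numbers and the `pick_least_bandwidth` name are invented. How the bandwidth is actually measured is the hard part and is not shown here.

```python
from typing import Dict


def pick_least_bandwidth(mbps_in_use: Dict[str, float]) -> str:
    """Pick the server currently pushing the least network traffic."""
    return min(mbps_in_use, key=mbps_in_use.get)


# Assumed current usage per server, in Mbps.
usage = {"A": 420.0, "B": 35.5, "C": 180.0}
first_pick = pick_least_bandwidth(usage)
print(first_pick)  # B

usage["B"] += 500.0  # B starts serving a large download
second_pick = pick_least_bandwidth(usage)
print(second_pick)  # C
```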
Pros
- Dynamic and intelligent: adjusts in real time based on network usage
- Helps prevent network overload: makes it less likely that one server gets saturated with too much data
- Better use of resources: servers share the network load more evenly.
Cons
- Requires constant monitoring of bandwidth
- Bandwidth measurement consumes some resources
- Small bandwidth spikes can cause traffic fluctuations
Use cases
- High-bandwidth applications such as streaming, file downloads, and large data transfers
- CDNs: fast content delivery without network bottlenecks
- Real-time systems, where low latency and smooth traffic are really important

Traffic is routed to the server with the lowest bandwidth usage.
9. Custom load
Instead of using a fixed rule, you tell the LB what “busy” really means for your application.
Think of it as a personalized meal plan: not everyone eats the same food.
How it works
- Choose what to monitor, for example, CPU, memory, disk, or specific application numbers.
- A monitoring system constantly checks these metrics.
- Create your own rules, such as sending less traffic if the CPU is high or sending more traffic if memory is free.
- Real-time adjustment: traffic automatically changes based on your rules.
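As a sketch, a custom rule can be a score that mixes several metrics, with the LB picking the server that has the lowest score. Everything here is an assumption for illustration: the `Metrics` fields, the 0.6/0.3/0.1 weights, and the queue limit of 100 are numbers I picked, and you would tune them for your own application.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Metrics:
    cpu: float        # CPU usage, 0.0 to 1.0
    memory: float     # memory usage, 0.0 to 1.0
    queue_depth: int  # application-specific: pending jobs


def load_score(m: Metrics) -> float:
    """Lower is better. The weights are arbitrary and meant to be tuned."""
    queue_pressure = min(m.queue_depth / 100, 1.0)  # cap at an assumed limit
    return 0.6 * m.cpu + 0.3 * m.memory + 0.1 * queue_pressure


def pick_custom(metrics: Dict[str, Metrics]) -> str:
    """Pick the server that looks least busy according to our own rule."""
    return min(metrics, key=lambda server: load_score(metrics[server]))


snapshot = {
    "A": Metrics(cpu=0.9, memory=0.2, queue_depth=10),  # score ~0.61
    "B": Metrics(cpu=0.3, memory=0.8, queue_depth=50),  # score ~0.47
    "C": Metrics(cpu=0.5, memory=0.5, queue_depth=0),   # score ~0.45
}
chosen = pick_custom(snapshot)
print(chosen)  # C
```

B has the lowest CPU, but C wins once memory and the queue are part of the picture. That is the whole point of a custom rule, and also why a bad rule leads to bad traffic decisions.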
Pros
- Works exactly the way you want it to.
- Better use of resources by using many signals, not just one.
- Great for complex and constantly changing systems.
Cons
- Complex: more logic = more things to manage
- Monitoring cost: monitoring many metrics consumes resources
- Easy to get wrong: bad rules = bad traffic decisions.
Use cases
- Complex applications with different behaviors and bottlenecks
- Highly dynamic systems where load changes quickly and frequently
- Special needs where standard algorithms are not sufficient.
Yeah, that’s part 1 of 3. I know that maybe nobody is reading this, but tbh, that’s ok. At the end of the day I’m improving my writing and sharpening my knowledge of a fundamental topic, and that already makes it worth it.
I appreciate your time reading this boring article, and just remember, being good at something isn’t a sprint but a marathon that you have to work on every day.
(I’m probably saying it to myself… but maybe it helps you too.)
See ya!
