Round-robin Algorithm

Round robin passes each new connection request to the next server in line, eventually distributing connections evenly across the array of machines being load balanced. It works well in most configurations, but it works best when the equipment being load balanced is roughly equal in processing speed, connection speed, and memory.

Round robin is among the most widely used load-balancing algorithms. It's easy to implement and easy to understand. Here's how it works. Let's say you have 2 servers waiting for requests behind your load balancer. Once the first request arrives, the load balancer will forward that request to the 1st server. When the 2nd request arrives (presumably from a different client), that request will then be forwarded to the 2nd server.

Because the 2nd server is the last in this cluster, the next request (i.e., the 3rd) will be forwarded back to the 1st server, the 4th request back to the 2nd server, and so on, in a cyclical fashion.
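The cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not a production balancer: the class name RoundRobinBalancer, the method next_server, and the server names are assumptions made for the example, and the sketch ignores health checks and concurrency.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical balancer that hands out servers in a fixed cyclical order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the list endlessly: 1st, 2nd, ..., last, 1st, ...
        self._servers = cycle(servers)

    def next_server(self):
        """Return the server that should receive the next request."""
        return next(self._servers)


balancer = RoundRobinBalancer(["server1", "server2"])
# Four requests arrive; each one goes to the next server in line.
assignments = [balancer.next_server() for _ in range(4)]
print(assignments)  # ['server1', 'server2', 'server1', 'server2']
```

With two servers, the 3rd request wraps back to the 1st server and the 4th goes to the 2nd, matching the description above.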
As you can see, the method is very simple. However, it won't do well in certain scenarios.

For example, what if Server 1 had more CPU, RAM, and other specs compared to Server 2? Server 1 should be able to handle a higher workload than Server 2, right?

Unfortunately, a load balancer running on a round robin algorithm won't be able to treat the two servers accordingly. In spite of the two servers' disproportionate capacities, the load balancer will still distribute requests equally. As a result, Server 2 can get overloaded faster and probably even go down. You wouldn't want that to happen.

The round robin algorithm is best for clusters consisting of servers with identical specs.

Round-robin load balancing is one of the simplest methods for distributing client requests across a group of servers. Going down the list of servers in the group, the round-robin load balancer forwards a client request to each server in turn. When it reaches the end of the list, the load balancer loops back and goes down the list again (sends the next request to the first listed server, the one after that to the second server, and so on).

The main benefit of round-robin load balancing is that it is extremely simple to implement. However, it does not always result in the most accurate or efficient distribution of traffic, because many round-robin load balancers assume that all servers are the same: currently up, currently handling the same load, and with the same storage and computing capacity. The following variants of the round-robin algorithm take additional factors into account and can result in better load balancing:

Weighted round robin

A weight is assigned to each server based on criteria chosen by the site administrator; the most commonly used criterion is the server’s traffic-handling capacity. The higher the weight, the larger the proportion of client requests the server receives. If, for example, server A is assigned a weight of 3 and server B a weight of 1, the load balancer forwards 3 requests to server A for each 1 it sends to server B.

This algorithm directs traffic in a circular pattern to each node behind a load balancer in succession, with a larger portion of requests being serviced by nodes with a greater weight. It works well when you have two or more servers that are unequal in computing power and available resources. For example, you may want the majority of traffic to go to the server that has the most RAM. Or, if one of your servers hosts several mission-critical applications, you may want to direct the majority of traffic to a different server that hosts fewer applications.
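The weight-3 versus weight-1 example can be sketched as follows. This is one simple way to realize weighted round robin, assuming positive integer weights: each server is repeated in the rotation as many times as its weight. The function name weighted_rotation is hypothetical, and real load balancers often interleave the picks more evenly than this sketch does.

```python
from itertools import cycle


def weighted_rotation(weights):
    """Return an endless rotation in which each server appears `weight` times per cycle.

    `weights` maps a server name to a positive integer weight (an assumption
    of this sketch; fractional weights are not handled).
    """
    schedule = []
    for server, weight in weights.items():
        if weight < 1:
            raise ValueError(f"weight for {server!r} must be at least 1")
        schedule.extend([server] * weight)
    return cycle(schedule)


rotation = weighted_rotation({"A": 3, "B": 1})
first_eight = [next(rotation) for _ in range(8)]
print(first_eight)  # ['A', 'A', 'A', 'B', 'A', 'A', 'A', 'B']
```

Over any full cycle, server A receives 3 requests for each 1 that server B receives.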

Dynamic round robin

A weight is assigned to each server dynamically, based on real-time data about the server’s current load and idle capacity.
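As a rough sketch of the idea, the weights could be recomputed periodically from a snapshot of each server's idle capacity. Everything specific here is an assumption for illustration: the function name dynamic_weights, the 0.0 to 1.0 idle-capacity scale, the max_weight parameter, and the choice to keep a minimum weight of 1 so that a busy server is never dropped from the rotation entirely.

```python
def dynamic_weights(idle_capacity, max_weight=10):
    """Map each server's idle capacity (0.0 = saturated, 1.0 = idle) to an integer weight."""
    weights = {}
    for server, idle in idle_capacity.items():
        if not 0.0 <= idle <= 1.0:
            raise ValueError(f"idle capacity for {server!r} must be between 0 and 1")
        # More idle capacity means a higher weight; never go below 1.
        weights[server] = max(1, round(idle * max_weight))
    return weights


# Hypothetical monitoring snapshot: A is mostly idle, B is busy.
snapshot = {"A": 0.8, "B": 0.2}
weights = dynamic_weights(snapshot)
print(weights)  # {'A': 8, 'B': 2}
```

The resulting weights would then be fed into a weighted round-robin rotation and refreshed whenever new load data arrives.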

Weighted round robin algorithm

Weighted round robin (WRR) is a network scheduling discipline. Each packet flow or connection has its own packet queue in a network interface controller. WRR is the simplest approximation of generalized processor sharing (GPS). While GPS serves infinitesimal amounts of data from each non-empty queue, WRR serves a number of packets from each non-empty queue in every round: number = normalized(weight / mean packet size).

WRR mechanism (pseudo-code):

// calculate number of packets to be served each round by connections

for each flow f
   f.normalized_weight = f.weight / f.mean_packet_size

min = smallest f.normalized_weight over all flows

for each flow f
   f.packets_to_be_served = f.normalized_weight / min

// main loop
loop
   for each non-empty flow queue f
      min(f.packets_to_be_served, f.packets_waiting).times do
         servePacket f.getPacket
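
The pseudo-code above can be turned into a small runnable Python sketch. The function names packets_per_round and wrr_schedule, the flow names, and the packet labels are assumptions made for the example. The sketch also rounds each per-round packet count to a whole number of at least 1 and stops once all queues are empty, which the pseudo-code leaves unspecified.

```python
from collections import deque


def packets_per_round(flows):
    """Compute how many packets each flow may send per round.

    `flows` maps a flow name to a (weight, mean_packet_size) pair.
    """
    normalized = {name: weight / size for name, (weight, size) in flows.items()}
    smallest = min(normalized.values())
    # Rounding to an integer >= 1 is a choice made for this sketch.
    return {name: max(1, round(value / smallest)) for name, value in normalized.items()}


def wrr_schedule(queues, quota):
    """Serve the queues in rounds until all are empty; return the service order."""
    served = []
    while any(queues.values()):
        for name, queue in queues.items():
            # Serve up to the flow's quota, or fewer if the queue runs short.
            for _ in range(min(quota[name], len(queue))):
                served.append(queue.popleft())
    return served


flows = {"f1": (2, 100), "f2": (1, 100)}
quota = packets_per_round(flows)
queues = {"f1": deque(["a1", "a2", "a3", "a4"]), "f2": deque(["b1", "b2"])}
order = wrr_schedule(queues, quota)
print(quota)  # {'f1': 2, 'f2': 1}
print(order)  # ['a1', 'a2', 'b1', 'a3', 'a4', 'b2']
```

With equal mean packet sizes, the flow with twice the weight sends two packets for every one sent by the other flow.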

Deficit round robin algorithm

The deficit round robin (DRR) scheduler scans all non-empty queues in sequence. When a non-empty queue i is selected, its deficit counter is incremented by its quantum value. The value of the deficit counter is then the maximum number of bytes that can be sent in this turn: if the deficit counter is at least the size of the packet at the head of the queue (HoQ), that packet can be sent and the counter is decremented by the packet size. The size of the next packet is then compared to the counter value, and so on. Once the queue is empty or the value of the counter is insufficient, the scheduler skips to the next queue. If the queue is empty, the value of its deficit counter is reset to 0.

Variables and Constants
   const integer N             // Number of queues
   const integer Q[1..N]       // Per-queue quantum
   integer DC[1..N]            // Per-queue deficit counter
   queue queue[1..N]           // The queues

Scheduling Loop
while (true)
    for i in 1..N       
        if not queue[i].empty()
            DC[i]:= DC[i] + Q[i]
            while( not queue[i].empty() and
                         DC[i] >= queue[i].head().size() )
                DC[i]:= DC[i] - queue[i].head().size()
                send( queue[i].head() )
                queue[i].dequeue()
            end while 
            if queue[i].empty()
                DC[i]:= 0
            end if
        end if
    end for
end while
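
The scheduling loop above can be sketched as runnable Python. The function name drr_schedule and the sample packet sizes are assumptions for the example. Unlike the endless loop in the pseudo-code, this sketch stops once every queue is empty so that it terminates, and it represents each packet simply by its size in bytes.

```python
from collections import deque


def drr_schedule(queues, quantum):
    """Run deficit round robin until all queues are empty.

    `queues` is a list of deques holding packet sizes in bytes;
    `quantum` is the per-queue quantum. Returns (queue_index, size) pairs
    in the order the packets are sent.
    """
    if any(q <= 0 for q in quantum):
        raise ValueError("every quantum must be positive")
    deficit = [0] * len(queues)
    sent = []
    while any(queues):
        for i, queue in enumerate(queues):
            if not queue:
                continue
            deficit[i] += quantum[i]
            # Send packets while the head-of-queue packet fits in the deficit.
            while queue and deficit[i] >= queue[0]:
                size = queue.popleft()
                deficit[i] -= size
                sent.append((i, size))
            if not queue:
                # An emptied queue does not carry credit into later rounds.
                deficit[i] = 0
    return sent


queues = [deque([300, 300]), deque([500])]
order = drr_schedule(queues, quantum=[400, 400])
print(order)  # [(0, 300), (0, 300), (1, 500)]
```

In the first round, queue 1 cannot send its 500-byte packet with a 400-byte quantum, so its deficit carries over; in the second round the accumulated 800 bytes of credit are enough.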