10 April 2014 / hazelcast

nginx - Stateless Loadbalancer - Balance your load on nginx without proxying the request

Today I want to show how you can use nginx to build your own stateless load balancer that simply redirects requests to randomly chosen servers. It supports neither sticky sessions nor proxying: it answers with an HTTP 302 redirect that sends your original request to the randomly selected location.

This is practical for many use cases: distributing static content, as you would otherwise do with a Content Delivery Network (CDN), or balancing a cluster of servers that can share your HTTP sessions (for example using Hazelcast). It also works remarkably well for PHP (or node.js) based mini games that always read their data from the database.

nginx is a small, non-blocking, event-driven web server that handles 10k+ connections without problems. Its built-in load balancing proxies each request through nginx to a backend server. If you have dynamic servers or need sticky sessions, that might be what you want. If you serve only static content, or your backend servers can handle stateless sessions, you might not want to proxy the requests, since every connection would have to be held open until the backend has processed the request.

We will create a small Perl script that performs a simple healthcheck (it connects to each server's socket) and selects a random server on each request. If the randomly selected server is not available, we retry up to 10 times. In the end we have either found an active server or we return HTTP status 503 (Service Unavailable).
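Because each retry picks again from all servers, healthy or not, the retry limit matters when only a few backends are up. The small standalone script below is not part of the load balancer; `miss_probability` is a hypothetical helper, and it assumes every pick is uniform and independent. It estimates how often all 10 picks land on unavailable servers, which is when a client would get a 503 even though a healthy server exists.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Probability that all $tries independent, uniformly random picks hit an
# unavailable server when $healthy out of $total servers are up.
# (Picks are made with replacement, like the retry loop in load_balance.)
sub miss_probability {
    my ($total, $healthy, $tries) = @_;
    die "need at least one server\n" if $total < 1;
    my $p_unhealthy = ($total - $healthy) / $total;
    return $p_unhealthy ** $tries;
}

# A few example cluster states, each with 10 tries
for my $case ([2, 1], [4, 1], [10, 1], [10, 5]) {
    my ($total, $healthy) = @$case;
    printf "%2d servers, %2d healthy: %.4f%% of requests get a 503\n",
        $total, $healthy, 100 * miss_probability($total, $healthy, 10);
}
```

With 1 healthy server out of 10, roughly a third of the requests would miss it, so the random retry works best when most servers are up.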
This example is based on Ubuntu, but the setup on other Linux distributions should be similar and differs only in how nginx is installed.

So how do we set it up?

1. Install nginx stable on Ubuntu:
The nginx versions shipped with Ubuntu are sometimes a bit old or lack required functionality, so we set up the nginx PPA first.

$ sudo add-apt-repository ppa:nginx/stable
$ sudo apt-get update
$ sudo apt-get install nginx-extras libio-socket-ssl-perl curl

2. Create the perl load balancer script:

$ sudo nano /etc/nginx/loadbalancer.pm

Let's start with the header of the perl file.

package loadbalancer;
 
use nginx;
use IO::Socket;
 
## Available servers
%servers = (
  "cdn1.example.com" => 0,
  "cdn2.example.com" => 0
);

We set a package name, import the modules we need, and define the servers we want to balance across. The example uses 2 servers, but you can add as many as you like. The initial 0 means 'server is unavailable'; this changes automatically once the healthcheck finds the server reachable. The full script can be downloaded here: loadbalancer.pm

## Request load balancer
sub load_balance {
  # Get the server names from the map
  my @keys = keys(%servers);
  return "No Server Available" unless @keys;

  # Initialize retry counter
  my $retry = 0;

  while ($retry < 10) {
    # Get a random number
    my $rand = int(rand(1000000));

    # Calculate index based on
    # random number and number of servers
    my $index = $rand % scalar(@keys);
    my $selected_server = $keys[$index];

    # If server is activated by healthcheck
    # we can return it to nginx
    if ($servers{$selected_server}) {
      return "http://" . $selected_server;
    }

    # Retry with another one
    $retry++;
  }

  # No server seems available
  return "No Server Available";
}

This subroutine selects a server from the map defined above. It draws a random number and uses the modulo operator to turn it into an index into the list of servers. If the selected server is not available, we simply retry, up to 10 times. You might be tempted to add a pause between retries, but embedded Perl runs synchronously inside the nginx worker process: while it sleeps, that worker cannot serve any other request, including the healthcheck that would update the server states.
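If you would rather never miss a healthy server, a variant is to choose only among the servers the healthcheck marked as active. The following is a standalone sketch, not the code from loadbalancer.pm: the package name `loadbalancer_sketch` and the sub `pick_active` are hypothetical, and the server states are hard-coded instead of being set by the healthcheck.

```perl
package loadbalancer_sketch;   # hypothetical package, for demonstration only
use strict;
use warnings;

# Same shape as %servers in loadbalancer.pm: "host[:port]" => 1 (up) or 0 (down).
# Hard-coded here; in the real module the healthcheck fills in these values.
our %servers = (
    "cdn1.example.com"      => 1,
    "cdn2.example.com"      => 0,
    "cdn3.example.com:8080" => 1,
);

# Pick uniformly among active servers only, so a healthy server is always
# found when at least one exists and no retry loop is needed.
sub pick_active {
    my @active = grep { $servers{$_} } sort keys %servers;
    return "No Server Available" unless @active;
    return "http://" . $active[ int(rand(@active)) ];
}

package main;

for (1 .. 3) {
    my $url = loadbalancer_sketch::pick_active();
    print "$url\n";
}
```

The return values match what the nginx configuration expects: either a URL prefix or the string "No Server Available".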

## Connects to the given servers one by one
## and checks availability
sub healthcheck {
  # Request object when called by nginx, undef when called at startup
  my $r = shift;

  # Collect the new availabilities here
  my %update;

  # Loop through servers
  foreach my $key (keys %servers) {
    # Select port, defaults to 80
    my $host = $key;
    my $port = 80;
    if (index($key, ':') != -1) {
      ($host, $port) = split(/:/, $key, 2);
      $port = int($port);
    }

    # Connect to server
    my $socket = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 5
    );

    # Is server connectable?
    if (defined $socket) {
      $update{$key} = 1;
      $socket->close();
    } else {
      $update{$key} = 0;
    }
  }

  # Update servers variable with availabilities
  %servers = %update;

  # When called as an nginx content handler, send a short response
  if (defined $r) {
    $r->send_http_header("text/plain");
    $r->print("Thanks\n") unless $r->header_only;
  }
  return OK;
}

This subroutine implements the healthcheck itself. It iterates over all entries in the servers map and tries to connect to each address. An entry can be either a plain hostname (then port 80 is assumed) or hostname:port for a different port. Since the socket is only connected and no data is read, this works for both HTTP and HTTPS. Keep in mind that the check runs inside an nginx worker and blocks it while connecting, potentially for several seconds per unreachable server.
Now we finalize the file so that nginx is happy at startup, and we run our first healthcheck immediately when the module is loaded.

# Initiate immediate health check on server start
healthcheck();
 
1;
__END__
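To see which backends the healthcheck would mark as available without going through nginx, the same TCP-connect check can be run by hand. This is a standalone sketch, not part of loadbalancer.pm; the script name check_backends.pl and the helpers `split_host_port` and `is_reachable` are hypothetical.

```perl
#!/usr/bin/perl
# check_backends.pl (hypothetical name): standalone TCP-connect check,
# using the same "host" / "host:port" convention as %servers.
use strict;
use warnings;
use IO::Socket::INET;

# Split "host" or "host:port" into (host, port); the port defaults to 80.
sub split_host_port {
    my ($entry) = @_;
    return ($1, int($2)) if $entry =~ /^([^:]+):(\d+)$/;
    return ($entry, 80);
}

# Returns 1 if a TCP connection can be opened within $timeout seconds
# (default 5), else 0. No data is sent or read, so HTTP and HTTPS both work.
sub is_reachable {
    my ($entry, $timeout) = @_;
    my ($host, $port) = split_host_port($entry);
    my $socket = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout // 5,
    );
    return 0 unless $socket;
    $socket->close();
    return 1;
}

# Check every server given on the command line
for my $entry (@ARGV) {
    printf "%-30s %s\n", $entry, is_reachable($entry) ? "up" : "down";
}
```

Usage, for example: `perl check_backends.pl cdn1.example.com cdn2.example.com:8080`.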

3. Configure nginx:

We open the default site configuration and configure our location endpoints.

$ sudo nano /etc/nginx/sites-enabled/default

Delete the complete content and paste in the configuration below:

perl_require /etc/nginx/loadbalancer.pm;
perl_set $redirectSite loadbalancer::load_balance;

server {
  listen   80 default_server;
  server_name  cdn.example.com;

  access_log  /var/log/nginx/localhost.access.log;

  location / {
    if ($redirectSite = "No Server Available") {
      return 503;
    }

    # Redirect to the selected server, keeping the original URI and query string
    return 302 $redirectSite$request_uri;
  }

  location /healthcheck {
    allow 127.0.0.1;
    deny all;
    default_type text/plain;
    perl loadbalancer::healthcheck;
  }
}

The first two lines embed our loadbalancer.pm Perl module into the configuration and bind the load_balance subroutine to a variable, so it is executed for every request that uses it. The result is stored in the $redirectSite variable for later use.
We configure our default site to listen on port 80 for whatever hostname the request uses and set a server name for this host. Additionally we set up an access log (in the expected location) and create the default location "/".
Inside it, the $redirectSite variable is checked for the string that signals that no server is available; in that case we return HTTP status 503. Otherwise we use the URL in the variable as the target of an HTTP 302 redirect.

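The location "/healthcheck" is reachable only from localhost and runs the healthcheck subroutine. In a moment we will set up a cronjob to call it periodically. Be aware that every nginx worker process keeps its own copy of %servers, so a healthcheck request only updates the worker that happens to serve it; with worker_processes 1 this is not an issue.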

$ sudo service nginx restart

4. Test server and configure cronjob:

First we test whether our load balancer works as expected by requesting a healthcheck and a URL redirect.

$ curl http://127.0.0.1/healthcheck
Thanks
$ curl http://127.0.0.1
<html>
<head><title>302 Found</title></head>
<body bgcolor="white">
<center><h1>302 Found</h1></center>
<hr><center>nginx/1.4.1</center>
</body>
</html>

If you see something similar to the above, you're fine; otherwise start again at step 1 ;-)
Now we configure a cronjob that periodically calls the healthcheck so the load balancer notices server failures. We use a one-minute interval; if you need to detect failed servers faster, you need another scheduler, since cron cannot run jobs more often than once per minute.

$ sudo crontab -e
* * * * * curl -s -o /dev/null http://127.0.0.1/healthcheck

Exit the editor and save the new configuration; it will be installed automatically. For better security you might want to run the healthcheck job as a user other than root. Then make sure your healthcheck works as expected:

$ tail -f /var/log/nginx/localhost.access.log

If healthcheck requests show up regularly, everything is ready for testing your load balancing in a real browser. The more backend servers you add, the more the load is spread. Instead of the randomly selected server shown here, you could also use a counter that is incremented on every request to get round-robin-like behaviour. The basic script should make it easy to add other ways of balancing your load, and I would be happy to see more additions in the comments below.
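A round-robin variant could look like the following standalone sketch. The sub name `round_robin` is hypothetical and %servers is hard-coded here instead of being filled by the healthcheck. Inside nginx, each worker process would keep its own counter, so the rotation would be per worker rather than global.

```perl
use strict;
use warnings;

# Hard-coded availability map for the demo (host => 1 up / 0 down)
our %servers = (
    "cdn1.example.com" => 1,
    "cdn2.example.com" => 1,
    "cdn3.example.com" => 0,
);

# Counter incremented on every call; it walks through the active servers
my $counter = 0;

sub round_robin {
    # Sort so the rotation order is stable between calls
    my @active = grep { $servers{$_} } sort keys %servers;
    return "No Server Available" unless @active;
    my $selected = $active[ $counter++ % @active ];
    return "http://" . $selected;
}

for (1 .. 4) {
    my $url = round_robin();
    print "$url\n";
}
```

Skipping inactive servers before applying the counter means a down server simply drops out of the rotation until the healthcheck marks it as available again.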

PS: I'm not a Perl programmer, so I'm almost sure the code above can be written more simply or more elegantly. I'm open to suggestions :)
