Erlang n-squared gen_server

One of the nice things about Erlang is its amazing performance. However, in certain circumstances its performance can be surprisingly poor.

As I pointed out in my previous post, work is done in an Erlang application by Erlang processes and each process has an input queue of messages.

The standard way of using Erlang is with OTP. OTP is an acronym that stands for Open Telecom Platform. OTP is a framework and set of support libraries that handle general problems like spawning supervised processes and building whole applications with them as well as a wide variety of general purpose support modules.

Typically processes in an OTP application that do most of the work will implement the gen_server behavior. In general, a gen_server process will wait for a message to arrive in its message queue and then process it before waiting for another. If in between beginning to handle a message and finishing, two or more messages arrive in the process message queue, the process will immediately begin processing the next one at the front of the queue.
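As a concrete illustration, a minimal gen_server callback module looks something like the sketch below. The module name and reply values here are hypothetical, and the optional callbacks are elided down to the essentials:

    -module(example_server).
    -behaviour(gen_server).
    -export([start_link/0, init/1, handle_call/3, handle_cast/2]).

    start_link() ->
        gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

    init([]) ->
        {ok, #{}}.

    %% Synchronous requests are delivered here, one at a time.
    handle_call(Request, _From, State) ->
        {reply, {ok, Request}, State}.

    %% Asynchronous messages are delivered here, one at a time.
    handle_cast(_Msg, State) ->
        {noreply, State}.

The gen_server machinery owns the receive loop; the callback module only ever sees one message at a time.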

Pulling the next message from the front of the queue is very fast. More importantly, it’s an O(1) time complexity operation with respect to the number of messages waiting to be processed. It’s done with syntax that looks like this:

    receive
        Message -> handle_message(Message)
    end

In this example, the first message in the queue is removed from the queue, bound to the variable Message, and passed as a parameter to the function handle_message/1.

However, Erlang has a facility that allows messages to be processed out of order. This is done using an Erlang mechanism called selective receive. Selective receive will scan the message queue of a process from the front to the back looking for a message matching a particular pattern. If no match is found, a selective receive will wait for a matching message to be added to the end. Selective receive is done using syntax like this:

    receive
        some_message -> got_the_message_i_was_looking_for()
    end

In this code snippet, the message queue will be scanned for a message matching the single atom some_message, waiting for it if necessary, and when found, it will be removed from the queue and the function got_the_message_i_was_looking_for/0 will be called.
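For example (a hypothetical snippet, runnable in the shell), if another message is already sitting at the front of the mailbox, the selective receive skips past it and leaves it in the queue:

    self() ! other_message,
    self() ! some_message,
    receive
        some_message -> got_it
    end.
    %% other_message is still in the mailbox, now at the front.

The scan that skips past non-matching messages like other_message is what makes selective receive expensive on a long queue.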

The problem here is that while pulling a message from the front of the queue has O(1) time complexity, using selective receive has O(N) time complexity with respect to the number of messages waiting in the queue to be processed. This is fine if your gen_server process doesn’t handle a lot of messages or else it doesn’t make use of selective receive. However, if your process is a high throughput server process the use of selective receive can be a big problem.

The most common Erlang facility that makes use of selective receive is gen_server:call/2 and gen_server:call/3. These functions make synchronous calls to other gen_server processes. They are implemented by sending a message to the other process (sending a message is always asynchronous) and then waiting, using selective receive, for a response message matching a particular pattern. Regardless of the time complexity of selective receive, it's generally not advisable to wait synchronously for work to be done in a high throughput server process, so this usually isn't an issue.
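Conceptually, gen_server:call works roughly like the sketch below. This is a simplification (the real implementation lives in OTP's gen module and handles more corner cases), but the send-then-selective-receive shape is the same:

    call(Server, Request) ->
        %% The monitor reference doubles as a unique tag for the reply.
        Ref = erlang:monitor(process, Server),
        Server ! {'$gen_call', {self(), Ref}, Request},
        receive
            %% Selective receive: scan the mailbox for our tagged reply.
            {Ref, Reply} ->
                erlang:demonitor(Ref, [flush]),
                Reply;
            %% If the server dies first, the monitor tells us so.
            {'DOWN', Ref, process, Server, Reason} ->
                exit(Reason)
        after 5000 ->
            exit(timeout)
        end.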

The real problem is typically more subtle. This is because a subset of OTP library calls are implemented in terms of selective receive. For example, the OTP library function for sending a packet to a UDP port is implemented by sending a message with the payload to an Erlang port followed by a selective receive to get the result. If, for example, your application sends UDP packets to localhost port 514 to log messages to syslog, you might assume that you could do that directly from a high throughput server process. If you were to do that, your application will probably work fine most of the time. However, if your application ever has a spike in workload or a stall in processing that causes your high throughput server process to get a bit behind, it may take a long time to catch up. If an Erlang process were to have N messages in its message queue and the processing of each message required sending a UDP packet then the O(N) nature of selective receive means that processing N messages has O(N^2) time complexity.

If an Erlang process is continuing to receive more messages at a constant rate, it's possible for it to get far enough behind that it takes more time to process a message than the average time between arriving messages. In that case, it will never catch up. Since each message in a process's message queue takes memory, the accumulating messages will cause the Erlang VM to eventually use all available RAM, followed by increasingly severe swapping.

Here’s a simple demonstration of the problem. The following module definition will send N messages to itself, handle each by sending a UDP packet, and print the length of time it takes to drain its message queue.



-module(time_udp).
-export([start/1]).

start(N) ->
    {ok, Socket} = gen_udp:open(0),
    Seq = lists:seq(1, N),
    %% Fill our own mailbox with N messages.
    lists:foreach(fun(_I) -> self() ! message end, Seq),
    Start = erlang:now(),
    %% Handle each message by sending a UDP packet; gen_udp:send/4
    %% does a selective receive internally to fetch the send result.
    lists:foreach(fun(_I) ->
            receive _Message -> gen_udp:send(Socket, {127, 0, 0, 1}, 1234, <<"message">>) end
        end, Seq),
    End = erlang:now(),
    io:format("processed ~p messages in ~p seconds~n", [N, time_diff(Start, End)]).

time_diff({StartMega, StartSecs, StartMicro}, {EndMega, EndSecs, EndMicro}) ->
    ((EndMega - StartMega) * 1000000.0) + (EndSecs - StartSecs) + ((EndMicro - StartMicro) / 1000000.0).

Now calling start/1 with increasing values of N illustrates the quadratic relationship between N and the time it takes to send N UDP packets. If the relationship were linear, doubling N should roughly double the time. Instead, as the following output shows, doubling N roughly multiplies the time by 4, which is exactly what would be expected if the relationship were quadratic.

$ erl
Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:10] [kernel-poll:false]

Eshell V5.10.4  (abort with ^G)
1> time_udp:start(10000).
processed 10000 messages in 0.343503 seconds
2> time_udp:start(20000).
processed 20000 messages in 1.222438 seconds
3> time_udp:start(40000).
processed 40000 messages in 4.574205 seconds
4> time_udp:start(80000).
processed 80000 messages in 17.498623 seconds

If sending to a UDP socket were implemented with an Erlang built-in function rather than with a message to a port followed by a selective receive, this would not be a problem. As it stands, without writing your own native Erlang module, there is no way to avoid n-squared time complexity when sending UDP packets from a process with a long message queue while using gen_server.*

So how do you write an Erlang application that doesn’t suffer from this problem and still use OTP? The answer is gen_server2. gen_server2 is actually a fork of the OTP gen_server module but with some significant modifications. The objective of the modifications is to minimize the time spent scanning the message queue when doing a selective receive. It accomplishes this by first draining the message queue into local memory before handling the first message in the queue the same way that gen_server would have done. By doing this, the cost of scanning the message queue when doing a selective receive is limited to any messages that have arrived in the message queue since handling of the current message started.
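The draining step can be sketched as follows (an illustration of the idea, not gen_server2's actual code). A receive with an after 0 clause pulls every message currently in the mailbox into a local buffer without ever blocking:

    drain(Buffered) ->
        receive
            Message ->
                drain(queue:in(Message, Buffered))
        after 0 ->
            %% Mailbox is empty; return the messages in arrival order.
            Buffered
        end.

Messages are then dispatched from the local buffer, so any subsequent selective receive only has to scan whatever has arrived since the drain.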

While gen_server2 can solve the n-squared problem caused by a large message backlog in the gen_server processes you've written for your application, it will not eliminate the problem entirely for any OTP application. The reason is that the same problem exists in all supervisor processes using the OTP supervisor behavior. For very busy supervisors the problem can be severe, since the protocol for a supervisor starting a new child process involves a selective receive in the supervisor to get the result of the child process's initialization. Additionally, the Erlang VM automatically starts a number of support processes that are implemented using gen_server.

One such system process that is easy to overload is the error_logger process. Applications generally don't block waiting for the error_logger process to do work, so an accumulation of messages in its queue just causes increased memory and CPU utilization until it catches up (assuming it ever does).

You might think that if your application doesn't send UDP packets, it's safe from this problem (aside from the supervisors and error_logger), so you don't need gen_server2. While gen_udp is the only implementation in which I've tracked this problem down for certain, I strongly suspect the same issue exists in other OTP library calls, though I've not specifically identified any. Since gen_server2 behaves identically to gen_server under normal circumstances and strictly better under abnormal but not necessarily unusual circumstances, I strongly recommend using gen_server2 rather than gen_server in your Erlang applications.

* It is possible to send UDP messages without the gen_udp module and to handle the result message asynchronously, avoiding the selective receive performed inside gen_udp. However, doing so would break gen_udp's encapsulation of the result message format. It can be done, but it's not necessarily a good idea.
