RFC: Proc::Governor

Here is the documentation for a little module I threw together after one of our services effectively mounted a denial-of-service attack against another of our services. The math for this simple trick works out very neatly.

I plan to upload this to CPAN very soon. Please let me know what you think.

NAME

Proc::Governor - Automatically prevent over-consumption of resources.

SYNOPSIS

    use Proc::Governor();

    my $gov = Proc::Governor->new();

    while( ... ) {
        $gov->breathe();
        ... # Use resources
    }

    while( ... ) {
        my $res = $gov->work( sub {
            ... # Use Service
        } );
        ...
    }

DESCRIPTION

If you want to do a batch of processing as fast as possible, then you should probably also worry about overwhelming some resource and causing problems for other tasks that must share that resource. Fortunately, there is a simple trick that allows one to perform a batch of processing as fast as possible while automatically backing off resource consumption when most any involved resource starts to become a bottleneck (or even before it has become much of a bottleneck).

The simple trick is to pause between steps for a duration equal to how long the prior step took to complete. The one minor down-side to this is that a single strand of execution can only go about 1/2 maximum speed. But if you have 2 or more strands (processes or threads), then throughput is not limited by this simple "universal governor" trick.
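
As a rough illustration of the trick (a minimal sketch, not the module's implementation), the loop below times each step and then pauses for that long. The names governed_loop, $clock, and $pause are made up for this example; the clock and sleeper are passed in only so the loop can be tried without really sleeping.

```perl
use strict;
use warnings;
use Time::HiRes qw( time sleep );

# Run each step, then pause for as long as that step took.
# $clock and $pause are hypothetical, injectable hooks; by default
# they are the real Time::HiRes time() and sleep().
sub governed_loop {
    my( $steps, $clock, $pause ) = @_;
    $clock ||= \&time;
    $pause ||= \&sleep;
    my $paused = 0;                     # Total seconds spent pausing
    for my $step ( @$steps ) {
        my $start = $clock->();
        $step->();                      # Use resources
        my $took = $clock->() - $start;
        $pause->( $took ) if 0 < $took; # Pause as long as the step took
        $paused += $took;
    }
    return $paused;
}
```

With steps that take 0.25s and 0.5s, this pauses for 0.25s and then 0.5s, so 0.75s of work takes 1.5s of wall-clock time. That is the "about 1/2 maximum speed" of a single strand.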

It is also easy to slightly modify this trick so that, no matter how many strands you have working, they together (without any coordination or communication between the strands) will never consume more than, say, 60% of any resource (on average).

A typical pattern for batch processing is a client sending a series of requests to a server over a network. But the universal governor trick also works in lots of other situations such as with 1 or more strands where each is doing a series of calculations and you don't want the collection of strands to use more than X% of the system's CPU.

Note that the universal governor does not work well for resources that remain consumed while a process is sleep()ing, such as your process using too much memory.

Proc::Governor provides lots of simple ways to incorporate this trick into your code so that you don't have to worry about your code becoming a "denial-of-service attack", which also frees you to split your processing among many strands of execution in order to get it done as fast as possible.

METHODS

new()

    my $gov = Proc::Governor->new( {
        working     => 0,
        minSeconds  => 0.01,
        maxPercent  => 100,
        unsafe      => 0,
    } );

new() constructs a new Proc::Governor object for tracking how much time has recently been spent potentially consuming resources and how much time has recently been spent not consuming resources.

new() takes a single, optional argument of a reference to a hash of options. The following option names are currently supported:

working

If given a true value, then the time spent immediately after the call to new() is counted as "working" (consuming resources). By default, the time spent immediately after the call to new() is counted as "not working" (not consuming).

minSeconds

minSeconds specifies the shortest duration for which a pause should be done. If a pause is requested but the calculated pause duration is shorter than the number of seconds specified for minSeconds, then no pause happens (and that calculated duration is effectively added to the next pause duration).

The default for minSeconds is 0.01.
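
The carry-forward behavior can be pictured with a small closure. This is only a sketch of the idea described above; make_pauser and $owed are names invented for the example, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that is given each newly
# calculated pause duration and returns how long to really pause.
sub make_pauser {
    my( $min_seconds ) = @_;
    my $owed = 0;                           # Pause time not yet taken
    return sub {
        my( $calculated ) = @_;
        $owed += $calculated;
        return 0 if $owed < $min_seconds;   # Too short; carry it forward
        my $pause = $owed;
        $owed = 0;
        return $pause;
    };
}
```

With minSeconds of 0.01, three calculated pauses of 0.004 seconds each produce no pause, no pause, and then a single pause of about 0.012 seconds.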

maxPercent

maxPercent indicates how much of any particular resource the collection of strands should be allowed to consume. The default is 100 (for 100%, meaning the strands may use all of a resource but avoid building up a backlog by trying to over-consume it).

Note that percentages are not simply additive. Having 3 groups of clients where each is set to not consume more than 75% of the same service's resources is the same as having just 1 group. The 3 groups together will not consume more than 75% of the service's resources in total.

Say you have a group of clients, H, all set to not consume more than 50% of some service's resources and you have another group of clients, Q, all set to not consume more than 25% of that same service's resources. Both H and Q together will not add up to consuming more than 50% of the service's resources.

If Q is managing to consume 20% of the service's resources when H starts running, then H won't be able to consume more than 30% of the service's resources without (slightly) impacting performance to the point that Q starts consuming less than 20%.

     H   Q   Total
    50%  0%   50%
    40% 10%   50%
    30% 20%   50%
    25% 25%   50%

unsafe

You can actually specify a maxPercent value larger than 100, perhaps because you have measured overhead that isn't easily accounted for by the client. But doing so risks overloading a resource (your measured overhead could end up being a much smaller percentage of the request time when the service is near capacity).
So specifying a maxPercent of more than 100 is fatal unless you also specify a true value for unsafe.
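
The rule for unsafe could be checked along these lines. This is a sketch only: check_options and %DEFAULTS are hypothetical names, the defaults simply mirror the values shown in the new() example above, and the error text is invented.

```perl
use strict;
use warnings;
use Carp qw( croak );

# Assumed defaults, copied from the new() example above.
my %DEFAULTS = (
    working     => 0,
    minSeconds  => 0.01,
    maxPercent  => 100,
    unsafe      => 0,
);

# Hypothetical helper: merge caller options over the defaults and
# refuse a maxPercent above 100 unless unsafe is true.
sub check_options {
    my( $opts ) = @_;
    $opts ||= {};
    my %o = ( %DEFAULTS, %$opts );
    croak "maxPercent over 100 requires a true value for unsafe"
        if 100 < $o{maxPercent} && ! $o{unsafe};
    return \%o;
}
```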

beginWork()

    $gov->beginWork( $breathe );

Calling beginWork() means that the time spent immediately after the call is counted as "working" (consuming resources). Such time adds to how long the next pause will be.

If $breathe is a true value, then beginWork() may put the strand to sleep for an appropriate duration.

endWork()

    $gov->endWork( $breathe );

Calling endWork() means that the time spent immediately after the call is counted as "not working" (not consuming resources). Such time subtracts from how long the next pause will be.

If $breathe is a true value, then endWork() may put the strand to sleep for an appropriate duration.
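
One way to picture the bookkeeping behind beginWork() and endWork() is a small ledger of pause time owed. The sketch below assumes a maxPercent of 100 and assumes that surplus idle time is not banked (the documentation above does not say either way). Sketch::Ledger and its method names are hypothetical, not the module's internals.

```perl
use strict;
use warnings;
use Time::HiRes ();

{
    package Sketch::Ledger;     # Hypothetical name, for illustration only

    sub new {
        my( $class, $clock ) = @_;
        $clock ||= \&Time::HiRes::time;
        return bless {
            clock => $clock, since => $clock->(), working => 0, owed => 0,
        }, $class;
    }

    # Account for the time since the last state change:
    # working time adds to the pause owed, idle time subtracts.
    sub _settle {
        my( $self ) = @_;
        my $now = $self->{clock}->();
        my $span = $now - $self->{since};
        $self->{owed} += $self->{working} ? $span : -$span;
        $self->{owed} = 0 if $self->{owed} < 0;    # Assumption: no banking
        $self->{since} = $now;
        return;
    }

    sub begin_work { my( $self ) = @_; $self->_settle(); $self->{working} = 1; return }
    sub end_work   { my( $self ) = @_; $self->_settle(); $self->{working} = 0; return }
    sub owed       { my( $self ) = @_; $self->_settle(); return $self->{owed} }
}
```

After 2 seconds of "working" time, such a ledger owes a 2 second pause; each second then spent "not working" reduces what is owed by a second.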

work()

    $gov->work( sub {
        ... # Consume resources
    }, $which );

work() is a convenient shortcut that is roughly equivalent to:

    $gov->beginWork( $before );
    ... # Consume resources
    $gov->endWork( $after );

The value of $which selects when a pause may happen (that is, whether $before and/or $after above are true). It can be:

    0   No pause will happen.
    1   A pause may happen before the sub reference is called.
    2   A pause may happen after the sub reference is called.
    3   A pause may happen before and/or after the sub is called.

If $which is not given or is undefined, then a value of 1 is used.

You can actually get a return value through work():

    my @a = $gov->work( sub { ...; get_list() }, $which );
    my $s = $gov->work( sub { ...; get_item() }, $which );

Note that scalar or list (or void) context is preserved.

Currently, if your code throws an exception, then endWork() does not get called. This is the same as would happen with the "equivalent" code shown above.
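
If you need the "end" step to run even when your code dies, you can wrap the calls yourself. The sketch below shows one way to do that while still preserving the caller's context via wantarray. wrap_work is a hypothetical helper (not part of the module); the $begin and $end hooks stand in for calls such as beginWork() and endWork().

```perl
use strict;
use warnings;

# Hypothetical helper: call $code between $begin and $end, preserving
# the caller's context and running $end even if $code throws.
sub wrap_work {
    my( $begin, $code, $end ) = @_;
    my $want = wantarray;       # Capture context before entering eval
    my @ret;
    $begin->();
    my $ok = eval {
        if( $want ) {
            @ret = $code->();           # List context
        } elsif( defined $want ) {
            $ret[0] = $code->();        # Scalar context
        } else {
            $code->();                  # Void context
        }
        1;
    };
    my $err = $@;               # Save before $end can clobber it
    $end->();
    die $err if ! $ok;          # Re-throw after cleaning up
    return $want ? @ret : $ret[0];
}
```

It would be called like wrap_work( sub { $gov->beginWork(1) }, sub { ... }, sub { $gov->endWork() } ).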

breathe()

    $gov->breathe( $begin );

Calling breathe() requests that the current process/thread pause for an appropriate duration.

Each of the following:

    $gov->breathe();
    # or
    $gov->breathe( 1 );

is actually equivalent to:

    $gov->beginWork( 1 );

While

    $gov->breathe( 0 );

will just pause but will not change whether $gov is counting time as "working" or as "not working".

pulse()

    $gov->pulse( $count, $begin );

pulse() is very much like breathe() except that it is optimized for being called many times before enough "working" time has accumulated to justify doing a pause. The meaning of $begin is the same as with breathe().

So, if you are making requests of a very fast service or are doing work in small chunks, then you can call pulse() directly in your loop and just pass it a value specifying approximately how many calls to pulse() should be made before one of those calls does the work of calculating how long of a pause is called for.

For example, a request to our Redis service typically takes a bit under 1ms. So code to perform a large number of such requests back-to-back might be written like:

    my $gov = Proc::Governor->new( {
        maxPercent  => 70,
        working     => 1,
    } );
    my $redis = Redis->new(server=>...);
    while( ... ) {
        $gov->pulse( 20 );
        $redis->...;
    }

That is like calling breathe() every 20th time through the loop and is only the slightest bit less efficient (in run time) than if you had made the extra effort to write:

    ...
    my $count = 0;
    while( ... ) {
        if( 20 <= ++$count ) {
            $gov->breathe();
            $count = 0;
        }
        ...
    }

CROSS-OBJECT INTERACTIONS

A single process (or thread) can simultaneously use more than one Proc::Governor object. For example, each process (of a group) that makes a series of requests to a service and does significant local processing of the data from each request might want to both prevent overwhelming the service and prevent overwhelming local resources (such as CPU).

So you could have two Proc::Governor objects. One throttles use of local resources ($g_cpu below). The other throttles use of service resources ($g_db below).

    my $g_cpu = Proc::Governor->new( { maxPercent => 80 } );
    my $g_db =  Proc::Governor->new( { maxPercent => 30 } );

    $g_db->beginWork();
    my $db = DBI->connect( ... );       # DB work
    my $rows = $db->selectall_arrayref( ... );
    $g_db->endWork();

    for my $row ( @$rows ) {
        my $upd = $g_cpu->work( sub {
            process_row( $row );        # Local work
        } );
        $g_db->work( sub {
            $db->update_row( $upd );    # DB work
        } );
    }

The above code assumes that the local resources required for making requests of the database service are relatively low. It also recognizes that local computations do not use database resources.

If you set maxPercent to 100 for both Governors and each process spent about the same amount of time waiting for a response from the database as it spent performing local computations, then there might be no need for any pauses.

Note that only time spent doing "DB work" adds to how long of a pause might be performed by the $g_db Governor. And only time spent doing "Local work" adds to how long of a pause might be performed by the $g_cpu Governor.

Any pauses executed by either Governor get subtracted from the duration of any pauses of any Governor objects. So the $g_db Governor executing a pause also counts as a pause for the $g_cpu Governor (and thus makes the next pause that it performs either shorter or later or just not needed).
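
A much-simplified picture of that sharing: keep one per-process total of seconds paused, and have each governor deduct whatever was added to that total since it last looked. This is only a sketch under that assumption, not the module's actual accounting; every name below ($TotalPaused, new_gov, add_work, pause_due, do_pause) is made up for the example.

```perl
use strict;
use warnings;

my $TotalPaused = 0;    # Seconds paused by any governor in this process

sub new_gov { return { owed => 0, seen => $TotalPaused } }

# Credit this governor with pauses done (by anyone) since it last looked.
sub pause_due {
    my( $gov ) = @_;
    $gov->{owed} -= $TotalPaused - $gov->{seen};
    $gov->{owed} = 0 if $gov->{owed} < 0;
    $gov->{seen} = $TotalPaused;
    return $gov->{owed};
}

# Record $secs of "working" time against this governor.
sub add_work {
    my( $gov, $secs ) = @_;
    pause_due( $gov );          # Settle earlier pauses first
    $gov->{owed} += $secs;
    return;
}

# Pause for whatever is still owed; $sleeper is an injectable sleep().
sub do_pause {
    my( $gov, $sleeper ) = @_;
    my $secs = pause_due( $gov );
    return 0 if $secs <= 0;
    $sleeper->( $secs );
    $TotalPaused += $secs;
    $gov->{owed} = 0;
    $gov->{seen} = $TotalPaused;
    return $secs;
}
```

If one governor owes 3 seconds and another owes 2, then after the first pauses for 3 seconds the second has nothing left to pause for.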

Time spent inside of Proc::Governor methods may also be subtracted from future pause durations. But the code pays more attention to keeping such overhead small than to providing highly accurate accounting of the overhead and trying to subtract such from every Governor object.

WHEN TO PAUSE

Say you have a service that is a layer in front of some other service. You want to ensure that your service can't become a denial-of-service attack against the other service. But you want to prevent a Governor pause from impacting clients of your service when possible.

You could implement such as follows:

    sub handle_request {
        my( $req ) = @_;
        our $Gov ||= Proc::Governor->new();
        my $res = $Gov->work( sub {
            forward_request( $req );
        }, 0 );                 # Don't pause here.
        give_response( $res );
        $Gov->breathe( 0 );     # Pause here; still idle.
    }

(Well, so long as your service architecture supports returning a complete response before the request handler subroutine has returned.)

If the other service is not near capacity, then the added pauses have no impact (other than perhaps preventing the number of active strands for your service from dropping lower). Be sure your service has an appropriate cap on how many strands it is allowed to keep active (as always).

TO-DO

A future version should have support for asynchronous processing. The shape of that interface is already sketched out, but I did not want to delay the initial release in order to implement it.

- tye

Re: RFC: Proc::Governor by zentara (Cardinal) on Jul 28, 2014 at 09:47 UTC
I like the naming of the methods breathe and pulse, it makes it seem ALIVE. :-) I'm not really a human, but I play one on earth. Old Perl Programmer Haiku ................... flash japh
Re: RFC: Proc::Governor by RonW (Parson) on Jul 28, 2014 at 16:42 UTC
Very well thought out. I think the test engineering group at my employer would find this useful.
Re: RFC: Proc::Governor by locked_user sundialsvc4 (Abbot) on Jul 28, 2014 at 17:26 UTC
I think that it is excellent, and that it should be posted to CPAN without delay. “Well done.”