I know two prints are slower than one, but 200x slower?!?!?

rsmah has asked for the wisdom of the Perl Monks concerning the following question:

I was doing a bit of experimenting with writing a socket server and found some weird behavior that I just don't understand.

I have a simple forking server that echoes data back to the client. Connections are persistent, so multiple requests can take place over a single connection. The client simply opens a connection, sends a line of text, reads lines back until it sees a line containing only a period, repeats this many times, and then exits.

When server.pl prints its output to the client using 1 print statement, it's lickety-split fast. When it uses 2 or more print statements, it's dead slow. Not just a little slower, but over 100x slower! I can't understand why.

Let's run a test using a single print statement in the server (the arg 1 means use 1 print statement):

$ perl server-fork.pl 1 &
$ time perl client.pl 1000 > /dev/null
real    0m0.163s
user    0m0.093s
sys     0m0.021s

And now using 2 print statements:

$ perl server-fork.pl 0 &
$ time perl client.pl 1000 > /dev/null
real    0m40.018s
user    0m0.052s
sys     0m0.015s

That's about 245 times slower! Now, I'd expect it to be a bit slower, and I could even accept 2x or 5x slower, but 200x slower?!?! This just makes no sense to me. Can someone who understands how the perl stdio layer interacts with TCP sockets on Linux explain this?

A few notes that might help:

When the experiment is repeated using UNIX domain sockets, there is no slowdown.
When the test is performed on WinXP, the slowdown is only a few percent.
Changing the size of the echoed data (up to 4KB) or the number of print statements doesn't materially affect the timings.
At 4KB of data transferred, performance with 1 print statement degrades to the level of 2+ print statements.

And finally, here's my stupid simple code:

The server.pl is:

use strict;
use IO::Select;
use IO::Socket::INET;

$SIG{CHLD} = 'IGNORE';

my $one_print = shift @ARGV || 0;

my $listener = IO::Socket::INET->new(Listen => 1, LocalPort => 8080);
my $select = IO::Select->new($listener);

while( my @ready = $select->can_read ){
    foreach( @ready ){
        if( $_ == $listener ){
            my $sock = $listener->accept;
            my $pid = fork;
            if( defined $pid && $pid == 0 ){
                while( 1 ){
                    my $line = <$sock>;
                    last unless defined $line;
                    if( $line =~ /^QUIT/ ){
                        print $sock "Goodbye\n";
                        $sock->close;
                        last;
                    }elsif( $one_print ){
                        print $sock $line, ".\n";
                    }else{
                        print $sock $line;
                        print $sock ".\n";
                    }
                } # while read loop
                exit(0);
            }
        }
    }
}

The client.pl is:

use strict;
use IO::Socket::INET;

my $trials = shift @ARGV;
my $sock = IO::Socket::INET->new(PeerHost => 'localhost', PeerPort => 8080);

for(1..$trials){
    my $tm = time;
    print $sock "Hello\n";
    while(my $line = <$sock>){
        last if $line eq ".\n";
        print $line;
    }
}
print $sock "QUIT\n";


Replies are listed 'Best First'.
Re: I know two prints are slower than one, but 200x slower?!?!? by kyle (Abbot) on Aug 16, 2007 at 00:58 UTC
I reproduced the time difference on Mac OS X (real 1.120s vs. 0.381s). The time difference seems to disappear when I change the server code to read: `... my $pid = fork; if ( ! defined $pid ) { die "Can't fork: $!" } if ( $pid ) { close $sock; } else { while( 1 ){ ...` (Each "`...`" is code I didn't change. The salient changes are (1) it will die when it can't fork, and (2) the parent will close the socket it won't be using anyway.) I'm not sure why having the parent leave the socket open causes the problem you're having, but it doesn't surprise me that it causes some problem. Also, you should be checking (in the client and in the server) that your sockets are created successfully. In some runs, but not all, I got a message that implies to me that the call to `IO::Socket::INET->new()` failed.
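For reference, here is a minimal sketch of the accept/fork pattern kyle describes, with the fork result checked and the parent closing its copy of the connected socket. The helper names `serve_one` and `echo_lines` are made up for this illustration and do not appear in the original server; the echo handler assumes the same line protocol as the original post and uses the single-print form.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: echo each line followed by ".\n" until QUIT or EOF.
sub echo_lines {
    my ($sock) = @_;
    while ( defined( my $line = <$sock> ) ) {
        if ( $line =~ /^QUIT/ ) {
            print $sock "Goodbye\n";
            last;
        }
        print $sock $line, ".\n";    # one print per response
    }
    close $sock;
}

# Hypothetical helper: accept one connection and hand it to a forked child.
# Returns the child's pid in the parent; never returns in the child.
sub serve_one {
    my ( $listener, $handler ) = @_;

    my $sock = $listener->accept or die "accept failed: $!";

    my $pid = fork;
    die "Can't fork: $!" unless defined $pid;

    if ($pid) {
        close $sock;    # parent: drop the copy it will not be using
        return $pid;
    }

    $handler->($sock);  # child: serve the connection, then exit
    exit 0;
}
```

A caller would create the listener with `IO::Socket::INET->new(...) or die ...` (checking for failure, as suggested above) and call `serve_one($listener, \&echo_lines)` whenever the listener is readable.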
Re^2: I know two prints are slower than one, but 200x slower?!?!? by rsmah (Scribe) on Aug 16, 2007 at 01:04 UTC
Wow, that's it -- fixes the problems on Linux too! But so weird. Anyway, you the man, kyle!
Re: I know two prints are slower than one, but 200x slower?!?!? by jbert (Priest) on Aug 16, 2007 at 08:43 UTC
I think you're getting Nagled. From the Wikipedia article: "With both algorithms enabled, applications which do two successive writes to a TCP connection, followed by a read, experience a constant delay of up to 500 milliseconds, the 'ACK delay'." Adding `setsockopt($sock, IPPROTO_TCP, TCP_NODELAY, 1);` just after your accept seems to clear the delay in the two print case, confirming the guess. Of course, this now raises two other questions: Why does closing the socket in the parent make a difference? Why doesn't this trigger on Windows (which surely has these basic TCP algorithms enabled by default too)?
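A small sketch of what that `setsockopt` call needs in order to compile under `use strict`: the two constants come from the core `Socket` module. The helper names `disable_nagle` and `nagle_disabled` are invented for this example; the second one only reads the option back so you can confirm it took effect.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(IPPROTO_TCP TCP_NODELAY);

# Hypothetical helper: turn off Nagle's algorithm on a connected TCP socket.
sub disable_nagle {
    my ($sock) = @_;
    setsockopt( $sock, IPPROTO_TCP, TCP_NODELAY, 1 )
        or die "setsockopt(TCP_NODELAY) failed: $!";
    return $sock;
}

# Hypothetical helper: report whether TCP_NODELAY is currently set.
sub nagle_disabled {
    my ($sock) = @_;
    my $packed = getsockopt( $sock, IPPROTO_TCP, TCP_NODELAY )
        or die "getsockopt(TCP_NODELAY) failed: $!";
    return unpack( 'i', $packed ) ? 1 : 0;
}
```

In the server from the original post this would be called on `$sock` right after `$listener->accept`. Whether it is the right trade-off depends on the traffic pattern, as the replies below discuss.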
Re^2: I know two prints are slower than one, but 200x slower?!?!? by cdarke (Prior) on Aug 16, 2007 at 11:03 UTC
The thing about the Nagle algorithm is that you can get wildly different results depending on the activity on the network. Perversely, a quiet network can give worse performance figures, since there are no extra packets for the ACK to piggy-back on. BTW, the Nagle algorithm is there to reduce collisions, so be careful of using TCP_NODELAY - that will make a bad network worse.
Re^3: I know two prints are slower than one, but 200x slower?!?!? by halley (Prior) on Aug 16, 2007 at 13:55 UTC
Exactly right. Candidates for turning off Nagle's Algorithm are applications which will create a trickle of a few small packets on a probably-quiet network, and yet should still be pretty low latency. For example, a user who is typing would find it really annoying to suffer lag on full-duplex telnet. They can't see what they typed until the character is echoed back. If your packets are big (multiple hundreds of bytes), if variable latency isn't a problem, or you are spewing many many packets to keep the network saturated much of the time, then let Nagle do its job. In the 90s (and still today), tons of games developers would think that they needed to reinvent all the "promises" of TCP by writing a huge and ugly app layer on top of a UDP datagram protocol. It's folly to reinvent TCP at the app layer, especially when all the infrastructure is highly tuned to do TCP really really well. Turning off Nagle usually opened their eyes in disbelief. -- `[ e d @ h a l l e y . c c ]`
Re: I know two prints are slower than one, but 200x slower?!?!? by blazar (Canon) on Aug 16, 2007 at 15:16 UTC
`if( $_ == $listener ){` I don't know if this is common practice and if it can cause false positives, nor whether `==` is overloaded, although the answer is: not that I know of. But unless all this holds, I would go with `eq` instead.
Re^2: I know two prints are slower than one, but 200x slower?!?!? (==) by tye (Sage) on Aug 16, 2007 at 16:51 UTC
If you know $x and $y are both objects, then I much prefer $x == $y over $x eq $y, your FUD notwithstanding. If the objects may overload numification and/or stringification, then that changes things; it can make either or both of == / eq inappropriate. In that case, you can use something from overload but I'm not familiar with it. - tye
Re^3: I know two prints are slower than one, but 200x slower?!?!? (==) by bart (Canon) on Aug 16, 2007 at 20:37 UTC
I think there's a function in recent versions of Scalar::Util that returns the proper, unoverloaded reference address, for use in Inside Out Objects for example. I guess it must be `refaddr` that is used for this.
Re^4: I know two prints are slower than one, but 200x slower?!?!? (refaddr) by tye (Sage) on Aug 16, 2007 at 20:50 UTC
Re^3: I know two prints are slower than one, but 200x slower?!?!? (==) by blazar (Canon) on Aug 16, 2007 at 17:24 UTC
"If you know $x and $y are both objects, then I much prefer $x == $y over $x eq $y, your FUD notwithstanding."

Sorry for giving the impression of spreading FUD, which definitely has a bad connotation. Actually it was a fear, an uncertainty and a doubt of my own, as I hope was clear enough. Now, AIUI from your explanation, both `==` and `eq` are equally fine from the technical POV, aren't they? I must admit I had never thought about what the numification of a (blessed) reference could be:

cognac:~ [19:21:09]$ perl -le 'print for map {$_, 0+$_} bless \(my $x),"A"'
A=SCALAR(0x814fe28)
135593512
cognac:~ [19:21:15]$ perl -le 'print 0x814fe28'
135593512

Thank you for stimulating me to meditate and "experiment" about this!
Re^3: I know two prints are slower than one, but 200x slower?!?!? (==) by Anno (Deacon) on Aug 17, 2007 at 15:37 UTC
"...either or both of `==` / `eq` inappropriate. In that case, you can use something from overload but I'm not familiar with it." That would be `overload::StrVal()`, to be used with `eq`. Alternatively, `Scalar::Util::refaddr()` and `==`. Update: All this has been said, as I noticed after posting. Anno
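To make the `refaddr` and `overload::StrVal()` suggestions concrete, here is a small sketch. The `Overloaded` class and the `same_object` helper are invented purely for illustration: the class overloads `==`, `eq` and stringification so that every instance claims to equal every other, which is exactly the case where a plain `==` or `eq` on the objects would mislead.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

{
    # Hypothetical class whose overloaded operators say all instances are equal.
    package Overloaded;
    use overload
        '==' => sub { 1 },
        'eq' => sub { 1 },
        '""' => sub { 'same' };
    sub new { return bless {}, shift }
}

# Hypothetical helper: true only if both arguments are the very same reference,
# regardless of any operator overloading on their class.
sub same_object {
    my ( $x, $y ) = @_;
    return ( ref($x) && ref($y) && refaddr($x) == refaddr($y) ) ? 1 : 0;
}

my $first  = Overloaded->new;
my $second = Overloaded->new;

# The overloaded == says the two are equal; the address comparison does not.
print "overloaded ==: ", ( $first == $second ? "equal" : "different" ), "\n";
print "refaddr:       ", ( same_object( $first, $second ) ? "same" : "different" ), "\n";

# overload::StrVal() gives the unoverloaded string form, suitable for eq.
print "StrVal:        ",
    ( overload::StrVal($first) eq overload::StrVal($second) ? "same" : "different" ), "\n";
```

For plain `IO::Socket` handles like `$listener` in the original server, none of this is needed as far as the thread establishes; it only matters once a class overloads comparison or conversion.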