
And if every worker fork needs to modify 1 bit on every 4x page of 1 GB of shared data, it's gonna take forever.

Rubbish

use strict; use warnings; use 5.010;
use Time::HiRes qw(time);
my $data = 'x' x 500_000_000;
my $t = time;
$data =~ tr/x/y/;
say time - $t;
__END__
0.469923973083496

use strict; use warnings; use 5.010;
use Time::HiRes qw(time);
my $data = 'x' x 500_000_000;
my $t = time;
unless (fork) {
    $data =~ tr/x/y/;
    exit;
}
wait;
say time - $t;
__END__
0.765344142913818

use strict; use warnings; use 5.010;
use threads;
use Time::HiRes qw(time);
my $data = 'x' x 500_000_000;
my $t = time;
threads->create( sub {
    $data =~ tr/x/y/;
} )->join;
say time - $t;
__END__
1.29563307762146

Re^22: Strange memory leak using just threads (forks.pm)
by BrowserUk (Patriarch) on Sep 23, 2010 at 17:23 UTC

    That's not quite what I said is it :)

    See how you fare with this:

    use strict; use warnings; use 5.010;
    use Time::HiRes qw(time);
    my $data = 'x' x 500 * 1024**2;
    my $t = time;
    for my $n ( 1 .. 100 ) {
        unless (fork) {
            substr( $data, 4096 * $_ + $n, 1 ) |= 1 for 0 .. 124;
            exit;
        }
    }
    waitall;
    say time - $t;
      This takes 5.25 seconds on my machine (4GB, 4 cores, Linux).

      The respective threads versions take:

      use 5.010;
      use threads;
      use Time::HiRes qw(time);
      my $data = 'x' x (500 * 1024**2);
      my $t = time;
      for my $n ( 1 .. 100 ) {
          threads->create( sub {
              substr( $data, 4096 * $_ + $n, 1 ) |= 1 for 0 .. 124;
          } )->join;
      }
      say time - $t;
      __END__
      164.842848062515

      use 5.010;
      use threads;
      use Time::HiRes qw(time);
      my $t = time;
      for my $n ( 1 .. 100 ) {
          threads->create( sub {
              my $data = 'x' x (500 * 1024**2);
              substr( $data, 4096 * $_ + $n, 1 ) |= 1 for 0 .. 124;
          } )->join;
      }
      say time - $t;
      __END__
      172.88907790184
        This takes 5.25 seconds on my machine (4GB, 4 cores, Linux).

        By "This", I assume you mean this:

        use 5.010;
        use Time::HiRes qw(time);
        my $data = 'x' x 500 * 1024**2;
        my $t = time;
        for my $n ( 1 .. 100 ) {
            unless (fork) {
                substr( $data, 4096 * $_ + $n, 1 ) |= 1 for 0 .. 124;
                exit;
            }
        }
        waitall;
        say time - $t;

        Sounds good, but it isn't doing what you think it's doing, nor what I intended, because of uncorrected typos in the (untested) example that you've "ignored". So, let's correct the deficiencies in the forked code:

        I bet if you put back the strict and warnings, and fix the errors so that it is actually doing something:

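        use strict; use warnings; use 5.010;
        use Time::HiRes qw(time);
        # Parenthesised: 'x' x 500 * 1024**2 parses as ('x' x 500) * 1024**2
        my $data = 'x' x ( 500 * 1024**2 );
        my $t = time;
        for my $n ( 1 .. 100 ) {
            unless (fork) {
                # chr(1), not 1: with a string on both sides, |= is a bitwise string OR
                substr( $data, 4096 * $_ + $n, 1 ) |= chr(1) for 0 .. 124;
                exit;
            }
        }
        1 while wait() != -1;    # reap every child; waitall is not a Perl builtin
        say time - $t;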

        it will take a little longer. (I'll have to sort out and fire up my Ubuntu VM to be sure.)
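For anyone following along, here is a minimal sketch of what two of the typos do in isolation. It assumes the typos meant are the missing parentheses around the size expression and the numeric 1 on the right of |=; the sub names are made up for illustration, and the code assumes the bitwise feature is not enabled (it is switched on by use v5.28 and later).

```perl
use strict;
use warnings;

# x and * share one precedence level and associate left, so
# 'x' x 500 * 1024**2 parses as ('x' x 500) * 1024**2 and the string
# numifies to 0. Parentheses force the multiplication to happen first.
sub scaled_length {
    my ($kib) = @_;
    return length( 'x' x ( $kib * 1024 ) );
}

# A numeric right operand makes | a numeric OR: 'x' numifies to 0,
# so the result is the number 1 rather than a modified character.
sub or_numeric {
    my ($c) = @_;
    no warnings 'numeric';    # silence "isn't numeric" for the demo
    return $c | 1;
}

# With strings on both sides, | ORs the bytes: 0x78 | 0x01 = 0x79 ('y').
sub or_string {
    my ($c) = @_;
    return $c | chr(1);
}

printf "length with parens: %d\n", scaled_length(4);
printf "numeric OR: %s, string OR: %s\n", or_numeric('x'), or_string('x');
```

So with |= 1 the character is replaced by the digit 1 instead of having its low bit set, and without the parentheses there is no 500MB string to modify in the first place.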

        The respective threads versions take: 164.842848062515 172.88907790184

        Well yeah! If you use threads like fork, it will take a while.

        For a start, you're running each of the 100 threads serially.

        Second, you're building a new copy of the 500MB string in each thread, modifying 125 bits within it, and then discarding it.

        The point of the exercise was to modify the same string, not 100 * 500MB copies. That's 100 times what the forked version is sharing! You're surprised that it is slow?

        So, let's address those deficiencies of your threaded version:

        #! perl -slw
        use strict;
        use 5.010;
        use threads ( stack_size => 4096 );
        use Thread::Queue;
        use Time::HiRes qw(time);

        use constant DATA_SIZE => (500 * 1024**2);

        sub thread {
            my( $Q, $n ) = @_;
            $Q->enqueue( 4096 * $_ + $n ) for 0 .. 124;
            $Q->enqueue( undef );
        }

        my $t = time;
        my $Q = new Thread::Queue;
        my @t = map threads->create( \&thread, $Q, $_ ), 1 .. 100;

        my $data = 'x' x DATA_SIZE ;
        for(1..100) {
            substr( $data, $_, 1 ) |= chr(1) while $_ = $Q->dequeue;
        }
        $_->join for @t;

        say time - $t;
        printf "%d bits changed\n", DATA_SIZE - ( $data =~ tr[x][] );
        __END__
        c:\test>junk44
        2.03600001335144
        12500 bits changed

        So, 100 threads calculating modifications to 1/2GB of data, and 2.03 seconds.

        Oh. And when you work out how to print the last line of output, we can try for the real range of 0 .. 124_999 :)
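The reason that last line is awkward for the forked version can be shown with a small sketch: each child writes to its own copy-on-write pages, so the parent's string never changes. This is an illustration only, assuming a platform with a real fork; the sub name is hypothetical and the string is kept small on purpose.

```perl
use strict;
use warnings;
use POSIX ();

# Returns how many characters differ from 'x' in the PARENT's copy
# after a forked child has modified its own copy of the string.
sub changed_in_parent_after_fork {
    my $data = 'x' x 4096;
    my $pid  = fork;
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        substr( $data, 0, 1 ) |= chr(1);    # touches only the child's copy
        POSIX::_exit(0);                    # leave without END blocks or buffer flushes
    }
    waitpid( $pid, 0 );
    # tr/x// counts the remaining 'x' characters without changing them
    return length($data) - ( $data =~ tr/x// );
}

printf "%d bits changed as seen by the parent\n", changed_in_parent_after_fork();
```

To report a count, the children would have to send their results back to the parent (a pipe, a file, shared memory), which is work the timing above does not include.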


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.