Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Inline::C's AoA is much bigger than Perl's

by tlm (Prior)
on Mar 16, 2009 at 21:47 UTC ( [id://751041]=perlquestion: print w/replies, xml ) Need Help??

tlm has asked for the wisdom of the Perl Monks concerning the following question:

Here's a question for the Perl-guts hackers in the audience.

I'm in the process of optimizing some Perl code, and for this purpose I availed myself of Inline::C. The resulting code is indeed faster, but it uses significantly more memory than the original version. Upon further investigation I narrowed the problem to a marked difference in the sizes of array of arrays (AoAs) between those generated by Perl and those generated by the C code. The following script illustrates the problem:

################################################################ # test_aoa.pl ################################################################ use warnings FATAL => 'all'; no warnings 'once'; use strict; use Inline 'C' => Config => OPTIMIZE => '-O2'; use Inline 'C'; use Time::HiRes 'gettimeofday'; my $START = my $ELAPSED = microseconds(); my $MAKE_AOA = !!( shift @ARGV ) ? \&make_aoa_c : \&make_aoa; my $BASELINE = mem_size(); for ( 1..5 ) { start(); my $table = $MAKE_AOA->( 1000, 1000 ); $ELAPSED = elapsed(); printf "%d: %d (%d us)\n", $_, mem_size() - $BASELINE, $ELAPSED; } sub mem_size { chomp( my $size = `ps -o rss= -p $$` ); return $size + 0; } sub make_aoa { my ( $n_rows, $n_cols ) = @_; return [ map [ ( 'foo' ) x $n_cols ], 1..$n_rows ]; } sub microseconds { my ( $sec, $microsec ) = gettimeofday(); return 1E6 * $sec + $microsec; } sub start { $START = microseconds(); } sub elapsed { return $START ? microseconds() - $START : 0; } __END__ __C__ /* get_mortalspace comes from "Extending and Embedding Perl" by Jenness and Cozens, p. 242 */ static void * get_mortalspace ( size_t nbytes ) { SV * mortal; mortal = sv_2mortal( NEWSV(0, nbytes ) ); return (void *) SvPVX( mortal ); } SV *make_aoa_c( int n_rows, int n_cols ) { int i; int n_items = n_rows * n_cols; char *foo = "foo"; SV **table; SV **row_ptr; table = ( SV ** ) get_mortalspace( n_rows * sizeof *table ); row_ptr = ( SV ** ) get_mortalspace( n_items * sizeof *row_ptr ); for ( i = 0; i < n_rows; i++ ) { int j; SV **row = row_ptr; for ( j = 0; j < n_cols; ++j ) { row[ j ] = sv_2mortal( newSVpv( foo, 0 ) ); ++row_ptr; } { AV *av = ( AV * ) sv_2mortal( av_make( ( I32 ) n_cols, row ) ); table[ i ] = sv_2mortal( newRV( ( SV * ) av ) ); } } return newRV( sv_2mortal( ( SV * ) av_make( ( I32 ) n_rows, table ) +) ); }

When given a "false" argument, the script uses the pure Perl function make_aoa to generate an AoA; otherwise it uses the C function make_aoa_c.

When I execute this script, this is what I get:

% perl test_aoa.pl 0 1: 78696 (192169 us) 2: 78700 (211009 us) 3: 78700 (159965 us) 4: 78700 (160709 us) 5: 78700 (167226 us) % perl test_aoa.pl 1 1: 125752 (356172 us) 2: 133572 (310640 us) 3: 133572 (263302 us) 4: 133572 (265807 us) 5: 133572 (267763 us)
As you can see, the AoA generated by make_aoa_c is almost twice the size than the one generated by make_aoa (1.7x the size, to be more precise).

To add insult to injury, my snazzy Inline::C code is much slower too. Fortunately, this is the case only in this little test script. In the real application I'm working on, the move to Inline::C did make a big difference. Still, I would also love to know why my C code is so much slower...

Anyway, after all these years and much reading on the subject, I remain as mystified as ever by the Perl internals, so I'm sure my code in make_aoa_c is doing something pretty clueless. Any words of wisdom would be appreciated.

the lowliest monk

Replies are listed 'Best First'.
Re: Inline::C's AoA is much bigger than Perl's
by Marshall (Canon) on Mar 17, 2009 at 00:33 UTC
    As far as a "guts" book goes, I have learned a lot from "Advanced Perl Programming" by Srinivasan. Chapter 20, Perl Internals is most instructive. I've learned a lot about how these internal Perl data structures are built. This book requires an intermediate level of C knowledge.
Re: Inline::C's AoA is much bigger than Perl's
by tlm (Prior) on Mar 16, 2009 at 22:24 UTC
    Update:Actually the problems with the code below, I just discovered, are far more serious than a mere memory leak. It blows up if I try to accesses the elements in the table... I guess it's time for me to call it a day!

    OK, I'm replying to myself here...

    After I posted the original query, I tried a different tack. I streamlined the C function make_aoa_c, as follows:

    SV *make_aoa_c( int n_rows, int n_cols ) { int i, j; char *foo = "foo"; AV *table = newAV(); AV *row; for ( i = 0; i < n_rows; ++i ) { row = ( AV * ) sv_2mortal( ( SV * ) newAV() ); for ( j = 0; j < n_cols; ++j ) { av_push( row, newSVpv( foo, 0 ) ); } av_push( table, sv_2mortal( newRV( ( SV * ) row ) ) ); } return newRV( ( SV * ) table ); }
    This took care of the size problem for the most part, and greatly improved the speed (though it's still slower than Perl).

    Unfortunately, the code has sprung a small memory leak that I can't identify! Changing the number of repetitions to 10, now the output for the C case looks like this:

    % perl test_aoa.pl 1 1: 78844 (280041 us) 2: 78884 (247865 us) 3: 78892 (237725 us) 4: 78900 (245755 us) 5: 78908 (235251 us) 6: 78916 (235712 us) 7: 78924 (246926 us) 8: 78932 (237128 us) 9: 78940 (237369 us) 10: 78948 (238528 us)
    With every iteration, the memory grows by at least 8kb.

    I have tried adding sv_2mortal around various items in the code (e.g. AV *table=(AV *)sv_2mortal((SV *)newAV())), but I get errors like:

    Attempt to free unreferenced scalar: SV 0x5b2c620, Perl interpreter: 0 +x603010 at test_aoa.pl line 22.
    If anyone can spot the memory leak here, I'd much appreciate it!

    the lowliest monk

      The following routine works:

      SV *make_aoa_c( int n_rows, int n_cols ) { int i, j; char *foo = "foo"; AV *table = newAV(); AV *row; for ( i = 0; i < n_rows; ++i ) { row = newAV(); for ( j = 0; j < n_cols; ++j ) { av_push( row, newSVpv( foo, 0 ) ); } av_push( table, newRV_noinc( row ) ); } return newRV_noinc( table ); }
      $ ./751041.pl 1 1: 78836 (233391 us) 2: 78872 (216509 us) 3: 78872 (206672 us) 4: 78872 (206775 us) 5: 78872 (206308 us) 6: 78872 (207777 us) 7: 78872 (206677 us) 8: 78872 (206739 us) 9: 78872 (206675 us) 10: 78872 (205979 us)

      No memory leak, and if I dump $table (e.g. using Data::Dumper, with a size of 3 x 3 or so), it holds the expected data...

      (I think you were just doing more mortalizing than necessary...  The newRV_noinc makes sure that the arrays' reference counts stay at 1, so they'll get freed, when the respective outer structure is being freed.)

      For comparison, the pure-Perl implementation (still slightly faster/smaller):

      $ ./751041.pl 0 1: 78696 (213442 us) 2: 78704 (208485 us) 3: 78704 (175431 us) 4: 78704 (175438 us) 5: 78704 (175422 us) 6: 78704 (175486 us) 7: 78704 (175667 us) 8: 78704 (175647 us) 9: 78704 (175682 us) 10: 78704 (175687 us)
        The remaining difference in memory is due to the Perl version knowing exactly how big the array will be from the start (because the whole list is assigned in one go):
        FILL = 999 MAX = 999

        The C version causes the arrays to grow and leaves space for growth:

        FILL = 999 MAX = 1021

        By pre-extending the arrays,

        SV *make_aoa_c( int n_rows, int n_cols ) { int i, j; char *foo = "foo"; AV *table = newAV(); av_extend(table, n_rows-1); /* <---------- */ for ( i = 0; i < n_rows; ++i ) { AV *row = newAV(); av_extend(row, n_cols-1); /* <---------- */ for ( j = 0; j < n_cols; ++j ) { av_push( row, newSVpv( foo, 0 ) ); } av_push( table, newRV_noinc( row ) ); } return newRV_noinc( table ); }

        both the Perl and the C data structures are identical.

        FILL = 999 MAX = 999

        and the process that calls the C version uses less memory (perhaps from reduced stack usage?)

        $ perl test_aoa.pl 1: 78688 (184871 us) 2: 78696 (298376 us) 3: 78696 (196999 us) 4: 78696 (204391 us) 5: 78696 (225786 us) $ perl test_aoa.pl use_xs 1: 78604 (321481 us) 2: 78616 (360377 us) 3: 78616 (219468 us) 4: 78616 (211587 us) 5: 78616 (209231 us)

        The times are comparable, but note this it a busy machine.

        The remaining difference in speed is most likely due to strlen of "foo" being recomputed every time in the inner loop (i.e. the zero in newSVpv(foo, 0) ).

        Precomputing it once (as Perl can do too, because the "foo" in "foo" x $n_cols is by definition fix) — i.e.

        SV *make_aoa_c( int n_rows, int n_cols ) { int i, j; char *foo = "foo"; AV *table = newAV(); AV *row; int len = strlen(foo); av_extend(table, n_rows-1); for ( i = 0; i < n_rows; ++i ) { row = newAV(); av_extend(row, n_cols-1); for ( j = 0; j < n_cols; ++j ) { av_push( row, newSVpv( foo, len ) ); // or newSVpvn(...) } av_push( table, newRV_noinc( row ) ); } return newRV_noinc( table ); }

        makes any XS vs. Perl speed difference go away (or at least statistically insignificant).

        (Without this optimisation I did observe a small, but consistent difference — approx. 5% on average.)

Re: Inline::C's AoA is much bigger than Perl's
by tlm (Prior) on Mar 17, 2009 at 20:53 UTC

    Marshall, thanks for reminding me of Srinavasan's book. It's one of my favorites (and it's clearly due for re-read).

    And many, many thanks to almut and ikegami. Your posts taught me what I needed to know to get this thing to work right.

    the lowliest monk

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://751041]
Approved by Corion
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (3)
As of 2024-03-29 15:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found