in reply to Re^3: Out of memory.
in thread Out of memory.

The postfix file that feeds the program is 113,957 bytes. The array right before the join that results in the memory error is 513,422,726 bytes. The program can be run to recreate the problem everytime on Windows XP box with 2gig of RAM running PERL 5.10.0 build 1003 by ActiveState.

Replies are listed 'Best First'.
Re^5: Out of memory.
by BrowserUk (Patriarch) on Oct 22, 2008 at 06:17 UTC

    Try this and see how you get on:

    #! perl -slw use strict; sub multiply { my( $ref1, $ref2 ) = @_; my @temp; for my $r1 ( @{ $ref1 } ) { for my $r2 ( @{ $ref2 } ) { push @temp, $r2 . $r1; } } return \@temp; } $/ = ','; my @stack; open POSTFIX, '<', '718318.dat' or die $!; while( my $op = <POSTFIX> ) { chomp $op; chop $op if $op =~ tr[\n][\n]; ## remove any trailing newline if( $op eq 'x' ) { push @stack, multiply( pop( @stack ), pop( @stack ) ); } elsif( $op eq '+' ) { push @stack, [ @{ pop @stack }, @{ pop @stack } ]; } elsif( $op =~ m[^\d+$] ) { push @stack, [ pack 'v', $op ]; } else { die "Bad '$op' at position;" . tell( POSTFIX ); } } print "stacksize: " . @stack; for my $item ( @stack ) { printf "%s", join ',', unpack 'v*', @{ $item }[ 0 ]; for my $group ( @{ $item }[ 1 .. $#$item ] ) { printf "+%s", join ',', unpack 'v*', $group; } print "\n"; }

    This does 4 things to try and save memory without unduly hitting performance:

    1. Instead of reading the entire postfix expression into memory as a single string and then splitting it into a large array, it sets $\ = ',' and reads the expression one operand at a time.
    2. Rather than storing the intermediate values as strings which have to be split & joined over and over, it stores them as arrays of substrings.

      Instead of an intermediate value being stored as 1,2,3+4,5,6+7,8,9, that same data is stored as [ '1,2,3', '4,5,6', '7,8,9' ].

      This makes the additive process just a case of coombining the two arrays on the top of the stack into one.

      And the multiply process combines the substrings using concatenation, and pushes them into a new array.

    3. Instead of storing the substrings as comma-delimited ASCII, it packs them together as 2-byte binary unsigned integers.

      This saves 33% memory for 2-digits numbers; 50% for 3-digits; 60% for 4-digits; and 66% for 5-digits. (Including the now unnecessary commas.)

    4. Finally, the numbers don't get ASCIIized or the ,s and +s added until output.

      And this is done in chunks to avoid building the entire string in memory. This could be taken a stage further by unpacking the numbers from the substrings one at a time, but it hasn't seemed necessary.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I ran the updated code and got outofmemory again, but it kept running at 1.2-1.4 gigs until I killed it an hour later. I converted the number vars to characters a-zA-Z and 'x' to '*' to compress the data:
        It almost got it all. Here is the rest...