esolkc has asked for the wisdom of the Perl Monks concerning the following question:

I have a weird for-loop issue, where my swap gets crazy large and the loop takes 600 seconds. Can someone let me know why this happens? And what can I do when $U << $S?
$U = $S = 100000;
$dimMax = 5;
for($u = 0; $u < $U; $u++ ) {
    $minDistance = 100;
    $group[$u] = -99;
    for($s = 0; $s < $S; $s++ ) {
        $distance[$u][$s] = 0;
        for($dim = 0; $dim < $dimMax; $dim++ ) {
            $distance[$u][$s] = $distance[$u][$s] + ($input_train[$dim][$u] - $weight[$dim][$s])**2;
        }
        $distance[$u][$s] = sqrt($distance[$u][$s]) / $dimMax;
        if($distance[$u][$s] < $minDistance) {
            $minDistance = $distance[$u][$s];
            $minS = $s;
        }
    }
    $group[$u] = $minS;
}

Re: for-loop issue - swap gets crazy large
by BrowserUk (Patriarch) on Aug 31, 2011 at 20:06 UTC
    my swap gets crazy large and the loop takes 600 seconds. Can someone let me know why this happens?

    Everything you need to know is evident from these 3 lines:

    $U = $S = 100000;
    ...
    for($u = 0; $u < $U; $u++ ) {
    ...
        for($s = 0; $s < $S; $s++ ) {
            $distance[ $u ][ $s ] = 0;

    You are creating an Array of Arrays with 100,000 x 100,000 elements.

    Each sub-array will require 3.2MB. With 100,000 of those you'll need roughly 320 gigabytes of virtual memory. Since you probably only have somewhere between 4GB and, say, 32GB of physical RAM, the rest will need to be swapped to disk.
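The arithmetic behind that estimate can be sketched as below. The figure of 32 bytes per element is an assumption (roughly a 24-byte scalar plus an 8-byte pointer slot on a 64-bit perl), and the sub name estimate_bytes is hypothetical. Array headers and allocator overhead are ignored, so real usage would be higher.

```perl
use strict;
use warnings;

# Rough memory estimate for a rows x cols array of arrays.
# ASSUMPTION: about 32 bytes per element (scalar body plus the pointer
# slot in the containing array); array headers and allocator overhead
# are ignored, so the real figure is higher.
sub estimate_bytes {
    my ( $rows, $cols, $bytes_per_element ) = @_;
    return $rows * $cols * $bytes_per_element;
}

my $per_row = estimate_bytes( 1,       100_000, 32 );
my $total   = estimate_bytes( 100_000, 100_000, 32 );

printf "per sub-array: %.1f MB\n", $per_row / 1e6;   # 3.2 MB
printf "whole matrix:  %.0f GB\n", $total / 1e9;     # 320 GB
```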

    If your system is doing that in 10 minutes you must have a pretty fast disk set-up. Rather than complaining, you should be impressed.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Stupid question: can I "slice or destroy" $distance[$u-1][$s-1] after the last (-1) if($distance[$u][$s] < $minDistance) test? The only thing I want to find is which $weight[$dim][$s] is closest to $input[$dim][$u]. I don't need this 2-D array after that. BTW, the 600 seconds are for a small $S. With $S=100000 the process gets killed. :(
        my @last_distance; # Use instead of @{ $distance[$u-1] }
        my @this_distance; # Use instead of @{ $distance[$u-0] }
        for($u = 0; $u < $U; $u++ ) {
            ...
            @last_distance = @this_distance;
        }
Re: for-loop issue - swap gets crazy large
by ikegami (Patriarch) on Aug 31, 2011 at 20:14 UTC

    You are creating an array of 100_000 references, each to an array of 100_000 elements. That's 10_000_000_000 elements.

    $ perl -MDevel::Size=total_size -E'$x = 1.2; say total_size($x);'
    24

    A scalar holding a floating point number takes 24 bytes, so we're up to 240 GB without counting the arrays themselves, the references and the overhead of the memory allocation system.

    In a given loop pass, you never use any index of @distance other than $distance[$u], so you don't have to keep the other elements in memory. You could dump the results to disk (or whatever it is you do with them) instead of keeping them in memory.

    [ Oops, too slow. I got interrupted mid-composition. ]

      ...but how can I "dump" them? Do you have a suggestion?
        You could dump them as strings if you want something readable, or you could get the underlying bytes using pack 'F' to avoid precision loss.
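A minimal sketch of the pack 'F' approach, assuming each row of distances is appended to a binary file as soon as it has been computed. The helper names dump_row and load_values, the temporary file and the sample numbers are made up for illustration. Note that 'F' writes perl's native floating-point format, so the file is only portable between perls built with the same NV type and byte order.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Append one row of distances to a binary file as native NVs.
sub dump_row {
    my ( $fh, $row ) = @_;
    print {$fh} pack( 'F*', @$row ) or die "write failed: $!";
    return;
}

# Read the whole file back as a flat list of numbers.
sub load_values {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                       # slurp mode
    my $bytes = <$in>;
    close $in;
    return unpack 'F*', $bytes;
}

# Hypothetical sample rows standing in for @{ $distance[$u] }.
my ( $fh, $path ) = tempfile( UNLINK => 1 );
binmode $fh;
dump_row( $fh, [ 0.1, 1 / 3, 2.5 ] );
dump_row( $fh, [ 4.75, 1e-9 ] );
close $fh or die "close failed: $!";

my @values = load_values($path);
print scalar(@values), " values read back\n";
```

Because the values are stored as raw bytes rather than decimal strings, they come back bit-for-bit identical, which is the precision point made above.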
Re: for-loop issue - swap gets crazy large
by BrowserUk (Patriarch) on Aug 31, 2011 at 23:19 UTC

    I'm fairly sure (untested) that this will produce the same results in @group whilst using minimal memory and much more quickly:

    #! perl -slw
    use strict;
    use List::Util qw[ sum ];

    # Distance between input column $u and weight column $s over $dim dimensions.
    sub distance {
        my( $iref, $wref, $dim, $s, $u ) = @_;
        return sqrt( sum(
            map{ ( $iref->[ $_ ][ $u ] - $wref->[ $_ ][ $s ] ) **2 } 0 .. $dim - 1
        ) ) / $dim;
    }

    my $U = my $S = 100000;
    my $dimMax = 5;
    my @group;
    my @input_train = ...;
    my @weights = ...;

    for my $u ( 0 .. $U - 1 ) {
        my $minDistance = 100;
        $group[$u] = -99;
        my $minS;
        for my $s ( 0 .. $S -1 ) {
            my $dist = distance( \@input_train, \@weights, $dimMax, $s, $u );
            if( $dist < $minDistance ) {
                $minDistance = $dist;   # keep only the running minimum
                $minS = $s;
            }
        }
        $group[ $u ] = $minS;
    }
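To try the idea without the real data, here is a small self-contained variant with made-up 2-dimensional sample values. The sub name nearest_weights is hypothetical, and unlike the loop above it starts with an undefined minimum rather than 100, which is an assumption on my part about what is wanted when every distance exceeds 100.

```perl
use strict;
use warnings;
use List::Util qw( sum );

# For each input column $u, return the index $s of the closest weight
# column. Only the running minimum is kept, never a full distance matrix.
# Data layout follows the thread: $input->[$dim][$u], $weight->[$dim][$s].
sub nearest_weights {
    my ( $input, $weight ) = @_;
    my $dims = @$input;
    my $U    = @{ $input->[0] };
    my $S    = @{ $weight->[0] };
    my @group;
    for my $u ( 0 .. $U - 1 ) {
        my ( $min_dist, $min_s );
        for my $s ( 0 .. $S - 1 ) {
            my $dist = sqrt( sum(
                map { ( $input->[$_][$u] - $weight->[$_][$s] )**2 } 0 .. $dims - 1
            ) ) / $dims;
            if ( !defined $min_dist or $dist < $min_dist ) {
                ( $min_dist, $min_s ) = ( $dist, $s );
            }
        }
        $group[$u] = $min_s;
    }
    return @group;
}

# Made-up sample data: 3 inputs and 2 weight vectors in 2 dimensions.
my @input_train = ( [ 0.0, 5.0, 0.9 ], [ 0.0, 5.0, 1.1 ] );
my @weight      = ( [ 1.0, 4.0 ],      [ 1.0, 4.0 ] );

my @group = nearest_weights( \@input_train, \@weight );
print "@group\n";    # prints "0 1 0"
```

Memory use here grows with $U (for @group) plus the input and weight data themselves, not with $U x $S.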

Re: for-loop issue - swap gets crazy large
by Anonymous Monk on Sep 01, 2011 at 19:30 UTC
    If most of the elements in your two-dimensional matrix are known to be zero, then you are dealing with a "sparse matrix" data structure. You're storing certain non-zero values at certain (x,y) coordinates and you're only interested in those, so you only need to store those. Simply search CPAN for the keyword, "sparse."
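As a rough illustration of that idea, here is a sketch using a plain hash of hashes rather than any particular CPAN module. The names %sparse, set_cell and get_cell are hypothetical. It only pays off when most cells really are zero, which is not the case for a full distance matrix.

```perl
use strict;
use warnings;

# Sparse 2-D matrix sketch: only non-zero cells are stored, in a hash of
# hashes keyed by row and then column. Missing cells read as 0.
my %sparse;

sub set_cell {
    my ( $x, $y, $value ) = @_;
    if ( $value != 0 ) {
        $sparse{$x}{$y} = $value;
    }
    elsif ( exists $sparse{$x} ) {
        delete $sparse{$x}{$y};                        # storing 0 frees the cell
        delete $sparse{$x} unless %{ $sparse{$x} };    # drop empty rows
    }
    return;
}

sub get_cell {
    my ( $x, $y ) = @_;
    # Check the row first so that a read never autovivifies $sparse{$x}.
    return 0 unless exists $sparse{$x} && exists $sparse{$x}{$y};
    return $sparse{$x}{$y};
}

set_cell( 3,      99_999, 0.25 );
set_cell( 70_000, 12,     1.5 );

my $stored = 0;
$stored += keys %$_ for values %sparse;
print "cells stored: $stored\n";                              # 2
print "value at (3, 99999): ", get_cell( 3, 99_999 ), "\n";   # 0.25
print "value at (5, 5): ",     get_cell( 5, 5 ),      "\n";   # 0
```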