in reply to Mysterious slow down with large data set

I can't see what is causing the slowdown, but I can see one obvious change that would speed things up a lot. You re-sort the keys of %kernel on every pass through the outer loop, even though they never change. Instead of:

foreach $w1 ( sort( keys %kernel ) ){
    $totalsim = $maxsim = 0;
    @topX = ();
    $at2 = 0;
    foreach $w2 ( sort( keys %kernel ) ) {
        ...

Using:

my @sortedKeys = sort( keys %kernel );
foreach $w1 ( @sortedKeys ){
    $totalsim = $maxsim = 0;
    @topX = ();
    $at2 = 0;
    foreach $w2 ( @sortedKeys ) {

may speed things up to the point that the slowdown becomes insignificant.
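
To make the difference concrete, here is a minimal sketch of the two loop shapes side by side. The hash contents, the loop bodies and the sub names are made up for illustration (the real %kernel and the real work inside the loops are not shown in the thread); only the placement of the sort matters.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the poster's %kernel; the real data is not shown.
my %kernel = map { ( "w$_" => 1 ) } 1 .. 200;

# Original shape: the inner key list is re-sorted on every outer iteration.
sub pairs_resorting {
    my ($h) = @_;
    my $pairs = 0;
    for my $w1 ( sort keys %$h ) {
        for my $w2 ( sort keys %$h ) {
            $pairs++;    # placeholder for the real per-pair work
        }
    }
    return $pairs;
}

# Suggested shape: sort once, then reuse the same list for both loops.
sub pairs_sorted_once {
    my ($h) = @_;
    my @sortedKeys = sort keys %$h;
    my $pairs = 0;
    for my $w1 (@sortedKeys) {
        for my $w2 (@sortedKeys) {
            $pairs++;    # placeholder for the real per-pair work
        }
    }
    return $pairs;
}

printf "resorting: %d pairs, sorted once: %d pairs\n",
    pairs_resorting( \%kernel ), pairs_sorted_once( \%kernel );
```

Both subs visit exactly the same pairs in the same order; the second just avoids one sort per outer iteration. Wrapping each call in the core Benchmark module's cmpthese is one way to measure how much that saves on your own data.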

Also, using a sort to track the top N is probably slower than a simple linear insertion, truncating the list if necessary.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re^2: Mysterious slow down with large data set
by jsmagnuson (Acolyte) on Feb 26, 2012 at 23:44 UTC

    Yes, I can't believe I did that. Thank you!

    Regarding the top N, I had previously tried this, which worked, but seemed more complicated. Is this what you had in mind?

    } elsif ($sim > min(pdl(@topList))) {
        $theMin = grep { $topX[$_] eq min(pdl(@topX)) } 0..$#topX;
        # replace the smallest
        $topX[$theMin] = $sim;
        # add this one
        push @topX, $sim;
    }
    Thanks!

      I tried it this way:

      @topX = (-1) x 20;
      ...
      $topX[ $_ ] < $sim
          and splice( @topX, $_, 0, $sim ), pop( @topX ), last
              for 0 .. 19;

      A short-circuited linear insertion is at worst O(N), rather than O(N log N).
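
      For anyone who wants to check that the insertion approach keeps the same values as sort-and-truncate, here is a self-contained sketch. The sub names, the list length of 5 and the random scores are illustrative assumptions, not taken from the original program; the -1 sentinel assumes every real score is greater than -1.

```perl
use strict;
use warnings;

# Insert $sim into a descending, fixed-length list.
# Stops at the first slot whose value it beats.
sub insert_top {
    my ( $top, $sim ) = @_;
    for my $i ( 0 .. $#$top ) {
        if ( $top->[$i] < $sim ) {
            splice @$top, $i, 0, $sim;    # shift the smaller values down by one
            pop @$top;                    # drop the value that fell off the end
            last;                         # short-circuit: the list is in order again
        }
    }
    return;
}

# Reference implementation: sort everything, then truncate to $n values.
sub top_by_sort {
    my ( $n, @values ) = @_;
    my @sorted = sort { $b <=> $a } @values;
    $#sorted = $n - 1 if @sorted > $n;
    return @sorted;
}

my $n    = 5;                             # the thread uses 20; 5 keeps the output short
my @sims = map { rand() } 1 .. 1000;      # made-up similarity scores in [0, 1)

my @topX = (-1) x $n;                     # sentinel: assumes real scores exceed -1
insert_top( \@topX, $_ ) for @sims;

my @expected = top_by_sort( $n, @sims );
print "insertion: @topX\n";
print "sort:      @expected\n";
```

      The two printed lines should list the same values. Most scores fail every comparison and leave the list untouched, so each score costs at most one pass over the short list.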

      It speeds things a little, but doesn't address the slowdown which is happening exclusively (and inexplicably) inside PDL.

      Unfortunately, the PDL documentation spends more time telling you about their 'philosophy', and indexing the indexes to the documentation, than it does telling you what these functions actually do or how to inspect the results of what they did :(

