in reply to Help tightening up a subroutine please

I am willing to bet your grep is the slowest bit, but have you profiled the code? Anyhow, assuming the grep bit is your bottleneck:

## your original code @{$sets{$fasta_id}[$setscounter]{$sitekey}} = grep { $_ >= $lowerlimit and $_ <= $upperlimit } @{$matches{$fasta_id}{$sitekey}};

Depending on your data, you may gain some performance by using a sorted array instead:

@{$sets{$fasta_id}[$setscounter]{$sitekey}} = (); foreach ( sort @{$matches{$fasta_id}{$sitekey}} ) { last if $_ > $upperlimit; next if $_ < $lowerlimit; push @{$sets{$fasta_id}[$setscounter]{$sitekey}}, $_; }

This will only have an advantage if sort is sufficiently fast and there are many elements above $upperlimit, so this suggestion comes with an extreme case of "know thy data".

<radiant.matrix>
Ramblings and references
The Code that can be seen is not the true Code
I haven't found a problem yet that can't be solved by a well-placed trebuchet

Replies are listed 'Best First'.
Re^2: Help tightening up a subroutine please
by mdunnbass (Monk) on Jan 23, 2007 at 20:32 UTC
    Your code here:
    @{$sets{$fasta_id}[$setscounter]{$sitekey}} = (); foreach ( sort @{$matches{$fasta_id}{$sitekey}} ) { last if $_ > $upperlimit; next if $_ < $lowerlimit; push @{$sets{$fasta_id}[$setscounter]{$sitekey}}, $_;

    Looks a lot like my original code, in the first bit, here:
    for my $hit (@{$matches{$fasta_id}{$sitekey}}) { next unless ($hit >= $lowerlimit); last unless ($hit <= $upperlimit); my $ggg = $hit + 0; push (@arrayA, $ggg);

    And indeed, it does have a speed advantage. I don't need the sort, as the elements are already sorted in numerical order, so I can (and do) leave that out. As for knowing thy data, this here's the tricky part. I am going through the numbers in the arrays, and basically setting aside clusters of EVERY group of numbers $span distance apart. Starting from the lowest element, and proceeding all the way to the highest. So, how many elements there are above $upperlimit is as variable as possible. It goes from all to none.

    Thanks for the input, though.
    Matt

      Hm, somehow I missed that. Given the extra bit of data you supplied, I might recommend the use of List::MoreUtil's firstidx and lastidx functions. The fact that it's XS ought to help with the speed.

      # for clarity... my $result; my $src = $matches{$fasta_id}{$sitekey}; # Use a slice and firstidx/lastidx to copy the range of values @newArray = @$src[ (firstidx {$_ >= $lowerlimit} @$src )..(lastidx {$_ <= $upperlimit} + @$src) ]; $result = \@newArray;
      <radiant.matrix>
      Ramblings and references
      The Code that can be seen is not the true Code
      I haven't found a problem yet that can't be solved by a well-placed trebuchet