in reply to Can I speed this up? (repetitively scanning ranges in a large array)

daverave:

I'd love to code it up, but I'm at work right now. Here's what I'm thinking:

  1. Normalize your lists:
    1. For each one that wraps, turn it into two lists: one extended left from the right coordinate, and one extended right from the left coordinate.
      For example: In a universe of 1..20, the range of (18,5) would normalize to (-2, 5) and (18,25).
    2. Sort your lists by the beginning member.
  2. For each column in your range:
    1. Add all lists containing the current column into your working set
    2. For each list in the working set:
      1. Compute T = 3+2*min(col-list[start],list[end]-col)
      2. The desired output is max(T)
    3. Remove from the working list any list having a lower result whose endpoint is less than the list with the maximum result.

...roboticus

Replies are listed 'Best First'.
Re^2: Can I speed this up?
by roboticus (Chancellor) on Nov 01, 2010 at 17:59 UTC

    daverave:

    I had a bit of time on my lunch break, so I whipped a little something up. It's pretty close, but there are still a few tweaks I have in mind for it...

    #!/usr/bin/perl use strict; use warnings; my @universe = (1, 10); my @ranges = ( [ 1, 3 ], [2, 7], [10, 2] ); # Normalize the lists my @nrm_rng = map { ($$_[0] < $$_[1]) ? $_ : ( [ $$_[0]-$universe[1], $$_[1] ], [ $$_[0], $$_[1] + $universe[1]-$universe[0] ] + ) } @ranges; # Dump lists printf("% 2u ",$_) for $universe[0] .. $universe[1]; print "\n"; print "-- " for $universe[0] .. $universe[1]; print "\n"; for my $R (@nrm_rng) { printf("%2s ", ($_>= $$R[0] and $_<=$$R[1]) ? "XX" : "") for $universe[0] .. $universe[1]; print ": ($$R[0], $$R[1])\n"; } my $arMax; my @out; my @working; for my $col ($universe[0] .. $universe[1]) { $arMax=undef; # Add and remove working lists based on column position push @working, grep { $col>=$$_[0] } @nrm_rng; @nrm_rng = grep { $col < $$_[0] } @nrm_rng; @working = grep { $col<=$$_[1] } @working; # Compute result for each list for (@working) { $$_[2] = 3+2*min(abs($col-$$_[0]),abs($col-$$_[1])); if (!defined($arMax) or ($$arMax[2]<$$_[2]) or ($$arMax[2]==$$_[2] and $$arMax[1]<$$_[1]) +) { $arMax=$_; } } $out[$col] = defined($arMax) ? sprintf("% 2u",$$arMax[2]) : ' +1'; # print "$col ", # "wrk: ", join(", ", map { "(".join(",",@$_).")" } @$_, @wo +rking), "\n"; # Remove lists that will no longer give us values @working = grep { $_ eq $arMax or $$arMax[1] < $$_[1] } @worki +ng; } print "FINAL: ", join(' ', @out), "\n"; sub min { my ($l, $r) = (@_); return ($l<$r) ? $l : $r; }

    ...roboticus

    Update:I reformatted one of the lines of code to get rid of an ugly line break.

    Update 2: Code edits: (1) Set ranges as per daverave's request, (2) added debug trace (commented out print), (3) bugfix ($arMax=undef at start of loop), (4) bugfix (list normalization). I didn't replace all the code, so I may have made an error, so let me know if you see something wrong. Currently, though, running it shows me:

    roboticus@Work: /Work/Perl/PerlMonks $ perl 868827_ranges_and_such.pl 1 2 3 4 5 6 7 8 9 10 -- -- -- -- -- -- -- -- -- -- XX XX XX : (1, 3) XX XX XX XX XX XX : (2, 7) XX XX : (0, 2) XX : (10, 11) 5 5 5 7 7 5 3 1 1 3 : FINAL
      Please try running your code with my simple 3 ranges example (and max_length = 10). I think you get different results, but I only had a quick look at your code and perhaps it's related to the fact your coordinates start from 0, not 1 as I mentioned?

      I tried shifting all my ranges by -1 and run your code again, but I still different results:

      my @universe = (0, 9); my @ranges = ( [ 0, 2 ], [ 1, 6 ], [ 9, 1 ] ); FINAL: 3 5 5 7 7 5 3 3 3 3
      Specifically, note that there are coordinates (8 and 9 or 7 and 8 in the shifted version) that are not covered by any range, thus you must have ones in your results.

      Update: I forgot to write I really appreciate you spending time in your lunch break (or any other break...) helping me. Thank you!

      Update following your corrected code: I ran your code against my original version on some relatively small example (max_length = 1M, some 6k ranges of a few k's each). My original version was done in about 3 minutes.

      The good news is the results are the same. The bad news is your version ran for more than 18 minutes. Perhaps all the grepping is expensive? (also remember this array is relatively small, only 6k ranges. I usually have about 25k ranges, which means the grepped arrays will be 4 times larger).

        daverave:

        It doesn't surprise me that it's slower than yours, as I was just trying to make it work first. I'll try to find some time to speed it up a bit. There are several tricks I've been thinking about. First, I'd prune the normalized ranges to remove the redundant ranges. Next, there are cases where you don't have to compute the value for all the entries of @working. Ideally, I'd remove that inner loop.

        ...roboticus