The data sample that you provided appears to be sorted in ascending order. If that is really the case for your data, it opens the door to vast performance improvements, improvements that you throw away if you store the data in a temporary (unordered) hash. The temporary hash is useless anyway and just wastes time.
Assuming you first store your ranges as upper bounds in a sorted array (@ranges), the loop can be very fast, because you don't need to test each of the 50,000+ ranges for every input line, but only one range per line. Try something like this (untested program, I can't test right now):
    my %frequencies;
    my $overlap;    # holds a value read past the end of the previous range

    foreach my $upper (@ranges) {
        if (defined $overlap) {
            # the leftover value from the previous range may fall in this one
            next unless $overlap <= $upper;
            $frequencies{$upper}++;    # $overlap is within the range
            undef $overlap;
        }
        while (<$IN_FILE>) {
            chomp;
            if ($_ <= $upper) {
                $frequencies{$upper}++;    # value belongs to the current range
            }
            else {
                $overlap = $_;             # too large: keep it for a later range
                last;
            }
        }
    }
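For completeness, here is one possible way to build the sorted @ranges array of upper bounds assumed above. The file name and the "lower,upper" one-range-per-line format are my assumptions, not something from the original post; adapt them to your actual ranges file.

    # Hypothetical sketch: load the upper bounds and sort them numerically.
    open my $RANGES, '<', 'ranges.txt' or die "Cannot open ranges.txt: $!";
    my @ranges;
    while (<$RANGES>) {
        chomp;
        my (undef, $upper) = split /,/;    # keep only the upper bound
        push @ranges, $upper;
    }
    close $RANGES;
    @ranges = sort { $a <=> $b } @ranges;  # ascending numeric sort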
Even if your input file is not sorted, it will very likely be faster to sort the input first and then apply the algorithm above than to check 50,000 ranges for each line (although a dichotomic (binary) search would of course reduce that burden).
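As a rough illustration of that binary search idea, here is one possible lookup routine over the sorted upper bounds; the name find_upper_bound and the array-reference interface are mine, offered only as a sketch.

    # Hypothetical sketch: find the smallest upper bound >= $value in
    # O(log n) instead of scanning all 50,000+ ranges for each input line.
    sub find_upper_bound {
        my ($value, $ranges) = @_;       # $ranges is a sorted array reference
        return undef unless @$ranges;
        my ($lo, $hi) = (0, $#$ranges);
        while ($lo < $hi) {
            my $mid = int(($lo + $hi) / 2);
            if ($ranges->[$mid] < $value) {
                $lo = $mid + 1;
            }
            else {
                $hi = $mid;
            }
        }
        # smallest upper bound >= $value, or undef if the value is too large
        return $ranges->[$lo] >= $value ? $ranges->[$lo] : undef;
    }

    # Possible use: $frequencies{ find_upper_bound($_, \@ranges) }++ per line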
Update Apr. 8, 16:45 UTC: corrected a small error in the program.