The data sample that you provided appears to be sorted in ascending order. If that is really the case for your data, it opens the door to vast performance improvements, improvements that you throw away if you store the data in a temporary (unordered) hash. The temporary hash is useless anyway and just wastes time.
Assuming you first store your ranges as upper bounds in a sorted array (@ranges), the loop can be very fast, because you don't need to test each of the 50,000+ ranges for every input line, but only one range per line. Try something like this (untested program, I can't test right now):
    my %frequencies;
    my $overlap;    # holds a value read past the end of the previous range

    foreach my $upper (@ranges) {
        if (defined $overlap) {
            # the leftover value from the previous range may fall in this one
            next unless $overlap <= $upper;
            $frequencies{$upper}++;    # $overlap is within the range
            undef $overlap;
        }
        while (<$IN_FILE>) {
            chomp;
            if ($_ <= $upper) {
                $frequencies{$upper}++;    # value belongs to the current range
            }
            else {
                $overlap = $_;             # too large: keep it for a later range
                last;
            }
        }
    }
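For completeness, here is one possible way to build the sorted @ranges array of upper bounds assumed above. The file name and the "lower,upper" one-range-per-line format are my assumptions, not something from the original post; adapt them to your actual ranges file.

    # Hypothetical sketch: load the upper bounds and sort them numerically.
    open my $RANGES, '<', 'ranges.txt' or die "Cannot open ranges.txt: $!";
    my @ranges;
    while (<$RANGES>) {
        chomp;
        my (undef, $upper) = split /,/;    # keep only the upper bound
        push @ranges, $upper;
    }
    close $RANGES;
    @ranges = sort { $a <=> $b } @ranges;  # ascending numeric sort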
Even if your input file is not sorted, it will very likely be faster to sort the input first and then apply the algorithm above than to check 50,000 ranges for each line (although a dichotomic (binary) search would of course reduce that burden).
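As a rough illustration of that binary search idea, here is one possible lookup routine over the sorted upper bounds; the name find_upper_bound and the array-reference interface are mine, offered only as a sketch.

    # Hypothetical sketch: find the smallest upper bound >= $value in
    # O(log n) instead of scanning all 50,000+ ranges for each input line.
    sub find_upper_bound {
        my ($value, $ranges) = @_;       # $ranges is a sorted array reference
        return undef unless @$ranges;
        my ($lo, $hi) = (0, $#$ranges);
        while ($lo < $hi) {
            my $mid = int(($lo + $hi) / 2);
            if ($ranges->[$mid] < $value) {
                $lo = $mid + 1;
            }
            else {
                $hi = $mid;
            }
        }
        # smallest upper bound >= $value, or undef if the value is too large
        return $ranges->[$lo] >= $value ? $ranges->[$lo] : undef;
    }

    # Possible use: $frequencies{ find_upper_bound($_, \@ranges) }++ per line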
Update Apr. 8, 16:45 UTC: corrected a small error in the program.