> sorry for deleting my ps comment , i thought it is completely irrelevant since i found "readmore" tags and since it wasn't related to the initially problem. ok these are some data from some simulation experiments. there is only one file.1 and only one file.2. sizes: there ate approximately 3 mil intervals specified in file.1 and about 50 distinct id's and each id in file.2 has approximately 5-7 * 10^9 lines. cheers
thats a lot
> ps: also the clustering of intervals cannot be assumed.
so you are not restricted to 1..21 !?!
Well one fast way to go is to prepare and count in "atomic intervals", that are intervals whose elements always appear together, i.e. are never divided by other intervals.
for instance:
a1 12 15 a2 12 17 a3 13 14 a4 14 19
so atomic intervals are [12],[13],[14],[15],[16..17],[18..19]
Since they are disjunct you don't need nested loops to count there appearance. After reading all data for an id, you can calculate the real intervals by summing the atomics up, e.g. $sum_a3 = sum @count[14,15,16,18] (intervals identified by starting point)
This looks silly for the sample data you provided because its already very dense, you're the one who can judge...
Another - similar - approach is to precalculate "minimal hulls" and "implications".
12 => a1 => a2 13 => a3 => a1 14 => [14] => a3,a4 ...
the hull of a point is found by cutting all ranges including that point, and often a hull is already identical to a range. (e.g. 18 => a4 )
Following the implications results in a minimum number off add-operations needed!
Both algorithms might look overly complicated but they scale very well, no matter if ranges are dense or sparse.
A simplified (but less efficient) version of above algorithms was already shown with counting points and summing over array-slices.
If intervals are sparse, i.e. they don't cover many potential points, then rather count in a hash and sum over a hash-slice:
$sum{a1} = sum @count{12..15}
... is left for exercise! =)
Cheers Rolf
( addicted to the Perl Programming Language)
In reply to Re: Counter - number of tags per interval
by LanX
in thread Counter - number of tags per interval
by baxy77bax
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |