baxy77bax:

It took a while to figure out what your algorithm was, but I finally doped it out. It seemed overly complex for what you're doing, though. So I coded it up in a form that's a little easier to understand--at least for me! I don't know if it's any faster than yours, or not, so give it a try and let me know how it looks.

Basically, I just keep a list of active intervals, and increment all of them when I see a '1'. When I hit a column that begins a new interval, I move it from the "waiting" list and into the "current" list. When I hit an end column, I remove all expired lists from the active list. That's pretty much it.

Please let me know if it's any better for you...

$ cat 1032018_d.pl #!/usr/bin/perl use warnings; use strict; use Getopt::Long; my ($i1, $i2, %hash1); GetOptions ('i=s' => \$i1, 't=s' => \$i2); open (IN, "<", $i1)|| die "$!"; open (IN1, "<", $i2)|| die "$!"; # Build intervals for each ID while (<IN1>){ chomp; my ($id, $beg, $end) = split /\s+/,$_; push @{$hash1{$id}}, [ $beg, $end, 0 ]; } close IN1; my $p =0; my $id = ""; my @add_intervals; my @del_intervals; my @cur_intervals; #count tags per interval while(<IN>){ s/\s+$//; #print "$.: $_\n"; if (/#(.*)/){ $p=0; $id = $1; # Intervals for this ID @add_intervals = sort {$a->[0] <=> $b->[0]} @{$hash1{$id}}; # List of interval ends my %uniq = map { $_->[1], 0 } @add_intervals; @del_intervals = sort { $a <=> $b } keys %uniq; # Start with no active intervals @cur_intervals = ( ); next; } $p++; # sometimes the first column is set to 0 /^(\d+)\s+(\d+)/ or next; # add new intervals that start on this column while (@add_intervals and ($p == $add_intervals[0][0])) { #print "Adding interval $id:$add_intervals[0][0] .. $add_inter +vals[0][1]\n"; push @cur_intervals, shift @add_intervals; } # Increment all active intervals when we find a hit if (@cur_intervals and $2 eq '1') { ++$_->[2] for @cur_intervals; } # remove ranges that end on this column if (@del_intervals and $p==$del_intervals[0]) { #print "Deleting intervals ending on column $p\n"; shift @del_intervals; @cur_intervals = grep { $_->[1] > $p } @cur_intervals; } } close IN; for my $id (sort keys %hash1) { for my $interval (@{$hash1{$id}}) { print "$id ($interval->[0] .. $interval->[1]) : $interval->[2] +\n"; } } $ perl 1032018_d.pl -i=1032018.file_2 -t=1032018.file_1 a (12 .. 15) : 2 a (12 .. 17) : 4 a (13 .. 14) : 1 a (14 .. 19) : 5 b (10 .. 15) : 5 b (12 .. 15) : 4

Note: I think you might mean to use the first capture in the last regex as your column ($p), but I didn't change it.

...roboticus

When your only tool is a hammer, all problems look like your thumb.


In reply to Re: Counter - number of tags per interval by roboticus
in thread Counter - number of tags per interval by baxy77bax

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.