comment on

baxy77bax:

It took a while to figure out what your algorithm was, but I finally doped it out. It seemed overly complex for what you're doing, though. So I coded it up in a form that's a little easier to understand--at least for me! I don't know if it's any faster than yours, or not, so give it a try and let me know how it looks.

Basically, I just keep a list of active intervals, and increment all of them when I see a '1'. When I hit a column that begins a new interval, I move it from the "waiting" list and into the "current" list. When I hit an end column, I remove all expired lists from the active list. That's pretty much it.

Please let me know if it's any better for you...

$ cat 1032018_d.pl 
#!/usr/bin/perl
use warnings;
use strict;
use Getopt::Long;

my ($i1, $i2, %hash1);

GetOptions ('i=s' => \$i1, 't=s' => \$i2);

open (IN, "<", $i1)|| die "$!";
open (IN1, "<", $i2)|| die "$!";

# Build intervals for each ID
while (<IN1>){
    chomp;
    my ($id, $beg, $end) = split /\s+/,$_;
    push @{$hash1{$id}}, [ $beg, $end, 0 ];
}

close IN1;

my $p =0;
my $id = "";

my @add_intervals;
my @del_intervals;
my @cur_intervals;

#count tags per interval
while(<IN>){
    s/\s+$//;
    #print "$.: $_\n";
    if (/#(.*)/){
        $p=0;
        $id = $1;

        # Intervals for this ID
        @add_intervals = sort {$a->[0] <=> $b->[0]} @{$hash1{$id}};

        # List of interval ends
        my %uniq = map { $_->[1], 0 } @add_intervals;
        @del_intervals = sort { $a <=> $b } keys %uniq;

        # Start with no active intervals
        @cur_intervals = ( );
        next;
    }
    $p++; # sometimes the first column is set to 0

    /^(\d+)\s+(\d+)/ or next;

    # add new intervals that start on this column
    while (@add_intervals and ($p == $add_intervals[0][0])) {
        #print "Adding interval $id:$add_intervals[0][0] .. $add_inter
+vals[0][1]\n";
        push @cur_intervals, shift @add_intervals;
    }

    # Increment all active intervals when we find a hit
    if (@cur_intervals and $2 eq '1') {
        ++$_->[2] for @cur_intervals;
    }

    # remove ranges that end on this column
    if (@del_intervals and $p==$del_intervals[0]) {
        #print "Deleting intervals ending on column $p\n";
        shift @del_intervals;
        @cur_intervals = grep { $_->[1] > $p } @cur_intervals;
    }
}
close IN;

for my $id (sort keys %hash1) {
    for my $interval (@{$hash1{$id}}) {
        print "$id ($interval->[0] .. $interval->[1]) : $interval->[2]
+\n";
    }
}

$ perl 1032018_d.pl -i=1032018.file_2 -t=1032018.file_1
a (12 .. 15) : 2
a (12 .. 17) : 4
a (13 .. 14) : 1
a (14 .. 19) : 5
b (10 .. 15) : 5
b (12 .. 15) : 4
[download]

Note: I think you might mean to use the first capture in the last regex as your column ($p), but I didn't change it.

...roboticus

When your only tool is a hammer, all problems look like your thumb.

In reply to Re: Counter - number of tags per interval by roboticus
in thread Counter - number of tags per interval by baxy77bax

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.