Re: Perl Multiline Regex

You need to keep track of previously viewed rows. Assuming the data is sorted by sequence id,

my $last_seq;

while (<>) {
    my ($seq, $p1, $p2) = (split)[0, 2, 3];
    ($p2, $p1) = ($p1, $p2) if $p2 < $p1;

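    if (!defined($last_seq)) {
        print("$seq ");          # first row: print the first sequence id too
    } elsif ($seq eq $last_seq) {
        print(",");              # same id: continue the current line
    } else {
        print("\n$seq ");        # new id: start a new line
    }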

    print("$p1..$p2");

    $last_seq = $seq;
}

print("\n") if defined($last_seq);
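To check the grouping without an input file, here is a minimal sketch of the same idea that builds a string instead of printing. The sample rows are invented; the column layout (id, an ignored column, then two positions) is only what `(split)[0, 2, 3]` implies. The `group_ranges` name is made up for this example, and the sketch emits the id for the very first group as well.

```perl
use strict;
use warnings;

# Hypothetical helper: same grouping idea as above, returning a string.
# Assumes rows for the same sequence id are adjacent (i.e. sorted input).
sub group_ranges {
    my @rows = @_;
    my ($out, $last_seq) = ('');

    for my $row (@rows) {
        my ($seq, $p1, $p2) = (split ' ', $row)[0, 2, 3];
        ($p1, $p2) = ($p2, $p1) if $p2 < $p1;   # normalise to low..high

        if (!defined $last_seq) {
            $out .= "$seq ";        # first row: start the first group
        } elsif ($seq eq $last_seq) {
            $out .= ",";            # same id: continue the current group
        } else {
            $out .= "\n$seq ";      # new id: start a new line
        }

        $out .= "$p1..$p2";
        $last_seq = $seq;
    }

    $out .= "\n" if defined $last_seq;
    return $out;
}

# Invented sample rows: id, an ignored column, then two positions.
print group_ranges(
    "seqA x 10 50",
    "seqA x 90 70",
    "seqB x 5 8",
);
# prints:
# seqA 10..50,70..90
# seqB 5..8
```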

Update: More like your code and less like your explanation:

sub extractseq {
    my ($seq, $ranges) = @_;

    system(extractseq =>
        "-sequence=$seq.seq",
        "-auto",
        "-stdout",
        "-separate",
        "-reg=" . join(',', map { "$_->[0]..$_->[1]" } @$ranges),
    )
        and die("system: $?/$!\n");
}


my $last_seq;
my @ranges;

while (<>) {
    my ($seq, $p1, $p2) = (split)[0, 2, 3];
    ($p2, $p1) = ($p1, $p2) if $p2 < $p1;

    if (defined($last_seq) && $seq ne $last_seq) {
        extractseq($last_seq, \@ranges);
        @ranges = ();
    }

    $last_seq = $seq;
    push @ranges, [ $p1, $p2 ];
}

extractseq($last_seq, \@ranges) if defined($last_seq);

Update: Finally, if the input isn't sorted or if you prefer something simpler (at the cost of using more memory),

my %ranges_by_seq;

while (<>) {
    my ($seq, $p1, $p2) = (split)[0, 2, 3];
    ($p2, $p1) = ($p1, $p2) if $p2 < $p1;
    push @{ $ranges_by_seq{$seq} }, [ $p1, $p2 ];
}

for my $seq (keys(%ranges_by_seq)) {
    my $ranges = $ranges_by_seq{$seq};

    system(extractseq =>
        "-sequence=$seq.seq",
        "-auto",
        "-stdout",
        "-separate",
        "-reg=" . join(',', map { "$_->[0]..$_->[1]" } @$ranges),
    )
        and die("system: $?/$!\n");
}
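If you'd like to see which commands would be run before actually invoking extractseq, something along these lines may help. It's only a dry-run sketch: `extractseq_args` is a made-up helper that returns the same argument list as above, the sample rows are invented and deliberately unsorted, and `sort keys` is used because plain `keys` returns the ids in no particular order.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list used above without running it.
sub extractseq_args {
    my ($seq, $ranges) = @_;
    return (
        'extractseq',
        "-sequence=$seq.seq",
        '-auto',
        '-stdout',
        '-separate',
        '-reg=' . join(',', map { "$_->[0]..$_->[1]" } @$ranges),
    );
}

# Invented, unsorted sample rows: id, an ignored column, then two positions.
my @rows = ("seqB x 5 8", "seqA x 10 50", "seqB x 30 20");

my %ranges_by_seq;
for my $row (@rows) {
    my ($seq, $p1, $p2) = (split ' ', $row)[0, 2, 3];
    ($p1, $p2) = ($p2, $p1) if $p2 < $p1;        # normalise to low..high
    push @{ $ranges_by_seq{$seq} }, [ $p1, $p2 ];
}

# Sorting the keys gives a repeatable order from run to run.
for my $seq (sort keys %ranges_by_seq) {
    print join(' ', extractseq_args($seq, $ranges_by_seq{$seq})), "\n";
}
# prints:
# extractseq -sequence=seqA.seq -auto -stdout -separate -reg=10..50
# extractseq -sequence=seqB.seq -auto -stdout -separate -reg=5..8,20..30
```

Passing the returned list straight to `system` keeps the list form of the call, so no shell is involved.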

Re^2: Perl Multiline Regex by joomanji (Acolyte) on May 29, 2009 at 18:24 UTC
Dear Ikegami, thank you very much for your effort! I really did not expect someone to reply in such a short time, and the solution actually worked! I had just hit F5 and there was your reply to my question, and before I could respond you had already posted another update! I will study the example you gave me until I understand it thoroughly, and I have already adapted it for my other scripts. Thank you! One problem, though: the script you gave me stopped after the first input with the error message "system: 0/Bad file descriptor". When I comment out the line `or die("system: $?/$!\n");` it works just fine. Would you mind explaining this?
Re^3: Perl Multiline Regex by ikegami (Patriarch) on May 29, 2009 at 18:58 UTC
Oops, that should be "`and die`" instead of "`or die`". `system` is unusual in its return value: it returns the command's wait status, which is 0 (false) when the command succeeds. Fixed.
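That also explains the odd message: with `or die`, the `die` fired precisely because the command succeeded, so `$?` was 0 and `$!` held a stale, unrelated error. A rough sketch of the checks described in the `system` documentation follows; `run_status` is just a name made up for this example, and `$^X` (the running perl) is used so the commands exist on any machine.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command and describe how it ended.
sub run_status {
    my @cmd = @_;
    my $rv = system(@cmd);      # 0 means success, unlike most Perl builtins

    return "ok"                                     if $rv == 0;
    return "failed to execute: $!"                  if $? == -1;   # $! is only meaningful here
    return sprintf("died with signal %d", $? & 127) if $? & 127;
    return sprintf("exited with value %d", $? >> 8);
}

# $^X is the path of the perl running this script.
print run_status($^X, '-e', 'exit 0'), "\n";   # ok
print run_status($^X, '-e', 'exit 3'), "\n";   # exited with value 3
```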
Re^4: Perl Multiline Regex by joomanji (Acolyte) on May 29, 2009 at 20:13 UTC
Cool! Thanks! It works perfectly now! Thank you!