in reply to Perl Multiline Regex
You need to keep track of the sequence id seen on the previous row. Assuming the data is sorted by sequence id:
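    my $last_seq;
    while (<>) {
        my ($seq, $p1, $p2) = (split)[0, 2, 3];
        ($p2, $p1) = ($p1, $p2) if $p2 < $p1;   # make sure $p1 <= $p2

        if (!defined($last_seq)) {
            print("$seq ");       # first row: start the first output line
        } elsif ($seq eq $last_seq) {
            print(",");           # same sequence: continue the list
        } else {
            print("\n$seq ");     # new sequence: start a new line
        }

        print("$p1..$p2");
        $last_seq = $seq;
    }

    print("\n") if defined($last_seq);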
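To see what that loop produces, here is a minimal sketch of the same idea wrapped in a function that returns the text instead of printing as it goes. The name `format_ranges` and the sample rows are made up for illustration. The only thing taken from the thread is that the id is in column 0 and the positions are in columns 2 and 3. This version also writes the id in front of the first group.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): groups consecutive rows by
# sequence id (column 0) and joins their ranges (columns 2 and 3).
sub format_ranges {
    my @rows = @_;
    my $out  = '';
    my $last_seq;

    for my $row (@rows) {
        my ($seq, $p1, $p2) = (split ' ', $row)[0, 2, 3];
        ($p1, $p2) = ($p2, $p1) if $p2 < $p1;    # normalise so $p1 <= $p2

        if    (!defined $last_seq) { $out .= "$seq "; }     # very first row
        elsif ($seq eq $last_seq)  { $out .= ","; }         # same sequence
        else                       { $out .= "\n$seq "; }   # new sequence

        $out .= "$p1..$p2";
        $last_seq = $seq;
    }

    $out .= "\n" if defined $last_seq;
    return $out;
}

# Made-up sample rows, already sorted by sequence id.
my @sample = (
    "seqA gene 10 20",
    "seqA gene 50 30",
    "seqB gene 5 9",
);
print format_ranges(@sample);
```

With those sample rows it prints `seqA 10..20,30..50` on the first line and `seqB 5..9` on the second.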
Update: More like your code and less like your explanation:
    sub extractseq {
        my ($seq, $ranges) = @_;
        system(
            extractseq => "-sequence=$seq.seq",
            "-auto",
            "-stdout",
            "-separate",
            "-reg=" . join(',', map { "$_->[0]..$_->[1]" } @$ranges),
        ) and die("system: $?/$!\n");
    }

    my $last_seq;
    my @ranges;
    while (<>) {
        my ($seq, $p1, $p2) = (split)[0, 2, 3];
        ($p2, $p1) = ($p1, $p2) if $p2 < $p1;

        # Sequence id changed: flush the ranges collected so far.
        if (defined($last_seq) && $seq ne $last_seq) {
            extractseq($last_seq, \@ranges);
            @ranges = ();
        }

        $last_seq = $seq;
        push @ranges, [ $p1, $p2 ];
    }

    # Flush the final group.
    extractseq($last_seq, \@ranges) if defined($last_seq);
Update: Finally, if the input isn't sorted or if you prefer something simpler (at the cost of using more memory),
    my %ranges_by_seq;
    while (<>) {
        my ($seq, $p1, $p2) = (split)[0, 2, 3];
        ($p2, $p1) = ($p1, $p2) if $p2 < $p1;
        push @{ $ranges_by_seq{$seq} }, [ $p1, $p2 ];
    }

    for my $seq (keys(%ranges_by_seq)) {
        my $ranges = $ranges_by_seq{$seq};
        system(
            extractseq => "-sequence=$seq.seq",
            "-auto",
            "-stdout",
            "-separate",
            "-reg=" . join(',', map { "$_->[0]..$_->[1]" } @$ranges),
        ) and die("system: $?/$!\n");
    }
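If you want to check the grouping before actually running anything, a dry-run sketch along these lines may help. It builds the argument lists instead of calling `system`. The name `build_commands` and the sample rows are made up for illustration, and the flags are simply copied from the code above. It also sorts the keys, which is my own addition, so the output order is predictable, since `keys` returns them in no particular order.

```perl
use strict;
use warnings;

# Hypothetical dry-run helper (not from the thread): returns one argument
# list per sequence id instead of calling system().
sub build_commands {
    my @rows = @_;
    my %ranges_by_seq;

    for my $row (@rows) {
        my ($seq, $p1, $p2) = (split ' ', $row)[0, 2, 3];
        ($p1, $p2) = ($p2, $p1) if $p2 < $p1;    # normalise so $p1 <= $p2
        push @{ $ranges_by_seq{$seq} }, [ $p1, $p2 ];
    }

    my @commands;
    for my $seq (sort keys %ranges_by_seq) {     # sorted for stable output
        my $reg = join ',', map { "$_->[0]..$_->[1]" } @{ $ranges_by_seq{$seq} };
        push @commands, [
            'extractseq', "-sequence=$seq.seq",
            '-auto', '-stdout', '-separate', "-reg=$reg",
        ];
    }
    return @commands;
}

# Made-up sample rows, deliberately not sorted by sequence id.
my @cmds = build_commands(
    "seqB gene 5 9",
    "seqA gene 10 20",
    "seqB gene 40 30",
);
print join(' ', @$_), "\n" for @cmds;
```

Each element of `@cmds` is an array reference, so once the printed commands look right it could be handed to `system(@$_)` unchanged.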