in reply to Formatting clue
or here:

    my @tokens = split /\t+/;    # split on one or more consecutive tabs

But of course you wouldn't want to make both changes, because that wouldn't work.

    elsif ((scalar @tokens) == 6) {    # if there are 6 fields (and 3 are empty)
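To see why the two changes conflict, here is a minimal sketch that splits one made-up second-of-pair line both ways. The sample line is an assumption based on the layout described in this thread (2 repeated fields, 3 empty fields, then "number=..."), not actual data from the original post.

```perl
use strict;
use warnings;

# Hypothetical second-of-pair line: 2 repeated fields, 3 empty fields,
# then a "number=..." field.
my $line = "chr1\tgeneA\t\t\t\tnumber=42";

my @keep_empty = split /\t/,  $line;    # every tab is a delimiter, so empty fields survive
my @collapsed  = split /\t+/, $line;    # a run of tabs counts as a single delimiter

printf "split /\\t/  gives %d fields\n", scalar @keep_empty;
printf "split /\\t+/ gives %d fields\n", scalar @collapsed;
```

With split /\t+/ the empty fields vanish, so a test for 6 fields can never match that line; pick one change or the other.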
I think it can be risky to base a solution on just two lines of sample data. In a case like this, we can hope that data lines always come in pairs, that each pair always has the same values in the first two columns, that the first of each pair always has 5 adjacent non-empty fields, that the second always has the 2 "repeated" fields, 3 empty fields and "number=\d+" in a 6th field, that there aren't extra spaces next to any of the field-delimiting tabs, and so on. Wouldn't that be nice...
The question is, what sorts of "deviations" from those patterns do you need to worry about, and what should the script do when those sorts of things pop up (as they almost certainly will)? Just guessing:
    use strict;
    use warnings;

    my @comp;
    # open FH in some suitable way...
    while (<FH>) {
        s/^\s+//;
        s/\s+$//;
        my @flds = split( / *\t */ );    # tabs might have spaces around them
        if ( @flds == 5 ) {              # presumably first line of pair
            warn "Input line $. replaces previous first-line data: @comp\n"
                if ( @comp );
            @comp = @flds;
        }
        elsif ( @comp and @flds == 6     # need a pending first line to pair with
            and $flds[5] =~ /number=(\d+)/
            and $flds[0].$flds[1] eq $comp[0].$comp[1] )
        {
            push @comp, $1;
            print join( "\t", @comp ), "\n";
            @comp = ();
        }
        else {
            warn "Input line $. ignored: $_\n";
        }
    }
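To try that logic without a real input file, here is a rough, self-contained sketch that wraps the same pairing rules in a sub and runs it on an in-memory filehandle. The name merge_pairs and the sample data are made up for illustration, and this version compares the two key fields joined with a tab rather than plain concatenation (a small variation, so that "ab","c" cannot be mistaken for "a","bc"). It collects problems in a list instead of calling warn inside the loop.

```perl
use strict;
use warnings;

# merge_pairs is a hypothetical name; it applies the same pairing rules as
# the loop above to any readable filehandle.
sub merge_pairs {
    my ($fh) = @_;
    my ( @comp, @merged, @problems );
    while ( my $line = <$fh> ) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        my @flds = split / *\t */, $line;    # tolerate spaces around tabs
        if ( @flds == 5 ) {                  # presumably first line of a pair
            push @problems, "line $.: replaces unpaired first line" if @comp;
            @comp = @flds;
        }
        elsif ( @comp and @flds == 6 and $flds[5] =~ /number=(\d+)/
            and "$flds[0]\t$flds[1]" eq "$comp[0]\t$comp[1]" )
        {
            push @merged, join( "\t", @comp, $1 );
            @comp = ();
        }
        else {
            push @problems, "line $.: ignored";
        }
    }
    return ( \@merged, \@problems );
}

# Made-up sample: a clean pair, an orphaned first line, then a pair whose
# second line has stray spaces around a tab.
my $sample = join "\n",
    "a\tb\tc\td\te",
    "a\tb\t\t\t\tnumber=7",
    "x\ty\t1\t2\t3",
    "p\tq\tr\ts\tt",
    "p \t q\t\t\t\tnumber=9",
    "";

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my ( $merged, $problems ) = merge_pairs($fh);
close $fh;

print "$_\n" for @$merged;      # merged records go to STDOUT
warn "$_\n"  for @$problems;    # deviations go to STDERR
```

Returning the problems as a list makes it easy to decide later whether a given deviation should be fatal, logged, or silently skipped.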