in reply to Formatting clue

If I understand the OP's description and sample data correctly, I think toolic's solution with split would need a slight change, either here:
my @tokens = split /\t+/; # split on one or more consecutive tabs
or here:
elsif ((scalar @tokens) == 6) { # if there are 6 fields (and 3 are empty)
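But of course you wouldn't want to make both changes, because that wouldn't work: the number of fields you get depends on whether runs of tabs are collapsed, so the split pattern and the field-count test have to agree.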
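
To see why the split pattern and the field count are tied together, here is a small sketch that splits one hypothetical "second line" of a pair both ways. The values (chr1, 100, number=7) are made up for illustration; they are not the OP's data. The only assumption is the layout described above: 2 key fields, 3 empty fields, then number=... in a 6th field, separated by single tabs.

```perl
use strict;
use warnings;

# Hypothetical "second line" of a pair: 2 key fields, 3 empty fields,
# then the number=... field, joined by single tabs (5 tabs in total).
my $line = join "\t", 'chr1', '100', '', '', '', 'number=7';

# Split on every single tab: the empty fields survive as empty strings.
my @single = split /\t/, $line;

# Split on runs of tabs: the 4 adjacent tabs act as one delimiter,
# so the empty fields disappear.
my @runs = split /\t+/, $line;

printf "split /\\t/  gives %d fields\n", scalar @single;    # 6
printf "split /\\t+/ gives %d fields\n", scalar @runs;      # 3
```

So with `split /\t+/` a test on such a line would have to expect 3 fields, while `== 6` only makes sense with a single-tab split.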

I think it can be risky to base a solution on just two lines of sample data. In a case like this, we can hope that data lines always come in pairs, that each pair always has the same values in the first two columns, that the first of each pair always has 5 adjacent non-empty fields, that the second always has the 2 "repeated" fields, 3 empty fields and "number=\d+" in a 6th field, that there aren't extra spaces next to any of the field-delimiting tabs, and so on. Wouldn't that be nice...
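
The "extra spaces next to tabs" case is easy to demonstrate. In this sketch the two lines are invented, and the second one has a stray space before its first tab. A plain `split /\t/` keeps that space inside the key field, so the keys no longer compare equal; allowing optional spaces around each tab with `/ *\t */` strips it. (That pattern only absorbs spaces, not other whitespace, which is an assumption about what the stray characters would be.)

```perl
use strict;
use warnings;

# Hypothetical pair of lines with the same key fields, except that
# the second line has a stray space before its first tab.
my $first  = "chr1\t100\tA\tB\tC";
my $second = "chr1 \t100\t\t\t\tnumber=7";

# Plain single-tab split keeps the stray space inside the field.
my @f1 = split /\t/, $first;
my @f2 = split /\t/, $second;
my $plain_match = ( $f1[0] eq $f2[0] ) ? 1 : 0;        # "chr1" vs "chr1 "

# Optional spaces around each tab are consumed by the delimiter,
# while each single tab still separates one field (empties are kept).
my @g1 = split / *\t */, $first;
my @g2 = split / *\t */, $second;
my $tolerant_match = ( $g1[0] eq $g2[0] ) ? 1 : 0;     # "chr1" vs "chr1"

print "single-tab split, keys match: $plain_match\n";        # 0
print "space-tolerant split, keys match: $tolerant_match\n"; # 1
```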

The question is, what sorts of "deviations" from those patterns do you need to worry about, and what should the script do when those sorts of things pop up (as they almost certainly will)? Just guessing:

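use strict;
use warnings;

my @comp;    # fields of the pending first line of a pair

# open FH in some suitable way... here, <> reads the files named
# on the command line, or STDIN if there are none
while (<>) {
    s/^\s+//;
    s/\s+$//;
    my @flds = split( / *\t */ );    # tabs might have spaces around them
    if ( @flds == 5 ) {              # presumably first line of pair
        warn "Input line $. replaces previous first-line data: @comp\n" if (@comp);
        @comp = @flds;
    }
    elsif ( @comp                    # need a pending first line to pair with
        and @flds == 6
        and $flds[5] =~ /number=(\d+)/
        and $flds[0] eq $comp[0]     # compare key columns one by one,
        and $flds[1] eq $comp[1] )   # not as a concatenated string
    {
        push @comp, $1;
        print join( "\t", @comp ), "\n";
        @comp = ();
    }
    else {
        warn "Input line $. ignored: $_\n";
    }
}
warn "Unpaired first-line data at end of input: @comp\n" if (@comp);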
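
A note on matching the two key columns: comparing them by concatenation (`$a.$b eq $c.$d`) loses the boundary between the columns, so two different key pairs can look identical. This sketch uses invented keys to show the false match, and two ways to avoid it: compare column by column, or join with a delimiter that cannot occur inside a field (a tab, since the fields came from splitting on tabs).

```perl
use strict;
use warnings;

# Two hypothetical key pairs whose concatenations happen to be equal.
my @k1 = ( 'ab', 'c' );
my @k2 = ( 'a',  'bc' );

# Concatenation loses the column boundary: "abc" eq "abc".
my $concat_match = ( $k1[0] . $k1[1] eq $k2[0] . $k2[1] ) ? 1 : 0;

# Column-by-column comparison keeps the boundary.
my $column_match = ( $k1[0] eq $k2[0] and $k1[1] eq $k2[1] ) ? 1 : 0;

# Joining with a tab keeps it too, because fields produced by
# splitting on tabs can never contain a tab themselves.
my $joined_match = ( join( "\t", @k1 ) eq join( "\t", @k2 ) ) ? 1 : 0;

print "concatenated: $concat_match, by column: $column_match, tab-joined: $joined_match\n";
# prints "concatenated: 1, by column: 0, tab-joined: 0"
```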