Combining matching lines in a TAB-separated text file

Wobbel has asked for the wisdom of the Perl Monks concerning the following question:

Re: Combining matching lines in a TAB-separated text file by Eliya (Vicar) on Apr 27, 2011 at 13:36 UTC
What exactly are you having problems with? Reading/processing lines in pairs, determining if there is a match, averaging the length values, writing out the resulting lines, or something else? The monks are usually more inclined to help if you show some initial effort, i.e. the code you've tried so far.
Re^2: Combining matching lines in a TAB-separated text file by Wobbel (Acolyte) on Apr 27, 2011 at 14:04 UTC
Correct! But what is the right tool for this problem? I've seen fantastic Perl one-liners, but it's probably more elegant to work with a "new" technique (I'm afraid of hashes :-) ). But seriously, I don't mind learning something new, as long as I don't waste several days wandering in the fog. It's more about "the noble art of programming". Show me the way, and I'll try something new! (But elegant Perl one-liners are always welcome...)

Summary: skip the correct single lines, recognize the right pairs (matching on 4 columns), and transform each pair into a correct single line (with the right vrt, lng and lat values). The output text file contains only correct single lines.
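A minimal sketch of that summary, under loudly stated assumptions since the actual data layout isn't shown in this thread: columns 0 to 3 are taken as the four match columns, columns 4 to 6 as vrt, lng and lat, and "the right values" are assumed to be the averages of the two lines. `@KEY_COLS`, `@AVG_COLS`, `rows_match`, `merge_rows` and `merge_adjacent_pairs` are hypothetical names. Only pairs on consecutive lines are merged; any other line is passed through unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMED layout (hypothetical): columns 0..3 must match,
# columns 4..6 hold vrt, lng and lat and are averaged.
my @KEY_COLS = (0 .. 3);
my @AVG_COLS = (4 .. 6);

# True if two split rows agree on every key column.
sub rows_match {
    my ($r1, $r2) = @_;
    for my $i (@KEY_COLS) {
        return 0 unless $r1->[$i] eq $r2->[$i];
    }
    return 1;
}

# Merge two matching rows: copy the first, average the value columns
# (averaging is an assumption about what "the right values" means).
sub merge_rows {
    my ($r1, $r2) = @_;
    my @merged = @$r1;
    $merged[$_] = sprintf "%.1f", ($r1->[$_] + $r2->[$_]) / 2 for @AVG_COLS;
    return \@merged;
}

# Takes tab-separated lines (without newlines), returns the output lines.
sub merge_adjacent_pairs {
    my @out;
    my $prev;                                  # buffered first line of a possible pair
    for my $line (@_) {
        my @row = split /\t/, $line, -1;       # -1 keeps trailing empty fields
        if ($prev && rows_match($prev, \@row)) {
            push @out, join "\t", @{ merge_rows($prev, \@row) };
            undef $prev;                       # pair consumed, buffer cleared
            next;
        }
        push @out, join "\t", @$prev if $prev; # unpaired line passes through
        $prev = \@row;
    }
    push @out, join "\t", @$prev if $prev;     # flush the last buffered line
    return @out;
}

# Demo on made-up sample rows (not real data).
my @sample = (
    join("\t", qw(a b c d 1.0 4.0 50.0)),
    join("\t", qw(a b c d 3.0 6.0 52.0)),
    join("\t", qw(e f g h 2.0 5.0 51.0)),
);
print "$_\n" for merge_adjacent_pairs(@sample);
```

Keeping the match test and the merge rule in separate subs means a different combining rule for vrt/lng/lat only touches `merge_rows`.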
Re^3: Combining matching lines in a TAB-separated text file by Eliya (Vicar) on Apr 27, 2011 at 16:00 UTC
You still haven't really answered what exactly the problem is, so I'm not going to provide a directly usable solution either :) It's okay to be looking for new, elegant, or whatever techniques, but rather than letting that keep you from getting the work done, you could start with the basics you're familiar with and see how far you get... If you run into a roadblock or feel things are getting unwieldy, you can still look for other, fancier ways around it. And if you'd like to know whether there's a more idiomatic/elegant/faster/etc. way than what you've eventually come up with, nothing keeps you from presenting your work here and asking for comments.

That said, here's my take on it as a starting point. It handles a simplified case (fewer columns), and as I wasn't entirely sure whether you want to skip or keep the 'single' lines, I chose to pass them through:

```perl
#!/usr/bin/perl -w
use strict;

use constant {     # column indices
    FOO => 0,
    BAR => 1,
    LEN => 2,
};

my @col1;          # '1st-line-of-pair' buffer

while (<DATA>) {                            # read line
    chomp;
    my @col2 = split /\t/;                  # split line on tabs
    if (@col1) {                            # two lines read, i.e. pair available?
        if ( $col1[FOO] eq $col2[FOO] and
             $col1[BAR] eq $col2[BAR] ) {   # is pair matching?
            # average length
            $col1[LEN] = sprintf "%.1f", ($col1[LEN] + $col2[LEN]) / 2;
            write_out(@col1);               # write out modified/merged line
            @col1 = ();                     # clear buffer
            next;                           # skip rest
        } else {
            write_out(@col1);               # write out non-paired line
        }
    }
    @col1 = @col2;                          # store line (previous=current)
}
write_out(@col1) if @col1;                  # take care of last line

sub write_out {
    print join("\t", @_), "\n";
}

# the lines after __DATA__ are tab-separated
__DATA__
abc	def	3.5
abc	def	4.5
ghi	jkl	13.2
mno	pqr	2.8
mno	pqr	2.4
stu	vwx	10.0
```

Output:

```
abc	def	4.0
ghi	jkl	13.2
mno	pqr	2.6
stu	vwx	10.0
```
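Since hashes came up in the question: a sketch of an alternative that is not part of the reply above. It groups rows in a hash keyed on the match columns, so matching lines are merged even when they are not adjacent, at the cost of holding all groups in memory. It keeps the simplified FOO/BAR/LEN layout from the example; `merge_by_key` is a hypothetical name, and averaging LEN over every row sharing a key (rather than over exactly two) is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant { FOO => 0, BAR => 1, LEN => 2 };   # column indices, as in the example

# Group tab-separated lines by (FOO, BAR) and average LEN within each group.
# Output order follows the first appearance of each key.
sub merge_by_key {
    my (%sum, %count, %first, @order);
    for my $line (@_) {
        my @col = split /\t/, $line;
        my $key = join "\t", @col[FOO, BAR];     # composite hash key
        unless (exists $first{$key}) {
            $first{$key} = \@col;                # remember the first row seen
            push @order, $key;                   # and the order keys appeared in
        }
        $sum{$key}   += $col[LEN];
        $count{$key} += 1;
    }
    my @out;
    for my $key (@order) {
        my @row = @{ $first{$key} };
        # single lines keep their original LEN text; groups get the average
        $row[LEN] = sprintf "%.1f", $sum{$key} / $count{$key} if $count{$key} > 1;
        push @out, join "\t", @row;
    }
    return @out;
}

# Demo: the two 'abc def' lines are not adjacent but still get merged.
my @sample = map { join "\t", @$_ } (
    [qw(abc def 3.5)], [qw(ghi jkl 13.2)], [qw(abc def 4.5)],
);
print "$_\n" for merge_by_key(@sample);
```

The trade-off is memory versus ordering assumptions: the buffer approach only ever holds one line but relies on pairs being consecutive, while the hash version accepts any order but keeps every group until the end.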
Re^4: Combining matching lines in a TAB-separated text file by Wobbel (Acolyte) on Apr 27, 2011 at 19:27 UTC
Re^5: Combining matching lines in a TAB-separated text file by Eliya (Vicar) on Apr 27, 2011 at 19:59 UTC
Re^5: Combining matching lines in a TAB-separated text file by Wobbel (Acolyte) on May 11, 2011 at 11:10 UTC