Wobbel has asked for the wisdom of the Perl Monks concerning the following question:

This is the challenge:
The file contains "pairs" of lines (2D-2D measurements spread over two lines).
Most (correct) measurements are on a single line (the combination of columns 3, 7, 8 and 9 is unique).
If two lines match on columns 3, 7, 8 and 9 (name, date, time, machine),
then replace the two lines with one line.
(Print columns 1, 2, 3 ... 21, 22, 23, with the modified columns 10, 11 and 12.)
Columns 10 and 11 (first line) are "Vertical" and "Length".
Columns 11 and 12 (second line) are "Length" and "Lateral".
I want to average the two length values in column 11.
What's a good approach? (Are there any comparable examples on PerlMonks?)
The text file contains about 10,000 lines, and I would prefer not to use an Excel (VBA) solution.

Re: Combining matching lines in a TAB seperated textfile
by Eliya (Vicar) on Apr 27, 2011 at 13:36 UTC

    What exactly are you having problems with?  Reading/processing lines in pairs, determining if there is a match, averaging the length values, writing out the resulting lines, or something else?

    The monks are usually more inclined to help if you show some initial effort, i.e. the code you've tried so far.

      Correct!
      But what is the right tool for this problem?
      I've seen fantastic Perl one-liners, but it's probably more elegant to work with a "new" technique (I'm afraid of hashes :-) ).
      But seriously, I don't mind learning something new, as long as I don't waste several days wandering in the fog.
      It's more "the noble art of programming".
      Show me the way, and I'll try something new!
      (but elegant Perl one-liners are always welcome...).
      Summary:
      Skip the correct single lines, recognize the pairs (match on 4 columns) and transform each pair into one correct single line (with the right vrt, lng, lat values).
      The output text file contains only correct single lines.

        You still haven't really answered what exactly the problem is, so I'm not going to provide a directly usable solution either :)

        It's okay to be looking for new, elegant, or whatever techniques, but rather than letting that keep you from getting the work done, you could start with the basics you're familiar with and see how far you get...  If you run into a roadblock or feel things are getting unwieldy, you can still look for fancier ways around it.

        And if you'd like to know if there's a more idiomatic/elegant/faster/etc. way than what you've eventually come up with, nothing keeps you from presenting your work here and asking for comments.

        That said, here's my take on it as a starting point.  It handles a simplified case (fewer columns), and as I wasn't entirely sure whether you want to skip or keep the 'single' lines, I chose to pass them through:

        #!/usr/bin/perl -w
        use strict;

        use constant {  # column indices
            FOO => 0,
            BAR => 1,
            LEN => 2,
        };

        my @col1;  # '1st-line-of-pair' buffer

        while (<DATA>) {             # read line
            chomp;
            my @col2 = split /\t/;   # split line on tabs

            if (@col1) {             # two lines read, i.e. pair available?
                if (    $col1[FOO] eq $col2[FOO]
                    and $col1[BAR] eq $col2[BAR] ) {  # is pair matching?

                    # average length
                    $col1[LEN] = sprintf "%.1f", ($col1[LEN] + $col2[LEN]) / 2;

                    write_out(@col1);  # write out modified/merged line
                    @col1 = ();        # clear buffer
                    next;              # skip rest
                } else {
                    write_out(@col1);  # write out non-paired line
                }
            }
            @col1 = @col2;  # store line (previous=current)
        }
        write_out(@col1) if @col1;  # take care of last line

        sub write_out {
            print join("\t", @_), "\n";
        }

        __DATA__
        abc	def	3.5
        abc	def	4.5
        ghi	jkl	13.2
        mno	pqr	2.8
        mno	pqr	2.4
        stu	vwx	10.0

        Output:

        abc	def	4.0
        ghi	jkl	13.2
        mno	pqr	2.6
        stu	vwx	10.0
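
For the full layout described in the question, the same buffering idea might be extended roughly as follows. This is only a sketch under several assumptions that the thread does not confirm: the two lines of a pair are adjacent, the line carrying "Vertical" comes first, the merged line takes column 10 from the first line and column 12 from the second, and the averaged length is printed with one decimal as in the simplified example. The names `pair_key`, `merge_pair` and `combine_lines` are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed 0-based indices for the 1-based columns named in the question.
my @KEY = (2, 6, 7, 8);               # columns 3, 7, 8, 9: name, date, time, machine
my ($VRT, $LNG, $LAT) = (9, 10, 11);  # columns 10, 11, 12

# Build a comparison key from the four identifying columns.
sub pair_key {
    my ($cols) = @_;
    return join "\t", map { defined $cols->[$_] ? $cols->[$_] : '' } @KEY;
}

# Merge a pair: vertical (column 10) is kept from the first line, lateral
# (column 12) is taken from the second, length (column 11) is averaged.
sub merge_pair {
    my ($first, $second) = @_;
    my @merged = @$first;
    $merged[$LNG] = sprintf "%.1f", ($first->[$LNG] + $second->[$LNG]) / 2;
    $merged[$LAT] = $second->[$LAT];
    return \@merged;
}

# Collapse adjacent matching pairs; unmatched single lines pass through.
sub combine_lines {
    my @out;
    my $prev;    # buffered first line of a possible pair
    for my $line (@_) {
        my @cols = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        if (   $prev
            && @$prev > $LAT
            && @cols > $LAT
            && pair_key($prev) eq pair_key(\@cols) )
        {
            push @out, join "\t", @{ merge_pair($prev, \@cols) };
            undef $prev;                     # pair consumed, clear the buffer
            next;
        }
        push @out, join("\t", @$prev) if $prev;   # previous line had no partner
        $prev = \@cols;
    }
    push @out, join("\t", @$prev) if $prev;       # take care of the last line
    return @out;
}

# Hypothetical usage: perl combine.pl input.txt > output.txt
if (@ARGV) {
    chomp(my @lines = <>);
    print "$_\n" for combine_lines(@lines);
}
```

If the two lines of a pair are not guaranteed to be adjacent, this approach would miss them; in that case collecting lines in a hash keyed on `pair_key` would be the more robust route, at the cost of holding the file in memory (which is not a concern at about 10,000 lines).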