
Correct!
But what is the right tool for this problem?
I've seen fantastic Perl one-liners, but it's probably more elegant to work with a "new" technique (I'm afraid of hashes :-) ).
But seriously, I don't mind learning something new, as long as I don't waste several days wandering in the fog.
It's more "the noble art of programming".
Show me the way, and I'll try something new!
(But elegant Perl one-liners are always welcome...)
Summary:
Skip the correct single lines and recognize the right pairs (match on 4 columns) and transform them to a correct single line (with the right vrt, lng, lat values).
The output text file contains only correct single lines.

Re^3: Combining matching lines in a TAB seperated textfile
by Eliya (Vicar) on Apr 27, 2011 at 16:00 UTC

    You still haven't really answered what exactly the problem is, so I'm not going to provide a directly usable solution either :)

    It's okay to be looking for new, elegant, or whatever techniques, but rather than let that keep you from getting the work done, you could start with the basics you're familiar with and see how far you get...  If you run into a roadblock or feel things are getting unwieldy, you can still look for other, fancier ways around it.

    And if you'd like to know if there's a more idiomatic/elegant/faster/etc. way than what you've eventually come up with, nothing keeps you from presenting your work here and asking for comments.

    That said, here's my take on it as a starting point.  It handles a simplified case (fewer columns), and as I wasn't entirely sure whether you want to skip or keep the 'single' lines, I chose to pass them through:

    #!/usr/bin/perl -w
    use strict;

    use constant {   # column indices
        FOO => 0,
        BAR => 1,
        LEN => 2,
    };

    my @col1;  # '1st-line-of-pair' buffer

    while (<DATA>) {                # read line
        chomp;
        my @col2 = split /\t/;      # split line on tabs
        if (@col1) {                # two lines read, i.e. pair available?
            if (    $col1[FOO] eq $col2[FOO]
                and $col1[BAR] eq $col2[BAR] ) {   # is pair matching?
                # average length
                $col1[LEN] = sprintf "%.1f", ($col1[LEN] + $col2[LEN]) / 2;
                write_out(@col1);   # write out modified/merged line
                @col1 = ();         # clear buffer
                next;               # skip rest
            } else {
                write_out(@col1);   # write out non-paired line
            }
        }
        @col1 = @col2;              # store line (previous=current)
    }
    write_out(@col1) if @col1;      # take care of last line

    sub write_out {
        print join("\t", @_), "\n";
    }

    __DATA__
    abc	def	3.5
    abc	def	4.5
    ghi	jkl	13.2
    mno	pqr	2.8
    mno	pqr	2.4
    stu	vwx	10.0

    Output:

    abc	def	4.0
    ghi	jkl	13.2
    mno	pqr	2.6
    stu	vwx	10.0
      Wow!
      I think I can see through the mist (sorry, my English is very bad).
      I recognize most of the code, but there is some syntax I have to investigate (sprintf "%.1f").
      Would it be a problem to use 23 column indices and 10,000 lines?
      I have a "comparable" Perl snippet that reads a text logfile and generates an HTML/CSS page.
      I think the hill is not too steep.
      Thanks for the useful advice!
      And if it works...
      What kind of construction would a Perl expert use? I'll never be a pro, but I'm eager to learn a little bit more every day! (Wobbel, buy a Navigator...)
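
      On the sprintf "%.1f" question above, a minimal sketch may help. sprintf returns a formatted string, and "%.1f" asks for a floating-point number with one digit after the decimal point. The sample values are taken from the example data in this thread; nothing else is assumed.

```perl
use strict;
use warnings;

# sprintf does not print anything; it returns the formatted string.
# "%.1f" = floating-point format, one digit after the decimal point.
my $avg = sprintf "%.1f", (3.5 + 4.5) / 2;   # "4.0"

# "%.2f" would keep two digits instead.
my $two = sprintf "%.2f", 13.2;              # "13.20"

print "$avg\n";
print "$two\n";
```

      Without the format, Perl would print the average as a plain 4, which is why the merged line goes through sprintf before being written out.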
        Would it be a problem to use 23 column indices and 10,000 lines?

        No problem at all.  As there are no more than two lines of data kept in memory at any point in time, the number of lines is essentially irrelevant. And 23 columns isn't really a lot either... (BTW, note that you only have to give names (constants) to column indices that you actually need to access or modify, like 3,7,8,9 and 10,11,12).
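
        To illustrate the point about naming only the columns you need, here is a rough sketch. The constant names, the index numbers, the pair_key helper and the sample lines are all made up for illustration; they are not the real layout of the 23-column file. It also shows one possible way to express a four-column match as a single string comparison.

```perl
use strict;
use warnings;

# Hypothetical indices: only the columns the script touches get a name.
use constant {
    ID   => 0,
    PLAN => 2,
    DATE => 3,
    TIME => 4,
};
my @KEY = (ID, PLAN, DATE, TIME);   # the four columns a pair must share

# Join the key columns into one string, so a pair needs a single 'eq'.
# The tab is a safe separator because the fields were split on tabs.
sub pair_key {
    my ($row) = @_;                 # array reference holding one split line
    return join "\t", @{$row}[@KEY];
}

# Made-up sample lines: same key columns, different second column.
my @line1 = split /\t/, "42\tfoo\tP1\t2011-04-27\t16:00\t3.5";
my @line2 = split /\t/, "42\tbar\tP1\t2011-04-27\t16:00\t4.5";

my $is_pair = pair_key(\@line1) eq pair_key(\@line2) ? "pair" : "no pair";
print "$is_pair\n";
```

        Columns that are merely passed through never need a name, so the constant block stays short no matter how wide the file is.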

        I'm very close to the solution!
        (But it takes a lot of time...)

        It is about the last three lines of the code below.
        With lines one and two I get the right values, but adding or averaging (/2) results in "0.0".
        sprintf tricks don't seem to work either.
        Summary:
        How can I add or average two indexed values?
        (I'm new to hashes/arrays etc.)

        my @col1;                                 # '1st-line-of-pair' buffer
        foreach (@unpaired) {                     # loop over the input files
            open(DATA, $_) or die "Couldn't open $_ for reading: $!\n";   # open file
            chomp($_ = <DATA>);                   # read line 1 and remove newline, skip headers
            while (<DATA>) {                      # read line
                chomp;
                my @col2 = split /\t/;            # split line on tabs
                if (@col1) {                      # two lines read, i.e. pair available?
                    if (   $col1[$h{'A'}[0]] eq $col2[$h{'A'}[0]]   # match on ID_1
                        && $col1[$h{'O'}[0]] eq $col2[$h{'O'}[0]]   # match on plan ID
                        && $col1[$h{'Q'}[0]] eq $col2[$h{'Q'}[0]]   # match on session date
                        && $col1[$h{'R'}[0]] eq $col2[$h{'R'}[0]]   # match on session time
                    ) {
                        $col1[$h{'B'}[0]] .= "-".$col2[$h{'B'}[0]];   # concatenate B for both lines: <val1>-<val2>
                        $col1[$h{'D'}[0]] = "Paired_Perl";            # this set is modified by Perl
                        $col1[$h{'Y'}[0]] = ($col1[$h{'Y'}[0]]);      # correct value
                        $col2[$h{'Y'}[0]] = ($col2[$h{'Y'}[0]]);      # correct value
                        $col1[$h{'Y'}[0]] = ($col1[$h{'Y'}[0]] + $col2[$h{'Y'}[0]]);   # adding goes wrong "0.0" ....
        Thanks again!
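
        The thread does not show the actual data, so the cause of the "0.0" is not known. One thing that may be worth ruling out is a field that does not look like a number to Perl: a string such as "abc" counts as 0 in arithmetic, and "3,5" counts as 3. With warnings enabled, Perl reports such cases as "Argument ... isn't numeric". The sketch below assumes, purely for illustration, that values may carry surrounding quotes or whitespace and may use a decimal comma; to_number, VAL and the sample rows are hypothetical names and data.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

use constant VAL => 1;   # hypothetical index of the value column

# Hypothetical helper: clean a field and make sure Perl sees a number.
sub to_number {
    my ($field) = @_;
    $field =~ s/^[\s"]+|[\s"]+$//g;   # strip surrounding whitespace and quotes
    $field =~ tr/,/./;                # assumption: a comma is a decimal separator
    die "not numeric: '$field'\n" unless looks_like_number($field);
    return $field + 0;                # force numeric context
}

# Made-up rows standing in for two split lines of a pair.
my @col1 = ('id1', '"3,5"');
my @col2 = ('id1', ' 4.5 ');

my $sum = to_number($col1[VAL]) + to_number($col2[VAL]);
my $avg = sprintf "%.1f", $sum / 2;

print "sum=$sum avg=$avg\n";
```

        If the helper dies on the real data, printing the offending field in quotes usually shows what is in the way. If it does not, the index lookup itself (here $h{'Y'}[0]) would be the next thing to print and check.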