Re^2: Combining matching lines in a TAB seperated textfile

Re^3: Combining matching lines in a TAB seperated textfile by Eliya (Vicar) on Apr 27, 2011 at 16:00 UTC
You still haven't really answered what exactly the problem is, so I'm not going to provide a directly usable solution either :)

It's fine to look for new, elegant, or otherwise clever techniques, but rather than letting that keep you from getting the work done, you could start with the basics you're familiar with and see how far you get. If you run into a roadblock or feel things are getting unwieldy, you can still look for fancier ways around it. And if you'd like to know whether there's a more idiomatic, elegant, or faster way than what you've eventually come up with, nothing keeps you from presenting your work here and asking for comments.

That said, here's my take on it as a starting point. It handles a simplified case (fewer columns), and as I wasn't entirely sure whether you want to skip or keep the 'single' lines, I chose to pass them through:

```perl
#!/usr/bin/perl -w
use strict;

use constant {  # column indices
    FOO => 0,
    BAR => 1,
    LEN => 2,
};

my @col1;  # '1st-line-of-pair' buffer

while (<DATA>) {  # read line
    chomp;
    my @col2 = split /\t/;  # split line on tabs

    if (@col1) {  # two lines read, i.e. pair available?
        if (    $col1[FOO] eq $col2[FOO]
            and $col1[BAR] eq $col2[BAR] ) {  # is pair matching?

            # average length
            $col1[LEN] = sprintf "%.1f", ($col1[LEN] + $col2[LEN]) / 2;

            write_out(@col1);  # write out modified/merged line
            @col1 = ();        # clear buffer
            next;              # skip rest
        }
        else {
            write_out(@col1);  # write out non-paired line
        }
    }
    @col1 = @col2;  # store line (previous=current)
}
write_out(@col1) if @col1;  # take care of last line

sub write_out {
    print join("\t", @_), "\n";
}

# the fields in the data section are separated by TAB characters
__DATA__
abc	def	3.5
abc	def	4.5
ghi	jkl	13.2
mno	pqr	2.8
mno	pqr	2.4
stu	vwx	10.0
```

Output:

```
abc	def	4.0
ghi	jkl	13.2
mno	pqr	2.6
stu	vwx	10.0
```
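Whether unpaired lines are kept or skipped can be made a switch. The following is only a sketch under assumptions: `merge_pairs` and its `$keep_singles` flag are hypothetical names that do not appear in the script above, and the input is passed as an array reference instead of being read from `DATA`. The pairing logic itself is the same two-line buffer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant { FOO => 0, BAR => 1, LEN => 2 };  # column indices

# merge_pairs is a hypothetical helper: it takes a reference to a list of
# tab-separated lines plus a flag, and returns the resulting lines.
sub merge_pairs {
    my ($lines, $keep_singles) = @_;
    my (@out, @prev);                 # @prev is the '1st-line-of-pair' buffer

    for my $line (@$lines) {
        my @cur = split /\t/, $line;
        if (@prev) {
            if ($prev[FOO] eq $cur[FOO] and $prev[BAR] eq $cur[BAR]) {
                # matching pair: average the length and emit one merged line
                $prev[LEN] = sprintf "%.1f", ($prev[LEN] + $cur[LEN]) / 2;
                push @out, join("\t", @prev);
                @prev = ();
                next;
            }
            # no match: the buffered line is a 'single'
            push @out, join("\t", @prev) if $keep_singles;
        }
        @prev = @cur;
    }
    # a trailing buffered line is a 'single' as well
    push @out, join("\t", @prev) if @prev and $keep_singles;
    return @out;
}

my @lines = ("abc\tdef\t3.5", "abc\tdef\t4.5", "ghi\tjkl\t13.2",
             "mno\tpqr\t2.8", "mno\tpqr\t2.4", "stu\tvwx\t10.0");

print "$_\n" for merge_pairs(\@lines, 0);   # merged pairs only
```

Calling `merge_pairs(\@lines, 1)` instead passes the single lines through, which matches the behaviour of the script above.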
Re^4: Combining matching lines in a TAB seperated textfile by Wobbel (Acolyte) on Apr 27, 2011 at 19:27 UTC
Wow! I think I can see through the mist (sorry, my English is very bad). I recognize most of the code, but there is some syntax I have to investigate (sprintf "%.1f"). It's no problem to use 23 column indices and 10,000 lines? I have a "comparable" Perl snippet that reads a text logfile and generates an HTML/CSS page, so I think the hill is not too steep. Thanks for the useful advice! And if it works... What kind of construction would a Perl expert use? I'll never be a pro, but I'm eager to learn a little bit more every day! (Wobbel, buy a Navigator...)
Re^5: Combining matching lines in a TAB seperated textfile by Eliya (Vicar) on Apr 27, 2011 at 19:59 UTC
"It's no problem to use 23 column indices and 10,000 lines?"

No problem at all. As there are no more than two lines of data kept in memory at any point in time, the number of lines is essentially irrelevant. And 23 columns isn't really a lot either... (BTW, note that you only have to give names (constants) to column indices that you actually need to access or modify, like 3, 7, 8, 9 and 10, 11, 12.)
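To illustrate the point about naming only the columns you touch, here is a small sketch. The constant names `ID` and `SCORE`, their positions, the helper `set_score`, and the sample record are all made up for the example; only the idea (a few named indices into a 23-column row) comes from the discussion.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Only the columns that are compared or modified get a name.
# Names and positions are hypothetical (indices are 0-based).
use constant {
    ID    => 3,
    SCORE => 7,
};

# set_score (hypothetical helper): rewrite one named column of a
# tab-separated record and leave all other columns untouched.
sub set_score {
    my ($line, $value) = @_;
    my @col = split /\t/, $line, -1;      # -1 keeps trailing empty fields
    $col[SCORE] = sprintf "%.1f", $value; # one digit after the decimal point
    return join "\t", @col;
}

# Build a 23-column sample record: c0, c1, ... c22
my $line = join "\t", map { "c$_" } 0 .. 22;

my @out = split /\t/, set_score($line, 12.345), -1;
print scalar(@out), " columns, ID=", $out[ID], ", SCORE=", $out[SCORE], "\n";
```

The remaining 21 columns never need a name; they travel through `split` and `join` as they are.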
Re^5: Combining matching lines in a TAB seperated textfile by Wobbel (Acolyte) on May 11, 2011 at 11:10 UTC
I'm very close to the solution! (But it takes a lot of time...)

The problem is in the last three lines of the code below. The first two of them give me the right values, but adding or averaging (/2) results in "0.0". sprintf tricks don't seem to work.

Summary: How can I add or average two indexed values? (I'm new to hashes/arrays etc.)

```perl
my @col1;  # '1st-line-of-pair' buffer

foreach (@unpaired) {  # comment
    open(DATA, $_) or die "Couldn't open $_ for reading: $!\n";  # Open file
    chomp($_ = <DATA>);  # read line 1 and remove newline, skip headers

    while (<DATA>) {  # read line
        chomp;
        my @col2 = split /\t/;  # split line on tabs

        if (@col1) {  # two lines read, i.e. pair available?
            if (   $col1[$h{'A'}[0]] eq $col2[$h{'A'}[0]]  # match on ID_1
                && $col1[$h{'O'}[0]] eq $col2[$h{'O'}[0]]  # match on plan ID
                && $col1[$h{'Q'}[0]] eq $col2[$h{'Q'}[0]]  # match on session date
                && $col1[$h{'R'}[0]] eq $col2[$h{'R'}[0]]  # match on session time
               ) {
                $col1[$h{'B'}[0]] .= "-".$col2[$h{'B'}[0]];  # Concatenate B for both lines: <val1>-<val2>
                $col1[$h{'D'}[0]] = "Paired_Perl";           # This set is modified by Perl
                $col1[$h{'Y'}[0]] = ($col1[$h{'Y'}[0]]);     # correct value
                $col2[$h{'Y'}[0]] = ($col2[$h{'Y'}[0]]);     # correct value
                $col1[$h{'Y'}[0]] = ($col1[$h{'Y'}[0]] + $col2[$h{'Y'}[0]]);  # adding goes wrong "0.0"
....
```

Thanks again!
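For reference, a minimal sketch of adding and averaging two values that are looked up through a hash of index lists. The thread does not show how `%h` is built, so the layout used here (column letter mapped to an array reference whose first element is the zero-based index) is an assumption, as are the helper name `add_and_average` and the sample rows.

```perl
#!/usr/bin/perl
use strict;
use warnings;   # warns "isn't numeric" if a field is not a plain number

# ASSUMED layout: column letter => [ zero-based column index ]
my %h = ( A => [0], B => [1], Y => [2] );

# add_and_average (hypothetical helper): sum and average one column
# taken from two rows, each row given as an array reference.
sub add_and_average {
    my ($row1, $row2, $idx) = @_;
    my $sum = $row1->[$idx] + $row2->[$idx];
    my $avg = sprintf "%.1f", $sum / 2;
    return ($sum, $avg);
}

my @col1 = ("id1", "x", "3.5");   # made-up sample rows
my @col2 = ("id1", "y", "4.5");

my $y = $h{Y}[0];                 # look the index up once
my ($sum, $avg) = add_and_average(\@col1, \@col2, $y);

print "sum=$sum avg=$avg\n";      # prints "sum=8 avg=4.0"
```

In Perl, a string that does not start with a number evaluates to 0 in numeric context, and an undefined element does too. So when a sum comes out as 0, printing the raw fields and the index right before the addition is a reasonable first check; whether that is the cause in the snippet above cannot be told from what is shown.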