
I would suggest sorting file2 numerically by its first column, for example with the Unix sort command (sort -n). There is a Windows version also.

The idea is to have both file1 and file2 in an order such that it is just a stepwise ladder walk. Our mythical ladder walker has one foot on ladder1 and one foot on ladder2 and if a foot moves, it moves upward.

This takes a long time:

foreach line in file1... {
    loop through lines in file2... {
        ...many comparisons...
        decide which one is next
    }
}

A stepwise walker algorithm is a merge of 2 sorted files and is much faster because all that is needed is to decide who goes next, like 2 cars at a stop sign. Each line of each file is visited once, instead of file2 being rescanned for every line of file1.

foreach line in file1... {
    # insert file2 line(s) that should go before the
    # current line from file1
    while {the current line in file2 is "next" in total order} {
        output that line from file2;
        move to next line in file2
    }
    output the file1 line
}
output rest of file2 lines (if any)
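The pseudocode above can be sketched in Python roughly as follows. This is only an illustration, assuming both inputs are already sorted and that each line starts with a whitespace-separated integer. The names `merge_sorted` and `first_column_number` are made up for this example.

```python
def merge_sorted(lines1, lines2, key):
    """Merge two iterables of lines, each already sorted by key(line)."""
    it2 = iter(lines2)
    pending = next(it2, None)  # current line of file2, None once exhausted
    for line1 in lines1:
        # Emit every file2 line that sorts strictly before the file1 line.
        while pending is not None and key(pending) < key(line1):
            yield pending
            pending = next(it2, None)
        yield line1
    # file1 is done: emit the rest of file2 (if any).
    while pending is not None:
        yield pending
        pending = next(it2, None)


def first_column_number(line):
    # Assumption: the sort key is an integer in the first column.
    return int(line.split()[0])


file1 = ["1 a", "3 c", "4 d"]
file2 = ["2 b", "5 e", "6 f"]
merged = list(merge_sorted(file1, file2, first_column_number))
print(merged)
```

Because the function only pulls one line at a time from each input, it also works on open file handles without loading either file into memory. Python's standard library offers `heapq.merge(..., key=...)` for the same job.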

Re^3: Merging larges files by columns
by aaron_baugher (Curate) on Sep 17, 2011 at 11:53 UTC

    That (like most of the answers) assumes that the lines in the second file are numbered consecutively with no numbers missing. I wasn't sure from the question whether that was the case, or if when he said, "The file which I am appending a single column has the line number in the first column," he meant it had the number of the line it needed to be attached to.

    In other words, the second file could be:

    2 red
    4 brown
    3 blue
    6 orange
    7 yellow

    If 'yellow' is supposed to be appended to the 7th line of file1, then sorting file2 numerically and stepping through both files equally won't work, because 'lines' 1 and 5 are missing.

    By using the numbers in file2 as keys in a lookup hash, it doesn't matter if file2 is missing a number somewhere. I guess it really just depends on how sure you are that you can count on the numbers truly being line numbers.
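A minimal sketch of the lookup approach, in Python, where a dict plays the role of the hash. It assumes the number in file2 is the 1-based line number of file1 to attach to, and that a single space is an acceptable separator. The function name `append_by_lookup` and the file1 contents are invented for illustration.

```python
def append_by_lookup(file1_lines, file2_lines):
    """Append file2's value to the file1 line whose number it names."""
    # Build number -> value from file2; order and gaps do not matter.
    lookup = {}
    for line in file2_lines:
        number, value = line.split(None, 1)
        lookup[int(number)] = value.strip()

    for lineno, line in enumerate(file1_lines, start=1):
        line = line.rstrip("\n")
        if lineno in lookup:
            yield f"{line} {lookup[lineno]}"
        else:
            yield line  # no entry for this line number: leave unchanged


file1 = ["a", "b", "c", "d", "e", "f", "g"]  # hypothetical contents
file2 = ["2 red", "4 brown", "3 blue", "6 orange", "7 yellow"]
result = list(append_by_lookup(file1, file2))
print(result)
```

The trade-off is memory: all of file2 has to fit in the dict, while file1 is still read one line at a time.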

      That (like most of the answers) assumes that the lines in the second file are numbered consecutively with no numbers missing.
      Not so. I suggested sorting file2 so that the numbers are always increasing. That way a simple walker program suffices and can account for missing numbers. The simple sort strategy is often very effective, as the system sort command is usually efficient enough. It will certainly be a huge improvement over reading file2 N times!
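One way such a walker could cope with gaps, sketched in Python. It assumes file2 has already been sorted numerically and that its numbers refer to 1-based line numbers in file1. The name `append_by_walk` and the sample data are hypothetical.

```python
def append_by_walk(file1_lines, sorted_file2_lines):
    """Walk file1 and a numerically sorted file2 in step, tolerating gaps."""
    it2 = iter(sorted_file2_lines)

    def next_entry():
        # Return (number, value) for the next file2 line, or None at the end.
        line = next(it2, None)
        if line is None:
            return None
        number, value = line.split(None, 1)
        return int(number), value.strip()

    entry = next_entry()
    for lineno, line in enumerate(file1_lines, start=1):
        line = line.rstrip("\n")
        # Drop file2 entries that fell behind (duplicates, numbers < 1).
        while entry is not None and entry[0] < lineno:
            entry = next_entry()
        if entry is not None and entry[0] == lineno:
            yield f"{line} {entry[1]}"
            entry = next_entry()
        else:
            yield line  # gap in file2: this line gets nothing appended


file1 = ["a", "b", "c", "d", "e", "f", "g"]  # hypothetical contents
file2_sorted = ["2 red", "3 blue", "4 brown", "6 orange", "7 yellow"]
walked = list(append_by_walk(file1, file2_sorted))
print(walked)
```

Only one line from each file is held at a time, so neither file needs to fit in memory once file2 has been sorted.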

      In this case, the original poster seems to have disappeared and we are left speculating about requirements that we can't know unless the OP tells us.