smbs has asked for the wisdom of the Perl Monks concerning the following question:

I have 2 text files. File "A" has about 1000 lines. File "B" has more than 100,000 lines (about 5 MB). The contents of file "A" look something like the following. They come in pairs: every 2 consecutive lines are one pair in my explanation.
abc
123
cfg
12345
fghh
1aaaaa
Missing in file "B" is each pair's first line. I want to insert into file "B" (or create a third file, which might be even better!) missing line above matching second line. Pls note lines may appear many times. I tried and have a very "newbie file" (not for all to see) which works but takes more than an hour to finish because of the brute-force loop I use! Thanx

Re: Insert row from file
by Gilimanjaro (Hermit) on Dec 31, 2004 at 11:16 UTC
    Untested code below, but I think it should do the trick...
    # First build a hash we can use to look up lines (keyed on 2nd line)
    my %pairs;
    open A, "<A" or die "Cannot open A: $!";
    while(<A>) { $pairs{<A>}=$_ }
    close A;

    # We use a third file
    open C, ">C" or die "Cannot open C: $!";
    open B, "<B" or die "Cannot open B: $!";
    while(<B>) {
        # If this line matches a known 2nd line, insert its 1st
        print C $pairs{$_} if exists $pairs{$_};
        # Always print the original line
        print C $_;
    }
    close B;
    close C;
      Also, to add to Gilimanjaro's post: an easy way to overwrite the original is to use rename, which takes the two file names.
      rename('C', 'B') or die "Cannot rename C to B: $!";
      When overwriting old files, though, be sure you are doing the right thing, as getting the old contents back might prove difficult. This depends on where the original file came from, of course.
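      One way to make the overwrite less risky is to keep a copy of the original first. The following is only a sketch: the helper name replace_with_backup and the ".bak" suffix are made up for illustration, and it assumes both files live on the same filesystem so that rename can succeed.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper: replace $orig with $new, keeping the old
# contents of $orig in "$orig.bak" (the suffix is an assumption).
sub replace_with_backup {
    my ($new, $orig) = @_;
    my $backup = "$orig.bak";
    copy($orig, $backup) or die "Cannot back up $orig: $!";
    rename($new, $orig)  or die "Cannot rename $new to $orig: $!";
    return $backup;
}
```

      With the file names from this thread, the call would be replace_with_backup('C', 'B'); if the result looks wrong, B.bak still holds the old file.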
      If you have the patience, can you explain how it works?
      Thanx
      Smbs
      Thanx, it works just great
      Smbs
      At last I more or less understand it, but in the first-line comment you mentioned "(keyed on 2nd line)". How is this done? The file was read in and every 2nd line is the key. Is this some sort of default? What if I wanted every odd line (lines 1, 3, 5, etc.) to be the keys?
      Thanx
      Smbs

        It works because of the way the while loop works: The conditional in the while loop always gets evaluated first, before the content of the loop block. The <A> operator, when used in a while conditional, assigns the line gotten from file A to $_, starting with the 1st line.

        The body of the loop uses the <A> operator again, which gets the next line; you can access a line only once using the <> operator. I use it directly as the key to the %pairs hash, so the key is the 2nd line. It assigns the value of $_ (which was set in the while conditional) to this element of %pairs.

        The loop evaluates the conditional again, gets the 3rd line, and assigns it to $_, and the body does the same all over again with the fourth line. So the even/odd mechanism is caused by the double use of <A>.

        If I wanted to key on the odd lines, I'd've used $pairs{$_}=<A>;
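        To see both keyings side by side, here is a small sketch that reads the sample pairs from an in-memory filehandle instead of file A. The helper name read_pairs and its flag are invented for illustration. Unlike the one-liner above it chomps the lines, and it silently drops an unpaired last line; both are assumptions about what you would want.

```perl
use strict;
use warnings;

# Hypothetical helper: read two lines per iteration and build a hash.
# $key_on_second true  -> 2nd line of each pair is the key
# $key_on_second false -> 1st line of each pair is the key
sub read_pairs {
    my ($fh, $key_on_second) = @_;
    my %pairs;
    while (defined(my $first = <$fh>)) {
        my $second = <$fh>;           # the second read consumes the next line
        last unless defined $second;  # odd line count: drop the unpaired line
        chomp($first, $second);
        if ($key_on_second) { $pairs{$second} = $first }
        else                { $pairs{$first}  = $second }
    }
    return %pairs;
}

# In-memory stand-in for file "A", using the sample data from the question
my $data = "abc\n123\ncfg\n12345\nfghh\n1aaaaa\n";

open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my %by_second = read_pairs($fh, 1);
close($fh);

open($fh, '<', \$data) or die "Cannot open in-memory file: $!";
my %by_first = read_pairs($fh, 0);
close($fh);

print "$_ => $by_second{$_}\n" for sort keys %by_second;
print "$_ => $by_first{$_}\n"  for sort keys %by_first;
```

        Either way the mechanism is the same: two reads per trip around the loop, and which of the two lines ends up as the hash key is entirely up to you.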

Re: Insert row from file
by BrowserUk (Patriarch) on Dec 31, 2004 at 10:54 UTC
    I want to insert into file "B" (...) missing line above matching second line

    Could you clarify that?

    It's not at all clear (to me?) what you want to do here:

    1. Is it the first or second of each pair that must be inserted into the second file?
    2. Where do you want the new values added to the second file?

      At the end? At some specific position?

    3. How do you decide which values need adding?
    4. Pls note lines may appear many times.

      Which lines appear many times in which file?

    As you can see, your description left me very confused. Often in these cases the best explanation is a simple example.


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
      Explanation by example
      File "a"
      abc
      123
      cfg
      12345
      fghh
      1aaaaa


      file "b"
      ssssss
      ffff
      33
      123
      gggggkl
      ffggg
      1aaaaa

      file "c" should read
      ssssss
      ffff
      33
      abc ----- note this inserted because "123" found in files "A","B"
      123
      gggggkl
      ffggg
      fghh ---- note this inserted because 1aaaaa found in files "A","B"
      1aaaaa
      Thanx
      Smbs
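      The example above can be checked with a short in-memory sketch. The helper name insert_missing is made up, and the lines are held in arrays rather than read from files. It assumes that if the same second line occurs in several pairs of file "a", the last pair wins; every occurrence of a matching line in file "b" gets the insert.

```perl
use strict;
use warnings;

# Hypothetical helper: given the (chomped) lines of "a" and "b",
# return the lines that file "c" should contain.
sub insert_missing {
    my ($a_lines, $b_lines) = @_;
    my %first_for;    # 2nd line of a pair => 1st line of that pair
    for (my $i = 0; $i + 1 < @$a_lines; $i += 2) {
        $first_for{ $a_lines->[$i + 1] } = $a_lines->[$i];
    }
    my @out;
    for my $line (@$b_lines) {
        # Insert the pair's 1st line above every matching 2nd line
        push @out, $first_for{$line} if exists $first_for{$line};
        push @out, $line;
    }
    return @out;
}

my @file_a = qw(abc 123 cfg 12345 fghh 1aaaaa);
my @file_b = qw(ssssss ffff 33 123 gggggkl ffggg 1aaaaa);
my @file_c = insert_missing(\@file_a, \@file_b);
print "$_\n" for @file_c;
```

      Because each line of "b" costs one hash lookup, the work grows with the size of the two files added together rather than multiplied, which is what the brute-force loop was paying for.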
Re: Insert row from file
by TedPride (Priest) on Jan 01, 2005 at 00:20 UTC
    This actually turned out almost exactly like Gilimanjaro's solution, but his is more efficient (if it works). I've included mine only because it may be more clear, and because it's tested.
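    use strict;
    use warnings;

    my ($in, $out, $key, $val, %hash);
    my $goodf = 'inp1.txt';   # file with the complete pairs
    my $badf  = 'inp2.txt';   # file missing each pair's first line
    my $outf  = 'out.txt';

    # Map each pair's second line to its first line
    open($in, '<', $goodf) or die "Cannot open $goodf: $!";
    while (defined($val = <$in>) and defined($key = <$in>)) {
        chomp($val, $key);
        $hash{$key} = $val;
    }
    close($in);

    open($in, '<', $badf) or die "Cannot open $badf: $!";
    open($out, '>', $outf) or die "Cannot open $outf: $!";
    while (defined($key = <$in>)) {
        chomp($key);
        # Insert the first line above a matching second line
        print $out "$hash{$key}\n" if exists $hash{$key};
        print $out "$key\n";
    }
    close($in);
    close($out);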