vighneshmufc has asked for the wisdom of the Perl Monks concerning the following question:

I have two files:
File1
COA213345
COA213345
COA213445
DOB213345
EOA213345
File2
COA213345|a|b|c|
COA213345|a|b|c|
LOA213345|a|b|c|
kOB213345|a|b|c|
LOA213345|a|b|c|

Now I have to create a file, File3, which has the same values from both files, so the output has to be a new file with this content:
File3
COA213345|a|b|c|
COA213345|a|b|c|
How do I get this?

Replies are listed 'Best First'.
Re: Filtering Output from two files
by LanX (Saint) on Feb 04, 2018 at 11:12 UTC
      I am actually new to scripting languages; I didn't quite follow what you meant in step 2.

        vighneshmufc:

        He meant that the following lines are inside a loop, like this:

        # read file1 into a %hash
        ... code to do that here ...

        # inside a loop
        while (my $line = <$file2>) {   # read file2 line by line
            ... this was done in the loop condition above ...
            # split the $line to @fields at |
            # if the first $fields[0] exists in the %hash,
            #     print the whole $line to file 3
        }
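        Fleshed out, that outline might look like the following minimal sketch (the filenames file1, file2, and file3 are assumptions taken from the question; adapt as needed):

        use strict;
        use warnings;

        # read file1 into a %hash
        my %hash;
        open(my $fh1, '<', 'file1') or die "file1: $!";
        while (my $line = <$fh1>) {
            chomp $line;
            $hash{$line} = 1;
        }
        close $fh1;

        # read file2 line by line, write matching lines to file3
        open(my $fh2, '<', 'file2') or die "file2: $!";
        open(my $out, '>', 'file3') or die "file3: $!";
        while (my $line = <$fh2>) {
            my @fields = split /\|/, $line;                  # split the $line at |
            print {$out} $line if exists $hash{$fields[0]};  # keep only matching lines
        }
        close $fh2;
        close $out;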

        This is a relatively common question, so LanX gave you the outline of a good solution to the problem.

        A frequent mistake is to try to read *both* files inside the loop, giving one of two bad outcomes:

        • Either the first file is completely read in the first pass of the loop, so the code can only find a single match if it happens to be the first line in the second file, or
        • the code re-opens the first file each time, and therefore can find all the matches, but runs extremely slowly(1) because it reads the first file completely for each line in the second file (a sketch of this anti-pattern follows the footnote below).

        (1) Extremely slowly in the relative sense--for small files you may not notice it. But if your files get large enough, you'll wonder why such a fast computer is so freakin' slow.
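
        For illustration only, here is a hypothetical sketch (not code from this thread) of that second anti-pattern:

        # ANTI-PATTERN: re-reads file1 for every line of file2.
        # With n lines in file1 and m lines in file2, this does n*m line reads.
        open(my $fh2, '<', 'file2') or die $!;
        while (my $row = <$fh2>) {
            my ($id) = split /\|/, $row;               # first field of the file2 line
            open(my $fh1, '<', 'file1') or die $!;     # re-opened on every iteration!
            while (my $line = <$fh1>) {
                chomp $line;
                if ($line eq $id) { print $row; last; }
            }
            close $fh1;
        }
        close $fh2;

        The hash-based outline above turns those n*m reads into n+m reads plus cheap hash lookups.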

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

      use strict;
      use warnings;
      use Data::Dumper;

      my $file1 = 'file1';
      my $file2 = 'file2';

      #reading file1 into a hash
      my %hash = ();
      open(my $fh, '<', $file2) or die $!;
      while (my $line = <$fh>) {
          chomp $line;
          $hash{line} = 1;
          print Dumper %hash;
      }
      close $fh;

      #reading file2 line by line
      open(my $fh2, '<', $file1) or die $!;
      while (my $row = <$fh2>) {
          chomp $row;

          my @fields = split(/\|/, $row);
          print $row if exists $hash{$fields[0]};
      }
      close $fh2;
        You have at least one bug: a forgotten sigil $ in one line, but yes, this was the basic idea.
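
        That is, the offending line should read (the fix, spelled out):

        $hash{$line} = 1;   # not $hash{line}, which stores everything under the literal key "line"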

        And dumping inside the loop is costly.
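
        A small illustration of that point (a suggested rearrangement, not code from the thread) — dump the finished hash once, after the loop:

        while (my $line = <$fh>) {
            chomp $line;
            $hash{$line} = 1;       # no Dumper call per line
        }
        close $fh;
        print Dumper(\%hash);       # one dump of the whole hash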

        Furthermore, you might want to use <code> tags next time. :)

        Minor nitpick: when setting value 1, you don't need exists anymore. :)
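
        That is, since every stored value is 1 (a true value), a plain truth test does the same job:

        print $row if exists $hash{$fields[0]};   # works
        print $row if $hash{$fields[0]};          # equivalent here, because all values are 1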

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Wikisyntax for the Monastery

      #!/nairvigv/bin/perl
      use strict;
      use warnings;
      use Data::Dumper;

      my $file1 = 'BBIDs.txt';
      my $file2 = 'fixedincomeTransparency.out.px.derived.updates';

      #reading file1 into a hash
      my %hash;
      my @fields;
      open(my $fh, '<', $file1) or die $!;
      while (my $line = <$fh>) {
          chomp $line;
          next if $line =~ /^\s*$/;
          $hash{$line} = 1;
      }
      #print Dumper(\%hash);
      close $fh;

      open(my ($fh2), $file2) or die $!;
      while (my $row = <$fh2>) {
          chomp $row;
          print "$row\n";
          next if $row =~ /^\s*$/;
          my (@fields) = split(/\|/, $row);
          print "$fields[0]\n ";
          if (exists $hash{$fields[0]}) {
              print "$row\n";
          }
      }
      close $fh2;
      Hello, this works for a small input, i.e. about 10 lines. However, when I run it on a big file, say about 700k lines, nothing happens. Any idea what could be causing this?
Re: Filtering Output from two files
by Marshall (Canon) on Feb 04, 2018 at 22:07 UTC
    LanX++ gave a good algorithm.

    I am not sure if this is homework or not; if so, you should tell us.
    However, I will give you some actual code.
    I process text files frequently - Perl is great at this.
    Skipping blank lines in the input is a normal "reflex reaction" for me, and I show a common way to do that.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Inline::Files;

    my %File1Hash;
    while (my $line = <FILE1>) {
        next if $line =~ /^\s*$/;   # skip blank lines
        $line =~ s/\s*$//;          # remove all trailing space,
                                    # including the line ending
        $File1Hash{$line}++;
    }
    while (my $line = <FILE2>) {
        next if $line =~ /^\s*$/;          # skip blank lines
        my ($id) = split /\|/, $line;      # get the first field
        print $line if exists $File1Hash{$id};
    }

    =Prints
    COA213345|a|b|c|
    COA213345|a|b|c|
    =cut

    __FILE1__
    COA213345
    COA213345
    COA213445
    DOB213345
    EOA213345
    __FILE2__
    COA213345|a|b|c|
    COA213345|a|b|c|
    LOA213345|a|b|c|
    kOB213345|a|b|c|
    LOA213345|a|b|c|
    Update: I read more of the posts in this thread. If file 1 is 700K lines, this should work just fine on a modern computer. My ancient (now dead) XP laptop would have had some trouble with a hash of that size due to memory constraints; a modern 64-bit computer won't even blink. If there are issues, there are ways to reduce the memory footprint. Let's not go there unless it is necessary.