sandy1028 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have two files, text1.txt and text2.txt. text1.txt is a tab-delimited file with two columns:
2343/45/45/cal/ca-1.xml 2343/45/45/ca-1
6534/534/34/car/ca-5.xml 6534/534/34/ca-5
and text2.txt is a file which contains only a single column:
6534/534/34/ca-5
5676/435/734/da-1
How can I read the two files simultaneously and print each line of text1.txt whose second column is not present in text2.txt? In the above example, the string '6534/534/34/ca-5' matches column 2 of text1.txt, so I have to print the line '2343/45/45/cal/ca-1.xml 2343/45/45/ca-1', because the string '2343/45/45/ca-1' does not appear in text2.txt. Please help me. Thanks in advance.

Replies are listed 'Best First'.
Re: Read two files and print
by GrandFather (Saint) on Feb 26, 2009 at 04:25 UTC

    Trying to process the two files in parallel is a bad idea. Assuming that text2 is not huge, read it first and create a hash entry for each line in it.

    Then read text1 and use the hash to check to see if there is a matching entry in text2:

    use strict;
    use warnings;

    my $text1 = <<'END_TEXT';
    2343/45/45/cal/ca-1.xml 2343/45/45/ca-1
    6534/534/34/car/ca-5.xml 6534/534/34/ca-5
    END_TEXT

    my $text2 = <<'END_TEXT';
    6534/534/34/ca-5
    5676/435/734/da-1
    END_TEXT

    my %text2Entries;

    # Build a hash table of text2 entries
    open my $inFile, '<', \$text2 or die "Unable to read text2: $!\n";
    while (<$inFile>) {
        chomp;
        ++$text2Entries{lc $_};
    }
    close $inFile;

    # Read the text1 entries and print any that don't have a text2 entry
    open $inFile, '<', \$text1 or die "Unable to read text1: $!\n";
    while (my $line = <$inFile>) {
        my ($part1, $part2) = split /\s+/, $line;
        chomp $part2;
        next if exists $text2Entries{lc $part2};
        print $line;
    }
    close $inFile;

    Prints:

    2343/45/45/cal/ca-1.xml 2343/45/45/ca-1

    Don't get hung up on the two $text strings - that's just so I can provide a runnable test script without requiring external files.


    True laziness is hard work
      This is a great idea. I would just add a few comments that might clarify a few things for previous posts.

      1. split /\s+/, $line; splits on any run of whitespace, which includes space, \f, \r, \n and \t. Since \n is in this set, the trailing newline is already gone and chomp($part2); is not necessary here, although it doesn't hurt. In the previous post, split("\t", $line) does split on tabs (split treats a string first argument as a pattern, so it behaves like /\t/), but /\s+/ is usually better. Splitting on \t alone leaves the \n on the end of $part2, and since you can't see non-printing characters, it is quite possible that the file contains plain spaces rather than tabs!

      2. The best way to get the second field from the split is a list slice: my $part2 = (split /\s+/, $line)[1]; Since you don't use $part1, there is no need to assign it. It often happens that you are working with a line that has many fields on it and you want just a couple of them. A list slice lets you give them meaningful names, for example: my ($temperature, $city) = (split /\s+/, $line)[3,8]; This is a lot better than, say, $line[3], because you don't need a comment to explain that field 3 means temperature.

      Of course, the OP probably has some other name in mind for $part2 that would make the code even clearer.
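
The list slice from point 2 can be sketched as below. This is only an illustration: the sub name second_field and the sample line are made up for the example and do not come from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the second whitespace-separated field.
sub second_field {
    my ($line) = @_;
    # The list slice [1] picks field 2 directly, so no throwaway
    # variable is needed for field 1. /\s+/ also swallows the newline.
    my $field = (split /\s+/, $line)[1];
    return $field;
}

my $line = "2343/45/45/cal/ca-1.xml\t2343/45/45/ca-1\n";
print second_field($line), "\n";
```

Because the pattern matches the trailing newline as well, the returned field needs no chomp.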

        The files are very large. I tried something like this:
        open FH, '<file1.txt';
        @data = <FH>;
        open FH1, '<file2.txt';
        @data1 = <FH1>;
        my $text1 = <<END_TEXT;
        @data
        END_TEXT
        my $text2 = <<END_TEXT1;
        @data1;
        END_TEXT1
        @data inside <<END_TEXT prints only one row. How can I print the entire array inside <<END_TEXT?
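
Since the here-documents above only stand in for the real files, one possible approach for large inputs is to drop them and open the files directly. The sketch below assumes the file names from the question and that the keys of text2.txt fit in memory; the sub name print_unmatched and its argument order are invented for illustration. text1.txt is streamed one line at a time, so it is never held in memory as a whole.

```perl
use strict;
use warnings;

# Hypothetical helper: print to $out every line of $data_file whose
# second column is not listed in $keys_file.
sub print_unmatched {
    my ($keys_file, $data_file, $out) = @_;

    # Load the single-column file into a hash; only these keys stay in memory.
    my %seen;
    open my $keys, '<', $keys_file or die "Unable to read $keys_file: $!\n";
    while (my $key = <$keys>) {
        chomp $key;
        $seen{$key} = 1;
    }
    close $keys;

    # Stream the two-column file one line at a time.
    open my $data, '<', $data_file or die "Unable to read $data_file: $!\n";
    while (my $line = <$data>) {
        my $column2 = (split ' ', $line)[1];
        next unless defined $column2;    # skip blank or one-column lines
        print {$out} $line unless $seen{$column2};
    }
    close $data;
}

# Assumed usage: perl script.pl text2.txt text1.txt
print_unmatched(@ARGV, \*STDOUT) if @ARGV == 2;
```

Only the smaller file is turned into a hash, so memory use depends on the size of text2.txt rather than text1.txt.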
Re: Read two files and print
by hbm (Hermit) on Feb 26, 2009 at 03:18 UTC

    How about this:

    use strict;
    use warnings;

    my %required;
    open(IN,"<","text2.txt") or die;
    while(<IN>){ chomp; $required{$_}++ }
    close IN;

    open(IN,"<","text1.txt") or die;
    while(<IN>){
        if (/\S+\s+(\S+)/ && !exists $required{$1}) {
            print "$1 doesnot match in text1.txt. Please help me.\n"
        }
    }
    close IN;
      Hi, how can I extract only the second field from text1.txt?
      open(FILE1, 'text1.txt');
      open(FILE2, 'text2.txt');
      while ($line = <FILE1>) {
          @lines=split("\t",$line);
          print $lines[1];
          if($lines[1] != <FILE2>){
              # print "$line";
          }
      }
      I tried something like this, but split does not work here. How can I check <FILE2> against the second column of text1.txt?
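        Hi,
        Try this:
        # Load File2.txt into a hash first so every key can be checked
        open(FILE2, 'File2.txt') or die "Unable to read File2.txt: $!\n";
        chomp(my @keys = <FILE2>);
        my %seen = map { $_ => 1 } @keys;
        close(FILE2);

        open(FILE1, 'File1.txt') or die "Unable to read File1.txt: $!\n";
        while ($line = <FILE1>) {
            @lines = split(' ', $line);
            # Print the line when its second column is not a key in File2.txt
            print $line unless $seen{$lines[1]};
        }
        close(FILE1);

        Use ' ' (a single space) as the delimiter in split; it splits the line on any run of whitespace and discards the trailing newline.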
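
One reason the != comparison in the attempt above misbehaves is that != compares numbers, so Perl first converts each string to its leading digits. The short sketch below contrasts it with the string operator ne. The sub names are invented for the example, and the second value is made up so that it shares its leading digits with a string from the question.

```perl
use strict;
use warnings;

# Hypothetical helpers contrasting numeric and string comparison.
sub differs_numerically {
    my ($x, $y) = @_;
    # != converts both operands to numbers first; "use warnings" would
    # normally report "isn't numeric" here, so that warning is silenced.
    no warnings 'numeric';
    return $x != $y ? 1 : 0;
}

sub differs_as_strings {
    my ($x, $y) = @_;
    # ne compares the strings character by character.
    return $x ne $y ? 1 : 0;
}

my $from_text1 = '2343/45/45/ca-1';
my $made_up    = '2343/99/99/zz-9';    # not from the thread

print "numeric: ", (differs_numerically($from_text1, $made_up) ? "different" : "same"), "\n";
print "string:  ", (differs_as_strings($from_text1, $made_up)  ? "different" : "same"), "\n";
```

Both strings become 2343 under numeric comparison, so != treats them as equal, while ne reports them as different.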