sureshsmr has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I just started with Perl and seek your advice. I have one large pipe ('|') separated text file with many columns, and another small text file with only one column. I need to extract only those records (lines) from the large file, into a new file, where the value in column 1 matches a value in column 1 of the smaller file. Here is my code. I am sure it is faulty, so I seek your advice :) It is actually returning all lines from File1.txt, but what I need is only those lines where the column 1 data matches. Thanks in advance.

#!c:\perl\bin\perl

# Set file paths
$FILE1 = "D\:\\Data\\WES\\File1.txt";
$FILE2 = "D\:\\Data\\WES\\File2.txt";
$FILE3 = "D\:\\Data\\WES\\File3.txt";

# print "$FILE1\n";
# print "$FILE2\n";
# print "$FILE3\n";

my %F1hash;

open(F1,'<', $FILE1) or die "Can't open $FILE1\n";
open(F2,'<', $FILE2) or die "Can't open $FILE2\n";
open(F3,'>', $FILE3) or die "Can't open $FILE3\n";

print F3 "\n";
# print LOGFILE "$start_date \n";
print F3 "===========================================\n";
print F3 "The following Partnumbers have been synchronized\n";
print F3 "===========================================\n";
print F3 "\n";

my %F1hash = ();

while (<F2>) {
    $ptfkey = $_;
    $ptfpn = $ptfkey;
    $ptfpn =~ s/^\s+|\s+$//g;
    $F1hash{$ptfkey, $ptfpn} = $ptfpn;
}
close F2;

while (<F1>) {
    #chomp;
    $uline = $_;
    @ufields = split(/\|/, $uline);
    # print "$ufields";
    $PartNumber = $ufields[0];
    $Std_Cost = $ufields[1];
    $Last_Paid_Price = $ufields[2];
    $Qty_In_Stock = $ufields[3];
    $Moto_Preferred_Part = $ufields[7];
    $Rev = $ufields[9];
    $Agile_Description = $ufields[10];
    if ($PartNumber =~ $Flhash{$PartNumber}) {
        #print "$ufields[0] \n";
        print F3 "$PartNumber|$Std_Cost|$Last_Paid_Price|$Qty_In_Stock|$Moto_Preferred_Part|$Rev|$Agile_Description\n";
    }
}
close F1;
close F3;
exit;

Replies are listed 'Best First'.
Re: Question on file compare
by choroba (Cardinal) on Mar 13, 2012 at 21:11 UTC
    %F1hash is not %Flhash. strict would have told you. Also, $F1hash{$ptfkey, $ptfpn} = $ptfpn; probably does not do what you think - use Data::Dumper to see what your structures contain.
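To illustrate the point about strict, here is a minimal sketch (not the original script) that compiles a small snippet under "use strict" and prints the resulting complaint. The snippet text, the key ABC123, and the variable names $snippet, $compiled and $strict_error are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical snippet: the hash is declared as %F1hash (digit one)
# but read back as %Flhash (lowercase L), as in the original post.
my $snippet = <<'END_SNIPPET';
use strict;
my %F1hash = (ABC123 => 1);
my $found = $Flhash{ABC123};
1;
END_SNIPPET

# String eval compiles the snippet; a strict violation makes the
# compile fail and leaves the error message in $@.
my $compiled     = eval $snippet;
my $strict_error = $@;

print "compiled ok\n" if $compiled;
print "strict complained: $strict_error" if !$compiled;
```

Without "use strict" the same snippet compiles silently and %Flhash is simply treated as a second, empty hash.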
Re: Question on file compare
by Marshall (Canon) on Mar 13, 2012 at 22:39 UTC
    A few comments:
    1) Get in the habit of using the Unix "hash bang" line (more portable code). Perl on Windows does not use the path at all, so you might as well use the Unix path. BUT the flags on that line are still honored on Windows, e.g. -w turns warnings on, much like "use warnings;".

    2) The code was a bit overly complicated; a possible reformulation is below.

    3) A positive comment: the way that you opened all the files at the beginning is a good idea. No need to do a whole bunch of work only to find out later that, for example, you can't open the output file.

    Code untested:

    #!/usr/bin/perl -w
    use strict;

    my $FILE1 = "D:/Data/WES/File1.txt";   # You can use "/"
    my $FILE2 = "D:/Data/WES/File2.txt";   # avoids this \\ stuff
    my $FILE3 = "D:/Data/WES/File3.txt";

    open(F1,'<', $FILE1) or die "Can't open $FILE1\n";
    open(F2,'<', $FILE2) or die "Can't open $FILE2\n";
    open(F3,'>', $FILE3) or die "Can't open $FILE3\n";

    #F3 header goes here

    my %File2Parts;   # why call it F1parts? these numbers are
                      # coming from file 2. names matter.

    while (my $PartNumber = <F2>)
    {
        chomp $PartNumber;
        next if $PartNumber =~ /^\s*$/;   # skip blank lines
                                          # often a "unseen" trailing blank line
                                          # can cause troubles
        $File2Parts{$PartNumber} = 1;
    }
    close F2;

    while (my $uline = <F1>)
    {
        # chomp the named variable (a bare "chomp;" would act on $_);
        # you need this when splitting on other than white space
        chomp $uline;
        my @ufields = split(/\|/, $uline);
        my ($PartNumber, $Std_Cost, $Last_Paid_Price,
            $Qty_In_Stock, $Moto_Preferred_Part, $Rev,
            $Agile_Description) = (@ufields)[0,1,2,3,7,9,10];

        if ($File2Parts{$PartNumber})
        {
            print F3 "$PartNumber|$Std_Cost|$Last_Paid_Price|$Qty_In_Stock|$Moto_Preferred_Part|$Rev|$Agile_Description\n";
        }
    }

      Thanks a million to all for the helpful suggestions. Dear Marshall, I made all the changes as suggested. I am not getting any errors, but it looks like the "if ($File2Parts{$PartNumber})" is not working as expected. If I put the "print F3 ..." line outside the if condition, it does write to File3. Any ideas? Here is my modified code:

      #!/usr/bin/perl -w
      use strict;

      my $FILE1 = "D:/Data/WES/File1.txt";   # You can use "/"
      my $FILE2 = "D:/Data/WES/File2.txt";   # avoids this \\ stuff
      my $FILE3 = "D:/Data/WES/File3.txt";

      open(F1,'<', $FILE1) or die "Can't open $FILE1\n";
      open(F2,'<', $FILE2) or die "Can't open $FILE2\n";
      open(F3,'>', $FILE3) or die "Can't open $FILE3\n";

      #F3 header goes here

      my %File2Parts;   # why call it F1parts? these numbers are
                        # coming from file 2. names matter.

      while (my $PartNumber = <F2>)
      {
          chomp $PartNumber;
          next if $PartNumber =~ /^\s*$/;   # skip blank lines
                                            # often a "unseen" trailing blank line
                                            # can cause troubles
          $File2Parts{$PartNumber} = 1;
      }
      close F2;

      while (my $uline = <F1>)
      {
          chomp $uline;   #you need this when splitting on other than white space
          my @ufields = split(/\|/, $uline);
          my ($PartNumber, $Std_Cost, $Last_Paid_Price,
              $Qty_In_Stock, $Moto_Preferred_Part, $Rev,
              $Agile_Description) = (@ufields)[0,1,2,3,7,9,10];

          # print F3 "$PartNumber|$Std_Cost|$Last_Paid_Price|$Qty_In_Stock|$Moto_Preferred_Part|$Rev|$Agile_Description\n";
          if ($File2Parts{$PartNumber})
          {
              print F3 "$PartNumber|$Std_Cost|$Last_Paid_Price|$Qty_In_Stock|$Moto_Preferred_Part|$Rev|$Agile_Description\n";
          }
      }
      close F1;
      close F3;
        Well, something is wrong with if ($File2Parts{$PartNumber}). That statement checks the "truthfulness" of the value stored under that hash key.

        Add: use Data::Dumper; at the top of the code. Then, before the F1 loop, print Dumper \%File2Parts; That will show you what is actually in the hash table. Data::Dumper is a "core module", meaning that it is already "pre-installed" with Perl. Look for spaces or other characters that would cause the hash key from file2 to not compare equal to the PartNumber from file1.

        Of course use some abbreviated files for testing otherwise you will get a lot of output that is not helpful!
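A minimal sketch of that Data::Dumper check, assuming the mismatch comes from stray whitespace or a leftover carriage return (that cause is a guess, not something the thread confirms). The sample keys and the helper clean_key are hypothetical; $Data::Dumper::Useqq and $Data::Dumper::Sortkeys are the module's documented settings.

```perl
use strict;
use warnings;
use Data::Dumper;

# Show control characters as escapes (\r, \t, ...) and sort the keys
# so the output is stable from run to run.
$Data::Dumper::Useqq    = 1;
$Data::Dumper::Sortkeys = 1;

# Hypothetical keys standing in for File2.txt lines after a plain chomp:
# one clean, one with a trailing space, one with a leftover carriage return.
my %File2Parts = (
    "PN-100"   => 1,
    "PN-200 "  => 1,
    "PN-300\r" => 1,
);

print Dumper(\%File2Parts);

# Hypothetical helper: strip leading and trailing whitespace (including \r)
# so keys from both files can be normalized the same way before comparing.
sub clean_key {
    my ($key) = @_;
    $key =~ s/^\s+|\s+$//g;
    return $key;
}

my %clean;
$clean{ clean_key($_) } = 1 for keys %File2Parts;

print Dumper(\%clean);
```

If the dump of the real hash shows keys like the second or third one here, applying the same cleanup to $PartNumber from File1 before the lookup would be the next thing to try.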

Re: Question on file compare
by muppetjones (Novice) on Mar 13, 2012 at 21:28 UTC

    First, you can shorten your code just a bit by removing the use of $_.

    while (my $ptfkey = <F2>)

    The regex where you read in file 2 has a couple of issues. First, it forces start and end of line matching (^ and $, respectively). Next, it only matches the pipe and the whitespace on either side of it, so it will only match lines with ' | \n'. Lastly, it deletes everything that it matches.

    If you only need to identify information in the first column, you only need to do this:

    while (<F2>) {
        $ptfkey = $_;                  # you can skip this using above suggestion
        $ptfpn = $ptfkey;              # don't need this if just matching
        $ptfpn =~ m/^(\w+)/;           # if only letters and numbers
        $ptfpn =~ m/^([\d\-\.]+)/;     # if it's a float
        $F1hash{$1} = 1;
    }

    Secondly, the line that checks for the value should be throwing you an error:

    if ($PartNumber =~ $Flhash{$PartNumber}) {

    (Or, there are some languages that treat '=~' as a negation, kind of like '!=', so it could just be a typo.)

    All you need to do is check to see if the value exists in the hash we created above:

    if (exists $F1hash{$PartNumber}) {
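As a small illustration of the exists check (a sketch with made-up part numbers, not data from the thread), the loop below compares "the key is in the hash" with "the value under the key is true". The two only differ when a key is stored with a false value.

```perl
use strict;
use warnings;

# Hypothetical part numbers standing in for the lines of File2.txt.
my %F1hash = map { $_ => 1 } qw(PN-100 PN-200);

# A key stored with a false value is still "in" the hash.
$F1hash{'PN-000'} = 0;

for my $PartNumber (qw(PN-100 PN-000 PN-999)) {
    my $is_present = exists $F1hash{$PartNumber} ? 'yes' : 'no';
    my $is_true    = $F1hash{$PartNumber}        ? 'yes' : 'no';
    print "$PartNumber exists=$is_present true=$is_true\n";
}
```

When every stored value is 1, as in the replies here, either test gives the same answer; exists just states the intent more directly.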

    Hope this helps.

    Edit: I also just realized you declared %F1hash twice -- you should add the following to your code:

    #!c:\perl\bin\perl -w
    # note the addition of the -w on the line above
    use diagnostics;
    use strict;
      The regex where you read in file 2 has a couple of issues. First, it forces start and end of line matching (^ and $, respectively). Next, it only matches the pipe and the whitespace on either side of it, so it will only match lines with ' | \n'. Lastly, it deletes everything that it matches.

      The pipe character, if un-escaped, means "or" in a regex. This regex would delete the white space in a line containing only white space - not too useful. I don't see that /g does anything. Not sure what the OP wanted to happen. But you are right, this is almost assuredly not it.
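For reference, a quick way to see what s/^\s+|\s+$//g does is to run it on a sample string with and without the /g flag. This is only a sketch: the input line and the names $with_g and $without_g are made up. On this input the two results differ, which suggests the /g does matter when whitespace sits at both ends.

```perl
use strict;
use warnings;

# Hypothetical input: leading spaces, a part number, trailing space and newline.
my $line = "  PN-100 \n";

# With /g the substitution is applied at every match, so both the
# leading run (^\s+) and the trailing run (\s+$) are removed.
(my $with_g = $line) =~ s/^\s+|\s+$//g;

# Without /g only the first match (here the leading run) is removed.
(my $without_g = $line) =~ s/^\s+|\s+$//;

printf "with /g:    [%s]\n", $with_g;
printf "without /g: [%s]\n", $without_g;
```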

Re: Question on file compare
by JavaFan (Canon) on Mar 14, 2012 at 07:49 UTC
    $F1hash{$ptfkey, $ptfpn} = $ptfpn;
    This looks very fishy to me. Are you sure you want to use this as the hash key? The key will be a concatenation of the complete line (including the newline), the value of $; (defaults to "\x{1C}"), and then the line again, but with trailing and leading whitespace removed.

    I think you just want $F1hash{$ptfpn} = 1; here.
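A minimal sketch of that key construction, using a made-up input line: it stores one entry the way the original post did and then compares the stored key with join($;, ...). The names $stored_key and $expected_key are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical line as read from File2.txt, newline still attached.
my $ptfkey = "PN-100\n";
(my $ptfpn = $ptfkey) =~ s/^\s+|\s+$//g;

my %F1hash;
$F1hash{$ptfkey, $ptfpn} = $ptfpn;   # the form used in the original post

# The single key that was actually stored is join($;, $ptfkey, $ptfpn).
my ($stored_key) = keys %F1hash;
my $expected_key = join($;, $ptfkey, $ptfpn);

print "same key\n" if $stored_key eq $expected_key;
print "lookup by part number alone fails\n" unless exists $F1hash{$ptfpn};
```

With $F1hash{$ptfpn} = 1 instead, the key is just the trimmed part number, which is what the later lookup on column 1 of File1 needs.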