tux242 has asked for the wisdom of the Perl Monks concerning the following question:

Hi fellow monks, I'm a newbie with a question. What is the best way to change this code so that, if a key from the first file is not found in the second file, it is dropped instead of being carried along? When I print the end result, I want only the keys from the first file that also appear in the second file to be joined and printed to endfile. Is this a kind of null output situation? Thanks.

#!/usr/bin/perl
use strict;
use warnings;

my %file1;
my $key;
my $value;

#usage() unless @ARGV == 0;
my $inf1="file2";
open (IN1, "$inf1");
my $output_file = pop @ARGV;

# read first file
while (<IN1>) {
    chomp;
    ($key,$value) = split /:/, $_, 2
        or warn "Bad data on line $. in file $ARGV, ";
    $file1{$key} = $value
        or warn "Bad data on line $. in file $ARGV, ";
}
continue {
    # reset line numbers for warning messages
    # end loop
    if ( eof ) # note special form of eof
    {
        close IN1;
        last;
    }
}

my $inf2="file3";
open (IN2, "$inf2");
my $outf1=">endfile";
open (OUT1,"$outf1");

# read second file
while (<IN2>) {
    chomp;
    ($key,$value) = split /:/, $_, 2
        or warn "Bad data on line $. in file $ARGV, ";
    if ( exists( $file1{$key} ) ) {
        $file1{$key} .= ''. $value
            or warn "Bad data on line $. in file $ARGV, ";
    }
    else {
        warn "Can\'t find key matching <$key> (line $.) " .
             "in file <$ARGV>, ";
        $file1{$key} = $value
            or warn "Bad data on line $. in file $ARGV, ";
    }
}
continue {
    last if ( eof ) # note special form of eof
}

#open( OUT, ">", $output_file )
#or die "Error opening $output_file for writing, ";
foreach my $k ( sort keys %file1 ) {
    print OUT1 "$k:$file1{$k}\n";
}
close (OUT1);


Re: null output on hashes
by Limbic~Region (Chancellor) on Nov 14, 2003 at 21:18 UTC
    tux242,
    As I explained in the CB, I understand how frustrating it can be just getting started. If you can put in clear and concise terms what you are trying to accomplish, we will be much better able to help you. What I discerned from the CB is this:

    "You are trying to get a list of items that appears in both file1 and file2, concatenate them together, and disregard the rest".

    #!/usr/bin/perl -w
    use strict;

    my %data;
    my @files = qw(file1.dat file2.dat);

    for my $file ( @files ) {
        parse_file( $file, \%data );
    }

    for my $key ( grep $data{$_}->[1] > 1 , keys %data ) {
        print join ':' , $key , $data{$key}->[0];
    }

    sub parse_file {
        my ($file, $data) = @_;
        open (INPUT, $file) or die "Unable to open $file : $!";
        while ( <INPUT> ) {
            my ($key, $value) = split /:/ , $_ , 2;
            $data->{$key}[0] = $value;
            $data->{$key}[1]++;
        }
    }
    Tailor as needed.
    Explanation:

    Cheers - L~R

Re: null output on hashes
by Art_XIV (Hermit) on Nov 14, 2003 at 21:05 UTC

    You could use delete to remove the key/value pairs from your first hash if no match is found in your second file.

    There are probably more idiomatic ways of doing so, but you've got to walk before you can run. ;)
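    Here is a minimal sketch of the delete approach. It assumes both inputs hold key:value lines and joins matched values with ';', which is an assumption; pick whatever separator suits. The subroutine names read_pairs and merge_matching are hypothetical. In-memory filehandles stand in for the real files so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: read "key:value" lines from a filehandle into a hash.
sub read_pairs {
    my ($fh) = @_;
    my %pairs;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $key, $value ) = split /:/, $line, 2;
        next unless defined $value;    # skip lines without a colon
        $pairs{$key} = $value;
    }
    return \%pairs;
}

# Keep only the keys of %$first that also exist in %$second, appending
# the second file's value with ';'. Delete every other key.
sub merge_matching {
    my ( $first, $second ) = @_;
    for my $key ( keys %$first ) {
        if ( exists $second->{$key} ) {
            $first->{$key} .= ";$second->{$key}";
        }
        else {
            delete $first->{$key};
        }
    }
    return $first;
}

# In-memory filehandles stand in for the two real files here.
my $text1 = "a:1\nb:2\nc:3\n";
my $text2 = "a:x\nc:z\nd:w\n";
open( my $fh1, '<', \$text1 ) or die "Can't open in-memory file: $!";
open( my $fh2, '<', \$text2 ) or die "Can't open in-memory file: $!";
my $merged = merge_matching( read_pairs($fh1), read_pairs($fh2) );
print "$_:$merged->{$_}\n" for sort keys %$merged;    # a:1;x then c:3;z
```

    The loop `for my $key (keys %$first)` builds its list of keys before the body runs, so deleting entries inside the loop is safe.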

    Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"
Re: null output on hashes
by Anonymous Monk on Nov 14, 2003 at 19:20 UTC
    If you want to understand every line of the posted code you will need to learn Perl. The perlfaq2 pod lists several books of a tutorial nature that will help you.
Re: null output on hashes
by graff (Chancellor) on Nov 16, 2003 at 05:45 UTC
    L~R's suggestions are a good place to start, though there might be some issues, like he doesn't concatenate values from the two files (as you try to do), and there might be some unexpected results if a given "key" string occurs two or more times in one file but not at all in the other file(s).
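    To illustrate the repeated-key pitfall: L~R's code counts lines, so a key that appears twice in file1 and never in file2 still reaches a count of 2 and passes the `> 1` test. The minimal sketch below instead records which inputs each key was seen in, and it concatenates the values. The names common_keys, file1 and file2 are hypothetical, the ';' separator is an assumption, and in-memory filehandles stand in for the files.

```perl
use strict;
use warnings;

# For each key, remember the first value seen in each input, keyed by the
# input's name. Counting distinct inputs (not raw lines) means a key
# repeated inside one file cannot pass for a key found in both files.
sub common_keys {
    my @inputs = @_;    # list of [ name, filehandle ] pairs
    my %seen_in;        # $seen_in{$key}{$name} = first value in that input
    for my $input (@inputs) {
        my ( $name, $fh ) = @$input;
        while ( my $line = <$fh> ) {
            chomp $line;
            my ( $key, $value ) = split /:/, $line, 2;
            next unless defined $value;    # no colon: skip the line
            $seen_in{$key}{$name} = $value
                unless exists $seen_in{$key}{$name};
        }
    }

    my @names = map { $_->[0] } @inputs;
    my %common;
    for my $key ( keys %seen_in ) {
        # keep the key only if every input contributed a value
        next unless scalar( keys %{ $seen_in{$key} } ) == @names;
        $common{$key} = join ';', map { $seen_in{$key}{$_} } @names;
    }
    return \%common;
}

my $text1 = "a:1\na:2\nb:3\n";    # "a" is repeated, but only in file1
my $text2 = "b:x\nc:y\n";
open( my $fh1, '<', \$text1 ) or die "Can't open in-memory file: $!";
open( my $fh2, '<', \$text2 ) or die "Can't open in-memory file: $!";
my $common = common_keys( [ file1 => $fh1 ], [ file2 => $fh2 ] );
print "$_:$common->{$_}\n" for sort keys %$common;    # prints b:3;x
```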

    As for the code you posted, you're working too hard on things that don't need work, and you've made some wrong assumptions about error checking. In particular:

    • It looks like you were using ARGV at some point, then changed to hard-coded file names later on. Stick with using @ARGV, and get your input file names from the command line. You can also use the redirection operator (>) on the command line to create the output file -- the command-line interpreter (i.e. the shell, whether Windows or *nix) will open the output file for you, and the Perl script just needs to print to STDOUT. (Also, the "> outfile" part of the command line does not get placed into @ARGV.):
      my $Usage = "Usage: $0 file1 file2 > combined.file\n";
      die $Usage unless ( @ARGV == 2 and -f $ARGV[0] and -f $ARGV[1] );
      my ($inf1,$inf2) = @ARGV;
    • You need to check for errors when opening a file. There may be rare cases when you wouldn't want the script to die on an open failure, but even then, it's good to know whether the open succeeded. (In this case, "die" is called for.)
      open( IN1, $inf1 ) or die "Can't open $inf1: $!";
    • Your warnings about "Bad data on line ..." when split() returns false might not do what you want. A list assignment from split is false only when split returns no fields at all, which happens only if its input string is empty or undef. ($x,$y)=split(/:/,'foo',2); returns true and leaves $y undefined. The same applies to the  $x.=$y or warn "..."; assignment, which is false only when the resulting value of $x is an empty string (or "0"). You need to figure out what condition(s) will identify unusable data (or just the usable data), and test for that specifically; e.g.:
      my %file1;
      while (<IN1>) {
          next unless (/.:./);  # maybe warnings aren't needed here
          chomp;
          my ($key,$val) = split( /:/, $_, 2 );
          if ( $key =~ /^\s*$/ ) {
              warn "$inf1:$.:Empty key field\n";
              next;
          }
          if ( exists( $file1{$key} )) {
              warn "$inf1:$.:Duplicate key string\n";
              next;  # might want to say more about that
          }
          # might want to check $val too...

          # get here if everything's okay:
          $file1{$key} = $val;
      }
      close IN1;
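    • Your "continue" blocks seem unnecessary. A while (<IN1>) loop already stops at end of file, so calling last at eof adds nothing. Checking eof in a continue block is mainly useful for resetting line numbers ($.) with close ARGV if eof; (plain eof, not eof()). That applies when you take multiple file names from the command line and read them all in sequence via a single  while(<>) loop, but you aren't doing that (neither is my suggestion above).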
    • When you concatenate $val from the second file onto the value string from the first file, you probably want some sort of separator (space? semicolon? vertical bar?) -- remember that the original colon between key and value was taken away by split:
      open( IN2, $inf2 ) or die "Can't open $inf2: $!";
      while (<IN2>) {
          next unless ( /.:./ );
          chomp;  # strip the newline so it doesn't end up inside the joined value
          my ($key,$val) = split( /:/, $_, 2 );
          # error checking similar to what was done for $inf1
          # ...
          # if $key did not occur in $inf1, and if this means that you
          # don't want to list it in the final output, then don't put
          # it into the hash in the first place
          unless ( exists( $file1{$key} )) {
              warn "$inf2:$.:Key $key not found in $inf1\n";
              next;
          }
          $file1{$key} .= ";$val";
      }
      close IN2;
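    The sketch below assembles those suggestions into one script, under the same key:value assumption and with ';' as the separator. It makes one deliberate change. In the snippets above, keys that occur only in $inf1 stay in %file1 and would still be printed, so this version collects matches in a separate hash. The subroutine name join_files is hypothetical. The command-line part runs only when arguments are given, so the subroutine can also be loaded and tested on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read "key:value" lines from two files and return "key:value1;value2"
# lines, sorted by key, for the keys present in both files.
# Warnings about skipped lines go to STDERR.
sub join_files {
    my ( $inf1, $inf2 ) = @_;
    my %file1;

    open( my $in1, '<', $inf1 ) or die "Can't open $inf1: $!";
    while (<$in1>) {
        next unless /.:./;    # needs something on both sides of a colon
        chomp;
        my ( $key, $val ) = split /:/, $_, 2;
        if ( $key =~ /^\s*$/ ) {
            warn "$inf1:$.:Empty key field\n";
            next;
        }
        if ( exists $file1{$key} ) {
            warn "$inf1:$.:Duplicate key string\n";
            next;
        }
        $file1{$key} = $val;
    }
    close $in1;

    # Matches go into a separate hash, so keys found only in $inf1
    # never reach the output.
    my %joined;
    open( my $in2, '<', $inf2 ) or die "Can't open $inf2: $!";
    while (<$in2>) {
        next unless /.:./;
        chomp;
        my ( $key, $val ) = split /:/, $_, 2;
        unless ( exists $file1{$key} ) {
            warn "$inf2:$.:Key $key not found in $inf1\n";
            next;
        }
        if ( exists $joined{$key} ) {
            warn "$inf2:$.:Duplicate key string\n";
            next;
        }
        $joined{$key} = "$file1{$key};$val";
    }
    close $in2;

    return map { "$_:$joined{$_}\n" } sort keys %joined;
}

# Act as a command-line tool only when file names are given.
if (@ARGV) {
    my $Usage = "Usage: $0 file1 file2 > combined.file\n";
    die $Usage unless ( @ARGV == 2 and -f $ARGV[0] and -f $ARGV[1] );
    print join_files(@ARGV);
}
```

    Run it as `perl join.pl file1 file2 > combined.file` (join.pl is just an example name). The joined lines land in combined.file and the warnings stay on the terminal.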

    Note that when you redirect STDOUT to a file on the command line, the stuff you print to STDERR will still show up on the terminal (and won't go into the file), which is usually just what you want. If you need to save the warning and error messages in a separate file, some shells (e.g. bash and other "Bourne" variants) let you do this on the command line:

    perl_script infile.a infile.b > outfile 2> script.errs
    Or you can just  open(STDERR,">script.errs"); at the start of your perl script.
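    Below is a slightly more explicit version of that one-liner, offered as a sketch. It uses the three-argument form of open and stops (via die) if the file can't be opened. The file name script.errs comes from the example above.

```perl
use strict;
use warnings;

# Redirect STDERR (warn and die messages) to a file for the rest of the run.
# Three-argument open with an explicit '>' mode; give up if it fails.
my $err_file = 'script.errs';
open( STDERR, '>', $err_file )
    or die "Can't redirect STDERR to $err_file: $!";

warn "this warning goes to $err_file, not the terminal\n";
print "normal output still goes to STDOUT\n";
```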