null output on hashes

tux242 has asked for the wisdom of the Perl Monks concerning the following question:

Hi fellow monks, I have a question, and I am a newbie. What is the best way to change this code so that, if a key from the first file is not found in the second file, it is not kept around for the rest of the run? When I print the end result, I want only the keys from the first file that are also in the second file to be joined and printed to the end file. Is this a kind of null output situation? Thanks.

#!/usr/bin/perl

use strict;
use warnings;

my %file1;
my $key;
my $value;


#usage() unless @ARGV == 0;
my $inf1="file2";
open (IN1, "$inf1");



my $output_file = pop @ARGV;

# read first file
while (<IN1>)
{
    chomp;
    ($key,$value) = split /:/, $_, 2
      or warn "Bad data on line $. in file $ARGV, ";
    $file1{$key} = $value 
      or warn "Bad data on line $. in file $ARGV, ";
}
continue
{
    # reset line numbers for warning messages
    # end loop
    if ( eof ) # note special form of eof
    {
        close IN1;
        last;
    }
}
my $inf2="file3";
open (IN2, "$inf2");
my $outf1=">endfile";
open (OUT1,"$outf1");


# read second file
while (<IN2>)
{
    chomp;
    ($key,$value) = split /:/, $_, 2
      or warn "Bad data on line $. in file $ARGV, ";
    if ( exists( $file1{$key} ) )
    {
        $file1{$key} .= ''. $value
          or warn "Bad data on line $. in file $ARGV, ";
    }
    else
    {
        warn "Can\'t find key matching <$key> (line $.) " 
          . "in file <$ARGV>, ";
        $file1{$key} = $value
          or warn "Bad data on line $. in file $ARGV, ";
    }
}        
continue
{
    last if ( eof ) # note special form of eof
}

#open( OUT, ">", $output_file )
  #or die "Error opening $output_file for writing, ";
  
foreach my $k ( sort keys %file1 )
{
    print OUT1 "$k:$file1{$k}\n";
}
close (OUT1);

Edit, BazB: added readmore tag.

Re: null output on hashes by Limbic~Region (Chancellor) on Nov 14, 2003 at 21:18 UTC
tux242, as I explained in the CB, I understand how frustrating it can be just getting started. If you can put in clear and concise terms what you are trying to accomplish, we will be much better able to help you. What I discerned from the CB is this: "You are trying to get a list of items that appear in both file1 and file2, concatenate them together, and disregard the rest".

```perl
#!/usr/bin/perl -w
use strict;

my %data;
my @files = qw(file1.dat file2.dat);

for my $file ( @files ) {
    parse_file( $file, \%data );
}

for my $key ( grep $data{$_}->[1] > 1 , keys %data ) {
    print join ':' , $key , $data{$key}->[0];
}

sub parse_file {
    my ($file, $data) = @_;
    open (INPUT, $file) or die "Unable to open $file : $!";
    while ( <INPUT> ) {
        my ($key, $value) = split /:/ , $_ , 2;
        $data->{$key}[0] = $value;
        $data->{$key}[1]++;
    }
}
```

Tailor as needed.

Cheers - L~R
Re: null output on hashes by Art_XIV (Hermit) on Nov 14, 2003 at 21:05 UTC
You could use `delete` to remove the key/value pairs from your first hash if no match is found in your second file. There are probably more idiomatic ways of doing so, but you've got to walk before you can run. ;)

Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"
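Here is a minimal sketch of the `delete` approach. The in-memory sample data stands in for the two real files, and the ";" separator between joined values is a choice made for illustration; neither comes from the thread. The script records in a `%seen` hash which first-file keys also occur in the second file, then deletes the rest before printing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sample data; in-memory filehandles stand in for the
# real first and second files (an assumption for this sketch).
my $first  = "a:1\nb:2\nc:3\n";
my $second = "b:20\nc:30\nd:40\n";

# Read the first file into key => value.
my %file1;
open( my $in1, '<', \$first ) or die "Can't open first input: $!";
while ( my $line = <$in1> ) {
    chomp $line;
    my ( $key, $value ) = split /:/, $line, 2;
    $file1{$key} = $value;
}
close $in1;

# Append second-file values to known keys and remember which keys matched.
my %seen;
open( my $in2, '<', \$second ) or die "Can't open second input: $!";
while ( my $line = <$in2> ) {
    chomp $line;
    my ( $key, $value ) = split /:/, $line, 2;
    next unless exists $file1{$key};    # key only in the second file: ignore it
    $file1{$key} .= ";$value";
    $seen{$key} = 1;
}
close $in2;

# Remove every first-file key that never matched.
for my $key ( keys %file1 ) {
    delete $file1{$key} unless $seen{$key};
}

print "$_:$file1{$_}\n" for sort keys %file1;
```

`keys %file1` builds the complete key list before the loop body runs, so deleting entries inside that loop is safe.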
Re: null output on hashes by Anonymous Monk on Nov 14, 2003 at 19:20 UTC
If you want to understand every line of the posted code, you will need to learn Perl. The perlfaq2 pod lists several books of a tutorial nature that will help you.
Re: null output on hashes by graff (Chancellor) on Nov 16, 2003 at 05:45 UTC
L~R's suggestions are a good place to start, though there might be some issues: he doesn't concatenate values from the two files (as you try to do), and there might be some unexpected results if a given "key" string occurs two or more times in one file but not at all in the other file(s).

As for the code you posted, you're working too hard on things that don't need work, and you've made some wrong assumptions about error checking. In particular:

It looks like you were using ARGV at some point, then changed to hard-coded file names later on. Stick with using ARGV, and get your input file names from the command line. You can also use the redirection operator (>) on the command line to create the output file. The command-line interpreter (i.e. the shell, whether Windows or *nix) will open the output file for you, and the perl script just needs to print to STDOUT. (Also, the "> outfile" part of the command line does not get placed into @ARGV.)

```perl
my $Usage = "Usage: $0 file1 file2 > combined.file\n";
die $Usage unless ( @ARGV == 2 and -f $ARGV[0] and -f $ARGV[1] );
my ($inf1,$inf2) = @ARGV;
```

You need to check for errors when opening a file. There may be rare cases when you wouldn't want the script to die on an open failure, but even then, it's good to know whether the open succeeded. (In this case, "die" is called for.)

```perl
open( IN1, $inf1 ) or die "Can't open $inf1: $!";
```

Your warnings about "Bad data on line ..." when split() returns false might not do what you want. The list assignment `($key,$value) = split ...` is false only when split returns an empty list, which happens only if its input string is empty or undef. For example, `($x,$y)=split(/:/,'foo',2);` returns true and assigns nothing to $y. The same applies to the `$x.=$y or warn "...";` assignment, which only returns false if both $x and $y are empty strings (or if the joined result is exactly "0").
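To see this behaviour directly, the small script below uses sample strings chosen for illustration. It prints how many elements each list assignment receives and whether each `.=` result is true. `.=` yields the new value of its left operand, so it is false for an empty result and also for a result of exactly "0".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A list assignment used as a boolean yields the number of elements
# on its right-hand side, so "... = split ... or warn" only warns
# when split() returns an empty list.
my ( $x, $y );
my $n_foo   = ( ( $x, $y ) = split /:/, 'foo', 2 );   # one field, no colon
my $y_undef = !defined $y;                            # $y received nothing
my $n_empty = ( ( $x, $y ) = split /:/, '', 2 );      # empty input string

# .= returns the new value of its left operand, so "... .= ... or warn"
# only warns when that value is false: "" or "0".
my $s1 = '';  my $t1 = ( $s1 .= '' )  ? 1 : 0;   # "" . ""   gives ""   (false)
my $s2 = '';  my $t2 = ( $s2 .= '0' ) ? 1 : 0;   # "" . "0"  gives "0"  (false)
my $s3 = 'a'; my $t3 = ( $s3 .= 'b' ) ? 1 : 0;   # "a" . "b" gives "ab" (true)

printf "split 'foo': %d element(s), \$y is %s\n", $n_foo, $y_undef ? 'undef' : 'defined';
printf "split '': %d element(s)\n", $n_empty;
print "'' .= '' -> $t1; '' .= '0' -> $t2; 'a' .= 'b' -> $t3\n";
```

A line such as "foo" with no colon therefore passes the `or warn` test silently. This is why an explicit check on the line's contents is more useful.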
You need to figure out what condition(s) will identify unusable data (or just the usable data), and test for that specifically; e.g.:

```perl
my %file1;
while (<IN1>) {
    next unless (/.:./);   # maybe warnings aren't needed here
    chomp;
    my ($key,$val) = split( /:/, $_, 2 );
    if ( $key =~ /^\s*$/ ) {   # empty or whitespace-only key
        warn "$inf1:$.:Empty key field\n";
        next;
    }
    if ( exists( $file1{$key} )) {
        warn "$inf1:$.:Duplicate key string\n";
        next;   # might want to say more about that
    }
    # might want to check $val too...
    # get here if everything's okay:
    $file1{$key} = $val;
}
close IN1;
```

Your "continue" blocks seem unnecessary. You would reset line numbers at an eof() if you were taking multiple file names from the command line and reading them all in sequence via a single `while(<>)` loop, but you aren't doing that (neither is my suggestion above).

When you concatenate $val from the second file onto the value string from the first file, you probably want some sort of separator (space? semicolon? vertical bar?). Remember that the original colon between key and value was taken away by split:

```perl
open( IN2, $inf2 ) or die "Can't open $inf2: $!";
while (<IN2>) {
    next unless ( /.:./ );
    chomp;   # strip the newline, as for $inf1, before appending $val
    my ($key,$val) = split( /:/, $_, 2 );
    # error checking similar to what was done for $inf1
    # ...
    # if $key did not occur in $inf1, and if this means that you
    # don't want to list it in the final output, then don't put
    # it into the hash in the first place
    unless ( exists( $file1{$key} )) {
        warn "$inf2:$.:Key $key not found in $inf1\n";
        next;
    }
    $file1{$key} .= ";$val";
}
close IN2;
```

Note that when you redirect STDOUT to a file on the command line, the stuff you print to STDERR will still show up on the terminal (and won't go into the file), which is usually just what you want. If you need to save the warning and error messages in a separate file, some shells (e.g. bash and other "Bourne" variants) let you do this on the command line:

```sh
perl_script infile.a infile.b > outfile 2> script.errs
```

Or you can just `open(STDERR,">script.errs");` at the start of your perl script.
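Taken alone, a second loop like the one above skips keys that exist only in the second file. A first-file key with no match is still in %file1 when the hash is printed, though, so it would still reach the output. The minimal sketch below closes that gap by recording matched keys in a second hash and printing only those. The `%matched` hash and the `parse_line` helper are hypothetical names chosen for this sketch; the usage check, open checks and ";" separator follow the advice above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: print "key:value1;value2" only for keys found in BOTH files.
# Usage (script name is hypothetical): perl merge.pl file1 file2 > combined.file
my $Usage = "Usage: $0 file1 file2 > combined.file\n";
die $Usage unless ( @ARGV == 2 and -f $ARGV[0] and -f $ARGV[1] );
my ( $inf1, $inf2 ) = @ARGV;

# First file: remember each key's value; skip duplicate keys.
my %file1;
open( my $in1, '<', $inf1 ) or die "Can't open $inf1: $!";
while ( my $line = <$in1> ) {
    my ( $key, $val ) = parse_line( $line, $inf1 ) or next;
    if ( exists $file1{$key} ) {
        warn "$inf1:$.:Duplicate key string\n";
        next;
    }
    $file1{$key} = $val;
}
close $in1;

# Second file: append to keys already known, and record which keys
# matched so unmatched first-file keys can be left out of the output.
my %matched;
open( my $in2, '<', $inf2 ) or die "Can't open $inf2: $!";
while ( my $line = <$in2> ) {
    my ( $key, $val ) = parse_line( $line, $inf2 ) or next;
    unless ( exists $file1{$key} ) {
        warn "$inf2:$.:Key $key not found in $inf1\n";
        next;
    }
    $file1{$key} .= ";$val";
    $matched{$key} = 1;
}
close $in2;

# Only keys present in both files reach STDOUT.
print "$_:$file1{$_}\n" for sort keys %matched;

# Hypothetical helper: returns (key, value) for a usable "key:value"
# line, or an empty list if the line should be skipped.
sub parse_line {
    my ( $line, $name ) = @_;
    chomp $line;
    return unless $line =~ /.:./;
    my ( $key, $val ) = split /:/, $line, 2;
    if ( $key =~ /^\s*$/ ) {
        warn "$name:$.:Empty key field\n";
        return;
    }
    return ( $key, $val );
}
```

The `... = parse_line(...) or next` line relies on the same rule as split: a list assignment is false only when it receives an empty list. A key repeated in the second file gets each of its values appended. Warnings go to STDERR, so they stay off the redirected output.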