L~R's suggestions are a good place to start, though there might be some issues, like he doesn't concatenate values from the two files (as you try to do), and there might be some unexpected results if a given "key" string occurs two or more times in one file but not at all in the other file(s).
As for the code you posted, you're working too hard on things that don't need work, and you've made some wrong assumptions about error checking. In particular:
- It looks like you were using ARGV at some point, then changed to hard-coded file names later on. Stick with using ARGV, and get your input file names from the command line. You can also use the redirection operator (>) on the command line to create the output file -- the command-line interpreter (i.e. the shell, whether windows or *nix) will open the output file for you, and the perl script just needs to print to STDOUT. (Also, the "> outfile" part of the command line does not get placed into @ARGV.):
my $Usage = "Usage: $0 file1 file2 > combined.file\n";
die $Usage unless ( @ARGV == 2 and -f $ARGV[0] and -f $ARGV[1] );
my ($inf1,$inf2) = @ARGV;
- You need to check for errors when opening a file. There may be rare cases when you wouldn't want the script to die on an open failure, but even then, it's good to know whether the open succeeded. (In this case, "die" is called for.)
open( IN1, $inf1 ) or die "Can't open $inf1: $!";
- Your warnings about "Bad data on line ..." when split() returns false might not do what you want; split only returns false if its input string is empty or undef; ($x,$y)=split(/:/,'foo',2); returns true, and assigns nothing to $y. The same applies to $x.=$y or warn "..."; assignment, which only returns false if both $x and $y are empty strings. You need to figure out what condition(s) will identify unusable data (or just the usable data), and test for that specifically; e.g.:
my %file1;
while (<IN1>) {
next unless (/.:./); # maybe warnings aren't needed here
chomp;
my ($key,$val) = split( /:/, $_, 2 );
if ( $key =~ /^\s*$/ ) {
warn "$inf1:$.:Empty key field\n";
next;
}
if ( exists( $file1{$key} )) {
warn "$inf1:$.:Duplicate key string\n";
next; # might want to say more about that
}
# might want to check $val too...
# get here if everything's okay:
$file1{$key} = $val;
}
close IN1;
- Your "continue" blocks seem unnecessary; you would reset line numbers at an eof() if you were taking multiple file names from the command line and reading them all in sequence via a single while(<>) loop; but you aren't doing that (neither is my suggestion above).
- When you concatenate $val from the second file onto the value string from the first file, you probably want some sort of separater (space? semicolon? vertical bar?) -- remember that the original colon between key and value was taken away by split:
open( IN2, $inf2 ) or die "Can't open $inf2: $!";
while (<IN2>) {
next unless ( /.:./ );
my ($key,$val) = split( /:/, $_, 2 );
# error checking similar to what was done for $inf1
# ...
# if $key did not occur in $inf1, and if this means that you
# don't want to list it in the final output, then don't put
# it into the hash in the first place
unless ( exists( $file1{$key} )) {
warn "$inf2:$.:Key $key not found in $inf1\n";
next;
}
$file1{$key} .= ";$val";
}
close IN2;
Note that when you redirect STDOUT to a file on the command line, the stuff you print to STDERR will still show up on the terminal (and won't go into the file), which is usually just what you want. If you need to save the warning and error messages in a separate file, some shells (e.g. bash and other "Bourne" variants) let you do this on the command line:
perl_script infile.a infile.b > outfile 2> script.errs
Or you can just
open(STDERR,">script.errs"); at the start of your perl script.