L~R's suggestions are a good place to start, though there might be some issues, like he doesn't concatenate values from the two files (as you try to do), and there might be some unexpected results if a given "key" string occurs two or more times in one file but not at all in the other file(s).
As for the code you posted, you're working too hard on things that don't need work, and you've made some wrong assumptions about error checking. In particular:
- It looks like you were using ARGV at some point, then changed to hard-coded file names later on. Stick with using ARGV, and get your input file names from the command line. You can also use the redirection operator (>) on the command line to create the output file -- the command-line interpreter (i.e. the shell, whether windows or *nix) will open the output file for you, and the perl script just needs to print to STDOUT. (Also, the "> outfile" part of the command line does not get placed into @ARGV.):
my $Usage = "Usage: $0 file1 file2 > combined.file\n";
die $Usage unless ( @ARGV == 2 and -f $ARGV[0] and -f $ARGV[1] );
my ($inf1,$inf2) = @ARGV;
- You need to check for errors when opening a file. There may be rare cases when you wouldn't want the script to die on an open failure, but even then, it's good to know whether the open succeeded. (In this case, "die" is called for.)
open( IN1, $inf1 ) or die "Can't open $inf1: $!";
- Your warnings about "Bad data on line ..." when split() returns false might not do what you want; split only returns false if its input string is empty or undef; ($x,$y)=split(/:/,'foo',2); returns true, and assigns nothing to $y. The same applies to $x.=$y or warn "..."; assignment, which only returns false if both $x and $y are empty strings. You need to figure out what condition(s) will identify unusable data (or just the usable data), and test for that specifically; e.g.:
my %file1;
while (<IN1>) {
next unless (/.:./); # maybe warnings aren't needed here
chomp;
my ($key,$val) = split( /:/, $_, 2 );
if ( $key =~ /^\s*$/ ) {
warn "$inf1:$.:Empty key field\n";
next;
}
if ( exists( $file1{$key} )) {
warn "$inf1:$.:Duplicate key string\n";
next; # might want to say more about that
}
# might want to check $val too...
# get here if everything's okay:
$file1{$key} = $val;
}
close IN1;
- Your "continue" blocks seem unnecessary; you would reset line numbers at an eof() if you were taking multiple file names from the command line and reading them all in sequence via a single while(<>) loop; but you aren't doing that (neither is my suggestion above).
- When you concatenate $val from the second file onto the value string from the first file, you probably want some sort of separater (space? semicolon? vertical bar?) -- remember that the original colon between key and value was taken away by split:
open( IN2, $inf2 ) or die "Can't open $inf2: $!";
while (<IN2>) {
next unless ( /.:./ );
my ($key,$val) = split( /:/, $_, 2 );
# error checking similar to what was done for $inf1
# ...
# if $key did not occur in $inf1, and if this means that you
# don't want to list it in the final output, then don't put
# it into the hash in the first place
unless ( exists( $file1{$key} )) {
warn "$inf2:$.:Key $key not found in $inf1\n";
next;
}
$file1{$key} .= ";$val";
}
close IN2;
Note that when you redirect STDOUT to a file on the command line, the stuff you print to STDERR will still show up on the terminal (and won't go into the file), which is usually just what you want. If you need to save the warning and error messages in a separate file, some shells (e.g. bash and other "Bourne" variants) let you do this on the command line:
perl_script infile.a infile.b > outfile 2> script.errs
Or you can just
open(STDERR,">script.errs"); at the start of your perl script.
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.