A few general remarks about your code:
#!/usr/bin/perl use warnings; use strict;
Good!
It doesn't seem to me that you use @ARGV[1,2] anywhere, and rather than relying on $#ARGV it would be clearer to rely on @ARGV's value in scalar context. Incidentally, I'd keep variable naming consistent and not mix camelcase with different conventions. Also, there's a specialized function for early exit from a program, thus all in all:if ($#ARGV != 4) { print "usage: run batch file 'run' not this one\n"; exit; } my $wordfile = $ARGV[0]; my $textfile=$ARGV[3]; my $OutPutFile=$ARGV[4];
die "Usage: run batch file 'run' not this one\n" unless @ARGV==3; my ($wordfile, $textfile, $outfile)=@ARGV;
open (IF1,"$wordfile")|| die "cannot open the file"; open (PF, "$textfile")|| die "cannot open the file"; open (OF,">$OutPutFile")|| die "cannot open the file";
The general rule of a thumb is to use low precedence logical operators for flow control. Also, we generally recommend using lexical filehandles and the three args form of open nowadays, althiough this is sometimes debated. Last, it would be nice to indicate in the error message which file failed to open and why. Oh, and there's no need to quote plain scalars. Thus:
open my $wfh, '<', $wordfile or die "Can't open `$wordfile': $!\n"; open my $tfh, '<', $textfile or die "Can't open `$textfile': $!\n"; open my $ofh, '>', $outfile or die "Can't open `$outfile': $!\n";
my $List1Ref=ReadDataInHash (*IF1);
You only need the to read from the wordfile in that sub, thus I'd pass the filename instead. Also, I see that later on you only dereference $List1Ref, to use it like a regular hash, so why not use a hash from the beginning (and call it sensibly, e.g. %words) instead?
my %words=ReadDataInHash($wordfile);
I'll soon look into ReadDataInHash():
sub ReadDataInHash()
As others wrote, you're indicating an empty prototype, but that
Anyway prototypes are generally discouraged nowadays, except in certain situations. So just don't!
{ my $x = shift; my %list1=(); while (my $line =<$x>) { chomp $line; my @arr=split /\s/,$line; for (my $i=0;$i<=$#arr ;$i++) { if ($i==0) { $list1{$arr[$i]}={}; } else{ ${%{$list1{$arr[0]}}}{$arr[$i]} = 1; } } } return {%list1}; }
IIUC, you read from a space separated file, and you want to create a HoH where the "primary key" is the first word of the line, and the "secondary one" is given the value of each of the remaining ones, and all the values are 1. As a comment to your code, it is at times rather clumsy because of excessive referencing and dereferencing, and you should know that in addition to C-style for loops you have Perl-style ones for iterating over a list. In this case I'd use the latter, for maximum clarity. Taking into account all that has been written above, I'd rewrite the sub like thus:
sub ReadDataInHash { my $file=shift; open my $fh, '<', $file or die "Can't open `$file': $!\n"; my %words; while (<$fh>) { chomp; my ($first, @rest)=split; $words{$first}{$_}=1 for @rest; } %words; }
while (my $line=<PF>) { chomp($line); my @arrAbs=split (/\|/,$line); my $ID=$arrAbs[0]; my $Title=$arrAbs[1]; my $Abs=$arrAbs[2]; @arrAbs=split (/\./,$Abs); print OF"$ID|"; for (my $SentenceNumber=0;$SentenceNumber<=$#arrAbs ;$SentenceNumb +er++) { my $i=$SentenceNumber+1; print OF "<".$i.">"; my $Sentence=$arrAbs[$SentenceNumber]; my @arrAbsSen=split (' ',$Sentence); foreach my $word(@arrAbsSen) { #to match terms in the list, stored in %{$List1Ref}. if (exists(${%{$List1Ref}}{uc($word)})) { print OF "$word "; } else { foreach my $p (sort keys (%{$List1Ref})) { if (exists(${%{${%{$List1Ref}}{$p}}}{uc($word)})) +{ print OF "mainterm:$p:matchedterm:$word "; last; } } } } @arrAbsSen=(); } print OF "\n"; @arrAbs=(); }
Ok, here it seems you want to
I'm not entering into the details of your code, but
I'd rewrite it like thus:
while (my $line=<$tfh>) { chomp $line; my ($id, $title, $abs)=split /\|/, $line; my @sentences=split /\./, $abs; print $ofh "$id|"; for my $idx (0..$#sentences) { print $ofh '<', $idx+1, '>'; for my $word (split ' ', $sentences[$idx]) { if ($words{uc $word}) { print $ofh "$word " } else { for (sort keys %words) { if ($words{$_}{uc $word}) { print $ofh "mainterm:$_:matchedterm:$word "; last; } } } } } print $ofh "\n"; }
Summing up, all in all I'd rewrite your script as follows:
#!/usr/bin/perl use strict; use warnings; die "Usage: run batch file 'run' not this one\n" unless @ARGV==3; my ($wordfile, $textfile, $outfile)=@ARGV; open my $tfh, '<', $textfile or die "Can't open `$textfile': $!\n"; open my $ofh, '>', $outfile or die "Can't open `$outfile': $!\n"; my %words=ReadDataInHash($wordfile); while (my $line=<$tfh>) { chomp $line; my ($id, $title, $abs)=split /\|/, $line; my @sentences=split /\./, $abs; print $ofh "$id|"; for my $idx (0..$#sentences) { print $ofh '<', $idx+1, '>'; for my $word (split ' ', $sentences[$idx]) { if ($words{uc $word}) { print $ofh "$word " } else { for (sort keys %words) { if ($words{$_}{uc $word}) { print $ofh "mainterm:$_:matchedterm:$word "; last; } } } } } print $ofh "\n"; } sub ReadDataInHash { my $file=shift; open my $fh, '<', $file or die "Can't open `$file': $!\n"; my %words; while (<$fh>) { chomp; my ($first, @rest)=split; $words{$first}{$_}=1 for @rest; } %words; } __END__
As a last note, since you only print to the output file ($ofh in my version of your script) you may have well wanted to select it. OTOH, since this is in your words "a pattern matching exercise", you may have wanted to save yourself the hassle of managing input and output files yourself too, and resort to the ARGV magic filehandle and print to STDOUT, just passing the words file as a first argument. This would simplify your exercise to:
#!/usr/bin/perl use strict; use warnings; die "Usage: $0 <words file> [<file(s)>]\n" if @ARGV < 1; { my %words; my $file=shift; open my $fh, '<', $file or die "Can't open `$file': $!\n"; while (<$fh>) { chomp; my ($first, @rest)=split; $words{$first}{$_}=1 for @rest; } sub matchword { my $word=shift; my $uword=uc $word; return $word if $words{$uword}; $words{$_}{$uword} and return "mainterm:$_:matchedterm:$word" for sort keys %words; return; # nothing if nothing is found } } while (<>) { chomp; my ($id, undef, $abs)=split /\|/; my @sentences=split /\./, $abs; print "$id|"; for (0..$#sentences) { print '<', $_+1, '>', join ' ', map matchword($_), split ' ', $sentences[$_]; } print "\n"; } __END__
Here I didn't write a separate sub for reading from the words file into a hash, but I made a separate one for checking words, so that the main cycles can be simple enough to rely on default $_ without any risk of confusion. I also hid a loop into a map, which makes everything clearer IMHO. (Another minor change is that I removed the $title variable since you don't use it at all.) As a very last note, although many respectable Perl hackers would disagree, I think it would be clearer if instead of iterating over 0..$#sentences, one would maintain a counter him/herself:
while (<>) { chomp; my ($id, undef, $abs)=split /\|/; print "$id|"; my $i; print '<', ++$i, '>', join ' ', map matchword($_), split for split /\./, $abs; print "\n"; }
In reply to Re: Error: "called too early to check prototype" and is word search using nested hash optimal?
by blazar
in thread Error: "called too early to check prototype" and is word search using nested hash optimal?
by newbio
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |