A few general remarks about your code:

#!/usr/bin/perl use warnings; use strict;

Good!

if ($#ARGV != 4) { print "usage: run batch file 'run' not this one\n"; exit; } my $wordfile = $ARGV[0]; my $textfile=$ARGV[3]; my $OutPutFile=$ARGV[4];

It doesn't seem to me that you use @ARGV[1,2] anywhere, and rather than relying on $#ARGV it would be clearer to rely on @ARGV's value in scalar context. Incidentally, I'd keep variable naming consistent and not mix camelcase with different conventions. Also, there's a specialized function for early exit from a program, thus all in all:

die "Usage: run batch file 'run' not this one\n" unless @ARGV==3; my ($wordfile, $textfile, $outfile)=@ARGV;
open (IF1,"$wordfile")|| die "cannot open the file"; open (PF, "$textfile")|| die "cannot open the file"; open (OF,">$OutPutFile")|| die "cannot open the file";

The general rule of a thumb is to use low precedence logical operators for flow control. Also, we generally recommend using lexical filehandles and the three args form of open nowadays, althiough this is sometimes debated. Last, it would be nice to indicate in the error message which file failed to open and why. Oh, and there's no need to quote plain scalars. Thus:

open my $wfh, '<', $wordfile or die "Can't open `$wordfile': $!\n"; open my $tfh, '<', $textfile or die "Can't open `$textfile': $!\n"; open my $ofh, '>', $outfile or die "Can't open `$outfile': $!\n";
my $List1Ref=ReadDataInHash (*IF1);

You only need the to read from the wordfile in that sub, thus I'd pass the filename instead. Also, I see that later on you only dereference $List1Ref, to use it like a regular hash, so why not use a hash from the beginning (and call it sensibly, e.g. %words) instead?

my %words=ReadDataInHash($wordfile);

I'll soon look into ReadDataInHash():

sub ReadDataInHash()

As others wrote, you're indicating an empty prototype, but that

Anyway prototypes are generally discouraged nowadays, except in certain situations. So just don't!

{ my $x = shift; my %list1=(); while (my $line =<$x>) { chomp $line; my @arr=split /\s/,$line; for (my $i=0;$i<=$#arr ;$i++) { if ($i==0) { $list1{$arr[$i]}={}; } else{ ${%{$list1{$arr[0]}}}{$arr[$i]} = 1; } } } return {%list1}; }

IIUC, you read from a space separated file, and you want to create a HoH where the "primary key" is the first word of the line, and the "secondary one" is given the value of each of the remaining ones, and all the values are 1. As a comment to your code, it is at times rather clumsy because of excessive referencing and dereferencing, and you should know that in addition to C-style for loops you have Perl-style ones for iterating over a list. In this case I'd use the latter, for maximum clarity. Taking into account all that has been written above, I'd rewrite the sub like thus:

sub ReadDataInHash { my $file=shift; open my $fh, '<', $file or die "Can't open `$file': $!\n"; my %words; while (<$fh>) { chomp; my ($first, @rest)=split; $words{$first}{$_}=1 for @rest; } %words; }
while (my $line=<PF>) { chomp($line); my @arrAbs=split (/\|/,$line); my $ID=$arrAbs[0]; my $Title=$arrAbs[1]; my $Abs=$arrAbs[2]; @arrAbs=split (/\./,$Abs); print OF"$ID|"; for (my $SentenceNumber=0;$SentenceNumber<=$#arrAbs ;$SentenceNumb +er++) { my $i=$SentenceNumber+1; print OF "<".$i.">"; my $Sentence=$arrAbs[$SentenceNumber]; my @arrAbsSen=split (' ',$Sentence); foreach my $word(@arrAbsSen) { #to match terms in the list, stored in %{$List1Ref}. if (exists(${%{$List1Ref}}{uc($word)})) { print OF "$word "; } else { foreach my $p (sort keys (%{$List1Ref})) { if (exists(${%{${%{$List1Ref}}{$p}}}{uc($word)})) +{ print OF "mainterm:$p:matchedterm:$word "; last; } } } } @arrAbsSen=(); } print OF "\n"; @arrAbs=(); }

Ok, here it seems you want to

  1. split on dots to get a list of "sentences",
  2. print a successive number,
  3. split each "sentence" on whitespace to get words,
  4. compare words with those got from your "words file" and print them if found as the main term or print an informative string if they are a synonim of some main term.

I'm not entering into the details of your code, but

  1. removing unnecessary and not clarifying intermediate variables,
  2. simplifying some referencing-and-dereferencing acrobatics,
  3. removing some calls to exists (because the keys either do not exist or have a true value),
  4. using some lighter looping,

I'd rewrite it like thus:

while (my $line=<$tfh>) { chomp $line; my ($id, $title, $abs)=split /\|/, $line; my @sentences=split /\./, $abs; print $ofh "$id|"; for my $idx (0..$#sentences) { print $ofh '<', $idx+1, '>'; for my $word (split ' ', $sentences[$idx]) { if ($words{uc $word}) { print $ofh "$word " } else { for (sort keys %words) { if ($words{$_}{uc $word}) { print $ofh "mainterm:$_:matchedterm:$word "; last; } } } } } print $ofh "\n"; }

Summing up, all in all I'd rewrite your script as follows:

#!/usr/bin/perl use strict; use warnings; die "Usage: run batch file 'run' not this one\n" unless @ARGV==3; my ($wordfile, $textfile, $outfile)=@ARGV; open my $tfh, '<', $textfile or die "Can't open `$textfile': $!\n"; open my $ofh, '>', $outfile or die "Can't open `$outfile': $!\n"; my %words=ReadDataInHash($wordfile); while (my $line=<$tfh>) { chomp $line; my ($id, $title, $abs)=split /\|/, $line; my @sentences=split /\./, $abs; print $ofh "$id|"; for my $idx (0..$#sentences) { print $ofh '<', $idx+1, '>'; for my $word (split ' ', $sentences[$idx]) { if ($words{uc $word}) { print $ofh "$word " } else { for (sort keys %words) { if ($words{$_}{uc $word}) { print $ofh "mainterm:$_:matchedterm:$word "; last; } } } } } print $ofh "\n"; } sub ReadDataInHash { my $file=shift; open my $fh, '<', $file or die "Can't open `$file': $!\n"; my %words; while (<$fh>) { chomp; my ($first, @rest)=split; $words{$first}{$_}=1 for @rest; } %words; } __END__

As a last note, since you only print to the output file ($ofh in my version of your script) you may have well wanted to select it. OTOH, since this is in your words "a pattern matching exercise", you may have wanted to save yourself the hassle of managing input and output files yourself too, and resort to the ARGV magic filehandle and print to STDOUT, just passing the words file as a first argument. This would simplify your exercise to:

#!/usr/bin/perl use strict; use warnings; die "Usage: $0 <words file> [<file(s)>]\n" if @ARGV < 1; { my %words; my $file=shift; open my $fh, '<', $file or die "Can't open `$file': $!\n"; while (<$fh>) { chomp; my ($first, @rest)=split; $words{$first}{$_}=1 for @rest; } sub matchword { my $word=shift; my $uword=uc $word; return $word if $words{$uword}; $words{$_}{$uword} and return "mainterm:$_:matchedterm:$word" for sort keys %words; return; # nothing if nothing is found } } while (<>) { chomp; my ($id, undef, $abs)=split /\|/; my @sentences=split /\./, $abs; print "$id|"; for (0..$#sentences) { print '<', $_+1, '>', join ' ', map matchword($_), split ' ', $sentences[$_]; } print "\n"; } __END__

Here I didn't write a separate sub for reading from the words file into a hash, but I made a separate one for checking words, so that the main cycles can be simple enough to rely on default $_ without any risk of confusion. I also hid a loop into a map, which makes everything clearer IMHO. (Another minor change is that I removed the $title variable since you don't use it at all.) As a very last note, although many respectable Perl hackers would disagree, I think it would be clearer if instead of iterating over 0..$#sentences, one would maintain a counter him/herself:

while (<>) { chomp; my ($id, undef, $abs)=split /\|/; print "$id|"; my $i; print '<', ++$i, '>', join ' ', map matchword($_), split for split /\./, $abs; print "\n"; }

In reply to Re: Error: "called too early to check prototype" and is word search using nested hash optimal? by blazar
in thread Error: "called too early to check prototype" and is word search using nested hash optimal? by newbio

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.