g man has asked for the wisdom of the Perl Monks concerning the following question:

Sorry about the previous post... I am still learning the rules around here.
As stated before, I want to read in a text file, find three items, combine them, and create a code.
I am having trouble figuring out what's going on with the subroutine, and in several other places.
My apologies in advance.
#! /usr/bin/perl5
require "hash.pl"
#! /usr/bin/perl5
$infile="specimen100.txt";
open(IN,"<$infile");
while ($line=<IN> {
    tr/[A-Z]/[a-z]/;    #lowercase all characters
    s/\s+/ /;           #remove extra spaces
    @sample=split(/[0-9][.], $line);
    for each (@sample) {
        $code=&encode($_);
        last if ($code);
    }
    print "code \n";
}

sub encode {
    my(@sample=@_);
    $site=&findsite($sample);
    $specimen=&findtissue($sample);
    $procedure=&findproc($sample);
    my($code)=&assigncode($site,$specimen,$procedure);
    $code;
}

#in the subroutine below, i am trying to match the input with a hash table to find longest
#common procedure, tissue, or site word
#combine all three together to form a new code
sub findproc {
    foreach $key (sort keys %procedure){
        if ($_ =~ /$key/){
            print "$procedure{$key}\n";
        }
    }

sub findtissue {
    foreach $key (sort keys %specimen){
        if ($_ =~ /$key/){
            print "$specimen{$key}\n";
        }
    }

sub findsite {
    # ditto for site
    foreach $key (sort keys %site){
        if ($_ =~ /$key/){
            print "$site{$key}\n";
        }
    }

if ($code) {
    print STDERR "$code $line";
} else {
    print $line;
}
close (IN);

Edit: g0n - fixed code tags and formatting

Replies are listed 'Best First'.
Re: filehandles and such
by turnstep (Parson) on Apr 19, 2000 at 17:37 UTC
    It's tough to figure out what you are trying to do, but here is my stab at a rewrite:
    $infile = shift || die "Need a filename!\n";
    open(INFILE, "$infile") || die "Could not open $infile: $!\n";
    while(<INFILE>) {
      chop;
      tr/A-Z/a-z/;   ## lowercase all characters
      s/\s+/ /g;     ## remove extra spaces
      for $x (split(/\d*\./, $_)) {
        ## Set each to "0" if not found in the hash:
        $site      = $site{$x}      || "0";
        $specimen  = $specimen{$x}  || "0";
        $procedure = $procedure{$x} || "0";
        ## Do not go on if any were not found:
        next unless ($site && $specimen && $procedure);
        ## Grab the results of assigncode:
        $code=&assigncode($site,$specimen,$procedure);
        ## Leave the for loop if a good $code is found:
        last if $code;
      }
    }
    close(INFILE);
    print "$code " if $code;
    print "$line\n";
    Notes:
    • You don't need all the subs. Looks like you are coming from a C background?
    • The whitespace substitution needs a 'g' on the end
    • All the things that btrott noted in his reply
    • The subroutines were missing closing brackets
    • The closing parenthesis on 'my' must be to the left of the equal sign:
      my(@sample)=@_;
    • You don't need to sort the keys if you are just looping through to look for a specific value, unless the matches are more likely to be found at the start of the alphabet
    • Adding a chop (or, more safely, a chomp, which only removes a trailing newline) is probably a good idea; otherwise you get a newline on the final value returned from the split
    • The above code assumes that you have &assigncode and $line defined elsewhere.
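
A few of the notes above can be seen together in a small sketch. The helper name normalize() and the sample string are made up for illustration; chomp is used in place of chop because it only removes a trailing newline.

```perl
use strict;
use warnings;

# normalize() is a hypothetical helper name, not part of the original post.
sub normalize {
    my ($line) = @_;          # the parentheses close before the equal sign
    chomp $line;              # drop the trailing newline, if there is one
    $line =~ tr/A-Z/a-z/;     # lowercase all characters
    $line =~ s/\s+/ /g;       # with /g, every run of whitespace is collapsed
    return $line;             # hand the cleaned-up line back to the caller
}

print normalize("Biopsy   OF  Left   Lung\n"), "\n";   # prints "biopsy of left lung"
```

Without the /g flag only the first run of spaces would be collapsed, which is why the note about the whitespace substitution matters.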
      Clearer Explanation (I hope):
      Given:
      three hash tables: %procedure, %specimen, %site
      a text file containing medical reports (one report per line), each of which may contain multiple procedures or specimens
      Write a Perl program that does the following:
      read in one line at a time, find all procedures and the associated site and specimen
      if found, give it a code (for now just the words)
      if nothing is found on the line, print the line to a different place to keep track of which lines are not being processed
      go to the next line
      there may be more than one procedure or specimen per line
      there is a problem in interpreting some data because these words can appear in different order, usage, and context (this is an aside)
      a simple way to look for multiple entries at this time is matching for 1) or a) or a: or 1: (it can be any number or letter, really)

      weekly reports are generated, and I took an educated guess about the types of words found and how they occur in these reports
      I want to put these three simple concepts together to create codes; these codes are commonly occurring groups of words which represent some concept
      if this program works correctly, it should identify the type of sample and disregard "junk"
      if someone comes in and asks "do you have thus and such", I want to say thus and such (pun intended)

      sorry about the long-winded explanation; I am having trouble even writing a basic program that works
      additional complexity will come in trying to match for terms and looking for effective strategies to identify in what order things could appear
      at present, I would just be happy with something that spits out at least the first occurrence of the sample with the basic info

      thanks again for your assistance; your code seemed a little easier to follow than what I copied from someone else and did not fully understand
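
One possible reading of the steps described above, as a minimal sketch rather than a finished program. The table contents, the sub names find_longest() and codes_for_line(), and the sample lines are all assumptions for illustration; the real tables would come from hash.pl, and the "code" here is just the three words joined together.

```perl
use strict;
use warnings;

# Hypothetical sample tables; the real ones would be loaded from hash.pl.
my %procedure = (biopsy => 'biopsy', resection => 'resection');
my %specimen  = (tissue => 'tissue', fluid     => 'fluid');
my %site      = (lung   => 'lung',   liver     => 'liver');

# Return the value for the longest key that occurs in $text, or undef if none do.
sub find_longest {
    my ($text, $table) = @_;
    for my $key (sort { length($b) <=> length($a) || $a cmp $b } keys %$table) {
        return $table->{$key} if index($text, $key) >= 0;
    }
    return;
}

# Return one code per entry on the line; an entry needs all three parts.
sub codes_for_line {
    my ($line) = @_;
    chomp $line;
    $line = lc $line;
    $line =~ s/\s+/ /g;
    my @codes;
    # Split on markers such as "1)", "a)", "1:" or "a:".
    for my $entry (split /\b(?:\d+|[a-z])[):]\s*/, $line) {
        my $procedure = find_longest($entry, \%procedure);
        my $specimen  = find_longest($entry, \%specimen);
        my $site      = find_longest($entry, \%site);
        next unless defined $procedure && defined $specimen && defined $site;
        push @codes, "$site $specimen $procedure";
    }
    return @codes;
}

# Made-up report lines standing in for the text file.
for my $line ("1) Biopsy of lung tissue 2) liver fluid resection\n",
              "patient seen in clinic\n") {
    my @codes = codes_for_line($line);
    if (@codes) { print "$_\n" for @codes; }
    else        { print STDERR "unmatched: $line"; }   # keep track of skipped lines
}
```

Sending unmatched lines to STDERR is only one way to "print the line to a different place"; a second output file would work just as well.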
Re: filehandles and such
by btrott (Parson) on Apr 19, 2000 at 10:24 UTC
    You have several problems here. One big one is that I really can't tell what you're trying to do. :)

    But as for Perl problems, you've got some things incorrect:

    1. You have "for each (@sample) {". "foreach" is one word, not two.

    2. You have the following code:
      sub encode { my(@sample=@_); $site=&findsite($sample);
      There are some real problems here, the biggest being that @sample is not the same as $sample. So you're passing an undefined argument into each of your subroutines. The other problem is with those subroutines...
    3. You're using the return values of the subroutines, but you're not actually returning anything!
    Your code doesn't even compile, so you can't actually be running this. Could you describe more clearly what it is that you're trying to do? Maybe then we could help you out more.
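
To make the three points above concrete, here is a small sketch of one lookup sub that takes its argument as a scalar and returns a value instead of printing it. The %site contents and the sample strings are invented for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical lookup table standing in for one of the hashes in hash.pl.
my %site = (lung => 'LUNG', liver => 'LIVER');

sub findsite {
    my ($sample) = @_;                 # copy the first argument into a scalar
    foreach my $key (sort keys %site) {
        # Return the value to the caller rather than printing it.
        return $site{$key} if $sample =~ /\Q$key\E/;
    }
    return;                            # nothing matched
}

my @sample = ('biopsy of lung', 'no match here');
foreach my $sample (@sample) {         # "foreach" is one word
    my $site = findsite($sample);      # $sample is one element of @sample
    print defined $site ? "$site\n" : "(none)\n";
}
```

Running under use strict would also have flagged the original mix-up, since $sample was never declared there.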