in reply to How keep the count...

Anonymous Monk,
First you have to define a "word". You also need to specify whether the order of the words is important and what to do if the words appear multiple times in the same line. Do you want the min or the max offset? This is also not the most efficient way, but since it smells like homework to me as well, improving it is left as an exercise for the reader.
my @offsets;
while ( my $line = <INPUT> ) {
    chomp $line;
    my @words = split " ", $line;
    # skip lines that contain neither keyword
    next if ! grep { $_ eq 'lazy' } @words
         && ! grep { $_ eq 'dog'  } @words;
    my $first;
    for ( 0 .. $#words ) {
        my $word = $words[$_];
        if ( $word eq 'dog' || $word eq 'lazy' ) {
            if ( defined $first ) {    # second keyword found
                push @offsets, $_ - $first + 1;
                last;
            }
            $first = $_;               # first keyword found
        }
    }
}
print "The number of matches is : ", scalar @offsets, "\n";
print "The offsets are :\n";
print "$_\n" for @offsets;
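For example, given an INPUT file containing these two lines (sample data of my own, not from the original question):
    the quick brown fox jumps over the lazy dog
    dog days make for a lazy afternoon
the script would print:
    The number of matches is : 2
    The offsets are :
    2
    6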
Cheers - L~R

Re: Re: How keep the count...
by Anonymous Monk on Mar 03, 2004 at 17:00 UTC
    Thank you Monk
    Maybe my question was not clear. What I need to know is how to keep track of those sentences where the offset was x (2, 3, etc.), for example. Should I use hashes, and how? Thanks again!
      Anonymous Monk,
      Your insistence on using a hash strengthens my feeling that this is a homework problem. While I do not mind homework problems nearly as much as other monks, if this is homework you should state:
      • That it is homework
      • What the specific requirements are
      • What you have tried so far
      • What you "think" may work but do not know how to code
      Now to answer your questions, there is no need to use a hash. You could change:
      push @offsets, $_ - $first + 1;
      # to
      push @offsets, [ $line, $_ - $first + 1 ];
      # and
      print "$_\n" for @offsets;
      # to
      print "$_->[0] : $_->[1]\n" for @offsets;
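      If you do want a hash after all, here is a minimal sketch (untested; the %by_offset name is my own, not from the code above) that groups lines by their offset:
      my %by_offset;    # offset => array ref of the lines with that offset
      # inside the word loop, in place of the push above:
      push @{ $by_offset{ $_ - $first + 1 } }, $line;
      # after the input loop, print every line whose offset was exactly 2:
      print "$_\n" for @{ $by_offset{2} || [] };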
      Cheers - L~R
        This is not a "homework", is a project in language generation. There are not specific requirements, we have to find ways to generate natural phrases, idioms, expressions etc. This was one of the ideas: train a large file and find those sentences where the co-occurrence of n words occur in n specific gap. Find the most frequent offset and retrieve the sentence to check in which context they appear. The insistence of hashes is dumb, I just wanted to prove my point... somebody that validate my idea that hash was the only solution..:-(
        Sorry that I did not provide more information before, I only wanted to keep it simple. Thanks
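        For illustration, a minimal sketch of that last step (untested, and assuming the lines have already been grouped into the %by_offset hash sketched earlier):
        # the offset whose array holds the most lines is the most frequent gap
        my ($best) = sort { @{ $by_offset{$b} } <=> @{ $by_offset{$a} } }
                     keys %by_offset;
        print "Most frequent offset: $best\n";
        print "$_\n" for @{ $by_offset{$best} };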