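use strict;
use warnings;

my @words = qw(foo bar);
my @sentences = (
    "let's say we have a function called foo",
    "I'm going to the bar",
);
for my $word (@words) {
    # \b anchors the match at word boundaries, so "foo" does not match "foobar";
    # \Q...\E escapes any regex metacharacters in the word
    my @occurrences = grep { /\b\Q$word\E\b/ } @sentences;
    print "$word occurs in:\n";
    print "\t$_\n" for @occurrences;
}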
--
:wq
Although the previous answers have explained the technique, I think that there are some pitfalls that a full example will show better.
The example below works fine, provided that there are not too many sentences (i.e., that you have enough memory to hold them all at once).
#!/usr/bin/perl -w
use strict;

# Read the list of words, one per line
open my $words_fh, '<', 'words' or die "can't open words: $!";
my (@words, @sentences);
while (<$words_fh>) {
    chomp;
    push @words, $_;
}
close $words_fh;

# Slurp all the sentences into memory
open my $sent_fh, '<', 'sentences' or die "can't open sentences: $!";
push @sentences, $_ while <$sent_fh>;
close $sent_fh;

# For each word, list every sentence that contains it as a whole word
for my $word (@words) {
    my @found = grep /\b\Q$word\E\b/, @sentences;
    if (@found) {
        print $word, ": \n\t", join "\t", @found;
    }
}
__END__
contents of file "words"
-------------------------------
first
second
third
fourth
-------------------------------
contents of file "sentences"
-------------------------------
I am the first
I always wanted to be the first
I never liked to be second
I second your request
I will never appear in the output
Better second than third
-------------------------------
program's output
-------------------------------
first:
	I am the first
	I always wanted to be the first
second:
	I never liked to be second
	I second your request
	Better second than third
third:
	Better second than third
In this example, I have "slurped" all the words and all the sentences into memory. This is necessary because the matching sentences must be shown grouped by word, and each sentence can belong to more than one word.
I suspect that in a real-life situation you could not afford the "slurp" luxury. If that is the case, you need either a database engine or an algorithm that reads the words first, then stores the matching lines as file addresses (byte offsets) in a hash, and finally, for each word, retrieves the matching lines using the stored addresses.
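A minimal sketch of that file-address approach, assuming the words fit in memory but the sentences do not. It records the byte offset of each matching line with tell and later re-reads the line with seek, so only the words and the offsets are kept in memory. The sub names index_sentences and line_at are hypothetical, and taking the two file names from the command line is an assumption (the scripts in this thread hard-code them).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a hash mapping each word to the byte offsets of the lines it matches.
# Only offsets are stored; the sentences themselves are not kept in memory.
sub index_sentences {
    my ($fh, @words) = @_;
    my %offsets;                                  # word => [ offset, ... ]
    my @patterns = map { [ $_, qr/\b\Q$_\E\b/ ] } @words;
    while (1) {
        my $pos  = tell $fh;                      # address of the next line
        my $line = <$fh>;
        last unless defined $line;
        for my $p (@patterns) {
            push @{ $offsets{ $p->[0] } }, $pos if $line =~ $p->[1];
        }
    }
    return \%offsets;
}

# Re-read the single line that starts at a stored byte offset.
sub line_at {
    my ($fh, $pos) = @_;
    seek $fh, $pos, 0 or die "can't seek: $!";
    return scalar <$fh>;
}

# Usage (assumed): perl index.pl words sentences
if (@ARGV == 2) {
    my ($words_file, $sent_file) = @ARGV;
    open my $wfh, '<', $words_file or die "can't open $words_file: $!";
    chomp(my @words = <$wfh>);
    close $wfh;

    open my $sfh, '<', $sent_file or die "can't open $sent_file: $!";
    my $index = index_sentences($sfh, @words);
    for my $word (@words) {
        my $offsets = $index->{$word} or next;    # skip words with no match
        print "$word:\n";
        print "\t", line_at($sfh, $_) for @$offsets;
    }
    close $sfh;
}
```

The price of this design is one extra seek and read per match on the second pass; in exchange, memory use depends on the number of matches rather than on the size of the sentences file.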
Notice that if you want to show the results the other way around (for each sentence, which words it matches), you can read all the words (which presumably fit in memory), do the matching for each sentence as you read it, and print the results immediately.
#!/usr/bin/perl -w
use strict;

# Read the words and compile each one into a regex just once
open my $words_fh, '<', 'words' or die "can't open words: $!";
my @words;
while (<$words_fh>) {
    chomp;
    push @words, [$_, qr/\b\Q$_\E\b/];
}
close $words_fh;

# Process the sentences one at a time; only the words are kept in memory
open my $sent_fh, '<', 'sentences' or die "can't open sentences: $!";
while (<$sent_fh>) {
    my $printed = 0;
    for my $word (@words) {
        if (/$word->[1]/) {
            print $_ unless $printed++;    # print the sentence before its first match
            print "\t", $word->[0];
        }
    }
    print "\n" if $printed;
}
close $sent_fh;
__END__
program's output:
-------------------------------
I am the first
	first
I always wanted to be the first
	first
I never liked to be second
	second
I second your request
	second
Better second than third
	second	third
In the sentence-by-sentence script, as an additional measure, I compiled each word into a regular expression with the qr operator when reading the words. This can make the program noticeably faster, since each word's regex is compiled only once instead of once for every sentence.
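Rather than taking the speed-up on faith, it can be measured with the core Benchmark module. The sketch below is an assumption-laden illustration: the test data is made up by repeating a few sentences from this thread, and the sub names count_interpolated and count_precompiled are hypothetical. It first checks that both variants find the same matches, then lets cmpthese run each for at least one CPU second and print a comparison table; the actual numbers will depend on your Perl and machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: four words and 4000 sentences.
my @words     = qw(first second third fourth);
my @sentences = (
    "I am the first\n",
    "I never liked to be second\n",
    "Better second than third\n",
    "I will never appear in the output\n",
) x 1000;

# Interpolate each word into the pattern; the pattern text changes on
# every inner iteration, so Perl has to recompile it each time.
sub count_interpolated {
    my $n = 0;
    for my $s (@sentences) {
        for my $w (@words) { $n++ if $s =~ /\b$w\b/ }
    }
    return $n;
}

# Compile every pattern once, up front, with qr//.
my @compiled = map { qr/\b$_\b/ } @words;
sub count_precompiled {
    my $n = 0;
    for my $s (@sentences) {
        for my $re (@compiled) { $n++ if $s =~ $re }
    }
    return $n;
}

# Both approaches must agree before their speed is worth comparing.
die "results differ\n" unless count_interpolated() == count_precompiled();

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    interpolated => \&count_interpolated,
    precompiled  => \&count_precompiled,
});
```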
Hope these examples give you the elements to solve your problem.
_ _ _ _
(_|| | |(_|><
_|
You could extend this as necessary to accommodate a longer list of words to be tested.
use strict;
use warnings;

my $word = shift or die "usage: $0 word\n";
my %sent = (
    one => "This is sentence one. I wonder what the word is.",
    two => "This would be sentence two.",
    thr => "Is this sentence three. I think it is. I sure hope it is.",
);
# Keep the keys of the sentences that contain $word as a whole word
my @matches = grep { $sent{$_} =~ /\b\Q$word\E\b/ } keys %sent;
print "The matching sentences are:\n";
for (@matches) {
    print "$_ => $sent{$_}\n";
}
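One way to extend the script above to a longer list of words, sketched under the assumption that every command-line argument is a word to look for (the original takes only one). The helper name matching_keys is hypothetical; it compiles the word once with qr and returns the keys sorted, so the output order is stable instead of following Perl's hash order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sentences keyed by a short label, as in the script above.
my %sent = (
    one => "This is sentence one. I wonder what the word is.",
    two => "This would be sentence two.",
    thr => "Is this sentence three. I think it is. I sure hope it is.",
);

# Return the sorted keys of the sentences that contain $word as a whole word.
sub matching_keys {
    my ($word, $sent) = @_;
    my $re   = qr/\b\Q$word\E\b/;
    my @keys = grep { $sent->{$_} =~ $re } keys %$sent;
    return sort @keys;
}

# Each command-line argument is treated as one word to test.
for my $word (@ARGV) {
    print "Sentences containing '$word':\n";
    print "  $_ => $sent{$_}\n" for matching_keys($word, \%sent);
}
```

For example, `perl match.pl is sentence` would list sentences one and thr under "is" (the match is case-sensitive, and "This" does not count because of the \b anchors), and all three sentences under "sentence".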
Note that grep here also uses a regular expression as its test (in this case, anyway; it does not have to). So it may not have been your regular expressions that weren't working, but rather the logic of your code.
Hope this helps ;0)
Amel
#!/usr/bin/perl -w
use strict;

my @words = qw(foo bar);
my @lines = ("foo before bar\n",
             "the word is foo\n",
             "one goes over the mountain\n",
             "one reaches the bar\n",
             "Here foo is in the middle\n",
             "foo on you\n",
             "this foobar will be rejected\n");

foreach my $line (@lines) {
    my $istrue = 0;
    foreach my $word (@words) {
        # \b requires a word boundary, so "foobar" does not match "foo"
        if ($line =~ /\b\Q$word\E\b/) {
            $istrue = 1;
            last;    # one matching word is enough
        }
    }
    print $line if $istrue;
}
John