Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

Yesterday I saw a question from gzayzay that I took interest in. I had been fumbling around to figure out if it was possible to search the data for files that contained two or more of the words entered.

print "\nEnter your words (separated by spaces): >> ";
chomp(my $line = <STDIN>);
my @words = split /\s+/, $line;

foreach $word(@words) {
    if() {
        #Set a condition to match more than one word, i.e to match combination of multiple words
    }
    else {
        open(FH, $kwords)&& open(MN, ">>$sResult") || die ("Cannot open [$kwords,$sResult], $!\n");
        print RESULT "==============================\n";
        print RESULT "\t ", uc($word),"\n";
        print RESULT "==============================\n";
        {
            local $/=undef;
            print RESULT grep { $_ =~ /$word/i } <SEARCH> =~ m!(<MS_\d+>.*?</MS_\d+>)!gs;
        }
        print MN "\n";
        close MN;
        close FH;
    }

__DATA__
<MS_1>
<loc>c:\data\cat.xml</loc>
<words>dog, cat, fish, bird</words>
</MS_1>
<MS_2>
<loc>c:\data\cow.xml</loc>
<words>dog, cat, fish, bird, cow, goat</words>
</MS_2>
<MS_3>
<loc>c:\data\snake.xml</loc>
<words>dog, cat, fish, bird, snake, orange</words>
</MS_3>

Given the code above, suppose there is another entry that contains a combination of the user's input. For example, if the user enters cat and eagle, and an entry MS_4 has:

<MS_4>
<loc>c:\data\fat.xml</loc>
<words>snail, cat, eagle, fly, chicken, elephant</words>
</MS_4>

Can any Monk help with code that will tell the user when a file contains both (or all) of the input words, and then continue to find the other files that contain only one of them? I thought this was a good try, but it is giving me a great day.

Happy coding to all Monks attempting....

2006-04-07 Retitled by g0n, as per Monastery guidelines
Original title: 'Addition to gzayzay question'

Replies are listed 'Best First'.
Re: Finding some/all of given words in a file
by eff_i_g (Curate) on Apr 06, 2006 at 20:58 UTC
    Note: This isn't Perl, but still related.

    The other day I was trying to figure out how to do this with find on Unix, and tye informed me of the method below:
    find . -name 'filename' -exec sh -c 'grep string1 $0 | grep string2' {} \; -print
    You could also use egrep, or any other command that would do the trick.

    Update: This requires that the two strings be on the same line.
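
    For comparison, here is a minimal Perl sketch that drops the same-line restriction by reading each file whole and testing every string against the full text. The helper name file_has_all is made up for illustration, and 'string1' and 'string2' are the same placeholders as in the find command above; treat it as one possible approach, not a tested replacement.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true if every string in
# @needles occurs somewhere in the file at $path, on any line.
sub file_has_all {
    my ($path, @needles) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    my $text = do { local $/; <$fh> };    # slurp the whole file at once
    close $fh;
    $text = '' unless defined $text;
    for my $needle (@needles) {
        return 0 if index($text, $needle) < 0;    # literal substring test
    }
    return 1;
}

# Print the name of each file given on the command line that contains
# both placeholder strings.
print "$_\n" for grep { file_has_all($_, 'string1', 'string2') } @ARGV;
```

    Because index() does a literal substring test, no regex metacharacters in the search strings need escaping. Slurping is fine for small files, but very large files would call for a line-by-line scan instead.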
Re: Finding some/all of given words in a file
by InfiniteSilence (Curate) on Apr 06, 2006 at 20:33 UTC
    I am guessing that you are looking for code that will print entries that contain a certain, minimum, number of matches. If so:
    use strict;
    use Data::Dumper;
    use XML::Simple;

    my @searchwords = qw|Bird SnaKe|;
    $|++;
    print "MINIMUM NUMBER OF MATCHES PLEASE: ";
    my $minMatches;
    $minMatches = <STDIN>;
    chomp($minMatches);

    # slurp the whole data file
    undef $/;
    open(H, qq|oo.dat|) or die $!;
    my $filestuff = <H>;
    close(H);
    #print $filestuff;
    my $newXML = XMLin($filestuff);
    my $ref = $newXML;

    #stripe your values
    foreach my $word (@searchwords) {
        foreach(keys %$ref){
            # index() returns -1 on no match; 0 means a match at the very start
            if (index($ref->{$_}->{words},lc($word)) >= 0){
                $ref->{$_}->{confidence}++;
                push @{$ref->{$_}->{WORDSFOUND}}, $word;
            }
        }
    }

    #print the ones you like
    foreach(keys %$ref){
        # "|| []" keeps strict happy for entries that matched nothing
        my $outline = qq|FOUND @{$ref->{$_}->{WORDSFOUND} \|\| []} in $_\t| . $ref->{$_}->{words} . qq|\n|;
        print $outline if ($ref->{$_}->{confidence} >= $minMatches);
    }
    Prints:
    perl useoo.pl
    MINIMUM NUMBER OF MATCHES PLEASE: 1
    FOUND Bird in MS_1    dog, cat, fish, bird
    FOUND Bird in MS_2    dog, cat, fish, bird, cow, goat
    FOUND Bird SnaKe in MS_3    dog, cat, fish, bird, snake, orange

    C:\Temp>perl useoo.pl
    MINIMUM NUMBER OF MATCHES PLEASE: 2
    FOUND Bird SnaKe in MS_3    dog, cat, fish, bird, snake, orange
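
    The original question asks to separate entries that hold all of the entered words from those that hold only some. A rough sketch of that split, without XML::Simple, might look like the following. It assumes each record sits between matching <MS_n> tags with a single <words> element, as in the sample data; the helper name find_words and the ALL/SOME labels are invented for this example.

```perl
use strict;
use warnings;

# Sample records copied from the question, including the poster's MS_4.
my $data = <<'END';
<MS_1> <loc>c:\data\cat.xml</loc> <words>dog, cat, fish, bird</words> </MS_1>
<MS_2> <loc>c:\data\cow.xml</loc> <words>dog, cat, fish, bird, cow, goat</words> </MS_2>
<MS_3> <loc>c:\data\snake.xml</loc> <words>dog, cat, fish, bird, snake, orange</words> </MS_3>
<MS_4> <loc>c:\data\fat.xml</loc> <words>snail, cat, eagle, fly, chicken, elephant</words> </MS_4>
END

# Hypothetical helper: returns a hash ref mapping each record name to an
# array ref of the search words found in its <words> list.
sub find_words {
    my ($text, @search) = @_;
    my %found;
    while ($text =~ m{<(MS_\d+)>.*?<words>(.*?)</words>.*?</\1>}gs) {
        my ($name, $list) = ($1, $2);
        $list =~ s/^\s+|\s+$//g;                     # trim the word list
        my %have = map { lc($_) => 1 } split /\s*,\s*/, $list;
        $found{$name} = [ grep { $have{ lc $_ } } @search ];
    }
    return \%found;
}

my @search = qw(cat eagle);
my $found  = find_words($data, @search);

# Report records holding every search word first, then partial matches.
for my $name (sort { @{ $found->{$b} } <=> @{ $found->{$a} } or $a cmp $b }
              keys %$found) {
    my @hits = @{ $found->{$name} };
    next unless @hits;                               # no search word present
    my $label = @hits == @search ? 'ALL ' : 'SOME';
    print "$label $name: @hits\n";
}
```

    Comparing whole comma-separated words, case-insensitively, means "cat" would not match a longer word that merely contains it, which is a different choice from the substring test with index() used above.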

    Celebrate Intellectual Diversity
