Searching through a word file using an array element

coldwish has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl experts! I have a problem and I can't figure out what to do. I have an array with a few words. I need to search a few Word files for every word in the array, and if a word matches I will write a txt log. All good so far, but I don't know why, when I have multiple words in the array, my code only finds the first element of the array. :( Here is my code:

use File::Find::Rule;
use Win32;

@cuvintecareconteaza=('BA','test','folk');

my ($sec,$min,$hour,$day,$month,$year) = localtime(time);
my $ymd = sprintf("%04d-%02d-%02d--%02d-%02d",$year+1900,$month+1,$day,$hour,$min);
my $outfile = "$ymd.txt";
my $path = "c:/TEST";
my $base_dir ='c:/TEST/';
my $find_rule = File::Find::Rule->new;
my $find_rule1 = File::Find::Rule->new;

# Do not descend past the first level
$find_rule->maxdepth(1);

# Only return directories
$find_rule->directory;
$find_rule1->file;

  # size of the base directory
find({follow => 0, wanted => sub {
        $size_total += -s $File::Find::name || 0;
    }}, $base_dir);
          $size_total = sprintf("%.02f",$size_total / 1024 / 1024);
  ###########################

my @files = $find_rule1->in($base_dir);
 open(FILE, ">>$path/$outfile") || die("Couldn't open file");
                 printf FILE "DIMENSIUNE '$base_dir' = $size_total MB\n";
                 printf FILE "\n";
                 printf FILE "Fisiere care indeplinesc normele cautate:\n";
                 printf FILE "\n";
                 printf FILE "\n";

foreach my $fisier (@files)
{
    if ( ($fisier ne ".") and ($fisier ne "..") )
        {
                  use File::Find;
                  my $size=0;
                  find( sub {-f and ($size += -s)},$fisier);
                  $size = sprintf("%.02f",$size / 1024 / 1024);
                  use File::Basename;
              my $filename = basename( $fisier );
                  open(FILE_IN, $fisier) || die("Couldn't open file");
                  @read=<FILE_IN>;

            foreach $cuvintecareconteaza(@cuvintecareconteaza) {
                    chomp @read;
                    Win32::MsgBox("'$cuvintecareconteaza' ---- '$fisier'" ,48,'Alerta');
                   if(grep(/\b$cuvintecareconteaza\b/i,@read))
                         {
                                printf FILE "'$fisier' ---- '$cuvintecareconteaza'\n";

                         }
                         else
                           {
                      printf FILE "Sorry, this word was not found in '$fisier'\n";
                          }

                 }
                  close(FILE_IN);
        }
}
close(FILE);

Please help me....


Replies are listed 'Best First'.
Re: Searching through a word file using an array element by toolic (Bishop) on Nov 10, 2011 at 23:53 UTC
Are you sure your input files have all the strings you are searching for? Tips: use warnings and fix the warning messages. See also the Basic debugging checklist and http://sscce.org
Re: Searching through a word file using an array element by johnny_carlos (Scribe) on Nov 11, 2011 at 00:35 UTC
I agree with the previous poster; your code looks more complex than it needs to be. Anyway, this line looks suspicious to me:

if(grep(/\b$cuvintecareconteaza\b/i,@read))

Not sure why you have a match operator inside a grep. If I were to approach this problem, I think I would do something more like this (untested):

my ($line, $word, @array);
while ( $line = <FILE> ) {
    foreach $word (@array) {
        # no /g here: in scalar context it would resume from the previous match position
        do_something() if $line =~ /\b$word\b/;
    }
}
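A minimal sketch of the line-by-line approach suggested in the reply above; it is not the poster's own code. The helper name `find_words`, the sample text, and the in-memory filehandle are assumptions made for illustration, and `\Q...\E` is added so that words containing regex metacharacters are matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns one [word, line_number]
# pair for every line read from the filehandle that contains the word.
sub find_words {
    my ($fh, @words) = @_;
    my @hits;
    my $line_no = 0;
    while (my $line = <$fh>) {
        $line_no++;
        for my $word (@words) {
            # Whole-word, case-insensitive match; no /g, so every test
            # starts at the beginning of the line.
            push @hits, [$word, $line_no] if $line =~ /\b\Q$word\E\b/i;
        }
    }
    return @hits;
}

# Demo on an in-memory "file" (sample data, assumed for illustration).
my $text = "BA is here\nnothing\nsome folk test\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
printf "'%s' found at line %d\n", @$_ for find_words($fh, 'BA', 'test', 'folk');
close $fh;
```

With the sample text this reports 'BA' at line 1, then 'test' and 'folk' at line 3. Because the inner loop runs over the whole word list for each line, every word gets checked, not just the first one.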
Re: Searching through a word file using an array element by ansh batra (Friar) on Nov 11, 2011 at 06:08 UTC
your code is a little hard to understand!!! please let me know if this solves your problem

#! /usr/bin/perl -w
use strict;

my @arr=('perl','monks','website');
our @files=('file1.txt','file2.txt','file3.txt');
open(LOG,">","output.txt");
my $file;
foreach $file(@files)
{
    open(FILE,"<",$file);
    my @lines=<FILE>;
    close(FILE);
    my $count=0;
    my $line;
    foreach $line(@lines)
    {
        $count++;
        if($line =~ /.*$arr[0]/)
        {
            print LOG "$arr[0] found in $file at line number $count\n";
        }
        if($line =~ /.*$arr[1]/)
        {
            print LOG "$arr[1] found in $file at line number $count\n";
        }
        if($line =~ /.*$arr[2]/)
        {
            print LOG "$arr[2] found in $file at line number $count\n";
        }
    }
}
close(LOG);

file1.txt: `ansh batra perl postgres apache monks`
file2.txt: `ansh batra is in perl monks apache postgres`
file3.txt: `website monks perl website ansh ansh website`

output.txt:

perl found in file1.txt at line number 3
monks found in file1.txt at line number 6
perl found in file2.txt at line number 1
monks found in file2.txt at line number 1
website found in file3.txt at line number 1
perl found in file3.txt at line number 2
monks found in file3.txt at line number 2
website found in file3.txt at line number 2
website found in file3.txt at line number 5

thanks!!! ansh batra
Re: Searching through a word file using an array element by coldwish (Initiate) on Nov 12, 2011 at 14:10 UTC
Hey... I think it fails because I am trying to search in .doc MS Word files. Please help with an example of how to do it, because I cannot find any tutorial on how to search in an MS Word file using Perl. :(
Re^2: Searching through a word file using an array element by afoken (Chancellor) on Nov 12, 2011 at 18:02 UTC
"i think it fails because I try searching in .doc MS Word files..."

Right. The "old" MS-Word files (.doc) are binary garbage, with fragments of the text you have entered. The "new" MS-Word files (.docx) are renamed ZIP files containing several XML files and some helper files. The embedded XML files should be searchable.

Basic concept:

1. Open the .docx file as a ZIP file (e.g. using Archive::Zip).
2. Extract all .xml files (or, if you know the format better than I currently do [HINT: Use a search engine to find the spec!], just the one that actually contains the text).
3. Extract the text from the XML file(s) (e.g. using XML::LibXML, apply the `textContent()` method to the root node).
4. Search the extracted text like you would do with a plain text document.

The old format should be readable, too. There are several libraries to make sense of the binary garbage; one of them is wv, formerly known as mswordview. Try to find Perl bindings for it, or use one of the available command line tools to convert the binary garbage to a readable format (HTML via wvHtml, RTF via wvRtf, plain text via wvText). Then proceed as before. For a grep-like search, plain text is probably the most useful format.

Alexander -- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
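The .docx steps above can be sketched roughly as follows. To keep the example runnable with core modules only, it swaps in IO::Uncompress::Unzip for Archive::Zip and a crude regex tag strip for XML::LibXML's `textContent()`; a real XML parser is the more robust choice. The helper name `docx_text` and the assumption that the body text lives in the archive member `word/document.xml` are mine, not from the reply.

```perl
use strict;
use warnings;
use Encode qw(decode);
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Hypothetical helper (not from the thread): returns the text of a .docx
# file, one line per paragraph.
sub docx_text {
    my ($path) = @_;

    # A .docx is a ZIP archive; this sketch assumes the body text is in
    # the member word/document.xml.
    my $xml;
    unzip($path => \$xml, Name => 'word/document.xml')
        or die "Cannot read $path: $UnzipError";
    $xml = decode('UTF-8', $xml);

    $xml =~ s{</w:p>}{\n}g;    # each paragraph end becomes a line break
    $xml =~ s{<[^>]+>}{}g;     # crude tag removal, stands in for textContent()

    # Decode the five predefined XML entities in a single pass
    # (numeric character references are left alone in this sketch).
    my %entity = (lt => '<', gt => '>', quot => '"', apos => "'", amp => '&');
    $xml =~ s/&(lt|gt|quot|apos|amp);/$entity{$1}/g;
    return $xml;
}

# Usage: perl search_docx.pl file.docx word1 word2 ...
my ($file, @words) = @ARGV;
if (defined $file) {
    my @lines = split /\n/, docx_text($file);
    for my $word (@words) {
        my $found = grep { /\b\Q$word\E\b/i } @lines;
        my $msg = $found ? "'$file' ---- '$word'\n"
                         : "'$word' was not found in '$file'\n";
        print $msg;
    }
}
```

Splitting on paragraph ends before stripping tags keeps the last word of one paragraph from running into the first word of the next, which would otherwise defeat the `\b` word-boundary test. For old .doc files this sketch does not apply; converting them to plain text first, as the reply suggests, is the assumed route.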