nisha has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks, I have an issue with regular expressions and matching. I have a text file of the format given below.
C:\EITV10CM\base\repair\Companion_Repair\File2File\F2F.COM ... is OK.
C:\EITV10CM\base\repair\Companion_Repair\File2SelfExt\F2SEXT.CMP ... is OK.
C:\EITV10CM\base\repair\Companion_Repair\File2SelfExt\F2SEXT.COM ... Found the CMPAN/File2SelfExt virus !!!
The virus has been removed from the file.
Checking for another virus in the file ...
C:\EITV10CM\base\repair\Companion_Repair\File2SelfExt\F2SEXT.COM ... is OK.
C:\EITV10CM\base\repair\Companion_Repair\File2SelfFile\F2SFILE.COM ... Found the CMPAN/FIle2SelfFile virus !!!
The virus has been removed from the file.
Checking for another virus in the file ...
C:\EITV10CM\base\repair\Companion_Repair\File2SelfFile\F2SFILE.COM ... is OK.
I have to write a Perl script that accepts a list of filenames as input, possibly in an array. For every filename in the array, I should find a match in a text file like the sample pasted above. There are cases where the text file has multiple entries for the same filename. For example, in the sample above C:\EITV10CM\base\repair\Companion_Repair\File2SelfExt\F2SEXT.COM is listed twice. I have to find the first occurrence in the text file of each filename from the array and extract the rest of that line, i.e. Found xxxxxxx. I tried a piece of code, but I am not able to get the match to work. Could you please help me fix this problem? Here is the code that I tried.
#!/usr/bin/perl
#Program to read the command line report file; and extract the virus detection name for a file.
$cmdrep = "d:/eitv10cm_cln.rep";
open FH, "<$cmdrep" or die "Cannot open $cmdrep. \n";
$fname = "C:\EITV10CMD\base\detection\basic_detection\Filename\DOIFFILE.COM";
while (<FH>)
{
    chomp;
    if($_ =~ /^$fname/i) #Instead $fname tried giving C:\EITV10CMD\base\detection\basic_detection\Filename\DOIFFILE.COM.
    {
        chomp($_);
        /\s(\.){3}.*?)$/; #Trying to extract everything after ...
        $found = $1;
    }
    if ($found eq "is OK.")
    {
        $virusname = "OK";
        print "The virus detection name is $virusname\n";
    }
    if ($found =~ /^Found: (.*?) NOT a virus[.]/)
    {
        $virusname = $1;
        print "The virus detection name is $virusname\n";
    }
    elsif ($found =~ /^Found the (.*?) (virus|trojan) !!!/)
    {
        $virusname = $1;
        print "The virus detection names is $virusname\n";
        #exit();
    }
    elsif ($found =~ /^Found potentially unwanted program (.*?)[.]/)
    {
        $virusname = $1;
        print "PuPs $virusname \n";
    }
    elsif ($found =~ /Found (virus or variant|application) (.*?)( !!!|[.])/)
    {
        $virusname = $2;
        print "virus or variant $virusname \n \n";
    }
}
Thanks, Nisha

Replies are listed 'Best First'.
Re: Matching and extracting from file.
by GrandFather (Saint) on Mar 07, 2006 at 09:26 UTC

    Ok, not a bad start. However you really, really, really should use strictures: use strict; use warnings;

    The first problem is that you have provided a directory path string with \ characters. In Perl, as in C, those are escape characters and turn the following character into something magical. To get a single \ you need to "quote" it: \\. use warnings; would generate the warning: "Unrecognized escape \d passed through".
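
    A minimal sketch of the two usual ways to write such a path, using the file name from the original post. It assumes nothing beyond core Perl; the variable names are only for illustration.

```perl
use strict;
use warnings;

# Inside double quotes each backslash must be doubled to get one literal backslash.
my $escaped = "C:\\EITV10CMD\\base\\detection\\basic_detection\\Filename\\DOIFFILE.COM";

# Single quotes only treat \\ and \' specially, so the path can be written as-is.
my $single = 'C:\EITV10CMD\base\detection\basic_detection\Filename\DOIFFILE.COM';

print "same string\n" if $escaped eq $single;
print "$single\n";
```

    Single quotes are usually the easier choice when the string needs no variable interpolation.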

    The next problem is that /\s(\.){3}.*?)$/ has an extra ) in it. Perl reports that one at compile time as "Unmatched ) in regex", with or without strictures.

    After that, and having declared your various variables using my, all the static errors are cleaned up. What happens when you run? Now the warnings kick in. The first one is "Unrecognized escape \E passed through in regex; ...". You need to quote the file name that is interpolated directly into the regex: /^\Q$fname\E/i
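
    As a rough illustration of the quoting step, the sketch below matches one of the sample report lines against its file name. The sample line and variable names are chosen for the example only; quotemeta is the function form of \Q...\E.

```perl
use strict;
use warnings;

my $fname = 'C:\EITV10CM\base\repair\Companion_Repair\File2File\F2F.COM';
my $line  = 'C:\EITV10CM\base\repair\Companion_Repair\File2File\F2F.COM ... is OK.';

# \Q...\E backslash-escapes the non-word characters in the interpolated value,
# so the backslashes, colon and dot in the path are matched literally.
my $matched = $line =~ /^\Q$fname\E/i ? 1 : 0;

# quotemeta() does the same escaping ahead of time.
my $pattern       = quotemeta $fname;
my $matched_again = $line =~ /^$pattern/i ? 1 : 0;

print "matched: $matched, $matched_again\n";
```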

    Fix that and run again. The next warning is "... Use of uninitialized value in pattern match (m//) ...". Now there is a logic problem revealed: what happens if the file name doesn't match? (In this case because of a trailing end of line character.) Fix that by changing the first if to skip if no match: next if $_ !~ /^\Q$fname\E/i;.
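
    A small sketch of that skip, run over an in-memory list that stands in for the report file. The @lines and @kept arrays are hypothetical names for the example; the real script would read from the file handle.

```perl
use strict;
use warnings;

my $fname = 'C:\EITV10CM\base\repair\Companion_Repair\File2SelfExt\F2SEXT.COM';

# Stand-in for the lines read from the report file.
my @lines = (
    'C:\EITV10CM\base\repair\Companion_Repair\File2File\F2F.COM ... is OK.',
    'C:\EITV10CM\base\repair\Companion_Repair\File2SelfExt\F2SEXT.COM ... Found the CMPAN/File2SelfExt virus !!!',
    'The virus has been removed from the file.',
);

my @kept;
for my $line (@lines) {
    # Skip anything that does not start with the file name, so later code
    # never works with a value that was not set for this line.
    next if $line !~ /^\Q$fname\E/i;
    push @kept, $line;
}

print scalar(@kept), " matching line(s)\n";
```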

    Ok, run again - runs with no errors or warnings, that's good. Doesn't produce any output, that's bad. At this point I give up because the file name being matched (C:\\EITV10CMD\\base\\detection\\basic_detection\\Filename\\DOIFFILE.COM) doesn't match any of the data lines. As a quick test I change the file name to something that will match and run again. Still no output.

    Set a breakpoint and trace through with the debugger. Heh, that pesky regex we got the warning about is wrong. I didn't read the comment and just removed the "extra" ). What it should look like is /\s\.{3}\s(.*)/. Fix that, run again, and now we get some output.
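
    For illustration, here is the corrected regex applied to two of the sample lines from the original post, followed by one of the original "Found the ..." tests. This is only a sketch; the array and variable names are made up for the example.

```perl
use strict;
use warnings;

my @lines = (
    'C:\EITV10CM\base\repair\Companion_Repair\File2File\F2F.COM ... is OK.',
    'C:\EITV10CM\base\repair\Companion_Repair\File2SelfExt\F2SEXT.COM ... Found the CMPAN/File2SelfExt virus !!!',
);

my @found;
for my $line (@lines) {
    # Whitespace, three literal dots, whitespace, then capture the rest of the line.
    # Only use $1 when the match succeeded.
    if ($line =~ /\s\.{3}\s(.*)/) {
        push @found, $1;
    }
}

# Pull the detection name out of the second status text.
my ($name) = $found[1] =~ /^Found the (.*?) (?:virus|trojan) !!!/;

print "$_\n" for @found;
print "name: $name\n";
```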


    DWIM is Perl's answer to Gödel
      A bunch of thanks :) It worked. Thanks, Nisha
Re: Matching and extracting from file.
by GrandFather (Saint) on Mar 07, 2006 at 06:01 UTC

    You should post the code you have tried. Then we can see something of how you are thinking and can help you much more than by just giving you some sort of canned solution that you may not understand and that may not actually work well in your actual situation.

    When you post your code remember to give a sample of the erroneous output and a sample of the output you would like to see.

    In the meantime, think hashes. That is, maybe you can slurp the file (if it's not very large) and build a hash using the file name as the key and the extra text as the value. Then processing the array becomes a simple matter of looking up the file name in the hash and spitting out the result.
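
    A rough sketch of the hash idea, under a few assumptions that are not in the original post: the report lines are held in an in-memory array (@report) instead of being read from the file, file names are lower-cased so lookups are case-insensitive, and only the first entry for each file name is kept. All names here are hypothetical.

```perl
use strict;
use warnings;

# Stand-in for the report file contents.
my @report = (
    'C:\EITV10CM\base\repair\Companion_Repair\File2File\F2F.COM ... is OK.',
    'C:\EITV10CM\base\repair\Companion_Repair\File2SelfExt\F2SEXT.COM ... Found the CMPAN/File2SelfExt virus !!!',
    'The virus has been removed from the file.',
    'Checking for another virus in the file ...',
    'C:\EITV10CM\base\repair\Companion_Repair\File2SelfExt\F2SEXT.COM ... is OK.',
);

my %status;
for my $line (@report) {
    # Only lines of the form "<file name> ... <status>" are of interest.
    next unless $line =~ /^(.+?)\s\.{3}\s(.*)$/;
    my ($file, $text) = (lc($1), $2);
    # Keep only the first entry seen for each file name.
    $status{$file} = $text unless exists $status{$file};
}

# The list of file names to look up (assumed input).
my @wanted = (
    'C:\EITV10CM\base\repair\Companion_Repair\File2SelfExt\F2SEXT.COM',
    'C:\EITV10CM\base\repair\Companion_Repair\File2File\F2F.COM',
);

for my $file (@wanted) {
    my $text = $status{ lc($file) };
    print "$file: ", (defined $text ? $text : 'not in report'), "\n";
}
```

    Because the first entry wins, the later "is OK." line written after the repair does not overwrite the detection text.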


    DWIM is Perl's answer to Gödel
      I have updated the post with the script that I tried out. Please help.
Re: Matching and extracting from file.
by cognizant (Pilgrim) on Mar 07, 2006 at 09:11 UTC
    Hi Nisha,

    I've made the changes below to your program. Try it. I don't know whether I've understood your problem correctly or not.

    $fname = "C:\\EITV10CMD\\base\\detection\\basic_detection\\Filename\\DOIFFILE.COM";
    while (<FH>)
    {
        chomp;
        my $found;
        if($_ =~ /^\Q$fname\E/i)
        {
            chomp($_);
            /\s*(\.){3}\s*(.*?)$/; #Trying to extract everything after ...
            $found = $2;
        }

    Cheers

    --C