minixman has asked for the wisdom of the Perl Monks concerning the following question:

All,
I seem to have a problem reading a log file.
The following code seems to produce more output than I ask for.
my $get = "Starting";
open(FH, "$pd_zvkk") || die ("Unable to open pd_zvkk log file: $pd_zvkk : !$ \n");
printf STDERR ("My get = $get \n");
foreach (<FH>) {
    #printf STDERR if /$get/;
    chomp();
    unless($_ !~ /$get/) {
        printf STDERR ("String = $_ \n");
    }
}

This is the output I get when I parse a log file:
[Mon Nov 14 09:54:54 2005] [error] [client 10.142.204.242] My get = Starting
[Mon Nov 14 09:54:54 2005] [error] [client 10.142.204.242] String = Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Starting.
[Mon Nov 14 09:54:54 2005] [error] [client 10.142.204.242] My get =
[Mon Nov 14 09:54:54 2005] [error] [client 10.142.204.242] String = Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Starting.
[Mon Nov 14 09:54:54 2005] [error] [client 10.142.204.242] String = Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Removing old clientacc files.
[Mon Nov 14 09:54:54 2005] [error] [client 10.142.204.242] String = Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Running the stored proc.
[Mon Nov 14 09:54:54 2005] [error] [client 10.142.204.242] String =
[Mon Nov 14 09:54:54 2005] [error] [client 10.142.204.242] String = PL/SQL procedure successfully completed.
So it is printing out more data than required. I can't seem to tell it to stop and parse the line once it has found the result.

Replies are listed 'Best First'.
Re: regex log file too many results
by tirwhan (Abbot) on Nov 14, 2005 at 10:21 UTC

    First, what's with the double negation?

    unless($_ !~ /$get/) { printf STDERR ("String = $_ \n"); }
    can be written much more clearly as
    if (m/$get/) { printf STDERR ("String = $_ \n"); }

    And then it seems to me like you're setting $get to an empty string (or maybe a whitespace character) somewhere else in your code, because your logfile shows the program matching on that.

    Hope that helps.

    Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
      Thanks for the info. I did try that, and I think I have found roughly where the problem is.

      This is an extract from a CGI program: the web interface reads a log file and parses the info back into HTML.
      When I use the following code:
      #!/usr/bin/perl
      my $get = "Starting";
      open(FH, "/home/cuthbe/programs/perlweb/PCAMWebStatus/log/pd_zvkk.log") || die ("Unable to open pd_zvkk log file : !$ \n");
      printf ("My get = $get \n");
      while(<FH>) {
          chomp();
          #next if ($_ !~ /$get/);
          if($_ =~ m/$get/) {
              printf ("String found: \"$_\" : \n");
          }
      }

      $ perl test.pl
      My get = Starting
      String found: "Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Starting." :
      (cuthbe@ferrari)-(10:27 AM Mon Nov 14)-(cgi-bin) $
      It works perfectly, but that is on the command line.
      When I put this into the CGI code and get it to read the log file (with the log file path replaced by a variable):

      open(FH, "$pd_zvkk") || die ("Unable to open pd_zvkk log file: $pd_zvkk : !$ \n");
      printf STDERR ("My get = $get \n");
      while(<FH>) {
          chomp();
          #next if ($_ !~ /$get/);
          if(m/$get/) {
              printf STDERR ("String found: \"$_\" : \n");
          }
      }

      Then I get this in the Apache error log:

      [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] My get = Starting
      [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Starting." :
      [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] My get =
      [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Starting." :
      [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Removing old clientacc files." :
      [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Running the stored proc." :
      [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "" :
      [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "PL/SQL procedure successfully completed." :
      [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "" :
      [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "could not open /dev/kbd to get keyboard type US keyboard assumed" :

      So it seems like when running in a CGI env it reads and prints the whole log file, very strange.

        So it seems like when running in a CGI env it reads and prints the whole log file, very strange.

        I think you're wrong in your conclusion. Take a look through your CGI code. You're setting $get to a different value somewhere in there.


Re: regex log file too many results
by Aristotle (Chancellor) on Nov 14, 2005 at 10:46 UTC

    $get is somehow being reset; this can’t be happening in the code you show, so you must be omitting things from your script. Once it’s set to an empty string, that will result in an empty pattern, and an empty pattern means “reuse the last successfully matched pattern,” whatever that happens to be.
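
One way to make this failure mode visible is to refuse an empty search string before it ever reaches the match operator. The following is a minimal sketch, not the poster's code: the helper name lines_matching and the sample @log lines are assumptions made for illustration, and \Q...\E is used so that the search string is treated as literal text rather than as a regex.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): return the lines that
# contain $get, refusing to run with an empty or undefined search string.
sub lines_matching {
    my ($get, @lines) = @_;
    die "search string is empty\n" unless defined $get && length $get;
    # \Q...\E quotes regex metacharacters, so $get is matched literally.
    return grep { /\Q$get\E/ } @lines;
}

# Sample data modelled on the log lines quoted in the thread.
my @log = (
    'Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Starting.',
    'Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Running the stored proc.',
    'PL/SQL procedure successfully completed.',
);

print "String = $_\n" for lines_matching('Starting', @log);
```

With 'Starting' as the search string only the first sample line is printed. Passing an empty string dies with a clear message instead of quietly matching more than intended.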

    Note that you’re using printf like print. Don’t do that, it opens you up to format string vulnerabilities. In Perl, always use print unless you have a specific reason to prefer printf.
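
To illustrate the difference, here is a small sketch with a made-up input line that contains a percent sign. The helper name format_line is hypothetical. When data is interpolated into the first argument of printf it becomes part of the format; passing it as an argument to %s, or using print, keeps it as plain data.

```perl
use strict;
use warnings;

# Hypothetical helper: the data is passed as an argument to %s,
# so it is never interpreted as part of the format string.
sub format_line {
    my ($line) = @_;
    return sprintf('String = %s', $line);
}

my $line = 'tablespace 100% full';   # made-up sample data

# Avoid: printf STDERR ("String = $line \n");
# There the percent sign inside $line would be parsed as a format directive.

print STDERR "String = $line\n";          # plain print, no format involved
print STDERR format_line($line), "\n";    # explicit format, data kept separate
```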

    Also, you’re using foreach(<>). Basically, this is always a mistake. You want while(<>) instead. The two loops do nearly the same thing, with the salient difference being that foreach(<>) slurps the entire file into memory at once, whereas while(<>) only nibbles at it as it processes it.

    open(FH, "$pd_zvkk") is problematic. You are using two-argument open without specifying a mode. If $pd_zvkk is user input or derived from it, someone could exploit the lack of explicit mode to overwrite stuff on your server. But even if you specify an explicit mode, two-argument open is problematic; use three-argument open. Also, a lone variable is quoted; that’s unnecessary, and while it’s no harm here other than adding useless noise to your code, in other contexts it will bite you. Don’t do that. So so far, you should write the expression like this: open( FH, '<', $pd_zvkk ). It’d be even better if instead of the global filehandle, you use a lexical one – open( my $fh, '<', $pd_zvkk ) and replace FH in the rest of the code with $fh – for all the reasons for which it is better to use lexical rather than global variables.
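
Putting the last two points together, a reading loop might look roughly like the sketch below. It is an illustration under assumptions, not the original CGI code: a temporary file created with File::Temp stands in for the real log, the sample lines are shortened, and @found is a name introduced here to collect the matches.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for the real log: a temporary file with two sample lines.
my ($out, $pd_zvkk) = tempfile(UNLINK => 1);
print {$out} "PD/ZVKK Load: Starting.\n",
             "PD/ZVKK Load: Running the stored proc.\n";
close $out or die "Unable to close $pd_zvkk: $!\n";

my $get = 'Starting';

# Three-argument open with an explicit read mode and a lexical filehandle.
open(my $fh, '<', $pd_zvkk)
    or die "Unable to open pd_zvkk log file $pd_zvkk: $!\n";

my @found;
while (my $line = <$fh>) {    # reads one line per iteration, no slurping
    chomp $line;
    push @found, $line if $line =~ /\Q$get\E/;
}
close $fh;

print STDERR "String = $_\n" for @found;
```

The explicit '<' mode means the file name is never interpreted as anything other than a path to read, and the lexical $fh is closed automatically when it goes out of scope if the explicit close is ever forgotten.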

    The double negation in unless($_ !~ /$get/) has been commented on.

    Lastly, this is just a hunch, but your code smells like the typical CGI script written without strict and taint mode. If my hunch is right, you should really read Ovid’s excellent CGI course before you shoot yourself in the foot (or rather, someone else does it).

    Makeshifts last the longest.

Re: regex log file too many results
by Moron (Curate) on Nov 14, 2005 at 10:55 UTC
    The missing output for the third line of the given code suggests that the code and output simply don't belong to each other. On that basis no further speculation as to why it doesn't DWIM seems reasonable. Look at it from our point of view -- there could be about anything in the real code/output matching pair.

    -M

    Free your mind

      Thanks for the heads up. It seems that when the routine was called, the value I was passing to the regex was not complete, hence all the lines being printed to the output file. Thanks, team.
Re: regex log file too many results
by GrandFather (Saint) on Nov 14, 2005 at 10:45 UTC

    What do you expect to get? What was the code that produced the actual output? It seems unlikely to be the code shown - how were the time stamps etc. generated? What does the log file data actually look like?

    Try adding a sample of your actual data to the code below (replace the sample stuff following the __DATA__ line) and see if you can reproduce your problem. If so, reply with a better explanation of the problem. If not, tell us what is different about your situation.

    use strict;
    use warnings;
    my $get = "Starting";
    printf ("My get = $get \n");
    foreach (<DATA>) {
        chomp();
        printf ("String = $_ \n") unless($_ !~ /$get/);
    }
    __DATA__
    Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Starting.
    Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Removing old clientacc files.
    Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Running the stored proc.
    PL/SQL procedure successfully completed.

    Prints:

    My get = Starting
    String = Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Starting.

    Perl is Huffman encoded by design.
Re: regex log file too many results
by Cody Pendant (Prior) on Nov 14, 2005 at 10:51 UTC
    die ("Unable to open pd_zvkk log file : !$ \n");

    You won't get much out of !$ I'm afraid; $! is presumably what you meant.
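
As a quick illustration, the sketch below tries to open a file that is assumed not to exist and builds the error message with $!. The path and the variable names are hypothetical, and the exact wording that $! produces depends on the operating system and locale.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on the machine running this.
my $missing = '/no/such/dir/pd_zvkk.log';

my $ok    = open(my $fh, '<', $missing);
# $! holds the system error from the failed open; read it immediately.
my $error = $ok ? '' : "Unable to open $missing: $!";

print STDERR "$error\n" unless $ok;
```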


    ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
    =~y~b-v~a-z~s; print