regex log file too many results

minixman has asked for the wisdom of the Perl Monks concerning the following question:

Re: regex log file too many results by tirwhan (Abbot) on Nov 14, 2005 at 10:21 UTC
First, what's with the double negation? `unless($_ !~ /$get/) { printf STDERR ("String = $_ \n"); }` can be written much more clearly as `if (m/$get/) { printf STDERR ("String = $_ \n"); }`

And then it seems to me like you're setting `$get` to an empty string (or maybe a whitespace character) somewhere else in your code, because your logfile shows the program matching on that. Hope that helps.

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
Re^2: regex log file too many results by minixman (Beadle) on Nov 14, 2005 at 10:41 UTC
Thanks for the info. I did try that, and I think I found roughly where the problem is. This is an extract from a CGI program: the web interface reads a log file and parses the info back into HTML. When I use the following code:

    #!/usr/bin/perl
    my $get = "Starting";
    open(FH, "/home/cuthbe/programs/perlweb/PCAMWebStatus/log/pd_zvkk.log") || die ("Unable to open pd_zvkk log file : !$ \n");
    printf ("My get = $get \n");
    while(<FH>) {
        chomp();
        #next if ($_ !~ /$get/);
        if($_ =~ m/$get/) {
            printf ("String found: \"$_\" : \n");
        }
    }

    $ perl test.pl
    My get = Starting
    String found: "Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Starting." :
    (cuthbe@ferrari)-(10:27 AM Mon Nov 14)-(cgi-bin)
    $

It works perfectly, but that is on the command line. When I put this into the CGI code and get it to read the log file (with the log file path replaced by a variable):

    open(FH, "$pd_zvkk") || die ("Unable to open pd_zvkk log file: $pd_zvkk : !$ \n");
    printf STDERR ("My get = $get \n");
    while(<FH>) {
        chomp();
        #next if ($_ !~ /$get/);
        if(m/$get/) {
            printf STDERR ("String found: \"$_\" : \n");
        }
    }

Then I get this in the Apache error log:

    [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] My get = Starting
    [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Starting." :
    [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] My get =
    [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Starting." :
    [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Removing old clientacc files." :
    [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Running the stored proc." :
    [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "" :
    [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "PL/SQL procedure successfully completed." :
    [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "" :
    [Mon Nov 14 10:28:16 2005] [error] [client 10.142.204.242] String found: "could not open /dev/kbd to get keyboard type US keyboard assumed" :

So it seems like when running in a CGI env it reads and prints the whole log file, very strange.
Re^3: regex log file too many results by tirwhan (Abbot) on Nov 14, 2005 at 10:52 UTC
"So it seems like when running in a CGI env it reads and prints the whole log file, very strange."

I think you're wrong in your conclusion. Take a look through your CGI code. You're setting `$get` to a different value somewhere in there.

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
Re: regex log file too many results by Aristotle (Chancellor) on Nov 14, 2005 at 10:46 UTC
`$get` is somehow being reset; this can’t be happening in the code you show, so you must be omitting things from your script. Once it’s set to an empty string, that will result in an empty pattern, and an empty pattern means “reuse the last successfully matched pattern,” whatever that happens to be.

Note that you’re using printf like print. Don’t do that, it opens you up to format string vulnerabilities. In Perl, always use print unless you have a specific reason to prefer printf.

Also, you’re using `foreach(<>)`. Basically, this is always a mistake. You want `while(<>)` instead. The two loops do nearly the same thing, with the salient difference being that `foreach(<>)` slurps the entire file into memory at once, whereas `while(<>)` only nibbles at it as it processes it.

`open(FH, "$pd_zvkk")` is problematic. You are using two-argument open without specifying a mode. If `$pd_zvkk` is user input or derived from it, someone could exploit the lack of explicit mode to overwrite stuff on your server. But even if you specify an explicit mode, two-argument open is problematic; use three-argument open. Also, a lone variable is quoted; that’s unnecessary, and while it’s no harm here other than adding useless noise to your code, in other contexts it will bite you. Don’t do that. So far, then, you should write the expression like this: `open( FH, '<', $pd_zvkk )`. It’d be even better if instead of the global filehandle, you use a lexical one – `open( my $fh, '<', $pd_zvkk )` and replace `FH` in the rest of the code with `$fh` – for all the reasons for which it is better to use lexical rather than global variables.

The double negation in `unless($_ !~ /$get/)` has been commented on.

Lastly, this is just a hunch, but your code smells like the typical CGI script written without strict and taint mode. If my hunch is right, you should really read Ovid’s excellent CGI course before you shoot yourself in the foot (or rather, someone else does it).

Makeshifts last the longest.
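Pulling the suggestions above together, a minimal sketch of the read loop might look like the following. The sub name `matching_lines`, the up-front check for an empty search string, the use of `\Q...\E` (which assumes the search text is meant literally, not as a regex), and the command-line usage are all illustrative assumptions; none of them come from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns the lines
# of the file at $path that contain the literal text $get.
sub matching_lines {
    my ($path, $get) = @_;

    # Refuse an empty search string instead of letting it reach the regex.
    die "Empty search string\n" unless defined $get && length $get;

    # Three-argument open with a lexical filehandle; $! is the OS error.
    open(my $fh, '<', $path) or die "Unable to open $path: $!\n";

    my @found;
    while (my $line = <$fh>) {    # one line at a time, no slurping
        chomp $line;
        # Assumption: $get is plain text, so \Q...\E escapes metacharacters.
        push @found, $line if $line =~ /\Q$get\E/;
    }
    close $fh;
    return @found;
}

# Illustrative usage: perl script.pl /path/to/logfile Starting
if (@ARGV == 2) {
    print "String found: \"$_\"\n" for matching_lines(@ARGV);
}
```

Dying on an empty search string is only one way to handle the case; a CGI script might prefer to return an error page instead. Either way, the empty value is caught before it is ever used as a pattern.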
Re: regex log file too many results by Moron (Curate) on Nov 14, 2005 at 10:55 UTC
The missing output for the third line of the given code suggests that the code and output simply don't belong to each other. On that basis, no further speculation as to why it doesn't DWIM seems reasonable. Look at it from our point of view -- there could be just about anything in the real code/output matching pair.

-M

Free your mind
Re^2: regex log file too many results by minixman (Beadle) on Nov 14, 2005 at 11:15 UTC
Thanks for the heads up. It seems that when the routine was called, the value I was passing to the regex was not complete, hence all the lines being printed to the output file. Thanks, team.
Re: regex log file too many results by GrandFather (Saint) on Nov 14, 2005 at 10:45 UTC
What do you expect to get? What was the code that produced the actual output? It seems unlikely to be the code shown - how were the time stamps etc. generated? What does the log file data actually look like?

Try adding a sample of your actual data to the code below (replace the sample stuff following the `__DATA__` line) and see if you can reproduce your problem. If so, reply with a better explanation of the problem. If not, tell us what is different about your situation.

    use strict;
    use warnings;

    my $get = "Starting";
    printf ("My get = $get \n");
    foreach (<DATA>) {
        chomp();
        printf ("String = $_ \n") unless($_ !~ /$get/);
    }
    __DATA__
    Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Starting.
    Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Removing old clientacc files.
    Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Running the stored proc.
    PL/SQL procedure successfully completed.

Prints:

    My get = Starting
    String = Fri Aug 26 05:56:01 BST 2005 INFO: PD/ZVKK Load: Starting.

Perl is Huffman encoded by design.
Re: regex log file too many results by Cody Pendant (Prior) on Nov 14, 2005 at 10:51 UTC
`die ("Unable to open pd_zvkk log file : !$ \n");`

You won't get much out of `!$` I'm afraid, `$!` is presumably what you meant.

`($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss') =~y~b-v~a-z~s; print`
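As a small illustration of the point about `$!`, here is a sketch that builds the error message from a failed `open`. The path is a made-up one that is assumed not to exist, and the variable names are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on the machine running this.
my $path = "/nonexistent/pd_zvkk.log";

my $message = '';
unless (open(my $fh, '<', $path)) {
    # $! holds the operating system's reason for the failure.
    $message = "Unable to open $path: $!";
}
print "$message\n";
```

The exact text that `$!` produces depends on the operating system and locale, so it is not shown here.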
