madbombX has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I am iterating over a group of log files which has, in total, a few million lines easily. I am doing this with a while loop going line by line. Every so often, I come across a line that isn't getting parsed correctly and thus not correctly inserted into my hash and thus throws a "Use of uninitialized value in concatenation ..." warning. (I am using this to find the "exceptions" in the log file). I catch that warning, and print out the variables associated with it. So far so good, but I recreate the signal catcher every iteration (every line) otherwise I get an error of undefined variables:

while (my $line = <LOG>) {
    my (@entry) = split(/\s+/, $line);
    my (@edate) = Parse_Date(substr($entry[0],0,10));
    my ($http_code) = substr($entry[3], -3);
    my ($url) = "$1$2" if ($entry[6] =~ m|^http://.*v1=(.*)&v2=(.*)&v3=.*$|);

    $SIG{'__WARN__'} = sub {
        print "WARNING: ". $_[0] ."\n".
              " http_code: $http_code\n".
              " url: $url\n".
              " edate[0]: $edate[0]\n".
              " edate[1]: $edate[1]\n".
              " edate[2]: $edate[2]\n".
              " edate[3]: $edate[3]\n"
            if ($_[0] =~ /^Use of uninitialized value in hash element.*$/);
    };

    $STATUS{$http_code}{$url}{$edate[0]}{$edate[1]}{$edate[2]}{$edate[3]}++;
}

Ideally, I would like to install that warning handler only once at the beginning of the script, but since those variables are lexically scoped and recreated every iteration (I'm sure there is a better way to do that too that I'm not aware of), I have to set up the handler after all the variables have been declared. Can someone tell me the best way to handle this, or whether this is good enough?

Re: Catching Warnings and Showing Uninitialized Variables
by BrowserUk (Patriarch) on Feb 14, 2007 at 18:45 UTC

    As ikegami pointed out to me yesterday, the sub will only be compiled once, so it's only the assignment to $SIG{__WARN__} that will be duplicated. You could avoid that by only setting it if it's not already set:

    while (my $line = <LOG>) {
        my (@entry) = split(/\s+/, $line);
        my (@edate) = Parse_Date(substr($entry[0],0,10));
        my ($http_code) = substr($entry[3], -3);
        my ($url) = "$1$2" if ($entry[6] =~ m|^http://.*v1=(.*)&v2=(.*)&v3=.*$|);

        $SIG{'__WARN__'} ||= sub {
            print "WARNING: ". $_[0] ."\n".
                  " http_code: $http_code\n".
                  " url: $url\n".
                  " edate[0]: $edate[0]\n".
                  " edate[1]: $edate[1]\n".
                  " edate[2]: $edate[2]\n".
                  " edate[3]: $edate[3]\n"
                if ($_[0] =~ /^Use of uninitialized value in hash element.*$/);
        };

        $STATUS{$http_code}{$url}{$edate[0]}{$edate[1]}{$edate[2]}{$edate[3]}++;
    }
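    One thing worth checking before relying on ||= here: a closure keeps the instances of the lexicals that existed when it was created. The following is only a minimal sketch (the variable names and the three sample values are made up for illustration), but it suggests that a handler installed on the first iteration keeps reporting the first iteration's variables:

```perl
use strict;
use warnings;

my @reported;

{
    # Start with no handler inside this block; restored on exit.
    local $SIG{__WARN__};

    for my $line (qw(first second third)) {
        my $current = $line;    # a fresh lexical on every iteration

        # Installed only on the first pass, so the closure is bound to the
        # first iteration's $current.
        $SIG{__WARN__} ||= sub { push @reported, $current };

        warn "check\n";         # triggers the handler
    }
}

print "@reported\n";            # prints "first first first"
```

    If that matches what you see with the real log data, the handler needs to look at variables that live outside the loop rather than per-iteration lexicals.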

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Catching Warnings and Showing Uninitialized Variables
by kyle (Abbot) on Feb 14, 2007 at 18:32 UTC

    One simple way to avoid recreating the sub every time is to move the lexical variables out of the loop.

    my $http_code;
    my $url;
    my @edate;

    $SIG{'__WARN__'} = sub {
        print "WARNING: ". $_[0] ."\n".
              " http_code: $http_code\n".
              " url: $url\n".
              " edate[0]: $edate[0]\n".
              " edate[1]: $edate[1]\n".
              " edate[2]: $edate[2]\n".
              " edate[3]: $edate[3]\n"
            if ($_[0] =~ /^Use of uninitialized value in hash element.*$/);
    };

    while (my $line = <LOG>) {
        my (@entry) = split(/\s+/, $line);
        @edate = Parse_Date(substr($entry[0],0,10));
        $http_code = substr($entry[3], -3);
        # Assign undef on a failed match so the previous line's URL is not reused.
        $url = ($entry[6] =~ m|^http://.*v1=(.*)&v2=(.*)&v3=.*$|) ? "$1$2" : undef;

        $STATUS{$http_code}{$url}{$edate[0]}{$edate[1]}{$edate[2]}{$edate[3]}++;
    }

    If you want to keep those variables from existing too far away from the loop, you can put the whole thing in a new scope.

    {
        my $http_code;
        my $url;
        my @edate;

        $SIG{'__WARN__'} = sub { ... };

        while (my $line = <LOG>) { ... }
    }
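    For anyone who wants to try the scoped version, here is a small self-contained sketch. The sample records, the two-field format, and the @warnings array are assumptions for illustration, not the original log format. It also uses local on $SIG{__WARN__}, which is my addition, so that the handler goes away when the block ends.

```perl
use strict;
use warnings;

my %status;
my @warnings;

# Hypothetical sample records: "<http_code> <url>"; the second has no URL.
my @lines = ("200 /a", "500", "200 /a");

{
    my ($http_code, $url);

    # Installed once; the closure shares $http_code and $url with the loop.
    local $SIG{__WARN__} = sub {
        my $code = defined $http_code ? $http_code : '(undef)';
        my $u    = defined $url       ? $url       : '(undef)';
        push @warnings, "http_code=$code url=$u: $_[0]";
    };

    for my $line (@lines) {
        # List assignment overwrites both variables on every line, so
        # nothing is left over from the previous record.
        ($http_code, $url) = split ' ', $line;
        $status{$http_code}{$url}++;
    }
}

print scalar(@warnings), " warning(s) caught\n";   # prints "1 warning(s) caught"
print $_ for @warnings;
```

    Because the variables now outlive a single iteration, each one should be assigned on every line, including when a parse step fails. Otherwise the handler never fires and the bad line is counted under the previous line's values.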
Re: Catching Warnings and Showing Uninitialized Variables
by chudpi (Initiate) on Feb 14, 2007 at 19:20 UTC
    Sometimes it's simpler to escalate warnings to the level of errors with

    use warnings FATAL => 'all';

    and then handle them just like exceptions with an eval block.

    More on warnings.
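    A rough sketch of how that could look. The sample records and the @bad array are invented for illustration, and it promotes only the 'uninitialized' category inside the eval block, which is a narrower choice than FATAL => 'all':

```perl
use strict;
use warnings;

my %status;
my @bad;

# Hypothetical sample records: "<http_code> <url>"; the second has no URL.
my @lines = ("200 /index", "404", "200 /index");

for my $line (@lines) {
    my ($http_code, $url) = split ' ', $line;

    my $ok = eval {
        # Lexically scoped: only inside this block do 'uninitialized'
        # warnings become exceptions.
        use warnings FATAL => 'uninitialized';
        $status{$http_code}{$url}++;
        1;
    };
    next if $ok;

    # $@ holds the text of the warning that was promoted to an error.
    my $shown = defined $url ? $url : '(undef)';
    push @bad, "http_code=$http_code url=$shown: $@";
}

print scalar(@bad), " bad line(s)\n";   # prints "1 bad line(s)"
print $_ for @bad;
```

    The variables are still in scope where the eval result is checked, so no closure or %SIG handler is needed to report them.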

    HTH.
Re: Catching Warnings and Showing Uninitialized Variables
by Util (Priest) on Feb 15, 2007 at 16:19 UTC
    I think that $SIG{'__WARN__'} is the wrong approach to this problem. The warning-generating condition can be detected before the warning occurs, so why not detect it yourself and handle it more cleanly than with an action-at-a-distance %SIG handler? Here are two methods that I frequently use:
    1. Warn and skip as soon as the line is known to be bad. In your case, do it when the URL RE fails. This method tends to be efficient, since any parsing code lower in the loop is bypassed on error. However, it only reports the first parse error for any particular line, and requires warn/skip code at each parse step that could fail.

        $entry[6] =~ m{^http://.*v1=(.*)&v2=(.*)&v3=.*$}
            or do {
                warn "Skipping unparseable line $. :\n"
                   . " http_code: $http_code\n"
                   . " entry[6]: $entry[6]\n"   # $url is not set yet at this point
                   . " edate[0]: $edate[0]\n"
                   . " edate[1]: $edate[1]\n"
                   . " edate[2]: $edate[2]\n"
                   . " edate[3]: $edate[3]\n"
                   ;
                next;
            };
        my $url = "$1$2";
        $STATUS{$http_code}{$url}{$edate[0]}{$edate[1]}{$edate[2]}{$edate[3]}++;

    2. When any element fails to parse, set it to undef. When the time comes to use those elements, then warn/skip if any of them are undefined. In your case, do it just before incrementing %STATUS. This method produces only one warning per line when multiple parse errors occur, and consolidates all the error handling (except setting undef) into one place.

        my $url = ( $entry[6] =~ m{^http://.*v1=(.*)&v2=(.*)&v3=.*$} )
                ? "$1$2"
                : undef
                ;
        # ... More parsing code can go here.
        if ( grep {not defined $_} ( $http_code, $url, @edate[0..3] ) ) {
            warn "Skipping unparseable line $. :\n";
            # ... and dump vars.
            next;
        }
        $STATUS{$http_code}{$url}{$edate[0]}{$edate[1]}{$edate[2]}{$edate[3]}++;
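    To try the second method without the real logs, something like the following should work. The record format, the example.com URLs, and the $raw_url, $line_no and @skipped names are assumptions for the sketch, and the date fields are left out to keep it short.

```perl
use strict;
use warnings;

my %status;
my @skipped;

# Hypothetical sample records: "<http_code> <url>"
my @lines = (
    "200 http://example.com/?v1=a&v2=b&v3=c",
    "404 http://example.com/other",             # URL pattern will not match
    "200 http://example.com/?v1=a&v2=b&v3=d",
);

my $line_no = 0;
for my $line (@lines) {
    $line_no++;
    my ($http_code, $raw_url) = split ' ', $line;

    # A failed parse yields undef instead of a stale or missing value.
    my $url = ( defined $raw_url
                && $raw_url =~ m{^http://.*v1=(.*)&v2=(.*)&v3=.*$} )
            ? "$1$2"
            : undef;

    # One check, just before the values are used as hash keys.
    if ( grep { not defined $_ } ($http_code, $url) ) {
        push @skipped, $line_no;
        next;
    }
    $status{$http_code}{$url}++;
}

print "counted: $status{200}{ab}, skipped lines: @skipped\n";
# prints "counted: 2, skipped lines: 2"
```

    No warning is ever raised here, so nothing has to be caught: the bad line is identified and skipped before it reaches the hash.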