in reply to Re^2: stripping whitespace gives an 'unitialized value' error
in thread stripping whitespace gives an 'unitialized value' error

If your input always follows a particular format, I often find it is useful to write a simple line-parsing sub that returns the correct info (if it exists in the line) and handles any lines that don't conform to the format.
If you use a regex to match the format rather than splitting, I generally don't find it that much slower, especially if you precompile the regex (/o modifier), and it gives you the peace of mind that you won't get errors like this.

sub parse_format {
    my $line = shift;
    my ($date, $number);    # for returning
    # 'if', not 'while': Perl has no while/else, and we only match once
    if ($line =~ m/^                      # line start
                   (\d+\/\d+\/\d+\s+\d+)  # capture date and first number
                   ,\s+                   # junk in the middle
                   (\d+)                  # capture the last bit
                   $                      # end of string
                  /ox) {
        ($date, $number) = ($1, $2);
    }
    else {
        warn "'$line' did not conform to format. Skipping...\n";
    }
    return ($date, $number);    # both undef if the line did not match
}

Or something... I haven't tested this, *just as an example*!!
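
For what it's worth, here is a rough sketch of how the calling side might look. The sample lines are made up (I'm only guessing at the real input format from the regex), and the sub is repeated in condensed form, without /o, so the snippet runs on its own. The point is that the caller checks for undef and skips, so nothing uninitialized reaches the rest of the code.

```perl
use strict;
use warnings;

# Condensed copy of parse_format so this snippet is self-contained.
sub parse_format {
    my $line = shift;
    my ($date, $number);
    if ($line =~ m/^
                   (\d+\/\d+\/\d+\s+\d+)  # date and first number
                   ,\s+                   # separator
                   (\d+)                  # last number
                   $
                  /x) {
        ($date, $number) = ($1, $2);
    }
    else {
        warn "'$line' did not conform to format. Skipping...\n";
    }
    return ($date, $number);
}

# Hypothetical sample input; the real data format is an assumption.
my @lines = (
    "18/06/2009 1111, 42",
    "not a record",
    "01/02/2003 4, 5",
);

my @records;
for my $line (@lines) {
    my ($date, $number) = parse_format($line);
    next unless defined $date;    # malformed line: already warned, skip it
    push @records, [$date, $number];
}

printf "%s => %s\n", @$_ for @records;
```

Only the two well-formed lines end up in @records; the middle one just produces the warning.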

Just a something something...

Re^4: stripping whitespace gives an 'unitialized value' error
by JavaFan (Canon) on Jun 18, 2009 at 11:11 UTC
    The /o is hardly an optimization, and potentially dangerous. /o prevents recompilation of the regexp. However, since the regexp doesn't change, as it's not interpolating a variable, perl already knows the regexp hasn't changed, so no recompilation happens.

    Where it does make a difference is in situations like this:

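    for my $word ("foo", "bar", "baz") {
        if ($line =~ /quux $word/o) { yada yada yada }
    }
    Now using the /o modifier prevents the regexp from being recompiled. It's faster. It's unlikely to be correct though: the pattern is compiled with the first value of $word, so the "bar" and "baz" iterations still match against /quux foo/.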
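
    A small sketch that makes the problem visible. The sub names and the sample line are made up for illustration; the two subs differ only in the /o.

    ```perl
    use strict;
    use warnings;

    # Returns the words for which /quux $word/ matched $line, using /o.
    sub matches_with_o {
        my ($line, @words) = @_;
        my @hits;
        for my $word (@words) {
            # /o: the pattern is built from the first $word seen and then frozen
            push @hits, $word if $line =~ /quux $word/o;
        }
        return @hits;
    }

    # Same loop without /o: the pattern follows the current $word.
    sub matches_without_o {
        my ($line, @words) = @_;
        my @hits;
        for my $word (@words) {
            push @hits, $word if $line =~ /quux $word/;
        }
        return @hits;
    }

    my $line    = "quux bar";
    my @with    = matches_with_o($line, qw(foo bar baz));
    my @without = matches_without_o($line, qw(foo bar baz));

    print "with /o:    (@with)\n";
    print "without /o: (@without)\n";
    ```

    With /o the list comes back empty, because all three iterations test /quux foo/; without it you get "bar" as expected. Note that the frozen pattern also survives later calls to the same sub.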

    The only use for /o is the slight optimization where you have a regexp with an interpolating variable and where the variable doesn't change. Without /o perl takes the time to check whether the interpolated string has changed, and recompiles only when it has. With /o, perl skips this check. Typically, the check is dwarfed by the execution of the regexp, but there are a few cases where it's a tiny optimization. Given the dangers of /o, I prefer to use qr instead.
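
    A minimal sketch of the qr approach, with made-up sample data. The pattern is compiled explicitly, once per word, so it is obvious in the code when compilation happens and nothing gets frozen behind your back. The \Q...\E is an extra precaution of mine in case $word ever contains regexp metacharacters.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample lines, for illustration only.
    my @lines = ("quux foo", "quux bar", "nothing here");

    my %count;
    for my $word (qw(foo bar baz)) {
        # Compile once per word; reuse the compiled object for every line.
        my $re = qr/quux \Q$word\E/;
        $count{$word} = grep { $_ =~ $re } @lines;
    }

    print "$_: $count{$_}\n" for sort keys %count;
    ```

    Each word gets its own compiled regexp, and the inner loop over the lines never triggers a recompile or a "has it changed?" check on an interpolated string.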

    IMO, the use of /o should be a mandatory warning, asking the programmer if he really, really wants this. Because for every correct use of it I have seen, I've seen 100 incorrect uses, and 1000 programmers who don't know what /o does.