Re^3: stripping whitespace gives an 'unitialized value' error

If your input always follows a particular format, I often find it useful to write a simple line-parsing sub that returns the relevant fields (if they exist in the line) and handles any lines that don't conform to the format.
If you use a regex to match the format rather than splitting, I generally don't find it much slower, especially if you precompile the regex (/o modifier), and it gives you peace of mind that you won't get 'uninitialized value' errors like this one.

sub parse_format {
    my $line = shift;
    my ($date, $number);    # returned values; stay undef if the line doesn't match
    if ($line =~ m/^                          # start of line
                   (\d+\/\d+\/\d+\s+\d+)      # capture date and first number
                   ,\s+                       # junk in the middle
                   (\d+)                      # capture the last bit
                   $                          # end of string
                  /ox)
    {
        ($date, $number) = ($1, $2);
    } else {
        warn "\'$line\' did not conform to format. Skipping...\n";
    }
    return ($date, $number);
}

Or something like that... I haven't tested this; it's *just an example*!
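For comparison, here is a small self-contained sketch of the split-versus-regex point above. The sample lines and the names `parse_with_split`, `parse_with_regex`, and `$format` are hypothetical (the thread's actual input isn't shown here); the regex is the same pattern as in `parse_format`, compiled once with `qr`.

```perl
use strict;
use warnings;

# Hypothetical sample input in the "date number, number" shape that
# parse_format expects, plus one line that doesn't conform.
my @lines = (
    "18/06/2009 11, 42",
    "19/06/2009 7, 3",
    "garbage line",
);

# Split-based parsing: on a malformed line split returns too few
# fields, so $number is undef and any later use of it triggers a
# "Use of uninitialized value" warning.
sub parse_with_split {
    my ($line) = @_;
    my ($date, $number) = split /,\s+/, $line;
    return ($date, $number);
}

# Regex-based parsing in the spirit of parse_format: the line either
# matches the whole format or is rejected, so callers can test for it.
my $format = qr{^(\d+/\d+/\d+\s+\d+),\s+(\d+)$};

sub parse_with_regex {
    my ($line) = @_;
    return $line =~ $format ? ($1, $2) : ();
}

for my $line (@lines) {
    my ($date, $number) = parse_with_regex($line);
    if (defined $number) {
        print "date: $date; number: $number\n";
    } else {
        warn "Skipping malformed line: $line\n";
    }
}
```

The regex version never hands back a half-filled result: a non-conforming line yields an empty list, which the caller can check with `defined` before using the fields.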

Just a something something...


Re^4: stripping whitespace gives an 'unitialized value' error by JavaFan (Canon) on Jun 18, 2009 at 11:11 UTC
The `/o` is hardly an optimization, and it's potentially dangerous. `/o` prevents recompilation of the regexp. However, since this regexp doesn't change (it doesn't interpolate a variable), perl already knows the regexp hasn't changed, so no recompilation happens anyway.

Where it does make a difference is in situations like this:

for my $word ("foo", "bar", "baz") {
    if ($line =~ /quux $word/o) {
        # yada yada yada
    }
}

Now using the `/o` modifier prevents the regexp from being recompiled. It's faster. It's unlikely to be correct, though: the pattern is compiled once, with the first value of `$word`, so every iteration tests against `/quux foo/`.

The only use for `/o` is the slight optimization where you have a regexp with an interpolating variable and where the variable doesn't change. Without `/o`, perl takes the time to check whether the interpolated string has changed, and recompiles only when it has. With `/o`, perl skips this check. Typically, the cost of the check is dwarfed by the execution of the regexp, but there are a few cases where it's a tiny optimization. Given the dangers of `/o`, I prefer to use `qr` instead.

IMO, using `/o` should trigger a mandatory warning, asking the programmer if he really, really wants this. For every correct use of it I have seen, I've seen 100 incorrect uses, and 1000 programmers who don't know what `/o` does.
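A minimal sketch of the `/o` pitfall, assuming the same `quux $word` pattern; the sub names `matches_with_o`, `matches_without_o`, `compile_patterns`, and `matches_with_qr` are hypothetical. It also shows one way to use `qr` instead of `/o`: compile each pattern once, explicitly, and reuse the compiled objects.

```perl
use strict;
use warnings;

# Reports which words $line matches as "quux $word", using /o.
# /o compiles the pattern only the first time this match runs, with
# whatever $word held then ("foo"); every later iteration (and every
# later call) keeps testing against /quux foo/.
sub matches_with_o {
    my ($line, @words) = @_;
    my @hits;
    for my $word (@words) {
        push @hits, $word if $line =~ /quux $word/o;
    }
    return @hits;
}

# The same loop without /o: perl sees that the interpolated string
# changed and recompiles only when it has.
sub matches_without_o {
    my ($line, @words) = @_;
    my @hits;
    for my $word (@words) {
        push @hits, $word if $line =~ /quux $word/;
    }
    return @hits;
}

# The qr alternative: compile each pattern once, up front, as a
# [word, compiled regex] pair that can be reused for many lines.
sub compile_patterns {
    my @words = @_;
    return map { [ $_, qr/quux $_/ ] } @words;
}

sub matches_with_qr {
    my ($line, @compiled) = @_;
    return map { $_->[0] } grep { $line =~ $_->[1] } @compiled;
}

my @words    = ("foo", "bar", "baz");
my $line     = "quux foo";
my @compiled = compile_patterns(@words);

print "with /o:    @{[ matches_with_o($line, @words) ]}\n";
print "without /o: @{[ matches_without_o($line, @words) ]}\n";
print "with qr:    @{[ matches_with_qr($line, @compiled) ]}\n";
```

With `$line` set to `quux foo`, the `/o` version reports all three words, because its pattern is stuck on `/quux foo/`; the other two report only `foo`.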