If your input always follows a particular format, I often find it useful to write a simple line-parsing sub that returns the correct info (if it exists in the line) and handles any lines that don't conform to the format. If you use a regex to match the format rather than splitting, I generally don't find it is that much slower, especially if you compile the regex only once (/o modifier), and it gives you the peace of mind that you don't get errors like this.
sub parse_format {
    my $line = shift;
    my ($date, $number);    ## for returning
    if ($line =~ m/^            # line start
        (\d+\/\d+\/\d+\s+\d+)   # capture date and first number
        ,\s+                    # junk in the middle
        (\d+)                   # capture the last bit
        $                       # end of string
        /ox )
    {
        ($date, $number) = ($1, $2);
    } else {
        warn "'$line' did not conform to format. Skipping...\n";
    }
    return ($date, $number);
}
Or something... I haven't tested this, *just as an example*!!
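For anyone who wants to try the idea out, here is a minimal, self-contained sketch of the same approach. The sample lines and the names `$format`, `parse_line` and `@parsed` are made up for illustration, and the input format ("date number, number") is only what the regex above implies, not something confirmed by the original data. It uses qr// to compile the pattern once and returns an empty list for non-conforming lines so the caller can skip them.

```perl
use strict;
use warnings;

# Assumed format (hypothetical): "MM/DD/YYYY N, M"
my $format = qr/^
    (\d+\/\d+\/\d+\s+\d+)   # date and first number
    ,\s+                    # separator in the middle
    (\d+)                   # the last number
    $                       # end of string
/x;

# Returns (date, number) on a match, or an empty list otherwise.
sub parse_line {
    my ($line) = @_;
    return $line =~ $format ? ($1, $2) : ();
}

# Made-up sample input: one conforming line, one that is not.
my @sample = ("01/02/2003 15, 42", "not a record");
my @parsed;

for my $line (@sample) {
    my ($date, $number) = parse_line($line)
        or do { warn "'$line' did not conform to format. Skipping...\n"; next };
    push @parsed, [$date, $number];
}

print "$_->[0] => $_->[1]\n" for @parsed;   # prints "01/02/2003 15 => 42"
```

Returning an empty list rather than a pair of undefs means the list assignment evaluates to 0 in scalar context, so the `or` branch fires only for lines that did not match.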
Just a something something...
| [reply] [d/l] |
for my $word ("foo", "bar", "baz") {
if ($line =~ /quux $word/o) {yada yada yada}
}
Now using the /o modifier prevents the regexp from being recompiled. It's faster. It's unlikely to be correct though.
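To make the failure concrete, here is a small sketch of that loop run with and without /o. The sample `$line` and the array names are made up for illustration. With /o the pattern is interpolated once, on the first iteration, so it stays /quux foo/ for the remaining words.

```perl
use strict;
use warnings;

my $line = "quux bar";   # made-up sample input
my (@hits_with_o, @hits_without_o);

for my $word ("foo", "bar", "baz") {
    # /o: $word is interpolated only the first time, so this is
    # effectively /quux foo/ on every iteration.
    push @hits_with_o, $word if $line =~ /quux $word/o;
}

for my $word ("foo", "bar", "baz") {
    # No /o: the pattern follows the current value of $word.
    push @hits_without_o, $word if $line =~ /quux $word/;
}

print "with /o:    [@hits_with_o]\n";      # prints "with /o:    []"
print "without /o: [@hits_without_o]\n";   # prints "without /o: [bar]"
```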
The only use for /o is the slight optimization where you have a regexp with an interpolating variable and where the variable doesn't change. Without /o, perl takes the time to check whether the interpolated string has changed, and recompiles only when it has. With /o, perl skips this check. Typically, the check is dwarfed by the execution of the regexp, but there are a few cases where it's a tiny optimization. Given the dangers of /o, I prefer to use qr instead.
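A minimal sketch of the qr alternative, using the same made-up words and a made-up `$line`: each pattern is compiled explicitly, exactly once, and the compiled object always reflects the word it was built from. The `%re_for` hash and the `\Q...\E` quoting are my additions for illustration.

```perl
use strict;
use warnings;

my @words = ("foo", "bar", "baz");

# Compile one pattern per word, up front. \Q...\E quotes any regex
# metacharacters in the word (an extra precaution, not in the original).
my %re_for = map { $_ => qr/quux \Q$_\E/ } @words;

my $line = "quux bar";   # made-up sample input
my @matched = grep { $line =~ $re_for{$_} } @words;

print "matched: @matched\n";   # prints "matched: bar"
```

Unlike /o, nothing here depends on which value happened to be in the variable the first time the match was executed.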
IMO, the use of /o should be a mandatory warning, asking the programmer if he really, really wants this. Because for every correct use of it I have seen, I've seen 100 incorrect uses, and 1000 programmers who don't know what /o does.