use English qw( -no_match_vars );
...
my $bates_number_pattern = qr{ ... }x;
...
sub parse_bates_number {
    my $bates_number = shift;
    $bates_number =~ $bates_number_pattern
        or die "Invalid Bates number: $bates_number\n";
    return map  { substr $bates_number,
                         $LAST_MATCH_START[$_],
                         $LAST_MATCH_END[$_] - $LAST_MATCH_START[$_] }
           grep { defined $LAST_MATCH_START[$_] }
           ( 1 .. $#LAST_MATCH_START );
}
...
my ($prefix, $number) = parse_bates_number($bates_number);
I chose to use the English module to muffle the line noise a bit. I also realized I didn't need to iterate over every subgroup in the regular expression, only up to the last subgroup that actually matched, so I used (1..$#LAST_MATCH_START) instead of (1..$#LAST_MATCH_END).
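To make the difference between the two upper bounds concrete, here is a minimal sketch. The three-way alternation pattern and the helper name group_counts are made up for illustration; they are not the real Bates pattern or part of the script.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );

# Hypothetical pattern (an assumption, not the Bates pattern):
# three capture groups, only one of which can take part in a match.
my $pattern = qr{ \A (?: (a) | (b) | (c) ) \z }x;

# Hypothetical helper: returns both candidate upper bounds for a match.
sub group_counts {
    my $string = shift;
    $string =~ $pattern or return;

    # $#LAST_MATCH_START ($#-) is the index of the last group that matched.
    # $#LAST_MATCH_END   ($#+) is the number of groups in the pattern.
    my ( $last_matched, $group_total )
        = ( $#LAST_MATCH_START, $#LAST_MATCH_END );
    return ( $last_matched, $group_total );
}

for my $string (qw( a b c )) {
    my ( $last_matched, $group_total ) = group_counts($string);
    print "$string: last matched group = $last_matched, ",
          "groups in pattern = $group_total\n";
}
```

For the input "a" this reports 1 and 3: iterating to $#LAST_MATCH_START visits one group, while iterating to $#LAST_MATCH_END would also visit two groups that never matched.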
I tested it and it worked brilliantly. But I was bothered by the fact that I was parsing the Bates numbers twice: once with a regular expression pattern and then again with substr. The two matched substrings were already captured and stored in variables--some $m and $n from the regular expression match--and yet I was extracting them anew with a string function.
So I tried this and it, too, worked flawlessly:
    no strict 'refs';
    return map  { $$_ }
           grep { defined $LAST_MATCH_START[$_] }
           ( 1 .. $#LAST_MATCH_START );
Because $$_ is a symbolic reference, I'm forced to countermand strict 'refs', but this is a rare, legitimate use of symbolic references, don't you think?
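As a self-contained sketch of the two approaches side by side, the following uses a stand-in pattern (an optional uppercase prefix, optional leading zeros, then digits). That pattern and the names captures_via_substr and captures_via_symref are assumptions for illustration only, since the real Bates pattern is elided above.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );

# Hypothetical stand-in pattern, NOT the real Bates pattern.
my $pattern = qr{ \A ([A-Z]+)? 0* (\d+) \z }x;

# First approach: re-extract each matched group with substr and offsets.
sub captures_via_substr {
    my $string = shift;
    $string =~ $pattern or return;
    my @captures
        = map  { substr( $string,
                         $LAST_MATCH_START[$_],
                         $LAST_MATCH_END[$_] - $LAST_MATCH_START[$_] ) }
          grep { defined $LAST_MATCH_START[$_] }
          ( 1 .. $#LAST_MATCH_START );
    return @captures;
}

# Second approach: read $1, $2, ... through symbolic references.
sub captures_via_symref {
    my $string = shift;
    $string =~ $pattern or return;
    no strict 'refs';    # lexically scoped: applies only inside this sub
    my @captures
        = map  { $$_ }   # with $_ == 1 this reads $1, and so on
          grep { defined $LAST_MATCH_START[$_] }
          ( 1 .. $#LAST_MATCH_START );
    return @captures;
}

for my $string (qw( ABC00123 00042 )) {
    my @from_substr = captures_via_substr($string);
    my @from_symref = captures_via_symref($string);
    print "$string: substr=(@from_substr) symref=(@from_symref)\n";
}
```

Both subs return the same lists here: (ABC, 123) for the first input and (42) for the second, where the prefix group does not take part in the match. Because no strict 'refs' is lexically scoped, the relaxation stays confined to the one sub body that needs it.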
Here's the revised script in its entirety:
And here's its output:
XYZ 123 00000123     XYZ 123     123
XYZ 123 00000456     XYZ 123     456
XYZ 123 00654321     XYZ 123     654321
XYZ 12 ST 00123456   XYZ 12 ST   123456
XYZ 123 ST 00654321  XYZ 123 ST  654321
XYZ U 123 00123456   XYZ U 123   123456
XYZ U 12 00654321    XYZ U 12    654321
XYZ V 1 00123456     XYZ V 1     123456
XYZ 12300654321      XYZ 123     654321
XYZ 00123456         XYZ         123456
XYZ 0654321          XYZ         654321
ABC-M-0123456        ABC-M-      123456
ABCD-00654321        ABCD-       654321
00000123456                      123456
99999999999                      99999999999
Invalid Bates number: BOGUS99
I'm not exactly sure why I used a BEGIN block. It seems right. Is it?
Thanks again!
Jim