Excellent! This is just what I was looking for: a way to obviate those nasty, repeated code assertions. I'd read a little about @+ and @- in perlvar and perlre, but I needed a concrete example that was personal to me to help make sense of them finally.

I soon had a function based on your suggestion.

use English qw( -no_match_vars );

...

my $bates_number_pattern = qr{ ... }x;

...

sub parse_bates_number {
    my $bates_number = shift;

    $bates_number =~ $bates_number_pattern
        or die "Invalid Bates number: $bates_number\n";

    return map  { substr $bates_number,
                         $LAST_MATCH_START[$_],
                         $LAST_MATCH_END[$_] - $LAST_MATCH_START[$_] }
           grep { defined $LAST_MATCH_START[$_] }
           ( 1 .. $#LAST_MATCH_START );
}

...

my ($prefix, $number) = parse_bates_number($bates_number);

I chose to use the English module to muffle the line noise a bit. I also realized I didn't need to iterate over every subgroup in the regular expression, only up to the last matched subgroup, so I used (1..$#LAST_MATCH_START) instead of (1..$#LAST_MATCH_END).
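To make the difference between the two upper bounds concrete, here's a minimal sketch. The two-branch pattern and the name last_indexes are made up for illustration; they are not the real Bates pattern or part of the script. It relies on what perlvar says: $#- is the last matched subgroup, while $#+ is the number of subgroups in the pattern.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );

# Hypothetical two-branch pattern with four capture groups in total.
# This is NOT the real Bates number pattern.
my $pattern = qr{ ^ (?: (A) (\d+) | (B) (\d+) ) $ }x;

# Hypothetical helper: returns [ $#-, $#+ ] for a string that matches.
sub last_indexes {
    my $string = shift;

    $string =~ $pattern
        or die "No match: $string\n";

    # $#LAST_MATCH_START is the index of the last group that matched;
    # $#LAST_MATCH_END is the number of groups in the whole pattern.
    return [ $#LAST_MATCH_START, $#LAST_MATCH_END ];
}

for my $string ('A1', 'B22') {
    my ($last_matched, $group_count) = @{ last_indexes($string) };
    print "$string: last matched group = $last_matched, ",
          "groups in pattern = $group_count\n";
}
```

With 'A1' only groups 1 and 2 take part in the match, so the loop can stop at 2 rather than walking all 4 groups. With 'B22' the last matched group is 4, and the grep on defined offsets is what skips the unused groups 1 and 2.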

I tested it and it worked brilliantly. But I was bothered by the fact that I was parsing the Bates numbers twice: once with a regular expression pattern and then again with substr. The two matched substrings were already captured and stored in variables--some $m and $n from the regular expression match--and yet I was extracting them anew with a string function.

So I tried this and it, too, worked flawlessly:

no strict 'refs';

return map  { $$_ }
       grep { defined $LAST_MATCH_START[$_] }
       ( 1 .. $#LAST_MATCH_START );

Because $$_ is a symbolic reference (when $_ is 1, it resolves to $1), I'm forced to countermand strict 'refs'. But this is a rare, legitimate use of symbolic references, don't you think?
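As a sanity check on the idea, here's a small sketch that runs both extraction methods side by side. The pattern and the sub names captures_by_offset and captures_by_symref are hypothetical, chosen only for this comparison. If the reasoning holds, both should return the same list of captured substrings.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );

# Hypothetical pattern for illustration only, not the Bates pattern.
my $pattern = qr{ ^ (?: (A) (\d+) | (B) (\d+) ) $ }x;

# Extract captures with substr and the match offset arrays.
sub captures_by_offset {
    my $string = shift;

    $string =~ $pattern
        or die "No match: $string\n";

    return map  { substr( $string,
                          $LAST_MATCH_START[$_],
                          $LAST_MATCH_END[$_] - $LAST_MATCH_START[$_] ) }
           grep { defined $LAST_MATCH_START[$_] }
           ( 1 .. $#LAST_MATCH_START );
}

# Extract captures through symbolic references to $1, $2, and so on.
sub captures_by_symref {
    my $string = shift;

    $string =~ $pattern
        or die "No match: $string\n";

    # $$_ with $_ == 3 means ${"3"}, i.e. $3, so strict 'refs' must be off.
    no strict 'refs';

    return map  { $$_ }
           grep { defined $LAST_MATCH_START[$_] }
           ( 1 .. $#LAST_MATCH_START );
}

for my $string ('A1', 'B22') {
    my @by_offset = captures_by_offset($string);
    my @by_symref = captures_by_symref($string);
    print "$string: @by_offset | @by_symref\n";
}
```

The no strict 'refs' line sits inside the sub, so strictness is only relaxed for that one block and stays on everywhere else.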

Here's the revised script in its entirety:

#!/usr/bin/perl
#
# parse_bates_numbers_v2.pl

use strict;
use warnings;

use English qw( -no_match_vars );

BEGIN {
    my $bates_number_pattern = qr{
        ^
        #     ( Prefix                 )     ( Number  )
        (?:
              ( XYZ\s\d{2,3}(?:\sST)?  ) \s  ( \d{8}   )
            | ( XYZ\s[UV]\s\d{1,3}     ) \s  ( \d{8}   )
            | ( XYZ\s\d{3}             )     ( \d{8}   )
            | ( XYZ                    ) \s  ( \d{7,8} )
            | ( ABC-M-                 )     ( \d{7}   )
            | ( ABCD-                  )     ( \d{8}   )
            | (                        )     ( \d{11}  )
        )
        $
    }x;

    '12345678901' =~ $bates_number_pattern
        or die "Invalid Bates number pattern:\n$bates_number_pattern\n";

    $#LAST_MATCH_END % 2 == 0
        or die "Invalid number of parentheses in pattern: $#LAST_MATCH_END\n";

    sub parse_bates_number {
        my $bates_number = shift;

        $bates_number =~ $bates_number_pattern
            or die "Invalid Bates number: $bates_number\n";

        no strict 'refs';

        return map  { $$_ }
               grep { defined $LAST_MATCH_START[$_] }
               ( 1 .. $#LAST_MATCH_START );
    }
}

while (my $bates_number = <DATA>) {
    chomp $bates_number;

    my ($prefix, $number) = parse_bates_number($bates_number);

    printf "%-20s %-10s %12.0f\n", $bates_number, $prefix, $number;
}

exit 0;

__END__
XYZ 123 00000123
XYZ 123 00000456
XYZ 123 00654321
XYZ 12 ST 00123456
XYZ 123 ST 00654321
XYZ U 123 00123456
XYZ U 12 00654321
XYZ V 1 00123456
XYZ 12300654321
XYZ 00123456
XYZ 0654321
ABC-M-0123456
ABCD-00654321
00000123456
99999999999
BOGUS99

And here's its output:

XYZ 123 00000123     XYZ 123             123
XYZ 123 00000456     XYZ 123             456
XYZ 123 00654321     XYZ 123          654321
XYZ 12 ST 00123456   XYZ 12 ST        123456
XYZ 123 ST 00654321  XYZ 123 ST       654321
XYZ U 123 00123456   XYZ U 123        123456
XYZ U 12 00654321    XYZ U 12         654321
XYZ V 1 00123456     XYZ V 1          123456
XYZ 12300654321      XYZ 123          654321
XYZ 00123456         XYZ              123456
XYZ 0654321          XYZ              654321
ABC-M-0123456        ABC-M-           123456
ABCD-00654321        ABCD-            654321
00000123456                           123456
99999999999                      99999999999
Invalid Bates number: BOGUS99

I'm not exactly sure why I used a BEGIN block. It seems right. Is it?

Thanks again!

Jim


