What's the best way to match against multiple, carefully ordered alternative patterns and capture the matched subexpressions in a fixed number of variables (in this case, two: $pfx and $num)?

(The truth is, I can't figure out how to ask the question I mean to ask. Please infer and interpret liberally and generously.)

Here is what I came up with. I don't like it because it depends on esoteric regular expression features (embedded code blocks, (?{ ... }), and the $^N variable) and because it repeats the same assignments to the same two variables in every alternative. Is there a better way to accomplish the same parsing task?

$ cat parse_bates_numbers.pl
#!/usr/bin/perl
#
# parse_bates_numbers.pl

use strict;
use warnings;

$\ = "\n";
$, = "\t";

my ($pfx, $num);

while (my $bates_number = <DATA>) {
    chomp $bates_number;

    undef $pfx;
    undef $num;

    $bates_number =~ m{
        \A
        (?:
            # XYZ 999 99999999 or XYZ 99 ST 99999999 or XYZ 999 ST 99999999
            (XYZ\s\d{2,3}(?:\sST)?) (?{ $pfx = $^N }) \s(\d{8}) (?{ $num = $^N })
            |
            # XYZ U 999 99999999 or XYZ U 99 99999999 or XYZ V 9 99999999
            (XYZ\s[UV]\s\d{1,3}) (?{ $pfx = $^N }) \s(\d{8}) (?{ $num = $^N })
            |
            # XYZ 99999999999
            (XYZ\s\d{3}) (?{ $pfx = $^N }) (\d{8}) (?{ $num = $^N })
            |
            # XYZ 99999999 or XYZ 9999999
            (XYZ) (?{ $pfx = $^N }) \s(\d{7,8}) (?{ $num = $^N })
            |
            # ABC-M-9999999
            (ABC-M-) (?{ $pfx = $^N }) (\d{7}) (?{ $num = $^N })
            |
            # ABCD-99999999
            (ABCD-) (?{ $pfx = $^N }) (\d{8}) (?{ $num = $^N })
            |
            # 99999999999
            () (?{ $pfx = $^N }) (\d{11}) (?{ $num = $^N })
        )
        \z
    }x or die "Invalid Bates number $bates_number";

    print $bates_number, $pfx, $num + 0;
}

exit 0;

__END__
XYZ 123 00654321
XYZ 12 ST 00123456
XYZ 123 ST 00654321
XYZ U 123 00123456
XYZ U 12 00654321
XYZ V 1 00123456
XYZ 12300654321
XYZ 00123456
XYZ 0654321
ABC-M-0123456
ABCD-00654321
00000123456

$ perl ./parse_bates_numbers.pl | expand -20
XYZ 123 00654321    XYZ 123             654321
XYZ 12 ST 00123456  XYZ 12 ST           123456
XYZ 123 ST 00654321 XYZ 123 ST          654321
XYZ U 123 00123456  XYZ U 123           123456
XYZ U 12 00654321   XYZ U 12            654321
XYZ V 1 00123456    XYZ V 1             123456
XYZ 12300654321     XYZ 123             654321
XYZ 00123456        XYZ                 123456
XYZ 0654321         XYZ                 654321
ABC-M-0123456       ABC-M-              123456
ABCD-00654321       ABCD-               654321
00000123456                             123456
$

Also, why do I have to declare the variables $pfx and $num outside the while loop for this to work properly?
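
One possible direction, offered only as a sketch: Perl 5.10 and later support the branch-reset group (?|...), which restarts capture numbering in each alternative, so the prefix always lands in $1 and the number in $2. That removes the need for (?{ ... }) and $^N, and with no code running inside the regex, the question of where $pfx and $num are declared no longer arises. The names $bates_re and parse_bates_number below are hypothetical, chosen for illustration; the alternatives are copied from the script above in the same order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Branch reset (?|...) renumbers captures in every alternative,
# so group 1 is always the prefix and group 2 is always the number.
my $bates_re = qr{
    \A
    (?|
        (XYZ\s\d{2,3}(?:\sST)?) \s (\d{8})      # XYZ 999 99999999, XYZ 99 ST 99999999
      | (XYZ\s[UV]\s\d{1,3})    \s (\d{8})      # XYZ U 999 99999999, XYZ V 9 99999999
      | (XYZ\s\d{3})               (\d{8})      # XYZ 99999999999
      | (XYZ)                   \s (\d{7,8})    # XYZ 99999999 or XYZ 9999999
      | (ABC-M-)                   (\d{7})      # ABC-M-9999999
      | (ABCD-)                    (\d{8})      # ABCD-99999999
      | ()                         (\d{11})     # 99999999999 (empty prefix)
    )
    \z
}x;

# Hypothetical helper: returns (prefix, number) on success,
# or an empty list when the input is not a valid Bates number.
sub parse_bates_number {
    my ($bates_number) = @_;
    my ($pfx, $num) = $bates_number =~ $bates_re
        or return;
    return ($pfx, $num + 0);    # "+ 0" strips leading zeros, as in the original
}

# Small demonstration on three of the sample inputs.
for my $bates_number ('XYZ 12 ST 00123456', 'XYZ 12300654321', '00000123456') {
    my ($pfx, $num) = parse_bates_number($bates_number)
        or die "Invalid Bates number $bates_number";
    print join("\t", $bates_number, $pfx, $num), "\n";
}
```

The order of the alternatives still matters exactly as it does in the original pattern; only the capturing mechanism changes. Because the match is done in list context, the two variables can be declared in the same statement that fills them.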

Jim


In reply to Matching Multiple Alternative Patterns and Capturing Multiple Subexpressions by Jim
