
I started thinking about using a few nested if statements to simplify the regexes used at each stage, but that looked more complicated than necessary. I settled on the following, which uses multiple regexes rather than a single one with alternation and eliminates the code blocks (assignments to variables) in the middle of the regex. According to perlre, that feature is "considered highly experimental, and may be changed or deleted without notice". IMO, eliminating the code blocks and laying out the patterns so the common parts are aligned makes them easier to read.

Note that I don't use the temporary lexical variables $pfx and $num because the regex is inside an if block and the captured strings are only used in a single print statement. If you're doing more than printing, you could add them back in as shown in the comment.
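If you do want the lexicals back, a minimal sketch of the idea follows. It uses only one of the patterns, and split_bates is a hypothetical helper name I made up for illustration, not something from the original code.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of the original program):
# returns the prefix and number for a single pattern, or an empty list
# when the string does not match.
sub split_bates {
    my ($bates_number) = @_;

    if ( $bates_number =~ m/^( ABCD- ) ( \d{8} )$/x ) {
        # Copy $1 and $2 right away; a later successful match overwrites them.
        my ( $pfx, $num ) = ( $1, $2 );
        return ( $pfx, $num );
    }
    return;
}

my ( $pfx, $num ) = split_bates('ABCD-00654321');
print "prefix=$pfx number=$num\n";
```

Copying the captures immediately matters once the block does more than a single print, because any later successful match replaces $1 and $2.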

I also removed the die so the program would continue processing, but you should use whatever is most appropriate in your situation.

use strict;
use warnings;

while( my $bates_number = <DATA> )
{
    chomp $bates_number;

    if( $bates_number =~ m/^( XYZ \s \d{2,3} (?:\sST)? ) \s ( \d{8}   )$/x  ||
        $bates_number =~ m/^( XYZ \s [UV] \s \d{1,3}   ) \s ( \d{8}   )$/x  ||
        $bates_number =~ m/^( XYZ \s \d{3}             )    ( \d{8}   )$/x  ||
        $bates_number =~ m/^( XYZ                      ) \s ( \d{7,8} )$/x  ||
        $bates_number =~ m/^( ABC-M-                   )    ( \d{7}   )$/x  ||
        $bates_number =~ m/^( ABCD-                    )    ( \d{8}   )$/x  ||
        $bates_number =~ m/^(                          )    ( \d{11}  )$/x    )

    {
        # my ( $pfx, $num ) = ( $1, $2 ); # could assign here
        printf( "%-20s  %-12s  % d\n", $bates_number, $1, $2 );
    }
    else
    {
        print "Invalid Bates number: $bates_number\n";
    }
}

__DATA__
XYZ 123 00654321
XYZ 12 ST 00123456
XYZ 123 ST 00654321
XYZ U 123 00123456
XYZ U 12 00654321
XYZ V 1 00123456
XYZ 12300654321
XYZ 00123456
XYZ 0654321
ABC-M-0123456
ABCD-00654321
00000123456

Output:

XYZ 123 00654321      XYZ 123        654321
XYZ 12 ST 00123456    XYZ 12 ST      123456
XYZ 123 ST 00654321   XYZ 123 ST     654321
XYZ U 123 00123456    XYZ U 123      123456
XYZ U 12 00654321     XYZ U 12       654321
XYZ V 1 00123456      XYZ V 1        123456
XYZ 12300654321       XYZ 123        654321
XYZ 00123456          XYZ            123456
XYZ 0654321           XYZ            654321
ABC-M-0123456         ABC-M-         123456
ABCD-00654321         ABCD-          654321
00000123456                          123456
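
If the list of formats keeps growing, another option is to keep the patterns in an array and try them in order. This is only a sketch of that variation, not what I ran above: the patterns are the same seven, but @bates_patterns and parse_bates are names I invented for illustration.

```perl
use strict;
use warnings;

# The same seven patterns as in the if condition, precompiled with qr//
# and stored in the order they should be tried.
my @bates_patterns = (
    qr/^( XYZ \s \d{2,3} (?:\sST)? ) \s ( \d{8}   )$/x,
    qr/^( XYZ \s [UV] \s \d{1,3}   ) \s ( \d{8}   )$/x,
    qr/^( XYZ \s \d{3}             )    ( \d{8}   )$/x,
    qr/^( XYZ                      ) \s ( \d{7,8} )$/x,
    qr/^( ABC-M-                   )    ( \d{7}   )$/x,
    qr/^( ABCD-                    )    ( \d{8}   )$/x,
    qr/^(                          )    ( \d{11}  )$/x,
);

# Hypothetical helper: returns ( prefix, number ) from the first pattern
# that matches, or an empty list if none of them does.
sub parse_bates {
    my ($bates_number) = @_;

    for my $re (@bates_patterns) {
        # A match in list context returns the captures, so the assignment
        # counts two elements on success (even when the prefix is empty)
        # and zero on failure.
        if ( my ( $pfx, $num ) = $bates_number =~ $re ) {
            return ( $pfx, $num );
        }
    }
    return;
}

for my $bates_number ( 'XYZ U 12 00654321', 'ABC-M-0123456', 'bogus' ) {
    if ( my ( $pfx, $num ) = parse_bates($bates_number) ) {
        print "'$pfx' / '$num'\n";
    }
    else {
        print "Invalid Bates number: $bates_number\n";
    }
}
```

The trade-off is that the aligned layout is preserved and adding a format is a one-line change, at the cost of a loop and a helper sub where a single if used to be.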

In reply to Re: Matching Multiple Alternative Patterns and Capturing Multiple Subexpressions by bobf
in thread Matching Multiple Alternative Patterns and Capturing Multiple Subexpressions by Jim
