in reply to Matching Multiple Alternative Patterns and Capturing Multiple Subexpressions
I started thinking about using a few nested if statements to simplify the regexes used at each stage, but that looked more complicated than necessary. I settled on the following, which eliminates the (?{ ... }) code blocks (the assignments to variables) in the middle of the regex and uses multiple regexes rather than a single one with alternation. According to perlre, embedded code blocks are "considered highly experimental, and may be changed or deleted without notice", so I'd rather not rely on them. IMO, eliminating the code blocks and laying out the patterns so the common parts are aligned makes the patterns easier to read.
Note that I don't use the temporary lexical variables $pfx and $num, because the matches are in the condition of the if and the captured strings ($1 and $2) are used only once, in a single printf statement. If you're doing more than printing, you could add them back in as shown in the comment.
I also removed the die so the program would continue processing, but you should use whatever is most appropriate in your situation.
    use strict;
    use warnings;

    while( my $bates_number = <DATA> ) {
        chomp $bates_number;
        if(    $bates_number =~ m/^( XYZ \s \d{2,3} (?:\sST)? ) \s ( \d{8}   )$/x
            || $bates_number =~ m/^( XYZ \s [UV] \s \d{1,3}   ) \s ( \d{8}   )$/x
            || $bates_number =~ m/^( XYZ \s \d{3}             )    ( \d{8}   )$/x
            || $bates_number =~ m/^( XYZ                      ) \s ( \d{7,8} )$/x
            || $bates_number =~ m/^( ABC-M-                   )    ( \d{7}   )$/x
            || $bates_number =~ m/^( ABCD-                    )    ( \d{8}   )$/x
            || $bates_number =~ m/^(                          )    ( \d{11}  )$/x
          ) {
            # my ( $pfx, $num ) = ( $1, $2 );  # could assign here
            printf( "%-20s %-12s % d\n", $bates_number, $1, $2 );
        }
        else {
            print "Invalid Bates number: $bates_number\n";
        }
    }

    __DATA__
    XYZ 123 00654321
    XYZ 12 ST 00123456
    XYZ 123 ST 00654321
    XYZ U 123 00123456
    XYZ U 12 00654321
    XYZ V 1 00123456
    XYZ 12300654321
    XYZ 00123456
    XYZ 0654321
    ABC-M-0123456
    ABCD-00654321
    00000123456
Output:
    XYZ 123 00654321     XYZ 123        654321
    XYZ 12 ST 00123456   XYZ 12 ST      123456
    XYZ 123 ST 00654321  XYZ 123 ST     654321
    XYZ U 123 00123456   XYZ U 123      123456
    XYZ U 12 00654321    XYZ U 12       654321
    XYZ V 1 00123456     XYZ V 1        123456
    XYZ 12300654321      XYZ 123        654321
    XYZ 00123456         XYZ            123456
    XYZ 0654321          XYZ            654321
    ABC-M-0123456        ABC-M-         123456
    ABCD-00654321        ABCD-          654321
    00000123456                         123456
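If you do need the captures for more than printing, one way to add the lexicals back is to wrap the match in a small sub and copy $1 and $2 as soon as a pattern succeeds. This is only a sketch: parse_bates and %count_by_prefix are names I made up for illustration, and only two of the patterns are repeated to keep it short.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the program above): returns
# ( prefix, number ) for a valid Bates number, or an empty list otherwise.
# Only two of the patterns are repeated here to keep the sketch short.
sub parse_bates {
    my ($bates_number) = @_;
    if(    $bates_number =~ m/^( XYZ \s \d{2,3} (?:\sST)? ) \s ( \d{8} )$/x
        || $bates_number =~ m/^( ABC-M-                   )    ( \d{7} )$/x
      ) {
        # Copy the captures right away, before a later match can overwrite them.
        my ( $pfx, $num ) = ( $1, $2 );
        return ( $pfx, $num );
    }
    return;
}

# Illustrative use: tally how many valid numbers were seen per prefix.
my %count_by_prefix;
for my $bates_number ( 'XYZ 12 ST 00123456', 'ABC-M-0123456', 'bogus' ) {
    my ( $pfx, $num ) = parse_bates($bates_number) or next;
    $count_by_prefix{$pfx}++;
    printf( "%-20s %-12s % d\n", $bates_number, $pfx, $num );
}
```

The list assignment in the loop returns the number of values the sub handed back, so an invalid entry (empty list) is skipped by the "or next".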