This does the same thing as your code, though I don't know if it would extend correctly to other cases (if there are any other cases):

#!/usr/bin/perl

use strict;
use warnings;
$\ = "\n";
$, = "\t";

while (<DATA>) {
    chomp;
    my ( $pfx, $num );
    if ( /^\d+$/ ) {
        $pfx = '';
        ( $num = $_ ) =~ s/^0+//;
    }
    elsif ( /^\S+$/ ) {
        ( $pfx, $num ) = ( /(\D+)0+(\d+)/ );
    }
    else {
        my $last_space = rindex( $_, ' ' ) +1;
        ( $pfx = substr( $_, 0, $last_space )) =~ s/\s+$//;
        ( $num = substr( $_, $last_space )) =~ s/^([^0]*)0+//;
        $pfx .= " $1" if ( length( $1 ));
    }
    print $_, $pfx, $num;
}

__DATA__
XYZ 123 00654321
XYZ 12 ST 00123456
XYZ 123 ST 00654321
XYZ U 123 00123456
XYZ U 12 00654321
XYZ V 1 00123456
XYZ 12300654321
XYZ 00123456
XYZ 0654321
ABC-M-0123456
ABCD-00654321
00000123456

I like seeing the distinctions laid out in procedural conditions like this, rather than as a lengthy regex involving complex, perl-regex-specific features -- it just seems easier to read -- but that's just my personal preference.
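As for whether this extends correctly to other cases, one way to find out is to move the three branches into a sub and run it against a table of inputs with hand-checked results. The following is only a sketch: split_code() is a name I made up, and the expected values are simply what I worked out by hand for some of the sample lines above. New cases can be added to the table as they turn up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;

# split_code() is a hypothetical name; the body is the same three-way
# branch as in the script above, moved into a sub so it can be tested.
sub split_code {
    my ($str) = @_;
    my ( $pfx, $num );
    if ( $str =~ /^\d+$/ ) {
        # digits only: no prefix, just strip the leading zeros
        $pfx = '';
        ( $num = $str ) =~ s/^0+//;
    }
    elsif ( $str =~ /^\S+$/ ) {
        # no whitespace: the prefix is the leading run of non-digits
        ( $pfx, $num ) = $str =~ /(\D+)0+(\d+)/;
    }
    else {
        # split at the last space; any non-zero digits ahead of the
        # zero padding belong to the prefix
        my $last_space = rindex( $str, ' ' ) + 1;
        ( $pfx = substr( $str, 0, $last_space ) ) =~ s/\s+$//;
        $num = substr( $str, $last_space );
        $pfx .= " $1" if $num =~ s/^([^0]*)0+// and length $1;
    }
    return ( $pfx, $num );
}

# Expected "prefix|number" for some of the sample lines.
my %expected = (
    'XYZ 123 00654321'   => 'XYZ 123|654321',
    'XYZ 12 ST 00123456' => 'XYZ 12 ST|123456',
    'XYZ 12300654321'    => 'XYZ 123|654321',
    'XYZ 00123456'       => 'XYZ|123456',
    'ABC-M-0123456'      => 'ABC-M-|123456',
    '00000123456'        => '|123456',
);

for my $in ( sort keys %expected ) {
    is( join( '|', split_code($in) ), $expected{$in}, "split '$in'" );
}
done_testing();
```

Note that the middle branch returns undef for both fields when there is no zero padding at all (e.g. a made-up input like "ABC-123"), so a caller would want to check for that.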

update: Looking at the OP again, I realize that the specificity of the various patterns in the OP code is intended as a sort of sanity check on the input (die if there are no specific matches).

In that regard -- again, just my personal view -- it might be easier (more legible / maintainable) to apply sanity checks to the individual result strings ($pfx, $num) after they've been picked apart from the input string by the kind of generic logic I suggested here; e.g., add an if block like this just before the print statement:

    if ( $num !~ /^\d{1,7}$/ or
         $pfx !~ /^(?: ABC(?:D|-M)- |
                       XYZ(?:\s[UV])? (?:\s\d{1,3} (?:\sST)? )? )$/x )
    {
        warn "Bad input at line $.\n";
        next;
    }

(Then again, that last regex admittedly looks like the sort of thing that people usually point to as "line noise". I'm sure there are more legible ways of doing the same thing.)
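One possibly more legible variant, offered as a sketch rather than a drop-in: build the prefix check from small named qr// pieces and put both tests in a sub. The names fields_ok(), $abc_pfx and $xyz_pfx are my own inventions; the patterns are the same ones as in the if block above, just without the /x spacing.

```perl
use strict;
use warnings;

# Hypothetical names; same patterns as the if block above, split up.
my $abc_pfx = qr/ABC(?:D|-M)-/;                           # "ABCD-" or "ABC-M-"
my $xyz_pfx = qr/XYZ(?:\s[UV])?(?:\s\d{1,3}(?:\sST)?)?/;  # "XYZ", optional U/V, number, ST

# Returns 1 if both fields pass the sanity checks, 0 otherwise.
sub fields_ok {
    my ( $pfx, $num ) = @_;
    return 0 unless defined $pfx and defined $num;   # the split found nothing
    return 0 unless $num =~ /^\d{1,7}$/;             # 1 to 7 digits
    return $pfx =~ /^(?:$abc_pfx|$xyz_pfx)$/ ? 1 : 0;
}
```

In the loop, this would be used just before the print statement as: unless ( fields_ok( $pfx, $num )) { warn "Bad input at line $.\n"; next; }

Like the if block above, this rejects an empty prefix, so the all-digit sample line (00000123456) gets reported as bad input. Whether that is what's wanted depends on the OP's requirements; if not, an empty alternative would need to be added to the prefix pattern.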

In reply to Re: Matching Multiple Alternative Patterns and Capturing Multiple Subexpressions by graff
in thread Matching Multiple Alternative Patterns and Capturing Multiple Subexpressions by Jim
