in reply to Matching Multiple Alternative Patterns and Capturing Multiple Subexpressions
I like seeing the distinctions laid out in procedural conditions like this, rather than as a lengthy regex involving complex, perl-regex-specific features -- it just seems easier to read -- but that's just my personal preference.

    #!/usr/bin/perl
    use strict;
    use warnings;

    $\ = "\n";    # output record separator: newline after each print
    $, = "\t";    # output field separator: tab between printed items

    while (<DATA>) {
        chomp;
        my ( $pfx, $num );
        if ( /^\d+$/ ) {
            # digits only: no prefix, just strip the leading zeros
            $pfx = '';
            ( $num = $_ ) =~ s/^0+//;
        }
        elsif ( /^\S+$/ ) {
            # no whitespace: non-digit prefix, zero padding, then the number
            ( $pfx, $num ) = ( /(\D+)0+(\d+)/ );
        }
        else {
            # split at the last space; any non-zero characters that precede
            # the zero padding in the final token belong to the prefix
            my $last_space = rindex( $_, ' ' ) +1;
            ( $pfx = substr( $_, 0, $last_space )) =~ s/\s+$//;
            ( $num = substr( $_, $last_space )) =~ s/^([^0]*)0+//;
            $pfx .= " $1" if ( length( $1 ));
        }
        print $_, $pfx, $num;
    }
    __DATA__
    XYZ 123 00654321
    XYZ 12 ST 00123456
    XYZ 123 ST 00654321
    XYZ U 123 00123456
    XYZ U 12 00654321
    XYZ V 1 00123456
    XYZ 12300654321
    XYZ 00123456
    XYZ 0654321
    ABC-M-0123456
    ABCD-00654321
    00000123456
update: Looking at the OP again, I realize that the specificity of the various patterns in the OP code is intended as a sort of sanity check on the input (die if there are no specific matches).
In that regard -- again, just my personal view -- it might be easier (more legible / maintainable) to apply sanity checks to the individual result strings ($pfx, $num) after they've been picked apart from the input string by the kind of generic logic I suggested here; e.g., add an if block like this just before the print statement:
    if ( $num !~ /^\d{1,7}$/ or
         $pfx !~ /^(?: ABC(?:D|-M)- |
                       XYZ(?:\s[UV])? (?:\s\d{1,3} (?:\sST)? )?
                   )$/x )
    {
        warn "Bad input at line $.\n";
        next;
    }

(Then again, that last regex admittedly looks like the sort of thing that people usually point to as "line noise". I'm sure there are more legible ways of doing the same thing.)
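One possibly more legible way to write the same check, offered only as a sketch: build the prefix pattern from named qr// pieces and wrap the test in a small sub. The names $abc_pfx, $xyz_pfx and is_valid are made up for illustration; the alternatives themselves are copied from the regex above.

```perl
use strict;
use warnings;

# Named pieces of the prefix pattern (hypothetical names; same alternatives
# as the one-line regex in the post).
my $abc_pfx = qr/ABC(?:D|-M)-/;    # "ABCD-" or "ABC-M-"
my $xyz_pfx = qr/
    XYZ
    (?: \s [UV] )?          # optional " U" or " V"
    (?: \s \d{1,3}          # optional group of one to three digits
        (?: \s ST )?        # optionally followed by " ST"
    )?
/x;

# Hypothetical helper: returns 1 when both pieces pass the sanity check.
sub is_valid {
    my ( $pfx, $num ) = @_;
    return 0 unless defined $pfx and defined $num;
    return 0 unless $num =~ /^\d{1,7}$/;
    return $pfx =~ /^(?:$abc_pfx|$xyz_pfx)$/ ? 1 : 0;
}

# Small demonstration on hand-picked ( prefix, number ) pairs.
for my $pair ( [ 'ABC-M-', '123456' ], [ 'XYZ 12 ST', '123456' ], [ 'QRS', '42' ] ) {
    my ( $pfx, $num ) = @$pair;
    print "$pfx / $num: ", ( is_valid( $pfx, $num ) ? 'ok' : 'bad' ), "\n";
}
```

In the loop this would be used as: unless ( is_valid( $pfx, $num )) { warn "Bad input at line $.\n"; next; }. Note that, like the one-line version, it rejects an empty prefix, so a bare number such as the last DATA line gets flagged; an empty alternative would have to be added if that input should pass.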