Jim has asked for the wisdom of the Perl Monks concerning the following question:
(The truth is, I can't figure out how to ask the question I mean to ask. Please infer and interpret liberally and generously.)
I did this. I don't like it because it depends on esoteric regular expression stuff and because there are a bunch of repeated assignments to the same two variables. Is there a better way to accomplish the same parsing task?
Also, why do I have to declare the variables $pfx and $num outside the while loop for this to work properly?

    $ cat parse_bates_numbers.pl
    #!/usr/bin/perl
    #
    # parse_bates_numbers.pl

    use strict;
    use warnings;

    $\ = "\n";
    $, = "\t";

    my ($pfx, $num);

    while (my $bates_number = <DATA>) {
        chomp $bates_number;

        undef $pfx;
        undef $num;

        $bates_number =~ m{
            \A
            (?:
                # XYZ 999 99999999 or XYZ 99 ST 99999999 or XYZ 999 ST 99999999
                (XYZ\s\d{2,3}(?:\sST)?)  (?{ $pfx = $^N })
                \s(\d{8})                (?{ $num = $^N })
            |
                # XYZ U 999 99999999 or XYZ U 99 99999999 or XYZ V 9 99999999
                (XYZ\s[UV]\s\d{1,3})     (?{ $pfx = $^N })
                \s(\d{8})                (?{ $num = $^N })
            |
                # XYZ 99999999999
                (XYZ\s\d{3})             (?{ $pfx = $^N })
                (\d{8})                  (?{ $num = $^N })
            |
                # XYZ 99999999 or XYZ 9999999
                (XYZ)                    (?{ $pfx = $^N })
                \s(\d{7,8})              (?{ $num = $^N })
            |
                # ABC-M-9999999
                (ABC-M-)                 (?{ $pfx = $^N })
                (\d{7})                  (?{ $num = $^N })
            |
                # ABCD-99999999
                (ABCD-)                  (?{ $pfx = $^N })
                (\d{8})                  (?{ $num = $^N })
            |
                # 99999999999
                ()                       (?{ $pfx = $^N })
                (\d{11})                 (?{ $num = $^N })
            )
            \z
        }x or die "Invalid Bates number $bates_number";

        print $bates_number, $pfx, $num + 0;
    }

    exit 0;

    __END__
    XYZ 123 00654321
    XYZ 12 ST 00123456
    XYZ 123 ST 00654321
    XYZ U 123 00123456
    XYZ U 12 00654321
    XYZ V 1 00123456
    XYZ 12300654321
    XYZ 00123456
    XYZ 0654321
    ABC-M-0123456
    ABCD-00654321
    00000123456

    $ perl ./parse_bates_numbers.pl | expand -20
    XYZ 123 00654321    XYZ 123             654321
    XYZ 12 ST 00123456  XYZ 12 ST           123456
    XYZ 123 ST 00654321 XYZ 123 ST          654321
    XYZ U 123 00123456  XYZ U 123           123456
    XYZ U 12 00654321   XYZ U 12            654321
    XYZ V 1 00123456    XYZ V 1             123456
    XYZ 12300654321     XYZ 123             654321
    XYZ 00123456        XYZ                 123456
    XYZ 0654321         XYZ                 654321
    ABC-M-0123456       ABC-M-              123456
    ABCD-00654321       ABCD-               654321
    00000123456                             123456
    $
Jim
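For comparison, here is a minimal sketch of one possible alternative, not taken from any reply in this thread. It assumes Perl 5.10 or later, where the branch-reset group `(?|...)` numbers the captures in every alternative starting from 1, so the prefix is always `$1` and the number is always `$2`. The helper name `parse_bates` is hypothetical. Because the sketch uses no `(?{ ... })` blocks, nothing is assigned from inside the regex, and `$pfx` and `$num` can be declared inside the loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # the branch-reset group (?|...) requires Perl 5.10 or later

# Hypothetical helper (not part of the original script): returns the list
# (prefix, number) for a valid Bates number, or an empty list otherwise.
sub parse_bates {
    my ($bates_number) = @_;

    # In list context a successful match returns its captures ($1, $2).
    return $bates_number =~ m{
        \A
        (?|
            # XYZ 999 99999999 or XYZ 99 ST 99999999 or XYZ 999 ST 99999999
            (XYZ\s\d{2,3}(?:\sST)?) \s (\d{8})
        |
            # XYZ U 999 99999999 or XYZ U 99 99999999 or XYZ V 9 99999999
            (XYZ\s[UV]\s\d{1,3}) \s (\d{8})
        |
            # XYZ 99999999999
            (XYZ\s\d{3}) (\d{8})
        |
            # XYZ 99999999 or XYZ 9999999
            (XYZ) \s (\d{7,8})
        |
            # ABC-M-9999999
            (ABC-M-) (\d{7})
        |
            # ABCD-99999999
            (ABCD-) (\d{8})
        |
            # 99999999999 (empty prefix)
            () (\d{11})
        )
        \z
    }x;
}

# Usage: the captures are lexicals declared inside the loop.
for my $bates_number ('XYZ 12 ST 00123456', 'ABC-M-0123456', '00000123456') {
    my ($pfx, $num) = parse_bates($bates_number)
        or die "Invalid Bates number $bates_number";
    print join("\t", $bates_number, $pfx, $num + 0), "\n";
}
```

The alternatives are kept in the same order as in the original pattern, so a string that could match more than one form is resolved the same way. On a Perl older than 5.10, a similar effect could be had by trying a list of separate two-capture patterns in turn.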
Replies are listed 'Best First'.

Re: Matching Multiple Alternative Patterns and Capturing Multiple Subexpressions
by bobf (Monsignor) on Sep 08, 2007 at 21:02 UTC

Re: Matching Multiple Alternative Patterns and Capturing Multiple Subexpressions
by lodin (Hermit) on Sep 08, 2007 at 20:41 UTC
by Jim (Curate) on Sep 09, 2007 at 22:45 UTC
by lodin (Hermit) on Sep 10, 2007 at 13:46 UTC

Re: Matching Multiple Alternative Patterns and Capturing Multiple Subexpressions
by graff (Chancellor) on Sep 08, 2007 at 23:00 UTC

Re: Matching Multiple Alternative Patterns and Capturing Multiple Subexpressions
by johngg (Canon) on Sep 08, 2007 at 22:29 UTC
by Jim (Curate) on Sep 09, 2007 at 22:51 UTC