Sometimes it's best to start simple with these things. Here's an alternate approach that seems to do what you want:

c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
"my @test = qw(RC1XY RS);
 ;;
 for my $s (@test) {
    printf qq{'$s' -> };
    my ($p_r, $d1, $p_mc, $p_new_mc) =
       $s =~ m{ ([[:upper:]]+) (?: (\d+) ([[:upper:]]) ([[:upper:]]))? }xms;
    dd $p_r, $d1, $p_mc, $p_new_mc;
    }
 "
'RC1XY' -> ("RC", 1, "X", "Y")
'RS' -> ("RS", undef, undef, undef)

How does this match your basic requirements? What further elaborations and sophistications are needed? Do you really need named captures? Etc... (Update: You mention that you're using Perl 5.10, but both these examples, above and below, run the same for me under 5.8.9 and 5.14.4 as well as 5.10.1.)

Update: I notice you write that you want a "blank" (which I take to be an empty string) to be produced for sub-patterns that do not match. You will note that the example above yields undefined values for non-matching sub-patterns. Since the empty string and undef both have a false boolean value, I find it is usually just as easy to test and deal with undefs as with empty strings and so I usually avoid the extra effort to produce an empty string. If you really need them, here's a possible alternative approach:

c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
"my @test = qw(RC1XY RS);
 ;;
 for my $s (@test) {
    printf qq{'$s' -> };
    my ($p_r, $d1, $p_mc, $p_new_mc) =
       $s =~ m{ ([[:upper:]]+) (\d*) ([[:upper:]]?) ([[:upper:]]?) }xms;
    dd $p_r, $d1, $p_mc, $p_new_mc;
    }
 "
'RC1XY' -> ("RC", 1, "X", "Y")
'RS' -> ("RS", "", "", "")
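
If you would rather keep the first regex (the one with the optional group) and still get empty strings, another option is to convert the undefs after the match. This is only a minimal sketch: the name captures_or_blank is made up for illustration, and the defined test is spelled out with ?: so it should also run on 5.8 (on 5.10 and later the defined-or operator // could be used instead).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): apply the first regex above,
# then replace any undefined capture with an empty string.
sub captures_or_blank {
    my ($s) = @_;
    my @caps = $s =~ m{ ([[:upper:]]+) (?: (\d+) ([[:upper:]]) ([[:upper:]]))? }xms;
    # A failed match returns an empty list, so nothing is mapped in that case.
    return map { defined $_ ? $_ : '' } @caps;
}

for my $s (qw(RC1XY RS)) {
    my @caps = captures_or_blank($s);
    print "'$s' -> ", join(', ', map { qq{"$_"} } @caps), "\n";
}
```

The regex itself is unchanged, so the "all or nothing" behaviour of the optional group is preserved; only the representation of the missing pieces differs.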

Re^4: Question on Regular Expression
by sjain (Initiate) on Dec 28, 2014 at 03:55 UTC

    Thanks for the approach; I really appreciate your help on this.

    Although you showed me the result I wanted, I need something more.

    I guess I gave you too simple an example to illustrate the problem I faced. In my earlier example I used the regular expression '(.*) (0-9) (A-Z) ((A-Z)'

    I ran the code snippet you gave me on 'RW12QW1XY' and it does not work; the expected outcome is as below:

    'RW12QW1X' -> ("RW12QW", "1", "X", "")

      Please post your test code and output in something like the format shown below, and please use code tags; I can't really understand your regex without guessing! Please also give your expected/desired output; "it does not work" does not tell me very much.

      c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -e
      "my @test = qw(RC1XY RS RW12QW1X FOO FOOx FOO23PQ xFOO23PQXXX foo xyzzy);
       ;;
       for my $s (@test) {
          printf qq{'$s' -> };
          if (my ($p_r, $d1, $p_mc, $p_new_mc) =
              $s =~ m{ ([[:upper:]]+) (\d*) ([[:upper:]]?) ([[:upper:]]?) }xms) {
             dd $p_r, $d1, $p_mc, $p_new_mc;
             }
          else {
             print qq{no match \n};
             }
          }
       "
      'RC1XY' -> ("RC", 1, "X", "Y")
      'RS' -> ("RS", "", "", "")
      'RW12QW1X' -> ("RW", 12, "Q", "W")
      'FOO' -> ("FOO", "", "", "")
      'FOOx' -> ("FOO", "", "", "")
      'FOO23PQ' -> ("FOO", 23, "P", "Q")
      'xFOO23PQXXX' -> ("FOO", 23, "P", "Q")
      'foo' -> no match
      'xyzzy' -> no match


      Give a man a fish:   <%-(-(-(-<

      You are not giving us enough information to work with. Please provide a large enough set of example inputs with their expected outputs to cover the expected combinations of actual input.
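
One way to package such a set is a small table of inputs with their expected captures that any candidate regex can be run against. The following is only a sketch: extract and run_cases are names invented here, the regex is the one from the example above, and the third case uses the output requested earlier in the thread, so it is expected to be reported as failing with this regex.

```perl
use strict;
use warnings;

# Candidate under test: the regex from the example above.
sub extract {
    my ($s) = @_;
    return $s =~ m{ ([[:upper:]]+) (\d*) ([[:upper:]]?) ([[:upper:]]?) }xms;
}

# Each case pairs an input with the captures it should produce.
my @cases = (
    [ 'RC1XY',    [ 'RC',     '1', 'X', 'Y' ] ],
    [ 'RS',       [ 'RS',     '',  '',  ''  ] ],
    [ 'RW12QW1X', [ 'RW12QW', '1', 'X', ''  ] ],   # the output asked for earlier
);

# Hypothetical helper: run every case, print a line per case,
# and return the number of cases that did not match expectations.
sub run_cases {
    my ($code, @cases) = @_;
    my $failures = 0;
    for my $case (@cases) {
        my ($input, $want) = @$case;
        # Show undefined captures as the word 'undef' so they are visible.
        my @got = map { defined $_ ? $_ : 'undef' } $code->($input);
        my $ok  = join("\0", @got) eq join("\0", @$want);
        $failures++ unless $ok;
        printf "%-6s '%s' -> (%s)\n", ($ok ? 'ok' : 'not ok'), $input, join(', ', @got);
    }
    return $failures;
}

my $failures = run_cases(\&extract, @cases);
print "$failures case(s) failed\n";
```

With a table like this, a new example string is one more line, and a changed regex can be checked against every earlier example at once.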

Re^4: Question on Regular Expression
by sjain (Initiate) on Dec 28, 2014 at 13:13 UTC

    Here are more example strings :

    1. Sample1Repeat1A -> ("Sample1Repeat", "1", "A", "")

    2. Sample2Repeat2 -> ("Sample2Repeat", "2", "", "")

    3. Sample3Repeat -> ("Sample3Repeat", "", "", "")

    4. 4SampleRepeat -> ("4SampleRepeat", "", "", "")

    5. 4SampleRepeat4 -> ("4SampleRepeat", "4", "", "")

    6. 5SampleRepeat5D -> ("5SampleRepeat", "5", "D", "")

    I hope these samples give more information.

      I still don't understand if these are examples of what you are getting from a regex that "doesn't work" or examples of what you want to extract from the given strings. In any event, here's an approach that produces the given output. I don't say it's the most efficient or elegant.

      c:\@Work\Perl\monks>perl -wMstrict -e
      "my @test = qw(
          Sample1Repeat1A  Sample2Repeat2  Sample3Repeat
          4SampleRepeat    4SampleRepeat4  5SampleRepeat5D
          );
       ;;
       for my $s (@test) {
          printf qq{'$s' -> };
          if (my ($p_r, $d1, $p_mc, $p_new_mc) =
              $s =~ m{ \b (.+?) (?: (\d*) (?: ([[:upper:]]?) ([[:upper:]]?) )? )? \b }xms) {
             printf qq{'$_' } for $p_r, $d1, $p_mc, $p_new_mc;
             print qq{\n};
             }
          else {
             print qq{no match \n};
             }
          }
       "
      'Sample1Repeat1A' -> 'Sample1Repeat' '1' 'A' ''
      'Sample2Repeat2' -> 'Sample2Repeat' '2' '' ''
      'Sample3Repeat' -> 'Sample3Repeat' '' '' ''
      '4SampleRepeat' -> '4SampleRepeat' '' '' ''
      '4SampleRepeat4' -> '4SampleRepeat' '4' '' ''
      '5SampleRepeat5D' -> '5SampleRepeat' '5' 'D' ''

      (I just love playing regex Whack-A-Mole!)

      Update: sjain: I just saw your note above about not having access to Data::Dump, so I've changed the example code to a version that uses neither that module nor Data::Dumper, the output of which doesn't look so good in this instance.

      Further Update: I tried this regex with the strings RS, RC1XY and RW12QW1X; while the latter two parse OK, RS does not, so AnonyMonk's approach below looks like a better one.
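
For what it's worth, here is a sketch of one variation that seems to cope with RS as well as the other strings posted so far. It is my own guess at the rule, not necessarily what AnonyMonk posted: it assumes that trailing upper-case letters only count as separate fields when at least one digit precedes them, and that the whole string must be consumed. The name split_fields is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a string into
# (prefix, digits, letter1, letter2), with '' for any part that is absent.
sub split_fields {
    my ($s) = @_;
    my @parts = $s =~ m{
        \A
        (.+?)                  # prefix: as short as the rest allows
        (?:
            (\d+)              # assumption: a digit is required before any letters
            ([[:upper:]]?)     # optional first letter
            ([[:upper:]]?)     # optional second letter
        )?
        \z                     # the whole string must be consumed
    }xms;
    return unless @parts;      # no match: empty list
    return map { defined $_ ? $_ : '' } @parts;
}

for my $s (qw(RS RC1XY RW12QW1X Sample1Repeat1A Sample3Repeat 4SampleRepeat4)) {
    my @fields = split_fields($s);
    print "'$s' -> ", join(' ', map { "'$_'" } @fields), "\n";
}
```

Because the digit is mandatory inside the optional group, a string such as RS falls through to the prefix alone, and the \A ... \z anchors stop the lazy prefix from settling for a partial match. Whether that is the rule you actually need is still a guess from the examples.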

