in reply to Re: Question on Regular Expression
in thread Question on Regular Expression

I have removed the second "\K". But this does not seem to help.

Here is what I intended to do.

Let's say I have a string "RC1XY" which has 4 parts; when matched, it would be captured as follows:

Part 1 : RC => captured in P_ROOTCODE

Part 2 : 1 => captured in DAY1

Part 3 : X => captured in P_MON_CODE

Part 4 : Y => captured in P_NEW_MON_CODE

But if the string passed is "RS" (instead of "RC1XY"), I was expecting P_ROOTCODE to hold "RS" and the rest of the captures (DAY1, P_MON_CODE, P_NEW_MON_CODE) to be blank. But even P_ROOTCODE is blank due to this undefined behavior.

Can you please let me know if there is any alternative approach to capture the different parts when the string (e.g. "RS") does not match the whole pattern?

I hope I have made clear what is intended, and I am hoping for a solution or an alternative approach.

Re^3: Question on Regular Expression
by AnomalousMonk (Archbishop) on Dec 27, 2014 at 19:57 UTC

    Sometimes it's best to start simple with these things. Here's an alternate approach that seems to do what you seem to want done:

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "my @test = qw(RC1XY RS);
     ;;
     for my $s (@test) {
       printf qq{'$s' -> };
       my ($p_r, $d1, $p_mc, $p_new_mc) =
         $s =~ m{ ([[:upper:]]+) (?: (\d+) ([[:upper:]]) ([[:upper:]]))? }xms;
       dd $p_r, $d1, $p_mc, $p_new_mc;
       }
    "
    'RC1XY' -> ("RC", 1, "X", "Y")
    'RS' -> ("RS", undef, undef, undef)

    How does this match your basic requirements? What further elaborations and sophistications are needed? Do you really need named captures? Etc... (Update: You mention that you're using Perl 5.10, but both these examples, above and below, run the same for me under 5.8.9 and 5.14.4 as well as 5.10.1.)

    Update: I notice you write that you want a "blank" (which I take to be an empty string) to be produced for sub-patterns that do not match. You will note that the example above yields undefined values for non-matching sub-patterns. Since the empty string and undef both have a false boolean value, I find it is usually just as easy to test and deal with undefs as with empty strings and so I usually avoid the extra effort to produce an empty string. If you really need them, here's a possible alternative approach:

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "my @test = qw(RC1XY RS);
     ;;
     for my $s (@test) {
       printf qq{'$s' -> };
       my ($p_r, $d1, $p_mc, $p_new_mc) =
         $s =~ m{ ([[:upper:]]+) (\d*) ([[:upper:]]?) ([[:upper:]]?) }xms;
       dd $p_r, $d1, $p_mc, $p_new_mc;
       }
    "
    'RC1XY' -> ("RC", 1, "X", "Y")
    'RS' -> ("RS", "", "", "")
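
If named captures and blanks are both really wanted, the two ideas can probably be combined. The following is only a sketch: the capture names come from the question, but the pattern, the \A ... \z anchors and the helper name parse_code are assumptions made for illustration. Named captures and %+ need Perl 5.10 or later.

```perl
use strict;
use warnings;

# Capture names are taken from the question; the pattern itself is an assumption.
my $re = qr{
    \A
    (?<P_ROOTCODE> [[:upper:]]+ )
    (?:
        (?<DAY1>           \d+         )
        (?<P_MON_CODE>     [[:upper:]] )
        (?<P_NEW_MON_CODE> [[:upper:]] )
    )?
    \z
}xms;

my @names = qw(P_ROOTCODE DAY1 P_MON_CODE P_NEW_MON_CODE);

# parse_code is a hypothetical helper: returns a hash ref, or nothing on no match.
sub parse_code {
    my ($s) = @_;
    return unless $s =~ $re;
    # Captures that did not take part in the match are undefined in %+,
    # so default each of them to the empty string.
    my %part = map { ( $_ => (defined $+{$_} ? $+{$_} : '') ) } @names;
    return \%part;
}

for my $s (qw(RC1XY RS)) {
    my $p = parse_code($s) or next;
    printf "%s -> %s\n", $s, join(' ', map { "$_='$p->{$_}'" } @names);
}
```

For 'RS' every name except P_ROOTCODE should come back as an empty string rather than undef, so downstream code can treat all four parts uniformly.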

      Thanks for the approach; I really appreciate your help on this.

      Although you showed me the result I wanted, I need something more.

      I guess I gave you too simple an example to illustrate the problem I faced. In my earlier example I used the regular expression '(.*) (0-9) (A-Z) ((A-Z)'

      I ran the code snippet you gave me on 'RW12QW1XY' and it does not work, whereas the expected outcome is as below:

      'RW12QW1X' -> ("RW12QW", "1", "X", "")

        Please post your test code and output in something like the format shown below, and please use code tags; I can't really understand your regex without guessing! Please also give your expected/desired output; "it does not work" does not tell me very much.

        c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -e
        "my @test = qw(RC1XY RS RW12QW1X FOO FOOx FOO23PQ xFOO23PQXXX foo xyzzy);
         ;;
         for my $s (@test) {
           printf qq{'$s' -> };
           if (my ($p_r, $d1, $p_mc, $p_new_mc) =
               $s =~ m{ ([[:upper:]]+) (\d*) ([[:upper:]]?) ([[:upper:]]?) }xms)
           {
             dd $p_r, $d1, $p_mc, $p_new_mc;
             }
           else {
             print qq{no match \n};
             }
           }
        "
        'RC1XY' -> ("RC", 1, "X", "Y")
        'RS' -> ("RS", "", "", "")
        'RW12QW1X' -> ("RW", 12, "Q", "W")
        'FOO' -> ("FOO", "", "", "")
        'FOOx' -> ("FOO", "", "", "")
        'FOO23PQ' -> ("FOO", 23, "P", "Q")
        'xFOO23PQXXX' -> ("FOO", 23, "P", "Q")
        'foo' -> no match
        'xyzzy' -> no match


        Give a man a fish:   <%-(-(-(-<

        You are not giving us enough information to work with. Please provide a large enough set of example inputs with their expected outputs to cover the expected combinations of actual input.

        Here are more example strings :

        1. Sample1Repeat1A -> ("Sample1Repeat", "1", "A", "")

        2. Sample2Repeat2 -> ("Sample2Repeat", "2", "", "")

        3. Sample3Repeat -> ("Sample3Repeat", "", "", "")

        4. 4SampleRepeat -> ("4SampleRepeat", "", "", "")

        5. 4SampleRepeat4 -> ("4SampleRepeat", "4", "", "")

        6. 5SampleRepeat5D -> ("5SampleRepeat", "5", "D", "")

        I hope these samples give more information.

        I still don't understand if these are examples of what you are getting from a regex that "doesn't work" or examples of what you want to extract from the given strings. In any event, here's an approach that produces the given output. I don't say it's the most efficient or elegant.

        c:\@Work\Perl\monks>perl -wMstrict -e
        "my @test = qw(
           Sample1Repeat1A Sample2Repeat2 Sample3Repeat
           4SampleRepeat 4SampleRepeat4 5SampleRepeat5D
           );
         ;;
         for my $s (@test) {
           printf qq{'$s' -> };
           if (my ($p_r, $d1, $p_mc, $p_new_mc) =
               $s =~ m{ \b (.+?) (?: (\d*) (?: ([[:upper:]]?) ([[:upper:]]?) )? )? \b }xms)
           {
             printf qq{'$_' } for $p_r, $d1, $p_mc, $p_new_mc;
             print qq{\n};
             }
           else {
             print qq{no match \n};
             }
           }
        "
        'Sample1Repeat1A' -> 'Sample1Repeat' '1' 'A' ''
        'Sample2Repeat2' -> 'Sample2Repeat' '2' '' ''
        'Sample3Repeat' -> 'Sample3Repeat' '' '' ''
        '4SampleRepeat' -> '4SampleRepeat' '' '' ''
        '4SampleRepeat4' -> '4SampleRepeat' '4' '' ''
        '5SampleRepeat5D' -> '5SampleRepeat' '5' 'D' ''

        (I just love playing regex Whack-A-Mole!)

        Update: sjain: I just saw your note above about not having access to Data::Dump, so I've changed the example code to a version that uses neither that module nor Data::Dumper, the output of which doesn't look so good in this instance.

        Further Update: I tried this regex with the strings RS, RC1XY and RW12QW1X, and while the latter two parse OK, RS does not, so AnonyMonk's approach below looks like a better one.
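
One way the RS case might be covered together with the other samples is to anchor the whole string and make the digit mandatory inside a single optional tail. This is a rough sketch, not a tested solution to the real data. It assumes the day is exactly one digit, that the month letters can only appear after a day digit, and that the root may contain any characters. The name split_code is made up for illustration.

```perl
use strict;
use warnings;

# split_code is a hypothetical helper: returns (root, day, mon, new_mon),
# with '' for absent parts, or an empty list when nothing matches.
sub split_code {
    my ($s) = @_;
    my @parts = $s =~ m{
        \A
        (.+?)                 # root: as short as the rest of the pattern allows
        (?:
            (\d)              # day: exactly one digit (assumption)
            ([[:upper:]]?)    # optional month code
            ([[:upper:]]?)    # optional new month code
        )?                    # the whole tail may be absent, as in 'RS'
        \z
    }xms or return;
    # Groups inside a skipped tail come back undefined; turn them into blanks.
    return map { defined $_ ? $_ : '' } @parts;
}

for my $s (qw(RS RC1XY RW12QW1X Sample1Repeat1A 4SampleRepeat)) {
    my @p = split_code($s);
    printf "'%s' -> %s\n", $s, join(' ', map { "'$_'" } @p);
}
```

Because the tail must start with a digit and must reach the end of the string, 'RS' can only match with the tail skipped, which leaves the whole string in the root. A multi-digit day would need a different rule for deciding where the root ends.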

