sjain has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone, I have a question about regular expressions; below are 2 code snippets. Can you please advise why they give different output? What is wrong with code snippet 2?

Code Snippet 1

    #!/usr/bin/perl
    use re qw(eval);
    my $pattern1;
    my $str1 = "RS";
    $pattern1 = qr{(?<P_ROOTCODE>.*)(?{ push(@rc1, ${^MATCH}) })\K(?{ $p_rootcode = "@rc1" })(?<DAY1>[0-9])(?{ push(@rc2, ${^MATCH}) })(?<P_MON_CODE>[A-Z])(?{ push(@rc3, ${^MATCH}) })$};
    $str1 =~ m/$pattern1/;
    print "Value of p_rootcode in Pattern 1 is : $p_rootcode\n";

Code Snippet 2

    #!/usr/bin/perl
    use re qw(eval);
    my $pattern2;
    my $str2 = "RS";
    $pattern2 = qr{(?<P_ROOTCODE>.*)(?{ push(@rc1, ${^MATCH}) })\K(?{ $p_rootcode = "@rc1" })(?<DAY1>[0-9])(?{ push(@rc2, ${^MATCH}) })(?<P_MON_CODE>[A-Z])(?{ push(@rc3, ${^MATCH}) })(?<P_NEW_MON_CODE>[A-Z])(?{ push(@rc4, ${^MATCH}) })$};
    $str2 =~ m/$pattern2/;
    print "Value of p_rootcode in Pattern 2 is : $p_rootcode\n";

Replies are listed 'Best First'.
Re: Question on Regular Expression
by choroba (Cardinal) on Dec 27, 2014 at 01:23 UTC
    In the second snippet, you are matching against $pattern1, but you only defined $pattern2. Changing it to $pattern2 doesn't change the output, though.

    Have you tried adding debug to the use re line? If I understand the documentation, you can remove eval from there, as it's useless for qr{}.

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      If I understand the documentation, you can remove eval from there, as it's useless for qr{}.

      Well, as far as I can tell, the code the OP posted doesn't make use of the eval feature at all: the code section is a literal.

      A code section interpolated from a variable (aka "dynamic") is what throws an error :)

      $ perl -E " $_ = 123; $f = '(?{ warn pos })'; m/\d$f/; "
      Eval-group not allowed at runtime, use re 'eval' in regex m/\d(?{ warn pos })/ at -e line 1.
      $ perl -E " $_ = 123; $f = '(?{ warn pos })'; $g = qr/\d$f/; m/$g/"
      Eval-group not allowed at runtime, use re 'eval' in regex m/\d(?{ warn pos })/ at -e line 1.

      I believe that, whatever the OP's real code is, he got this error message, so he added the "solution".

      I also believe there is no need for any of this stuff :) If all the OP is doing is building a data structure, simply match in a loop and use %+, or use Regexp::Grammars.

        Can you please let me know how to use %+ to match in a loop even if a submatch in the pattern fails?
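For what it's worth, here is a minimal sketch of matching in a loop and reading %+, assuming the trailing parts are made optional so that a bare root code such as "RS" still matches. The pattern, the @names list and the %parsed hash are illustrative assumptions, not the OP's real code.

```perl
use strict;
use warnings;

# Assumed layout: root code, then optionally a day digit, a month
# code and a new month code (the last one optional on its own).
my $re = qr{
    \A
    (?<P_ROOTCODE> [A-Z]+ )
    (?:
        (?<DAY1>           [0-9] )
        (?<P_MON_CODE>     [A-Z] )
        (?<P_NEW_MON_CODE> [A-Z] )?
    )?
    \z
}x;

my @names = qw(P_ROOTCODE DAY1 P_MON_CODE P_NEW_MON_CODE);
my %parsed;

for my $str (qw(RC1XY RS)) {
    next unless $str =~ $re;

    # %+ only holds the named groups that took part in the match,
    # so copy the values right away and default the rest to ''.
    my %fields;
    $fields{$_} = defined $+{$_} ? $+{$_} : '' for @names;
    $parsed{$str} = \%fields;
}

for my $str (sort keys %parsed) {
    print "$str: ", join(', ', map { "$_='$parsed{$str}{$_}'" } @names), "\n";
}
```

Copying out of %+ inside the loop matters because the hash refers to the last successful match and changes on the next one.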

      As you pointed out, there was a typo in code snippet 2. I have replaced $pattern1 with $pattern2 in code snippet 2, but I am still getting different output. Is it because I am using version 5.10?

Re: Question on Regular Expression
by Anonymous Monk on Dec 27, 2014 at 01:09 UTC

    Hi everyone, I have a question about regular expressions; below are 2 code snippets. Can you please advise why they give different output? What is wrong with code snippet 2?

    What is the difference between them?

    What output are they supposed to give?

    Why a one-line regex? Why not use /x? How can I hope to use regular expressions without creating illegible and unmaintainable code?

    Why did you use named patterns with ${^MATCH}? And with code callbacks?

    Why did you use \K twice?

    Have you heard of rxrx? It gives some indented explanations ... it's close enough (or exact) to your pattern ...

      What is the difference between them?

      The difference is that code snippet 2 uses one more named capture, <P_NEW_MON_CODE>.

      What output are they supposed to give?

      Code snippet 1 gives the output below (value 'RS S'):

      Value of p_rootcode in Pattern 1 is : RS R

      Code snippet 2 gives the output below (no value printed):

      Value of p_rootcode in Pattern 2 is :

      Why a one-line regex? Why not use /x? How can I hope to use regular expressions without creating illegible and unmaintainable code?

      Sorry about that.

      Why did you use named patterns with ${^MATCH}? And with code callbacks?

      Named patterns are used to save the submatches so that they can be used in a later part of the code.

      Why did you use \K twice?

      Please ignore this; even if \K is not used, both code snippets still behave the same.

      Have you heard of rxrx?

      In this example it doesn't help explain why the output values are different.

Re: Question on Regular Expression
by CountZero (Bishop) on Dec 27, 2014 at 10:38 UTC
    Before we all spend lots of time trying to analyze your regexes, perhaps you can explain to us what they are supposed to do?

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Question on Regular Expression
by Anonymous Monk on Dec 27, 2014 at 00:38 UTC
    There is plenty wrong with both. On my machine they give the same output: nothing.
Re: Question on Regular Expression
by Anonymous Monk on Dec 27, 2014 at 13:19 UTC
    This is what C programmers call 'undefined behaviour'. There is no point in trying to explain why those regexes do what they do (whatever that is).

    perlre:

    There is a special form of this construct (look-behind), called "\K" (available since Perl 5.10.0), which causes the regex engine to "keep" everything it had matched prior to the "\K" and not include it in $& ($MATCH). This effectively provides variable-length look-behind. The use of "\K" inside of another look-around assertion is allowed, but the behaviour is currently not well defined.

    perlvar:

    In Perl v5.18 and earlier, it (${^MATCH}) is only guaranteed to return a defined value when the pattern was compiled or executed with the "/p" modifier. In Perl v5.20, the "/p" modifier does nothing, so "${^MATCH}" does the same thing as $MATCH.
    The OP is using 5.010.
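A small sketch of the two documented behaviours quoted above, using a made-up sample string: /p makes ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} reliable on 5.10 through 5.18, and \K drops everything matched before it from the reported match. This only illustrates the well-defined cases; it does not try to explain the OP's regexes.

```perl
use strict;
use warnings;

my $str = "RC1XY";    # sample input, chosen for illustration

# With /p the three ${^...} variables are guaranteed to be set.
my ($pre, $match, $post) = ('', '', '');
if ($str =~ /\d[A-Z]/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}
print "pre='$pre' match='$match' post='$post'\n";

# \K: "RC" must be present, but it is kept out of the match itself.
my $kept = '';
if ($str =~ /RC\K\d/p) {
    $kept = ${^MATCH};
}
print "match after \\K: '$kept'\n";

# The same idea in a substitution: only the digit is removed.
(my $stripped = $str) =~ s/RC\K\d//;
print "after s///: '$stripped'\n";
```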

      I have removed the second "\K". But this does not seem to help.

      Here is what I intended to do.

      Let's say I have a string "RC1XY", which has 4 parts; when matched, it would be as follows:

      Part 1 : RC => captured in P_ROOTCODE

      Part 2 : 1 => captured in DAY1

      Part 3 : X => captured in P_MON_CODE

      Part 4 : Y => captured in P_NEW_MON_CODE

      But if the string is passed as "RS" (instead of "RC1XY"), I was expecting P_ROOTCODE to hold "RS" and the rest of the captures (DAY1, P_MON_CODE, P_NEW_MON_CODE) to be blank. But even P_ROOTCODE is blank, due to this undefined behavior.

      Can you please let me know of any alternative approach to capture the different parts when the string (e.g. "RS") does not match the whole pattern?

      I hope I have made clear what is intended, and I am hoping for a solution or an alternative approach.

        Sometimes it's best to start simple with these things. Here's an alternate approach that seems to do what you seem to want done:

        c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
        "my @test = qw(RC1XY RS);
         ;;
         for my $s (@test) {
           printf qq{'$s' -> };
           my ($p_r, $d1, $p_mc, $p_new_mc) =
             $s =~ m{ ([[:upper:]]+) (?: (\d+) ([[:upper:]]) ([[:upper:]]))? }xms;
           dd $p_r, $d1, $p_mc, $p_new_mc;
           }
        "
        'RC1XY' -> ("RC", 1, "X", "Y")
        'RS' -> ("RS", undef, undef, undef)

        How does this match your basic requirements? What further elaborations and sophistications are needed? Do you really need named captures? Etc... (Update: You mention that you're using Perl 5.10, but both these examples, above and below, run the same for me under 5.8.9 and 5.14.4 as well as 5.10.1.)

        Update: I notice you write that you want a "blank" (which I take to be an empty string) to be produced for sub-patterns that do not match. You will note that the example above yields undefined values for non-matching sub-patterns. Since the empty string and undef both have a false boolean value, I find it is usually just as easy to test and deal with undefs as with empty strings and so I usually avoid the extra effort to produce an empty string. If you really need them, here's a possible alternative approach:

        c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
        "my @test = qw(RC1XY RS);
         ;;
         for my $s (@test) {
           printf qq{'$s' -> };
           my ($p_r, $d1, $p_mc, $p_new_mc) =
             $s =~ m{ ([[:upper:]]+) (\d*) ([[:upper:]]?) ([[:upper:]]?) }xms;
           dd $p_r, $d1, $p_mc, $p_new_mc;
           }
        "
        'RC1XY' -> ("RC", 1, "X", "Y")
        'RS' -> ("RS", "", "", "")
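If the first pattern (with the optional group) is preferred, the undefs can also be turned into empty strings after the match. A rough sketch, where the anchors and the variable names @fields and @blank are my own additions for illustration:

```perl
use strict;
use warnings;

# Pattern with an optional tail: groups that did not take part in
# the match come back as undef in list context.
my @fields = 'RS' =~ m{
    \A ([[:upper:]]+) (?: (\d+) ([[:upper:]]) ([[:upper:]]) )? \z
}xms;

# Replace each undef with '' (written without // so it also runs on 5.8).
my @blank = map { defined $_ ? $_ : '' } @fields;

print scalar(@blank), " fields: ", join(', ', map { "'$_'" } @blank), "\n";
```

Whether this extra step is worth it depends on how the values are used later; a plain truth test treats undef and '' alike.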

Re: Question on Regular Expression
by Anonymous Monk on Dec 28, 2014 at 15:58 UTC

    Based on the discussion in this thread I've ignored the regexes in the OP and instead pieced together the various examples given.

    #!/usr/bin/env perl
    use warnings;
    use strict;
    use Test::More;

    sub parse {
        my ($str) = @_;
        my @m = $str =~ m{
            ^
            (?<P_ROOTCODE>\w+?)
            (?:
                (?<DAY1>\d)
                (?<P_MON_CODE>\w)?
                (?<P_NEW_MON_CODE>\w)?
            )?
            $
        }x;
        note explain { $str => \%+ };  # debug output
        @m = map {$_//''} @m;          # undef -> "" (optional)
        return \@m;
    }

    is_deeply parse("RS"), ["RS","","",""];
    is_deeply parse("RC1XY"), ["RC","1","X","Y"];
    is_deeply parse("RW12QW1X"), ["RW12QW", "1", "X", ""];
    is_deeply parse("Sample1Repeat1A"), ["Sample1Repeat", "1", "A", ""];
    is_deeply parse("Sample2Repeat2"), ["Sample2Repeat", "2", "", ""];
    is_deeply parse("Sample3Repeat"), ["Sample3Repeat", "", "", ""];
    is_deeply parse("4SampleRepeat"), ["4SampleRepeat", "", "", ""];
    is_deeply parse("4SampleRepeat4"), ["4SampleRepeat", "4", "", ""];
    is_deeply parse("5SampleRepeat5D"), ["5SampleRepeat", "5", "D", ""];
    done_testing;

    Sample output:

    # {
    #   'RS' => {
    #     'P_ROOTCODE' => 'RS'
    #   }
    # }
    ok 1
    # {
    #   'RC1XY' => {
    #     'DAY1' => '1',
    #     'P_MON_CODE' => 'X',
    #     'P_NEW_MON_CODE' => 'Y',
    #     'P_ROOTCODE' => 'RC'
    #   }
    # }
    ok 2
    # {
    #   'RW12QW1X' => {
    #     'DAY1' => '1',
    #     'P_MON_CODE' => 'X',
    #     'P_ROOTCODE' => 'RW12QW'
    #   }
    # }
    ok 3
    ...
    ok 9
    1..9

    If that still doesn't suit your needs, add more test cases.

      FWIW, here's another variation. It has the advantage of producing empty strings rather than undefined values for absent pattern elements, so no need for a conversion step. It also has the advantage, if such it be, of not using named captures with the possible overhead of tied-hashery. It passes all tests above.

      $string =~ m{ \A ([[:alnum:]]+?) (?= (?: \d+ [[:upper:]]{0,2})? \z) (\d*) ([[:upper:]]?) ([[:upper:]]?) \z }xms;
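To try the variation above on a few of the test strings, something like the following should do. The wrapper name split_code is hypothetical; the regex itself is copied unchanged.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns (root, day, mon, new_mon) in list
# context, or an empty list if the string does not match.
sub split_code {
    my ($string) = @_;
    return $string =~ m{
        \A ([[:alnum:]]+?)
        (?= (?: \d+ [[:upper:]]{0,2})? \z)
        (\d*) ([[:upper:]]?) ([[:upper:]]?)
        \z
    }xms;
}

for my $s (qw(RS RC1XY RW12QW1X)) {
    my @parts = split_code($s);
    print "$s -> ", join(', ', map { "'$_'" } @parts), "\n";
}
```

The lookahead is what lets the lazy first group stop at the right place: it only succeeds where the remainder is empty or is digits followed by at most two upper-case letters.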


      Give a man a fish:   <%-(-(-(-<

      THANK YOU VERY MUCH !!! I really appreciate your prompt and quick response.