Question on Regular Expression

sjain has asked for the wisdom of the Perl Monks concerning the following question:

Re: Question on Regular Expression by choroba (Cardinal) on Dec 27, 2014 at 01:23 UTC
In the second snippet, you are matching against $pattern1, but you only defined $pattern2. Changing it to $pattern2 doesn't change the output, though. Have you tried adding `debug` to the `use re` line? If I understand the documentation, you can remove `eval` from there, as it's useless for `qr{}`. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^2: Question on Regular Expression by Anonymous Monk on Dec 27, 2014 at 08:15 UTC
"If I understand the documentation, you can remove eval from there, as it's useless for qr{}."
Well, as far as I can tell, the code the OP posted doesn't make use of the eval feature at all: the code section is a literal [...] variable aka "dynamic" aka it throws an error :)
`$ perl -E " $_ = 123; $f = '(?{ warn pos })'; m/\d$f/; "`
`Eval-group not allowed at runtime, use re 'eval' in regex m/\d(?{ warn pos })/ at -e line 1.`
`$ perl -E " $_ = 123; $f = '(?{ warn pos })'; $g = qr/\d$f/; m/$g/"`
`Eval-group not allowed at runtime, use re 'eval' in regex m/\d(?{ warn pos })/ at -e line 1.`
I believe that, whatever the OP's real code is, he got this error message and so he added the "solution". I also believe there is no need for any of this stuff :) If all the OP is doing is building a data structure, simply match in a loop and use %+, or use Regexp::Grammars.
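The suggestion above (match in a loop and read %+) could look roughly like the sketch below. The pattern is an assumption pieced together from the field names used in this thread, not the OP's real regex, and @fields and @records are illustrative names. It needs Perl 5.10 or later for named captures and the // operator.

```perl
use strict;
use warnings;

# Assumed layout: letters, then optionally a digit and up to two letters.
my $re = qr{
    \A
    (?<P_ROOTCODE> [A-Z]+ )
    (?:
        (?<DAY1> \d )
        (?<P_MON_CODE> [A-Z] )?
        (?<P_NEW_MON_CODE> [A-Z] )?
    )?
    \z
}x;

my @fields = qw(P_ROOTCODE DAY1 P_MON_CODE P_NEW_MON_CODE);
my @records;

for my $str (qw(RC1XY RS)) {
    next unless $str =~ $re;
    # %+ holds only the groups that took part in the match,
    # so default the missing ones to an empty string.
    my %rec = map { $_ => $+{$_} // '' } @fields;
    push @records, \%rec;
}

for my $rec (@records) {
    print join(', ', map { "$_='$rec->{$_}'" } @fields), "\n";
}
```

Copying the values out of %+ right after each match matters, because the next successful match overwrites it.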
Re^3: Question on Regular Expression by sjain (Initiate) on Dec 27, 2014 at 14:17 UTC
Can you please let me know how to use %+ to match in a loop even if a submatch in the pattern string fails?
Re^4: Question on Regular Expression by Anonymous Monk on Dec 28, 2014 at 06:03 UTC
Re^5: Question on Regular Expression by sjain (Initiate) on Dec 28, 2014 at 13:32 UTC
Some notes below your chosen depth have not been shown here
Re^2: Question on Regular Expression by sjain (Initiate) on Dec 27, 2014 at 12:10 UTC
As you pointed out, there was a typo in code snippet 2. I have replaced $pattern1 with $pattern2 in code snippet 2, but I am still getting different output. Is it because I am using version 5.10?
Re: Question on Regular Expression by Anonymous Monk on Dec 27, 2014 at 01:09 UTC
Hi everyone, I have a question related to regular expressions, and below are 2 code snippets. Can you please advise why they give different output? What is wrong with code snippet 2? What is the difference between them? What output are they supposed to give? Why a one-liner regex, why don't you use /x? How can I hope to use regular expressions without creating illegible and unmaintainable code? Why did you use named patterns with $^MATCH? And with code callbacks? Why did you use \K twice? Have you heard of rxrx? It gives some indented explanations ... it's close enough (or exact) to your pattern ...
Re^2: Question on Regular Expression by sjain (Initiate) on Dec 27, 2014 at 12:45 UTC
What is the difference between them?
The difference is that code snippet 2 uses one more named capture, <P_NEW_MON_CODE>.
What output are they supposed to give?
Code snippet 1 gives this output (value 'RS S'): Value of p_rootcode in Pattern 1 is : RS R
Code snippet 2 gives this output (no value printed): Value of p_rootcode in Pattern 2 is :
Why oneliner regex, why not you use /x? How can I hope to use regular expressions without creating illegible and unmaintainable code?
Sorry about that.
Why did you use named patterns with $^MATCH? And with code callbacks?
The named patterns are used to save the submatches so that they can be used in a later part of the code.
Why did you use \K twice?
Please ignore this; even if \K is not used, both code snippets still behave the same.
Have you heard of rxrx?
In this example it doesn't help explain why the output values are different.
Re: Question on Regular Expression by CountZero (Bishop) on Dec 27, 2014 at 10:38 UTC
Before we all spend lots of time trying to analyze your regexes, perhaps you can explain to us what they are supposed to do?
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
My blog: Imperial Deltronics
Re: Question on Regular Expression by Anonymous Monk on Dec 27, 2014 at 00:38 UTC
There is plenty wrong with both. On my machine they give the same output: nothing.
Re: Question on Regular Expression by Anonymous Monk on Dec 27, 2014 at 13:19 UTC
This is what C programmers call 'undefined behaviour'. There is no point in trying to explain why those regexes do what they do (whatever that is).
perlre: "There is a special form of this construct (look-behind), called "\K" (available since Perl 5.10.0), which causes the regex engine to "keep" everything it had matched prior to the "\K" and not include it in $& ($MATCH). This effectively provides variable-length look-behind. The use of "\K" inside of another look-around assertion is allowed, but the behaviour is currently not well defined."
perlvar: "In Perl v5.18 and earlier, it (`${^MATCH}`) is only guaranteed to return a defined value when the pattern was compiled or executed with the "/p" modifier. In Perl v5.20, the "/p" modifier does nothing, so "${^MATCH}" does the same thing as $MATCH."
The OP is using 5.010.
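For reference, the well-defined use of \K, outside any look-around, can be sketched as follows. The sample string is borrowed from this thread and the variable names are illustrative. The /p modifier is included because, per the perlvar text quoted above, ${^MATCH} is only guaranteed to be defined with it on Perl 5.18 and earlier.

```perl
use strict;
use warnings;

my $str = 'RC1XY';
my ($kept, $before);

# [A-Z]+ must match, but \K drops it from the reported match.
if ($str =~ /[A-Z]+\K\d/p) {
    $kept   = ${^MATCH};     # only the part matched after \K
    $before = ${^PREMATCH};  # everything in front of that part
    print "match='$kept' prematch='$before'\n";
}
```

Here the letters are still required for the match to succeed; they simply end up in the prematch instead of the match itself.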
Re^2: Question on Regular Expression by sjain (Initiate) on Dec 27, 2014 at 18:33 UTC
I have removed the second "\K", but this does not seem to help. Here is what I intended to do. Let's say I have a string "RC1XY", which has 4 parts; when matched, they would be captured as follows:
Part 1 : RC => captured in P_ROOTCODE
Part 2 : 1 => captured in DAY1
Part 3 : X => captured in P_MON_CODE
Part 4 : Y => captured in P_NEW_MON_CODE
But if the string is passed as "RS" (instead of "RC1XY"), I was expecting P_ROOTCODE to hold "RS" and the rest of the captures (DAY1, P_MON_CODE, P_NEW_MON_CODE) to be blank. Instead, even P_ROOTCODE is blank due to this undefined behavior. Can you please let me know if there is an alternative approach to capture the different parts when the string (e.g. "RS") does not match the whole pattern? I hope I have made clear what is intended, and I am hoping for a solution or an alternative approach.
Re^3: Question on Regular Expression by AnomalousMonk (Archbishop) on Dec 27, 2014 at 19:57 UTC
Sometimes it's best to start simple with these things. Here's an alternate approach that seems to do what you seem to want done:
`c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le "my @test = qw(RC1XY RS); ;; for my $s (@test) { printf qq{'$s' -> }; my ($p_r, $d1, $p_mc, $p_new_mc) = $s =~ m{ ([[:upper:]]+) (?: (\d+) ([[:upper:]]) ([[:upper:]]))? }xms; dd $p_r, $d1, $p_mc, $p_new_mc; } "`
`'RC1XY' -> ("RC", 1, "X", "Y")`
`'RS' -> ("RS", undef, undef, undef)`
How does this match your basic requirements? What further elaborations and sophistications are needed? Do you really need named captures? Etc...
(Update: You mention that you're using Perl 5.10, but both these examples, above and below, run the same for me under 5.8.9 and 5.14.4 as well as 5.10.1.)
Update: I notice you write that you want a "blank" (which I take to be an empty string) to be produced for sub-patterns that do not match. You will note that the example above yields undefined values for non-matching sub-patterns. Since the empty string and undef both have a false boolean value, I find it is usually just as easy to test and deal with `undef`s as with empty strings, and so I usually avoid the extra effort to produce an empty string. If you really need them, here's a possible alternative approach:
`c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le "my @test = qw(RC1XY RS); ;; for my $s (@test) { printf qq{'$s' -> }; my ($p_r, $d1, $p_mc, $p_new_mc) = $s =~ m{ ([[:upper:]]+) (\d*) ([[:upper:]]?) ([[:upper:]]?) }xms; dd $p_r, $d1, $p_mc, $p_new_mc; } "`
`'RC1XY' -> ("RC", 1, "X", "Y")`
`'RS' -> ("RS", "", "", "")`
Re^4: Question on Regular Expression by sjain (Initiate) on Dec 28, 2014 at 03:55 UTC
Re^5: Question on Regular Expression by AnomalousMonk (Archbishop) on Dec 28, 2014 at 05:57 UTC
Re^5: Question on Regular Expression by Anonymous Monk on Dec 28, 2014 at 04:41 UTC
Re^5: Question on Regular Expression by sjain (Initiate) on Dec 28, 2014 at 06:02 UTC
Re^4: Question on Regular Expression by sjain (Initiate) on Dec 28, 2014 at 13:13 UTC
Re^5: Question on Regular Expression by AnomalousMonk (Archbishop) on Dec 28, 2014 at 17:33 UTC
Re: Question on Regular Expression by Anonymous Monk on Dec 28, 2014 at 15:58 UTC
Based on the discussion in this thread, I've ignored the regexes in the OP and instead pieced together the various examples given.
    #!/usr/bin/env perl
    use warnings;
    use strict;
    use Test::More;
    sub parse {
        my ($str) = @_;
        my @m = $str =~ m{ ^ (?<P_ROOTCODE>\w+?) (?: (?<DAY1>\d) (?<P_MON_CODE>\w)? (?<P_NEW_MON_CODE>\w)? )? $ }x;
        note explain { $str => \%+ };  # debug output
        @m = map {$_//''} @m;          # undef -> "" (optional)
        return \@m;
    }
    is_deeply parse("RS"), ["RS","","",""];
    is_deeply parse("RC1XY"), ["RC","1","X","Y"];
    is_deeply parse("RW12QW1X"), ["RW12QW", "1", "X", ""];
    is_deeply parse("Sample1Repeat1A"), ["Sample1Repeat", "1", "A", ""];
    is_deeply parse("Sample2Repeat2"), ["Sample2Repeat", "2", "", ""];
    is_deeply parse("Sample3Repeat"), ["Sample3Repeat", "", "", ""];
    is_deeply parse("4SampleRepeat"), ["4SampleRepeat", "", "", ""];
    is_deeply parse("4SampleRepeat4"), ["4SampleRepeat", "4", "", ""];
    is_deeply parse("5SampleRepeat5D"), ["5SampleRepeat", "5", "D", ""];
    done_testing;
Sample output: `# { # 'RS' => { # 'P_ROOTCODE' => 'RS' # } # } ok 1 # { # 'RC1XY' => { # 'DAY1' => '1', # 'P_MON_CODE' => 'X', # 'P_NEW_MON_CODE' => 'Y', # 'P_ROOTCODE' => 'RC' # } # } ok 2 # { # 'RW12QW1X' => { # 'DAY1' => '1', # 'P_MON_CODE' => 'X', # 'P_ROOTCODE' => 'RW12QW' # } # } ok 3 ... ok 9 1..9`
If that still doesn't suit your needs, add more test cases.
Re^2: Question on Regular Expression by AnomalousMonk (Archbishop) on Dec 30, 2014 at 18:38 UTC
FWIW, here's another variation. It has the advantage of producing empty strings rather than undefined values for absent pattern elements, so no need for a conversion step. It also has the advantage, if such it be, of not using named captures with the possible overhead of tied-hashery. It passes all tests above. `$string =~ m{ \A ([[:alnum:]]+?) (?= (?: \d+ [[:upper:]]{0,2})? \z) (\d*) ([[:upper:]]?) ([[:upper:]]?) \z }xms;` Give a man a fish: `<%-(-(-(-<`
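One way to check the claim that this pattern passes the earlier tests is a small table-driven harness like the sketch below. The wrapper name parse_lookahead and the @failed list are made up for this sketch; the expected values are copied from the test script earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern from the post above.
sub parse_lookahead {
    my ($string) = @_;
    my @m = $string =~ m{
        \A ([[:alnum:]]+?)
        (?= (?: \d+ [[:upper:]]{0,2})? \z)
        (\d*) ([[:upper:]]?) ([[:upper:]]?) \z
    }xms;
    return \@m;    # empty list if the string does not match
}

# Expected results, taken from the earlier test script.
my %expect = (
    'RS'              => ['RS',            '',  '',  ''],
    'RC1XY'           => ['RC',            '1', 'X', 'Y'],
    'RW12QW1X'        => ['RW12QW',        '1', 'X', ''],
    'Sample1Repeat1A' => ['Sample1Repeat', '1', 'A', ''],
    'Sample2Repeat2'  => ['Sample2Repeat', '2', '',  ''],
    'Sample3Repeat'   => ['Sample3Repeat', '',  '',  ''],
    '4SampleRepeat'   => ['4SampleRepeat', '',  '',  ''],
    '4SampleRepeat4'  => ['4SampleRepeat', '4', '',  ''],
    '5SampleRepeat5D' => ['5SampleRepeat', '5', 'D', ''],
);

my @failed;
for my $str (sort keys %expect) {
    my $got  = join '|', @{ parse_lookahead($str) };
    my $want = join '|', @{ $expect{$str} };
    push @failed, $str if $got ne $want;
}

print @failed ? "failed: @failed\n" : "all cases pass\n";
```

The look-ahead is what lets the lazy first group stop at the right place: it only succeeds where the remainder of the string is an optional run of digits plus at most two uppercase letters.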
Re^2: Question on Regular Expression by sjain (Initiate) on Dec 29, 2014 at 21:50 UTC
THANK YOU VERY MUCH !!! I really appreciate your prompt and quick response.