Re^2: Pattern matching

Re^3: Pattern matching by parv (Parson) on Nov 10, 2018 at 09:35 UTC
m{
   #  Word boundary.
   \b
   #  Start capture of pattern matched;
   (
     #  literal string "MODULE",
     MODULE
     #  one or more whitespace characters,
     \s+
     #  one or more A-Z letters (represented as character class),
     [A-Z]+
     #  one or more 0-9 digits,
     [0-9]+
   #  stop capture.
   )
   #  Zero or more whitespace characters.
   \s*
   #  1-element character class, or "escaped" "(" (not start of capture);
   [(]
   #  one or more of any character except newline, as few as possible, until ...
   .+?
   #  ... literal ")".
   [)]
 }x    #  The /x flag allows the regex to be spread out as you see above & as mentioned elsewhere.
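To see what the capture group actually picks up, here is a minimal sketch that runs the same pattern (compacted onto one line) against a few input lines. The input lines are hypothetical, invented for illustration only; the original question's data is not shown in this thread.

```perl
use strict;
use warnings;

# parv's pattern, compacted; /x makes the embedded whitespace insignificant.
my $rx = qr{ \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] }x;

# Hypothetical input lines, made up for this sketch.
my @lines = (
    'MODULE ABC123 (clk, reset)',
    'SUBMODULE XYZ9(in)',     # no word boundary between "SUB" and "MODULE"
    'MODULE abc123 (clk)',    # lowercase letters are not in [A-Z]
);

my @captured;
for my $line (@lines) {
    if ( $line =~ $rx ) {
        push @captured, $1;    # $1 holds the text matched inside ( ... )
        print "matched:  '$1'\n";
    }
    else {
        print "no match: '$line'\n";
    }
}
```

Only the first line should match, with `$1` set to `MODULE ABC123`; the parenthesized argument list is matched but not captured.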
Re^3: Pattern matching by AnomalousMonk (Archbishop) on Nov 10, 2018 at 23:50 UTC
... can you explain to me how to read the patterns ...

Because parv's regex contains nothing that is not supported by Perl version 5.6, the YAPE::Regex::Explain module can help.

c:\@Work\Perl\monks>perl -wMstrict -le
"use YAPE::Regex::Explain;
 ;;
 my $rx = qr{ \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] }x;
 ;;
 print YAPE::Regex::Explain->new($rx)->explain;
"
The regular expression:

(?x-ims: \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] )

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?x-ims:                 group, but do not capture (disregarding
                         whitespace and comments) (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n):
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    MODULE                   'MODULE'
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  [(]                      any character of: '('
----------------------------------------------------------------------
  .+?                      any character except \n (1 or more times
                           (matching the least amount possible))
----------------------------------------------------------------------
  [)]                      any character of: ')'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

Give a man a fish: <%-{-{-{-<
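The explanation distinguishes "matching the most amount possible" (`+`, `*`) from "matching the least amount possible" (`+?`). A small sketch of why that matters for the `[(] .+? [)]` part follows; the input string is hypothetical, and the second capture group around the parenthesized part is added here only to make the difference visible.

```perl
use strict;
use warnings;

# Hypothetical input with two parenthesized groups on one line.
my $text = 'MODULE ABC123 (a) (b)';

# Non-greedy .+? stops at the first ")" that lets the match succeed.
my ($lazy) = $text =~ m{ \b MODULE \s+ [A-Z]+[0-9]+ \s* ( [(] .+? [)] ) }x;

# Greedy .+ runs as far as it can, so it ends at the last ")" on the line.
my ($greedy) = $text =~ m{ \b MODULE \s+ [A-Z]+[0-9]+ \s* ( [(] .+ [)] ) }x;

print "lazy:   $lazy\n";      # (a)
print "greedy: $greedy\n";    # (a) (b)
```

With the non-greedy quantifier the match ends at the closing parenthesis nearest to the opening one, which is usually what is wanted when several parenthesized groups can share a line.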
Re^4: Pattern matching by parv (Parson) on Nov 11, 2018 at 01:46 UTC
Thanks to you & kevbot for posting about YAPE::Regex::Explain. That was what I wanted to do too before posting my explanation.

(Y::R::E install was too much work, consuming too much time in order to avoid the work)

In order to use the Y::R::E module installed in my own directory with the system perl, I obviously needed to set $PERL5LIB. But ...

export PERL5LIB="/dir/lib/perl5"

... was not enough. I had to add 2 more sub-directories ...

export PERL5LIB="/dir/lib/perl5:/dir/lib/perl5/site_perl:/dir/lib/perl5/site_perl/mach"

... why could perl not find the last two directory paths by itself in the year 2018? (Yes, I am aware of the virtues of installing and compiling my own perl. And I love that; I have built it multiple times on FreeBSD & CentOS.)
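As an alternative to exporting PERL5LIB in the shell, the same directories can be prepended to `@INC` from inside a script with the core `lib` pragma. The sketch below reuses the three paths from the post ("/dir" is the poster's placeholder, not a real location) and prints the resulting search path, which is also a quick way to check where perl is actually looking.

```perl
use strict;
use warnings;

# Directories taken from the post above; "/dir" is a placeholder.
use lib qw(
    /dir/lib/perl5
    /dir/lib/perl5/site_perl
    /dir/lib/perl5/site_perl/mach
);

# Show the module search path, one directory per line.
print "$_\n" for @INC;
```

This only affects the script that contains the `use lib` line, whereas PERL5LIB applies to every perl started from that shell.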
Re^3: Pattern matching by kevbot (Vicar) on Nov 11, 2018 at 00:06 UTC
Hi nursyza,

I see that parv has already provided you with an explanation of the regex pattern. I wanted to let you know that you can use the YAPE::Regex::Explain module to provide an explanation of any regular expression pattern. Once you have the package installed, you can do something like this at the command line to get the explanation for your pattern:

Read more... (4 kB)

You may also want to look at perlre to get more familiar with regular expressions.

UPDATE: As parv, soonix, and AnomalousMonk pointed out (in the replies to this node), the above usage of YAPE::Regex::Explain is not correct. Passing the regex as a double-quoted string caused problems.

The following code gives the correct output:

#!/usr/bin/env perl

use strict;
use warnings;

use YAPE::Regex::Explain;

my $re = qr/ \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] /x;

my $exp = YAPE::Regex::Explain->new($re)->explain;

print $exp;

exit;

Here is the output:

The regular expression:

(?x-ims: \b (MODULE \s+ [A-Z]+[0-9]+) \s* [(] .+? [)] )

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?x-ims:                 group, but do not capture (disregarding
                         whitespace and comments) (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n):
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    MODULE                   'MODULE'
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  [(]                      any character of: '('
----------------------------------------------------------------------
  .+?                      any character except \n (1 or more times
                           (matching the least amount possible))
----------------------------------------------------------------------
  [)]                      any character of: ')'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Re^4: Pattern matching by parv (Parson) on Nov 11, 2018 at 01:53 UTC
The output of your Y::R::E is much different from the one provided by AnomalousMonk. Yours is missing the word boundary (\b) & whitespace (\s) nodes. Is that due to a copy-paste problem or to your version of the Y::R::E module?
Re^5: Pattern matching by soonix (Chancellor) on Nov 11, 2018 at 21:18 UTC
Most probably due to passing the regex as a string instead of using qr. Besides: the /x flag is missing, too, for the same reason.
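A minimal sketch of what goes wrong with the string form, using a shortened version of the pattern: in a double-quoted string Perl processes the backslash escapes before any regex engine sees them, so `\b` becomes a backspace character and `\s` loses its backslash, and a plain string carries no `/x` flag at all. The variable names are made up for this illustration.

```perl
use strict;
use warnings;

# Compiled regex: the escapes and the /x flag are preserved.
my $as_qr = qr/ \b (MODULE \s+ [A-Z]+[0-9]+) /x;

# Double-quoted string: backslash escapes are interpreted by Perl first.
my $as_string;
{
    no warnings 'misc';    # silences "Unrecognized escape \s passed through"
    $as_string = " \b (MODULE \s+ [A-Z]+[0-9]+) ";
}

my $qr_text = "$as_qr";    # stringified form, flags included

my $qr_has_boundary  = index( $qr_text,   '\b' )   >= 0 ? 1 : 0;
my $str_has_boundary = index( $as_string, '\b' )   >= 0 ? 1 : 0;
my $str_has_bs       = index( $as_string, "\x08" ) >= 0 ? 1 : 0;
my $str_has_space    = index( $as_string, '\s' )   >= 0 ? 1 : 0;

print "qr text:                  $qr_text\n";
print "qr keeps \\b:              $qr_has_boundary\n";     # 1
print "string keeps \\b:          $str_has_boundary\n";    # 0
print "string has a backspace:   $str_has_bs\n";           # 1
print "string keeps \\s:          $str_has_space\n";       # 0
```

This is consistent with the symptoms parv noticed: with the string form, the word boundary and whitespace nodes never reach the explainer, and the whitespace in the pattern is treated as literal because no `/x` flag is attached.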
Re^6: Pattern matching by AnomalousMonk (Archbishop) on Nov 11, 2018 at 21:28 UTC


