Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re^2: Regular Expression Test

by Discipulus (Abbot)
on May 06, 2017 at 12:44 UTC ( #1189662=note: print w/replies, xml ) Need Help??


in reply to Re: Regular Expression Test
in thread Regular Expression Test

tybalt89++

I'd like to understand it.. ;=)

Is this a lex parser as noted in perlre ?

It just support the test cases? or can be extended to support more regex operators?

I was trying a raw approach but when i saw your i gave up.

Brilliant regex hacker!

UPDATE: my attempt with the simplicistic data sample proposed in OP (show representative data as corollary to Know your data principle), was something like:

use strict; use warnings; while(<DATA>){ chomp; my $rx; eval{ $rx = qr($_) }; if($@){print "-->$_<-- INVALID: ",lc $@=~s/$0 line \d+,//r; next +} print "$rx\t=>\t"; s/[\(\)]//g; foreach my $part (split /\|/){ if ($part=~/(\d+)?\[\d+\]/){ print join ' ', grep{/$rx/}$1.0 .. $1.9; print ' '; } else{print "$part "} } print "\n"; } __DATA__ (65|70) (3999[678]|4[1678]) 5[45] (6[4569]|7[01]) (((broken (65|70) # out (?^:(65|70)) => 65 70 (?^:(3999[678]|4[1678])) => 39996 39997 39998 41 46 47 48 (?^:5[45]) => 54 55 (?^:(6[4569]|7[01])) => 64 65 66 69 70 71 -->(((broken<-- INVALID: unmatched ( in regex; marked by <-- here in m +/((( <-- here broken/ at <data> line 5. (?^:(65|70)) => 65 70

L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Replies are listed 'Best First'.
Re^3: Regular Expression Test
by tybalt89 (Prior) on May 06, 2017 at 18:44 UTC

    This is not just a "lex"-like scanner as noted in perlre, but a fully recursive parser (note the calls to expr() inside some of the actions). It will properly handle multiple levels of parentheses.

    The advice I would give is to instrument it up with debug prints to see what's happening, also adding a recursion level counter to track recursive calls.

    It is loosely based on this expression parser I wrote a while ago:

    #!/usr/bin/perl use strict; # mini.pl - modified Pratt parser by tybalt89 use warnings; # https://en.wikipedia.org/wiki/Pratt_parser sub error { die "ERROR ", s/\G/ <@_> /r, "\n" } sub expr # two statement parser - precedences: (3 **) (2 * /) (1 + -) { my $answer = /\G\s* ((?:\d+(?:\.\d*)?|\.\d+)(e[+-]?\d+)?) /gcxi ? $1 : /\G\s*\(/gc ? (expr(0), /\G\s*\)/gc || error 'missing )')[0] : /\G\s* - /gcx ? -expr(3) : # unary minus /\G\s* \+ /gcx ? +expr(3) : # unary plus error 'bad operand'; $answer = $_[0] <= 3 && /\G\s* \*\* /gcx ? $answer ** expr(3) : $_[0] <= 2 && /\G\s* \* /gcx ? $answer * expr(3) : $_[0] <= 2 && /\G\s* \/ /gcx ? $answer / expr(3) : $_[0] <= 1 && /\G\s* \+ /gcx ? $answer + expr(2) : $_[0] <= 1 && /\G\s* \- /gcx ? $answer - expr(2) : return $answer while 1; } for ( @ARGV ? @ARGV : scalar <> ) # source as commandline args or stdi +n { my $answer = expr(0); /\G\s*\z/gc ? print s/\s*\z/ = $answer\n/r : error 'incomplete parse +'; }

    which was also the basis for this Re: Parsing Boolean expressions

    but it doesn't need the precedence stuff.

      Tybalt, this arithmatic evaluator of your is much simpler and more useful than what I've managed to find on CPAN.

      I suggest you create a simpler interface for it - such as my $ans = expr($expression); - then make it in to a CPAN module.

Re^3: Regular Expression Test
by tybalt89 (Prior) on May 06, 2017 at 22:31 UTC

    Here's the same code with one debug print added to show what's going on at the beginning of each time around the "while 1" loop.

    #!/usr/bin/perl # http://perlmonks.org/?node_id=1189605 use strict; use warnings; $| = 1; sub err { die "ERROR: ", s/\G/<@_>/r, "\n" } sub crossmap { my ($left, $right) = @_; [ map { my $prefix = $_ ; map $prefix.$_, @$right } @$left ]; } sub expr { my $answer = [ '' ]; print("\ndebug ", s/\G/<<you are here with [@$answer]>>/r, " end " +), $answer = /\G\|/gc ? [ @$answer, @{ expr() } ] : /\G\d+/gc ? crossmap $answer, [ $& ] : /\G\[\d+\]/gc ? crossmap $answer, [ $& =~ /\d/g ] : /\G\(/gc ? ( crossmap($answer, expr()), /\G\)/gc || err "missing ')'")[0] : return $answer while 1; } while(<DATA>) { chomp; print "$_ => "; my $answer = expr(); /\G\z/gc or err "incomplete parse"; local $" = ','; print "@$answer\n"; } __DATA__ (65|70) (3[678]|4[1678]) 5[45] (6[4569]|7[01])

    It prints the following ( showing both where the current position in the input is (the \G) and the contents of the variable $answer at the time).:

    (65|70) => debug <<you are here with []>>(65|70) end debug (<<you are here with []>>65|70) end debug (65<<you are here with [65]>>|70) end debug (65|<<you are here with []>>70) end debug (65|70<<you are here with [70]>>) end debug (65|70<<you are here with [65 70]>>) end debug (65|70)<<you are here with [65 70]>> end 65,70 (3[678]|4[1678]) => debug <<you are here with []>>(3[678]|4[1678]) end debug (<<you are here with []>>3[678]|4[1678]) end debug (3<<you are here with [3]>>[678]|4[1678]) end debug (3[678]<<you are here with [36 37 38]>>|4[1678]) end debug (3[678]|<<you are here with []>>4[1678]) end debug (3[678]|4<<you are here with [4]>>[1678]) end debug (3[678]|4[1678]<<you are here with [41 46 47 48]>>) end debug (3[678]|4[1678]<<you are here with [36 37 38 41 46 47 48]>>) end + debug (3[678]|4[1678])<<you are here with [36 37 38 41 46 47 48]>> end + 36,37,38,41,46,47,48 5[45] => debug <<you are here with []>>5[45] end debug 5<<you are here with [5]>>[45] end debug 5[45]<<you are here with [54 55]>> end 54,55 (6[4569]|7[01]) => debug <<you are here with []>>(6[4569]|7[01]) end debug (<<you are here with []>>6[4569]|7[01]) end debug (6<<you are here with [6]>>[4569]|7[01]) end debug (6[4569]<<you are here with [64 65 66 69]>>|7[01]) end debug (6[4569]|<<you are here with []>>7[01]) end debug (6[4569]|7<<you are here with [7]>>[01]) end debug (6[4569]|7[01]<<you are here with [70 71]>>) end debug (6[4569]|7[01]<<you are here with [64 65 66 69 70 71]>>) end debug (6[4569]|7[01])<<you are here with [64 65 66 69 70 71]>> end 64 +,65,66,69,70,71

    I hope this is helpful. At each debug output you can see what the /\G.../gc has stepped over and what the new value of $answer is.

    My problem with explaining this type of parser is that I have been working with parsers like this for well over a year, and it all comes as second nature to me. If you have more specific questions I'll be willing to take a shot at answering them.

Re^3: Regular Expression Test
by LanX (Sage) on May 06, 2017 at 19:46 UTC
    Hi Discipulus

    tybalt89 is combining the \G meta with //gc modifiers

    • \G achors the search at the pos where the last regex left off
    • unless you used the //c modifier, the pos of a previous regex was erased when it didn't match
    This allows to split a big complicated regex into multiple perl statements.

    And after combining with recursive calls this permits to parse nested structures.

    See perldocs for examples and details of this technique (e.g. perlretut ) or Friedl's Regex Book.

    Or even better older threads in the monastery :)

    You shouldn't be despaired, the demonstrated code is neither easy to read nor maintain.

    It should better use //x modifiers and more comments, but obfuscated wizardry is of course cooler! ;-))

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

      Could be worse :)

      #!/usr/bin/perl # http://perlmonks.org/?node_id=1189605 use strict; use warnings; $| = 1; sub err { die "ERROR: ", s/\G/<@_>/r, "\n" } sub crossmap # this is clear, isn't it? { my ($left, $right) = @_; [ map { my $prefix = $_ ; map $prefix.$_, @$right } @$left ]; } sub expr { my $answer = [ '' ]; while( /\G ( \| (?{ 'ALTERNATION' }) | \d+ (?{ 'STRING' }) | \[ \d+ \] (?{ 'CHARACTERCLASS' }) | \( (?{ 'PARENTHESIS' }) ) /gcx ) { goto $^R; ALTERNATION: $answer = [ @$answer, @{ expr() } ]; next; STRING: $answer = crossmap $answer, [ $1 ]; next; CHARACTERCLASS: $answer = crossmap $answer, [ $1 =~ /\d/g ]; next; PARENTHESIS: $answer = crossmap $answer, expr(); /\G\)/gc or err "missing ')'"; } return $answer; } while(<DATA>) { chomp; print "$_ => "; my $answer = expr(); /\G\z/gc or err "incomplete parse"; local $" = ','; print "@$answer\n"; } __DATA__ (65|70) (3[678]|4[1678]) 5[45] (6[4569]|7[01])
      You shouldn't be despaired, the demonstrated code is neither easy to read nor maintain.

      Depends on who's reading it, and who's maintaining it. If so.

      perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
        Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you post live.

        originally from ...

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1189662]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (3)
As of 2022-05-22 02:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Do you prefer to work remotely?



    Results (78 votes). Check out past polls.

    Notices?