PerlMonks  

Re^3: Parsing and translating Perl Regexes ( PPIx::Regexp::xplain Regexp::Debugger )

by Anonymous Monk
on Nov 05, 2013 at 19:18 UTC ( [id://1061359]=note )


in reply to Re^2: Parsing and translating Perl Regexes ( PPIx::Regexp::xplain Regexp::Debugger )
in thread Parsing and translating Perl Regexes

:) But if I do that I'll have to update my program, and I'm lazy and it's hard to juggle :)

Ok, if you're willing to accept that this is a roundabout way to report an issue: it's a work in progress that stalled a few months ago, that started organically as a single subroutine walking the PPIx::Regexp tree and grew from there, slowly as I am learning my way around, into its current state, still in need of refactoring ...

I'll post the full code in two followups, but here is an excerpt from ppixregexplain.pl of what I thought were bugs and that received a "TODO.*BUG" note. If there are any inaccuracies, thinkos, or typos, you have been warned :)

ppixregexplain.pl:150:#~ #~ TODO REPORTBUG PPIx::Regexp::Token::Greediness NEEDS TO DIE / THERE NEEDS TO BE A SINGLE PPIx::Regexp::Token::Quantifier
ppixregexplain.pl:325:#~ TODO REPORTBUG perlre/perlretut/re DO NOT LINK perldebguts which explains use re 'debug'; output
ppixregexplain.pl:1165:    "TODO warn REPORT BUG \\g10 UNRECOGNIZED AS A PPIx::Regexp::Token::Backreference misparsed as PPIx::Regexp::Token::Literal (\\g10 is not \\10, can't be treated as octal)",
ppixregexplain.pl:1172:    push @{$$ret{start}}, 'ERROR warn TODO REPORTBUG octals NOT PARSED AS PPIx::Regexp::Node::Range';
ppixregexplain.pl:1470:    push @{$$ret{start}}, "ERROR warn TODO REPORT BUG THIS IS LITERAL ^ NOT NEGATION";
ppixregexplain.pl:1476:    push @{$$ret{start}}, "ERROR warn TODO REPORT BUG A LONE ^ IN A CHARCLASS IS A IS LITERAL ^ NOT NEGATION";
_desc.pl:159:#~ THIS IS A LIE :) TODO REPORT BUG C<PPIx::Regexp::Structure::Unknown> has no descendants.
_desc.pl:176:#~ TODO REPORT BUG \g10 UNRECOGNIZED AS A REFERENCE misparsed as PPIx::Regexp::Token::Literal
_desc.pl:761:#~ TODO REPORT BUG? SHOULD BE RECOGNIZED AS Token::Modifier/GroupType::Modifier
_desc.pl:931:'\P' => ['L<perlrecharclass/Unicode Properties>','TODO warn REPORT BUG for PPIx::Regexp; \PP is \P{Prop} ; for example \PN is \P{Number}; ' ],
_desc.pl:932:'\p' => ['L<perlrecharclass/Unicode Properties>','TODO warn REPORT BUG for PPIx::Regexp; \pP is \p{Prop} ; for example \pN is \p{Number}; ' ],
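For anyone who wants to check the engine behaviour those notes rely on, here is a minimal sketch that asks perl itself rather than PPIx::Regexp. The variable names are made up for illustration, and it only covers the \g10, lone ^, single-letter property and octal range cases.

```perl
use strict;
use warnings;

# Ten capture groups, so that group 10 exists.
my $ten = '(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)';

# \g10 always refers to capture group 10; it is never read as an octal escape.
my $g10_backref = 'abcdefghijj' =~ /\A$ten\g10\z/ ? 1 : 0;

# A caret that is not the first character of a bracketed class is a literal ^.
my $caret_literal = '^' =~ /\A[c^]\z/ ? 1 : 0;
my $caret_negates = 'x' =~ /\A[c^]\z/ ? 1 : 0;

# \pN and \PN are shorthand for \p{N} (Number) and \P{N}.
my $short_prop     = '7' =~ /\A\pN\z/ ? 1 : 0;
my $short_prop_neg = '7' =~ /\A\PN\z/ ? 1 : 0;

# Octal escapes can be range endpoints: [\101-\132] is the same as [A-Z].
my $octal_range =
    ( 'Q' =~ /\A[\101-\132]\z/ && 'q' !~ /\A[\101-\132]\z/ ) ? 1 : 0;

print "g10=$g10_backref caret=$caret_literal/$caret_negates ",
      "prop=$short_prop/$short_prop_neg octal=$octal_range\n";
```

A parser that agrees with perl would need to report a backreference, a literal caret, a Unicode property and a range for these four cases respectively.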

In furtherance of blind copying, here are the corresponding entries from my "test suite" (it tests my eyeball interface):

'qr/(?[ ( ( [c^] - [d^] ) | ( [^c] & ( [a] + [2] ) ) ) ])/', ## TODO REPORT misparsed as [^c] and [^d]
"qr{ \\pP\\p{P}\\PP\\P{P} }x", ## TODO report bug misparsed single letter Unicode properties
q{qr{[\200-\377\x00-\x1f]}}, ## TODO REPORTBUG NOT PARSED AS RANGE
q{qr{[\0-\377]}}, ## TODO REPORTBUG NOT PARSED AS RANGE
'qr{
(?#legals)
(?^) (?aa) (?a) (?d) (?l) (?u) (?p) (?i) (?m) (?s) (?x) (?-i) (?-m) (?-s) (?-x) (?x)
(?#NOTlegals)
(?aad) (?ad) (?al) (?au) (?uu) (?ll) (?dd) (?-a) (?-d) (?-l) (?-u) (?-p) (?-^) (?gcer) (?-gcer)
}x',,,
"m{(?i)a(?-i)B(?i)(?-i:B\\w(?i:q\\w))q}xi",# a case insensitive B case sensitive
And one more
#~ a is case INsensitive
#~ B is case SEnsitive
my $re = PPIx::Regexp->new( <<'__GROAN__' );
m{
    a
    (?i: (?-i)B (?i)a (?-i)B (?i)a (?-i)B
        (?i: a (?-i)B (?i)a (?-i)B (?i)a
            (?: (?-i)B B )
            a [a-a]
        )
    )
    a $a_terpolats (?-i:$B_TERPOLATES)
}xi
__GROAN__

In the code below, the modifier propagation code is in the following definitions (you can copy/paste each line to find the sub definition):

sub xplain_modifiers
sub PPIx::Regexp::Token::Modifier::xmods_explode
sub PPIx::Regexp::xmods_propagate
sub PPIx::Regexp::Element::xmerge_mods
sub PPIx::Regexp::Node::xmods_susceptible
sub PPIx::Regexp::Node::xmods

Replies are listed 'Best First'.
Re^4: Parsing and translating Perl Regexes ( PPIx::Regexp::xplain ppixregexplain.pl _desc.pl )
by Anonymous Monk on Nov 05, 2013 at 19:40 UTC

      OK, thanks. The object hierarchy may be what it is now, for good or ill. But to the extent I understand your code it is not only a documentation of PPIx::Regexp bugs, but a neat piece of work in its own right.

      I can't guarantee timing, since all sorts of things have come up in the last couple days, but:

      Which of the bugs can I fix without breaking your code? Lacking any preference from you I'll probably start with the \g10 thing, which I hope to be reasonably straightforward.

      Do I need to reserve some method names for your use? Like xplain()? I'm not sure how to document it, but I'm willing at least to avoid a few that you designate.

      Can you be a little clearer on the modifier propagation thing? Or at least point me to the relevant part of your postings? I remember worrying about this during the Perl::Critic integration, but it may be that I was so focused on dealing with 'use re "/xms";' that I missed something in the regex itself.

      Tom Wyant

        OK, thanks. The object hierarchy may be what it is now, for good or ill.

        I understand, naming things is hard and it's not like I have better ideas. I still have puns (chits) in my code; you saved me so many months of work I'm just surprised you didn't go the extra mile :D I think I'm worth the effort, don't you? :P

        But to the extent I understand your code it is not only a documentation of PPIx::Regexp bugs, but a neat piece of work in its own right.

        :D

        Which of the bugs can I fix without breaking your code?

        Now, I think you can fix all of them without breaking anything :) That is, if they are bugs, bugs you'd consider fixing; I think I tended to use "bug" as a euphemism for "now I have to think some more" ...

        Do I need to reserve some method names for your use? Like xplain()? I'm not sure how to document it, but I'm willing at least to avoid a few that you designate.

        Not sure that you actually have to reserve them, but an xplain prefix sounds ok ... I'm very unsure about the whole what/should/go/where/oop deal; some would say don't play with other people's namespace :)

        Can you be a little clearer on the modifier propagation thing?

        Hmmm, I'll try
        /(?i)foo/ is the same as /foo/i, but the "foo" node doesn't know it's case insensitive
        /foo/aad says semantics are "d", but "aa" and "d" are mutually exclusive (regexerror++)
        /\w/aia says semantics are "a", but "aa" and "a" are different

        So I guess it's more of a wishlist; I tried to propagate parent modifiers to children using ->modifiers, but ->modifiers discarded some information (like errors), so I ended up doing my own exploding :) I also guess the xplicit propagation might not fit with the purpose of tokens, which is why PPIx::Regexp wasn't doing propagation of modifiers, so there is not much for you to do regarding propagation :)
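        To make the three cases above concrete, here is a minimal sketch of the kind of "exploding" and merging described. It is not the code from ppixregexplain.pl and not part of PPIx::Regexp: explode_mods and merge_mods are hypothetical names, and the rules encoded are only the ones mentioned in this thread (one character-set choice out of a/aa/d/l/u, those and p not negatable, offs trump ons).

```perl
use strict;
use warnings;

# Hypothetical helper: split a modifier string such as "aia" or "i-x" into
# switches turned on, switches turned off, a character-set choice, and errors.
sub explode_mods {
    my ($str) = @_;
    my ( $plus, $minus ) = split /-/, $str, 2;
    $plus  //= '';
    $minus //= '';
    my %out = ( on => {}, off => {}, charset => undef, errors => [] );

    # Character-set letters are counted; every other letter is a plain switch.
    my %count;
    for my $c ( split //, $plus ) {
        if ( $c =~ /[adlu]/ ) { $count{$c}++ }
        else                  { $out{on}{$c} = 1 }
    }

    my @sets = sort keys %count;
    if ( @sets > 1 ) {
        push @{ $out{errors} }, "charset modifiers @sets are mutually exclusive";
    }
    elsif ( @sets == 1 ) {
        my ( $c, $n ) = ( $sets[0], $count{ $sets[0] } );
        if    ( $c eq 'a' && $n <= 2 ) { $out{charset} = 'a' x $n }
        elsif ( $n == 1 )              { $out{charset} = $c }
        else { push @{ $out{errors} }, "/$c repeated too often" }
    }

    for my $c ( split //, $minus ) {
        if ( $c =~ /[adlup^]/ ) {
            push @{ $out{errors} }, "/$c cannot be negated";
        }
        else {
            $out{off}{$c} = 1;
            delete $out{on}{$c};    # an explicit "off" trumps "on"
        }
    }
    return \%out;
}

# Hypothetical helper: combine a parent's effective modifiers with a child's.
sub merge_mods {
    my ( $parent, $child ) = @_;
    my %on = ( %{ $parent->{on} }, %{ $child->{on} } );
    delete @on{ keys %{ $child->{off} } };
    return {
        on      => \%on,
        charset => $child->{charset} // $parent->{charset},
    };
}

my $outer = explode_mods('xi');                          # m{...}xi
my $inner = merge_mods( $outer, explode_mods('-i') );    # (?-i:...) inside it

printf "aia -> charset %s\n", explode_mods('aia')->{charset};
printf "aad -> %s\n",         explode_mods('aad')->{errors}[0];
printf "inner group: %s\n",   join '', sort keys %{ $inner->{on} };
```

        The point of keeping an errors list next to the result is that a last-one-wins summary cannot tell /aad (an error) from /d, or /aia (really "aa") from /a.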

        Thanks

      bah, can't fork/update the anonymous gist anymore, so here is a patch to explain /(?^:foo$)/m is the same as /(?-xism:foo$)/m

      --- ppixregexplain.pl  2014-02-04 23:58:11.265625000 -0800
      +++ ppixregexplain-dodgy-bug.pl  2014-12-30 20:54:31.250000000 -0800
      @@ -1159,6 +1159,10 @@
       #~ delete @mods{@offers} ; ## OFFERS TRUMP ONNERS
           $mods{$_}=0 for @offers ; ## OFFERS TRUMP ONNERS

      +    if( $con eq '^' ){
      +        $mods{$_}=0 for qw/ i m s x /; ## d-imsx http://perldoc.perl.org/perlre.html#%28?^alupimsx%29
      +    }
      +
           @mods = ( @onners, @offers );

           if( $notroot ){
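      The behaviour the patch is meant to explain can be seen with perl alone. A minimal sketch, assuming Perl 5.14 or later (where the (?^...) construct exists); the variable names are illustrative only:

```perl
use 5.014;
use strict;
use warnings;

my $text = "foo\nbar";

# With /m, "$" may match before any embedded newline.
my $plain = $text =~ /foo$/m ? 1 : 0;

# Inside (?^:...) the i, m, s and x flags revert to "off", so the outer /m
# no longer applies and "$" only matches at the very end of the string.
my $caret = $text =~ /(?^:foo$)/m ? 1 : 0;

# The explicit spelling behaves the same way.
my $explicit = $text =~ /(?-xism:foo$)/m ? 1 : 0;

print "plain=$plain caret=$caret explicit=$explicit\n";
```

      Only the first match succeeds, which is why the caret has to zero out i, m, s and x even when the enclosing regex turned them on.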
