in thread my versus our in nested regex

I think my vision of how it would work is that a named capture would get stored exactly as if you had written

(...)(?{ $var = $^N })

That is to say, if a lexical named $var was in scope, it would get the captured string, else a global of that name would. If two blocks named the same var, the second would override the first just as with normal assignment.
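The equivalence described above can be tried with the existing `(?{ })` and `$^N` constructs. This is only a sketch: `$var` is declared with `our` (a package variable) as an assumption, because lexicals inside regex code blocks were unreliable in older perls, and the sample input string is made up for illustration.

```perl
use strict;
use warnings;

# Package variable standing in for the target of the proposed named capture.
our $var;

my $input = "foo=bar";

# (\w+) captures; the code block copies the most recently closed group ($^N)
# into $var, which is what a named capture would do under the proposal.
if ($input =~ /(\w+)(?{ $var = $^N })=/) {
    print "captured: $var\n";
}
```

A second block assigning to the same `$var` later in the pattern would simply overwrite the value, as with any other assignment.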

I've no idea what the dotNet syntax is (or even that it had such), but Enlil and I had a discussion about it somewhere a few months ago. Unfortunately, it was tucked down in the bowels of a thread with an unrelated name, so I can't find it right now.

Off the top of my head, I think that

(?$var:...)

would work. I don't think it would conflict with anything else?

The one distinction I would make is that if an array was given rather than a scalar,

(?@array:...){1,10}

then the captured string would be pushed onto the array. This would allow captures with repetition specifiers to do something sensible.
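The push-onto-an-array behaviour can be approximated today with a code block, which may help in judging the proposal. This is a sketch only: `(?@array:...)` is not real syntax, `@array` is a package variable chosen for illustration, and the input is deliberately one that matches without backtracking through the capture.

```perl
use strict;
use warnings;

# Hypothetical target of the proposed (?@array:...) capture.
our @array;

my $input = "1,2,3";

# Each time the inner group closes, push the text it matched ($^N).
# The pattern is anchored so that only one start position is tried.
if ($input =~ /^(?:(\d+)(?{ push @array, $^N }),?){1,10}$/) {
    print "collected: @array\n";
}
```

Note that nothing here undoes a push if the engine later backtracks over the group, so on other inputs the array could end up with extra entries.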

Perhaps harder to think through is what happens when a regex backtracks through a capture. Undef the $var and pop the @array, maybe?
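For comparison, perlre already documents one answer to the backtracking question for code blocks: a value set with `local` inside `(?{ })` is restored when the engine backtracks over that block. The sketch below follows the counter example in perlre; the extra `$plain` counter is an addition for contrast, and all variable names are illustrative.

```perl
use strict;
use warnings;

# $cnt is localized on each iteration, so backtracking unwinds it.
# $plain is incremented without local, so backtracking leaves it alone.
our ($cnt, $res, $plain) = (0, 0, 0);

my $input = 'a' x 8;

$input =~ m<
    (?{ $cnt = 0 })
    ( a (?{ local $cnt = $cnt + 1; $plain++ }) )*
    aaaa
    (?{ $res = $cnt })
>x;

print "localized count: $res, plain count: $plain\n";
```

The localized count reflects only the iterations that survive in the final match, while the plain counter also records iterations that were later given back. A built-in named capture would presumably want the first behaviour.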


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!

Re: Re: Re: Re: Re: Re: Re: Re: my versus our in nested regex
by demerphq (Chancellor) on Oct 18, 2003 at 21:41 UTC

    The dotNet syntax is provided (with problems associated) by Regex::Fields. It looks like

    (?<NAME>PATTERN)

    In dotNet the results are available through the regex object itself. In Perl I would assume that a special hash would be created for the results (in R::F it's %{&}). I also suspect that "leftmost-outermost wins" would be a reasonably useful rule if only one possibility was allowed; otherwise perhaps a hash of arrays would be cool. R::F also supports binding to implicit lexicals.

    Incidentally, I understand from another post here that this module causes problems in that it makes global changes to how regexes are handled. I can't say whether this is true, however.


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi


      I don't like the idea of leftmost-outermost. That would severely slow the regex engine up if it has to decide whether it should store this capture or not each time it encounters one.

      I'm also not that keen on assigning to an implicit hash or array.

      That just means that again (as with $1, $2 etc.) I have to test for the value having been captured, and then re-assign it to wherever I want it before I call the next regex. The values captured from the first regex will either be overwritten (as would be the case if "the next regex" was actually the same one in a loop) or discarded if it is a different regex. The main reason for my wanting named regex captures is to avoid these two problems.

      Assigning them directly to where I want them seems simpler and more reliable to me.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      Hooray!

        I don't like the idea of leftmost-outermost. That would severely slow the regex engine up if it has to decide whether it should store this capture or not each time it encounters one.

        Presumably this would be resolved at the compile time of the regex, and IMO would be quite simple to resolve.

        Assigning them directly to where I want them seems simpler and more reliable to me.

        Hmm. I see your point (sort of) but I wonder how your angle would work with using qr// multiple times. For (not such a great) instance:

        my $sep  = qr/(?<sep>[-\/\\])/;
        my $date = qr/(?<year>\d{4})$sep(?<month>\d{2})$sep(?<day>\d{2})/;

        Hence the reason I thought of a hash of arrays. Also, the embedded $var name would _totally_ violate the quoting rules of Perl. Perhaps the idea is that the names map to lexicals in scope (if defined), otherwise to ones the regex defines itself? (Which incidentally turns regex LHS's into a variant of a my().) I really don't see how this would work without seriously changing things. Mapping to a hash wouldn't have anywhere near the number of bizarre side effects.
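As a point of reference, Perl 5.10 and later (which postdate this thread) accept the `(?<NAME>PATTERN)` syntax and expose the results through two special hashes: `%+` holds the leftmost defined capture for each name, and `%-` holds an array reference of every capture with that name. A sketch of the qr// example above under those perls, with a made-up date string as input:

```perl
use strict;
use warnings;

my $sep  = qr/(?<sep>[-\/\\])/;
my $date = qr/(?<year>\d{4})$sep(?<month>\d{2})$sep(?<day>\d{2})/;

my (%found, @seps);

if ("2003-10-18" =~ $date) {
    %found = %+;              # one value per name (leftmost defined group)
    @seps  = @{ $-{sep} };    # every group named "sep", in pattern order

    print "year=$found{year} month=$found{month} day=$found{day}\n";
    print "separators: @seps\n";
}
```

The results still have to be copied out before the next match, which is the objection raised earlier in the thread against binding to an implicit hash.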


        ---
        demerphq

          First they ignore you, then they laugh at you, then they fight you, then you win.
          -- Gandhi