in reply to Re: Re: Re: Re: Re: my versus our in nested regex
in thread my versus our in nested regex

I wonder what chance there is of getting a "capture to named vars" construct added?

I mentioned it once on p5p, but to a deafening silence. It would be lovely though, wouldn't it? Although I can see how there might be serious questions about how it should work. What should happen if there are identically named sections? Where should the results be stored? Possibly %+ or something? Also I can see some of p5p saying "Perl is not going to use the dotNet syntax."

:-)


---
demerphq

    First they ignore you, then they laugh at you, then they fight you, then you win.
    -- Gandhi


Re: Re: Re: Re: Re: Re: Re: my versus our in nested regex
by BrowserUk (Patriarch) on Oct 18, 2003 at 21:23 UTC

    I think my vision of the way it would work is that a named capture would get stored exactly as if you had written

    (...)(?{ $var = $^N })

    That is to say, if a lexical named $var was in scope, it would get the captured string, else a global of that name would. If two blocks named the same var, the second would override the first just as with normal assignment.
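
A minimal sketch of the existing (?{ ... }) idiom being described, for comparison. The variable names and the sample string are made up for illustration, and package variables (our) are used on the assumption that lexicals inside code blocks can misbehave on older perls, which is what this thread started with.

```perl
use strict;
use warnings;

# Package variables; the names are illustrative only.
our ($year, $month);

my $date = '2003-10-18';

# Each code block runs right after its group closes, so $^N holds
# the text of the group that was closed most recently.
my $matched = $date =~ /(\d{4})(?{ $year = $^N })-(\d{2})(?{ $month = $^N })/;

print "year=$year month=$month\n" if $matched;
```

If both blocks assigned to the same variable, the later assignment would simply win, as with any ordinary assignment.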

    I've no idea what the dotNet syntax is (or even that it had such), but Enlil and I had a discussion about it somewhere a few months ago. Unfortunately, it was tucked down in the bowels of a thread with an unrelated name, so I can't find it right now.

    Off the top of my head, I think that

    (?$var:...)

    would work. I don't think it would conflict with anything else?

    The one distinction I would make is that if an array was given rather than a scalar,

    (?@array:...){1,10}

    then the captured string would be pushed onto the array. This would allow for captures with repetition specifiers to do something sensible.
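
Something close to the array behaviour can be approximated today with a code block that pushes $^N. This is only a sketch of the idea, not the proposed (?@array:...) syntax; @words and the input string are hypothetical.

```perl
use strict;
use warnings;

our @words;   # hypothetical target array

my $text = 'alpha beta gamma';

# The outer group repeats up to ten times; every time (\w+) closes,
# the code block pushes that iteration's text onto @words.
my $matched = $text =~ /^(?:(\w+)(?{ push @words, $^N })\s*){1,10}$/;

print join(',', @words), "\n" if $matched;
```

A plain push like this is not undone if the engine later backs out of an iteration, so it only behaves well on patterns that do not backtrack through the block.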

    Perhaps harder to think through is what happens when a regex backtracks through a capture. undef the $var and pop the @array maybe?
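
For the hand-rolled version, perlre describes one way to get "pop on backtrack": localize the variable inside the code block, so the engine restores the old value when it backs out, then copy the result to a non-localized variable at the end. The sketch below adapts that documented idiom to an array; @stack and @captured are illustrative names, and whether a built-in construct should behave the same way is exactly the open question.

```perl
use strict;
use warnings;

our @stack = ();   # working copy, localized inside the regex
our @captured;     # final result, filled in only on success

my $matched = ('a' x 8) =~ m<
    ^
    (?:
        (a)
        (?{ local @stack = (@stack, $^N) })   # restored if the engine backs out
    )*
    aaaa $
    (?{ @captured = @stack })                 # copy to a non-localized array
>x;

print scalar(@captured), " captures kept\n" if $matched;
```

The starred group first consumes all eight characters, then has to give four back so the trailing aaaa can match; the localized pushes for those iterations are unwound along the way.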


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!

      The dotNet syntax is provided (with problems associated) by Regex::Fields. It looks like

      (?<NAME>PATTERN)

      In dotNet the results are available through the regex object itself. In Perl I would assume that a special hash would be created for the results (in R::F it's %{&}). I also suspect that a leftmost-outermost wins rule would be a reasonably useful rule if only one possibility was allowed, otherwise perhaps a hash of arrays would be cool. R::F also supports binding to implicit lexicals.

      Incidentally I understand from another post here that this module causes problems in that it makes global changes to how regexes are handled. I can't say if this is true however.
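
For readers on perl 5.10 or later: that release added (?<NAME>PATTERN) to the core, with results in %+ and a hash of arrays in %-. A short sketch of how those behave, including the duplicate-name case raised above; the sample strings and variable names are illustrative.

```perl
use strict;
use warnings;

my $date = '2003-10-18';
my %date;

if ($date =~ /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/) {
    %date = %+;    # copy before the next successful match replaces %+
}
print "$date{year}/$date{month}/$date{day}\n";

# Two groups sharing one name: %+ reports the leftmost defined one,
# while %- keeps every capture for that name in an array reference.
my ($first, @all);
if ('ab' =~ /(?<x>a)(?<x>b)/) {
    $first = $+{x};
    @all   = @{ $-{x} };
}
print "first=$first all=@all\n";
```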


      ---
      demerphq


        I don't like the idea of leftmost-outermost. That would severely slow the regex engine up if it has to decide whether it should store this capture or not each time it encounters one.

        I'm also not that keen on assigning to an implicit hash or array.

        That just means that again (as with $1, $2 etc.) I have to test for the value having been captured, and then re-assign it to wherever I want it before I call the next regex. The values captured from the first regex will either be overwritten (as would be the case if "the next regex" was actually the same one in a loop) or discarded if it is a different regex. The main reason for my wanting named regex captures is to avoid these two problems.

        Assigning them directly to where I want them seems simpler and more reliable to me.
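
Without new syntax, the nearest existing idiom is probably a list-context match assigned straight into the variables you want, which folds the "did it capture?" test and the copy into one statement. A minimal sketch, with made-up input lines and names:

```perl
use strict;
use warnings;

my @lines = ('name=perl', 'not a pair', 'version=5.8');
my %config;

for my $line (@lines) {
    # In list context a successful match returns its captures and a
    # failed match returns the empty list, so the assignment doubles
    # as the success test and nothing depends on $1/$2 surviving.
    if (my ($key, $value) = $line =~ /^(\w+)=(.*)$/) {
        $config{$key} = $value;
    }
}

print "$_ => $config{$_}\n" for sort keys %config;
```

This still ties each variable to a capture by position rather than by name, which is the part a named-capture construct would improve on.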

