in reply to Why are 5.10's named captures read only?

What I'd like to see is a way to get the positions of named captures in the string, analogous to the @- and @+ variables.

Re^2: Why are 5.10's named captures read only? (step 3)
by tye (Sage) on Oct 20, 2008 at 13:42 UTC

    Yes, indeed.

    However, just as exposing only the string value of the named captures (as Perl 5.010 does) is convenient for the user but doesn't allow for the full abstraction of the feature (it leaves no way to reliably find the offsets), exposing the offsets would not be the full abstraction either, and would still leave out a useful feature.

    I'd like a way to (reliably) get at the number of the (numbered) capture that corresponds to a named capture. That would allow one to get at the offsets for any named capture, which would in turn allow one to get at the substring matched.
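    A minimal sketch of that chain, assuming the name-to-number mapping is already known. The hash `%group_number` is hypothetical and written by hand here (Perl 5.10 exposes no such variable). Named groups are also numbered left to right by their opening parenthesis, so @- and @+ already hold their offsets once the number is known:

```perl
use strict;
use warnings;

my $str = "2008-10-20";
$str =~ /(?<year>\d+)-(?<month>\d+)-(?<day>\d+)/ or die "no match";

# Hypothetical name => group-number mapping, written by hand for this pattern.
my %group_number = (year => 1, month => 2, day => 3);

my $n     = $group_number{month};
my $start = $-[$n];                  # offset where the capture starts
my $end   = $+[$n];                  # offset just past where it ends
my $text  = substr($str, $start, $end - $start);

printf "month: %d..%d [%s]\n", $start, $end, $text;   # month: 5..7 [10]
print "same as \$+{month}\n" if $text eq $+{month};
```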

    I may end up parsing the regex myself, since I would also like to know when a capture is part of a look-ahead or look-behind. But the regex syntax has had so many enhancements added recently that parsing regexes looks like something that will require ongoing maintenance.
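    A naive sketch of that kind of parsing: scan the pattern source and record which group number(s) each name occupies. `named_capture_numbers` is a hypothetical helper, not a core function. It handles escapes, character classes and non-capturing or look-around groups, but deliberately ignores branch reset `(?|...)`, `\Q...\E`, `/x` and `(?#...)` comments, and other syntax that can shift the numbering, which is exactly the maintenance burden described above:

```perl
use strict;
use warnings;

# Hypothetical helper: map each capture name to the group number(s) it
# occupies by scanning the pattern source left to right. Deliberately naive.
sub named_capture_numbers {
    my ($src) = @_;
    my %numbers;
    my $count = 0;
    my $i     = 0;
    my $len   = length $src;
    while ($i < $len) {
        my $c = substr($src, $i, 1);
        if ($c eq '\\') {            # escaped character: skip it and the next one
            $i += 2;
            next;
        }
        if ($c eq '[') {             # character class: parens inside are literal
            $i++;
            $i++ if $i < $len && substr($src, $i, 1) eq '^';
            $i++ if $i < $len && substr($src, $i, 1) eq ']';
            while ($i < $len && substr($src, $i, 1) ne ']') {
                $i += substr($src, $i, 1) eq '\\' ? 2 : 1;
            }
            $i++;                    # step past the closing ]
            next;
        }
        if ($c eq '(') {
            my $rest = substr($src, $i + 1);
            if ($rest =~ /^\?(?:P?<|')(\w+)[>']/) {
                # (?<name>...), (?P<name>...) or (?'name'...)
                push @{ $numbers{$1} }, ++$count;
            }
            elsif ($rest !~ /^[?*]/) {
                # plain (...) captures; (?:...), (?=...), (?<=...), (*VERB) do not
                $count++;
            }
        }
        $i++;
    }
    return \%numbers;
}

my $map = named_capture_numbers('(?<l>[a-z])(?<l>[a-z])(?<l>[a-z])');
print "l => [@{ $map->{l} }]\n";    # l => [1 2 3]
```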

    - tye        

      I don't really understand how exposing the offsets and length would not be the full abstraction. Even so, exposing the number of the numbered capture might still be the better interface.

        I don't really understand how exposing the offsets and length would not be the full abstraction.

        Given a way to look up the number of a named match, it is easy to precisely determine the offsets (it is just an array look-up in @+ and @-). Just having a way to look up the offsets doesn't allow you to reliably determine the number of the match.
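        One way offsets can fail to identify a group, shown as a short sketch: when a named group is nested directly inside a plain group, both groups span exactly the same characters, so their entries in @- and @+ are identical:

```perl
use strict;
use warnings;

# Group 1 is the outer plain group; group 2 is the named group "a".
"x" =~ /((?<a>x))/ or die "no match";

# Both groups have identical offsets, so "a starts at 0 and ends at 1"
# cannot tell you whether "a" is group 1 or group 2.
printf "group 1: %d..%d\n", $-[1], $+[1];   # group 1: 0..1
printf "group 2: %d..%d\n", $-[2], $+[2];   # group 2: 0..1
```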

        So I would prefer that the mapping from name to number be exposed rather than the mapping from name to offsets. And one more hash that you use to look up offsets inside @+ and @- almost seems a better interface than exposing a pair of hashes, or a hash whose values are two-element arrays, anyway.

        - tye        

Re^2: Why are 5.10's named captures read only?
by blazar (Canon) on Oct 20, 2008 at 12:02 UTC

    ISTR (and would expect anyway) that in Perl 6 there's provision for all these kinds of things, everything being an object, by means of suitable methods. I suppose that under Perl 5 you would expect yet another pair of special variables instead. But which ones? All of the good ones seem to be gone, and also many of the bad ones!!

    Perhaps, since AFAICT %_ is always free, it could have been chosen to hold the named captures instead of %+, leaving %+ and %- to hold the info you need, by analogy with @+ and @-... Before posting I was also thinking that perhaps %- was free, but that's not the case: it is... "%+ on steroids..."
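    A short sketch of the difference between the two hashes, using a pattern with a repeated name: %+ gives only the leftmost defined capture for a name, while %- gives a reference to an array of every capture with that name:

```perl
use strict;
use warnings;

"abc" =~ /(?<l>[a-z])(?<l>[a-z])(?<l>[a-z])/ or die "no match";

# %+ : only the leftmost defined capture named "l"
print '$+{l} = ', $+{l}, "\n";                          # $+{l} = a
# %- : all captures named "l", as an array reference
print '$-{l} = [', join(' ', @{ $-{l} }), "]\n";        # $-{l} = [a b c]
```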

    --
    If you can't understand the incipit, then please check the IPB Campaign.
      There are a gazillion variables "free" to choose from, unless you insist on one-character punctuation variables. Frankly, I don't think you need a one-character punctuation variable for this, and
      %{^MATCH_OFFSETS}
      will do fine. Personally, I'd like the values to be arrays of arrays, each inner array having 2 elements: the index of the start of the match, and the index just after the end of the match (that is, similar to @- and @+). The outer array would hold one entry per capture with that name, so if you have:
      "abc" =~ /(?<l>[a-z])(?<l>[a-z])(?<l>[a-z])/
      the result would be:
      %{^MATCH_OFFSETS} = ('l' => [[0, 1], [1, 2], [2, 3]]);
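      The proposed structure can be emulated in current Perl, assuming the name-to-number mapping is known. `match_offsets` is a hypothetical helper and the mapping `{ l => [1, 2, 3] }` is written by hand; Perl provides neither the mapping nor %{^MATCH_OFFSETS}. The sketch passes copies of @- and @+ taken right after the match, so the helper does not depend on which match was last successful:

```perl
use strict;
use warnings;

# Hypothetical helper: build name => [[start, end], ...] from copies of
# @- and @+ and a name => [group numbers] mapping supplied by the caller.
sub match_offsets {
    my ($numbers, $starts, $ends) = @_;
    my %offsets;
    for my $name (keys %$numbers) {
        $offsets{$name} =
            [ map { [ $starts->[$_], $ends->[$_] ] } @{ $numbers->{$name} } ];
    }
    return \%offsets;
}

"abc" =~ /(?<l>[a-z])(?<l>[a-z])(?<l>[a-z])/ or die "no match";

# The mapping is written by hand here; Perl 5.10 does not expose it.
my $offsets = match_offsets({ l => [1, 2, 3] }, [@-], [@+]);
print join(' ', map { sprintf '[%d, %d]', @$_ } @{ $offsets->{l} }), "\n";
# prints: [0, 1] [1, 2] [2, 3]
```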

      I'm also pretty sure that if someone writes a patch, it will be added to Perl.

        I think it's clear enough that I don't insist on one-character punctuation variables, though I'd certainly like them: they're nice, too! ;) I fully second the rest of your suggestion, BTW.

        If %- were not taken, I would have expected it to hold that kind of info. Methods would be the best thing, though... I'm playing more and more with autobox, but I know it doesn't fit too well with Perl 5's typical mindset (apart from the fact that its own docs warn that it isn't real autoboxing!). And I don't know how it would square with special variables such as the ones we're discussing here...

        --
        If you can't understand the incipit, then please check the IPB Campaign.