in reply to Re: Re: Re: Code critique XS function for extracting a blessed regex's pattern.
in thread Code critique XS function for extracting a blessed regex's pattern.

Because it uses an undocumented "feature" of an undocumented quasi-type to provide functionality of questionable necessity to people writing ill-conceived code.

Them's pretty strong words you are using there dude.

First off, there are many "features" of Perl that are not properly documented. This is probably natural given that the code changes much faster than the documentation. Nevertheless, you have a point. I will ask Hugo to make a decision on this, and if it is determined that it is a feature, I will provide a patch for perlop so that it becomes a documented feature. (As I said, this is not at all uncommon.)

Second off, it seems that you have been struck by the "since I can't see a good reason to do this, there must not be a good reason" bug. One of my hobbies is writing an improved Dumper. Being able to correctly dump an object that is in fact a blessed qr// is very useful, both for data storage and for development.

Personally I don't think that an improved dumper is ill-conceived, and the functionality is required if the dumper is going to be complete.

Now, keep that in mind as you reconsider the issue of whether Regexp thingies should maintain their magic after being reblessed into another class. It can lead to inconsistent behavior. For instance:

I fail to see why this behaviour is inconsistent. One item is a regex, the other item is not. Since they are different, the fact that they behave differently can hardly come as a surprise to anyone. The only aspect of this that makes it seem inconsistent is that under normal circumstances you can't tell what's different. Your argument seems to amount to saying "Since you can't distinguish a blessed scalar ref from a blessed qr//, you shouldn't implement a way to do so," which hardly seems like a logical position to take.

(All of which begs the question of whether we should even rely on qr// returning a blessed reference in the first place.)

I believe that it is a feature. And one that is exploited too. I think it is extremely unlikely that this behaviour will change, and if it does it will change over several versions as it must be deprecated first, then eliminated. Either way, Hugo's decision will resolve this.

The whole mess gets even stickier when you consider that strings can be used in much the same way that precompiled regexes are.

Precisely the problem I am trying to address. How do I tell a string from a regex? Consider I might have a search routine. If you pass in a string it finds all the elements that equal that string exactly. If you pass in a regex it finds all the elements that match the regex. Being able to distinguish the two seems to be of obvious utility.
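A minimal sketch of such a search routine, assuming perl 5.10 or later, where the core function re::is_regexp reports whether a value is a compiled pattern even after it has been reblessed. The name search is hypothetical; on the 5.6/5.8 perls discussed in this thread you would need something like the XS function this thread is about instead.

```perl
use strict;
use warnings;

# Hypothetical search routine: a compiled regex means "match",
# anything else means "exact string equality".
sub search {
    my ($needle, @haystack) = @_;
    if (re::is_regexp($needle)) {    # true for qr// objects, reblessed or not
        return grep { $_ =~ $needle } @haystack;
    }
    return grep { $_ eq $needle } @haystack;
}

my @words = qw(foo food bar);
print join(',', search('foo', @words)), "\n";      # prints foo
print join(',', search(qr/^fo/, @words)), "\n";    # prints foo,food
```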

That's not very nice behavior given that it isn't, AFAIK, documented that you shouldn't write a Regexp package of your own.

I don't get it. This is exactly the behaviour I would expect given that it is not documented that you shouldn't write a Regexp package of your own.

--- demerphq
my friends call me, usually because I'm late....


Re: Re: Re: Re: Re: Code critique XS function for extracting a blessed regex's pattern.
by sauoq (Abbot) on Feb 06, 2003 at 22:23 UTC
    Them's pretty strong words you are using there dude.

    Strong yes, but I didn't choose them lightly either. Here's an annotated version:

    Because it uses an undocumented "feature" (the fact that the thingies returned by qr// maintain their magic after being reblessed into another class is undocumented) of an undocumented quasi-type (the thingies themselves are references blessed into the non-existent Regexp package) to provide functionality of questionable necessity (even if someone chooses to rebless the refs returned by qr//, the need to differentiate them from other types of references should not be common and should be altogether avoidable) to people writing ill-conceived code (using the refs returned by qr// directly is ill-conceived; the benefits are minor and the drawback, code that will break if the underlying implementation of those Regexp thingies changes, is relatively great.)
    The implementation, IMHO, should change. The more people rely on the current implementation the harder that will be.

    If you really think this implementation is the way things should remain, go ahead and submit a doc patch. I suspect it has remained undocumented for almost five years (since 5.005!) not because no one had the time or inclination but because no one wants to lock us into the current way of doing things.

    Personally I don't think that an improved dumper is ill-conceived

    I think you misunderstood. My point was that anyone that can make use of this feature is probably writing ill-conceived code. I can understand adding this functionality to Dumper as a matter of completeness. I could also understand leaving it out of Dumper on the basis that references originating from qr// retaining their regexp magic when reblessed is an undocumented and questionable "feature".

    I fail to see why this behaviour is inconsistent. One item is a regex, the other item is not.

    What is a "regex?" One item had been a Regexp, but it was reblessed. Both items were blessed into the 'P' package. If we accept what perldoc -f bless tells us, then both items were objects in the P package. The P package implemented its own stringification. According to perlop in reference to the binding operator, "if the right argument is an expression rather than a search pattern, substitution, or transliteration, it is interpreted as a search pattern at run time," so a module author should reasonably be able to expect their stringification to act as a search pattern at run time when used as one. But that's not how the reference originating from qr// works. So, two references both blessed into the same class can act very differently depending only on their origin. Do you see now how that is inconsistent?
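    A sketch of the documented half of that scenario: a class P (hypothetical constructor new) that blesses a plain scalar reference and overloads stringification. On the right side of =~ such an object is interpreted as a search pattern via its stringification, as the perlop passage suggests. Per the argument above, an object that started life as qr// and was reblessed into P does not follow this rule; that case is deliberately left out here because its exact behaviour may vary between perl versions.

```perl
use strict;
use warnings;

package P;
# Stringify to the pattern text the object was constructed with.
use overload '""' => sub { ${ $_[0] } }, fallback => 1;

sub new {
    my ($class, $pattern) = @_;
    return bless \$pattern, $class;    # a plain blessed scalar reference
}

package main;

my $obj = P->new('^fo');
print "foo matches\n" if 'foo' =~ $obj;    # pattern comes from "$obj", i.e. ^fo
print "bar matches\n" if 'bar' =~ $obj;    # not printed
```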

    I believe that it is a feature. And one that is exploited too. I think it is extremely unlikely that this behaviour will change, and if it does it will change over several versions as it must be deprecated first, then eliminated.

    I don't think of it as a feature. I don't know where it is exploited but I'd be curious to see any examples. I hope the behavior does change; I'd like to see Regexps elevated to a true Perl type (named REGEX per Larry's recommendation.) I don't think that changing it should be a long process requiring anything but a minimal deprecation period as it is essentially an implementation detail whose public exposure has so far remained undocumented.

    Precisely the problem I am trying to address. How do I tell a string from a regex?

    Ideally, it shouldn't be so difficult to do in the first place, right? You should be able to use code like if ref $r eq 'Regexp' or if UNIVERSAL::isa($r, 'Regexp') and the fact that you can't is a symptom rather than the real problem itself. You've got a work around for the symptom, but the real problem remains.
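    The symptom is easy to reproduce. In this sketch (the class name P is arbitrary and otherwise empty), both checks mentioned above stop recognizing a qr// object once it has been reblessed, even though it still matches as a pattern:

```perl
use strict;
use warnings;

my $plain = qr/x/;
my $rebl  = bless qr/x/, 'P';    # 'P' is an arbitrary, otherwise empty class

print ref($plain), "\n";    # Regexp
print ref($rebl), "\n";     # P

my $isa = UNIVERSAL::isa($rebl, 'Regexp') ? 'isa' : 'not isa';
print "$isa\n";             # not isa

print "still matches\n" if 'xyz' =~ $rebl;    # the regex magic survives reblessing
```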

    I don't get it. This is exactly the behaviour I would expect given that it is not documented that you shouldn't write a Regexp package of your own.

    It isn't documented that you shouldn't write a Foo package of your own either and yet, if you do, the thingies returned by qr// won't suddenly get access to your Foo methods. As an illustration:

        #!/usr/bin/perl -w
        use strict;

        package Foo;
        sub foo { print "foo\n" }

        package Regexp;
        sub bar { print "bar\n" }

        package main;
        qr//->bar;    # Works even though it probably shouldn't.
        qr//->foo;    # Errors as expected.
    So, you must know about an undocumented feature in order to understand why qr//->bar; actually works in the above code. And that's the behavior you expect? Why on Earth would you expect that?

    By the way...

        $ perl -MB -le '$r=bless qr//,"P"; print B::svref_2object($r)->MAGIC->TYPE'
        r

    -sauoq
    "My two cents aren't worth a dime.";
    
      Your argument rests on the premise that this is a bug and not a feature, a position you come to because there is no Regexp.pm module and because the behaviour is undocumented. Unfortunately, from what my research reveals, this is an incorrect conclusion. Somehow I missed a series of mails on P5P (embarrassingly, as some of them were CC'd to me directly) establishing that this behaviour is certainly considered to be a feature, and that code has been applied to bleadperl (Patch #17813) that does something like what my XS above does. If no one else has done it already, I will provide documentation patches to describe the behaviour that Hugo (or Larry) decides is appropriate.

      However, your position that this is not the best design is probably correct. If REGEX becomes a type, then presumably it would be a ref. Then it could be blessed and isa($foo,'REGEX') would work properly. I doubt that all this will happen, though. I believe the general mentality is that clean-implementation arguments should be kept for Perl 6, and that major changes like this probably won't happen in the perl 5 line. That's just my impression, though.

      Anyway, that said, I'd like to say that I'm not entirely unsympathetic to your points. I don't think this reply will do justice to my thoughts on them. I regret not being able to discuss this over a beer. :-) Anyway, here are some points I thought were interesting.

      I could also understand leaving it out of Dumper on the basis that references originating from qr// retaining their regexp magic when reblessed is an undocumented and questionable "feature"

      The type of dumper I have in mind is more for debugging/development purposes. Whether people should or should not rebless qr//'s is a moot point when you consider that it could happen by accident, and having an easy way to see that it has ("huh, what's that 'bless qr/foo/' doing there!?") just might save some poor soul (like me :-) from pulling out what's left of their hair. For instance, Data::Dumper isn't very good with references to scalars, with aliases, or with read-only values. All three can be the root of bizarreness, and having a dumper that is able to display them distinctively is IMO a useful thing.
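      For that debugging use case, here is a sketch of how a dumper might render a reblessed qr// distinctively. It assumes perl 5.10 or later for the core functions re::is_regexp and re::regexp_pattern (no module needed) plus Scalar::Util::blessed; the function name describe is hypothetical and the output format is only illustrative (it does not escape '/' inside the pattern).

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical dumper helper: render compiled regexes as qr//,
# and make a reblessed one stand out with an explicit bless().
sub describe {
    my ($v) = @_;
    return defined $v ? "'$v'" : 'undef' unless re::is_regexp($v);
    my ($pat, $mods) = re::regexp_pattern($v);    # list context: (pattern, modifiers)
    my $qr    = "qr/$pat/$mods";                  # naive: '/' in $pat is not escaped
    my $class = blessed($v);                      # 'Regexp' unless reblessed
    return $class eq 'Regexp' ? $qr : "bless($qr, '$class')";
}

print describe(qr/foo/i), "\n";               # qr/foo/i
print describe(bless qr/foo/i, 'P'), "\n";    # bless(qr/foo/i, 'P')
```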

      So, two references both blessed into the same class can act very differently depending only on their origin. Do you see now how that is inconsistent?

      I don't agree with your analysis. The product of a qr//, blessed or not, is a search pattern. There's nothing that says otherwise and much that indicates the contrary. As you keep pointing out, this is undocumented behaviour, generally speaking. Since nothing says that it ceases to be a search pattern once blessed, I see no reason why that interpretation is any better than its opposite; in fact, I tend to lean the other way. Consider that no law in Perl says that all members of the same class have to be of the same type. It's an interesting trick that works nicely with trees: internal nodes are hashes, leaf nodes are arrays, things like that.
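      The tree trick, as a minimal sketch: one hypothetical class Node whose internal nodes are blessed hashes and whose leaves are blessed arrays. ref() reports the same class for both, and a method can dispatch on Scalar::Util::reftype when the underlying representation matters.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

package Node;

# Internal node: a blessed hash of named children.
sub branch { my ($class, %kids) = @_; return bless { %kids }, $class }

# Leaf node: a blessed array of values.
sub leaf { my ($class, @vals) = @_; return bless [@vals], $class }

# Count the values stored under this node, whichever representation it uses.
sub count {
    my ($self) = @_;
    return scalar @$self if Scalar::Util::reftype($self) eq 'ARRAY';
    my $n = 0;
    $n += $_->count for values %$self;
    return $n;
}

package main;

my $tree = Node->branch(
    left  => Node->leaf(1, 2),
    right => Node->branch(deep => Node->leaf(3)),
);
print ref($tree->{left}), ' ', reftype($tree->{left}), "\n";    # Node ARRAY
print $tree->count, "\n";                                        # 3
```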

      I'd like to see Regexps elevated to a true Perl type (named REGEX per Larry's recommendation.) I don't think that changing it should be a long process requiring anything but a minimal deprecation period as it is essentially an implementation detail whose public exposure has so far remained undocumented

      I agree that the lack of documentation in this area is annoying and should be corrected, and, as I said earlier, the implementation doesn't seem the cleanest. But it's what we have, and from my trawling of the P5P archives it doesn't look like it's going to change in a big way. Larry suggested the REGEX idea two years ago or so. It never happened, and it doesn't look like it's going to happen. In fact, there's a bunch of code in the Regexp:: namespace (Regexp::Common, for example). Documenting Regexp and providing a base class would at least put the issue to rest. Incidentally, while trawling I came across a quote that I thought was illuminating *grin*


        |> japhy wrote:
        |> :What is the requested/suggested namespace for modules dealing with regular
        |> :expressions?
        |> 
        |> I don't think it is useful to ask that here - the chances of consensus
        |> seem vanishingly small.
        

      Ideally, it shouldn't be so difficult to do in the first place, right? You should be able to use code like if ref $r eq 'Regexp' or if UNIVERSAL::isa($r, 'Regexp') and the fact that you can't is a symptom rather than the real problem itself. You've got a work around for the symptom, but the real problem remains.

      It looks like there was work done to address these deficiencies in bleadperl. I haven't looked into it in detail yet. I agree it's an issue in 5.6, but a workaround is better than none.

      So, you must know about an undocumented feature in order to understand why qr//->bar; actually works in the above code.

      This is true. But I have to admit that I think the average programmer who didn't know would try perl -e "print ref qr//;" and see. :-)

      perl -MB -le '$r=bless qr//,"P"; print B::svref_2object($r)->MAGIC->TYPE'

      Yes, I know. I just don't like it. It does basically the same thing as the XS, but from the Perl side, and from what I can tell it doesn't expose the pattern or modifiers. And reblessing it, extracting the pattern, and unblessing just doesn't seem clean in comparison to the XS. Maybe that's just me. Although I suppose

          use B;

          sub regexp($) {
              my $r = shift;
              return unless B::svref_2object($r)->MAGIC->TYPE =~ /r/;
              my $pattern = ref $r eq 'Regexp' ? "$r" : '' . qr/$r/;
              if (wantarray) {
                  my $mods;
                  $pattern =~ s/^\(\?([msix]*)(?:-[msix]+)?:/$mods = $1; ""/e
                      or die "Error! $r $pattern";
                  chop $pattern;    # drop the trailing ')'
                  return ($pattern, $mods);
              }
              else {
                  return $pattern;
              }
          }
      does the same thing as the XS. But I think the XS would be a lot faster. :-)
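      For comparison, on perl 5.10 and later the core functions re::is_regexp and re::regexp_pattern provide much the same thing as the XS and the pure-Perl regexp() above, without B and without reblessing. A minimal sketch, assuming such a perl; the name regexp_parts is hypothetical, and it keeps the same interface (pattern and modifiers in list context, just the pattern in scalar context).

```perl
use strict;
use warnings;

sub regexp_parts {
    my ($r) = @_;
    return unless re::is_regexp($r);                # also true for a reblessed qr//
    my ($pattern, $mods) = re::regexp_pattern($r);  # list context: (pattern, modifiers)
    return wantarray ? ($pattern, $mods) : $pattern;
}

my ($pat, $mods) = regexp_parts(bless qr/a+b/i, 'P');
print "$pat $mods\n";    # a+b i
```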

      Anyway sauoq this has been a very interesting mail for me. I've learned a bunch in the process. Cheers,

      --- demerphq
      my friends call me, usually because I'm late....

        Your argument rests on the premise that this is a bug and not a feature, a position you come to because there is no Regexp.pm module and because the behaviour is undocumented.

        Well, I think that may be oversimplifying my position a touch. I do admit that it would be a moot point if it were documented but I think I'd rather see it changed than documented. Well, to clarify, I think that there should be a Regexp module, it should be documented, and the behavior that these thingies retain magic through reblessing should be changed. I can see the usefulness of that behavior but I am not convinced that it outweighs the inconsistency it causes. More on that (but not much more) in a bit.

        I regret not being able to discuss this over a beer.

        Hear, hear! :-)

        The type of dumper I have in mind is more for debugging/development purposes.

        By all means then! A debugging tool absolutely should tell you what is really going on regardless of whether what is really going on is documented. You certainly have my... er... "blessing."

        I don't agree with your analysis. The product of a qr//, blessed or not, is a search pattern. There's nothing that says otherwise and much that indicates the contrary.

        Here's the "more on that" that I promised... I think we mostly agree, actually. The product of qr// is a search pattern. I just find it disappointing that it is also a reference which acts differently than other references in a few (admittedly rare) cases. Because of this inconsistency we can't write a module with a constructor that takes any reference and still have the expectation that our stringification will act as a search pattern when the object is used on the right side of the binding operator.

        I fully understand that this isn't a huge loss. Aesthetically, it irks me though. Oh well. One man may see a hairy mole where another sees a beauty mark, I guess. Realistically, it is probably of little consequence. I can't even say that I've been bitten by it, but I do imagine there are a few that have been in one way or another.

        I've enjoyed this exchange as well, demerphq. Thanks for bringing it up.

        -sauoq
        "My two cents aren't worth a dime.";
        
        return unless B::svref_2object($r)->MAGIC->TYPE =~/r/;
        The object returned by svref_2object may not have a MAGIC method, so either wrap an eval {} around part of that or check that B::class(B::svref_2object($r)) eq "PVMG".
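        A sketch of that guard as a standalone helper (the name has_qr_magic is hypothetical). It checks the B class before calling MAGIC, since B objects for simpler SV types such as B::NULL, B::IV or B::PV have no MAGIC method. Note that the 'r' magic check only reflects the 5.6/5.8-era implementation discussed in this thread; newer perls represent qr// objects as a separate REGEXP type, so there the helper simply returns false and re::is_regexp is the better test.

```perl
use strict;
use warnings;
use B ();

# Guarded check: only call MAGIC on B::PVMG objects, because
# lower SV classes (B::NULL, B::IV, B::PV, ...) have no MAGIC method.
sub has_qr_magic {
    my ($ref) = @_;
    return 0 unless ref $ref;
    my $obj = B::svref_2object($ref);
    return 0 unless B::class($obj) eq 'PVMG';
    for my $mg ($obj->MAGIC) {
        return 1 if $mg->TYPE eq 'r';    # 'r' is the qr// magic on 5.6/5.8
    }
    return 0;
}
```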