in reply to Re: Code critique XS function for extracting a blessed regex's pattern.
in thread Code critique XS function for extracting a blessed regex's pattern.

Wow! I'm impressed. Where were you when I wrote the first node? :-) Very nice idea indeed.

But unfortunately it doesn't address the question I'm trying to solve. My question is this: given an arbitrary blessed scalar ref, how does one efficiently determine if the object is in fact a regex? Your solution, which I personally think is rather ingenious, solves "How do I make a blessed ref, when stringified, return the pattern?", which I think is useful indeed, but unfortunately not what I need. (I recognize I may not have specified the requirement sufficiently.)

Even though this is a solution from the point of view of designing a class, it has the problem that its underlying concept, that of qr//ing the value, doesn't generalize. How do you detect a failure? There would be no way to determine if the wrapped object had actually produced a regex, or just a stringified ref, or any number of other magic events.

Anyway, ++ for the idea...

--- demerphq
my friends call me, usually because I'm late....


Re: Re: Re: Code critique XS function for extracting a blessed regex's pattern.
by sauoq (Abbot) on Feb 06, 2003 at 03:51 UTC
    My question is this: given an arbitrary blessed scalar ref, how does one efficiently determine if the object is in fact a regex?

    OK, I see the "problem" you are trying to solve now. I'm still not sure it is really worth solving though. In fact, your code might do more harm than good if its use became widespread. Why? Because it uses an undocumented "feature" of an undocumented quasi-type to provide functionality of questionable necessity to people writing ill-conceived code.

    Regexp thingies are a terrible kludge. They drift about in limbo, being neither entities of a true Perl type nor normal objects. Yes, you can play some tricks with them but that doesn't mean it is a good idea to do so. The fact that the blessed reference returned by qr// keeps its magical regular expression value after being reblessed into another class is probably not a good thing; it may even be a bug. Regardless, it is undocumented and we shouldn't rely on the behavior. (All of which begs the question of whether we should even rely on qr// returning a blessed reference in the first place.)

    If Regexp objects are elevated to a real Perl type someday, then code like

    my $r = bless qr/foo/, "MyPackage";
    probably won't even work and we'll be forced into writing code that is consistent with other types. Instead of getting a reference directly from qr// we'll have to take a reference to whatever it returns and bless that instead. There's no reason not to do that now. Code like
    my $r = bless \qr/foo/, "MyPackage";
    should continue to work even if Regexps are promoted to a real type. It does require that $$r is used when you want to get at the underlying regular expression, but dereferencing isn't that much of an inconvenience, is it?
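    As a rough sketch of what a class built that way might look like (only the MyPackage name comes from the code above; the new and matches methods are made up for illustration), the object is a reference to a scalar that holds the compiled regex, and the class dereferences it whenever it needs to match:

```perl
use strict;
use warnings;

package MyPackage;    # class name from the post; the methods are hypothetical

sub new {
    my ($class, $pattern) = @_;
    my $re = qr/$pattern/;        # compile once
    return bless \$re, $class;    # bless a reference TO the regex, not the regex itself
}

sub matches {
    my ($self, $string) = @_;
    # $$self is the untouched Regexp thingy returned by qr//
    return $string =~ $$self ? 1 : 0;
}

package main;

my $r = MyPackage->new('foo');
print "matched\n" if $r->matches('foobar');
print ref($r), " wraps a ", ref($$r), "\n";
```

    The object never depends on the regex magic surviving a rebless, because the thing returned by qr// is never reblessed in the first place.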

    The whole mess gets even stickier when you consider that strings can be used in much the same way that precompiled regexes are.

    $ perl -le 'my $r = "bar"; print "yes" if "foobarbaz" =~ $r'
    yes
    Now, keep that in mind as you reconsider the issue of whether Regexp thingies should maintain their magic after being reblessed into another class. It can lead to inconsistent behavior. For instance:
    #!/usr/bin/perl -w
    use strict;

    package P;
    use overload '""' => sub { 'stringified' };

    package main;
    local $\ = "\n";

    my $regex = qr/match/;
    bless $regex, 'P';

    my $plain = \my $t;
    bless $plain, 'P';

    print '"stringified" matched $regex' if "stringified" =~ $regex;
    print '"stringified" matched $plain' if "stringified" =~ $plain;

    __END__
    "stringified" matched $plain
    So, because of Regexps, not all references are created equal. Bummer.

    Yet another inconsistency due to the Regexp quasi-pseudo-sorta class is that you can write your own Regexp package and the things returned by qr// get access to your methods.

    #!/usr/bin/perl -w
    use strict;

    package Regexp;
    sub new { my $r; bless \$r }
    sub f   { q("I'm a Regexp.") }

    package main;
    local $\ = "\n";

    my $qr = qr/foo/;
    my $ob = Regexp->new();

    print '$qr says, ', $qr->f;
    print '$ob says, ', $ob->f;
    print '$qr isa Regexp' if $qr->isa('Regexp');
    print '$ob isa Regexp' if $ob->isa('Regexp');
    print '$qr: ', $qr;
    print '$ob: ', $ob;

    __END__
    $qr says, "I'm a Regexp."
    $ob says, "I'm a Regexp."
    $qr isa Regexp
    $ob isa Regexp
    $qr: (?-xism:foo)
    $ob: Regexp=SCALAR(0x805f148)
    That's not very nice behavior given that it isn't, AFAIK, documented that you shouldn't write a Regexp package of your own.

    All of this leads me to the conclusion that, if someone actually finds your XS code useful, they are almost certainly doing things that they ought not be doing anyway. ;-)

    -sauoq
    "My two cents aren't worth a dime.";
    
      Because it uses an undocumented "feature" of an undocumented quasi-type to provide functionality of questionable necessity to people writing ill-conceived code.

      Them's pretty strong words you are using there dude.

      First off there are many "feature"s of perl that are not properly documented. This is probably natural given that the code changes much faster than the documentation. Nevertheless you have a point. I will request Hugo make a decision on this, and if it is determined that it is a feature then I will provide a patch for perlop so that it becomes a documented feature. (As I said this is not uncommon at all.)

      Second off, it seems that you have been struck by the "since I can't see a good reason to do this there must not be a good reason" bug. One of my hobbies is writing an improved Dumper. Being able to correctly dump an object that is in fact a blessed qr// is very useful, both for data storage purposes and for development use.

      Personally I don't think that an improved dumper is ill-conceived, and the functionality is required if the dumper is going to be complete.
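      To make the dumper case concrete, here is a minimal sketch of the kind of output it needs to produce. It assumes a perl of 5.10 or later, where the core re module offers regexp_pattern (that is not the XS routine from this thread, and it post-dates it); the dump_regex name is invented for the example, and delimiter escaping is deliberately ignored:

```perl
use strict;
use warnings;
use re qw(regexp_pattern);        # assumption: perl 5.10 or later
use Scalar::Util qw(blessed);

# dump_regex is a hypothetical helper: it returns Perl source that rebuilds
# a (possibly reblessed) regex object, or nothing if the argument is not a regex.
sub dump_regex {
    my ($obj) = @_;
    my ($pattern, $flags) = regexp_pattern($obj);
    return unless defined $pattern;          # not a compiled regex

    # NOTE: a pattern containing "/" would need escaping; skipped in this sketch.
    my $code  = "qr/$pattern/$flags";
    my $class = blessed($obj);
    return $class eq 'Regexp' ? $code : "bless($code, '$class')";
}

print dump_regex(bless qr/fo+/i, 'Foo'), "\n";
```

      Evaluating the returned string should give back an object of the same class that still matches the same things, which is what a complete dumper has to guarantee.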

      Now, keep that in mind as you reconsider the issue of whether Regexp thingies should maintain their magic after being reblessed into another class. It can lead to inconsistent behavior. For instance:

      I fail to see why this behaviour is inconsistent. One item is a regex, the other item is not. Since they are different, the fact that they behave differently can hardly come as a surprise to anyone. The only aspect of this that makes it seem inconsistent is that under normal circumstances you can't tell what's different. Your argument seems to amount to saying, "Since you can't distinguish a blessed scalar ref from a blessed qr//, you shouldn't implement a way to do so", which hardly seems like a logical position to take.

      (All of which begs the question of whether we should even rely on qr// returning a blessed reference in the first place.)

      I believe that it is a feature. And one that is exploited too. I think it is extremely unlikely that this behaviour will change, and if it does it will change over several versions as it must be deprecated first, then eliminated. Either way, the decision of Hugo will resolve this.

      The whole mess gets even stickier when you consider that strings can be used in much the same way that precompiled regexes are.

      Precisely the problem I am trying to address. How do I tell a string from a regex? Consider I might have a search routine. If you pass in a string it finds all the elements that equal that string exactly. If you pass in a regex it finds all the elements that match the regex. Being able to distinguish the two seems to be of obvious utility.
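      A minimal sketch of such a search routine follows. It assumes a perl of 5.10 or later, where the core re module provides is_regexp, which is documented as not being confused by blessing or overloading; find_items is a made-up name:

```perl
use strict;
use warnings;
use re qw(is_regexp);    # assumption: perl 5.10 or later

# find_items is a hypothetical helper: exact comparison for plain strings,
# pattern match for anything that is really a compiled regex.
sub find_items {
    my ($needle, @items) = @_;
    if (is_regexp($needle)) {
        return grep { $_ =~ $needle } @items;
    }
    return grep { $_ eq $needle } @items;
}

my @words = qw(bar barn foobar baz);
print join(',', find_items('bar', @words)), "\n";                  # exact: bar
print join(',', find_items(qr/bar/, @words)), "\n";                # regex: bar,barn,foobar
print join(',', find_items(bless(qr/^ba/, 'P'), @words)), "\n";    # reblessed regex: bar,barn,baz
```

      The third call is the interesting one: ref() reports 'P' for that argument, so a test on the class name alone would send it down the string branch.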

      That's not very nice behavior given that it isn't, AFAIK, documented that you shouldn't write a Regexp package of your own.

      I don't get it. This is exactly the behaviour I would expect given that it is not documented that you shouldn't write a Regexp package of your own.

      --- demerphq
      my friends call me, usually because I'm late....

        Them's pretty strong words you are using there dude.

        Strong yes, but I didn't choose them lightly either. Here's an annotated version:

        Because it uses an undocumented "feature" (the fact that the thingies returned by qr// maintain their magic after being reblessed into another class is undocumented) of an undocumented quasi-type (the thingies themselves are references blessed into the non-existent Regexp package) to provide functionality of questionable necessity (even if someone chooses to rebless the refs returned by qr//, the need to differentiate them from other types of references should not be common and should be altogether avoidable) to people writing ill-conceived code (using the refs returned by qr// directly is ill-conceived; the benefits are minor and the drawback, code that will break if the underlying implementation of those Regexp thingies changes, is relatively great).
        The implementation, IMHO, should change. The more people rely on the current implementation the harder that will be.

        If you really think this implementation is the way things should remain, go ahead and submit a doc patch. I suspect it has remained undocumented for almost 5 years(! since 5.005) not because no one had the time or inclination but because no one wants to lock us into the current way of doing things.

        Personally I don't think that an improved dumper is ill-conceived

        I think you misunderstood. My point was that anyone that can make use of this feature is probably writing ill-conceived code. I can understand adding this functionality to Dumper as a matter of completeness. I could also understand leaving it out of Dumper on the basis that references originating from qr// retaining their regexp magic when reblessed is an undocumented and questionable "feature".

        I fail to see why this behaviour is inconsistent. One item is a regex, the other item is not.

        What is a "regex?" One item had been a Regexp, but it was reblessed. Both items were blessed into the 'P' package. If we accept what perldoc -f bless tells us, then both items were objects in the P package. The P package implemented its own stringification. According to perlop in reference to the binding operator, "if the right argument is an expression rather than a search pattern, substitution, or transliteration, it is interpreted as a search pattern at run time," so a module author should reasonably be able to expect their stringification to act as a search pattern at run time when used as one. But that's not how the reference originating from qr// works. So, two references both blessed into the same class can act very differently depending only on their origin. Do you see now how that is inconsistent?

        I believe that it is a feature. And one that is exploited too. I think it is extremely unlikely that this behaviour will change, and if it does it will change over several versions as it must be deprecated first, then eliminated.

        I don't think of it as a feature. I don't know where it is exploited but I'd be curious to see any examples. I hope the behavior does change; I'd like to see Regexps elevated to a true Perl type (named REGEX per Larry's recommendation.) I don't think that changing it should be a long process requiring anything but a minimal deprecation period as it is essentially an implementation detail whose public exposure has so far remained undocumented.

        Precisely the problem I am trying to address. How do I tell a string from a regex?

        Ideally, it shouldn't be so difficult to do in the first place, right? You should be able to use code like if ref $r eq 'Regexp' or if UNIVERSAL::isa($r, 'Regexp') and the fact that you can't is a symptom rather than the real problem itself. You've got a work around for the symptom, but the real problem remains.
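        For what it's worth, a quick sketch of why those two tests fall over once the reference has been reblessed ('P' is just an arbitrary class name with no overloading and no parents):

```perl
use strict;
use warnings;

my $plain     = qr/foo/;
my $reblessed = bless qr/foo/, 'P';    # 'P' is an arbitrary, otherwise empty class

for my $r ($plain, $reblessed) {
    printf "ref=%-6s  isa(Regexp)=%-3s  still matches=%s\n",
        ref $r,
        (UNIVERSAL::isa($r, 'Regexp') ? 'yes' : 'no'),
        ('foobar' =~ $r               ? 'yes' : 'no');
}
```

        Both references still match as regexes, but only the first one passes either class-based test.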

        I don't get it. This is exactly the behaviour I would expect given that it is not documented that you shouldn't write a Regexp package of your own.

        It isn't documented that you shouldn't write a Foo package of your own either and yet, if you do, the thingies returned by qr// won't suddenly get access to your Foo methods. As an illustration:

        #!/usr/bin/perl -w
        use strict;

        package Foo;
        sub foo { print "foo\n" }

        package Regexp;
        sub bar { print "bar\n" }

        package main;
        qr//->bar; # Works even though it probably shouldn't.
        qr//->foo; # Errors as expected.
        So, you must know about an undocumented feature in order to understand why qr//->bar; actually works in the above code. And that's the behavior you expect? Why on Earth would you expect that?

        By the way...

        $ perl -MB -le '$r=bless qr//,"P"; print B::svref_2object($r)->MAGIC->TYPE'
        r

        -sauoq
        "My two cents aren't worth a dime.";
        
Re^3: Code critique XS function for extracting a blessed regex's pattern.
by adrianh (Chancellor) on Feb 06, 2003 at 00:49 UTC
    My question is this: given an arbitrary blessed scalar ref, how does one efficiently determine if the object is in fact a regex?

    Once you bless a Regexp into another class it isn't a Regexp anymore... try:

    <update>As demerphq kindly pointed out I lied :-) Can you spot the silly mistake in the "demonstration" below :-)</update>

    my $bqr=bless qr/^blessed$/,"Foo";
    print "no match for $bqr\n" unless "normal" =~ m/$bqr/;

    :-)

    I guess you could subclass it (although this is something I've never tried) - in which case

    UNIVERSAL::isa($qr, 'Regexp')

    would be the right solution.

      Once you bless a Regexp into another class it isn't a Regexp anymore...

      Nope. The magic doesn't go away. As you can see.

      sub t {
          printf "%10s %s /%s/\n", $_[0], ($_[0]=~/$_[1]/ ? "=~" : "!="), $_[1];
      };
      $bqr=bless qr/^blessed$/,"Foo";
      $qr=qr/^normal$/;
      foreach $rex ($bqr,$qr) {
          t($_,$rex) foreach qw(normal blessed);
      }
      __END__
          normal != /Foo=SCALAR(0x1abf1d8)/
         blessed =~ /Foo=SCALAR(0x1abf1d8)/
          normal =~ /(?-xism:^normal$)/
         blessed != /(?-xism:^normal$)/
      In fact I think it's considered a feature. The possibilities are kinda interesting. :-)

      --- demerphq
      my friends call me, usually because I'm late....

        D'oh - learn something new every day! Thanks for putting this poor fool straight :-)

        (Moral - don't type code examples at 1am. Bad adrian)