demerphq has asked for the wisdom of the Perl Monks concerning the following question:

Hi. After a few years of Perl hacking, and learning ive finally started exploring the perl guts, and written my first XS routine. For some background the problem this routine tries to solve is the following:
my $qr=qr/^normal$/; my $bqr=bless qr/^blessed$/,"Foo"; print "Normal : $qr\n"; print "Blessed: $bqr\n";
which outputs the less than useful
Normal : (?-xism:^normal$) Blessed: Foo=SCALAR(0x1acf020)
which means that even though $bqr is a regex you cant tell that it is from the perl point of view, at least not very easily. (You could hypothetically rebless the object into the class "Regexp" and then stringify it, and see if it matched /^(?[misx]+:/, but even that wouldn't be 100% safe, and if you had to do it to a lot of blessed items then it could get mighty slow.)

So I ripped the code that stringifies a regex out of PL_sv_2pv and turned it into the following XS sub. I was hoping that some of the perlguts/xs hackers out there would critique my code. Incidentally If you download Scalar::Util then this code should work just fine if placed into the Scalar::Util section of Util.xs that comes with that package. (Of course minor modifications are required for lib/Scalar/Util.pm to export the sub, I can publish them and my tests if anybody sees the need.)

Question, the code in PL_sv_2pv caches the stringified pattern (along with the modifiers it was compiled under) in mg->ptr. This code also does this. I couldnt decide if this was the correct behaviour or not. I lean towards thinking it is. There are a few other questions in comments in the code.

And as a warning, I've only compiled and tested this against 5.6.1 on a Win32 machine (I see no reason the later should be an issue in this case, I mention it only for completeness), I have no idea (yet) if it will work on newer or older versions of perl.

SV * regexp(sv) SV * sv PROTOTYPE: $ PREINIT: STRLEN patlen; char *pattern; char reflags[6]; int left; PPCODE: { /* Checks if a reference is a regex or not. If the parameter is not a ref, or is not the result of a qr// then returns undef. Otherwise in list context it returns the pattern and the modifiers, in scalar context it returns the pattern just as it would if the qr// was blessed into the package Regexp and stringified normally. */ if (SvMAGICAL(sv)) { /* is this if needed??? Why?*/ mg_get(sv); } if(!SvROK(sv)) { /* bail if we dont have a ref. */ XSRETURN_UNDEF; } patlen=0; left=0; if (SvTHINKFIRST(sv)) { sv = (SV*)SvRV(sv); if (sv) { MAGIC *mg; if (SvTYPE(sv)==SVt_PVMG) { if ( ((SvFLAGS(sv) & (SVs_OBJECT|SVf_OK|SVs_GMG|SVs_SMG|SVs_RMG)) == (SVs_OBJECT|SVs_RMG)) && (mg = mg_find(sv, 'r'))) { /* Housten, we have a regex! */ regexp *re = (regexp *)mg->mg_obj; I32 gimme = GIMME_V; if ( gimme == G_ARRAY ) { /* we are in list/array context so stringify the modifiers that apply. We ignore "negati +ve modifiers" in this scenario. Also we dont c +ache the modifiers. AFAICT there isnt anywhere f +or them to go. :-( */ char *fptr = "msix"; char ch; U16 reganch = (U16)((re->reganch & PMf_COMPILE +TIME) >> 12); while((ch = *fptr++)) { if(reganch & 1) { reflags[left++] = ch; } reganch >>= 1; } /* return the pattern and the modifiers */ + XPUSHs(sv_2mortal(newSVpvn(re->precomp,re->pre +len))); XPUSHs(sv_2mortal(newSVpvn(reflags,left))); XSRETURN(2); } else { /* Non array/list context. So we build up +the stringified form just as PL_sv_2pv does +, and like it we also cache the result. T +he entire content of the if() is directly +taken from PL_sv_2pv */ if (!mg->mg_ptr ) { char *fptr = "msix"; char ch; int right = 4; U16 reganch = (U16)((re->reganch & PMf +_COMPILETIME) >> 12); while((ch = *fptr++)) { if(reganch & 1) { reflags[left++] = ch; } else { reflags[right--] = ch; } reganch >>= 1; } if(left != 4) { reflags[left] = '-'; left = 5; } mg->mg_len = re->prelen + 4 + left; New(616, mg->mg_ptr, mg->mg_len + 1 + +left, char); Copy("(?", mg->mg_ptr, 2, char); Copy(reflags, mg->mg_ptr+2, left, char +); Copy(":", mg->mg_ptr+left+2, 1, char); Copy(re->precomp, mg->mg_ptr+3+left, r +e->prelen, char); mg->mg_ptr[mg->mg_len - 1] = ')'; mg->mg_ptr[mg->mg_len] = 0; } /* return the pattern in (?msix:..) format + */ XPUSHs(sv_2mortal(newSVpvn(mg->mg_ptr, mg- +>mg_len))); XSRETURN(1); } } } } } /* 'twould appear it aint a regex, so return undef/empty list */ XSRETURN_UNDEF; }
Thoughts for future work: Id like to make the code that wraps the original pattern in the (?msix:) be smarter. For instance
my $qr=qr/^foo$/; while (length $qr < 1000) { print "$qr\n"; $qr=eval "qr/$qr/" or die $@; }
produces
(?-xism:^foo$) (?-xism:(?-xism:^foo$)) (?-xism:(?-xism:(?-xism:^foo$))) (?-xism:(?-xism:(?-xism:(?-xism:^foo$)))) (?-xism:(?-xism:(?-xism:(?-xism:(?-xism:^foo$))))) (?-xism:(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:^foo$)))))) ...
Which bothers me a touch. :-) (Note that the above XS routine would do the same thing.)

At the very least the wrapper shouldn't be added if its completely redundant. Of course this means parsing the pattern quite carefully, but I think its doable. tye made the point (which I misunderstood at the time) that during the (normal) parsing of the pattern it should be possible to "reduce" the options to avoid ones that are redundant. However this approach is certainly beyond my C/XS skills right now. I have to admit that im leaning towards providing a block for this in a perl wrapper to the XS instead. (Ideas and assistance welcome ;-)

Thanks in advance to any tips, suggestions, beefs, complaints or advice.

Note As I reviewed this node before posting I noticed that i could probably remove the PREINIT block, and put the variables within into a more restricted scope. But I don't have the time to test this change, and I know the posted code does work so i've left it...

--- demerphq
/me thinks that red Jaguar E-types are cool, and he doesn't even drive!

Replies are listed 'Best First'.
Re: Code critique XS function for extracting a blessed regex's pattern.
by sauoq (Abbot) on Feb 05, 2003 at 21:58 UTC

    That seems like an awful lot of work to go to to replace a single line in package Foo...

    package Foo; use overload '""' => sub { qr/$_[0]/ }; package main; my $qr=qr/^normal$/; my $bqr=bless qr/^blessed$/,"Foo"; print "Normal : $qr\n"; print "Blessed: $bqr\n";

    -sauoq
    "My two cents aren't worth a dime.";
    
      Wow! Im impressed, where were you when I wrote the first node? :-) Very nice idea indeed.

      But, unfortunately it doesnt address the question im trying to solve. My question is this: given an arbitrary blessed scalar ref, how does one efficiently determine if the object is in fact a regex? Your solution, which i personally think is rather ingenious, solves "How do I make a blessed ref, when stringified, return the pattern?". Which is I think useful indeed, but unfortunately not what I need. (I recognize I may not have specified the requirement sufficiently.)

      Even though this is a solution from the point of view of designing a class, it has the problem that its underlying concept, that of qr//ing the value, doesn't generalize. How do you detect a failure? There would be no way to determine if the wrapped object actually had produced regex, or just a ref stringified, or any number of other magic events.

      Anyway, ++ for the idea...

      --- demerphq
      my friends call me, usually because I'm late....

        My question is this: given an arbitrary blessed scalar ref, how does one efficiently determine if the object is in fact a regex?

        OK, I see the "problem" you are trying to solve now. I'm still not sure it is really worth solving though. In fact, your code might do more harm than good if its use became widespread. Why? Because it uses an undocumented "feature" of an undocumented quasi-type to provide functionality of questionable necessity to people writing ill-conceived code.

        Regexp thingies are a terrible kludge. They drift about in limbo, being neither entities of a true Perl type nor normal objects. Yes, you can play some tricks with them but that doesn't mean it is a good idea to do so. The fact that the blessed reference returned by qr// keeps its magical regular expression value after being reblessed into another class is probably not a good thing; it may even be a bug. Regardless, it is undocumented and we shouldn't rely on the behavior. (All of which begs the question of whether we should even rely on qr// returning a blessed reference in the first place.)

        If Regexp objects are elevated to a real Perl type someday, then code like

        my $r = bless qr/foo/, "MyPackage";
        probably won't even work and we'll be forced into writing code that is consistent with other types. Instead of getting a reference directly from qr// we'll have to take a reference to whatever it returns and bless that instead. There's no reason not to do that now. Code like
        my $r = bless \qr/foo/, "MyPackage";
        should continue to work even if Regexps are promoted to a real type. It does require that $$r is used when you want to get at the underlying regular expression but dereferencing isn't that much of an inconvenience, is it?.

        The whole mess gets even stickier when you consider that strings can be used in much the same way that precompiled regexes are.

        $ perl -le 'my $r = "bar"; print "yes" if "foobarbaz" =~ $r' yes
        Now, keep that in mind as you reconsider the issue of whether Regexp thingies should maintain their magic after being reblessed into another class. It can lead to inconsistent behavior. For instance:
        #!/usr/bin/perl -w use strict; package P; use overload '""' => sub { 'stringified' }; package main; local $\ = "\n"; my $regex = qr/match/; bless $regex, 'P'; my $plain = \my $t; bless $plain, 'P'; print '"stringified" matched $regex' if "stringified" =~ $regex; print '"stringified" matched $plain' if "stringified" =~ $plain; __END__ "stringified" matched $plain
        So, because of Regexps, not all references are created equal. Bummer.

        Yet another inconsistency due to the Regexp quasi-pseudo-sorta class is that you can write your own Regexp package and the things returned by qr// get access to your methods.

        #!/usr/bin/perl -w use strict; package Regexp; sub new { my $r; bless \$r } sub f { q("I'm a Regexp.") } package main; local $\ = "\n"; my $qr = qr/foo/; my $ob = Regexp->new(); print '$qr says, ', $qr->f; print '$ob says, ', $ob->f; print '$qr isa Regexp' if $qr->isa('Regexp'); print '$ob isa Regexp' if $ob->isa('Regexp'); print '$qr: ', $qr; print '$ob: ', $ob; __END__ $qr says, "I'm a Regexp." $ob says, "I'm a Regexp." $qr isa Regexp $ob isa Regexp $qr: (?-xism:foo) $ob: Regexp=SCALAR(0x805f148)
        That's not very nice behavior given that it isn't, AFAIK, documented that you shouldn't write a Regexp package of your own.

        All of this leads me to the conclusion that, if someone actually finds your XS code useful, they are almost certainly doing things that they ought not be doing anyway. ;-)

        -sauoq
        "My two cents aren't worth a dime.";
        
        My question is this: given an arbitrary blessed scalar ref, how does one efficiently determine if the object is in fact a regex?

        Once you bless a Regexp into another class it isn't a Regexp anymore... try:

        <update>As demerphq kindly pointed out I lied :-) Can you spot the silly mistake in the "demonstration" below :-)</update>

        my $bqr=bless qr/^blessed$/,"Foo"; print "no match for $bqr\n" unless "normal" =~ m/$bqr/;

        :-)

        I guess you could subclass it (although this is something I've never tried) - in which case

        UNIVERSAL::isa($qr, 'Regexp')

        would be the right solution.

Re: Code critique XS function for extracting a blessed regex's pattern.
by demerphq (Chancellor) on Feb 05, 2003 at 13:43 UTC
    Podmaster has kindly informed me that this code works fine on 5.7 and 5.8.

    He also pointed out that I should mention that on Win32 if you have nmake.exe that you will need to patch the makefile.pl that comes with Scalar::Util so that it uses nmake instead of make. (Almost the last line in the script.) Othwerwise it will fail a pretest and claim incorrectly that the code can't be built on your system.

    --- demerphq
    my friends call me, usually because I'm late....

Re: Code critique XS function for extracting a blessed regex's pattern.
by jryan (Vicar) on Feb 05, 2003 at 22:00 UTC

    The problem is with your initial code. Bless expects a reference, and a "regex object" is just a string. If you pass a string to bless, it will force the string to become a reference, and that is why you get something that looks likeFoo=SCALAR(0x1acf020). However, if you pass bless a reference to the "regex object", then everything is fine. Observe:

    my $qr=qr/^normal$/; my $bqr=bless \(qr/^blessed$/),"Foo"; print "Normal : $qr\n"; print "Blessed: $$bqr\n";
      Bless expects a reference, and a "regex object" is just a string.

      Errr... not so. Witness:

      $ perl -le 'print ref qr/foo/' Regexp

      -sauoq
      "My two cents aren't worth a dime.";