Hi. After a few years of Perl hacking, and learning ive finally started exploring the perl guts, and written my first XS routine. For some background the problem this routine tries to solve is the following:
my $qr=qr/^normal$/; my $bqr=bless qr/^blessed$/,"Foo"; print "Normal : $qr\n"; print "Blessed: $bqr\n";
which outputs the less than useful
Normal : (?-xism:^normal$) Blessed: Foo=SCALAR(0x1acf020)
which means that even though $bqr is a regex you cant tell that it is from the perl point of view, at least not very easily. (You could hypothetically rebless the object into the class "Regexp" and then stringify it, and see if it matched /^(?[misx]+:/, but even that wouldn't be 100% safe, and if you had to do it to a lot of blessed items then it could get mighty slow.)

So I ripped the code that stringifies a regex out of PL_sv_2pv and turned it into the following XS sub. I was hoping that some of the perlguts/xs hackers out there would critique my code. Incidentally If you download Scalar::Util then this code should work just fine if placed into the Scalar::Util section of Util.xs that comes with that package. (Of course minor modifications are required for lib/Scalar/Util.pm to export the sub, I can publish them and my tests if anybody sees the need.)

Question, the code in PL_sv_2pv caches the stringified pattern (along with the modifiers it was compiled under) in mg->ptr. This code also does this. I couldnt decide if this was the correct behaviour or not. I lean towards thinking it is. There are a few other questions in comments in the code.

And as a warning, I've only compiled and tested this against 5.6.1 on a Win32 machine (I see no reason the later should be an issue in this case, I mention it only for completeness), I have no idea (yet) if it will work on newer or older versions of perl.

SV * regexp(sv) SV * sv PROTOTYPE: $ PREINIT: STRLEN patlen; char *pattern; char reflags[6]; int left; PPCODE: { /* Checks if a reference is a regex or not. If the parameter is not a ref, or is not the result of a qr// then returns undef. Otherwise in list context it returns the pattern and the modifiers, in scalar context it returns the pattern just as it would if the qr// was blessed into the package Regexp and stringified normally. */ if (SvMAGICAL(sv)) { /* is this if needed??? Why?*/ mg_get(sv); } if(!SvROK(sv)) { /* bail if we dont have a ref. */ XSRETURN_UNDEF; } patlen=0; left=0; if (SvTHINKFIRST(sv)) { sv = (SV*)SvRV(sv); if (sv) { MAGIC *mg; if (SvTYPE(sv)==SVt_PVMG) { if ( ((SvFLAGS(sv) & (SVs_OBJECT|SVf_OK|SVs_GMG|SVs_SMG|SVs_RMG)) == (SVs_OBJECT|SVs_RMG)) && (mg = mg_find(sv, 'r'))) { /* Housten, we have a regex! */ regexp *re = (regexp *)mg->mg_obj; I32 gimme = GIMME_V; if ( gimme == G_ARRAY ) { /* we are in list/array context so stringify the modifiers that apply. We ignore "negati +ve modifiers" in this scenario. Also we dont c +ache the modifiers. AFAICT there isnt anywhere f +or them to go. :-( */ char *fptr = "msix"; char ch; U16 reganch = (U16)((re->reganch & PMf_COMPILE +TIME) >> 12); while((ch = *fptr++)) { if(reganch & 1) { reflags[left++] = ch; } reganch >>= 1; } /* return the pattern and the modifiers */ + XPUSHs(sv_2mortal(newSVpvn(re->precomp,re->pre +len))); XPUSHs(sv_2mortal(newSVpvn(reflags,left))); XSRETURN(2); } else { /* Non array/list context. So we build up +the stringified form just as PL_sv_2pv does +, and like it we also cache the result. T +he entire content of the if() is directly +taken from PL_sv_2pv */ if (!mg->mg_ptr ) { char *fptr = "msix"; char ch; int right = 4; U16 reganch = (U16)((re->reganch & PMf +_COMPILETIME) >> 12); while((ch = *fptr++)) { if(reganch & 1) { reflags[left++] = ch; } else { reflags[right--] = ch; } reganch >>= 1; } if(left != 4) { reflags[left] = '-'; left = 5; } mg->mg_len = re->prelen + 4 + left; New(616, mg->mg_ptr, mg->mg_len + 1 + +left, char); Copy("(?", mg->mg_ptr, 2, char); Copy(reflags, mg->mg_ptr+2, left, char +); Copy(":", mg->mg_ptr+left+2, 1, char); Copy(re->precomp, mg->mg_ptr+3+left, r +e->prelen, char); mg->mg_ptr[mg->mg_len - 1] = ')'; mg->mg_ptr[mg->mg_len] = 0; } /* return the pattern in (?msix:..) format + */ XPUSHs(sv_2mortal(newSVpvn(mg->mg_ptr, mg- +>mg_len))); XSRETURN(1); } } } } } /* 'twould appear it aint a regex, so return undef/empty list */ XSRETURN_UNDEF; }
Thoughts for future work: Id like to make the code that wraps the original pattern in the (?msix:) be smarter. For instance
my $qr=qr/^foo$/; while (length $qr < 1000) { print "$qr\n"; $qr=eval "qr/$qr/" or die $@; }
produces
(?-xism:^foo$) (?-xism:(?-xism:^foo$)) (?-xism:(?-xism:(?-xism:^foo$))) (?-xism:(?-xism:(?-xism:(?-xism:^foo$)))) (?-xism:(?-xism:(?-xism:(?-xism:(?-xism:^foo$))))) (?-xism:(?-xism:(?-xism:(?-xism:(?-xism:(?-xism:^foo$)))))) ...
Which bothers me a touch. :-) (Note that the above XS routine would do the same thing.)

At the very least the wrapper shouldn't be added if its completely redundant. Of course this means parsing the pattern quite carefully, but I think its doable. tye made the point (which I misunderstood at the time) that during the (normal) parsing of the pattern it should be possible to "reduce" the options to avoid ones that are redundant. However this approach is certainly beyond my C/XS skills right now. I have to admit that im leaning towards providing a block for this in a perl wrapper to the XS instead. (Ideas and assistance welcome ;-)

Thanks in advance to any tips, suggestions, beefs, complaints or advice.

Note As I reviewed this node before posting I noticed that i could probably remove the PREINIT block, and put the variables within into a more restricted scope. But I don't have the time to test this change, and I know the posted code does work so i've left it...

--- demerphq
/me thinks that red Jaguar E-types are cool, and he doesn't even drive!


In reply to Code critique XS function for extracting a blessed regex's pattern. by demerphq

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.