in reply to Access via substr refs 2000 times slower

You're triggering substr's lvalue return, which involves magic.

use Devel::Peek;
{ my $subRef = \substr $string, 0;       Dump($$subRef); }
{ my $subRef = \(''.substr $string, 0);  Dump($$subRef); }
SV = PVLV(0x1834fdc) at 0x1831820
  REFCNT = 2
  FLAGS = (PADMY,GMG,SMG,pPOK)
  IV = 0
  NV = 0
  PV = 0x18208ec ""\0
  CUR = 0
  LEN = 4
  MAGIC = 0x182443c
    MG_VIRTUAL = &PL_vtbl_substr
    MG_TYPE = PERL_MAGIC_substr(x)
  TYPE = x
  TARGOFF = 0
  TARGLEN = 0
  TARG = 0x236dc8
  SV = PV(0x238e44) at 0x236dc8
    REFCNT = 2
    FLAGS = (POK,pPOK)
    PV = 0x182eca4 ""\0
    CUR = 0
    LEN = 4
SV = PV(0x238e80) at 0x1831808
  REFCNT = 2
  FLAGS = (PADBUSY,PADMY,POK,pPOK)
  PV = 0x183646c ""\0
  CUR = 0
  LEN = 4

By changing to

my $subRef = \(''.substr $string, 0);

I get

0.0120670795440674 0.0105710029602051 0.010854959487915

Update: scalar also works, and doesn't have the (albeit minute) overhead of calling concat.

my $subRef = \scalar substr $string, 0;
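
For a quicker check than a full Dump, ref() is enough to tell the two kinds of reference apart. A minimal sketch, assuming a small stand-in value for $string (not the OP's data) and using the concat form from above:

```perl
use strict;
use warnings;

my $string = 'the quick brown fox';   # stand-in value, for illustration only

my $lv   = \substr $string, 4, 5;            # reference to the magical lvalue
my $copy = \( '' . substr $string, 4, 5 );   # reference to a plain copy

print ref($lv),   "\n";   # LVALUE
print ref($copy), "\n";   # SCALAR

$$copy = 'slow';          # changes only the copy
print "$string\n";        # the quick brown fox

$$lv = 'slow';            # substr magic writes back into $string
print "$string\n";        # the slow brown fox
```

The copy is cheap to dereference but is cut off from $string; the LVALUE ref stays connected to it, and every access through it goes through the magic.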

Re^2: Access via substr refs 2000 times slower
by BrowserUk (Patriarch) on Dec 28, 2008 at 11:53 UTC

    The trouble is that either of those causes the referenced substring to be copied, effectively giving you a reference to an anonymous scalar that holds a copy of the substring. You might just as well do:

    my $substr = substr $bigstring, $start, $length;
    func( \$substr );

    The purpose of taking a substr ref was to avoid copying large chunks of a large string, and to allow the large string to be modified in place via that reference.

    That said, it seems that taking an (lvalue) substr ref also triggers copying these days, albeit with attached magic that means changes made to the copy also get applied to the original substring. Which is a bit cockeyed.

    It never used to, but obviously has for some time--at least since 5.8.6. I'm surprised that I've never noticed it before now. It rather devalues the purpose of taking a reference to a substring. Methinks whoever made the change did not really get lvalue refs.

    I feel the need to write some XS to (again) give me the ability to pass a reference to a substring around without causing that substring to be copied.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      It never used to, but obviously has for some time--at least since 5.8.6

      At least since 5.6.0.

      From what I can tell, 'get' magic works by storing the value from the magic handler into the SV, allowing the code that follows to ignore magic. (mg_get: Do magic after a value is retrieved from the SV.) That would explain the copy.

      Similarly, 'set' magic works by passing the value in the SV to the magic handler, allowing the code that precedes it to ignore magic. (mg_set: Do magic after a value is assigned to the SV.) That would also create a copy.

      If my understanding is correct, that means the problem isn't related to lvalue substr but with magic in general.
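
      The same get/set pattern can be watched from pure Perl with a tied scalar, which is a different kind of magic from substr's but goes through the same mg_get/mg_set hooks. A minimal sketch; the CountingScalar class is hypothetical and exists only for this illustration:

      ```perl
      use strict;
      use warnings;

      package CountingScalar;   # hypothetical class, for illustration only

      sub TIESCALAR {
          my ($class, $value) = @_;
          return bless { value => $value, fetch => 0, store => 0 }, $class;
      }

      # 'get' magic: called before the value is read; the result is stored in the SV.
      sub FETCH { my $self = shift; $self->{fetch}++; return $self->{value} }

      # 'set' magic: called after a value has been assigned to the SV.
      sub STORE { my ($self, $value) = @_; $self->{store}++; $self->{value} = $value }

      package main;

      my $obj = tie my $t, 'CountingScalar', 'abc';

      my $len = length $t;   # a read: FETCH runs, then length works on the SV's copy
      $t = 'abcdef';         # a write: the value lands in the SV, then STORE receives it

      print "fetches: $obj->{fetch}, stores: $obj->{store}\n";
      ```

      In both directions the value passes through the variable's own SV, which is the copy being discussed.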

      I wonder if this change is tied to the work that was done to allow copy-on-write strings. Modifying a large string in place via a reference doesn't look like it would play well with the idea of having multiple variables use copy-on-write so they can share one actual copy of a large string.

        Dunno! According to ikegami, lvalue refs have always caused the substring to be copied at least as long as I've been using Perl--circa 5.6.1. Though I'd have sworn they never used to.

        Also, despite the copying, using the ref as an lvalue still modifies the original string in-place:

        $s = 'the quick brown fox';;
        $r = \substr $s, 10, 5;;
        $$r = 'green';;
        print $s;;
        the quick green fox

        The same is true for multiple copies of the lvalue ref:

        $r2 = $r;;
        $$r2 = 'orange';;
        print $s;;
        the quick orange fox

        And from my reading of the dump of an lvalue ref, it carries all the information needed to access the substring. A reference to the original SV, and the offset & length of the substring:

        print Dump $r2;;
        SV = RV(0x186d8d0) at 0x196d088
          REFCNT = 1
          FLAGS = (ROK)
          RV = 0x196d0dc
          SV = PVLV(0x186c9f4) at 0x196d0dc
            REFCNT = 2
            FLAGS = (PADMY,GMG,SMG,pPOK)
            IV = 0
            NV = 0
            PV = 0x19be29c "orange"\0    ######## This seems to be redundant to me.
            CUR = 6
            LEN = 7
            MAGIC = 0x182a75c
              MG_VIRTUAL = &PL_vtbl_substr
              MG_TYPE = PERL_MAGIC_substr(x)
            TYPE = x
            TARGOFF = 10    ### Offset
            TARGLEN = 5     ### Length
            TARG = 0x2350bc
            SV = PV(0x2354ac) at 0x2350bc    ### The original SV
              REFCNT = 2
              FLAGS = (POK,pPOK)
              PV = 0x182a7bc "the quick orange fox"\0
              CUR = 20
              LEN = 21

        So making a copy of the substring seems redundant and profligate, as well as dashed inconvenient for my purposes.

        I'm also very skeptical of there being any real benefits to the Holy COW for Perl anyway.

        It is far more efficient to request a single block of pages from the OS and then REP MOVSD dst, src to duplicate it en masse, than to request it page by page and copy it piecemeal every time a reference count gets changed; or a string is used in a numeric context; or a number is interpolated into a string; or any of the myriad other 'read-only' touches to memory that would necessitate COW being invoked.

        One expensive kernel call and one relatively fast user-space operation, versus dozens, hundreds or thousands of expensive ring3-ring0-ring3 transitions, not to mention the cost of the cache flushes.


        It's not related to COW. For example, let's look at the following snippet from (the arbitrarily chosen) pp_concat function ("." operator):

        else { /* TARG == left */
            STRLEN llen;
            SvGETMAGIC(left);       /* or mg_get(left) may happen here */
            if (!SvOK(TARG)) {
                if (left == right && ckWARN(WARN_UNINITIALIZED))
                    report_uninit(right);
                sv_setpvn(left, "", 0);
            }
            (void)SvPV_nomg_const(left, llen);    /* Needed to set UTF8 flag */
            lbyte = !DO_UTF8(left);
            if (IN_BYTES)
                SvUTF8_off(TARG);
        }

        SvGETMAGIC(left) is what calls the 'get' magic and stores the result in the SV if the LHS is magical. If it didn't work that way, the following sv_setpvn, SvPV_nomg_const and DO_UTF8 would all have to be handled by the magic. It doesn't make sense to have every type of magic and every tied variable handle every perlapi and internal Perl function that might affect it.

        You could shorten the life of the copy to where it's needed, but Perl doesn't even do that for lexical variables. Their PV remains allocated when the variable is out of scope.

        >perl -MDevel::Peek -e"sub f { my $s; Dump $s if $i++; $s='abc' } f;f"
        SV = PV(0x238e44) at 0x182ee40
          REFCNT = 1
          FLAGS = (PADBUSY,PADMY)
          PV = 0x18250bc "abc"\0
          CUR = 3
          LEN = 4