BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

I have a blessed SV * to which, when it runs out of space, I need to prepend an extra character at the front. I can do this by mallocing a new chunk of RAM, copying the contents from the old sv_pv into the new space starting at offset 1, and then assigning the address of the new space back into the sv_pv. Simple, but rather inefficient.

What I would like to do is prepend an extra 100 characters to the front (as above) and then juggle the PVX, CUR and LEN fields of the SV so that perl sees only one of the new bytes. Then, on subsequent occasions when I need a new character, I would adjust PVX and CUR to provide it from the reserve without needing to realloc and copy. Seems logical, but my attempts to make it work keep trapping.
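The reserve idea described above can be sketched outside Perl first. What follows is a minimal stand-alone C model, not XS and not Perl's own buffer handling; `FrontBuf`, `fb_init`, `fb_str`, `fb_prepend`, `fb_free` and `HEADROOM` are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HEADROOM 100  /* spare bytes added in front on each reallocation */

/* Hypothetical stand-alone buffer, not a Perl SV. */
typedef struct {
    char  *base;  /* start of the malloc'd block (what free() needs) */
    size_t off;   /* unused bytes between base and the visible string */
    size_t cur;   /* visible string length, excluding the trailing NUL */
} FrontBuf;

/* Create a buffer holding a copy of s, with no headroom yet. 0 on success. */
static int fb_init(FrontBuf *b, const char *s) {
    b->cur  = strlen(s);
    b->off  = 0;
    b->base = malloc(b->cur + 1);
    if (!b->base) return -1;
    memcpy(b->base, s, b->cur + 1);
    return 0;
}

/* Pointer to the visible string (the analogue of the PV pointer). */
static char *fb_str(const FrontBuf *b) { return b->base + b->off; }

/* Prepend one byte; reallocates only when the headroom is used up. */
static int fb_prepend(FrontBuf *b, char c) {
    assert(b->base != NULL);
    if (b->off == 0) {
        /* No reserve left: allocate HEADROOM extra bytes and copy once. */
        char *nb = malloc(HEADROOM + b->cur + 1);
        if (!nb) return -1;
        memcpy(nb + HEADROOM, b->base, b->cur + 1);
        free(b->base);
        b->base = nb;
        b->off  = HEADROOM;
    }
    b->off--;               /* claim one byte from the reserve */
    b->base[b->off] = c;
    b->cur++;
    return 0;
}

static void fb_free(FrontBuf *b) { free(b->base); b->base = NULL; }
```

In this model the first `fb_prepend` pays for one allocation and one copy, and the next 99 calls only move an offset. The point to notice is that the visible pointer and the allocation start are tracked separately, so `free` always receives the address that `malloc` returned.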

Is this possible without breaking when the SV is used by Perl? Are there any macros to help? Or examples? Or can someone show me how to do it?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re: XS Prepending space to an SVs PV
by creamygoodness (Curate) on Apr 26, 2006 at 12:33 UTC

    The key concept that salva exploits in his excellent example is the "offset OK hack". When you call substr in Perl and lop a few characters off the front of a scalar... instead of reallocating and copying, Perl performs the following steps:

    1. Store the number of bytes to get lopped in the scalar's IV.
    2. Move the SvPVX pointer forwards.
    3. Set the scalar's OOK flag.
    4. Turn off the scalar's IOK flag.
    5. Adjust the scalar's CUR and LEN to reflect the change.

    $ perl -MDevel::Peek -e \
        'my $toes = "potatoes"; substr($toes, 0, 4, ""); Dump($toes);'
    SV = PVIV(0x1801a20) at 0x1801434
      REFCNT = 1
      FLAGS = (PADBUSY,PADMY,POK,OOK,pPOK)
      IV = 4  (OFFSET)
      PV = 0x300b64 ( "pota" . ) "toes"\0
      CUR = 4
      LEN = 5

    See the section "Offsets" in perlguts for a thorough explanation, as well as the sv_chop function in perlapi. Perl_sv_chop in the sv.c source code is only a few lines long, so you might also want to snoop that.
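The five steps listed above can be followed in a small stand-alone C model. This is a sketch only: `ModelSV`, `model_chop` and `model_alloc_start` are hypothetical names, the struct is not Perl's real SV layout, and the IOK flag is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the fields involved; not Perl's real SV layout. */
typedef struct {
    char  *pvx;  /* pointer to the visible string (SvPVX) */
    size_t cur;  /* visible length (SvCUR) */
    size_t len;  /* bytes available from pvx to the end of the buffer (SvLEN) */
    size_t iv;   /* bytes hidden in front of pvx while ook is set */
    int    ook;  /* offset-OK flag */
} ModelSV;

/* Drop n bytes from the front without copying, following the steps above. */
static void model_chop(ModelSV *sv, size_t n) {
    assert(n <= sv->cur);
    if (n == 0) return;
    sv->iv   = (sv->ook ? sv->iv : 0) + n;  /* 1. remember the offset */
    sv->pvx += n;                            /* 2. move the pointer forwards */
    sv->ook  = 1;                            /* 3. set OOK */
                                             /* 4. IOK off: not modelled */
    sv->cur -= n;                            /* 5. adjust CUR and LEN */
    sv->len -= n;
}

/* Start of the underlying allocation, needed to free or reallocate it. */
static char *model_alloc_start(const ModelSV *sv) {
    return sv->pvx - (sv->ook ? sv->iv : 0);
}
```

Starting from an 8-byte string in a 9-byte buffer and chopping 4 bytes, the model ends with an offset of 4, CUR of 4 and LEN of 5, the same numbers as in the Devel::Peek dump. The stored offset is what lets the original start of the buffer be recovered later.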

    --
    Marvin Humphrey
    Rectangular Research ― http://www.rectangular.com
Re: XS Prepending space to an SVs PV
by salva (Canon) on Apr 26, 2006 at 11:39 UTC
    I had to do something similar on my Tie::Array::Packed module:
    static char *
    my_sv_unchop(pTHX_ SV *sv, STRLEN size) {
        STRLEN len;
        char *pv = SvPV(sv, len);
        IV off = SvOOK(sv) ? SvIVX(sv) : 0;
        if (!size)
            return pv;
        if (off >= size) {
            /* the chopped-off prefix is big enough: move the pointer back */
            SvLEN_set(sv, SvLEN(sv) + size);
            SvCUR_set(sv, len + size);
            SvPV_set(sv, pv - size);
            if (off == size)
                SvFLAGS(sv) &= ~SVf_OOK;
            else
                SvIV_set(sv, off - size);
        }
        else if (len + size <= off + SvLEN(sv)) {
            /* the buffer as a whole is big enough: drop the offset
               and shift the string to the right */
            if (off) {
                SvLEN_set(sv, SvLEN(sv) + off);
                SvFLAGS(sv) &= ~SVf_OOK;
            }
            SvCUR_set(sv, len + size);
            SvPV_set(sv, pv - off);
            Move(pv, pv + size - off, len, char);
        }
        else {
            /* not enough room: build the string in a mortal and
               hand it over with sv_setsv */
            SV *tmp = sv_2mortal(newSV(len + size));
            STRLEN tmp_len;
            char *tmp_pv;
            SvPOK_on(tmp);
            tmp_pv = SvPV(tmp, tmp_len);
            Move(pv, tmp_pv + size, len, char);
            SvCUR_set(tmp, size + len);
            sv_setsv(sv, tmp);
        }
        return SvPVX(sv);
    }
    This function inserts size bytes in front of the string efficiently and returns a pointer to the start of the new pv.

    You would probably want to modify it so that more bytes than requested are reserved on the string to optimize consecutive insertions.

    Note that sv_setsv is used so that the string contents are copied only once: it steals the string memory from the source SV when that SV is a mortal.

      Modified to accept an additional argument that indicates how much extra space should be allocated when reallocation of the string memory is required:
      static char *
      my_sv_unchop(pTHX_ SV *sv, STRLEN size, STRLEN reserve) {
          STRLEN len;
          char *pv = SvPV(sv, len);
          IV off = SvOOK(sv) ? SvIVX(sv) : 0;
          if (!size)
              return pv;
          if (off >= size) {
              SvLEN_set(sv, SvLEN(sv) + size);
              SvCUR_set(sv, len + size);
              SvPV_set(sv, pv - size);
              if (off == size)
                  SvFLAGS(sv) &= ~SVf_OOK;
              else
                  SvIV_set(sv, off - size);
          }
          else {
              size += reserve;
              if ((size < reserve) || (len + size < size))
                  Perl_croak(aTHX_ "panic: memory wrap");
              if (len + size <= off + SvLEN(sv)) {
                  SvCUR_set(sv, len + size);
                  SvPV_set(sv, pv - off);
                  Move(pv, pv + size - off, len, char);
                  if (off) {
                      SvLEN_set(sv, SvLEN(sv) + off );
                      SvFLAGS(sv) &= ~SVf_OOK;
                  }
              }
              else {
                  SV *tmp = sv_2mortal(newSV(len + size));
                  char *tmp_pv;
                  SvPOK_on(tmp);
                  tmp_pv = SvPV_nolen(tmp);
                  Move(pv, tmp_pv + size, len, char);
                  SvCUR_set(tmp, len + size);
                  sv_setsv(sv, tmp);
              }
              if (reserve)
                  sv_chop(sv, SvPVX(sv) + reserve);
          }
          return SvPVX(sv);
      }
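The function above picks one of three paths depending on how much room is available. The decision can be isolated as a pure function, which makes the three cases easier to see. This is a stand-alone sketch: `Strategy` and `pick_strategy` are hypothetical names, and the overflow check that the original guards with `Perl_croak` is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { USE_OFFSET, SHIFT_IN_PLACE, REALLOCATE } Strategy;

/*
 * off:     bytes currently hidden in front of the string (0 if OOK is not set)
 * cur:     current string length
 * len:     bytes available from the string pointer to the end of the buffer
 * size:    bytes to prepend
 * reserve: extra headroom requested whenever the data has to move anyway
 */
static Strategy pick_strategy(size_t off, size_t cur, size_t len,
                              size_t size, size_t reserve) {
    assert(size > 0);
    if (off >= size)
        return USE_OFFSET;       /* hidden prefix is big enough */
    size += reserve;             /* data must move: ask for headroom too */
    if (cur + size <= off + len)
        return SHIFT_IN_PLACE;   /* buffer as a whole is big enough */
    return REALLOCATE;           /* need a larger buffer */
}
```

For example, with 10 hidden bytes a 3-byte prepend only moves the pointer, whereas with 2 hidden bytes and a reserve of 100 the same request falls through to a reallocation unless the buffer already has 103 spare bytes.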

        Many thanks (again) for this code, it is very generous of you and so much easier to learn from than the pure reference material of the 'guts and 'api docs.

        I set about adapting the above sub to my needs, which are slightly less demanding than yours. The naming of sv_chop (historical; set in stone; not of your making) is slightly confusing with respect to Perl's chop, as they operate on different ends of the string. That makes sv_unchop even more confusing :)

        Anyway, whilst reading the docs on sv_chop, I noticed sv_insert and relating that back to creamygoodness' explanation and demonstration that used substr to prepend to a string, I thought I'd see if I could use that to simplify things a little. What I came up with is this:

        if( *a == 9 ) {                  /* Time to prepend another byte */
            if( SvOOK( n ) ) {           /* If we've some reserve left use it */
                SvLEN_set( n, SvLEN( n ) +1 );
                SvCUR_set( n, ++l );
                SvPV_set( n, --a );
            }
            else {                       /* else insert 100 more bytes and use sv_chop
                                            to reserve 99 of them for later */
                char pad[100] = { 0, };  /* Initialise the reserve to all zeros */
                sv_insert( n, 0, 0, pad, 100 );
                sv_chop( n, SvPVX( n ) + 99 );
                a = SvPVX( n );
                ++l;
            }
        }
        ...

        Whether that could be useful to you I don't know, but if you have the time to cast an eye over it and tell me if you see any obvious faux pas... thanks again.
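One point that may be worth checking in the reserve branch: salva's my_sv_unchop also decrements the stored offset and clears OOK once it reaches zero, while the snippet as posted adjusts only PVX, CUR and LEN. Whether that matters in practice is not established in this thread. The stand-alone model below shows what the extra bookkeeping looks like; `ModelSV` and `model_unchop1` are hypothetical names and the struct is not Perl's real SV layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the fields involved; not Perl's real SV layout. */
typedef struct {
    char  *pvx;  /* pointer to the visible string */
    size_t cur;  /* visible length */
    size_t len;  /* bytes available from pvx to the end of the buffer */
    size_t iv;   /* bytes hidden in front of pvx while ook is set */
    int    ook;  /* offset-OK flag */
} ModelSV;

/* Take one byte back from the hidden prefix.
   Returns 1 on success, 0 if there is no reserve left. */
static int model_unchop1(ModelSV *sv, char c) {
    assert(sv->pvx != NULL);
    if (!sv->ook || sv->iv == 0)
        return 0;
    sv->pvx -= 1;        /* step the pointer back into the reserve */
    sv->pvx[0] = c;
    sv->cur += 1;
    sv->len += 1;
    sv->iv  -= 1;        /* keep the stored offset in step with pvx */
    if (sv->iv == 0)
        sv->ook = 0;     /* no hidden bytes left: clear the flag */
    return 1;
}
```

With two hidden bytes, two calls succeed and leave the pointer at the start of the buffer with the flag cleared; a third call reports that the reserve is exhausted, which is the cue to insert a fresh block of padding.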



      I had to do something similar on my Tie::Array::Packed module

      I don't mind saying I wish you had sent me a mail about this; I would have been very happy to make Tie::Array::PackedC a pure-Perl backend to Tie::Array::Packed. (I guess some of the things I did in Tie::Array::PackedC didn't really translate over to XS.)

      As it is, I'm overjoyed that you did Tie::Array::Packed. It's pretty much exactly what I would have written if I had taken the time to implement what I wanted in XS, except probably better. :-)

      Anyway, I wonder how hard it would be to make a wrapper or something so that yours is used if it's available, falling back to mine if it's not...

      ---
      $world=~s/war/peace/g

        I don't mind saying I wish you had sent me a mail about this

        Yes, I should have done that; my apologies for being so impolite!

        I wonder how hard it would be to make a wrapper or something so that yours is used if it's available, falling back to mine if it's not...

        Quite easy I think, as your package implements a superset of the API provided by mine... I will try to do it.

        update: Tie::Array::Packed::Auto

Re: XS Prepending space to an SVs PV
by vkon (Curate) on Apr 26, 2006 at 11:13 UTC
    Reading perldoc perlguts, isn't this what you need?

        You can get and set the current length of the string stored in an SV
        with the following macros:

            SvCUR(SV*)
            SvCUR_set(SV*, I32 val)

        You can also get a pointer to the end of the string stored in the SV
        with the macro:

            SvEND(SV*)

        But note that these last three macros are valid only if "SvPOK()" is true.

    Also, search for SvCUR in perldoc perlguts.

    And, if all else fails, one can always do RTFS :):):)

      Getting and setting the length isn't the problem, especially as it is normally done for you. The problem is manipulating the combination of the 3 fields so that although the reality is

      malloc'd space [ | | | |s|o|m|e| |d|a|t|a| |h|e|r|e|\0]
                              ^
      SV--->PVX---------------|
            CUR-------------->|..........................| = 15
            LEN------|.....................................| = 19

      which is the reverse situation to the normal thing whereby the extra space is at the end rather than the beginning; the rest of the codebase will respect these settings.

      I've tried setting it up like this, but I get traps. I need to know whether I am setting it up wrong, or whether the rest of the codebase will always assume that CUR and LEN will start at the same place (ie. The same place pointed at by PVX)?


        This - famous Perl guts illustrated, and $5, will make you a coffee.

        I suspect you get traps because you substitute Perl's allocated memory with your own?

        Allocating more space than needed for an SV is why SvGROW exists:

            Although Perl will automatically grow strings for you, if you need to
            force Perl to allocate more memory for your SV, you can use the macro

                SvGROW(SV*, STRLEN newlen)

        Then, reading further in perlguts gives all your required manipulations:
            Offsets

            Perl provides the function "sv_chop" to efficiently remove characters
            from the beginning of a string; you give it an SV and a pointer to
            somewhere inside the PV, and it discards everything before the pointer.

            The efficiency comes by means of a little hack: instead of actually
            removing the characters, "sv_chop" sets the flag "OOK" (offset OK) to
            signal to other functions that the offset hack is in effect, and it
            puts the number of bytes chopped off into the IV field of the SV. It
            then moves the PV pointer (called "SvPVX") forward that many bytes, and
            adjusts "SvCUR" and "SvLEN".