in reply to XS: Manipulating refcounts

First of all note that I do not have much experience with C and/or XS.

So what I'm saying might make sense or not.

What you are doing is a bit dangerous... (Or atleast I think so).

You are creating an Artist and then creating a referenceto that artist (which s not a perl structure). So what you are doing is saving the pointer of *artist in $$artist.

This can easily go wrong... For example: $$artist = 0; or $$artitst--;. If you then call your C code then you will - most likely - get a segmentation fault because you cannot access that memory space...

I think a better way to do this would be to store the actual Artist object in an array or so in the C code and return an SV/RV with the index in the array.

So far for the artists, for the CDs I think it would be easier to store the SV (after incrementing it's reference count ofcourse), and not the Artist...

This would also eliminate the need to store the RV/SV in the Artist object...

At the moment I haven't played enough with the code to give you a full (and working) example of what I mean. (so consider this as some initial feedback - some of which might be incorrect)... I will try to do this later today

Replies are listed 'Best First'.
Re^2: XS: Manipulating refcounts
by Animator (Hermit) on Sep 20, 2006 at 11:19 UTC

    I played a bit with it...

    This is the only thing I can come up with:

    • sv_setref_pv() upgrades perlref to an RV and returns it,
    • you then store this SV in artist->perlobj,
    • you return the SV. Because of some XS magic (I think) a new SV is created with the contents of the perlref SV,
    • then later in your code you increase the reference counter of the original SV. (perlref - but this is not the same SV as $artist, and even if it were then it would still be wrong since you should be incrementing the reference counter of $$artist).

    What you should be doing instead (I think) is store the thing it refences to in perlobj and increase the counter of that.
    That is: use:

    sv_setref_pv(perlref, "Artist", (void*)artist); artist->perlobj = SvRV(perlref);
    instead of:
    artist->perlobj = sv_setref_pv(perlref, "Artist", (void*)artist);

    The attached code does that and is (IMHO) a bit easier to understand...

    The output of the code:

    Creating artist: Johnny Houseburner Refcount: $artist: 1 Refcount: $$artist: 1 Creating CD (artist = Johnny Houseburner) Refcount: $artist: 1 Refcount: $$artist: 2 Refcount: $cd: 1 Refcount: $$cd: 1 undef $artist Artist name via cd: Johnny Houseburner undef $cd CD::DESTROY called. Artist::DESTROY called (Johnny Houseburner) End block End code

    (Small note: this code is still easy to segfault: all you need to do is change $$artist.)

      Beautifully done, Animator. Your rewrite establishes that what I'd hoped to do is possible. It's clearer IMHO as well. :)

      My crucial error was misunderstanding the return value of sv_setref_pv. I'd thought it returned the object rather than the reference. Once artist->perlobj is assigned the correct value, everything falls into place, as you show. Now I can go implement this algo. Brilliant!

      you return the SV. Because of some XS magic (I think) a new SV is created with the contents of the perlref SV,

      Here's how things progress behind the scenes as control leaves new_artist and we enter XS_main_new_artist, the glue function generated by Inline::C/xsubpp:

      XS(XS_main_new_artist) { dXSARGS; if (items != 1) Perl_croak(aTHX_ "Usage: main::new_artist(name)"); PERL_UNUSED_VAR(cv); /* -W */ { char * name = (char *)SvPV_nolen(ST(0)); SV * RETVAL; RETVAL = new_artist(name); ST(0) = RETVAL; sv_2mortal(ST(0)); } XSRETURN(1); }
      1. The return value from new_artist is copied to RETVAL, which is copied in turn to ST(0), the top of the perl stack.
      2. ST(0) is "mortalized" by calling sv_2mortal( ST(0) ). That's essentially a delayed decrement of its refcount. It won't be freed right away as would be the case if you'd called SvREFCNT_dec -- instead, the cleanup will occur a bit later, after we've had the chance to do something with it.
      3. XSRETURN(1) is invoked, indicating that a single value is being returned via the stack, and control leaves XS land, heading back into world of stock Perl OPs.
      4. The Perl assignment operator = pops the top item off the stack and triggers a call to SvSetMagicSV, and the value of the popped SV is copied into the lexical variable $artist. Now the SV that we created back in new_artist can be freed. (Note that since the Perl interpreter can often tell when an SV is about to be freed, it will probably "steal" the contents and avoid doing unneeded memory allocation and copying).

      Now, consider that in my original program, the mortalized SV on the stack is the same SV that's housed in the Artist struct. I don't fully understand how reference counts affect things in such cases, but that was definitely not what I intended. It does not surprise me that undesirable consequences arose.

      Thanks for your help,

      --
      Marvin Humphrey
      Rectangular Research ― http://www.rectangular.com
Re^2: XS: Manipulating refcounts
by creamygoodness (Curate) on Sep 20, 2006 at 16:20 UTC

    Whatever your level with experience with C/XS I think you've tracked down the critical bug. :) More on that in a bit.

    You're absolutely right that this data structure should use SVs and not C structs. The very idea of using C structs when an ordinary hash would do is ridiculous! But then it wouldn't illustrate the problem I'm trying to solve.

    This technique is being developed for use with Apache Lucy, which will be a large C library with both Perl and Ruby bindings. The "object" member of the struct will actually be a void*, and it could be a Perl object, or a Ruby object, or eventually, something else. (Actually, I don't know exactly whether or how it will work with Ruby -- hopefully it will. :) ) All the Perl-specific C routines need to stay in the XS bindings and cannot be used in the C core. Dereferencing an actual SV and getting at its contents is not an option. However, we need some way of working with native garbage collection, and this is it.

    With regards to whether or not storing a pointer is dangerous... it's common practice, even in core modules. You can make DB_File segfault by decrementing the pointer in the tied object! So don't do that. :)

    #!/usr/bin/perl use strict; use warnings; use DB_File; my %hash; my $tied = tie %hash, 'DB_File', "garbagefile"; $hash{foo} = "bar"; print STDERR $tied->FETCH('foo') . "\n"; $$tied--; print STDERR $tied->FETCH('foo') . "\n"; __END__ Outputs: slothbear:~/perltest marvin$ perl ptrobj_segfault.plx bar Bus error slothbear:~/perltest marvin$
    --
    Marvin Humphrey
    Rectangular Research ― http://www.rectangular.com

      Ofcourse there are (CORE) modules doing it... But it's not that because they do it you should do it too... And in my opinion the tied hash from DB_File and your scalar are two completly differnt things...

      The way to make DB_File segfault is to get the tied object and then change it.

      The way to make your code segfault is to either change the thing it references to or pass the wrong thing to one of your other routines.

      For example:

      my $artist2; my $cd = new_cd($artist2);
      This code will segfault. Sure in this example it is (or atleast should be) obvious why it segfaults... But if you are passing $artist to several subroutines and maybe forgetting it/passing the wrong thing once then it is not that obvious...

      To prevent $$artist from being changed you could use: SvREADONLY_on(SvRV(perlref));.

      But this still leaves the other issue... As in, passing the wrong variable/undef to new_cd(); (for example)...

      I guess it comes down to:

      • Who will be using it?
      • How familiar are they with segfaults?
      • Do they expect to see a segfault in perl?
      • Will they think about your code as causing the segfault? (I guess that comes down to: how well is it documented)
      • Do you really care about the memory that an extra list would use?

      Update: removed accidental link.

        The standard way of throwing a meaningful error rather than segfaulting when supplied with bad input of the type you describe is to vet arguments using sv_derived_from, which is the perlapi function which implements isa. Manually inserted, such argument checking would look something like this...
        /* Constructor for CD. */ SV* new_cd(SV *artist_sv) { if (!sv_derived_from(artist_sv, "Artist") { Croak("Not an Artist"); } Artist *artist = INT2PTR( Artist*, SvIV(SvRV(artist_sv)) ); /* ... */ }

        That stops anything but Artist and its subclasses from getting through. If somebody does something like bless a hash into the "Artist" package and submit that we'll still get a segfault, but that's less likely to happen inadvertently.

        Typically, the argument-checking code is not typed in manually, but is inserted by xsubpp via a typemap which spares the programmer from the error-prone drudgery of repeating that code over and over.

        Rest assured that my production code always implements such checks. I agree that the example code is dangerous. I left out the safety code because the sample was already too long -- long enough to dissuade demerphq from looking at it, for example.

        --
        Marvin Humphrey
        Rectangular Research ― http://www.rectangular.com