
Hi Rob!

I think both the OP and you are not quite right about allocating memory for EdjeMessageStringSet. In my post below, see:

    EdjeMessageStringSet* m = (EdjeMessageStringSet*) safemalloc(
        sizeof(Edje_Message_String_Set) + (count-1)*sizeof(char *) );
    if (m == NULL)
        croak("Failed to allocate memory in _new function\n");
First, we have to talk about what this means:
    typedef struct {
        int count;
        char *str[1];  // may be fine with [] or even just char* str (no subscript at all)
    } Edje_Message_String_Set;
Normally a type has a fixed size that is known at compile time, and sizeof() works just fine. That is not true in this case. I would have expected a fixed-size Edje_Message_String_Set instead, like this:
    typedef struct {
        int count;
        char** string_array;
    } Edje_Message_String_Set;
Now Edje_Message_String_Set has a fixed size. To create an instance, you allocate memory for the struct with sizeof(). Then you allocate memory for an array of count pointers to strings, and put the address of this dynamically allocated array into string_array.

In theory, you can save one malloc() call, and potentially the space for the char** member, by allocating the dynamic array directly inside the Edje_Message_String_Set object rather than storing a pointer to a separately allocated array.

This means that you can't use a memory allocation interface that says "give me N objects of type X". A count plus a type is not sufficient here, because the actual size of the object is not known at compile time: sizeof() only covers the one declared element.

To put some numbers on this: Rob, you have a 64-bit machine, so for count == 1 we get 16 bytes: 4 bytes for the int, 4 bytes of padding so that the pointer is 8-byte aligned, and 8 bytes for the pointer to char. If, say, count == 3, then we need to allocate an additional 16 bytes for 2 more char*'s. Hence the safemalloc() math shown above: the first 16 bytes come from sizeof(Edje_Message_String_Set), but that is only the minimum size, enough for count == 1. If we had instead allocated 2 * sizeof(Edje_Message_String_Set), that would give us enough space for count == 3, not 2!

Re^3: Perl XS binding to a struct with an array of chars*
by syphilis (Archbishop) on Nov 24, 2022 at 01:10 UTC
    Normally a type has a fixed size that is known at compile time and sizeof() works just fine. That is not true in this case.

    Well, I think that the struct, as presented in the original post, does have a definite size (of 16 bytes).
    AFAIK specifying char *str[1] is equivalent to char * str.
    I just went with the spec provided, even though it looked rather odd.

    It did occur to me that the OP might have intended char ** string_array (as you've suggested), and I probably should have pressed MaxPerl about that.
    But, either way, the struct has a definite size - and we can assign memory to it based on that size (which is 16 bytes, on my Windows 11 64-bit system).

    I've no experience with structs that might require varying amounts of memory that can't be known until runtime. (I don't assume that such cases never arise.)

    I expect that the OP's strings have been created separately.
    Therefore, their number and size have no impact on the struct's memory allocation, because the struct just holds a pointer to the array of strings, no matter how large that array is.

      structs that might require varying amounts of memory

      There is a somewhat common use case where a zero-length or length-1 array is used at the end of a struct so that the struct can carry a variable amount of payload. The technique is hardly ever needed in C++ land; there are better ways to handle the problem there.

      One interesting use case in C land is data hiding: use a common header struct that provides high-level management information, then a variable-sized "tail" that gets cast to the struct type representing the hidden data. That makes the payload opaque to client code and avoids making two allocations when creating an instance of the struct. Kinda a poor man's C++, really!

      Yes, this is a weird duck. Quack. Quack.
      If I play further with this, I am tempted to see if char* str[]; is a valid syntax or not - having the blank dimension would certainly be nice as a "heads up". But yeah, from the OP's code, he expects the dynamic array to extend past the "starter header" as a continuation of the single element defined in the structure.

      Some OP code:

          for (index = 0; index <= count; index++) {
              tmp = *av_fetch(val_arr,index,0);
              string = SvPVutf8(tmp,len);
              message->str[index] = savepvn(string,len);
          }
      Of course since the whole purpose of XS is to write C code, something like this would be more appropriate (untested):
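          int n = count;
          char **p = m->str;   /* first char* slot of the trailing array */
          while (n--) {
              /* ...fetch the next SV into string/len, as in the loop above... */
              *p++ = savepvn(string, len);
          }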
      Using an index can be messier because, without optimization, it often results in code that calculates arrayBase + index*sizeOfElement to get each address. Walking a pointer eliminates this calculation. Incrementing the pointer (p++) by sizeOfElement is very fast because that size is a constant encoded in the machine instruction itself (not some number read from data memory); this instruction is typically called something like "add immediate to register". while (n--) is also fast because there are special op codes that decrement by one (with a shorter encoding than a general add), and special op codes that test for zero or non-zero. That said, optimizing compilers commonly perform this index-to-pointer transformation themselves.

      So far, I've learned some things about XS that were new to me, so this has been interesting. However, beyond being a good intellectual exercise, it is not at all clear that this will achieve a significant end goal. This is a lot of complication just to clone a Perl array of strings into a different format. For all I know, it could be "good enough" to leave the Perl data structures as they are and write good C code that accesses them for the intended (but as yet unstated) purpose.

      To proceed any further, we'd need to know more about what this is being used for.

        I am tempted to see if char* str[]; is a valid syntax or not

        I think it's valid, but it does make things tricky ... and maybe it's that trickiness that inspired this thread in the first place.
        It gets a lot easier if we're allowed to declare the struct as:
        struct _Edje_Message_String_Set { int count; AV * str; };
        Here's a demo using that very struct declaration.
            use strict;
            use warnings;

            use Inline C => Config =>
                BUILD_NOISY => 1,
                CLEAN_AFTER_BUILD => 0,
                USING => 'ParseRegExp',
                ;

            use Inline C => <<'EOC';

            struct _Edje_Message_String_Set {
                int count;
                AV * str;
            };

            typedef struct _Edje_Message_String_Set EdjeMessageStringSet;

            void struct_size(void) {
                /* sizeof yields a size_t, so cast it to match %d */
                printf("Size of _Edje_Message_String_Set struct: %d\n",
                       (int)sizeof(EdjeMessageStringSet));
            }

            EdjeMessageStringSet * _new(AV * val_arr) {
                EdjeMessageStringSet *message;
                /* int i; */
                /* SV ** elem; */

                Newx(message, 1, EdjeMessageStringSet);
                if(message == NULL)
                    croak("Failed to allocate memory in _new function");

                message->count = av_len(val_arr) + 1;
                message->str = val_arr;  /* borrowed: val_arr must outlive message */
                return message;
            }

            void _iterate(EdjeMessageStringSet * strs) {
                int i;
                SV ** elem;

                for(i = 0; i < strs->count; i++) {
                    elem = av_fetch(strs->str, i, 0);
                    if (elem != NULL)    /* av_fetch() returns NULL for missing elements */
                        printf("%s\n", SvPV_nolen(*elem));
                }
            }

            void DESTROY(EdjeMessageStringSet * x) {
                Safefree(x);
                printf("destroyed _new EdjeMessageStringSet*\n");
            }

            void foo(AV * arref) {
                EdjeMessageStringSet *m;
                m = _new(arref);
                _iterate(m);
                DESTROY(m);
            }

            EOC

            struct_size();

            my @in = ("hello foo", "hello bar","hello world", "goodbye");

            # The XSub foo() will create a new EdjeMessageStringSet object
            # using _new(), then pass that object to _iterate() which
            # prints out all of the strings contained in the object.
            # Finally, foo() calls DESTROY() which frees the memory that
            # was assigned to create the EdjeMessageStringSet object.

            foo(\@in);
        For me, that outputs:
            Size of _Edje_Message_String_Set struct: 16
            hello foo
            hello bar
            hello world
            goodbye
            destroyed _new EdjeMessageStringSet*