http://qs1969.pair.com?node_id=11148270


in reply to Perl XS binding to a struct with an array of chars*

Hi MaxPerl,

It looks to me that:
typedef Edje_Message_String_Set EdjeMessageStringSet; should instead be: typedef struct _Edje_Message_String_Set EdjeMessageStringSet;
Also the following looks wrong to me:
message = malloc(sizeof(Edje_Message_String) + count * sizeof(char *));
The variable "message" is a pointer to an EdjeMessageStringSet type, and memory should be assigned to it as:
message = malloc(sizeof(EdjeMessageStringSet)); or New(0, message, 1, EdjeMessageStringSet); or (in modern perl) Newx(message, 1, EdjeMessageStringSet);
I recommend working your way through the various issues using Inline::C.
It's not a quick fix but, as you become familiar with it, it enables you to test out various options.
It also enables you to post a demo script of problems you are facing. Others can then run that demo to reproduce (and hopefully assist with) the issue.

Here's a little Inline::C demonstrating some basics:
# struct.pl #
use strict;
use warnings;

use Inline C => Config =>
  BUILD_NOISY => 1,
  CLEAN_AFTER_BUILD => 0,
  USING => 'ParseRegExp',
  ;

use Inline C => <<'EOC';

struct _Edje_Message_String_Set {
  int count;
  char *str[1];
};

typedef struct _Edje_Message_String_Set EdjeMessageStringSet;

void struct_size(void) {
  /* sizeof yields a size_t, so cast it to match the %d format */
  printf("Size of _Edje_Message_String_Set struct: %d\n",
         (int)sizeof(EdjeMessageStringSet) );
}

EdjeMessageStringSet * _new(char * class, int count, AV * val_arr) {
  /* Can't be accessed directly from perl unless a typemap is provided */
  EdjeMessageStringSet *message;
  int index;
  char *string;
  STRLEN len;

  Newx(message, 1, EdjeMessageStringSet);
  if(message == NULL)
    croak("Failed to allocate memory in _new function");

  /* do other stuff ... */

  printf("returning EdjeMessageStringSet* from _new\n");
  return message;
}

void DESTROY(EdjeMessageStringSet * x) {
  /* Can't be accessed directly from perl unless a typemap is provided.
     Must currently be explicitly called as the EdjeMessageStringSet*
     object is currently "unblessed". */
  Safefree(x);
  printf("destroyed _new EdjeMessageStringSet*\n");
}

void foo(char * pv, int in, AV * arref) {
  EdjeMessageStringSet *m;
  m = _new(pv, in, arref);
  DESTROY(m);
}

EOC

struct_size();
foo("hello world", 2, [1, 2]);
After the script has been compiled, it outputs (for me):
Size of _Edje_Message_String_Set struct: 16
returning EdjeMessageStringSet* from _new
destroyed _new EdjeMessageStringSet*
If you look in the ./_Inline/build/ directory you'll find a folder that contains (amongst other things) the XS file that Inline::C automatically generated and used.

Cheers,
Rob

Re^2: Perl XS binding to a struct with an array of chars*
by MaxPerl (Acolyte) on Nov 20, 2022 at 16:44 UTC

    Dear Rob,

    Thank you so much for your hint about Inline::C. I will try it at the next opportunity.

    I don't know why, but I solved my problem with the following:

    _new(class,count, val_arr)
        char *class
        int count
        AV *val_arr
    PREINIT:
        EdjeMessageStringSet *message;
        int index;
        SV *tmp;
        char *string;
        STRLEN len;
    CODE:
        Newx(message,1,EdjeMessageStringSet);
        Renewc(message,count+2, char*,EdjeMessageStringSet);
        if (message == NULL)
            croak("Failed to allocate memory in _new function\n");
        message->count = count+1;
        for (index = 0; index <= count; index++) {
            tmp = *av_fetch(val_arr,index,0);
            string = SvPVutf8(tmp,len);
            message->str[index] = savepvn(string,len);
        }
        RETVAL = message;
    OUTPUT:
        RETVAL

    I have to allocate memory for the Edje_Message_String_Set struct and for the array of strings, because the number of strings in the array is only known at runtime. In C this is done with malloc(sizeof(Edje_Message_String_Set) + count * sizeof(char*)). In Perl this works with the Renewc function. The only thing I don't understand is why I had to reallocate count+1 items of char*. It should be count+1 :-S (because count starts at 1, and the index of the given Perl array starts at 0). (One C example even mallocs (count-1) * sizeof(char*)...) But whatever, I am happy that it works now...

      I also don't understand this issue with count. If you have something like this:
      typedef struct {
          int count;
          char *str[];
      } Edje_Message_String_Set;
      You would call malloc for needed memory thusly:
      malloc( sizeof(Edje_Message_String_Set) + (count-1)*sizeof(char *) );
      The sizeof(Edje_Message_String_Set) includes enough space for one integer and one pointer to char, so that is all you need for count==1. If you need an array of 2 pointers to char, then you have to allocate space for one more char*. One pointer to char is included in the smallest struct that you are able to allocate space for. You put a dimension of [1] on the array of pointers. I am not sure that you need that; a blank dimension (no number) may work. This has nothing to do with whether program indices start at zero or one. It is just about how much memory you need for X number of strings.
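The arithmetic above can be sketched in plain C. This is only an illustration, assuming the struct is declared with `char *str[1]` as in the original post. `string_set_alloc` is a hypothetical helper name, and plain `malloc` stands in for Perl's `safemalloc` so that the snippet compiles without the Perl headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Layout from the original post: one char* is already part of the struct. */
typedef struct {
    int   count;
    char *str[1];
} Edje_Message_String_Set;

/* Hypothetical helper: one block with room for `count` string pointers. */
static Edje_Message_String_Set *string_set_alloc(int count)
{
    assert(count >= 1);

    /* sizeof() already covers str[0]; add room for the other count-1. */
    size_t bytes = sizeof(Edje_Message_String_Set)
                 + (size_t)(count - 1) * sizeof(char *);

    Edje_Message_String_Set *m = malloc(bytes);
    if (m == NULL)
        return NULL;

    m->count = count;
    /* Indexing past str[0] relies on the classic "struct hack". */
    for (int i = 0; i < count; i++)
        m->str[i] = NULL;
    return m;
}
```

On a typical 64-bit build, `string_set_alloc(3)` requests 16 + 2*8 = 32 bytes. The caller releases the block with `free()` (or `Safefree` if it came from `safemalloc`).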

      I don't see who manages the destruction of one of these things? Also who manages the memory for the strings themselves? I guess you are doing a shallow copy instead of a clone. Also, the index "for" loop looks pretty weird to me because it looks like it exceeds allocated memory bounds.

      I am curious - what sort of problem are you trying to solve with your XS code?

      Update:
      This memory allocation code looks completely wrong to me:
      I haven't used these Perl memory allocation functions, but from looking at the doc's...

      Newx(message,1,EdjeMessageStringSet);
      Renewc(message,count+2, char*,EdjeMessageStringSet);
      if (message == NULL)
          croak("Failed to allocate memory in _new function\n");
      Ok, with comments:
      void  Newx(void* ptr, int nitems, type)
      void* safemalloc(size_t size)
      void* Renewc( void *ptr, int size, type, cast )
      void  Safefree(void* ptr)

      Newx(message,1,EdjeMessageStringSet);
          // allocate space for 1 EdjeMessageStringSet
          // this is enough for just 1 pointer to char
          // i.e. ok if count==1
          // memory is not initialized.

      Renewc(message,count+2, char*, EdjeMessageStringSet);
          // Leak some memory from the heap.
          // Allocate enough space for count+2 char*
          // copy contents of memory from previous Newx() operation
          // to this newly allocated memory
          // Then throw pointer to this new memory away
          // For extra confusion, also calls "free" on the original pointer!!

      if (message == NULL)
          croak("Failed to allocate memory in _new function\n");
          // Of course should have checked this after the Newx().
      Ok, so after this, message is a pointer to memory that has already been freed. Some subsequent malloc() could see this memory reassigned to that request and then you are in real trouble! What is saving the day here is that right after this newly unallocated memory block, there is an allocated memory block. Some stuff got copied into this block, but the address of this block got thrown away. So for at least a short time, you can use more memory at the address of "message".

      One issue here is that we are "cheating" by declaring a type whose size can and does actually change! You can't allocate memory for this thing using a method that expects to allocate X number of Y things. So, something like this is needed:

      EdjeMessageStringSet* m = (EdjeMessageStringSet*) safemalloc(
          sizeof(Edje_Message_String_Set) + (count-1)*sizeof(char *) );
      if (m == NULL)
          croak("Failed to allocate memory in _new function\n");

        "Renewc"
            The XSUB-writer's interface to the C "realloc" function, with cast.
            Memory obtained by this should ONLY be freed with "Safefree".

                void Renewc(void* ptr, int nitems, type, cast)

        So actually not how you commented it. The Renewc function modifies the pointer rather than returning anything.

        Basically the OP's new version of the code works because sizeof(int) < 2*sizeof(char*)

        The renewc is unneeded though, it could just be

        message= (EdjeMessageStringSet*) safemalloc( sizeof(EdjeMessageStringSet) + (count-1)*sizeof(char*) )

        For this code, the Newx type casting & sizing behavior is simply the wrong API to try to use.
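The size argument above can be checked with a little plain C, without the Perl headers. This is a sketch under two assumptions: the struct uses the `char *str[1]` layout from the original post, and the OP's loop stores count+1 pointers (index 0 through count). `bytes_needed` and `bytes_from_renewc` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int   count;
    char *str[1];
} EdjeMessageStringSet;

/* Bytes required to hold the header plus `nstrings` char* slots. */
static size_t bytes_needed(size_t nstrings)
{
    assert(nstrings >= 1);
    return offsetof(EdjeMessageStringSet, str) + nstrings * sizeof(char *);
}

/* Bytes requested by Renewc(message, count+2, char*, ...):
   the macro sizes the block as nitems * sizeof(type). */
static size_t bytes_from_renewc(size_t count)
{
    return (count + 2) * sizeof(char *);
}
```

The Renewc request covers the loop whenever `bytes_from_renewc(count) >= bytes_needed(count + 1)`, which reduces to the header (the int plus any padding before `str`) fitting within one `char*`. That holds on common 32-bit and 64-bit platforms, which is why the code happens to work.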

      sorry, some typos: "The only thing I don't understand is, why I had to reallocate count+2 items of char*. It should be count-1 :-S (because count starts at 1, and the index of the given Perl array starts at 0)..."

        MaxPerl, I haven't managed to get my head around the exact requirements of the XS code, so I can't comment much more.
        I noticed in your original post that you had a DESTROY function, and I just wanted to add that I couldn't see any evidence that it will be called automagically.
        You may find that you have to explicitly call DESTROY (or Safefree) in your code, in order to avoid memory leaks.

        Cheers,
        Rob
Re^2: Perl XS binding to a struct with an array of chars*
by Marshall (Canon) on Nov 23, 2022 at 19:48 UTC
    Hi Rob!

    I think both the OP and you are not quite right about allocating memory for EdjeMessageStringSet. In my post below, see:

    EdjeMessageStringSet* m = (EdjeMessageStringSet*) safemalloc(
        sizeof(Edje_Message_String_Set) + (count-1)*sizeof(char *) );
    if (m == NULL)
        croak("Failed to allocate memory in _new function\n");
    First, we have to talk about what this means:
    typedef struct {
        int count;
        char *str[1]; // may be fine with [] or even just char* str (no subscript at all)
    } Edje_Message_String_Set;
    Normally a type has a fixed size that is known at compile time and sizeof() works just fine. That is not true in this case. Normally, I would have expected to use a fixed size Edje_Message_String_Set, like this:
    typedef struct {
        int count;
        char** string_array;
    } Edje_Message_String_Set;
    Now Edje_Message_String_Set is a fixed size. To instance one, you allocate memory for the type with sizeof(). Then you allocate memory for an array of pointers to strings with size of count. You put the address of this dynamically allocated array into the variable string_array.
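The two-step allocation described above might look like the following in plain C. It is a sketch of the alternative fixed-size layout only, not the layout from the original post. `string_set_new` and `string_set_free` are hypothetical names, and `malloc`/`free` stand in for Perl's `safemalloc`/`Safefree`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fixed-size variant: the struct holds a pointer to a separate array. */
typedef struct {
    int    count;
    char **string_array;
} Edje_Message_String_Set;

/* Hypothetical constructor: one allocation for the struct,
   a second one for the array of `count` string pointers. */
static Edje_Message_String_Set *string_set_new(int count)
{
    assert(count >= 1);

    Edje_Message_String_Set *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;

    m->string_array = malloc((size_t)count * sizeof(char *));
    if (m->string_array == NULL) {
        free(m);
        return NULL;
    }

    m->count = count;
    for (int i = 0; i < count; i++)
        m->string_array[i] = NULL;
    return m;
}

/* Matching destructor: release the array first, then the struct.
   The strings themselves are assumed to be owned elsewhere. */
static void string_set_free(Edje_Message_String_Set *m)
{
    if (m == NULL)
        return;
    free(m->string_array);
    free(m);
}
```

The cost of this layout is the second allocation and the extra pointer; the benefit is that `sizeof(Edje_Message_String_Set)` describes the whole struct.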

    In theory, you can save one call to malloc() for dynamic array allocation and potentially the space for the char**, by allocating the dynamic array directly inside the Edje_Message_String_Set type rather than having a pointer to the dynamically allocated array.

    This means that you can't use a memory allocation interface that says "give me, say, 5 of type X". A count and a type are not sufficient here, because the number of bytes actually needed is not known at compile time.

    To put some numbers on this: Rob, you have a 64-bit machine, so for a count of 1, we get 16 bytes: 4 bytes for the int, 4 bytes of alignment padding, and 8 bytes for the pointer to char. If, say, count==3, then we need to allocate an additional 16 bytes for 2 more char*'s. Hence, we arrive at my safemalloc() math shown above. The first 16 bytes come from sizeof(Edje_Message_String_Set), but that is only the minimum size, covering a count of 1. Now if we had allocated, say, 2 * sizeof(Edje_Message_String_Set), that would give us enough space for count==3, not 2!

      Normally a type has a fixed size that is known at compile time and sizeof() works just fine. That is not true in this case.

      Well, I think that the struct, as presented in the original post, does have a definite size (of 16 bytes).
      AFAIK specifying char *str[1] is equivalent to char * str.
      I just went with the spec provided, even though it looked rather odd.

      It did occur to me that the OP might have intended char ** string_array (as you've suggested), and I probably should have pressed MaxPerl about that.
      But, either way, the struct has a definite size - and we can assign memory to it based on that size (which is 16 bytes, on my Windows 11 64-bit system).

      I've no experience with structs that might require varying amounts of memory that can't be known until runtime. (I don't assume that such cases never arise.)

      I expect that the OP's strings have been created separately.
      Therefore, the number and size of them has no impact on the struct's memory allocation - because the struct just takes a pointer to the array of strings, no matter how large that array is.

      Cheers,
      Rob
        structs that might require varying amounts of memory

        There is a somewhat common use case where a 0-length or length-1 array is used at the end of a struct so that a variable amount of payload can be carried around by the struct. The technique is hardly ever needed in C++ land; there are better ways to handle the problem.
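A minimal sketch of that trailing-array technique in plain C follows. It uses the C99 flexible array member spelling (`payload[]`) rather than a 0- or 1-length array. The `Packet` type and `packet_new` function are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Header followed by a variable-sized tail in the same allocation. */
typedef struct {
    size_t        len;        /* number of payload bytes that follow */
    unsigned char payload[];  /* C99 flexible array member */
} Packet;

/* Hypothetical constructor: a single malloc covers header and payload. */
static Packet *packet_new(const void *data, size_t len)
{
    assert(data != NULL || len == 0);

    /* sizeof(Packet) does not include any payload bytes. */
    Packet *p = malloc(sizeof(Packet) + len);
    if (p == NULL)
        return NULL;

    p->len = len;
    if (len > 0)
        memcpy(p->payload, data, len);
    return p;
}
```

A single `free(p)` releases both the header and the payload, which is the main attraction of the layout.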

        One interesting use case in C land is to provide data hiding. Use a common header struct that provides high-level management information, then a variable-sized "tail" that gets cast to the struct type that represents the hidden data. That makes the payload data opaque to client code and avoids making two allocations when creating an instance of the struct. Kinda a poor man's C++ really!

        Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
        Yes, this is a weird duck. Quack. Quack.
        If I play further with this, I am tempted to see if char* str[]; is a valid syntax or not - having the blank dimension would certainly be nice as a "heads up". But yeah, from the OP's code, he expects the dynamic array to extend past the "starter header" as a continuation of the single element defined in the structure.

        Some OP code:

        for (index = 0; index <= count; index++) {
            tmp = *av_fetch(val_arr,index,0);
            string = SvPVutf8(tmp,len);
            message->str[index] = savepvn(string,len);
        }
        Of course since the whole purpose of XS is to write C code, something like this would be more appropriate (untested):
        int n = count;
        char **p = m->str;   /* m is the EdjeMessageStringSet pointer */
        while (n--) {
            ....blah...
            *p++ = savepvn(string,len);
        }
        Using an index is a mess because this often will result in code that calculates index*sizeOfelement+arrayBase to get to the address. Having a pointer eliminates this calculation. Incrementing the pointer (p++) by sizeOfelement is very fast because that is a constant which is actually part of the machine instruction (not some number that is read from data memory). This instruction is typically called something like "add immediate to register". while (n--) runs very fast because there are special op codes that increment or decrement by one (this is much faster than incrementing the pointer because the instruction is shorter). There are also special op codes that check for zero or non-zero.
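For reference, here is a self-contained plain C version of the pointer-walking loop. It is a sketch, not XS code: `dup_n` is a hypothetical stand-in for `savepvn` (copy `len` bytes and add a terminating NUL), `fill_strings` is a hypothetical name, and the input is an ordinary array of C strings rather than a Perl AV.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int   count;
    char *str[1];
} Edje_Message_String_Set;

/* Stand-in for savepvn(): copy len bytes and NUL-terminate the copy. */
static char *dup_n(const char *s, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy != NULL) {
        memcpy(copy, s, len);
        copy[len] = '\0';
    }
    return copy;
}

/* Fill m->str[0 .. m->count-1] from src by walking a char** pointer.
   The caller must have allocated room for m->count pointers. */
static void fill_strings(Edje_Message_String_Set *m, const char *const *src)
{
    assert(m != NULL && src != NULL);

    int    n = m->count;
    char **p = m->str;      /* first slot of the trailing array */
    while (n--) {
        *p++ = dup_n(*src, strlen(*src));
        src++;
    }
}
```

Each stored string is a separate allocation, so a matching cleanup routine has to free every `str[i]` before freeing the struct itself.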

        So far, I've learned some things that were new to me about XS, so this has been interesting. However, it is not at all clear that, besides being a good intellectual exercise, this will result in achieving a significant end goal. This is a lot of complication to make a clone of a Perl array of strings in a different format. For all I know, it could be "good enough" to leave the Perl data structures "as is" and write good C code to access the data for the intended but as yet unstated purpose/application.

        To proceed any further, we'd need to know more about what this is being used for.
        Cheers,
        Marshall