http://qs1969.pair.com?node_id=11148347


in reply to Re^3: Perl XS binding to a struct with an array of chars*
in thread Perl XS binding to a struct with an array of chars*

Yes, this is a weird duck. Quack. Quack.
If I play further with this, I am tempted to see if char* str[]; is a valid syntax or not - having the blank dimension would certainly be nice as a "heads up". But yeah, from the OP's code, he expects the dynamic array to extend past the "starter header" as a continuation of the single element defined in the structure.

Some OP code:

    for (index = 0; index <= count; index++) {
        tmp = *av_fetch(val_arr,index,0);
        string = SvPVutf8(tmp,len);
        message->str[index] = savepvn(string,len);
    }
Of course since the whole purpose of XS is to write C code, something like this would be more appropriate (untested):
    int n = count;
    char **p = message->str;   /* points at the first char* slot */
    while (n--) {
        ....blah...
        *p++ = savepvn(string,len);
    }
Using an index can be messy because it often results in code that calculates index*sizeOfElement+arrayBase to get to the address. Walking a pointer avoids this calculation. Incrementing the pointer (p++) by sizeOfElement is very fast because that is a constant encoded in the machine instruction itself (not some number that is read from data memory). This instruction is typically called something like "add immediate to register". while (n--) runs fast because many instruction sets have short op codes that increment or decrement by one, as well as op codes that check for zero or non-zero. That said, an optimizing compiler will often generate much the same machine code for either form, so the difference may be small in practice.

So far, I've learned some things that were new to me about XS, so this has been interesting. However, it is not at all clear that, besides being a good intellectual exercise, this will achieve a significant end goal. This is a lot of complication to make a clone of a Perl array of strings in a different format. For all I know, it could be "good enough" to leave the Perl data structures "as is" and write good C code to access the data for the intended, but as yet unstated, purpose/application.

To proceed any further, we'd need to know more about what this is being used for.
Cheers,
Marshall

Re^5: Perl XS binding to a struct with an array of chars*
by syphilis (Archbishop) on Nov 24, 2022 at 11:48 UTC
    I am tempted to see if char* str[]; is a valid syntax or not

    I think it's valid, but it does make things tricky ... and maybe it's that trickiness that inspired this thread in the first place.
    It gets a lot easier if we're allowed to declare the struct as:
    struct _Edje_Message_String_Set { int count; AV * str; };
    Here's a demo using that very struct declaration.
    # struct.pl #
    use strict;
    use warnings;

    use Inline C => Config =>
      BUILD_NOISY => 1,
      CLEAN_AFTER_BUILD => 0,
      USING => 'ParseRegExp',
      ;

    use Inline C => <<'EOC';

    struct _Edje_Message_String_Set {
      int count;
      AV * str;
    };

    typedef struct _Edje_Message_String_Set EdjeMessageStringSet;

    void struct_size(void) {
      /* sizeof yields a size_t, so cast it to match %d */
      printf("Size of _Edje_Message_String_Set struct: %d\n",
             (int)sizeof(EdjeMessageStringSet) );
    }

    EdjeMessageStringSet * _new(AV * val_arr) {
      EdjeMessageStringSet *message;
      /* int i; */
      /* SV ** elem; */

      Newx(message, 1, EdjeMessageStringSet);
      if(message == NULL) croak("Failed to allocate memory in _new function");
      message->count = av_len(val_arr) + 1;
      message->str = val_arr;
      return message;
    }

    void _iterate(EdjeMessageStringSet * strs) {
      int i;
      SV ** elem;
      for(i = 0; i < strs->count; i++) {
        elem = av_fetch(strs->str, i, 0);
        printf("%s\n", SvPV_nolen(*elem));
      }
    }

    void DESTROY(EdjeMessageStringSet * x) {
      Safefree(x);
      printf("destroyed _new EdjeMessageStringSet*\n");
    }

    void foo(AV * arref) {
      EdjeMessageStringSet *m;
      m = _new(arref);
      _iterate(m);
      DESTROY(m);
    }

    EOC

    struct_size();

    my @in = ("hello foo", "hello bar","hello world", "goodbye");

    # The XSub foo() will create a new EdjeMessageStringSet object
    # using _new(), then pass that object to _iterate() which
    # prints out all of the strings contained in the object.
    # Finally, foo() calls DESTROY() which frees the memory that
    # was assigned to create the EdjeMessageStringSet object.
    foo(\@in);
    For me, that outputs:
    Size of _Edje_Message_String_Set struct: 16
    hello foo
    hello bar
    hello world
    goodbye
    destroyed _new EdjeMessageStringSet*
    Cheers,
    Rob
      Once I start, I can't stop ;-)
      I still don't know how to deal with:
      struct _Edje_Message_String_Set { int count; char * str[]; };
      Is it possible to handle that ?

      However, for:
      struct _Edje_Message_String_Set { int count; char ** str; };
      I have:
      # struct_char.pl #
      use strict;
      use warnings;

      use Inline C => Config =>
        BUILD_NOISY => 1,
        CLEAN_AFTER_BUILD => 0,
        USING => 'ParseRegExp',
        ;

      use Inline C => <<'EOC';

      struct _Edje_Message_String_Set {
        int count;
        char ** str;
      };

      typedef struct _Edje_Message_String_Set EdjeMessageStringSet;

      void struct_size(void) {
        /* sizeof yields a size_t, so cast it to match %d */
        printf("Size of _Edje_Message_String_Set struct: %d\n",
               (int)sizeof(EdjeMessageStringSet) );
      }

      EdjeMessageStringSet * _new(AV * val_arr) {
        EdjeMessageStringSet *message;
        int i;
        SV ** elem;

        Newx(message, 1, EdjeMessageStringSet);
        if(message == NULL) croak("Failed to allocate memory in _new function");
        message->count = av_len(val_arr) + 1;
        Newx(message->str, message->count, char*);
        for(i = 0; i < message->count; i++) {
          elem = av_fetch(val_arr, i, 0);
          /* stores a pointer into the SV's own buffer; no copy is made */
          message->str[i] = SvPV_nolen(*elem);
        }
        return message;
      }

      void _iterate(EdjeMessageStringSet * strs) {
        int i;
        for(i = 0; i < strs->count; i++) {
          printf("%s\n", strs->str[i]);
        }
      }

      void DESTROY(EdjeMessageStringSet * x) {
        Safefree(x->str);
        printf("Safefreed EdjeMessageStringSet object->str\n");
        Safefree(x);
        printf("Safefreed EdjeMessageStringSet object\n");
      }

      void foo(AV * arref) {
        EdjeMessageStringSet *m;
        m = _new(arref);
        _iterate(m);
        DESTROY(m);
      }

      EOC

      struct_size();

      my @in = ("hello foo", "hello bar","hello world", "goodbye", '1', '2', '3');

      # The XSub foo() will create a new EdjeMessageStringSet object
      # using _new(), then pass that object to _iterate() which
      # prints out all of the strings contained in the object.
      # Finally, foo() calls DESTROY() which frees the memory that
      # was assigned to create the EdjeMessageStringSet object.
      foo(\@in);
      Which (after compiling) outputs:
      Size of _Edje_Message_String_Set struct: 16
      hello foo
      hello bar
      hello world
      goodbye
      1
      2
      3
      Safefreed EdjeMessageStringSet object->str
      Safefreed EdjeMessageStringSet object
      Cheers,
      Rob
        struct _Edje_Message_String_Set { int count; char * str[]; };
        Is it possible to handle that ?

        Yes, it sure is!

        I actually decided that this is the best way to implement the idea of the array of pointer within the struct itself. My reasoning is that the [] gives a big clue that there is no storage in the struct at all for the array of pointers to strings.

        In my implementation below, I allocated one more slot for a NULL pointer. That is so that more traditional C pointer iteration style can be used as an alternative to using some counter based upon the "count".

        Also note that the Perl print statements come before the C print statements! That of course has to be due to some buffering weirdness, but I didn't figure out how to defeat that behaviour. Also, for some reason, the API function av_count() wasn't available in my version of Perl, so a simple workaround calculation (av_len() + 1) was used.

        #MessageStorageInsideStruct 11/25/2022
        #
        #https://www.perlmonks.org/?node_id=11148268
        #
        #
        #
        use strict;
        use warnings;

        use Inline C => Config =>
          BUILD_NOISY => 1,
          CLEAN_AFTER_BUILD => 1,
          USING => 'ParseRegExp',
          ;
        use Inline "C";

        struct_size();

        my @in = ("hello foo", "hello bar","hello world", "goodbye", '1', '2', '3');

        foo(\@in);
        print "back inside Perl program!!!!\n";
        print "printout is not in time order you expected!\n\n";

        =EXAMPLE RUN
        back inside Perl program!!!!
        printout is not in time order you expected!

        Size of _Edje_Message_String_Set structure: 8 bytes
        Address of the Message Struct: 000000000308FAB8
        Address of the Pointer Array: 000000000308FAC0
        setting element 0
        string: hello foo's pointer is at Address 000000000308FAC0
        setting element 1
        string: hello bar's pointer is at Address 000000000308FAC8
        setting element 2
        string: hello world's pointer is at Address 000000000308FAD0
        setting element 3
        string: goodbye's pointer is at Address 000000000308FAD8
        setting element 4
        string: 1's pointer is at Address 000000000308FAE0
        setting element 5
        string: 2's pointer is at Address 000000000308FAE8
        setting element 6
        string: 3's pointer is at Address 000000000308FAF0
        Dumping an EdjeMessageStringSet
        Count = 7
        1) hello foo
        2) hello bar
        3) hello world
        4) goodbye
        5) 1
        6) 2
        7) 3
        Destroying an EdjeMessageStringSet
        =cut

        __END__
        __C__
        /*********** Start of C Code ********/

        struct _Edje_Message_String_Set
        {
           int count;    //a 4 byte int; padding brings the struct to 8 bytes on a 64 bit machine
           char *str[];  //str has no "size" and is not counted in sizeof(_Edje_Message_String_Set)
        };

        typedef struct _Edje_Message_String_Set EdjeMessageStringSet;

        void struct_size(void)
        {
           //sizeof yields a size_t, so cast it to match %d
           printf("Size of _Edje_Message_String_Set structure: %d bytes\n",
                   (int)sizeof(EdjeMessageStringSet) );
        }

        /************/
        EdjeMessageStringSet * _new(AV* val_arr)
        {
           int count = av_len(val_arr) + 1; // newer av_count() not avail this version

           // sizeof(EdjeMessageStringSet) covers only the integer count (8 bytes with padding)
           // space for the array of pointers must be allocated by safemalloc
           // remember to add one more slot for a NULL pointer
           EdjeMessageStringSet* message = (EdjeMessageStringSet*) safemalloc
                  ( sizeof(EdjeMessageStringSet) + (count+1)*sizeof(char*) );
           printf ("Address of the Message Struct: %p\n",(void*)message);
           if(message == NULL)
               croak("Failed to allocate memory for message in _new function");

           message->count = count;

           char** p = &(message->str[0]);
           printf ("Address of the Pointer Array: %p\n", (void*)p);
           int i;
           for(i= 0; i < message->count; i++)
           {
               printf ("setting element %d\n",i);
               SV** elem = av_fetch(val_arr, i, 0);
               if (elem==NULL) croak ("bad SV elem value in _new function");
               char* string = SvPVutf8_nolen(*elem);
               printf ("string: %s's pointer is at Address %p\n",string,(void*)p);
               *p++ = savepv(string);   //clone the string into our own storage
           }
           *p = NULL;   //Can use either count or NULL pointer as a loop variable
           return message;
        }

        /******************/
        void _iterate (EdjeMessageStringSet* m)
        {
           printf ("Dumping an EdjeMessageStringSet\n");
           printf ("Count = %d\n",m->count);
           int i = 1;
           char** p = &(m->str[0]);
           while (*p) {printf ("%d) %s\n",i++,*p++);}
        }

        void DESTROY(EdjeMessageStringSet* m)
        {
           printf ("Destroying an EdjeMessageStringSet\n");
           char** p = &(m->str[0]);
           while(*p){Safefree(*p++);}   //zap each cloned string
           Safefree(m);                 //zap main structure
        }

        /************/
        void foo(AV * arref)
        {
           EdjeMessageStringSet* m = _new(arref);
           _iterate(m);
           DESTROY(m);
        }

        I thought I'd have a crack at demoing how char *[] might hang together. So first I downloaded your code, then realised I didn't have Inline or Inline::C installed, so I set about doing that using cpanm. The Inline install was quick and trouble free. The Inline::C install however has reached:

        ...
        Building and testing Tie-IxHash-1.23 ... OK
        Successfully installed Tie-IxHash-1.23
        Building and testing Pegex-0.75 ... OK
        Successfully installed Pegex-0.75
        --> Working on Win32::Mutex
        Fetching http://www.cpan.org/authors/id/C/CJ/CJM/Win32-IPC-1.11.tar.gz ... OK
        Configuring Win32-IPC-1.11 ... OK
        Building and testing Win32-IPC-1.11 ... OK
        Successfully installed Win32-IPC-1.11
        Building and testing Inline-C-0.82 ...

        and has been sitting there for the last half hour with no apparent progress. This is on Windows 10 and Strawberry Perl 5.32.1:

        Summary of my perl5 (revision 5 version 32 subversion 1) configuration:

          Platform:
            osname=MSWin32
            osvers=10.0.19042.746
            archname=MSWin32-x86-multi-thread-64int
            uname='Win32 strawberry-perl 5.32.1.1 #1 Sun Jan 24 12:17:47 2021 i386'

        Ideas?

        Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
        I have Inline::C working on my system now. Before proceeding further, I decided to make a little test to see what malloc() is doing. Often more memory is allocated than requested and I wanted to see if I could get an idea of "how much more?". Sometimes there is a lucky accident where the code works although it is not guaranteed to work.

        I have 2 programs.
        1) 32 bit gcc independent of Perl
        2) 64 bit gcc that is part of my 64 bit Perl 5.24 installation

        Code for both is shown below.
        For 32 bit version, I am not surprised to see 64 bit instead of 32 bit alignment (lower 3 address bits always zero). 64 bit alignment also appears to be the case for the 64 bit gcc code as well. Anyway the alignment drives the absolute minimum memory allocation unit. In both cases, 8 bytes (64 bit alignment). The question is then whether there is additional space due to the quanta that malloc() uses to track allocation?

        The 32 bit standalone version does the exact same thing every run. I don't know why there is the difference between 2nd and 1st allocation, but after that we see the pattern of 16 bytes given for a single byte requested.

        The 64 bit inline C does something different every run! For all I know this could be some kind of Perl security feature?

        Anyway, I thought the results relevant to our discussion about allocation of this weird type.

        #include <stdio.h>
        #include <stdlib.h>

        //testing malloc() this is 32 bit gcc - separate from Perl

        void testMalloc(void)
        {
           char* x = (char *) malloc(1);
           printf (" Byte Starting Memory Addr is %p\n",(void*)x);

           char* y = (char *) malloc(1);
           printf ("Next Byte Starting Memory Addr is %p\n",(void*)y);
           printf ("difference in bytes between byte2 and byte1 = %d\n",(int)(y-x));

           char* z = (char *) malloc(1);
           printf ("Next Byte Starting Memory Addr is %p\n",(void*)z);
           printf ("difference in bytes between byte3 and byte2 = %d\n",(int)(z-y));

           char* alpha = (char *) malloc(1);
           printf ("Next Byte Starting Memory Addr is %p\n",(void*)alpha);
           // the posted version printed z-y here by mistake, so the recorded
           // runs below repeat the byte3-byte2 figure on this line
           printf ("difference in bytes between byte4 and byte3 = %d\n",(int)(alpha-z));

           free(x);
           free(y);
           free(z);
           free(alpha);
           return;
        }

        int main(int argc, char *argv[])
        {
           testMalloc();
           exit(0);
        }

        /*
         Byte Starting Memory Addr is 00C92FD8
        Next Byte Starting Memory Addr is 00C90CC8
        difference in bytes between byte2 and byte1 = -8976
        Next Byte Starting Memory Addr is 00C90CD8
        difference in bytes between byte3 and byte2 = 16
        Next Byte Starting Memory Addr is 00C90CE8
        difference in bytes between byte4 and byte3 = 16
        */

        /*
         Byte Starting Memory Addr is 00B22FD8
        Next Byte Starting Memory Addr is 00B20CC8
        difference in bytes between byte2 and byte1 = -8976
        Next Byte Starting Memory Addr is 00B20CD8
        difference in bytes between byte3 and byte2 = 16
        Next Byte Starting Memory Addr is 00B20CE8
        difference in bytes between byte4 and byte3 = 16
        */
        #################################################
        # testing malloc 64 bit Perl
        use strict;
        use warnings;

        use Inline C => Config =>
          BUILD_NOISY => 1,
          CLEAN_AFTER_BUILD => 0,
          USING => 'ParseRegExp',
          ;
        use Inline "C";

        testMalloc();

        =OUTPUT:
         Byte Starting Memory Addr is 000000000308CE68
        Next Byte Starting Memory Addr is 000000000308D258
        difference in bytes between byte2 and byte1 = 1008
        Next Byte Starting Memory Addr is 000000000308D018
        difference in bytes between byte3 and byte2 = -576
        Next Byte Starting Memory Addr is 000000000308D0A8
        difference in bytes between byte4 and byte3 = -576
        =cut

        =AnotherRun:
         Byte Starting Memory Addr is 00000000031157D8
        Next Byte Starting Memory Addr is 0000000003115B98
        difference in bytes between byte2 and byte1 = 960
        Next Byte Starting Memory Addr is 0000000003115E98
        difference in bytes between byte3 and byte2 = 768
        Next Byte Starting Memory Addr is 0000000003115F88
        difference in bytes between byte4 and byte3 = 768
        =cut

        =yet Another Run
         Byte Starting Memory Addr is 0000000002EA6CF8
        Next Byte Starting Memory Addr is 0000000002EA6EA8
        difference in bytes between byte2 and byte1 = 432
        Next Byte Starting Memory Addr is 0000000002EA7AA8
        difference in bytes between byte3 and byte2 = 3072
        Next Byte Starting Memory Addr is 0000000002EA7C58
        difference in bytes between byte4 and byte3 = 3072
        =cut

        =one more time
         Byte Starting Memory Addr is 000000000313CE38
        Next Byte Starting Memory Addr is 000000000313C5F8
        difference in bytes between byte2 and byte1 = -2112
        Next Byte Starting Memory Addr is 000000000313C868
        difference in bytes between byte3 and byte2 = 624
        Next Byte Starting Memory Addr is 000000000313CA48
        difference in bytes between byte4 and byte3 = 624
        =cut

        __END__
        __C__

        void testMalloc(void)
        {
           char* x = (char *) malloc(1);
           printf (" Byte Starting Memory Addr is %p\n",(void*)x);

           char* y = (char *) malloc(1);
           printf ("Next Byte Starting Memory Addr is %p\n",(void*)y);
           printf ("difference in bytes between byte2 and byte1 = %d\n",(int)(y-x));

           char* z = (char *) malloc(1);
           printf ("Next Byte Starting Memory Addr is %p\n",(void*)z);
           printf ("difference in bytes between byte3 and byte2 = %d\n",(int)(z-y));

           char* alpha = (char *) malloc(1);
           printf ("Next Byte Starting Memory Addr is %p\n",(void*)alpha);
           // the posted version printed z-y here by mistake, so the recorded
           // runs above repeat the byte3-byte2 figure on this line
           printf ("difference in bytes between byte4 and byte3 = %d\n",(int)(alpha-z));

           free(x);
           free(y);
           free(z);
           free(alpha);
           return;
        }