downer has asked for the wisdom of the Perl Monks concerning the following question:

I have some C code which performs varbyte compression on ints. I have inlined this code and wish to manage the output and input with some Perl. I know that the varbyte code works in other contexts, but when I compare the length of the output to the length of the input, something is clearly wrong. I have no experience with binary data in Perl; I just assumed that if I didn't touch it, it would remain as is. Here is the C, which I know works properly in other contexts:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct vbyte_record{
    unsigned char* vbyte_result;
    short vbyte_len;
} oneRecord;

void displayBits(unsigned char*, int);
oneRecord vbyte_compress(int);

/*display bits of an unsigned char value*/
void displayBits(unsigned char* value, int len){
    int shift = 7;
    unsigned char mask = '1' << shift;
    unsigned i, j;
    for(j=0; j<len; j++){
        for(i=1; i<=8; i++){
            printf("%c", value[j] & mask ? '1': '0');
            value[j] <<= 1;
        }
    }
    return;
}

/* compress an integer into vbyte;
   first check how many bytes this integer needs;
   then set the highest bit of all byte to 1 except the lowest byte;
   encode each byte respectively */
oneRecord vbyte_compress(int number){
    short index = (short) floor(log10(number)/log10(128)); //number of bytes needed
    unsigned char* result;
    short i;
    int remainder = number;
    div_t temp;
    unsigned char mask = (char) 1 << 7; //used to set highest bit to 1
    oneRecord record;

    result = (unsigned char*) malloc(sizeof(char)*(index+1));

    /* if there are more than one byte; encode the higher byte */
    if(index > 0){
        for(i=index; i>=1; i--){
            temp = div(remainder, (int)pow(128, i));
            result[index-i] = (char) temp.quot | mask;
            remainder = temp.rem;
        }
    }

    /*encode the lowest byte*/
    result[index] = (char) remainder;

    record.vbyte_result = result;
    record.vbyte_len = index+1;
    return record;
}

char* varbyte(int number) {
    oneRecord record;
    int decom_num;
    record = vbyte_compress(number);
    return record.vbyte_result;
}
This line in my Perl gives surprising results:
$compressed .= varbyte($x);
print "$x: ", length($x), " compressed ", length($compressed), "\n";

Re: keeping binary data raw (char*)
by tye (Sage) on Oct 26, 2007 at 15:15 UTC

    I was sad to see that Inline::C's documentation doesn't even really hint at what happens if your C function returns a value of type "char*".

    If you find your ExtUtils/typemap file you'll find the following lines (not together):

    char *    T_PV

    INPUT
    T_PV
        $var = ($type)SvPV_nolen($arg)

    OUTPUT
    T_PV
        sv_setpv((SV*)$arg, $var);

    so returning a "char*" calls sv_setpv(). If you read perlguts you'll see a disappointingly vague hint that sv_setpv() sets the length of the scalar's string value based on strlen(str), which isn't appropriate for your "binary" string.
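    A minimal sketch of why a strlen()-based length goes wrong for this data, in plain C with no Perl headers. The helper name length_seen_by_strlen is hypothetical, and the byte values are worked out by hand from the scheme in the question (128 = 1*128 + 0 and 16384 = 1*128*128 + 0*128 + 0). The sample buffers deliberately end in a zero byte; a malloc'd result with no zero byte in it would make strlen() read past the end of the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the original post: the length that a
 * strlen()-based copy would use for this buffer. The buffer must
 * contain a zero byte somewhere, otherwise strlen() runs off the end. */
size_t length_seen_by_strlen(const unsigned char *buf)
{
    return strlen((const char *)buf);
}

/* Returns how many bytes are lost when a buffer of real_len bytes is
 * measured with strlen() instead of its known length. */
size_t bytes_lost(const unsigned char *buf, size_t real_len)
{
    size_t seen = length_seen_by_strlen(buf);
    return seen < real_len ? real_len - seen : 0;
}
```

    With the two-byte encoding of 128 (0x81, 0x00), strlen() stops at the second byte and reports 1, so the low byte never reaches Perl. With the three-byte encoding of 16384 (0x81, 0x80, 0x00) it reports 2.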

    So you instead want to return a SV* type of value and (again, it is sad here that Inline::C doesn't even show how to do this) use something more like return newSVpvn( str, len );.

    So now you also have several places that could use documentation patches. /:

    - tye        

      I was sad to see that Inline::C's documentation doesn't even really hint at what happens if your C function returns a value of type "char*".

      In defense of Inline::C:
      1) I wonder if Inline::C is under any obligation to document this;
      2) There are a number of examples in perldoc Inline::C-Cookbook that *do* deal with newSVpv (usually as a newSVpvf)

      Cheers,
      Rob

        I don't get where you think the term "obligation" applies. Based on "obligation" I think you could delete most if not all of the Inline::C documentation.

        Rather than expecting the vague hints in Inline::C to lead to (a huge jump) ExtUtils/typemap then to perlguts then perlapi then "man strlen" to finally have the (C) '\0' character mentioned, I think it would be more than prudent for Inline::C to document this important and not obvious restriction and skip "typemap", "sv_setpv()", and "strlen()" and just say that a return value of type "char*" only works for strings terminated by "\0" (and not containing any non-terminal "\0" characters).

        I also think an example of SV* and return newSVpvn( str, len ); deserves to be in the base manual, not buried in the "cookbook" (or the documentation of the restrictions on "char*" should point directly to an appropriate "cookbook" example). It would certainly be better than the example that uses SvPVX(), a macro that should mostly never be used.

        But, of course, I don't find the author obligated to do much of anything.

        - tye        

        newSVpvf works perfectly. I wish I were good enough at C or C++ that I didn't need to make wrappers in Perl for stuff like this. Perl's I/O is so much more intuitive than either of these two.
Re: keeping binary data raw
by ikegami (Patriarch) on Oct 26, 2007 at 15:14 UTC

    "Surprising results" is a rather useless diagnostic, especially when the program can't easily be run.

    .= means "append to". Did you check what's initially in $compressed?

    Update: Oh! Is the return value of your function NUL-terminated? char* is treated as a pointer to a NUL-terminated string. I think you need to return an SV.

    Update: Doh! tye posted a more detailed version of what I said in my update while I was writing it. Please read his post.

Re: keeping binary data raw
by syphilis (Archbishop) on Oct 26, 2007 at 15:49 UTC
    Hi downer,

    Damn intriguing question (imho).

    Fwiw, I think you'll find that if you change the Inline::C incantation of varbyte() to:
    SV * varbyte(int number) {
        oneRecord record;
        //int decom_num;
        record = vbyte_compress(number);
        return newSVpv(record.vbyte_result, record.vbyte_len);
    }
    then you'll find it does what you want.

    Furthermore, I suspect that therein lies a reason that the vbyte_record structure contains the vbyte_len element.

    Cheers,
    Rob
      This seems to create a result much closer to what I would expect (where the compressed length is less than the input length). However, I can't claim I understand the whole newSVpv() thing. I don't need the vbyte_len returned; it is simply an internal value I used for printing out a representation of the compressed numbers. However, when I try to call newSVpv with just one value, it doesn't compile.

        vbyte_result points to an array of characters. You can't do anything with an array unless you know its size. How is Perl supposed to know how big the array is? For the same reason you need to know the length of the array to print it, you need to know the length of the array to create the Perl string that contains it. That's why you need to pass both the data and the length of the data to newSVpv.
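        To illustrate keeping the length together with the data, here is a rough sketch of an encoder that reports how many bytes it wrote. It is not the code from the question: the name vbyte_encode and the caller-supplied buffer are assumptions made for illustration, and it counts 7-bit groups with shifts instead of log10() and pow(). It assumes unsigned int is at most 32 bits wide, so five output bytes are always enough.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the thread: writes the varbyte form of
 * `number` into `out` (at least 5 bytes) and returns the byte count.
 * Most significant 7-bit group first; the high bit is set on every
 * byte except the last, as described in the question. */
size_t vbyte_encode(unsigned int number, unsigned char *out)
{
    size_t groups = 1;
    unsigned int rest = number;
    size_t i;

    /* Count the 7-bit groups using integer arithmetic only. */
    while (rest >= 128) {
        rest >>= 7;
        groups++;
    }

    for (i = 0; i < groups; i++) {
        unsigned int shift = 7u * (unsigned int)(groups - 1 - i);
        unsigned char group = (unsigned char)((number >> shift) & 0x7F);
        /* Set the continuation bit on all but the final byte. */
        out[i] = (i + 1 < groups) ? (unsigned char)(group | 0x80) : group;
    }
    return groups;
}
```

        The returned count is the value that would be handed to Perl as the length argument alongside the buffer, so a zero byte inside the data (as in the encoding of 128) does not cut the string short.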

Re: keeping binary data raw
by polettix (Vicar) on Oct 26, 2007 at 15:18 UTC
    You should try to post a self-contained minimal example that shows the problem and that can be easily downloaded for other monks to try. This is likely to boost the support level you're going to receive on this issue.

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    I understood... but what did you say?